Introduction and Background

Do today’s web applications truly exploit the facilities provided by the cloud? Not many do. Although web applications are increasingly deployed on the cloud, most do not exploit its full power.
Most of them still use traditional (on-premises) APIs and infrastructure for development as well as deployment.
Meanwhile, the world has moved on, and new cloud-based APIs and deployment options are becoming available. Heroku is a cloud deployment platform built on top of Amazon Web Services that enforces standard practices for developing and deploying applications on the cloud. Developing against cloud-based APIs and then deploying on Heroku is becoming the rule.
Experienced programmers who have not used Heroku and GitHub may need a quick background on the architecture of these systems and the interplay between them. This document attempts to provide that.

Structure of this document

First, Git is introduced, along with the ways it differs from conventional version control systems.

Next, a model of Heroku with enough detail to operate it is described, followed by the interconnections between Heroku and Git. The steps to set up Gradeables on Heroku, for the first and for subsequent times, are then described. Finally, a deployment architecture and set of processes is proposed.

Git Essentials

Git is a distributed version control system (DVCS). It differs from earlier version control systems in that it can work in a disconnected mode, because the entire repository is stored on the client machine. This section takes material from http://git-scm.com/book, with references to the corresponding chapters. Where a point is not completely clear, the reader can check the corresponding chapter in the book.

Structure of Git and Differences with conventional version control systems

Entire repository is copied

Conventional systems store all the different source code versions and metadata on the server (called the repository). Git stores the entire repository on each client as shown in the diagram below.

Because of this, clients do not check out code from the server; they clone it. When a repository (say Gradeables) is cloned on the local machine:

A directory with the name of the repository (Gradeables) is created.
A hidden subdirectory called .git is created (Gradeables/.git); the entire repository, including all version history, is stored here.
A checkout is performed into the top-level directory, so the latest version of the source code is available there.
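
The layout described above can be inspected with a few lines of Python. This is only a sketch: the helper name describe_clone is hypothetical, and the directory is simulated in a temporary folder rather than produced by a real git clone.

```python
from pathlib import Path
import tempfile


def describe_clone(root: Path) -> dict:
    """Report whether `root` holds a .git directory and which files sit beside it."""
    working_files = sorted(p.name for p in root.iterdir() if p.name != ".git")
    return {
        "has_repository": (root / ".git").is_dir(),
        "working_files": working_files,
    }


# Simulate what a clone of "Gradeables" leaves on disk (no real Git involved).
with tempfile.TemporaryDirectory() as tmp:
    clone = Path(tmp) / "Gradeables"
    (clone / ".git").mkdir(parents=True)         # repository data and history
    (clone / "README").write_text("hello\n")     # checked-out working copy
    layout = describe_clone(clone)

print(layout)
```

In a real clone the .git directory holds the full history, while everything beside it is the checked-out working copy.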

Versions are stored as snapshots and not as deltas

As the following diagrams illustrate, Git stores versions as snapshots, not as deltas.

Versions in CVS

Versions in Git

As can be seen from the pictures above, whenever a new version is created in a conventional version control system, only the delta (the difference from the previous version) is stored. With Git, an entirely new directory structure is recorded for each version: unchanged files are stored as links to the copies already saved by previous versions, while a new copy is made of each changed file.
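
The snapshot idea can be made concrete with a toy content-addressed store. This is a simplified sketch, not Git's actual object format (Git adds a header before hashing and compresses objects); the class and method names are invented for illustration.

```python
import hashlib


class SnapshotStore:
    """Toy model of snapshot storage: file contents are stored once, by hash."""

    def __init__(self):
        self.blobs = {}      # content hash -> file content (each stored once)
        self.snapshots = []  # one dict per version: filename -> content hash

    def commit(self, files: dict) -> int:
        """Record a full snapshot of `files` and return its version number."""
        snapshot = {}
        for name, content in files.items():
            digest = hashlib.sha1(content.encode("utf-8")).hexdigest()
            self.blobs.setdefault(digest, content)  # unchanged content is reused
            snapshot[name] = digest
        self.snapshots.append(snapshot)
        return len(self.snapshots) - 1

    def checkout(self, version: int) -> dict:
        """Rebuild every file of a version directly, without replaying deltas."""
        return {n: self.blobs[d] for n, d in self.snapshots[version].items()}


store = SnapshotStore()
v0 = store.commit({"a.txt": "one", "b.txt": "two"})
v1 = store.commit({"a.txt": "one", "b.txt": "two, edited"})
print(len(store.blobs), "distinct file contents stored for 2 versions of 2 files")
```

Each version lists every file, but the unchanged a.txt points at the same stored content in both versions, which is the "link to the previous version" described above.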

Checkins are local

In Git, all check-ins made with git commit are local and do not change the remote server. Since all check-ins, checkouts, modifications and commits are local, the central Git server is usually termed a remote.

States of git files

A file is in the untracked state when it is first created locally. Adding it with git add <filename> moves it into a special state called staged, and the first git commit makes it part of the local repository, after which it is in the unmodified state. If changes are then made to the file, it goes to the modified state. Running git add <filename> again moves it back to staged. On committing with git commit, the file returns to the unmodified state and the changes become part of the local repository.
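
The file lifecycle can be written down as a small state machine. This sketch follows the lifecycle in the Git book and is deliberately simplified: the names FileState, next_state and run are invented here, and the model ignores cases such as editing a file that is already staged.

```python
from enum import Enum


class FileState(Enum):
    UNTRACKED = "untracked"
    UNMODIFIED = "unmodified"
    MODIFIED = "modified"
    STAGED = "staged"


# (current state, action) -> next state; "edit" means changing the file on disk.
TRANSITIONS = {
    (FileState.UNTRACKED, "add"): FileState.STAGED,
    (FileState.STAGED, "commit"): FileState.UNMODIFIED,
    (FileState.UNMODIFIED, "edit"): FileState.MODIFIED,
    (FileState.MODIFIED, "add"): FileState.STAGED,
}


def next_state(state: FileState, action: str) -> FileState:
    """Return the state after `action`, or raise if the action does not apply."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"{action!r} does not apply to a {state.value} file") from None


def run(actions, start=FileState.UNTRACKED) -> FileState:
    """Apply a sequence of actions to a file that starts out untracked."""
    state = start
    for action in actions:
        state = next_state(state, action)
    return state


print(run(["add", "commit", "edit", "add"]).value)
```

Only staged changes are picked up by a commit, which is why the model rejects committing a file that is still untracked.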

All commits are local. To update the shared remote server, the git push command is required. To fetch the latest versions from the remote server, the git pull command is required.

The standard practice for using Git is:

One developer first initializes the repository, adds files, and pushes to the Git remote server (usually github.com).
Other developers create a clone of the remote repository using the git clone command.
Developers make changes.
When a developer has finished with the changes, the code needs to be pushed to the remote server. To do this, the developer:
- First commits the changes locally.
- Pulls the latest changes from the remote server using git pull (not git clone), in case other developers have changed the same files, and resolves any conflicts.
- Pushes the changes to the remote server.
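
The commit, pull, push sequence above can be sketched as an ordered list of Git commands. The helper below only builds and prints the commands; it does not run them. The function name is hypothetical, and the remote name origin and branch name master are assumptions based on common Git defaults, so adjust them to the repository in use.

```python
import shlex


def publish_commands(message, remote="origin", branch="master"):
    """Return the Git commands, in order, for publishing local changes."""
    return [
        ["git", "add", "--all"],            # stage every local change
        ["git", "commit", "-m", message],   # record the changes locally
        ["git", "pull", remote, branch],    # merge other developers' work first
        ["git", "push", remote, branch],    # then update the remote server
    ]


commands = publish_commands("Fix grade rounding")
for argv in commands:
    print(shlex.join(argv))
```

Each entry is an argument list, so it could be handed to subprocess.run(argv, check=True) from inside a working copy. Pulling before pushing matters because the remote rejects a push that would discard commits it already has.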

Git Server

Git can be installed as a server, which multiple developers can use to collaborate. GitHub.com is the most popular hosted Git server, but Git can be installed as a server on any machine. When a repository is pushed to a Git server, only the contents of the .git directory are stored; the current working copy is not.

Git Server Hooks

A Git server can run server-side scripts (hooks) when a push is initiated or after a push is complete. This is very important in the context of Heroku, because Heroku uses this feature to run its deployment scripts.
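
To illustrate what such a hook sees, the sketch below parses the input Git gives to its pre-receive and post-receive hooks: one line per updated ref on standard input, in the form "old-sha new-sha ref-name". The deploy rule and the names RefUpdate, parse_updates and should_deploy are assumptions made for this example, not Heroku's actual implementation.

```python
from typing import NamedTuple


class RefUpdate(NamedTuple):
    old: str  # commit the ref pointed to before the push
    new: str  # commit the ref points to after the push
    ref: str  # full ref name, e.g. refs/heads/master


def parse_updates(lines):
    """Parse the 'old new ref' lines that Git passes to a receive hook."""
    updates = []
    for line in lines:
        line = line.strip()
        if line:
            old, new, ref = line.split(" ", 2)
            updates.append(RefUpdate(old, new, ref))
    return updates


def should_deploy(update, branch="refs/heads/master"):
    """Assumed rule: deploy on pushes to one branch, but not when it is deleted."""
    deleted = set(update.new) == {"0"}  # an all-zero new value means deletion
    return update.ref == branch and not deleted


# Made-up hashes for illustration; a real hook would call parse_updates(sys.stdin).
sample = [
    "a" * 40 + " " + "b" * 40 + " refs/heads/master",
    "c" * 40 + " " + "d" * 40 + " refs/heads/experiment",
]
updates = parse_updates(sample)
print([u.ref for u in updates if should_deploy(u)])
```

A hook script installed on the server would read these lines and start a build or deployment for the refs it cares about.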

Working with Git – common commands

Cloning a repository
Making changes
Updating server with changes
Reverting
Branching
Merging

Heroku Essentials

This section contains excerpts from the book “Hacker’s guide to Heroku”, an excellent guide to Heroku. Please refer to this book for more detailed information on Heroku.
Heroku is a cloud-based deployment system that encapsulates best practices for deploying SaaS applications. It runs on top of Amazon Web Services and provides facilities to deploy applications easily and scale them massively.

Heroku components

The following diagram illustrates Heroku components.

Users can create accounts on Heroku. Once an account is created, they can create applications on Heroku. Whenever an application is created, a corresponding Git repository is automatically created within a Git server running on the Heroku system.
Whenever an application is created, a virtual server running Ubuntu (512 MB RAM, 4 CPU cores) is created along with it. This virtual server is termed a Dyno (short for Dynosaur). Heroku can create more Dynos on demand, and it takes care of load balancing and of directing requests to the corresponding Dynos.
Whenever an application is deployed to Heroku, Heroku runs a deployment script called the slug compiler, which creates a binary version of the deployment. This is termed a slug.
Whenever a new Dyno is created or provisioned, Heroku automatically deploys the slug on the Dyno and makes it ready to run.
Heroku provides a set of infrastructure components (called add-ons): the database component creates instances of database servers, Cloudmail receives emails at specified email addresses and forwards them to the application, and Memcache provides a cache. These infrastructure components can be attached to applications.

Interacting with Heroku

Users need to interact with Heroku to:

Configure the application
Attach infrastructure components to the application
Push source code to and pull it from Heroku
Deploy the application to Heroku
Configure runtime Dynos
Connect to a running Dyno to check health, logs, etc.

Heroku provides a set of client tools called the Heroku toolbelt for interacting with Heroku. Once the toolbelt is installed, the heroku command is available.
The following are the most important commands:

heroku login – to log in to Heroku. A Heroku account is required
heroku accounts – to switch between accounts. The accounts plugin (heroku-accounts) needs to be installed
heroku apps – app-related commands. The most important app commands are create and destroy
heroku config – to set environment variables in the Dynos
heroku run bash – to open a shell on the virtual server (Dyno). A file system consisting only of the application files is visible, and many shell commands can be executed here
heroku run console – to open a Rails console, from which debugging can be done

Heroku Git Interplay

How are applications pushed to the Git repository within Heroku? After pushing an app to Heroku, how does it get automagically deployed?
Whenever an application is created in Heroku, Heroku automatically creates a corresponding Git repository for it and returns the URL of that repository.
After the developer finishes development, the application can be pushed to this Git repository using a git push heroku command.
Heroku intercepts a git push by using the Git hooks mechanism, hooking a language-specific script called a buildpack into Git. As soon as the push is received, the buildpack is run. It essentially

runs a language-specific slug compiler and creates a slug (or binary)
deploys it on all the Dynos (virtual machines) configured for the application
Deployment can be customized by providing a Procfile in the root directory of the application. An example Procfile could contain the line web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb -E $RACK_ENV
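
The Procfile line above has the shape "process type, colon, command". A minimal parser, written here only to show that structure, might look as follows; the function name parse_procfile is hypothetical and this is not how Heroku itself reads the file.

```python
def parse_procfile(text: str) -> dict:
    """Map each process type (e.g. 'web') to the command that starts it."""
    processes = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        name, separator, command = line.partition(":")
        if not separator:
            raise ValueError(f"expected '<type>: <command>', got {line!r}")
        processes[name.strip()] = command.strip()
    return processes


# The example line from the text above.
EXAMPLE = "web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb -E $RACK_ENV\n"
processes = parse_procfile(EXAMPLE)
print(processes["web"])
```

Here web is the process type, and $PORT and $RACK_ENV are environment variables that are filled in on the Dyno when the command runs.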

Deployment Steps

First Time

The following steps (and commands) need to be executed the first time the application is deployed. Before starting, create a Heroku account by visiting heroku.com and install the Heroku client toolbelt.

Clone existing code from github.com
Login to heroku
Create an application
Push code to heroku
Set configuration variables
- Environment – staging or production
- Gmail credentials
Import data into the application database
Change some parameters in the configuration files for CSS (once the system is fine-tuned, this step should go away)

System is ready to go!
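
The command-line part of the first-time steps can be sketched as an ordered list of commands. The helper below only builds and prints them. The repository URL, application name and the GMAIL_USERNAME variable are hypothetical placeholders, the branch name master is an assumption, and the data import and CSS configuration steps are left out because they are specific to the application.

```python
import shlex


def first_deploy_commands(github_url, app_name, config, branch="master"):
    """Return the commands for a first deployment, in order.

    Everything after `git clone` is meant to run inside the cloned directory,
    where creating the app adds a Git remote named `heroku`.
    """
    settings = [f"{key}={value}" for key, value in sorted(config.items())]
    return [
        ["git", "clone", github_url],                 # clone code from GitHub
        ["heroku", "login"],                          # log in to Heroku
        ["heroku", "apps:create", app_name],          # create the application
        ["git", "push", "heroku", branch],            # push code, triggering deployment
        ["heroku", "config:set", *settings, "--app", app_name],  # set config vars
    ]


commands = first_deploy_commands(
    "https://github.com/example/Gradeables.git",      # hypothetical URL
    "gradeables-staging",                             # hypothetical app name
    {"RACK_ENV": "staging", "GMAIL_USERNAME": "user@example.com"},
)
for argv in commands:
    print(shlex.join(argv))
```

Keeping credentials in config variables rather than in the source tree means the same code can be pushed unchanged to both staging and production applications.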

Subsequent times

Pull code from heroku
Make changes
Push code to heroku
Import data into the application database (if data has changed)

References

Git book – http://git-scm.com/book
Heroku book – http://www.theherokuhackersguide.com/
Article on Heroku push deployment – http://www.jamesward.com/2012/07/18/the-magic-behind-herokus-git-push-deployment
Buildpacks – https://devcenter.heroku.com/articles/buildpacks

Deploying web applications on the cloud with GitHub and Heroku