The Plumbing

Name: Mijingo, LLC

The plumbing commands are the low-level commands that make up the Git system. They do the, uh, dirty work of making your repository track and manage files and changes.

Let’s look at the same stuff we just covered — blob and tree — but in terms of the low level commands that make this happen.

We’ll start off by creating a new directory for our project and initializing a fresh repository.

Let’s create a new website project for a plumber. We’ll very simply name this “plumber.”

$ mkdir plumber && cd plumber
$ git init .

What happens when we initialize a Git repository?

First, Git creates a .git directory. This hidden directory won’t show up unless we list files and directories using the -a option.

$ ls -al

Now we can see it. There it is, the first part of the magic that is Git plumbing. Let’s see what is in that hidden directory.

$ cd .git
$ ls -al
	
	-rw-r--r--   1 ryan  staff   23 Apr 21 15:40 HEAD
	drwxr-xr-x   2 ryan  staff   68 Apr 21 15:40 branches/
	-rw-r--r--   1 ryan  staff  137 Apr 21 15:40 config
	-rw-r--r--   1 ryan  staff   73 Apr 21 15:40 description
	drwxr-xr-x  11 ryan  staff  374 Apr 21 15:40 hooks/
	drwxr-xr-x   3 ryan  staff  102 Apr 21 15:40 info/
	drwxr-xr-x   4 ryan  staff  136 Apr 21 15:40 objects/
	drwxr-xr-x   4 ryan  staff  136 Apr 21 15:40 refs/

This directory is the repository. Let’s dig in a little deeper.

The first one listed is HEAD. This is, as you might expect, a pointer to the current branch.

$ cat HEAD

  ref: refs/heads/master

This is how Git stores what HEAD is: a symbolic reference to a branch, rather than a commit hash.
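Reading .git/HEAD with cat works, but Git also has a plumbing command for symbolic references, git symbolic-ref. Here is a minimal sketch in a throwaway repository (the mktemp location is just for illustration). It sets HEAD to refs/heads/master explicitly, on the assumption that newer Git versions may be configured to use a different initial branch name:

```shell
#!/bin/sh
# Throwaway repository for the demo
repo=$(mktemp -d)
git -C "$repo" init -q

# Point HEAD at master explicitly; newer Git may default to another name
git -C "$repo" symbolic-ref HEAD refs/heads/master

# Read HEAD the plumbing way, then the raw file for comparison
git -C "$repo" symbolic-ref HEAD    # refs/heads/master
cat "$repo/.git/HEAD"               # ref: refs/heads/master
```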

Next in the list is the branches directory. We’re going to skip this one because it’s a “slightly deprecated way to store shorthands to be used to specify URL to git fetch, git pull and git push commands”. This will more likely than not remain empty.

Let’s move on.

Now let’s look at config.

$ cat config
	
	[core]
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true

These are project-specific config settings. I’ve never used this and, let’s be honest, you probably won’t either. You’ll most likely set config options in the user-specific .gitconfig file that lives in your home directory. One thing to note, however, is that settings in this repository-level config file override the ones in your global .gitconfig.

However, if you wanted to create configuration defaults for the project, this is the way to do it.
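To see that precedence in action, here is a short sketch: it writes a value into the repository’s own .git/config with --local, then asks Git which value wins. The user.name value is only an example setting, and the temporary repository is just for illustration:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q

# Write a project-specific default into .git/config
git -C "$repo" config --local user.name "Plumber Project"

# The repository-level value wins over ~/.gitconfig and the system config
git -C "$repo" config user.name                      # Plumber Project
# --show-origin reports which config file the value came from
git -C "$repo" config --show-origin --get user.name
```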

Next up is description. This file is used by gitweb, which git instaweb uses to spin up a snappy local web server for browsing your Git repository.

The hooks directory is where Git hooks are stored. If we ls it, we can see some sample hooks listed. This is where you would place any custom hooks that you want to run in this repository. Keep in mind that .git/hooks isn’t committed, so these hooks aren’t shared with other clones.

Moving on to the info directory now. If we look inside of it we get:

$ ls -al info
  -rw-r--r--   1 ryan  staff  240 Apr 21 15:40 exclude

The exclude file inside of info is a per-repository set of patterns that define which files or directories you want Git to ignore in your project.

$ cat info/exclude
	
  # git ls-files --others --exclude-from=.git/info/exclude
  # Lines that start with '#' are comments.
  # For a project mostly in C, the following would be a good set of
  # exclude patterns (uncomment them if you want to use them):
  # *.[oa]
  # *~

You typically do this in your .gitignore file, but you can also set those patterns here. The difference is that .git/info/exclude isn’t committed, so its patterns apply only to your copy of the repository, and there’s less of a chance that someone will mess with them.
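A quick way to confirm that a pattern in info/exclude is being honored is git check-ignore. This sketch adds a hypothetical *.log pattern to a throwaway repository and checks two empty sample files against it:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q

# Add a local-only ignore pattern; this file is never committed
mkdir -p "$repo/.git/info"
echo '*.log' >> "$repo/.git/info/exclude"
touch "$repo/debug.log" "$repo/index.html"

# -v shows the source file, line number, and pattern that matched
git -C "$repo" check-ignore -v debug.log

# Only index.html is reported as untracked
git -C "$repo" status --short --untracked-files=all    # ?? index.html
```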

Okay, moving on because we’re almost done reviewing these directories. Next up is the objects directory. This one is important because it stores all of your Git objects; this is your repository data.

Let’s see what is in there.

$ ls -al objects
	
  drwxr-xr-x   2 ryan  staff   68 Apr 21 15:40 info
  drwxr-xr-x   2 ryan  staff   68 Apr 21 15:40 pack

There are a couple of directories in there that we have to explore. The info directory is where “additional information about the object store is recorded.” I looked through some older repositories of my own and didn’t see anything listed in the info directory.

Next to info is the pack directory.

I don’t want to get too far into this (because we might never come back out), but the pack directory is where Git stores packs of objects as binary files. These are called packfiles, and they keep disk usage to a minimum by combining many objects into a single file and storing similar objects as deltas of one another. You can ask Git to create packfiles by running git gc (short for “garbage collection”), which cleans up unnecessary files and optimizes the local copy of your repository.
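One way to watch gc move loose objects into a packfile is git count-objects -v, which reports loose objects as count and packed objects as in-pack. This is a sketch in a throwaway repository; the demo identity and file contents are placeholders:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
# Placeholder identity so the commit works without any user config
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'hello' > "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" commit -q -m "first commit"

# Before gc: one blob, one tree, and one commit as loose objects
git -C "$repo" count-objects -v
# Pack them up and remove the now-redundant loose copies
git -C "$repo" gc -q
# After gc: count drops to 0 and in-pack holds the 3 objects
git -C "$repo" count-objects -v
ls "$repo/.git/objects/pack"    # a .pack file plus its .idx index
```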

If we run git gc in our new project, we get back a message like this:

  Nothing new to pack.

Let’s switch over to our The Commits project and run it there.

$ git gc

You should see output similar to this:

	Counting objects: 112, done.
	Delta compression using up to 4 threads.
	Compressing objects: 100% (62/62), done.
	Writing objects: 100% (112/112), done.
	Total 112 (delta 45), reused 112 (delta 45)

The objects directory in our new sample project is empty because we haven’t added anything yet. If you do the same exploring in an existing repository you will see a long list of objects, organized by directories.

Last, but not even close to least, is the refs directory.

The refs directory is the home of Git references, which are organized in subdirectories. Hop over to the The Commits project and let’s take a look.

$ cd .git/refs && ls -al
	
  drwxr-xr-x   2 ryan  staff   68 Apr 27 21:28 heads
  drwxr-xr-x   3 ryan  staff  102 Apr 25 22:54 remotes
  drwxr-xr-x   2 ryan  staff   68 Apr 25 22:54 tags

Inside of heads directory we have a listing of the local branches as files. Each branch file contains a commit hash that denotes the location of the tip of the branch.

We don’t see any heads in our Plumber project yet — not even one for master — because we don’t have any commits.
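This sketch shows the same thing in a throwaway repository: no branch file exists before the first commit, and afterward .git/refs/heads/master holds exactly the hash that git rev-parse reports for the branch tip. It assumes Git’s default files-based ref storage, and the identity and file contents are placeholders:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# No commits yet, so refs/heads has no branch files
before=$(ls -A "$repo/.git/refs/heads")
echo "heads before first commit: ${before:-none}"

echo 'hello' > "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" commit -q -m "first commit"

# The branch file holds the commit hash of the branch tip
cat "$repo/.git/refs/heads/master"
git -C "$repo" rev-parse master           # same hash
git -C "$repo" for-each-ref refs/heads    # <hash> commit refs/heads/master
```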

That’s a rundown of the directories and files in the .git directory. These will be in every project but with different contents. You probably won’t need them day-to-day, but it is good to know that they’re there and what’s in them.

Creating an Object

Earlier we looked at the different types of data objects in Git. Now let’s create them by hand. Well, sort of.

We’ll start by creating a file in our Plumber repository. Most people create README.md files to get started. I think that’s boring. Let’s start with our index.html file and get this project rolling.

$ vim index.html

And then we can populate it with sample markup. Use exactly this markup, if you don’t mind. It’ll come in handy in a few minutes.

<!DOCTYPE html>
  <html>
    <head>
      <title>Plumber</title>
    </head>
    <body>
      <h1>Plumber</h1>
      <h2>A website dedicated to speeding up your websites.</h2>
    </body>
  </html>

Okay, now we have our file and we’re ready to add it to our Git repository so it can be stored and tracked by Git.

Normally, we’d do this:

$ git add index.html

to add the file to our repository staging area to be committed. But this time we want to forgo the porcelain commands and use the plumbing commands.

Instead we’ll first use a command called hash-object. This command takes a file and creates a blob out of it.

$ git hash-object -w index.html

The -w option tells hash-object to write the new object into the object database. Without -w, hash-object only computes and prints the ID the object would have, without storing anything.

If you used the exact markup I did above, you should get this back:

89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d

That’s the object ID. Git creates it by computing a SHA-1 hash of a short header (the object type and the content size) followed by the file contents. Only the contents go into the hash, not the metadata of the file, such as its name or timestamps (the stuff that makes the file unique between our two computers).

Files created on different computers with the same contents will always have the same hash in Git. This is one of the ways that Git can be so efficient in data storage.
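To make that concrete, here is a sketch that computes a blob ID by hand: the header "blob <size>" and a NUL byte, followed by the contents, all run through SHA-1. It uses a tiny stand-in file rather than index.html, and assumes either sha1sum (Linux) or shasum (macOS) is installed:

```shell
#!/bin/sh
# Pick whichever SHA-1 tool this system has
if command -v sha1sum >/dev/null 2>&1; then
  sha1() { sha1sum; }
else
  sha1() { shasum -a 1; }
fi

file=$(mktemp)
printf 'hello\n' > "$file"

# Header "blob <size>" plus a NUL byte, then the raw contents
size=$(wc -c < "$file" | tr -d ' ')
manual=$({ printf 'blob %s\0' "$size"; cat "$file"; } | sha1 | cut -d' ' -f1)

echo "$manual"                      # ce013625030ba8dba906f756967f9e9ca394464a
git hash-object --stdin < "$file"   # same ID; nothing is written without -w
```

Because only the header and the bytes go into the hash, anyone hashing the same bytes gets the same ID.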

Let’s make sure this is really a blob:

$ git cat-file -t 89e4150

That should return:

blob

Okay, so we have just created a blob object.
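cat-file can tell us more than the type. This sketch, using a shortened stand-in for index.html in a throwaway repository, prints the type, the size in bytes, and the contents, and then shows where the loose object landed: objects/ plus the first two hex characters of the ID as a directory, with the remaining 38 as the file name:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
printf '<h1>Plumber</h1>\n' > "$repo/index.html"

# Write the blob, as in the walkthrough
oid=$(git -C "$repo" hash-object -w index.html)

git -C "$repo" cat-file -t "$oid"    # blob
git -C "$repo" cat-file -s "$oid"    # 17 (size in bytes)
git -C "$repo" cat-file -p "$oid"    # prints the markup back

# Loose object path: objects/<first 2 hex chars>/<remaining 38>
dir=$(printf '%s' "$oid" | cut -c1-2)
rest=$(printf '%s' "$oid" | cut -c3-)
ls -l "$repo/.git/objects/$dir/$rest"
```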

Feeling good?

Let’s continue.

Trees

As we saw earlier when we reviewed trees and blobs in Git, tree objects can contain blobs. Let’s look at trees and blobs in an existing project.

Switch over to the The Commits project and go to the project root. Then run:

$ git cat-file -p master^{tree}

This command tells Git to return the tree object that the master branch currently points to, by way of its latest commit. We specify the branch (master), and the ^{tree} suffix tells Git to follow that commit to its tree object.
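A commit object names its tree on its first line, and master^{tree} resolves to that same ID. This sketch checks that in a throwaway repository with a placeholder identity. The revision is quoted because some shells, such as zsh with certain options, treat ^ or braces specially:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

echo 'hello' > "$repo/index.html"
git -C "$repo" add index.html
git -C "$repo" commit -q -m "first commit"

# The commit's first line is "tree <tree-id>"
git -C "$repo" cat-file -p master | head -n 1
# ^{tree} peels master straight to that same tree ID
git -C "$repo" rev-parse 'master^{tree}'
```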

You should see something like this returned:

  100644 blob 496ee2ca6a2f08396a4076fe43dedf3dc0da8b6d    .gitignore
  040000 tree 6c0cbeabafbe200101bd2f763cef356bc272fe6d    images  
  100644 blob 9fd3fc38b1f79782968eaa11514919e588630a83    index.html
  040000 tree aa506736eb118ab6585410d8ca549dd84d7a9ab1    javascripts
  040000 tree 126bd44f234cd1feee3f6267cde75155223cf637    stylesheets

You can see that images, javascripts, and stylesheets are all tree objects and the other two files are blobs.

If we view the contents of the stylesheets tree object then we can see that it contains a series of blob objects.

$ git cat-file -p 126bd44f234cd1feee3f6267cde75155223cf637

This returns:

100644 blob 82c9b265bb471fb2470b82a902d938788987c927    app.css
100644 blob 3652bf55fa5342358951874225972d99886fb07a    foundation.css
100644 blob 5744060d94490921ba59cd8859750f46c872a1fb    foundation.min.css

The tree object points to three blob objects; these are our CSS files for this project.
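Following subtrees one hash at a time works, but git ls-tree can walk them for you. In this sketch (a throwaway repository with stand-in files), ls-tree without options lists one level, much like cat-file -p on a tree, and -r recurses into subtrees and prints full paths to every blob:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir -p "$repo/stylesheets"
printf '<h1>Plumber</h1>\n' > "$repo/index.html"
printf 'body {}\n' > "$repo/stylesheets/app.css"
git -C "$repo" add .
git -C "$repo" commit -q -m "add stylesheets"

# One level: a blob for index.html and a tree for stylesheets
git -C "$repo" ls-tree master
# Recursive: every blob with its full path
git -C "$repo" ls-tree -r master
```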

Creating a tree Object

Just as we created a blob object earlier, we can also create a tree object using a Git plumbing command. This one is called write-tree, and it does exactly what it says: it writes a tree object using the staged files.

To create a tree, Git takes the entries in the index and writes tree objects from them. So, first we need to stage some files so Git has something to use to write the tree object.

Usually you’d use git-add for this or make a change to a tracked file, but we can use another plumbing command called update-index using the --add option.

update-index is a command that allows us to alter the repository index. As a refresher from earlier, the Index is the staging area where changes go before they are committed to the repository as commit objects.

Since we want to build a tree object, we need to first add some files to our index so Git can use them.

We’ll create a new file called about.html and save a little markup in it.

$ vim about.html

Here’s the markup we’ll use:

	<!DOCTYPE html>
	<html>
	    <head>
	        <title>About Plumber</title>
	    </head>
	    <body>
	        <h1>Plumber</h1>
	        <h2>A website dedicated to speeding up your websites.</h2>
	
	    </body>
	</html>

And then add the new files to the repository index:

$ git update-index --add about.html index.html

If we look at the status, we’ll see the new files added:

$ git status
	
  new file:   about.html
  new file:   index.html
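git status summarizes the index, but git ls-files --stage shows its raw entries: mode, blob ID, stage number, and path. This sketch stages two stand-in files with update-index --add, which also writes their blobs into the object database, and checks that the index entry for index.html carries the same ID hash-object computes:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
printf '<h1>Plumber</h1>\n' > "$repo/index.html"
printf '<h1>About</h1>\n' > "$repo/about.html"

# Stage both files with the plumbing command from the walkthrough
git -C "$repo" update-index --add about.html index.html

# Raw index entries: <mode> <blob-id> <stage>	<path>
git -C "$repo" ls-files --stage

# The staged blob ID for index.html, for comparison with hash-object
idx=$(git -C "$repo" ls-files --stage index.html | cut -d' ' -f2)
echo "$idx"
git -C "$repo" hash-object index.html
```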

Now we can write that tree to the database:

$ git write-tree
	3fc239523266f3970efe5859cbf8fdf6992b5bbb

And we get back a hash of the tree contents.

If we look at the new tree object we just created, we see the two files listed, including their object IDs.

$ git cat-file -p 3fc239523266f3970efe5859cbf8fdf6992b5bbb
	100644 blob 21c8f80eb9fdd31eb356c3f112f2b3afda00add5    about.html
	100644 blob 89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d    index.html
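write-tree is deterministic: the same index always produces the same tree ID, which is why your tree ID should match mine if your files are byte-for-byte identical. This sketch, with stand-in files in a throwaway repository, writes a tree, confirms its type, and writes it again from the unchanged index:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
printf '<h1>Plumber</h1>\n' > "$repo/index.html"
printf '<h1>About</h1>\n' > "$repo/about.html"
git -C "$repo" update-index --add about.html index.html

# Turn the current index into a tree object and keep its ID
tree=$(git -C "$repo" write-tree)
git -C "$repo" cat-file -t "$tree"    # tree
git -C "$repo" cat-file -p "$tree"    # one blob line per file

# An unchanged index produces the same tree ID again
git -C "$repo" write-tree
```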

Let’s create another directory for CSS (called css) and then add a site.css file inside of it. We’ll then update the index and then write the tree object.

$ mkdir css
$ vim css/site.css

We’ll add a simple comment to the css/site.css file just so it has some content in it.

	/* This is my site CSS */

Save that and then we can check that it is seen by Git but not staged.

$ git status

Now we’re ready to update the index and add the site.css file (and the css directory) to our repository index.

$ git update-index --add css/site.css

A git-status will show that it has indeed been added to the index.

	Changes to be committed:
	(use "git rm --cached <file>..." to unstage)
	
	new file:   about.html
	new file:   css/site.css
	new file:   index.html

And now we can write the tree object again, which will pull in our changes to the index.

$ git write-tree 
	3de79856453c6ad7e510d825527a86330d994908

If we cat-file the newly created object, we’ll see the updated repository. This is still a tree object, but it has a new ID: we changed the contents of the tree, so Git computed a new hash.

$ git cat-file -p 3de79856453c6ad7e510d825527a86330d994908
	
	100644 blob 21c8f80eb9fdd31eb356c3f112f2b3afda00add5    about.html
	040000 tree 3b2baef730ebbc725f1772c94bbe8348dc6b7b9d    css
	100644 blob 89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d    index.html

Now we have a new tree (css) inside of the main tree object. We can also cat-file the css directory/tree and see what it has inside.

$ git cat-file -p 3b2baef730ebbc725f1772c94bbe8348dc6b7b9d
100644 blob b9bb7b6e28dc3ba00d2d353dea71f7afcb5f10a6    site.css

And it returns the site.css file’s mode, object type, hash, and file name.
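Instead of copying the subtree hash by hand, Git’s <tree>:<path> revision syntax resolves a path inside a tree to its object ID. This sketch rebuilds a smaller version of the example (stand-in files, throwaway repository) and shows that write-tree created a separate tree for css, and that ls-tree -r sees through it:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
mkdir -p "$repo/css"
printf '<h1>Plumber</h1>\n' > "$repo/index.html"
printf '/* This is my site CSS */\n' > "$repo/css/site.css"
git -C "$repo" update-index --add index.html css/site.css

root=$(git -C "$repo" write-tree)
# <tree>:<path> names an entry inside a tree
css=$(git -C "$repo" rev-parse "$root:css")
git -C "$repo" cat-file -t "$css"                # tree
git -C "$repo" ls-tree "$css"                    # the site.css blob
git -C "$repo" ls-tree -r --name-only "$root"    # css/site.css, index.html
```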

Committing the Tree Object

Now we’re ready to commit the changes in the index and create our commit object. We do this using commit-tree, a Git plumbing command that takes a single tree SHA-1 and, optionally, one or more parent commits (passed with -p).

$ git commit-tree 3de79856453c6ad7e510d825527a86330d994908 -m "adding first tree"
  055bd26f599db2bebf0bd75c9d8859e4d2dc534c

Since this is the first commit we are doing in this repository, we don’t need to include the previous commit ID (the parent). But we do want to include a commit message using the -m flag that you’re probably familiar with from using git-commit.

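commit-tree returns the hash of the new commit object, which Git computes from the commit’s contents: the tree, any parents, the author and committer with their timestamps, and the message.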
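For every commit after the first, you pass the previous commit with -p. This sketch (throwaway repository, placeholder identity, stand-in contents) builds a root commit and then a child commit, and prints the child so you can see its tree and parent lines:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Root commit: no -p, because there is no parent
printf 'v1\n' > "$repo/index.html"
git -C "$repo" update-index --add index.html
t1=$(git -C "$repo" write-tree)
c1=$(git -C "$repo" commit-tree -m "first commit" "$t1")

# Second commit: restage the changed file and name c1 as the parent
printf 'v2\n' > "$repo/index.html"
git -C "$repo" update-index index.html
t2=$(git -C "$repo" write-tree)
c2=$(git -C "$repo" commit-tree -p "$c1" -m "second commit" "$t2")

# tree, parent, author, committer, blank line, message
git -C "$repo" cat-file -p "$c2"
```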

Pointing HEAD to Latest Commit

But now if we check our Git status again, we still see the files staged. What’s going on?

The problem here is that HEAD and the branch have not been updated to include this new commit we just created. This is called, quite awkwardly I might add, a dangling commit. It’s one that doesn’t belong to any branch.
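You can see the dangling commit for yourself with git fsck, which lists objects that nothing references. This sketch reproduces the situation in a throwaway repository with a placeholder identity and stand-in file, then checks that HEAD still has no commit:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

printf '<h1>Plumber</h1>\n' > "$repo/index.html"
git -C "$repo" update-index --add index.html
commit=$(git -C "$repo" commit-tree -m "adding first tree" "$(git -C "$repo" write-tree)")

# No ref points at the commit, so master is still unborn
git -C "$repo" rev-parse --verify -q HEAD || echo "HEAD has no commit yet"

# fsck reports the unreferenced commit as dangling
git -C "$repo" fsck 2>/dev/null
```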

Let’s fix that by adding it to the master branch that we’re working from. This will clear out the staged changes and allow us to see the commit in our git-log output.

Git stores the branch information in the refs directory that we looked at earlier. We need to edit (or create, if it doesn’t exist yet) .git/refs/heads/master and write the new commit ID into it. Since HEAD points to refs/heads/master, this moves HEAD to the new commit so it represents the latest state of the repository.

Let’s open the master file in a text editor. This time I’ll use Vim, but you can use whatever editor is most convenient for you.

$ vim .git/refs/heads/master

Inside of the file, add the latest commit object ID as the only line (if the file already contains a hash, replace it):

	055bd26f599db2bebf0bd75c9d8859e4d2dc534c

Save the file and then check your work.

$ git status

You should get back:

  On branch master
  nothing to commit, working directory clean

Check the log to see if the latest commit is now there:

$ git log
	
  commit 055bd26f599db2bebf0bd75c9d8859e4d2dc534c
  Author: Ryan Irelan <[email protected]>
  Date:   Wed Apr 29 15:15:09 2015 -0500
	
  adding first tree

And now we just recreated adding, staging, and committing changes to our Git repository using lower level plumbing commands.
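Here is the whole sequence condensed into one script, as a sketch in a throwaway repository with stand-in files and a placeholder identity. One swap from the walkthrough: instead of editing .git/refs/heads/master in a text editor, it uses the plumbing command git update-ref, which locks the ref while writing it and records a reflog entry:

```shell
#!/bin/sh
repo=$(mktemp -d)
git -C "$repo" init -q
git -C "$repo" symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir -p "$repo/css"
printf '<h1>Plumber</h1>\n' > "$repo/index.html"
printf '/* This is my site CSS */\n' > "$repo/css/site.css"

# Stage: write blobs and index entries (git add)
git -C "$repo" update-index --add index.html css/site.css
# Snapshot the index as a tree object
tree=$(git -C "$repo" write-tree)
# Wrap the tree in a root commit
commit=$(git -C "$repo" commit-tree -m "adding first tree" "$tree")
# Move master, and therefore HEAD, to the new commit
git -C "$repo" update-ref refs/heads/master "$commit"

git -C "$repo" status --short    # no output: nothing left to commit
git -C "$repo" log --oneline     # <short hash> adding first tree
```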

Pretty nice that we don’t always have to do that, right?

Wrap-up

While I don’t encourage you to interact with your Git repository using these lower level commands on a daily basis, it could come in handy if you need to identify a problem or issue.

You know, it’s also nice just to understand how exactly the tool works, even if you don’t ever again use it in this way.
