The Plumbing

The plumbing is the set of low-level commands that make up the Git system. They are the commands that do the, uh, dirty work, and make your repository track and manage files and changes.

Let’s look at the same stuff we just cov­ered — blob and tree — but in terms of the low lev­el com­mands that make this happen.

We’ll start off by cre­at­ing a new direc­to­ry for our project and ini­tial­iz­ing a fresh repository.

Let’s create a new website project for a plumber. We’ll very simply name this “plumber.”

$ mkdir plumber && cd plumber
$ git init .

What hap­pens when we ini­tial­ize a Git repository?

First, Git creates a .git directory. This hidden directory won’t show unless we list files and directories using the -a option.

$ ls -al

Now we can see it. There it is, the first part of the mag­ic that is Git plumb­ing. Let’s see what is in that hid­den directory.

$ cd .git
$ ls -al
	-rw-r--r--   1 ryan  staff   23 Apr 21 15:40 HEAD
	drwxr-xr-x   2 ryan  staff   68 Apr 21 15:40 branches/
	-rw-r--r--   1 ryan  staff  137 Apr 21 15:40 config
	-rw-r--r--   1 ryan  staff   73 Apr 21 15:40 description
	drwxr-xr-x  11 ryan  staff  374 Apr 21 15:40 hooks/
	drwxr-xr-x   3 ryan  staff  102 Apr 21 15:40 info/
	drwxr-xr-x   4 ryan  staff  136 Apr 21 15:40 objects/
	drwxr-xr-x   4 ryan  staff  136 Apr 21 15:40 refs/

This direc­to­ry is the repos­i­to­ry. Let’s dig in a lit­tle deeper.

The first one list­ed is HEAD. This is, as you might expect, a point­er to the cur­rent branch.

$ cat HEAD

  ref: refs/heads/master

This is how Git stores what HEAD is. 
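If you would rather not open the file, Git can report the same value for you. Here is a minimal sketch that uses a scratch repository (the temporary directory is only an assumption for the demo) and compares what git symbolic-ref prints with what is stored in .git/HEAD. Depending on your Git version and settings, the branch name may be master or something else.

```shell
# Hypothetical scratch repository, used only for this illustration.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Ask Git which ref HEAD points to, without opening the file.
head_ref="$(git symbolic-ref HEAD)"
echo "HEAD points to: $head_ref"

# The file on disk carries the same value behind a "ref: " prefix.
head_file="$(cat .git/HEAD)"
echo "$head_file"
```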

Next in the list is the branches directory. We’re going to skip this one because it’s a “slightly deprecated way to store shorthands to be used to specify URL to git fetch, git pull and git push commands”. This will more likely than not remain empty.

Let’s move on.

Now let’s look at config.

$ cat config
	repositoryformatversion = 0
	filemode = true
	bare = false
	logallrefupdates = true
	ignorecase = true
	precomposeunicode = true

These are some project-specific config settings. I’ve never used this and, let’s be honest, you probably won’t either. You’ll most likely set config options in the user-specific .gitconfig file that lives in your user directory. One thing to note, however, is the order of precedence: a setting in this project-level config file overrides the same setting in your user-level and system-level files.

How­ev­er, if you want­ed to cre­ate con­fig­u­ra­tion defaults for the project, this is the way to do it.
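To see how the config levels interact, here is a small sketch. It sets the same key at the user level and at the project level and then asks Git which value is in effect. The names are made up, and HOME is pointed at a throwaway directory so your real .gitconfig is not touched.

```shell
# Use a throwaway HOME so the real ~/.gitconfig is never modified.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"

repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Same key at two levels; both values are invented for the demo.
git config --global user.name "Global Name"   # written to $HOME/.gitconfig
git config --local user.name "Project Name"   # written to .git/config

# With no scope flag, Git reports the value that wins.
effective_name="$(git config user.name)"
echo "$effective_name"

# --show-origin also tells you which file each value came from.
git config --show-origin --get-all user.name
```

The project-level value is the one reported, which is why .git/config is the place for per-project defaults.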

Next up is description. This is used by git-instaweb, which is a snappy way to create a local web server to interface with your Git repository.

The hooks direc­to­ry is where git hooks are stored. If we ls it we can see some sam­ple hooks list­ed. This is where you would place any cus­tom hooks that you want to be part of the repository.

Mov­ing on to the info direc­to­ry now. If we look inside of it we get:

$ ls -al info
  -rw-r--r--   1 ryan  staff  240 Apr 21 15:40 exclude

The exclude file inside of info is a set of patterns that defines which files or directories you want to ignore in your project. Unlike a .gitignore file, it is local to your copy of the repository and is not committed or shared.

$ cat info/exclude
  # git ls-files --others --exclude-from=.git/info/exclude
  # Lines that start with '#' are comments.
  # For a project mostly in C, the following would be a good set of
  # exclude patterns (uncomment them if you want to use them):
  # *.[oa]
  # *~

You typ­i­cal­ly do this in your .gitignore file but you can also set those pat­terns here. It’s just a lev­el deep­er and per­haps there’s less of a chance that some­one will mess with it.
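Here is a quick sketch of the exclude file at work. It assumes a scratch repository and an example pattern (*.log) that I picked just for the demo. After adding the pattern, a matching file no longer shows up as untracked, and git check-ignore tells us which file and pattern were responsible.

```shell
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Add a local-only ignore pattern; "*.log" is just an example.
mkdir -p .git/info
echo "*.log" >> .git/info/exclude

# Create a file that matches the pattern.
touch debug.log

# An ignored file is not reported as untracked.
untracked="$(git status --porcelain)"
echo "untracked files: '$untracked'"

# check-ignore -v prints the source file, line, and pattern that matched.
git check-ignore -v debug.log
```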

Okay, mov­ing on because we’re almost done review­ing these direc­to­ries. Next up is the objects direc­to­ry. This one is an impor­tant one because it stores all of your Git objects; this is your repos­i­to­ry data.

Let’s see what is in there.

$ ls -al objects
  drwxr-xr-x   2 ryan  staff   68 Apr 21 15:40 info
  drwxr-xr-x   2 ryan  staff   68 Apr 21 15:40 pack

There are a couple of directories in there that we have to explore. The info directory is where “additional information about the object store is recorded.” I looked through some older repositories of my own and didn’t see anything listed in the info directory.

Next to info is the pack directory.

I don’t want to get too far into this (because we might never come back out) but the pack directory is where Git stores packs of objects as binary files. These are called packfiles and are used to keep disk usage to a minimum by combining multiple similar objects together. You can ask Git to create packfiles by running git gc (which stands for “garbage collection”), which will clean up files and optimize the local copy of your repository.

If we run git gc in our new project, we get back a mes­sage like this:

  Nothing new to pack.

Let’s switch over to our The Com­mits project and run it there.

$ git gc

You should see out­put sim­i­lar to this:

	Counting objects: 112, done.
	Delta compression using up to 4 threads.
	Compressing objects: 100% (62/62), done.
	Writing objects: 100% (112/112), done.
	Total 112 (delta 45), reused 112 (delta 45)

The objects direc­to­ry in our new sam­ple project is emp­ty because we haven’t added any­thing yet. If you do the same explor­ing in an exist­ing repos­i­to­ry you will see a long list of objects, orga­nized by directories.
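If you want to watch objects move from loose files into a packfile, git count-objects is handy. The sketch below assumes a scratch repository with a single commit (the file name, identity, and the count_field helper are all made up for the demo) and compares the counts before and after git gc.

```shell
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# One commit produces three objects: a blob, a tree, and a commit.
echo "hello" > file.txt
git add file.txt
git -c user.name="Example" -c user.email="example@example.com" \
    -c commit.gpgsign=false commit -q -m "first"

# Hypothetical helper that pulls one field out of `git count-objects -v`.
count_field() {
  git count-objects -v | awk -v key="$1:" '$1 == key { print $2 }'
}

loose_before="$(count_field count)"    # loose objects in .git/objects
git gc -q
loose_after="$(count_field count)"
packed_after="$(count_field in-pack)"  # objects stored in packfiles

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after, packed: $packed_after"
```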

Last, but not even close to least, is the refs directory.

The refs directory is the home of Git references. They are organized in subdirectories. Hop over to The Commits project and let’s take a look.

$ cd .git/refs && ls -al
  drwxr-xr-x   2 ryan  staff   68 Apr 27 21:28 heads
  drwxr-xr-x   3 ryan  staff  102 Apr 25 22:54 remotes
  drwxr-xr-x   2 ryan  staff   68 Apr 25 22:54 tags

Inside the heads directory we have a listing of the local branches as files. Each branch file contains a commit hash that denotes the location of the tip of the branch.
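You can check this for yourself. The following sketch assumes a scratch repository with one commit (file name and identity are invented) and compares the contents of the branch file with what git rev-parse reports for HEAD. It assumes the ref is still stored as a loose file, which is the case right after a commit.

```shell
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

echo "hello" > file.txt
git add file.txt
git -c user.name="Example" -c user.email="example@example.com" \
    -c commit.gpgsign=false commit -q -m "first"

# HEAD names the current branch ref, for example refs/heads/master.
branch_ref="$(git symbolic-ref HEAD)"

# The branch file holds the hash of the commit at the tip of the branch.
tip_from_file="$(cat ".git/$branch_ref")"
tip_from_git="$(git rev-parse HEAD)"

echo "$tip_from_file"
echo "$tip_from_git"
```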

We don’t see any heads in our Plumber project yet — not even one for mas­ter — because we don’t have any commits.

That’s a rundown of the directories and files in the .git directory. These will be in every project but with different contents. You probably won’t need them day-to-day but it is good to know that they’re there and what’s in them.

Creating an Object

Ear­li­er we looked at the dif­fer­ent types of data objects in Git. Now let’s cre­ate them by hand. Well, sort of.

We’ll start by cre­at­ing a file in our Plumber repos­i­to­ry. Most peo­ple cre­ate files to get start­ed. I think that’s bor­ing. Let’s start with our index.html file and get this project rolling.

$ vim index.html

And then we can pop­u­late it with sam­ple markup. Use exact­ly this markup, if you don’t mind. It’ll come in handy in a few minutes.

<!DOCTYPE html>
      <h2>A website dedicated to speeding up your websites.</h2>

Okay, now we have our file and we’re ready to add it to our Git repos­i­to­ry so it can be stored and tracked by Git.

Nor­mal­ly, we’d do this:

$ git add index.html

to add the file to our repos­i­to­ry stag­ing area to be com­mit­ted. But this time we want to for­go the porce­lain com­mands and use the plumb­ing commands.

Instead we’ll first use a command called hash-object. This command takes a file and creates a blob out of it.

$ git hash-object -w index.html

The -w tells hash-object to write the new object. Without -w, hash-object will only return the ID the object would have if we created it.

If you used the exact markup I did above, you should get this back:

  89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d

That’s the hash ID of the object. It’s created by generating a SHA-1 hash of the file contents, along with a small header that records the object type and size. This is only the contents, not the metadata of the file (the stuff that makes the file unique between our two computers). If you used the same exact markup I did you should also get this exact hash.

Files cre­at­ed on dif­fer­ent com­put­ers with the same con­tents will always have the same hash in Git. This is one of the ways that Git can be so effi­cient in data storage.
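A small experiment makes this concrete. The sketch below creates two files with different names but identical contents (both names and contents are made up) and asks hash-object for their IDs. Since we leave off -w, nothing is written anywhere.

```shell
work="$(mktemp -d)"
cd "$work"

# Two files with different names but byte-for-byte identical contents.
printf 'same contents\n' > first.txt
printf 'same contents\n' > second.txt

# Without -w, hash-object only computes the ID; nothing is stored.
hash_first="$(git hash-object first.txt)"
hash_second="$(git hash-object second.txt)"

echo "$hash_first"
echo "$hash_second"
```

Both lines print the same ID, because the file name never enters the hash.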

Let’s make sure this is real­ly a blob:

$ git cat-file -t 89e4150

That should return:

  blob

Okay, so we have just cre­at­ed a blob object.
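cat-file has a few more switches that are worth knowing. This sketch stores a small blob in a scratch repository (the file name and contents are arbitrary) and then reads back its type, size, and contents.

```shell
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Store a small blob; the contents are arbitrary sample text.
printf 'sample contents\n' > sample.txt
blob_id="$(git hash-object -w sample.txt)"

obj_type="$(git cat-file -t "$blob_id")"   # object type
obj_size="$(git cat-file -s "$blob_id")"   # size in bytes
obj_body="$(git cat-file -p "$blob_id")"   # pretty-printed contents

echo "$obj_type $obj_size"
echo "$obj_body"
```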

Feel­ing good? 

Let’s con­tin­ue.

Trees

As we saw ear­li­er when we reviewed trees and blobs in Git, tree objects can con­tain blobs. Let’s look at trees and blobs in an exist­ing project.

Switch over to The Commits project and go to the project root. Then run:

$ git cat-file -p master^{tree}

This com­mand tells Git to return the tree object to which the mas­ter branch is cur­rent­ly point­ing (based on the last com­mit). We spec­i­fy the branch (mas­ter) and then that we want the tree object.

You should see some­thing like this returned:

  100644 blob 496ee2ca6a2f08396a4076fe43dedf3dc0da8b6d    .gitignore
  040000 tree 6c0cbeabafbe200101bd2f763cef356bc272fe6d    images  
  100644 blob 9fd3fc38b1f79782968eaa11514919e588630a83    index.html
  040000 tree aa506736eb118ab6585410d8ca549dd84d7a9ab1    javascripts
  040000 tree 126bd44f234cd1feee3f6267cde75155223cf637    stylesheets

You can see that images, javascripts, and stylesheets are all tree objects and the oth­er two files are blobs.

If we view the con­tents of the stylesheets tree object then we can see that it con­tains a series of blob objects. 

$ git cat-file -p 126bd44f234cd1feee3f6267cde75155223cf637

This returns:

100644 blob 82c9b265bb471fb2470b82a902d938788987c927    app.css
100644 blob 3652bf55fa5342358951874225972d99886fb07a    foundation.css
100644 blob 5744060d94490921ba59cd8859750f46c872a1fb    foundation.min.css

The tree object points to three blob objects; these are our CSS files for this project.
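If you only want the listing, git ls-tree gives you the same view without needing the tree hash first. Here is a sketch in a scratch repository; the file names mirror the ones above but the contents and identity are invented for the demo.

```shell
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

mkdir stylesheets
echo "body {}" > stylesheets/app.css
echo "<h1>Hi</h1>" > index.html
git add .
git -c user.name="Example" -c user.email="example@example.com" \
    -c commit.gpgsign=false commit -q -m "first"

# Top level: the directory appears as a tree, the file as a blob.
git ls-tree HEAD

# -r walks into subtrees and lists the blobs with their full paths.
paths="$(git ls-tree -r --name-only HEAD)"
echo "$paths"
```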

Cre­at­ing a tree Object

Just like we created a blob object earlier, we can also create a tree object using a Git plumbing command. This one is called write-tree and does exactly what it says. It writes a tree object using the staged files.

To create a tree, Git takes the files from the index and creates a tree object from them. So, first we need to stage some files so Git has something to use to write the tree object.

Usu­al­ly you’d use git-add for this or make a change to a tracked file, but we can use anoth­er plumb­ing com­mand called update-index using the --add option.

update-index is a com­mand that allows us to alter the repos­i­to­ry index. As a refresh­er from ear­li­er, the Index is the stag­ing area where changes go before they are com­mit­ted to the repos­i­to­ry as com­mit objects.

Since we want to build a tree object, we need to first add some files to our index so Git can use them.

We’ll create a new file called about.html and save a little markup in it.

$ vim about.html

Here’s the markup we’ll use:

	<!DOCTYPE html>
	        <title>About Plumber</title>
	        <h2>A website dedicated to speeding up your websites.</h2>

And then add the new files to the repos­i­to­ry index:

$ git update-index --add about.html index.html

If we look at the status, we’ll see the new files added:

$ git status
  new file:   about.html
  new file:   index.html

Now we can write that tree to the database:

$ git write-tree

And we get back a hash of the tree contents.

If we look at the new tree object we just cre­at­ed, we see the two files list­ed, includ­ing their object IDs.

$ git cat-file -p 3fc239523266f3970efe5859cbf8fdf6992b5bbb
	100644 blob 21c8f80eb9fdd31eb356c3f112f2b3afda00add5    about.html
	100644 blob 89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d    index.html
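The index only needs to know an object ID, a mode, and a path, so a file does not even have to exist in the working directory to be staged. As a sketch of that idea, the example below writes a blob from standard input and registers it with update-index --cacheinfo. The path greeting.txt and the contents are hypothetical.

```shell
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Write a blob straight from stdin; no file is created on disk.
blob_id="$(printf 'hello\n' | git hash-object -w --stdin)"

# Register the blob in the index under a path of our choosing.
git update-index --add --cacheinfo 100644,"$blob_id",greeting.txt

# Write the index out as a tree object and inspect it.
tree_id="$(git write-tree)"
tree_type="$(git cat-file -t "$tree_id")"
tree_listing="$(git cat-file -p "$tree_id")"

echo "$tree_type"
echo "$tree_listing"
```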

Let’s cre­ate anoth­er direc­to­ry for CSS (called css) and then add a site.css file inside of it. We’ll then update the index and then write the tree object.

$ mkdir css
$ vim css/site.css

We’ll add a simple comment to the css/site.css file just so it has some content in it.

	/* This is my site CSS */

Save that and then we can check that it is seen by Git but not staged.

$ git status

Now we’re ready to update the index and add the site.css file (and the css direc­to­ry) to our repos­i­to­ry index.

$ git update-index --add css/site.css

A git-status will show that it has indeed been added to the index.

	Changes to be committed:
	(use "git rm --cached <file>..." to unstage)
	new file:   about.html
	new file:   css/site.css
	new file:   index.html  

And now we can write the tree object again, which will pull in our changes to the index.

$ git write-tree 

If we cat-file the newly created object, we’ll see the updated repository. This object is a tree object, and it has a new ID: because the contents of the tree changed, Git generated a new hash for it.

$ git cat-file -p 3de79856453c6ad7e510d825527a86330d994908
	100644 blob 21c8f80eb9fdd31eb356c3f112f2b3afda00add5    about.html
	040000 tree 3b2baef730ebbc725f1772c94bbe8348dc6b7b9d    css
	100644 blob 89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d    index.html

Now we have a new tree (css) inside of the main tree object. We can also cat-file the css directory/tree and see what it has inside.

$ git cat-file -p 3b2baef730ebbc725f1772c94bbe8348dc6b7b9d
100644 blob b9bb7b6e28dc3ba00d2d353dea71f7afcb5f10a6    site.css

And it returns the site.css file type, the hash and the file name.

Com­mit­ting the Tree Object

Now we’re ready to commit the changes in the index and create our commit object. We do this using commit-tree, a Git plumbing command that takes a single tree SHA-1 and, optionally, any parent commit objects.

$ git commit-tree 3de79856453c6ad7e510d825527a86330d994908 -m "adding first commit"

Since this is the first com­mit we are doing in this repos­i­to­ry, we don’t need to include the pre­vi­ous com­mit ID (the par­ent). But we do want to include a com­mit mes­sage using the -m flag that you’re prob­a­bly famil­iar with from using git-commit.

commit-tree returns a commit object hash that is created using the contents of the commit.
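For any commit after the first one, you pass the parent with -p. Here is a sketch of two commits chained together in a scratch repository. The file, messages, and identity are made up, and the identity is set through environment variables only so the example does not depend on your own Git configuration.

```shell
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Demo identity so commit-tree does not depend on local configuration.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# First commit: no parent.
echo "one" > file.txt
git update-index --add file.txt
first_tree="$(git write-tree)"
first_commit="$(git commit-tree "$first_tree" -m "first commit")"

# Second commit: same steps, plus -p to name the parent.
echo "second version" > file.txt
git update-index file.txt
second_tree="$(git write-tree)"
second_commit="$(git commit-tree "$second_tree" -p "$first_commit" -m "second commit")"

# The second commit records the first one as its parent.
parent_of_second="$(git rev-parse "$second_commit^")"
echo "$first_commit"
echo "$parent_of_second"
```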

Pointing HEAD to Latest Commit

But now if we check our Git sta­tus again, we still see the files staged. What’s going on?

The prob­lem here is that HEAD and the branch have not been updat­ed to include this new com­mit we just cre­at­ed. This is called, quite awk­ward­ly I might add, a dan­gling com­mit. It’s one that doesn’t belong to any branch. 

Let’s fix that by adding it to the master branch that we’re working from. This will clear the staging area and allow us to see the commit in our git-log output.

Git stores the branch information in the refs directory that we looked at earlier. We need to edit (or create if it’s not created yet) .git/refs/heads/master and point it to the new commit so that the branch, and therefore HEAD, represents the latest state of the repository.

Let’s open the master file in a text edi­tor. This time I’ll use Vim, but you can use what­ev­er edi­tor is most con­ve­nient for you.

$ vim .git/refs/heads/master

Inside of the file, add the latest commit object ID (replacing anything that is already there):

  055bd26f599db2bebf0bd75c9d8859e4d2dc534c

Save the file and then check your work. 

$ git status

You should get back:

  On branch master
  nothing to commit, working directory clean

Look in the log to see if the latest commit is now there:

$ git log
  commit 055bd26f599db2bebf0bd75c9d8859e4d2dc534c
  Author: Ryan Irelan <[email protected]>
  Date:   Wed Apr 29 15:15:09 2015 -0500
  adding first tree

And now we just recre­at­ed adding, stag­ing, and com­mit­ting changes to our Git repos­i­to­ry using low­er lev­el plumb­ing commands.
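Editing the branch file by hand works, but Git also has a plumbing command for that step: update-ref. Here is the whole sequence as one sketch in a scratch repository, with update-ref taking the place of the text editor. The file contents and identity are invented, and the branch ref is read from HEAD so it works whether your default branch is master or something else.

```shell
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Demo identity so commit-tree does not depend on local configuration.
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# Stage a file, write the tree, and create the commit.
echo "<h1>Plumber</h1>" > index.html
git update-index --add index.html
tree_id="$(git write-tree)"
commit_id="$(git commit-tree "$tree_id" -m "adding first commit")"

# Point the current branch at the new commit instead of editing the file.
branch_ref="$(git symbolic-ref HEAD)"
git update-ref "$branch_ref" "$commit_id"

# The staging area is now clean and the commit shows up in the log.
status_output="$(git status --porcelain)"
head_commit="$(git rev-parse HEAD)"
git log --oneline
```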

Pret­ty nice that we don’t always have to do that, right?

Wrap-up

While I don’t encour­age you to inter­act with your Git repos­i­to­ry using these low­er lev­el com­mands on a dai­ly basis, it could come in handy if you need to iden­ti­fy a prob­lem or issue. 

You know, it’s also nice just to under­stand how exact­ly the tool works, even if you don’t ever again use it in this way.