The Plumbing
The plumbing is the set of low-level commands that make up the Git system. They are the commands that do the, uh, dirty work, and make your repository track and manage files and changes.
Let’s look at the same stuff we just covered — blob and tree — but in terms of the low level commands that make this happen.
We’ll start off by creating a new directory for our project and initializing a fresh repository.
Let’s create a new website project for a plumber. We’ll very simply name this “plumber.”
$ mkdir plumber && cd plumber
$ git init .
What happens when we initialize a Git repository?
First, Git creates a .git directory. This hidden directory won’t show up unless we list files and directories using the -a option.
$ ls -al
Now we can see it. There it is, the first part of the magic that is Git plumbing. Let’s see what is in that hidden directory.
$ cd .git
$ ls -al
-rw-r--r-- 1 ryan staff 23 Apr 21 15:40 HEAD
drwxr-xr-x 2 ryan staff 68 Apr 21 15:40 branches/
-rw-r--r-- 1 ryan staff 137 Apr 21 15:40 config
-rw-r--r-- 1 ryan staff 73 Apr 21 15:40 description
drwxr-xr-x 11 ryan staff 374 Apr 21 15:40 hooks/
drwxr-xr-x 3 ryan staff 102 Apr 21 15:40 info/
drwxr-xr-x 4 ryan staff 136 Apr 21 15:40 objects/
drwxr-xr-x 4 ryan staff 136 Apr 21 15:40 refs/
This directory is the repository. Let’s dig in a little deeper.
The first one listed is HEAD. This is, as you might expect, a pointer to the current branch.
$ cat HEAD
ref: refs/heads/master
This is how Git stores what HEAD is.
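If you’d rather not cat the file by hand, Git has a plumbing command that reads the same information. Here is a small sketch in a throwaway repository (the repo variable and the temporary directory are just for illustration); the branch name you see depends on how your Git is configured.

```shell
# Sketch: read HEAD two ways in a scratch repository.
repo=$(mktemp -d)            # throwaway directory, not the plumber project
git init -q "$repo"
cd "$repo"

cat .git/HEAD                       # raw file contents, e.g. "ref: refs/heads/master"
head_ref=$(git symbolic-ref HEAD)   # the same pointer, read through Git
echo "$head_ref"
```

Both lines of output name the same ref; the file simply adds a "ref: " prefix.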
Next in the list is the branches directory. We’re going to skip this one because it’s a “slightly deprecated way to store shorthands to be used to specify URL to git fetch, git pull and git push commands”. It will more likely than not remain empty.
Let’s move on.
Now let’s look at config.
$ cat config
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
These are some repository-specific config settings. I’ve never used this and, let’s be honest, you probably won’t either. You’ll most likely set config options in the user-specific .gitconfig file that lives in your home directory. One thing to note, however, is that the precedence runs the other way from what you might expect: a value set in this repository-level file overrides the same value in your user-level (global) config.
So, if you want to set configuration for this repository only, this is the place to do it.
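To see that precedence for yourself, here is a rough sketch. It uses a stand-in home directory so your real .gitconfig is never touched; fake_home, the g helper, and the two names are all made up for the example.

```shell
# Sketch only: fake_home is a hypothetical stand-in for your home directory.
scratch=$(mktemp -d)
fake_home="$scratch/home"
mkdir -p "$fake_home"

# Helper (hypothetical name) that runs git against the stand-in home directory.
g() { HOME="$fake_home" XDG_CONFIG_HOME="$fake_home/.config" git "$@"; }

g init -q "$scratch/plumber"
cd "$scratch/plumber"

# Write the "global" value straight into the stand-in ~/.gitconfig.
g config --file "$fake_home/.gitconfig" user.name "Global Name"
# Write the repository-level value into .git/config.
g config --local user.name "Project Name"

effective=$(g config user.name)   # Git reports the most specific value
echo "$effective"
```

The repository-level value is the one Git reports, because the more specific file is read last.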
Next up is description. This file is used by GitWeb, the web interface that git instaweb launches, which is a snappy way to create a local web server to interface with your Git repository.
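The hooks directory is where Git hooks are stored. If we ls it we can see some sample hooks listed. This is where you would place any custom hooks that you want to run in this repository. Keep in mind that hooks stored here are local to your copy; they are not committed, so they don’t travel with the project when someone clones it.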
Moving on to the info directory now. If we look inside of it we get:
$ ls -al info
-rw-r--r-- 1 ryan staff 240 Apr 21 15:40 exclude
The exclude file inside of info is a repository-wide set of patterns that defines which files or directories you want Git to ignore in your project.
$ cat info/exclude
# git ls-files --others --exclude-from=.git/info/exclude
# Lines that start with '#' are comments.
# For a project mostly in C, the following would be a good set of
# exclude patterns (uncomment them if you want to use them):
# *.[oa]
# *~
You typically do this in your .gitignore file but you can also set those patterns here. The real difference is that info/exclude is never committed, so its patterns apply only to your copy of the repository and nobody else will see (or mess with) them.
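Here is a quick sketch of a local-only ignore rule at work. The file names and the *.log pattern are invented for the example, and it runs in a scratch repository rather than the plumber project.

```shell
# Sketch: ignore *.log files for this clone only.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir -p .git/info                    # usually exists already; harmless if so
echo '*.log' >> .git/info/exclude     # local rule, never committed
touch debug.log notes.txt

# check-ignore -v reports which file and pattern caused the match.
ignored=$(git check-ignore -v debug.log)
echo "$ignored"

untracked=$(git status --porcelain)   # debug.log should not be listed
echo "$untracked"
```

Only notes.txt shows up as untracked, and check-ignore points at .git/info/exclude as the source of the rule.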
Okay, moving on because we’re almost done reviewing these directories. Next up is the objects directory. This one is an important one because it stores all of your Git objects; this is your repository data.
Let’s see what is in there.
$ ls -al objects
drwxr-xr-x 2 ryan staff 68 Apr 21 15:40 info
drwxr-xr-x 2 ryan staff 68 Apr 21 15:40 pack
There are a couple of directories in there that we have to explore. The info directory is where “additional information about the object store is recorded.” I looked through some older repositories of my own and didn’t see anything listed in the info directory.
Next to info is the pack directory.
I don’t want to get too far into this (because we might never come back out), but the pack directory is where Git stores packs of objects as binary files. These are called packfiles and are used to keep disk usage to a minimum by combining multiple similar objects together. You can ask Git to create packfiles by running git gc (short for “garbage collect”), which cleans up unnecessary files and optimizes the local copy of your repository.
If we run git gc in our new project, we get back a message like this:
Nothing new to pack.
Let’s switch over to our The Commits project and run it there.
$ git gc
You should see output similar to this:
Counting objects: 112, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (62/62), done.
Writing objects: 100% (112/112), done.
Total 112 (delta 45), reused 112 (delta 45)
The objects directory in our new sample project is empty because we haven’t added anything yet. If you do the same exploring in an existing repository you will see a long list of objects, organized by directories.
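If you want to watch loose objects turn into a packfile without touching a real project, something like the following sketch should do it. The scratch repository, the placeholder identity, and the field helper are all invented for the example.

```shell
# Sketch in a throwaway repository; the identity below is a placeholder.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo 'hello' > hello.txt
git add hello.txt
git -c user.name='Example' -c user.email='example@example.com' \
    -c commit.gpgsign=false commit -q -m 'first commit'

# Helper (hypothetical name): pull one field out of `git count-objects -v`.
field() { git count-objects -v | sed -n "s/^$1: //p"; }

loose_before=$(field count)   # loose objects: the blob, its tree, the commit
git gc -q                     # pack them up
loose_after=$(field count)    # loose objects left after packing
packed=$(field in-pack)       # objects now stored in a packfile

echo "loose before: $loose_before, loose after: $loose_after, packed: $packed"
ls .git/objects/pack          # a .pack file plus its .idx index
```

git count-objects is a handy way to check on the object store without listing directories by hand.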
Last, but not even close to least, is the refs directory.
The refs directory is the home of Git references, organized in subdirectories. Hop over to the The Commits project and let’s take a look.
$ cd .git/refs && ls -al
drwxr-xr-x 2 ryan staff 68 Apr 27 21:28 heads
drwxr-xr-x 3 ryan staff 102 Apr 25 22:54 remotes
drwxr-xr-x 2 ryan staff 68 Apr 25 22:54 tags
Inside of the heads directory we have a listing of the local branches as files. Each branch file contains a commit hash that denotes the location of the tip of the branch.
We don’t see any heads in our Plumber project yet — not even one for master — because we don’t have any commits.
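To convince yourself that a branch really is just a file holding a commit hash, here is a small sketch. It makes one empty commit in a scratch repository (the identity is a placeholder) and compares the file with what Git reports.

```shell
# Sketch: a branch file contains the hash of the branch tip.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

git -c user.name='Example' -c user.email='example@example.com' \
    -c commit.gpgsign=false commit -q --allow-empty -m 'first commit'

branch=$(git symbolic-ref --short HEAD)     # whatever the default branch is called
from_file=$(cat ".git/refs/heads/$branch")  # read the ref file directly
from_git=$(git rev-parse HEAD)              # ask Git for the same commit

echo "$from_file"
echo "$from_git"
git for-each-ref                            # every ref with its object ID and type
```

One caveat: after a git gc, Git may move refs into a single packed-refs file, so the individual branch file is not guaranteed to exist in an older repository.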
That’s a rundown of the directories and files in the .git directory. These will be in every project but with different contents. You probably won’t need them day-to-day but it is good to know that they’re there and what’s in them.
Creating an Object
Earlier we looked at the different types of data objects in Git. Now let’s create them by hand. Well, sort of.
We’ll start by creating a file in our Plumber repository. Most people create README.md files to get started. I think that’s boring. Let’s start with our index.html file and get this project rolling.
$ vim index.html
And then we can populate it with sample markup. Use exactly this markup, if you don’t mind. It’ll come in handy in a few minutes.
<!DOCTYPE html>
<html>
<head>
<title>Plumber</title>
</head>
<body>
<h1>Plumber</h1>
<h2>A website dedicated to speeding up your websites.</h2>
</body>
</html>
Okay, now we have our file and we’re ready to add it to our Git repository so it can be stored and tracked by Git.
Normally, we’d do this:
$ git add index.html
to add the file to our repository staging area to be committed. But this time we want to forgo the porcelain commands and use the plumbing commands.
Instead we’ll first use a command called hash-object. This command takes a file and creates a blob out of it.
$ git hash-object -w index.html
The -w option tells hash-object to write the new object to the object database. Without -w, hash-object only computes and prints the ID the object would have, without storing anything.
If you used the exact markup I did above, you should get this back:
89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d
That’s the hash ID of the object. It’s created by generating a SHA-1 hash of a short header (the object type and size) followed by the file contents. Only the contents go in, not the metadata of the file (the stuff that makes the file unique between our two computers). If you used the same exact markup I did, byte for byte, you should also get this exact hash.
Files created on different computers with the same contents will always have the same hash in Git. This is one of the ways that Git can be so efficient in data storage.
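You can reproduce the hash without Git doing the work. The sketch below assumes the default SHA-1 object format and a sha1sum or shasum tool on your machine; the sample text and the sha1 helper name are made up for the example.

```shell
# Sketch: rebuild a blob ID by hand and compare it with git hash-object.
file=$(mktemp)
printf 'hello plumbing\n' > "$file"

size=$(( $(wc -c < "$file") ))     # content length in bytes

# Helper (hypothetical name): use whichever SHA-1 tool is installed.
sha1() {
  if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi
}

# A blob is hashed as: "blob <size>", a NUL byte, then the raw contents.
manual=$( { printf 'blob %s\0' "$size"; cat "$file"; } | sha1 | cut -d' ' -f1 )
via_git=$(git hash-object "$file")  # no -w, so nothing is stored

echo "$manual"
echo "$via_git"
```

The two lines should match, which is also why the file name and timestamps never influence the ID.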
Let’s make sure this is really a blob:
$ git cat-file -t 89e4150
That should return:
blob
Okay, so we have just created a blob object.
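cat-file can tell us more than the type. Here is a short sketch in a scratch repository with made-up file contents; it also shows where the object ends up on disk.

```shell
# Sketch: inspect a freshly written blob.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf 'hello plumbing\n' > index.html
id=$(git hash-object -w index.html)

kind=$(git cat-file -t "$id")       # object type
git cat-file -s "$id"               # size of the contents in bytes
contents=$(git cat-file -p "$id")   # the contents themselves
echo "$kind: $contents"

# Loose objects are stored under the first two characters of the ID.
dir=$(printf '%s' "$id" | cut -c1-2)
rest=$(printf '%s' "$id" | cut -c3-)
ls ".git/objects/$dir/$rest"
```

That two-character directory is what starts filling up the objects directory we found empty earlier.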
Feeling good?
Let’s continue.
Trees
As we saw earlier when we reviewed trees and blobs in Git, tree objects can contain blobs. Let’s look at trees and blobs in an existing project.
Switch over to the The Commits project and go to the project root. Then run:
$ git cat-file -p master^{tree}
This command tells Git to return the tree object to which the master branch is currently pointing (based on the last commit). We specify the branch (master) and then that we want the tree object.
You should see something like this returned:
100644 blob 496ee2ca6a2f08396a4076fe43dedf3dc0da8b6d .gitignore
040000 tree 6c0cbeabafbe200101bd2f763cef356bc272fe6d images
100644 blob 9fd3fc38b1f79782968eaa11514919e588630a83 index.html
040000 tree aa506736eb118ab6585410d8ca549dd84d7a9ab1 javascripts
040000 tree 126bd44f234cd1feee3f6267cde75155223cf637 stylesheets
You can see that images, javascripts, and stylesheets are all tree objects and the other two files are blobs.
If we view the contents of the stylesheets tree object then we can see that it contains a series of blob objects.
$ git cat-file -p 126bd44f234cd1feee3f6267cde75155223cf637
This returns:
100644 blob 82c9b265bb471fb2470b82a902d938788987c927 app.css
100644 blob 3652bf55fa5342358951874225972d99886fb07a foundation.css
100644 blob 5744060d94490921ba59cd8859750f46c872a1fb foundation.min.css
The tree object points to three blob objects; these are our CSS files for this project.
Creating a tree Object
Just as we created a blob object earlier, we can also create a tree object using a Git plumbing command. This one is called write-tree and does exactly what it says. It writes a tree object using the staged files.
To create a tree, Git takes the files from the index and creates a tree object from them. So, first we need to stage some files so Git has something to use to write the tree object.
Usually you’d use git add for this, but we can use another plumbing command called update-index with the --add option.
update-index is a command that allows us to alter the repository index. As a refresher from earlier, the Index is the staging area where changes go before they are committed to the repository as commit objects.
Since we want to build a tree object, we need to first add some files to our index so Git can use them.
We’ll create a new file called about.html and save a little markup in it.
$ vim about.html
Here’s the markup we’ll use:
<!DOCTYPE html>
<html>
<head>
<title>About Plumber</title>
</head>
<body>
<h1>Plumber</h1>
<h2>A website dedicated to speeding up your websites.</h2>
</body>
</html>
And then add the new files to the repository index:
$ git update-index --add about.html index.html
If we look at the status, we’ll see the new files added:
$ git status
new file: about.html
new file: index.html
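git status is the porcelain view of the index. If you want the plumbing view, git ls-files --stage lists the raw entries. The sketch below uses a scratch repository with placeholder file contents, so the IDs will differ from the ones in this walkthrough.

```shell
# Sketch: look at the index entries that update-index created.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo 'about page' > about.html
echo 'home page'  > index.html
git update-index --add about.html index.html

git ls-files --stage      # mode, blob ID, stage number, path for each entry
staged=$(git ls-files)    # just the paths

# The ID recorded in the index is the blob ID of the file contents.
entry_id=$(git ls-files --stage index.html | cut -d' ' -f2)
blob_id=$(git hash-object index.html)
echo "$entry_id"
echo "$blob_id"
```

Notice that update-index --add also wrote the blobs for us, so there was no need to run hash-object -w on about.html first.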
Now we can write that tree to the database:
$ git write-tree
3fc239523266f3970efe5859cbf8fdf6992b5bbb
And we get back a hash of the tree contents.
If we look at the new tree object we just created, we see the two files listed, including their object IDs.
$ git cat-file -p 3fc239523266f3970efe5859cbf8fdf6992b5bbb
100644 blob 21c8f80eb9fdd31eb356c3f112f2b3afda00add5 about.html
100644 blob 89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d index.html
Let’s create another directory for CSS (called css) and then add a site.css file inside of it. We’ll then update the index and then write the tree object.
$ mkdir css
$ vim css/site.css
We’ll add a simple comment to the css/site.css file just so it has some content in it.
/* This is my site CSS */
Save that and then we can check that it is seen by Git but not staged.
$ git status
Now we’re ready to update the index and add the site.css file (and the css directory) to our repository index.
$ git update-index --add css/site.css
Running git status will show that it has indeed been added to the index.
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: about.html
new file: css/site.css
new file: index.html
And now we can write the tree object again, which will pull in our changes to the index.
$ git write-tree
3de79856453c6ad7e510d825527a86330d994908
If we cat-file the newly created object, we’ll see the updated tree. Its ID is different from the one we got before because the contents of the tree changed, and Git derives the hash from those contents.
$ git cat-file -p 3de79856453c6ad7e510d825527a86330d994908
100644 blob 21c8f80eb9fdd31eb356c3f112f2b3afda00add5 about.html
040000 tree 3b2baef730ebbc725f1772c94bbe8348dc6b7b9d css
100644 blob 89e41503bb5f2f7366f7d8eecc7f41439fa1fe8d index.html
Now we have a new tree (css) inside of the main tree object. We can also cat-file the css directory/tree and see what it has inside.
$ git cat-file -p 3b2baef730ebbc725f1772c94bbe8348dc6b7b9d
100644 blob b9bb7b6e28dc3ba00d2d353dea71f7afcb5f10a6 site.css
And it returns the file mode, the object type, the hash and the file name for site.css.
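Walking nested trees one cat-file at a time gets tedious in a bigger project. As a rough sketch, git ls-tree with -r does the recursion for us. The layout below is a scratch repository with invented contents, staged with git add for brevity.

```shell
# Sketch: list a tree one level at a time, then recursively.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir css
echo '/* This is my site CSS */' > css/site.css
echo '<h1>Plumber</h1>' > index.html
git add .                      # porcelain shortcut for update-index --add

tree=$(git write-tree)
git ls-tree "$tree"            # top level: one tree (css) and one blob
git ls-tree -r "$tree"         # recursive: blobs only, with full paths
paths=$(git ls-tree -r --name-only "$tree")
echo "$paths"
```

The recursive listing never shows tree entries; it follows them and prints the blobs they contain.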
Committing the Tree Object
Now we’re ready to commit the changes in the index and create our commit objects. We do this using commit-tree, a Git plumbing command that takes a single tree SHA-1 and, optionally, one or more parent commit objects.
$ git commit-tree 3de79856453c6ad7e510d825527a86330d994908 -m "adding first commit"
055bd26f599db2bebf0bd75c9d8859e4d2dc534c
Since this is the first commit we are doing in this repository, we don’t need to include the previous commit ID (the parent). But we do want to include a commit message using the -m flag that you’re probably familiar with from using git commit.
commit-tree returns the hash of the new commit object, which is computed from the contents of the commit.
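For a second commit we would pass the parent with -p. The sketch below runs in a scratch repository with a placeholder identity and invented file contents, so none of the hashes will match the ones in this walkthrough.

```shell
# Sketch: two commits made with plumbing only; the identity is a placeholder.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='Example Author' GIT_COMMITTER_EMAIL='author@example.com'

echo 'v1' > index.html
git update-index --add index.html
first=$(git commit-tree "$(git write-tree)" -m 'first commit')

echo 'version two, a bit longer' > index.html
git update-index --add index.html
# -p names the parent, which links the new commit to the previous one.
second=$(git commit-tree "$(git write-tree)" -p "$first" -m 'second commit')

git cat-file -p "$second"            # tree, parent, author, committer, message
parent=$(git rev-parse "$second^")   # resolve the parent of the second commit
echo "$parent"
```

The commit object is plain text: a tree line, a parent line, the author and committer, and then the message. Because it includes names and timestamps, its hash differs from machine to machine even when the tree is identical.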
Pointing HEAD to Latest Commit
But now if we check our Git status again, we still see the files staged. What’s going on?
The problem here is that HEAD and the branch have not been updated to include this new commit we just created. This is called, quite awkwardly I might add, a dangling commit. It’s one that doesn’t belong to any branch.
Let’s fix that by adding it to the master branch that we’re working from. This will clear the staging area and allow us to see the commit in our git log output.
Git stores the branch information in the refs directory that we looked at earlier. We need to edit (or create if it doesn’t exist yet) .git/refs/heads/master and point the branch at the current commit so it represents the latest state of the repository.
Let’s open the master file in a text editor. This time I’ll use Vim, but you can use whatever editor is most convenient for you.
$ vim .git/refs/heads/master
Inside of the file, add the latest commit object ID as the only line (replacing anything that is already there):
055bd26f599db2bebf0bd75c9d8859e4d2dc534c
Save the file and then check your work.
$ git status
You should get back:
On branch master
nothing to commit, working directory clean
Check the log to see if the latest commit is now there:
$ git log
commit 055bd26f599db2bebf0bd75c9d8859e4d2dc534c
Author: Ryan Irelan <[email protected]>
Date: Wed Apr 29 15:15:09 2015 -0500
adding first commit
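Editing the ref file by hand works, but Git also ships a plumbing command for it, update-ref, which checks that the ID is a real object before writing. Here is a sketch in a scratch repository with a placeholder identity; it reads the branch name from HEAD rather than assuming master.

```shell
# Sketch: point the current branch at a commit without opening an editor.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME='Example Author' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='Example Author' GIT_COMMITTER_EMAIL='author@example.com'

echo '<h1>Plumber</h1>' > index.html
git update-index --add index.html
commit=$(git commit-tree "$(git write-tree)" -m 'adding first commit')

branch_ref=$(git symbolic-ref HEAD)      # e.g. refs/heads/master
git update-ref "$branch_ref" "$commit"   # same effect as editing the ref file

tip=$(git rev-parse HEAD)
leftover=$(git status --porcelain)       # empty when nothing is left to commit
git log --oneline
```

After update-ref, the commit is no longer dangling: HEAD resolves to it, the log shows it, and the status comes back clean.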
And with that, we’ve recreated adding, staging, and committing changes to our Git repository using lower level plumbing commands.
Pretty nice that we don’t always have to do that, right?
Wrap-up
While I don’t encourage you to interact with your Git repository using these lower level commands on a daily basis, it could come in handy if you need to identify a problem or issue.
You know, it’s also nice just to understand how exactly the tool works, even if you don’t ever again use it in this way.