Git Data Storage

Name: Mijingo, LLC

Later we are going to use some of Git’s lower-level commands to create Git data objects. To prepare ourselves, let’s review three kinds of object that Git stores:

Blob
Tree
Commit

What is stored in these data objects? The contents and changes of the repository.

Blob

We’ll start with a Blob.

A blob is where Git stores the contents of a file that it is tracking as part of the repository.

The file is referenced using a 40-character SHA‑1 hash made from the contents of the blob (the contents of the file, preceded by a short header that Git adds). You’ve seen these hashes before when referencing commits. Git uses SHA‑1 hashes to track all of the data in its repository. In practice, this gives each data object a unique id.
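To make the hashing concrete, here is a minimal Python sketch of how a blob id can be computed. The function name `git_blob_id` is made up for illustration. The only thing assumed about Git is its object framing: the word `blob`, the content length, and a NUL byte are placed in front of the file contents before hashing.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git would assign to a blob with this content."""
    # Git hashes a small header ("blob <size>" plus a NUL byte)
    # followed by the raw bytes of the file.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The file name plays no part in the id; only the contents do.
    print(git_blob_id(b"hello\n"))
    # ce013625030ba8dba906f756967f9e9ca394464a
```

If Git is installed, the result can be compared against `echo hello | git hash-object --stdin`.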

One side effect of using only the contents of the file in a blob is that other files with the same contents simply reference the same blob, so the contents don’t have to be stored twice.

So a blob has a unique id, created from a SHA‑1 hash of the blob’s contents, and it corresponds to an actual file.

[blob 3837d8] — [index.html]

Using an SHA‑1 hash

Git doesn’t use the SHA‑1 hash to secure anything. It uses the hash because Git is “content-addressable” storage. This means that an object can be tracked and retrieved based on its content, not its location.
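The idea of content-addressable storage can be sketched with a toy in-memory store. This is an illustration only, not how Git is implemented: the class `ObjectStore` and its methods are hypothetical names, and real Git also compresses objects and writes them to disk.

```python
import hashlib


class ObjectStore:
    """A toy content-addressable store (hypothetical, not Git's own code)."""

    def __init__(self):
        self._objects = {}  # maps hex id -> stored bytes

    def put(self, content: bytes) -> str:
        # The key is derived from the content itself, so identical
        # content always lands on the same key and is stored only once.
        header = b"blob " + str(len(content)).encode("ascii") + b"\0"
        key = hashlib.sha1(header + content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        # Retrieval needs only the id, never a path or file name.
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


store = ObjectStore()
id_a = store.put(b"<h1>Hello</h1>\n")    # e.g. the contents of index.html
id_b = store.put(b"<h1>Hello</h1>\n")    # a copy saved under another name
id_c = store.put(b"<h1>Goodbye</h1>\n")  # different contents

print(id_a == id_b)  # True: both files share one blob
print(len(store))    # 2: only two distinct contents are stored
```

Because the key comes from the content, the two identical files end up as a single stored object, which is the deduplication described earlier.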

Tree

Just as a blob is a representation of a file, a tree is a representation of a file system object, only under a different name. For our purposes, we can think of a tree as a directory. It contains blobs (files) and other trees (subdirectories), and the trees inside a tree can in turn contain both.

[TREE] / [blob]
       — [TREE] / [blob]
                \ [blob]    
       \ [blob]

Just like a blob, a tree stores the contents of the directory it represents (which are pointers to blobs and other trees, along with their names) and is identified by a SHA‑1 hash.
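As a rough sketch of how a tree gets its id, the Python below serializes a list of entries and hashes the result. The helper names `git_object_id` and `git_tree_id` and the example file names are invented here. The entry layout (mode, name, a NUL byte, then the 20 raw bytes of the id, with directories sorted as if their names ended in `/`) follows Git’s tree format as I understand it, so treat it as an approximation rather than a reference implementation.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Hash '<kind> <size>' plus a NUL byte plus the body."""
    header = kind.encode("ascii") + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def git_tree_id(entries) -> str:
    """entries: list of (mode, name, hex_id) tuples, one per file or subdirectory."""

    def sort_key(entry):
        mode, name, _ = entry
        # Entries are ordered by name; directories compare as if they ended in "/".
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, hex_id in sorted(entries, key=sort_key):
        # Each entry is "<mode> <name>", a NUL byte, then the id as 20 raw bytes.
        body += mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
        body += bytes.fromhex(hex_id)
    return git_object_id("tree", body)


# A root directory holding one file and one subdirectory (hypothetical names).
index_html = git_object_id("blob", b"hello\n")
css_tree = git_tree_id([("100644", "site.css", git_object_id("blob", b"body {}\n"))])
root_tree = git_tree_id([
    ("100644", "index.html", index_html),
    ("40000", "css", css_tree),
])
print(root_tree)
```

Since the root tree’s id is computed from the ids of everything beneath it, changing any file in any subdirectory changes the id of every tree above it.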

Commit

A commit is a snapshot of what the tree looked like at a given point in time.

The HEAD in a repository is just a pointer (usually by way of a branch) to a commit, which is the object that stores the state of the repository at the moment that commit object was created.

[COMMIT] —    [TREE] / [blob]
                   — [TREE] / [blob]
                            \ [blob]    
                   \ [blob]

Commits are organized in a one-way structure (a directed acyclic graph), in which each commit points back to the commit or commits it was built on. Together they represent the history of your changes in the repository.
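The graph structure can be pictured with a small Python sketch. The commit ids `c1` to `c4` and the `history` function are hypothetical stand-ins; in a real repository the ids are SHA‑1 hashes and the parent links are stored inside each commit object.

```python
# Hypothetical history: each commit id maps to the ids of its parents.
parents = {
    "c1": [],            # first commit, no parent
    "c2": ["c1"],
    "c3": ["c1"],        # a second line of work that also starts from c1
    "c4": ["c2", "c3"],  # a merge commit with two parents
}
head = "c4"  # HEAD resolves to the most recent commit


def history(start, parents):
    """Return every commit reachable from start by following parent links."""
    seen, order, stack = set(), [], [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue  # already visited through another path
        seen.add(commit)
        order.append(commit)
        stack.extend(parents[commit])
    return order


print(sorted(history(head, parents)))  # ['c1', 'c2', 'c3', 'c4']
```

Links only ever point from a commit to older commits, so the walk can never loop back on itself, which is what makes the graph acyclic.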

A commit object contains the following:

the hash of the tree object that the commit points to
the hash of each parent commit, if there is one
the name of the author who created the new version
the name of the person who created the commit object (usually the same as the person who created the new version)
the commit message

Together, these make up the commit object, and the object’s SHA‑1 hash is computed from all of them. This makes commits unique while also keeping the pieces of the commit separate.
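A commit id can be sketched the same way as a blob id. In the Python below, the function `git_commit_id` and the identity strings are hypothetical. The field layout (`tree`, optional `parent` lines, `author`, `committer`, a blank line, then the message) is my understanding of Git’s commit format and is offered as an approximation.

```python
import hashlib


def git_commit_id(tree_id, author, committer, message, parent_ids=()):
    """Hash a commit built from a tree id, two identities, and a message.

    author and committer are full identity strings such as
    "Jane Doe <jane@example.com> 1700000000 +0000" (made-up values).
    """
    lines = ["tree " + tree_id]
    lines += ["parent " + parent for parent in parent_ids]
    lines += ["author " + author, "committer " + committer, "", message]
    body = "\n".join(lines).encode("utf-8")
    # Same framing as other objects: "<kind> <size>", a NUL byte, then the body.
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


who = "Jane Doe <jane@example.com> 1700000000 +0000"  # hypothetical identity
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of a tree with no entries

first = git_commit_id(empty_tree, who, who, "Initial commit\n")
second = git_commit_id(empty_tree, who, who, "Second commit\n", parent_ids=[first])
print(first)
print(second)
```

The commit stores only the id of its tree, not the files themselves, so two commits with identical files share one tree and differ only in their own small objects.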
