The Pieces of Git

Git is made up of three different pieces:

  • Repository
  • Working Tree
  • Index

Let’s review each one in a bit more detail. If you want to step back and learn some more Git fundamentals before moving forward, I recommend watching the Basics of Git course or the Intermediate Git course. Both are available right here from Mijingo.

Repository

Let’s start with the Repository.

The repository is a collection of commits (changes to the files in the repository) and a history, or archive, of what the project looked like at each point in time.

A repository is organized into branches. These are forks in the history of the repository where a new set of changes was made. Typically, a branch is merged back into a main branch (usually called master) when its purpose is met.

Branches are created for all sorts of reasons. The most common reason for creating a branch is to isolate work so you don’t interfere with or break the code on the master branch.

Each repository has a HEAD. This is the current starting point of the repository: a reference to the commit you currently have checked out, usually by way of the current branch. If you switch branches in the repository, the HEAD changes. If you make a change in the repository and commit, the HEAD changes again, moving to the new commit.
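To make the HEAD behavior concrete, here is a minimal sketch you can run in a scratch directory. The temporary repository, the file name notes.txt, the branch name feature, and the user details are all made up for illustration; only the Git commands themselves are real.

```shell
# Hypothetical throwaway repository; every name below is for illustration only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master    # make sure the first branch is called master
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

echo "first" > notes.txt
git add notes.txt
git commit -q -m "First commit"

head_branch_before="$(git symbolic-ref --short HEAD)"   # branch HEAD points at
head_commit_before="$(git rev-parse HEAD)"              # commit HEAD resolves to

git checkout -q -b feature                  # switching branches changes HEAD
head_branch_after="$(git symbolic-ref --short HEAD)"

echo "second" >> notes.txt
git commit -q -a -m "Second commit"         # committing moves HEAD to the new commit
head_commit_after="$(git rev-parse HEAD)"

echo "HEAD branch: $head_branch_before -> $head_branch_after"
echo "HEAD commit: $head_commit_before -> $head_commit_after"
```

The first line of output should show the branch name changing after the checkout, and the second should show two different commit IDs.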

In summary:

  • A repository has a HEAD that is the current starting point of the repository. More on HEAD later when we look at the lower-level Git commands.
  • A repository is organized into branches, with the main branch typically called master, which allow you to have different versions of the repository going on simultaneously (to be merged together later).
  • Finally, the repository is a history or archive of the project’s Working Tree.

Working Tree

So, what’s a Working Tree?

This is a directory on your file system that is associated with a repository.

You can think of this as the file system manifestation of the repository.

It holds the files you edit, and it’s where you add new files and remove unneeded ones. When you do your work on the project, like adding new code or assets, you do that in the Working Tree.

Any changes to the Working Tree are detected by comparing it against the Index, and show up as modified files.
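Here is a small sketch of that in practice. It assumes a throwaway repository with made-up file names (keep.txt, old.txt, new.txt); the point is only that plain file operations in the Working Tree are picked up by Git without anything being staged.

```shell
# Hypothetical scratch repository; file names are for illustration only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

printf 'one\n' > keep.txt
printf 'two\n' > old.txt
git add keep.txt old.txt
git commit -q -m "Initial commit"

# Ordinary file operations in the Working Tree; none of these are staged.
printf 'edited\n' >> keep.txt     # edit an existing file
rm old.txt                        # remove an unneeded file
printf 'three\n' > new.txt        # add a new file

# The short status format lists one changed path per line.
changes="$(git status --short)"
echo "$changes"
```

In the short format, the edited and removed files are flagged with M and D, and the new, untracked file is flagged with ??.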

Index

Okay, what’s the Index?

The Index is a middle area that sits between your Git repository and the data files on your file system (the things you edit and change).

You might have heard the Index also called:

  • staging area
  • staging
  • stage
  • cache
  • Working Tree cache

I like the name “staging area” because it describes exactly what happens.

Changes are recorded to the Index before they are committed to the repository as commit objects. You stage your changes, storing them somewhere temporarily until you are ready to make a commit to the repository.

But I should clarify: the Git Index isn’t a place where actual data is stored, like the changed files or their contents. It only tracks the objects (files) that have changed so you can later bundle them up as a commit. Each Index entry records a file’s path and mode along with the ID of the object that holds its contents; the contents themselves live in Git’s object store.

And it does that by keeping a list of all of the project files and tracking new, removed, or changed files against it.
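If you want to see that the Index tracks objects rather than holding file contents itself, this sketch may help. It assumes a scratch repository and a made-up file called hello.txt, and it uses the --stage option of git ls-files, which prints each entry's mode, object ID, and stage number alongside the path.

```shell
# Hypothetical scratch repository; hello.txt is for illustration only.
repo="$(mktemp -d)"
cd "$repo"
git init -q

printf 'hello\n' > hello.txt
git add hello.txt                        # stage the file; nothing is committed yet

# Each Index entry reads: <mode> <object id> <stage number> <path>
entry="$(git ls-files --stage)"
echo "$entry"

# The entry only references an object; the content is read from the object store.
object_id="$(echo "$entry" | awk '{print $2}')"
object_type="$(git cat-file -t "$object_id")"
object_body="$(git cat-file -p "$object_id")"
echo "$object_type: $object_body"
```

Notice that the file's contents come back from git cat-file, not from the Index entry, which carries only the reference.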

The Git Index is a binary file located at:

.git/index

To see the index you can run:

git ls-files

You might think that you use git-status to see the Index. That’s sort of true. What git-status does is determine the difference between the Working Tree and the Index (along with the difference between the Index and the last commit) and display those differences to you.
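One way to see those comparisons separately is with git diff. This is a sketch in a scratch repository with made-up files a.txt and b.txt: plain git diff compares the Working Tree with the Index, and git diff --cached compares the Index with the last commit, which is the other half of what git-status reports.

```shell
# Hypothetical scratch repository; a.txt and b.txt are for illustration only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

printf 'a\n' > a.txt
printf 'b\n' > b.txt
git add a.txt b.txt
git commit -q -m "Initial commit"

printf 'staged change\n' >> a.txt
git add a.txt                            # a.txt: the Index now differs from HEAD
printf 'unstaged change\n' >> b.txt      # b.txt: the Working Tree now differs from the Index

unstaged="$(git diff --name-only)"            # Working Tree vs. Index
staged="$(git diff --cached --name-only)"     # Index vs. HEAD
echo "not staged: $unstaged"
echo "staged:     $staged"
```

Running git status in the same directory should list a.txt under the changes to be committed and b.txt under the changes not staged for commit.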

A moment ago we used git ls-files to see the Index. We can use that same command to produce output similar to that of the git-status command.

git ls-files --modified --deleted --others --exclude-standard

Now, instead of running a standard ls-files command, we filter the output using a series of options.

  • First, we want to show the modified files using --modified,
  • then we also want to show any deleted files or directories using --deleted,
  • and to show new files (those that are still untracked by the repository) we use --others,
  • and, finally, we use --exclude-standard to honor the repository’s standard excluded files and directories.

All of this together sort of recreates what we get when we run git-status. Of course, the output isn’t styled the same way, but the content is much the same.
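As a rough check of that claim, the sketch below runs both commands in a scratch repository and compares the paths they report. The file names and the *.log ignore rule are made up, and it assumes nothing has been staged, since the ls-files filters above don't cover staged changes. The sort -u is there because a deleted file can be listed under both --modified and --deleted.

```shell
# Hypothetical scratch repository; all file names are for illustration only.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"
git config commit.gpgsign false

printf '*.log\n' > .gitignore
printf 'v1\n' > tracked.txt
printf 'v1\n' > gone.txt
git add .gitignore tracked.txt gone.txt
git commit -q -m "Initial commit"

printf 'v2\n' >> tracked.txt      # modified
rm gone.txt                       # deleted
printf 'new\n' > fresh.txt        # untracked
printf 'noise\n' > debug.log      # untracked but ignored via .gitignore

# Paths reported by the filtered ls-files command, de-duplicated.
from_ls_files="$(git ls-files --modified --deleted --others --exclude-standard | sort -u)"

# Paths reported by git status, with the two-letter state and space stripped.
from_status="$(git status --porcelain | cut -c4- | sort)"

echo "$from_ls_files"
echo "---"
echo "$from_status"
```

Both lists should contain the same three paths, with debug.log left out of each because of the ignore rule.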

The next step after knowing the status of our index (what’s staged and ready to be committed) is to create the commit. That’s when we start getting into commit objects.

Let’s jump in and talk about how Git stores data, including commit and tree objects.