A time traveller’s guide to Git


Brandon Keepers explains how to rewrite history in Git

While scientists have crushed the dream of travelling back in time, Git offers control over the fourth dimension when the wrongs of the past need to be corrected. The distributed version control system allows commits to be amended, discarded, reordered and modified to scrub the history of a repository.

But, heed the warnings of an experienced time traveller. Git obeys the law of causality; every commit in a Git repository is inextricably linked to the commit before it. Changing one commit alters all the commits that come after, creating an alternate reality. Altering the past can be dangerous and — except in rare circumstances — should only be done if the events being altered have not been observed by anyone else. Branches that have already been pushed to a remote should not be altered.

Join me as we explore ways to rewrite history with Git.

Amend recent history

For whatever reason, the human brain seems wired to remember something important just after pressing the ‘Send’ button on an email, and the right words always come to mind after a conversation is over. Likewise, I often realise I made a mistake immediately after making a commit in Git. The safest and most common form of rewriting the Git history is to amend the latest commit.

This article was written in a Git repository. The first commit was to create a README explaining the purpose of the repository.

  $ git add .
  $ git commit -am 'Add README'
  [master (root-commit) 6261ead] Add README
   2 files changed, 12 insertions(+)
   create mode 100644 README.md
   create mode 100644 article.md

Oops, after committing, I realised that I had committed article.md, which was just some notes and the first few sentences of the introduction. I did not intend to commit that file yet, so let’s remove it from the history.

  $ git rm --cached article.md
  rm 'article.md'

The --cached argument to git rm tells Git to stage the removal of the file, but to not actually delete the file from the filesystem. If you also want to delete the file, simply leave that argument out.

We can also make other modifications like we would if we were going to create another commit, such as making edits to the README.md and staging them with git add. Amend the previous commit by passing the --amend flag to git commit:

  $ git commit --amend
  [master 667f8c9] Add README
   1 file changed, 7 insertions(+)
   create mode 100644 README.md

Git will open an editor to allow editing the previous commit message. The Git log now shows that there is still only one commit, and that commit only has README.md.

  $ git log --oneline --stat
  667f8c9 Add README
   README.md | 7 +++++++
   1 file changed, 7 insertions(+)
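
The amend workflow above can be tried safely in a scratch repository. The following is a minimal sketch, not the commands used for this article: the temporary directory, the author identity and the file contents are all made up for illustration, and --no-edit is used so that no editor opens.

```shell
#!/bin/sh
# Sketch: amend the latest commit in a throwaway repository.
set -e

repo=$(mktemp -d)                            # scratch location, safe to delete
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

echo "# Purpose" > README.md
echo "rough notes" > article.md
git add .
git commit -q -m "Add README"                # oops: article.md went in too

# Stage the removal of article.md (the file stays on disk), then fold
# that change into the existing commit instead of creating a second one.
git rm -q --cached article.md
git commit -q --amend --no-edit

commit_count=$(git rev-list --count HEAD)    # number of commits on the branch
tracked_files=$(git ls-files)                # files recorded in the commit
echo "commits: $commit_count"
echo "tracked: $tracked_files"
```

The amended commit gets a new hash, which is why amending is only safe before the commit has been shared.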

Let’s commit this article now that progress has been made.

Undo recent history

Sometimes, a commit has so many mistakes that it’s easier to just undo it. Maybe it was committed to the wrong branch, or a directory of unwanted files got accidentally added.

  $ git reset HEAD^

This tells Git to move the branch back to the previous commit, but to keep the changes introduced by the undone commit in the working directory. git reset is powerful and can be destructive if used improperly. It’s worth reading more about it on git-scm.com.

The log now shows that the latest commit is gone, but article.md is still modified.

  $ git log --oneline
  667f8c9 Add README

  $ git status -s
  M  article.md

From here, the changes can be committed on a different branch, stashed, discarded or modified and recommitted.
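
To see what a plain git reset HEAD^ leaves behind, here is a small sketch in a scratch repository. The identity, file names and commit messages are hypothetical. It relies on the default (mixed) mode of git reset, which moves the branch back but leaves the edits in the working tree, unstaged.

```shell
#!/bin/sh
# Sketch: undo the latest commit while keeping its changes on disk.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

echo "# Purpose" > README.md
git add README.md
git commit -q -m "Add README"

echo "half-finished idea" >> README.md
git commit -q -am "Commit I want to undo"

git reset -q HEAD^                           # mixed reset: branch moves, files stay

commit_count=$(git rev-list --count HEAD)
status=$(git status -s)                      # short status of the working tree
last_line=$(tail -n 1 README.md)
echo "commits: $commit_count"
echo "status: $status"
```

Nothing is lost here: the edit is still in README.md, ready to be committed elsewhere, stashed or thrown away.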

Maintain a tidy history

If you have used Git with a team, then there is no doubt that you have seen a push get rejected.

  $ git push origin master
  To git@github.com:bkeepers/git-history.git
   ! [rejected]        master -> master (non-fast-forward)
  error: failed to push some refs to 'git@github.com:bkeepers/git-history.git'
  hint: Updates were rejected because the tip of your current branch is behind
  hint: its remote counterpart. Merge the remote changes (e.g. 'git pull')
  hint: before pushing again.
  hint: See the 'Note about fast-forwards' in 'git push --help' for details.

While this message looks big and scary, it’s actually quite helpful. The hints tell us that since we started our work, one of our team members pushed changes and we need to get them, usually by running git pull. The hint also recommends checking out the note about ‘fast-forwards’ in the Git docs. I second that recommendation.

Running git pull will fetch the remote changes and create a new commit that merges them with our local changes. While there is nothing wrong with the merge commit, it adds unnecessary complexity to the revision history.

  $ git log --decorate --graph --oneline
  *   aaf6c0c (HEAD, master) Merge branch 'master' of origin
  |\
  | * 9f7e4de Update README
  * | 00165a8 first draft of amend section
  |/
  * 667f8c9 (origin/reset) Add README

What would make our history clearer and more readable is a way of taking our changes and applying them on top of the remote changes, like so:

  $ git pull --rebase origin master
  First, rewinding head to replay your work on top of it...
  Applying: update README

This makes the revision history appear as if the change was made after a team member made their commit.

  $ git log --decorate --graph --oneline
  * 8dbf5d5 first draft of amend section
  * c408281 update README
  * 667f8c9 Add README

You can see that our Git history is now much cleaner and easier to scan.

Unless a repository is being pushed to multiple remotes, rebasing when pulling is almost always a good idea. I have Git configured to rebase automatically.

  $ git config --global branch.autosetuprebase always

Keeping the revision history tidy may seem superficial, but it helps immensely when managing a large project.
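
The difference a rebasing pull makes can be reproduced locally. This is only a sketch under assumptions: a bare repository in a temporary directory stands in for the shared remote, and the clone names "mine" and "teammate", the identities and the file contents are invented for the example.

```shell
#!/bin/sh
# Sketch: two clones diverge, then one catches up with `git pull --rebase`.
set -e

work=$(mktemp -d)
cd "$work"

# First clone: create the initial commit.
git init -q mine
cd mine
git config user.name "Me"                    # hypothetical identity
git config user.email "me@example.com"
echo "# Purpose" > README.md
git add README.md
git commit -q -m "Add README"
branch=$(git symbolic-ref --short HEAD)      # whatever the default branch is called
cd "$work"

# A bare copy plays the role of the shared remote.
git clone -q --bare mine remote.git
git -C mine remote add origin "$work/remote.git"

# A teammate clones, commits and pushes first.
git clone -q remote.git teammate
cd teammate
git config user.name "Teammate"              # hypothetical identity
git config user.email "teammate@example.com"
echo "More detail" >> README.md
git commit -q -am "Update README"
git push -q origin HEAD
cd "$work/mine"

# Meanwhile I committed locally, so my branch has diverged from the remote.
echo "draft" > article.md
git add article.md
git commit -q -m "first draft of amend section"

# Replay my commit on top of the teammate's work instead of merging.
git pull -q --rebase origin "$branch"

merge_count=$(git rev-list --merges --count HEAD)
commit_count=$(git rev-list --count HEAD)
newest=$(git log -1 --format=%s)
git log --graph --oneline
```

The resulting graph is a single straight line with no merge commit, and the local commit sits on top of the teammate's.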

Clean up recent history

Sometimes it is not clear until after a few mis-steps that there’s a better path. Git’s flexibility makes it easy to create checkpoints along the way, offering a point to return to if things go wrong.

In my daily development, I commit as often as possible. Anytime I think to myself, “OK, that is done, now what?”, I commit. While this leads to a revision history that accurately reflects the order of events, the noise of many tiny commits can actually inhibit the maintainability of large projects. So once I am ready to share my changes with my team, I review my unpublished commits and clean them up.

An interactive rebase allows commits to be edited, squashed together or completely removed from the recent history of a branch.

While reviewing my progress on this article, I discovered a few embarrassing typos. Since the repository had not been shared with anyone yet, I covered my tracks by fixing the typos in the original commit. I preserved my original mistake, so you can follow along by checking out the typos branch of the repository.

First, I created two new commits to fix the typos.

  $ git log --oneline
  7445019 Fix misspelling of amend
  b0377f9 Fix typo in title
  b1cdd72 first draft of pull --rebase
  2fbe35b first draft of reset
  7bb9109 first draft of amend section
  667f8c9 Add README

Take note of the commit that needs to be fixed up. Both typos were introduced in commit 7bb9109, first draft of amend section. Start the rebase at the revision before it:

  $ git rebase -i 7bb9109^

Git will open the editor with the list of commits and a very helpful message.

  pick 7bb9109 first draft of amend section
  pick 2fbe35b first draft of reset
  pick b1cdd72 first draft of pull --rebase
  pick b0377f9 Fix typo in title
  pick 7445019 Fix misspelling of amend

  # Rebase 667f8c9..7445019 onto 667f8c9
  #
  # Commands:
  #  p, pick = use commit
  #  r, reword = use commit, but edit the commit message
  #  e, edit = use commit, but stop for amending
  #  s, squash = use commit, but meld into previous commit
  #  f, fixup = like "squash", but discard this commit's log message
  #  x, exec = run command (the rest of the line) using shell
  #
  # These lines can be re-ordered; they are executed from top to bottom.
  #
  # If you remove a line here THAT COMMIT WILL BE LOST.
  #
  # However, if you remove everything, the rebase will be aborted.
  #
  # Note that empty commits are commented out

As the note explains, commits can be rearranged to change their order, or pick can be changed to one of the other commands.

  pick 7bb9109 first draft of amend section
  fixup b0377f9 Fix typo in title
  fixup 7445019 Fix misspelling of amend
  pick 2fbe35b first draft of reset
  pick b1cdd72 first draft of pull --rebase

I moved the two typo fixes to just after the commit where they were introduced and changed pick to fixup to meld them into the original commit. After saving and closing the editor, Git will apply the changes:

  [detached HEAD 00165a8] first draft of amend section
   1 file changed, 47 insertions(+)
   create mode 100644 article.md
  Successfully rebased and updated refs/heads/master.

The log shows that the typo-fixing commits are now gone. The fixes were applied to the original commit and there is no evidence of my poor spelling (in this branch).

This rebase worked without any other interaction, but occasionally a rebase will require manual fixes for merge conflicts. If that happens, don’t freak out. Simply read the messages. Git will usually help get you out of a bind.
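
The same clean-up can be scripted. The sketch below does not edit the todo list by hand as described above. Instead it uses a variant: git commit --fixup marks a commit as a fix for an earlier one, and git rebase -i --autosquash moves and melds it automatically. Setting GIT_SEQUENCE_EDITOR to true accepts the generated todo list without opening an editor. The repository, identity and file contents are hypothetical.

```shell
#!/bin/sh
# Sketch: meld a typo fix into the commit that introduced the typo.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

echo "# Purpose" > README.md
git add README.md
git commit -q -m "Add README"

echo "How to ammend a commit" > article.md   # note the misspelling
git add article.md
git commit -q -m "first draft of amend section"
target=$(git rev-parse HEAD)                 # the commit that needs fixing

echo "notes on reset" > reset.md
git add reset.md
git commit -q -m "first draft of reset"

# Fix the typo and record the fix as a fixup for the earlier commit.
echo "How to amend a commit" > article.md
git commit -q -a --fixup="$target"

# Rebase from the parent of the target; --autosquash reorders the todo
# list and marks the fixup, and the "editor" simply accepts it.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$target^"

commit_count=$(git rev-list --count HEAD)
fixed_text=$(git show HEAD^:article.md)      # article.md as of the amended commit
git log --oneline
```

After the rebase there are three commits again, and the commit that introduced article.md already contains the corrected spelling.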

Rewrite all of history

All the Git commands we have examined so far are useful for modifying recent commits, but sometimes more extreme measures are necessary, whether it is to remove sensitive or extremely large files, or to simply make a project easier to manage.

git filter-branch supports a handful of custom filters that can rewrite the revision history for a range of commits.

My first legitimate use of git filter-branch was on a large project where the server and the client were both in the same repository. As more people were added to the team, and tensions between the hipsters and neck-beards rose, it became obvious that two repositories would be more appropriate.

A simple solution would’ve been to clone the repository twice, delete the unnecessary files and move the remaining files around. But that leaves two repositories with duplicate histories that take up unnecessary space. Instead, we cloned the repository twice, and used the --subdirectory-filter to create two new repositories that only contained the changes for the relevant parts of the application.

  $ git filter-branch --subdirectory-filter client -- --all

Many people use different email addresses for personal and work projects, which can easily result in commits to a repository using the wrong email address. The --env-filter can modify basic metadata about a commit, such as author information or the commit date.

  $ git filter-branch --env-filter '
   if [ "$GIT_AUTHOR_EMAIL" = personal@example.com ];
   then GIT_AUTHOR_EMAIL=work@example.com;
   fi; export GIT_AUTHOR_EMAIL'
  Rewrite f853027b7979756bab7146d3bb34d8829b81a884 (8/8)
  Ref 'refs/heads/master' was rewritten
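
Here is a hedged, self-contained version of the email rewrite for experimenting. It goes slightly beyond the command above by also updating the committer email, which is an addition of mine. The repository and identity are invented. FILTER_BRANCH_SQUELCH_WARNING is set because recent Git versions print a deprecation notice for filter-branch and pause before running.

```shell
#!/bin/sh
# Sketch: rewrite author and committer email across a scratch repository.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "personal@example.com" # the "wrong" address

echo "# Purpose" > README.md
git add README.md
git commit -q -m "Add README"

export FILTER_BRANCH_SQUELCH_WARNING=1       # skip the deprecation notice

# The filter runs once per commit with the commit's metadata in the
# environment; whatever is exported is used for the rewritten commit.
git filter-branch -f --env-filter '
  if [ "$GIT_AUTHOR_EMAIL" = "personal@example.com" ]; then
    GIT_AUTHOR_EMAIL="work@example.com"
  fi
  if [ "$GIT_COMMITTER_EMAIL" = "personal@example.com" ]; then
    GIT_COMMITTER_EMAIL="work@example.com"
  fi
  export GIT_AUTHOR_EMAIL GIT_COMMITTER_EMAIL
' -- --all > /dev/null

author_email=$(git log -1 --format=%ae)
committer_email=$(git log -1 --format=%ce)
echo "author: $author_email, committer: $committer_email"
```

Quoting the variable inside the test guards against empty or unusual values in the commit metadata.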

Suppose that early on in a project, someone committed some extremely large assets, and now everyone that clones the repository has to wait for those assets to download. Or maybe you are open-sourcing a project that has some sensitive data stored in it.

  $ git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch docs/designs' \
    --prune-empty --tag-name-filter cat -- --all
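
To check that such a filter really removes the directory from every commit, the sketch below builds a tiny repository and inspects the rewritten branch. The file names and contents are stand-ins. Note that filter-branch keeps a backup of the old refs under refs/original/, so the check looks at HEAD only.

```shell
#!/bin/sh
# Sketch: remove docs/designs from all commits of a scratch repository.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

mkdir -p docs/designs
echo "# Purpose" > README.md
echo "stand-in for a huge asset" > docs/designs/mockup.psd
git add .
git commit -q -m "Add README and designs"

echo "More detail" >> README.md
git commit -q -am "Update README"

export FILTER_BRANCH_SQUELCH_WARNING=1       # skip the deprecation notice

# --index-filter rewrites each commit's index without checking files out;
# --ignore-unmatch keeps git rm from failing on commits without the path.
git filter-branch -f --index-filter \
  'git rm -r -q --cached --ignore-unmatch docs/designs' \
  --prune-empty --tag-name-filter cat -- --all > /dev/null

# Count objects reachable from HEAD whose path mentions the directory.
remaining=$(git rev-list --objects HEAD | grep -c 'docs/designs' || true)
commit_count=$(git rev-list --count HEAD)
echo "objects under docs/designs: $remaining"
```

The old objects still exist locally through the backup refs until those are removed and the repository is garbage-collected, so the repository only shrinks after that.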

All of these filters rewrite the full history of a repository, essentially making it a new repository. Pushing to the same remote that was used originally will be rejected.

  $ git push
   ! [rejected]        master -> master (non-fast-forward)

It is possible to force Git to push all changes to an existing remote, but remember that this could have adverse effects for everyone else working on the project.

  $ git push --force --all
  $ git push --force --tags
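
The rejection and the forced push can be observed without touching a real remote. In this sketch a bare repository in a temporary directory plays the remote, and the identity and commit messages are invented. The history is rewritten with a simple amend rather than filter-branch to keep the example short.

```shell
#!/bin/sh
# Sketch: a rewritten branch is rejected by a normal push, accepted with --force.
set -e

work=$(mktemp -d)
cd "$work"
git init -q local
cd local
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

echo "# Purpose" > README.md
git add README.md
git commit -q -m "Add README"
branch=$(git symbolic-ref --short HEAD)      # whatever the default branch is called

git clone -q --bare . "$work/remote.git"     # stand-in for the shared remote
git remote add origin "$work/remote.git"

# Rewrite the commit the remote already has.
git commit -q --amend -m "Add README (rewritten)"

# A normal push is refused because it is not a fast-forward.
if git push -q origin "$branch" 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi

# Forcing replaces the remote branch with the rewritten one.
# (git push also offers --force-with-lease as a more cautious option.)
git push -q --force origin "$branch"

remote_subject=$(git --git-dir="$work/remote.git" log -1 --format=%s "$branch")
echo "plain push: $plain_push"
echo "remote now has: $remote_subject"
```

Anyone who cloned before the forced push still has the old commits, which is exactly the adverse effect to warn the team about.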

Power and flexibility

Git’s powerful features, extreme flexibility and often unintuitive command line may seem overwhelming, but taking time to learn and experiment is a worthwhile investment. When in doubt, pass --help to any Git command to learn more. Understanding how and when to rewrite the revision history will give you complete control over your projects and make them easier to manage.

Image used courtesy of JohnGoode under Creative Commons Licensing