Word-by-word diffs in Git

I personally like all my LaTeX files wrapped at about 80 columns, with rare exceptions. Of course, I track them all with Git. I use Vim, so it’s really easy to rewrap a paragraph whenever I add or delete content from it. The sad thing is, it totally screws up my diffs!

Have you ever tried looking at the history for any page in Wikipedia? It’s beautiful, intuitive, and word-centered. Forget about lines, they’re not suitable for text. Words, however, are.

Example of the word-by-word diff in Wikipedia


Well, version control systems use line-by-line diffs, which don’t really help when comparing two versions of a text file. What to do now?

I googled a bit and found out about wdiff. It seemed interesting, but the output was really ugly. I first thought of using it as an external diff tool in Git, but gave up. Nice to know it exists, though.

Then, for some reason, I decided to read the git-diff help one more time and, to my surprise, the feature we want is already there! Introducing to you… git diff --color-words!


Sample output of git diff --color-words

Amazing, right? Notice how deleted words are in red, added words in green, and the unchanged text surrounding them in white (or black, if you’re not into dark backgrounds). Much more natural, and differences are much easier to spot than in the original git diff! But you may be asking yourself, “Am I going to have to write this long command every time?” Certainly not! Git allows you to create aliases, which I just learned about. You can create an alias named wdiff, and the result is a new command, git wdiff, that perfectly fits our needs! Great, isn’t it?

Here are two ways to do it:

  • Via command-line: git config --global alias.wdiff "diff --color-words"; or
  • Editing your ~/.gitconfig, add the following lines:
    [alias]
        wdiff = diff --color-words
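
To try the alias without touching your real configuration, here is a minimal sketch that sets it up in a throwaway repository. The file name, the sample text and the identity settings are all made up for illustration, and the alias is set per repository here rather than with --global.

```shell
#!/bin/sh
# Sketch: try the wdiff alias in a scratch repository.
# text.tex and its contents are hypothetical examples.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Repository-local alias, so the global config is left alone.
# Note the quotes: the alias value must be a single argument.
git config alias.wdiff "diff --color-words"

printf 'The quick brown fox jumps\nover the lazy dog.\n' > text.tex
git add text.tex
git commit -q -m "Add text"

# Change one word and rewrap the paragraph.
printf 'The quick red fox\njumps over the lazy dog.\n' > text.tex

# Show the change word by word instead of line by line.
git --no-pager wdiff
```

A plain git diff on the same change marks both lines as changed because of the rewrap, which is exactly the noise the word-based view is meant to avoid.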

Piece of cake, right? Hope you liked it, and happy word-diffing! 8-)

Special thanks to my friend Carlos Eduardo, who helped me find a good way to illustrate the word-by-word diff!

Update: Better tokenization for LaTeX files

It seems that Git 1.6.2 and later support the use of a regular expression to define what counts as a word for --color-words. Also, the manpage for the .gitattributes file shows that some file types already have useful built-in configurations, including the regular expressions in question, and LaTeX is one of those formats.

In practical terms, this means that \section{Title} can now be seen as several tokens instead of a single word, so renaming the section gives you smarter highlighting.

So, if you don’t have a .gitattributes file yet, you should create one. I just found out about them, and they’re pretty cool. They add other interesting features, such as hints in the diffs showing where in the file (for instance, under which LaTeX chapter or (sub)section name) the changes take place. The manpages can tell you more about it, so I’ll keep it short. Enabling this is very easy: just add the following line to a .gitattributes file in your repository (its syntax is very similar to .gitignore) and everything is set for you:

*.tex    diff=tex

Even if your version is older than 1.6.2, the other benefits are most likely available to you and are well worth it, so give it a try anyway!
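
To see the difference for yourself, here is a small sketch that enables the tex driver in a scratch repository and renames a section. The file names and contents are invented for the example. The last command passes a word regex by hand through --color-words=<regex>; as far as I can tell, that pattern follows the LaTeX example given in the gitattributes documentation, so treat it as a starting point rather than the exact built-in one.

```shell
#!/bin/sh
# Sketch: enable the built-in "tex" diff driver in a scratch repository.
# paper.tex and its contents are hypothetical examples.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

echo '*.tex    diff=tex' > .gitattributes
printf '\\section{Intro}\nSome text.\n' > paper.tex
git add .gitattributes paper.tex
git commit -q -m "Add paper"

# Rename the section.
printf '\\section{Introduction}\nSome text.\n' > paper.tex

# Confirm that the attribute applies to the file.
git check-attr diff paper.tex

# Word diff using whatever word regex the tex driver provides.
git --no-pager diff --color-words

# The same, with a word regex given explicitly on the command line.
git --no-pager diff --color-words='\\[a-zA-Z]+|[{}]|\\.|[^\\{}[:space:]]+'
```

With either form, only the section title should be highlighted as changed, while \section and the braces are treated as separate, unchanged tokens.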


10 Responses to “Word-by-word diffs in Git”

  1. Leho Kraav Says:

    holy schmokes, this works wonder for comparing gentoo USE flag changes in make.conf. cheers for the writeup mate (y)

  2. Version control for Gentoo make.conf, USE flags — Leho Kraav 24/7 Says:

    […] Obviously line based diffs don’t do us much good here, since these lines can shift around arbitrarily due to a single use flag change. Enter wdiff which lets us compare USE flags one by one. Eduardo has written a nice piece how to make git wdiff alias. […]

  3. Leo Says:

    What I have always been looking for is for a VCS to **merge** word by word.

    You may have coauthors that wrap the words differently and you don’t want a huge spurious conflict every time you both edit the same file.

    Does git do that too?

    • Eduardo Says:

      Hi Leo.

      As far as I can tell, git is 100% line-based. However, it does seem to be possible to write your own merge driver or to use a merge tool to aid in resolving conflicts (check the manpages for gitattributes and git-mergetool). It should be possible to use an adequate tool that handles the differences word-by-word at this point. Sadly, to keep the repository clean, you’d need all authors with this setup fully configured, otherwise they may still commit “useless” changes.

      If you happen to stumble upon a solution, please share :)

    • cirosantilli Says:

      I’d love to see this too…

  4. Wang Says:

    This is wonderful. I also use git+vim+latex, also like to wrap lines at about 80 characters (I use 78). Thanks for sharing!

    I do have a problem. I use green text on black background. The green color for added text does not show. I guess I will take some time to find out how to change it but if you happen to know, please share.

    • Eduardo Says:

      Hi Wang,

      Colors used by git are configurable using git config or directly editing your ~/.gitconfig file. Take a look at the documentation, as I do not recall the exact options.

      Hope it helps!

  5. Coloring differences at a word-level using gitk at avp::ptr weblog Says:

    […] Please find further interesting thoughts in the discussion of András Salamon on StackExchange and Eduardo’s post. Also Iain Murray’s cwdiff wrapper might be worth taking a closer […]

  6. plaindocs Says:

    Cheers for that. I think you need quotes around the command line version though. Something like
    git config --global alias.wdiff "diff --color-words"
    (untested)

