* Trouble using --word-diff results
@ 2010-09-17  7:19 Wincent Colaiuta
From: Wincent Colaiuta @ 2010-09-17  7:19 UTC
  To: Git List

Hi,

I've been working with the output of "git diff --word-diff" and am seeing some unexpected results. At first I thought they might be a bug, but now I am beginning to wonder whether --word-diff is actually useful at all for the purposes of scripting.

I noticed this specifically with the --word-diff=porcelain format, which I thought would be helpful for producing colorized diff output like that which you can see here:

  http://github.com/git/git/commit/c2e0940b44ded03f0af02be95c35b231fea633c1
  http://qt.gitorious.org/+qt-developers/qt/staging/commit/45851a64ead74748d6b5045066545ee2c95d83f6

i.e., normal unified diff style, but with darker background coloring within each added or removed line to draw attention to the specific parts of the line that were modified.

So, I tried to achieve this using --word-diff=porcelain format and got some unexpected results. But the problem can be demonstrated using the standard --word-diff=plain format as well, so I'll use that here seeing as it is a little easier to read.

Given a normal diff like this:

  -a, b, c, d
  +b, a, c, d

With --word-diff we get:

  [-a,-]b, {+a,+} c, d

Note how there is no whitespace between the removed "a" and the "b", and in the "pre" version of the file (where there is no "a" between the "b" and the "c") there will effectively be too much whitespace. Reconstructing the above in order to give us a colorized diff will yield:

  -a,b,  c, d
  +b, a, c, d

Using HTML-like tags to show where the color tags would be inserted:

  -<red>a,</red>b,  c, d
  +b, <green>a,</green> c, d

So we get the desired coloring, but our whitespace is wrong in the "pre" line and right in the "post" line. The problem is that information about the whitespace is lost and can't be reconstructed from the output. This kind of whitespace damage doesn't happen in all cases, but in this particular example of moving something within a line, the damage occurs.
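
The reconstruction above can be sketched in a few lines of Python. This is only an illustration, not what any real tool does: the `reconstruct` helper and the `<red>`/`<green>` tags are made-up names, and the regex assumes the changed text never itself contains the `[-`, `-]`, `{+` or `+}` markers, which the plain format cannot guarantee.

```python
import re

# Markers used by --word-diff=plain: [-removed-] and {+added+}.
MARKER = re.compile(r"\[-(.*?)-\]|\{\+(.*?)\+\}")

def reconstruct(line, tag=False):
    """Rebuild the "pre" and "post" lines from one --word-diff=plain line."""
    pre, post = [], []
    pos = 0
    for m in MARKER.finditer(line):
        # Text between markers is common to both versions.
        common = line[pos:m.start()]
        pre.append(common)
        post.append(common)
        removed, added = m.group(1), m.group(2)
        if removed is not None:
            pre.append(f"<red>{removed}</red>" if tag else removed)
        else:
            post.append(f"<green>{added}</green>" if tag else added)
        pos = m.end()
    tail = line[pos:]
    pre.append(tail)
    post.append(tail)
    return "-" + "".join(pre), "+" + "".join(post)

pre, post = reconstruct("[-a,-]b, {+a,+} c, d")
print(pre)   # -a,b,  c, d   (whitespace damaged)
print(post)  # +b, a, c, d   (whitespace intact)
```

Nothing in the input says where the space that originally followed the removed "a," went, so no amount of cleverness in the script can put it back.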

Like I said, at first I thought this was a bug, but on reading the "git diff" man page I see:

  "Every non-overlapping match of the <regex> is considered a word. Anything between these matches is considered whitespace and ignored(!) for the purposes of finding differences. You may want to append |[^[:space:]] to your regular expression to make sure that it matches all non-whitespace characters. A match that contains a newline is silently truncated(!) at the newline."
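
To make the quoted behaviour concrete, here is a rough Python imitation of that word-splitting rule. It is a sketch under assumptions: Git's own implementation is not Python, the `words` helper is hypothetical, and `\S+` stands in for "runs of non-whitespace" because Python's `re` module has no `[[:space:]]` class.

```python
import re

def words(line, regex=r"\S+"):
    """Return every non-overlapping match of regex; the gaps are discarded."""
    return [m.group(0) for m in re.finditer(regex, line)]

# Two lines that differ only in the whitespace between words produce the
# same word list, so a diff computed on words cannot tell them apart.
print(words("a, b, c, d"))
print(words("a,  b,\tc, d"))

# A narrower regex silently drops the commas as well; appending an
# "any single non-whitespace character" alternative, in the spirit of the
# man page's |[^[:space:]] advice, picks them up again.
print(words("a, b", r"[a-z]+"))
print(words("a, b", r"[a-z]+|\S"))
```

The first two calls return identical lists, which is the information loss in question.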

It seems from this that some whitespace information loss is expected. So now I'm wondering if --word-diff, and particularly --word-diff=porcelain, is actually useful for consumption by a script.

If the output were:

  [-a, -]b, {+a, +}c, d

Then I could colorize like this with no whitespace damage:

  -<red>a, </red>b, c, d
  +b, <green>a, </green>c, d

And I could optionally add a post-processing pass in my script to massage the exact positioning of those color tags to not highlight those trailing spaces:

  -<red>a,</red> b, c, d
  +b, <green>a,</green> c, d
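
That post-processing pass could be as small as one regex substitution. A minimal sketch in Python, assuming the illustrative `<red>`/`<green>` tags used in this message (the `tighten` name is made up too):

```python
import re

# Whitespace sitting immediately before a closing color tag.
TRAILING = re.compile(r"(\s+)(</(?:red|green)>)")

def tighten(line):
    """Move trailing whitespace from inside a color tag to just after it."""
    return TRAILING.sub(r"\2\1", line)

print(tighten("-<red>a, </red>b, c, d"))      # -<red>a,</red> b, c, d
print(tighten("+b, <green>a, </green>c, d"))  # +b, <green>a,</green> c, d
```

A highlighted run consisting only of whitespace would be left as an empty tag pair by this pass, so a real script would probably want to drop those.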

Is what I'm talking about here possible using "--word-diff=porcelain"? For now I am working with normal diffs and rolling my own intra-line colorization from scratch.
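
For illustration, one way such a roll-your-own approach might look in Python, using the standard difflib module on a -/+ line pair. This is a sketch rather than the script I am actually using: the `tokens` and `colorize` names and the `<red>`/`<green>` tags are placeholders. The key choice is that whitespace runs are kept as tokens, so nothing is thrown away.

```python
import difflib
import re

def tokens(line):
    # Alternating runs of whitespace and non-whitespace; joining them
    # back together reproduces the line exactly.
    return re.findall(r"\s+|\S+", line)

def colorize(old, new):
    """Tag the changed parts of a removed/added line pair."""
    a, b = tokens(old), tokens(new)
    pre, post = [], []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        old_run, new_run = "".join(a[i1:i2]), "".join(b[j1:j2])
        if op == "equal":
            pre.append(old_run)
            post.append(new_run)
        else:
            # "replace", "delete" and "insert" all end up here.
            if old_run:
                pre.append(f"<red>{old_run}</red>")
            if new_run:
                post.append(f"<green>{new_run}</green>")
    return "-" + "".join(pre), "+" + "".join(post)

pre, post = colorize("a, b, c, d", "b, a, c, d")
print(pre)
print(post)
```

Stripping the tags from either result gives back the original line byte for byte. Which of the moved words ends up highlighted depends on the matcher, so the highlighted spans may not be the ones a human would pick.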

Cheers,
Wincent
