[PATCH] diff compaction heuristic: favor shortest neighboring blank lines

* [PATCH] diff compaction heuristic: favor shortest neighboring blank lines
@ 2016-06-16 17:46 Stefan Beller
  2016-06-16 20:27 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Stefan Beller @ 2016-06-16 17:46 UTC (permalink / raw)
  To: peff; +Cc: git, jacob.keller, mhagger, Stefan Beller

The compaction heuristic for diffs was deemed quite good, but in 5580b27
we have an example demonstrates a common case, that fails with the existing
heuristic. That is why we disabled the heuristic in the v2.9.0 release.

With this patch a diff looks like:

         def bar
                 do_bar_stuff()

                 common_ending()
         end

+        def bal
+                do_bal_stuff()
+
+                common_ending()
+        end
+
         def baz
                 do_baz_stuff()

                 common_ending()
         end

whereas before we had:

  WIP (TODO: ask peff to provide an example that actually triggers, I seem to be
       unable to write one, though I thought the above was one)


The way we do it, is by inspecting the neighboring lines and see how
much indent there is and we choose the line that has the shortest amount
of blanks in the neighboring lines.

(TODO: update comments in the file)

Signed-off-by: Stefan Beller <sbeller@google.com>
---

Hi Jeff, hi Michael,

thanks for pointing out the flaw in this ruby code base before the 2.9 release.
I think this patch would fix your finding, though I cannot reproduce it.
Can you provide an example repo/patch that looks ugly on current origin/master,
so I can be sure this fixes the issue?

This can also be a start of a discussion if this is a too short-sighted enhancement
of the heuristic. Essentially we'd want to prefer 

+        end
+
+        def bal

over:

+                do_bal_stuff()
+
+                common_ending()

because there is less space between line start and {end, def bal}
than for {do_bal_stuff, common_ending}.

Thanks,
Stefan

 xdiff/xdiffi.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index b3c6848..58adc74 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -405,6 +405,35 @@ static int is_blank_line(xrecord_t **recs, long ix, long flags)
 	return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags);
 }
 
+static unsigned int leading_blank(const char *line)
+{
+	unsigned int ret = 0;
+	while (*line) {
+		if (*line == '\t')
+			ret += 8;
+		else if (*line == ' ')
+			ret ++;
+		else
+			break;
+		line++;
+	}
+	return ret;
+}
+
+static unsigned int surrounding_leading_blank(xrecord_t **recs, long ix,
+		long flags, long nrec)
+{
+	unsigned int i, ret = UINT_MAX;
+	if (ix > 0)
+		ret = leading_blank(recs[ix - 1]->ptr);
+	if (ix < nrec - 1) {
+		i = leading_blank(recs[ix + 1]->ptr);
+		if (i < ret)
+			ret = i;
+	}
+	return ret;
+}
+
 static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 {
 	return (recs[ixs]->ha == recs[ix]->ha &&
@@ -416,7 +445,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
-	unsigned int blank_lines;
+	unsigned int blank_lines, min_bl_neigh_indent;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -451,6 +480,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 		do {
 			grpsiz = ix - ixs;
 			blank_lines = 0;
+			min_bl_neigh_indent = UINT_MAX;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -485,7 +515,13 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the group.
 			 */
 			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
-				blank_lines += is_blank_line(recs, ix, flags);
+				if (is_blank_line(recs, ix, flags)) {
+					unsigned int bl_neigh_indent =
+						surrounding_leading_blank(recs, ix, flags, nrec);
+					if (min_bl_neigh_indent > bl_neigh_indent)
+						min_bl_neigh_indent = min_bl_neigh_indent;
+					blank_lines++;
+				}
 
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
@@ -525,6 +561,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 		if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
 			while (ixs > 0 &&
 			       !is_blank_line(recs, ix - 1, flags) &&
+			       surrounding_leading_blank(recs, ix - 1, flags, nrec) > min_bl_neigh_indent &&
 			       recs_match(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
-- 
2.9.0.8.gb6cbe66


^ permalink raw reply related	[flat|nested] 15+ messages in thread