All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2 v4] xdiff: implement empty line chunk heuristic
@ 2016-04-18 21:12 Stefan Beller
  2016-04-18 21:12 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-18 21:12 UTC (permalink / raw)
  To: gitster; +Cc: git, jacob.keller, Stefan Beller

> OK, so perhaps either of you two can do a final version people can
> start having fun with?

Here we go. I squashed in your patch, although with a minor change:

-               if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) {
+               if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {

We did not need that in the "shortest line" heuristic as we know
a line with the shortest line length must exist. We do not know about
empty lines though.

Thanks,
Stefan

Jacob Keller (1):
  xdiff: add recs_match helper function

Stefan Beller (1):
  xdiff: implement empty line chunk heuristic

 Documentation/diff-config.txt  |  5 +++++
 Documentation/diff-options.txt |  6 ++++++
 diff.c                         | 11 +++++++++++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 40 ++++++++++++++++++++++++++++++++++++----
 5 files changed, 60 insertions(+), 4 deletions(-)

-- 
2.8.0.26.gba39a1b.dirty

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 1/2] xdiff: add recs_match helper function
  2016-04-18 21:12 [PATCH 0/2 v4] xdiff: implement empty line chunk heuristic Stefan Beller
@ 2016-04-18 21:12 ` Stefan Beller
  2016-04-18 21:12 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller
  2016-04-18 21:22 ` [PATCH 0/2 v4] " Junio C Hamano
  2 siblings, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-18 21:12 UTC (permalink / raw)
  To: gitster; +Cc: git, jacob.keller, Jacob Keller, Stefan Beller

From: Jacob Keller <jacob.keller@gmail.com>

It is a common pattern in xdl_change_compact to check that hashes and
strings match. The resulting code to perform this change causes very
long lines and makes it hard to follow the intention. Introduce a helper
function recs_match which performs both checks to increase
code readability.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 xdiff/xdiffi.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 2358a2d..748eeb9 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
+static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
+{
+	return (recs[ixs]->ha == recs[ix]->ha &&
+		xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
+			     recs[ix]->ptr, recs[ix]->size,
+			     flags));
+}
+
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
@@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the last line of the current change group, shift backward
 			 * the group.
 			 */
-			while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
-			       xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
+			while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
 
@@ -470,8 +477,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the line next of the current change group, shift forward
 			 * the group.
 			 */
-			while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
-			       xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {
+			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
 
-- 
2.8.0.26.gba39a1b.dirty

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-18 21:12 [PATCH 0/2 v4] xdiff: implement empty line chunk heuristic Stefan Beller
  2016-04-18 21:12 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller
@ 2016-04-18 21:12 ` Stefan Beller
  2016-04-18 22:04   ` Jacob Keller
  2016-04-19  5:03   ` Jeff King
  2016-04-18 21:22 ` [PATCH 0/2 v4] " Junio C Hamano
  2 siblings, 2 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-18 21:12 UTC (permalink / raw)
  To: gitster; +Cc: git, jacob.keller, Stefan Beller, Jacob Keller

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, it can be disabled via diff.compactionHeuristic.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-config.txt  |  5 +++++
 Documentation/diff-options.txt |  6 ++++++
 diff.c                         | 11 +++++++++++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 26 ++++++++++++++++++++++++++
 5 files changed, 50 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba565..a9f4b57 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,11 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.compactionHeuristic::
+	Set this option to enable an experimental heuristic that
+	shifts the hunk boundary in an attempt to make the resulting
+	patch easier to read.
+
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e..0993742 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--compaction-heuristic::
+--no-compaction-heuristic::
+	These are to help debugging and tuning an experimental
+	heuristic that shifts the hunk boundary in an attempt to
+	make the resulting patch easier to read.
+
 --minimal::
 	Spend extra time to make sure the smallest possible
 	diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe660..d3734d3 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_compaction_heuristic = 1;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_detect_rename_default = git_config_rename(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "diff.compactionheuristic")) {
+		diff_compaction_heuristic = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.autorefreshindex")) {
 		diff_auto_refresh_index = git_config_bool(var, value);
 		return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
 	options->use_color = diff_use_color_default;
 	options->detect_rename = diff_detect_rename_default;
 	options->xdl_opts |= diff_algorithm;
+	if (diff_compaction_heuristic)
+		DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
 
 	options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--compaction-heuristic"))
+		DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
+	else if (!strcmp(arg, "--no-compaction-heuristic"))
+		DIFF_XDL_CLR(options, COMPACTION_HEURISTIC);
 	else if (!strcmp(arg, "--patience"))
 		options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
 	else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79..7423f77 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6 +41,8 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
+#define XDF_COMPACTION_HEURISTIC (1 << 8)
+
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 748eeb9..5a02b15 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,11 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
+static int is_blank_line(xrecord_t **recs, long ix, long flags)
+{
+	return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags);
+}
+
 static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 {
 	return (recs[ixs]->ha == recs[ix]->ha &&
@@ -411,6 +416,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
+	unsigned int blank_lines;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -444,6 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 
 		do {
 			grpsiz = ix - ixs;
+			blank_lines = 0;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -478,6 +485,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the group.
 			 */
 			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
+				blank_lines += is_blank_line(recs, ix, flags);
+
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
 
@@ -504,6 +513,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			rchg[--ix] = 0;
 			while (rchgo[--ixo]);
 		}
+
+		/*
+		 * If a group can be moved back and forth, see if there is an
+		 * blank line in the moving space. If there is a blank line,
+		 * make sure the last blank line is the end of the group.
+		 *
+		 * As we shifted the group forward as far as possible, we only
+		 * need to shift it back if at all.
+		 */
+		if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
+			while (ixs > 0 &&
+			       !is_blank_line(recs, ix - 1, flags) &&
+			       recs_match(recs, ixs - 1, ix - 1, flags)) {
+				rchg[--ixs] = 1;
+				rchg[--ix] = 0;
+			}
+		}
 	}
 
 	return 0;
-- 
2.8.0.26.gba39a1b.dirty

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/2 v4] xdiff: implement empty line chunk heuristic
  2016-04-18 21:12 [PATCH 0/2 v4] xdiff: implement empty line chunk heuristic Stefan Beller
  2016-04-18 21:12 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller
  2016-04-18 21:12 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller
@ 2016-04-18 21:22 ` Junio C Hamano
  2016-04-18 23:53   ` Stefan Beller
  2 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-04-18 21:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git, jacob.keller

Stefan Beller <sbeller@google.com> writes:

>> OK, so perhaps either of you two can do a final version people can
>> start having fun with?
>
> Here we go. I squashed in your patch, although with a minor change:
>
> -               if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) {
> +               if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
>
> We did not need that in the "shortest line" heuristic as we know
> a line with the shortest line length must exist. We do not know about
> empty lines though.

Makes sense.  The last hunk of

$ git show 9614b8dcf -- update-cache.c

gives an unexpected result without "&& blank_lines" above.  Lack of
"&& blank_lines" happens to make the result slightly easier to read,
but at the cost of having an extra line in the hunk.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-18 21:12 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller
@ 2016-04-18 22:04   ` Jacob Keller
  2016-04-18 22:24     ` Junio C Hamano
  2016-04-19  5:03   ` Jeff King
  1 sibling, 1 reply; 45+ messages in thread
From: Jacob Keller @ 2016-04-18 22:04 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Git mailing list, Jacob Keller

On Mon, Apr 18, 2016 at 2:12 PM, Stefan Beller <sbeller@google.com> wrote:
> In order to produce the smallest possible diff and combine several diff
> hunks together, we implement a heuristic from GNU Diff which moves diff
> hunks forward as far as possible when we find common context above and
> below a diff hunk. This sometimes produces less readable diffs when
> writing C, Shell, or other programming languages, ie:
>
> ...
>  /*
> + *
> + *
> + */
> +
> +/*
> ...
>
> instead of the more readable equivalent of
>
> ...
> +/*
> + *
> + *
> + */
> +
>  /*
> ...
>
> Implement the following heuristic to (optionally) produce the desired
> output.
>
>   If there are diff chunks which can be shifted around, shift each hunk
>   such that the last common empty line is below the chunk with the rest
>   of the context above.
>
> This heuristic appears to resolve the above example and several other
> common issues without producing significantly weird results. However, as
> with any heuristic it is not really known whether this will always be
> more optimal. Thus, it can be disabled via diff.compactionHeuristic.
>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---

Thanks Stephan and Junio, this looks pretty good. I think before it's
merged we'd probably want to implement some sort of attributes which
allows per-path configuration, incase it needs to be configured at
all.

I've got it applied to my local git, and I'm going to try to run a
diff between enabled vs disabled on a large section of the Linux
kernel history and a few other projects to see if I spot anything odd.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-18 22:04   ` Jacob Keller
@ 2016-04-18 22:24     ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-04-18 22:24 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Stefan Beller, Git mailing list, Jacob Keller

Jacob Keller <jacob.keller@gmail.com> writes:

> Thanks Stephan and Junio, this looks pretty good. I think before it's
> merged we'd probably want to implement some sort of attributes which
> allows per-path configuration, incase it needs to be configured at
> all.

My take on it is that we'd want to make sure that the shift with
blank line heuristics is "good enough", i.e. there is no need for
end-user configuration or attributes, and then remove the tentative
option, configuration and its documentation, before this is merged.

If we really want to add knobs to handle different kind of payloads
in vastly different way, the right place to do so is to add a set of
bits "use this and that heuristics" to userdiff driver, I would say,
but in the compaction codepath it does not seem to be enough room to
have that many knobs to be tweaked in the first place to me.

> I've got it applied to my local git, and I'm going to try to run a
> diff between enabled vs disabled on a large section of the Linux
> kernel history and a few other projects to see if I spot anything odd.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 0/2 v4] xdiff: implement empty line chunk heuristic
  2016-04-18 21:22 ` [PATCH 0/2 v4] " Junio C Hamano
@ 2016-04-18 23:53   ` Stefan Beller
  0 siblings, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-18 23:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jacob Keller

On Mon, Apr 18, 2016 at 2:22 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>>> OK, so perhaps either of you two can do a final version people can
>>> start having fun with?
>>
>> Here we go. I squashed in your patch, although with a minor change:
>>
>> -               if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) {
>> +               if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
>>
>> We did not need that in the "shortest line" heuristic as we know
>> a line with the shortest line length must exist. We do not know about
>> empty lines though.
>
> Makes sense.  The last hunk of
>
> $ git show 9614b8dcf -- update-cache.c
>
> gives an unexpected result without "&& blank_lines" above.  Lack of
> "&& blank_lines" happens to make the result slightly easier to read,
> but at the cost of having an extra line in the hunk.

So without the blank_lines check you get  (A):
    @@ -271,15 +279,14 @@ int main(int argc, char **argv)
                     if (!verify_path(path)) {
                             fprintf(stderr, "Ignoring path %s\n", argv[i]);
                             continue;
    -                }
    -                if (add_file_to_cache(path)) {
    -                        fprintf(stderr, "Unable to add %s to
database\n", path);
    -                        goto out;
                     }
    +                if (add_file_to_cache(path))
    +                        usage("Unable to add %s to database", path);
             }
    ...

and with the heuristic you get (B):

@@ -272,14 +280,13 @@ int main(int argc, char **argv)
    @@ -272,14 +280,13 @@ int main(int argc, char **argv)
                             fprintf(stderr, "Ignoring path %s\n", argv[i]);
                             continue;
                     }
    -                if (add_file_to_cache(path)) {
    -                        fprintf(stderr, "Unable to add %s to
database\n", path);
    -                        goto out;
    -                }
    +                if (add_file_to_cache(path))
    +                        usage("Unable to add %s to database", path);
             }
    ...

In case of (A) the compaction heuristic tries to shift the hunk upwards,
stopping at the first empty line or when lines miss match.
As there is no blank line, it goes until the miss match.

Personally I'd find it less readable, because the intent was not to remove

    -                }
    -                if (add_file_to_cache(path)) {
    -                        fprintf(stderr, "Unable to add %s to
database\n", path);
    -                        goto out;

but rather remove

    -                if (add_file_to_cache(path)) {
    -                        fprintf(stderr, "Unable to add %s to
database\n", path);
    -                        goto out;
    -                }

as that is the logic unit I'd think.

Although you find this instance easier to read the behavior without the
blank_lines check would result in

    Shift hunk upward as much as possible, stop at the first empty line.

For hunks without empty line this just becomes

    Shift hunk upward as much as possible.

which is 50:50 for looking good, so we kept the old behavior as
that is just as good.

Thanks,
Stefan


>
> Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-18 21:12 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller
  2016-04-18 22:04   ` Jacob Keller
@ 2016-04-19  5:03   ` Jeff King
  2016-04-19  6:47     ` Stefan Beller
                       ` (2 more replies)
  1 sibling, 3 replies; 45+ messages in thread
From: Jeff King @ 2016-04-19  5:03 UTC (permalink / raw)
  To: Stefan Beller; +Cc: gitster, git, jacob.keller, Jacob Keller

On Mon, Apr 18, 2016 at 02:12:30PM -0700, Stefan Beller wrote:

> +
> +		/*
> +		 * If a group can be moved back and forth, see if there is an
> +		 * blank line in the moving space. If there is a blank line,
> +		 * make sure the last blank line is the end of the group.

s/an/a/ on the first line

> +		 * As we shifted the group forward as far as possible, we only
> +		 * need to shift it back if at all.

Maybe because I'm reading it as a diff that only contains this hunk and
not the whole rest of the function, but the "we" here confused me. You
mean the earlier, existing loop in xdl_change_compact, right?

Maybe something like:

  As we already shifted the group forward as far as possible in the
  earlier loop...

would help.

> +		if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
> +			while (ixs > 0 &&
> +			       !is_blank_line(recs, ix - 1, flags) &&
> +			       recs_match(recs, ixs - 1, ix - 1, flags)) {
> +				rchg[--ixs] = 1;
> +				rchg[--ix] = 0;
> +			}
> +		}

This turned out to be delightfully simple (especially compared to the
perl monstrosity).

I tried comparing the output to the perl one, but it's not quite the
same. In that one we had to work with the existing hunks and context
lines, so any hunk that got shifted ended up with extra context on one
side, and too little on the other. But here, we can actually bump the
context lines to give the correct amount on both sides, which is good.

I guess this will invalidate old patch-ids, but there's not much to be
done about that.

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19  5:03   ` Jeff King
@ 2016-04-19  6:47     ` Stefan Beller
  2016-04-19  7:00       ` Jeff King
  2016-04-19 15:17     ` Stefan Beller
  2016-04-19 16:51     ` Junio C Hamano
  2 siblings, 1 reply; 45+ messages in thread
From: Stefan Beller @ 2016-04-19  6:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller

On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Apr 18, 2016 at 02:12:30PM -0700, Stefan Beller wrote:
>
>> +
>> +             /*
>> +              * If a group can be moved back and forth, see if there is an
>> +              * blank line in the moving space. If there is a blank line,
>> +              * make sure the last blank line is the end of the group.
>
> s/an/a/ on the first line

So it looks like I'll be resending another version for this series tomorrow.
Thanks for pointing this out!

>
>> +              * As we shifted the group forward as far as possible, we only
>> +              * need to shift it back if at all.
>
> Maybe because I'm reading it as a diff that only contains this hunk and
> not the whole rest of the function, but the "we" here confused me. You
> mean the earlier, existing loop in xdl_change_compact, right?
>
> Maybe something like:
>
>   As we already shifted the group forward as far as possible in the
>   earlier loop...
>
> would help.

I'll see to get rid of the 'we', otherwise I'll stick with your suggestion.

>
>> +             if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
>> +                     while (ixs > 0 &&
>> +                            !is_blank_line(recs, ix - 1, flags) &&
>> +                            recs_match(recs, ixs - 1, ix - 1, flags)) {
>> +                             rchg[--ixs] = 1;
>> +                             rchg[--ix] = 0;
>> +                     }
>> +             }
>
> This turned out to be delightfully simple (especially compared to the
> perl monstrosity).
>
> I tried comparing the output to the perl one, but it's not quite the
> same. In that one we had to work with the existing hunks and context
> lines, so any hunk that got shifted ended up with extra context on one
> side, and too little on the other. But here, we can actually bump the
> context lines to give the correct amount on both sides, which is good.
>
> I guess this will invalidate old patch-ids, but there's not much to be
> done about that.

For the record:
I thought about "optimal hunk separation" for a while, specially during my
bike commute. And while this heuristic seems to be a good fit for most of
the cases inspected, we can do better (in the future).

I am convinced the better way to do it is like this:

    Calculate the entropy for each line and take the last line with the
    lowest entropy as the last line of the hunk.

That heuristic requires more compute though as it will be hard to compute
the entropy for the line. To do that I would imagine, we'd need to loop over
the whole file and count the occurrences for each char (byte) and then
take the negative log of (#number of that byte / #number of bytes in file) [1].

This would model our actual goal a bit more closely to split at parts, where
there is low information density (the definition of entropy).

One example Jacob pointed out was a thing like

/**
 * Comment here. Over
 * more lines.
 *
+ *  Add line here with a blank line
+ *
+ * in between and a trailing blank after.
+ *
 */

I think we had cases like this in the kernel tree and else where,
and for a human it is clear to break after the last "empty line"
(which for comments starts with " * "). To detect those we can use
the entropy as it doesn't convey lots of information.
(git show e1f7037167323461c0415447676262dcb)

It also keeps the false positives out, Jacob pointed at
85ed2f32064b82e541fc7dcf2b0049a05 IIRC, which was bad with
the shortest lines only, but I'd imagine the entropy based
heuristic will do better there.

[1] https://en.wikipedia.org/wiki/Entropy_(information_theory)

Thanks for the review,
Stefan

>
> -Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19  6:47     ` Stefan Beller
@ 2016-04-19  7:00       ` Jeff King
  2016-04-19  7:05         ` Stefan Beller
  0 siblings, 1 reply; 45+ messages in thread
From: Jeff King @ 2016-04-19  7:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller

On Mon, Apr 18, 2016 at 11:47:52PM -0700, Stefan Beller wrote:

> I am convinced the better way to do it is like this:
> 
>     Calculate the entropy for each line and take the last line with the
>     lowest entropy as the last line of the hunk.

I'll be curious to see the results, but I think sometimes predictable
and stupid may be the best route with these sorts of things. In
particular, I'd worry that a content-independent measure of entropy
might miss some subtleties of a particular language (e.g., that "*" is
more or less meaningful than some other character). But we'll see. :)

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19  7:00       ` Jeff King
@ 2016-04-19  7:05         ` Stefan Beller
  0 siblings, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-19  7:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller

On Tue, Apr 19, 2016 at 12:00 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Apr 18, 2016 at 11:47:52PM -0700, Stefan Beller wrote:
>
>> I am convinced the better way to do it is like this:
>>
>>     Calculate the entropy for each line and take the last line with the
>>     lowest entropy as the last line of the hunk.
>
> I'll be curious to see the results, but I think sometimes predictable
> and stupid may be the best route with these sorts of things. In
> particular, I'd worry that a content-independent measure of entropy
> might miss some subtleties of a particular language (e.g., that "*" is
> more or less meaningful than some other character). But we'll see. :)

I would assume that the "*" would have little entropy when there are lots
of comments, i.e. it just "feels" like an empty line.
If there are no "*", then the entropy is high as it is unusual. And
unusual things
should not be at the border of a hunk I would assume.
So m prediction is that the  'subtleties of a particular language' correlate
highly with the actual use of characters.

Anyway, the experiment can be carried out later. :)

Thanks,
Stefan

>
> -Peff
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19  5:03   ` Jeff King
  2016-04-19  6:47     ` Stefan Beller
@ 2016-04-19 15:17     ` Stefan Beller
  2016-04-19 17:06       ` Jeff King
  2016-04-19 16:51     ` Junio C Hamano
  2 siblings, 1 reply; 45+ messages in thread
From: Stefan Beller @ 2016-04-19 15:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller

On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote:

> I guess this will invalidate old patch-ids, but there's not much to be
> done about that.

What do you mean by that? (What consequences do you imagine?)
I think diffs with any kind of heuristic can still be applied, no?

Thanks,
Stefan

>
> -Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19  5:03   ` Jeff King
  2016-04-19  6:47     ` Stefan Beller
  2016-04-19 15:17     ` Stefan Beller
@ 2016-04-19 16:51     ` Junio C Hamano
  2 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-04-19 16:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git, jacob.keller, Jacob Keller

Jeff King <peff@peff.net> writes:

> I guess this will invalidate old patch-ids, but there's not much to be
> done about that.

If we really cared, we could disable this (and any future) change to
the compaction logic to "patch-id --[un]stable" option.

I am not sure if it is worth the effort, though ;-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19 15:17     ` Stefan Beller
@ 2016-04-19 17:06       ` Jeff King
  2016-04-19 23:02         ` Jacob Keller
  2016-04-20  6:00         ` Junio C Hamano
  0 siblings, 2 replies; 45+ messages in thread
From: Jeff King @ 2016-04-19 17:06 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller

On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote:

> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote:
> 
> > I guess this will invalidate old patch-ids, but there's not much to be
> > done about that.
> 
> What do you mean by that? (What consequences do you imagine?)
> I think diffs with any kind of heuristic can still be applied, no?

I mean that if you save any old patch-ids from "git patch-id", they
won't match up when compared with new versions of git. We can probably
ignore it, though. This isn't the first time that patch-ids might have
changed, and I think the advice is already that one should not count on
them to be stable in the long term.

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19 17:06       ` Jeff King
@ 2016-04-19 23:02         ` Jacob Keller
  2016-04-19 23:07           ` Junio C Hamano
  2016-04-20  6:00         ` Junio C Hamano
  1 sibling, 1 reply; 45+ messages in thread
From: Jacob Keller @ 2016-04-19 23:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Junio C Hamano, git, Jacob Keller

On Tue, Apr 19, 2016 at 10:06 AM, Jeff King <peff@peff.net> wrote:
> On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote:
>
>> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote:
>>
>> > I guess this will invalidate old patch-ids, but there's not much to be
>> > done about that.
>>
>> What do you mean by that? (What consequences do you imagine?)
>> I think diffs with any kind of heuristic can still be applied, no?
>
> I mean that if you save any old patch-ids from "git patch-id", they
> won't match up when compared with new versions of git. We can probably
> ignore it, though. This isn't the first time that patch-ids might have
> changed, and I think the advice is already that one should not count on
> them to be stable in the long term.
>
> -Peff

Plus they'll be stable within a version of Git, it's only recorded
patch ids that change, which hopefully isn't done very much if at all.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19 23:02         ` Jacob Keller
@ 2016-04-19 23:07           ` Junio C Hamano
  2016-04-20 13:12             ` Michael S. Tsirkin
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-04-19 23:07 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jacob Keller, Jeff King, Stefan Beller, git, Jacob Keller

Jacob Keller <jacob.keller@gmail.com> writes:

> On Tue, Apr 19, 2016 at 10:06 AM, Jeff King <peff@peff.net> wrote:
>> On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote:
>>
>>> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote:
>>>
>>> > I guess this will invalidate old patch-ids, but there's not much to be
>>> > done about that.
>>>
>>> What do you mean by that? (What consequences do you imagine?)
>>> I think diffs with any kind of heuristic can still be applied, no?
>>
>> I mean that if you save any old patch-ids from "git patch-id", they
>> won't match up when compared with new versions of git. We can probably
>> ignore it, though. This isn't the first time that patch-ids might have
>> changed, and I think the advice is already that one should not count on
>> them to be stable in the long term.
>>
>> -Peff
>
> Plus they'll be stable within a version of Git, it's only recorded
> patch ids that change, which hopefully isn't done very much if at all.
>
> Thanks,
> Jake

Some people, like those who did things like 30e12b92 (patch-id: make
it stable against hunk reordering, 2014-04-27), _may_ care.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19 17:06       ` Jeff King
  2016-04-19 23:02         ` Jacob Keller
@ 2016-04-20  6:00         ` Junio C Hamano
  1 sibling, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-04-20  6:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, git, Jacob Keller, Jacob Keller

Jeff King <peff@peff.net> writes:

> I mean that if you save any old patch-ids from "git patch-id", they
> won't match up when compared with new versions of git. We can probably
> ignore it, though. This isn't the first time that patch-ids might have
> changed, and I think the advice is already that one should not count on
> them to be stable in the long term.

Another thing that this *will* break is the patch signature upload
protocol k.org uses to allow Linus, Greg, et al. on the road with
limited hotel wifi bandwidth to prepare patch-X-test1.gz and
patch-X-test1.sign file.  They can locally tag X-test1, prepare
"git diff X X-test1 | gzip -n >patch-X-test1.gz" and sign the
result, and upload _only_ the detached signature after pushing.

They can tell k.org, when uploading the detached signature, to
recreate the patchfile by running the same "git diff" to save the
bandwidth of sending the same thing twice (as they have to "push"
anyway, having to send the generated patch is a pure overhead).

Having said all that, kup(1) users are already warned that the
textual diff produced by "git diff-tree -p" (which is mentioned in
the documentation of the tool) varies across versions of Git and
the above "optimization" would not work unless both ends have the
same version of Git, so it may not be too big an issue for them.
They have already been burned once when we corrected "git archive"
output in the past (they obviously have the same optimization to
sign tarballs, and the kup(1) mechanism relies to have byte-for-byte
identical output).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19 23:07           ` Junio C Hamano
@ 2016-04-20 13:12             ` Michael S. Tsirkin
  2016-04-20 16:09               ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Michael S. Tsirkin @ 2016-04-20 13:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jacob Keller, Jeff King, Stefan Beller, git, Jacob Keller

On Tue, Apr 19, 2016 at 04:07:35PM -0700, Junio C Hamano wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
> 
> > On Tue, Apr 19, 2016 at 10:06 AM, Jeff King <peff@peff.net> wrote:
> >> On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote:
> >>
> >>> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote:
> >>>
> >>> > I guess this will invalidate old patch-ids, but there's not much to be
> >>> > done about that.
> >>>
> >>> What do you mean by that? (What consequences do you imagine?)
> >>> I think diffs with any kind of heuristic can still be applied, no?
> >>
> >> I mean that if you save any old patch-ids from "git patch-id", they
> >> won't match up when compared with new versions of git. We can probably
> >> ignore it, though. This isn't the first time that patch-ids might have
> >> changed, and I think the advice is already that one should not count on
> >> them to be stable in the long term.
> >>
> >> -Peff
> >
> > Plus they'll be stable within a version of Git, it's only recorded
> > patch ids that change, which hopefully isn't done very much if at all.
> >
> > Thanks,
> > Jake
> 
> Some people, like those who did things like 30e12b92 (patch-id: make
> it stable against hunk reordering, 2014-04-27), _may_ care.
> 

FWIW IIRC what that commit is about is ability to reorder the chunks in
a patch without changing patch-id. Not about keeping id stable across
git revisions.

-- 
MST

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-20 13:12             ` Michael S. Tsirkin
@ 2016-04-20 16:09               ` Junio C Hamano
  2016-04-20 16:17                 ` Jeff King
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-04-20 16:09 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Jacob Keller, Jeff King, Stefan Beller, git, Jacob Keller

"Michael S. Tsirkin" <mst@redhat.com> writes:

> FWIW IIRC what that commit is about is ability to reorder the chunks in
> a patch without changing patch-id. Not about keeping id stable across
> git revisions.

OK, but "reorder the chunks" is not meant to stay to be the _ONLY_
purpose for an option whose name is a broad "--[un]stable", but
merely one (and only) possible cause of patch-id instability that
happened to be noticed as an issue back then and was dealt with that
commit, no?  In other words, the intent of the "--stable" feature is
to give a stable ID that is not affected by random end-user settings
(e.g. diff.orderfile) and if somebody invents a new configurable knob
in the future, they are supposed to pay attention to the "--stable"
feature or existing users who do use "--stable" will be broken, no?

I can still buy "--stable is not about stability across versions of
Git"--it makes our job easier ;-)  I just want to make sure that
"--stable is about stability inside a single version of Git that
patch ID for the same commit will stay the same and unaffected by
random end-user configuration knobs".

Which in turn would mean that we won't have to worry about this
option in patch-id as long as we remove the diff.compactionheuristic
configuration and command line option once the developers are done
experimenting with their heuristics code.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-20 16:09               ` Junio C Hamano
@ 2016-04-20 16:17                 ` Jeff King
  0 siblings, 0 replies; 45+ messages in thread
From: Jeff King @ 2016-04-20 16:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Michael S. Tsirkin, Jacob Keller, Stefan Beller, git, Jacob Keller

On Wed, Apr 20, 2016 at 09:09:53AM -0700, Junio C Hamano wrote:

> "Michael S. Tsirkin" <mst@redhat.com> writes:
> 
> > FWIW IIRC what that commit is about is ability to reorder the chunks in
> > a patch without changing patch-id. Not about keeping id stable across
> > git revisions.
> 
> OK, but "reorder the chunks" is not meant to stay to be the _ONLY_
> purpose for an option whose name is a broad "--[un]stable", but
> merely one (and only) possible cause of patch-id instability that
> happened to be noticed as an issue back then and was dealt with that
> commit, no?  In other words, the intent of the "--stable" feature is
> to give a stable ID that is not affected by random end-user settings
> (e.g. diff.orderfile) and if somebody invents a new configurable knob
> in the future, they are supposed to pay attention to the "--stable"
> feature or existing users who do use "--stable" will be broken, no?

I forgot that we added "--stable". Evne if it is not meant to be about
stability across versions, is there any reason _not_ to turn off
this heuristic for --stable (or for patch-ids in general)?

I guess maybe that creates some inconsistency between generating a
patch-id directly, and making one from a diff given on stdin (though I
don't know that we can promise much about the latter in the general
case; we can fix file ordering, but we don't have enough information to
tweak other aspects).

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-05-02 18:02                       ` Jeff King
@ 2016-05-03 17:55                         ` Jacob Keller
  0 siblings, 0 replies; 45+ messages in thread
From: Jacob Keller @ 2016-05-03 17:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Keller, Jacob E, sbeller, git

On Mon, May 2, 2016 at 11:02 AM, Jeff King <peff@peff.net> wrote:
> On Mon, May 02, 2016 at 10:40:28AM -0700, Junio C Hamano wrote:
>
>> "Keller, Jacob E" <jacob.e.keller@intel.com> writes:
>>
>> > True. I think the chances that it needs such a thing are quite minor,
>> > and if an undocumented knob gets exposed it would have to become
>> > documented and maintained, so I'd prefer to avoid it. Given that the
>> > risk is pretty small I think that's ok.
>>
>> OK, then let's do only the "documentation" part.
>>
>> -- >8 --
>> Subject: [PATCH] diff: undocument the compaction heuristic knobs for experimentation
>>
>> It seems that people around here are all happy with the updated
>> heuristics used to decide where the hunks are separated.  Let's keep
>> that as the default.  Even though we do not expect too much trouble
>> from the difference between the old and the new algorithms, just in
>> case let's leave the implementation of the knobs to turn it off for
>> emergencies.  There is no longer need for documenting them, though.
>
> I agree with this reasoning. Thanks.
>
> -Peff

I think I agree too.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-05-02 17:40                     ` Junio C Hamano
  2016-05-02 17:45                       ` Stefan Beller
@ 2016-05-02 18:02                       ` Jeff King
  2016-05-03 17:55                         ` Jacob Keller
  1 sibling, 1 reply; 45+ messages in thread
From: Jeff King @ 2016-05-02 18:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Keller, Jacob E, sbeller, git, jacob.keller

On Mon, May 02, 2016 at 10:40:28AM -0700, Junio C Hamano wrote:

> "Keller, Jacob E" <jacob.e.keller@intel.com> writes:
> 
> > True. I think the chances that it needs such a thing are quite minor,
> > and if an undocumented knob gets exposed it would have to become
> > documented and maintained, so I'd prefer to avoid it. Given that the
> > risk is pretty small I think that's ok.
> 
> OK, then let's do only the "documentation" part.
> 
> -- >8 --
> Subject: [PATCH] diff: undocument the compaction heuristic knobs for experimentation
> 
> It seems that people around here are all happy with the updated
> heuristics used to decide where the hunks are separated.  Let's keep
> that as the default.  Even though we do not expect too much trouble
> from the difference between the old and the new algorithms, just in
> case let's leave the implementation of the knobs to turn it off for
> emergencies.  There is no longer need for documenting them, though.

I agree with this reasoning. Thanks.

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-05-02 17:40                     ` Junio C Hamano
@ 2016-05-02 17:45                       ` Stefan Beller
  2016-05-02 18:02                       ` Jeff King
  1 sibling, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-05-02 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Keller, Jacob E, git, peff, jacob.keller

On Mon, May 2, 2016 at 10:40 AM, Junio C Hamano <gitster@pobox.com> wrote:
> "Keller, Jacob E" <jacob.e.keller@intel.com> writes:
>
>> True. I think the chances that it needs such a thing are quite minor,
>> and if an undocumented knob gets exposed it would have to become
>> documented and maintained, so I'd prefer to avoid it. Given that the
>> risk is pretty small I think that's ok.
>
> OK, then let's do only the "documentation" part.

The patch below looks good to me.

Thanks,
Stefan

>
> -- >8 --
> Subject: [PATCH] diff: undocument the compaction heuristic knobs for experimentation
>
> It seems that people around here are all happy with the updated
> heuristics used to decide where the hunks are separated.  Let's keep
> that as the default.  Even though we do not expect too much trouble
> from the difference between the old and the new algorithms, just in
> case let's leave the implementation of the knobs to turn it off for
> emergencies.  There is no longer need for documenting them, though.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/diff-config.txt  | 5 -----
>  Documentation/diff-options.txt | 6 ------
>  2 files changed, 11 deletions(-)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index 9bf3e92..6eaa452 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -166,11 +166,6 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> -diff.compactionHeuristic::
> -       Set this option to enable an experimental heuristic that
> -       shifts the hunk boundary in an attempt to make the resulting
> -       patch easier to read.
> -
>  diff.algorithm::
>         Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index b513023..3ad6404 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,12 +63,6 @@ ifndef::git-format-patch[]
>         Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> ---compaction-heuristic::
> ---no-compaction-heuristic::
> -       These are to help debugging and tuning an experimental
> -       heuristic that shifts the hunk boundary in an attempt to
> -       make the resulting patch easier to read.
> -
>  --minimal::
>         Spend extra time to make sure the smallest possible
>         diff is produced.
> --
> 2.8.2-458-gacc1066
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-29 22:48                   ` Keller, Jacob E
@ 2016-05-02 17:40                     ` Junio C Hamano
  2016-05-02 17:45                       ` Stefan Beller
  2016-05-02 18:02                       ` Jeff King
  0 siblings, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-05-02 17:40 UTC (permalink / raw)
  To: Keller, Jacob E; +Cc: sbeller, git, peff, jacob.keller

"Keller, Jacob E" <jacob.e.keller@intel.com> writes:

> True. I think the chances that it needs such a thing are quite minor,
> and if an undocumented knob gets exposed it would have to become
> documented and maintained, so I'd prefer to avoid it. Given that the
> risk is pretty small I think that's ok.

OK, then let's do only the "documentation" part.

-- >8 --
Subject: [PATCH] diff: undocument the compaction heuristic knobs for experimentation

It seems that people around here are all happy with the updated
heuristics used to decide where the hunks are separated.  Let's keep
that as the default.  Even though we do not expect too much trouble
from the difference between the old and the new algorithms, just in
case let's leave the implementation of the knobs to turn it off for
emergencies.  There is no longer need for documenting them, though.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/diff-config.txt  | 5 -----
 Documentation/diff-options.txt | 6 ------
 2 files changed, 11 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index 9bf3e92..6eaa452 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -166,11 +166,6 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
-diff.compactionHeuristic::
-	Set this option to enable an experimental heuristic that
-	shifts the hunk boundary in an attempt to make the resulting
-	patch easier to read.
-
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index b513023..3ad6404 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,12 +63,6 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
---compaction-heuristic::
---no-compaction-heuristic::
-	These are to help debugging and tuning an experimental
-	heuristic that shifts the hunk boundary in an attempt to
-	make the resulting patch easier to read.
-
 --minimal::
 	Spend extra time to make sure the smallest possible
 	diff is produced.
-- 
2.8.2-458-gacc1066

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-29 22:35             ` Stefan Beller
  2016-04-29 22:39               ` Keller, Jacob E
@ 2016-04-30  3:06               ` Jeff King
  1 sibling, 0 replies; 45+ messages in thread
From: Jeff King @ 2016-04-30  3:06 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Junio C Hamano, Jacob Keller, Git mailing list, Jacob Keller

On Fri, Apr 29, 2016 at 03:35:54PM -0700, Stefan Beller wrote:

> > -- >8 --
> > diff: enable "compaction heuristics" and lose experimentation knob
> >
> > It seems that the new "find a good hunk boundary by locating a blank
> > line" heuristics gives much more pleasant result without much
> > noticeable downsides.  Let's make it the new algorithm for real,
> > without the opt-out knob we added while experimenting with it.
> 
> I would remove the opt-out knob much later in the game, i.e.
> 
>     1) make a patch that removes the documentation only
>        before the next release (i.e. before 2.9)
>     2) make a patch to remove the actual (unlabeled) knobs,
>         merge into master before 2.10 (i.e. just after the 2.9 release)

Yeah, I think it might be a good idea to keep some sort of undocumented
safety valve in the release, at least for a cycle or two. The heuristic
won't _really_ see wide use until it is in a released version of git, as
much as we would like it to be otherwise.

So I am anticipating a possible conversation where somebody reports that
the new output looks bad, and it would be nice to say "try it with this
flag (or environment variable, or whatever) and see if that looks
better".  And then based on that conversation we can decide what the
right next is (a real user-visible flag, or reversion, or deciding the
user's case isn't worth it). Or maybe if we're lucky that conversation
never happens.

The "whatever" in the instructions can obviously be "build with this
patch" or "try with an older version of git", but we're a lot more
likely to get a good response if it's easy for the user to do.

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-29 22:44                 ` Stefan Beller
@ 2016-04-29 22:48                   ` Keller, Jacob E
  2016-05-02 17:40                     ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Keller, Jacob E @ 2016-04-29 22:48 UTC (permalink / raw)
  To: sbeller; +Cc: git, gitster, peff, jacob.keller

On Fri, 2016-04-29 at 15:44 -0700, Stefan Beller wrote:
> > 
> > Currently it's an "opt in" knob, so this doesn't make sense to me.
> +static int diff_compaction_heuristic = 1;
> 

Oops didn't know we'd made it default at some point. (all my versions
had it disabled by default)

> It's rather an opt-out knob going by the current
> origin/jk/diff-compact-heuristic
> 

Yea in that case, we could keep it.

> 
> > 
> > If
> > we remove the entire knob as is, we can always (fairly easily) add
> > it
> > back. I would keep the code inside xdiff as a knob, but set it to
> > enable default so that the user config has no knob at the top level
> > but
> > the xdiff machinery does (this making a "disable" be relatively
> > small
> > patch).
> When writing my reply, I thought about people using Git from a binary
> distribution with little to no admin rights. They want to have an
> emergency
> knob to disable this thing, but cannot patch/recompile Git.
> 
> If you can patch and compile your version of Git, then reverting is
> easy, so
> in that case Junios patch looks good to me.
> 
> Thanks,
> Stefan

True. I think the chances that it needs such a thing are quite minor,
and if an undocumented knob gets exposed it would have to become
documented and maintained, so I'd prefer to avoid it. Given that the
risk is pretty small I think that's ok.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-29 22:39               ` Keller, Jacob E
@ 2016-04-29 22:44                 ` Stefan Beller
  2016-04-29 22:48                   ` Keller, Jacob E
  0 siblings, 1 reply; 45+ messages in thread
From: Stefan Beller @ 2016-04-29 22:44 UTC (permalink / raw)
  To: Keller, Jacob E; +Cc: gitster, git, peff, jacob.keller

> Currently it's an "opt in" knob, so this doesn't make sense to me.

+static int diff_compaction_heuristic = 1;

It's rather an opt-out knob going by the current
origin/jk/diff-compact-heuristic


> If
> we remove the entire knob as is, we can always (fairly easily) add it
> back. I would keep the code inside xdiff as a knob, but set it to
> enable default so that the user config has no knob at the top level but
> the xdiff machinery does (this making a "disable" be relatively small
> patch).

When writing my reply, I thought about people using Git from a binary
distribution with little to no admin rights. They want to have an emergency
knob to disable this thing, but cannot patch/recompile Git.

If you can patch and compile your version of Git, then reverting is easy, so
in that case Junios patch looks good to me.

Thanks,
Stefan

>
> Thanks,
> Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-29 22:35             ` Stefan Beller
@ 2016-04-29 22:39               ` Keller, Jacob E
  2016-04-29 22:44                 ` Stefan Beller
  2016-04-30  3:06               ` Jeff King
  1 sibling, 1 reply; 45+ messages in thread
From: Keller, Jacob E @ 2016-04-29 22:39 UTC (permalink / raw)
  To: gitster, sbeller; +Cc: git, peff, jacob.keller

On Fri, 2016-04-29 at 15:35 -0700, Stefan Beller wrote:
> On Fri, Apr 29, 2016 at 3:18 PM, Junio C Hamano <gitster@pobox.com>
> wrote:
> > 
> > Jacob Keller <jacob.keller@gmail.com> writes:
> > 
> > > 
> > > On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano <gitster@pobox.co
> > > m> wrote:
> > > > 
> > > > Jeff King <peff@peff.net> writes:
> > > > 
> > > > > 
> > > > > ... Having the two directly next to each other reads
> > > > > better to me. This is a pretty unusual diff, though, in that
> > > > > it did
> > > > > change the surrounding whitespace (and if you look further in
> > > > > the diff,
> > > > > the identical change is made elsewhere _without_ touching the
> > > > > whitespace). So this is kind of an anomaly. And IMHO the
> > > > > weirdness here
> > > > > is outweighed by the vast number of improvements elsewhere.
> > > > So... is everybody happy with the result and now we can drop
> > > > the
> > > > tweaking knob added to help experimentation before merging the
> > > > result to 'master'?
> > > > 
> > > > I am pretty happy with the end result myself.
> > > I am very happy with it. I haven't had any issues, and I think
> > > we'll
> > > find better traction by enabling it at this point and seeing
> > > when/if
> > > someone complains.
> > > 
> > > I think for most it won't be noticed and for those that do it
> > > will
> > > likely be positive.
> > I am doing this only to prepare in case we have a concensus,
> > i.e. this is not to declare that I do not care what other people
> > say.  Here is a patch to remove the experimentation knob.
> > 
> > Let's say we keep this patch out of tree for now and keep the topic
> > in 'next' so that people can further play with it for several more
> > weeks, and then apply this on top and merge the result to 'master'
> > early in the next cycle.
> > 
> > -- >8 --
> > diff: enable "compaction heuristics" and lose experimentation knob
> > 
> > It seems that the new "find a good hunk boundary by locating a
> > blank
> > line" heuristics gives much more pleasant result without much
> > noticeable downsides.  Let's make it the new algorithm for real,
> > without the opt-out knob we added while experimenting with it.
> I would remove the opt-out knob much later in the game, i.e.
> 
>     1) make a patch that removes the documentation only
>        before the next release (i.e. before 2.9)
>     2) make a patch to remove the actual (unlabeled) knobs,
>         merge into master before 2.10 (i.e. just after the 2.9
> release)
> 
> Then we get the most of the community to test it with the 2.9 release
> and still have an emergency knob in case some major headaches
> show up. After one release cycle we'll be much more confident
> about its usage and its short comings and do not need the
> emergency turn off. If the community doesn't like it for some reason
> we can document it and debate the default setting?
> 
> I agree we want the knob gone eventually.
> Making it an undocumented feature is as good as that from
> a users point of view?
> 

Currently it's an "opt in" knob, so this doesn't make sense to me. If
we remove the entire knob as is, we can always (fairly easily) add it
back. I would keep the code inside xdiff as a knob, but set it to
enable default so that the user config has no knob at the top level but
the xdiff machinery does (this making a "disable" be relatively small
patch).

Thanks,
Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-29 22:18           ` Junio C Hamano
@ 2016-04-29 22:35             ` Stefan Beller
  2016-04-29 22:39               ` Keller, Jacob E
  2016-04-30  3:06               ` Jeff King
  0 siblings, 2 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-29 22:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jacob Keller, Jeff King, Git mailing list, Jacob Keller

On Fri, Apr 29, 2016 at 3:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
>
>> On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Jeff King <peff@peff.net> writes:
>>>
>>>> ... Having the two directly next to each other reads
>>>> better to me. This is a pretty unusual diff, though, in that it did
>>>> change the surrounding whitespace (and if you look further in the diff,
>>>> the identical change is made elsewhere _without_ touching the
>>>> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
>>>> is outweighed by the vast number of improvements elsewhere.
>>>
>>> So... is everybody happy with the result and now we can drop the
>>> tweaking knob added to help experimentation before merging the
>>> result to 'master'?
>>>
>>> I am pretty happy with the end result myself.
>>
>> I am very happy with it. I haven't had any issues, and I think we'll
>> find better traction by enabling it at this point and seeing when/if
>> someone complains.
>>
>> I think for most it won't be noticed and for those that do it will
>> likely be positive.
>
> I am doing this only to prepare in case we have a concensus,
> i.e. this is not to declare that I do not care what other people
> say.  Here is a patch to remove the experimentation knob.
>
> Let's say we keep this patch out of tree for now and keep the topic
> in 'next' so that people can further play with it for several more
> weeks, and then apply this on top and merge the result to 'master'
> early in the next cycle.
>
> -- >8 --
> diff: enable "compaction heuristics" and lose experimentation knob
>
> It seems that the new "find a good hunk boundary by locating a blank
> line" heuristics gives much more pleasant result without much
> noticeable downsides.  Let's make it the new algorithm for real,
> without the opt-out knob we added while experimenting with it.

I would remove the opt-out knob much later in the game, i.e.

    1) make a patch that removes the documentation only
       before the next release (i.e. before 2.9)
    2) make a patch to remove the actual (unlabeled) knobs,
        merge into master before 2.10 (i.e. just after the 2.9 release)

Then we get the most of the community to test it with the 2.9 release
and still have an emergency knob in case some major headaches
show up. After one release cycle we'll be much more confident
about its usage and its short comings and do not need the
emergency turn off. If the community doesn't like it for some reason
we can document it and debate the default setting?

I agree we want the knob gone eventually.
Making it an undocumented feature is as good as that from
a users point of view?

>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Documentation/diff-config.txt  |  5 -----
>  Documentation/diff-options.txt |  6 ------
>  diff.c                         | 11 -----------
>  xdiff/xdiff.h                  |  2 --
>  xdiff/xdiffi.c                 |  2 +-
>  5 files changed, 1 insertion(+), 25 deletions(-)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index 9bf3e92..6eaa452 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -166,11 +166,6 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> -diff.compactionHeuristic::
> -       Set this option to enable an experimental heuristic that
> -       shifts the hunk boundary in an attempt to make the resulting
> -       patch easier to read.
> -
>  diff.algorithm::
>         Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index b513023..3ad6404 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,12 +63,6 @@ ifndef::git-format-patch[]
>         Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> ---compaction-heuristic::
> ---no-compaction-heuristic::
> -       These are to help debugging and tuning an experimental
> -       heuristic that shifts the hunk boundary in an attempt to
> -       make the resulting patch easier to read.
> -
>  --minimal::
>         Spend extra time to make sure the smallest possible
>         diff is produced.
> diff --git a/diff.c b/diff.c
> index 05ca3ce..f62b7f7 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -25,7 +25,6 @@
>  #endif
>
>  static int diff_detect_rename_default;
> -static int diff_compaction_heuristic = 1;
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> @@ -184,10 +183,6 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>                 diff_detect_rename_default = git_config_rename(var, value);
>                 return 0;
>         }
> -       if (!strcmp(var, "diff.compactionheuristic")) {
> -               diff_compaction_heuristic = git_config_bool(var, value);
> -               return 0;
> -       }
>         if (!strcmp(var, "diff.autorefreshindex")) {
>                 diff_auto_refresh_index = git_config_bool(var, value);
>                 return 0;
> @@ -3240,8 +3235,6 @@ void diff_setup(struct diff_options *options)
>         options->use_color = diff_use_color_default;
>         options->detect_rename = diff_detect_rename_default;
>         options->xdl_opts |= diff_algorithm;
> -       if (diff_compaction_heuristic)
> -               DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
>
>         options->orderfile = diff_order_file_cfg;
>
> @@ -3719,10 +3712,6 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
>                 DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
>         else if (!strcmp(arg, "--ignore-blank-lines"))
>                 DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
> -       else if (!strcmp(arg, "--compaction-heuristic"))
> -               DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
> -       else if (!strcmp(arg, "--no-compaction-heuristic"))
> -               DIFF_XDL_CLR(options, COMPACTION_HEURISTIC);
>         else if (!strcmp(arg, "--patience"))
>                 options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
>         else if (!strcmp(arg, "--histogram"))
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index d1dbb27..c033991 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -41,8 +41,6 @@ extern "C" {
>
>  #define XDF_IGNORE_BLANK_LINES (1 << 7)
>
> -#define XDF_COMPACTION_HEURISTIC (1 << 8)
> -
>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>  #define XDL_EMIT_COMMON (1 << 1)
>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index b3c6848..574f83c 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -522,7 +522,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>                  * As we already shifted the group forward as far as possible
>                  * in the earlier loop, we need to shift it back only if at all.
>                  */
> -               if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
> +               if (blank_lines) {
>                         while (ixs > 0 &&
>                                !is_blank_line(recs, ix - 1, flags) &&
>                                recs_match(recs, ixs - 1, ix - 1, flags)) {

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-29 20:59         ` Jacob Keller
@ 2016-04-29 22:18           ` Junio C Hamano
  2016-04-29 22:35             ` Stefan Beller
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-04-29 22:18 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jeff King, Stefan Beller, Git mailing list, Jacob Keller

Jacob Keller <jacob.keller@gmail.com> writes:

> On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jeff King <peff@peff.net> writes:
>>
>>> ... Having the two directly next to each other reads
>>> better to me. This is a pretty unusual diff, though, in that it did
>>> change the surrounding whitespace (and if you look further in the diff,
>>> the identical change is made elsewhere _without_ touching the
>>> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
>>> is outweighed by the vast number of improvements elsewhere.
>>
>> So... is everybody happy with the result and now we can drop the
>> tweaking knob added to help experimentation before merging the
>> result to 'master'?
>>
>> I am pretty happy with the end result myself.
>
> I am very happy with it. I haven't had any issues, and I think we'll
> find better traction by enabling it at this point and seeing when/if
> someone complains.
>
> I think for most it won't be noticed and for those that do it will
> likely be positive.

I am doing this only to prepare in case we have a concensus,
i.e. this is not to declare that I do not care what other people
say.  Here is a patch to remove the experimentation knob.

Let's say we keep this patch out of tree for now and keep the topic
in 'next' so that people can further play with it for several more
weeks, and then apply this on top and merge the result to 'master'
early in the next cycle.

-- >8 --
diff: enable "compaction heuristics" and lose experimentation knob

It seems that the new "find a good hunk boundary by locating a blank
line" heuristics gives much more pleasant result without much
noticeable downsides.  Let's make it the new algorithm for real,
without the opt-out knob we added while experimenting with it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Documentation/diff-config.txt  |  5 -----
 Documentation/diff-options.txt |  6 ------
 diff.c                         | 11 -----------
 xdiff/xdiff.h                  |  2 --
 xdiff/xdiffi.c                 |  2 +-
 5 files changed, 1 insertion(+), 25 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index 9bf3e92..6eaa452 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -166,11 +166,6 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
-diff.compactionHeuristic::
-	Set this option to enable an experimental heuristic that
-	shifts the hunk boundary in an attempt to make the resulting
-	patch easier to read.
-
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index b513023..3ad6404 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,12 +63,6 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
---compaction-heuristic::
---no-compaction-heuristic::
-	These are to help debugging and tuning an experimental
-	heuristic that shifts the hunk boundary in an attempt to
-	make the resulting patch easier to read.
-
 --minimal::
 	Spend extra time to make sure the smallest possible
 	diff is produced.
diff --git a/diff.c b/diff.c
index 05ca3ce..f62b7f7 100644
--- a/diff.c
+++ b/diff.c
@@ -25,7 +25,6 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_compaction_heuristic = 1;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -184,10 +183,6 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_detect_rename_default = git_config_rename(var, value);
 		return 0;
 	}
-	if (!strcmp(var, "diff.compactionheuristic")) {
-		diff_compaction_heuristic = git_config_bool(var, value);
-		return 0;
-	}
 	if (!strcmp(var, "diff.autorefreshindex")) {
 		diff_auto_refresh_index = git_config_bool(var, value);
 		return 0;
@@ -3240,8 +3235,6 @@ void diff_setup(struct diff_options *options)
 	options->use_color = diff_use_color_default;
 	options->detect_rename = diff_detect_rename_default;
 	options->xdl_opts |= diff_algorithm;
-	if (diff_compaction_heuristic)
-		DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
 
 	options->orderfile = diff_order_file_cfg;
 
@@ -3719,10 +3712,6 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
-	else if (!strcmp(arg, "--compaction-heuristic"))
-		DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
-	else if (!strcmp(arg, "--no-compaction-heuristic"))
-		DIFF_XDL_CLR(options, COMPACTION_HEURISTIC);
 	else if (!strcmp(arg, "--patience"))
 		options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
 	else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index d1dbb27..c033991 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,8 +41,6 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
-#define XDF_COMPACTION_HEURISTIC (1 << 8)
-
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_COMMON (1 << 1)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index b3c6848..574f83c 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -522,7 +522,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 		 * As we already shifted the group forward as far as possible
 		 * in the earlier loop, we need to shift it back only if at all.
 		 */
-		if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
+		if (blank_lines) {
 			while (ixs > 0 &&
 			       !is_blank_line(recs, ix - 1, flags) &&
 			       recs_match(recs, ixs - 1, ix - 1, flags)) {

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-29 20:29       ` Junio C Hamano
@ 2016-04-29 20:59         ` Jacob Keller
  2016-04-29 22:18           ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Jacob Keller @ 2016-04-29 20:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Stefan Beller, Git mailing list, Jacob Keller

On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> ... Having the two directly next to each other reads
>> better to me. This is a pretty unusual diff, though, in that it did
>> change the surrounding whitespace (and if you look further in the diff,
>> the identical change is made elsewhere _without_ touching the
>> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
>> is outweighed by the vast number of improvements elsewhere.
>
> So... is everybody happy with the result and now we can drop the
> tweaking knob added to help experimentation before merging the
> result to 'master'?
>
> I am pretty happy with the end result myself.

I am very happy with it. I haven't had any issues, and I think we'll
find better traction by enabling it at this point and seeing when/if
someone complains.

I think for most it won't be noticed and for those that do it will
likely be positive.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-20  4:18     ` Jeff King
  2016-04-20  4:37       ` Jeff King
  2016-04-20  4:37       ` Stefan Beller
@ 2016-04-29 20:29       ` Junio C Hamano
  2016-04-29 20:59         ` Jacob Keller
  2 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-04-29 20:29 UTC (permalink / raw)
  To: Jeff King; +Cc: Jacob Keller, Stefan Beller, Git mailing list, Jacob Keller

Jeff King <peff@peff.net> writes:

> ... Having the two directly next to each other reads
> better to me. This is a pretty unusual diff, though, in that it did
> change the surrounding whitespace (and if you look further in the diff,
> the identical change is made elsewhere _without_ touching the
> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
> is outweighed by the vast number of improvements elsewhere.

So... is everybody happy with the result and now we can drop the
tweaking knob added to help experimentation before merging the
result to 'master'?

I am pretty happy with the end result myself.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-20  4:18     ` Jeff King
  2016-04-20  4:37       ` Jeff King
@ 2016-04-20  4:37       ` Stefan Beller
  2016-04-29 20:29       ` Junio C Hamano
  2 siblings, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-20  4:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Jacob Keller, Junio C Hamano, Git mailing list, Jacob Keller

On Tue, Apr 19, 2016 at 9:18 PM, Jeff King <peff@peff.net> wrote:
> [your original probably didn't make it to the list because of its 5MB
>  attachment; the list has a 100K limit; I'll try to quote liberally]
>
> On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote:
>
>> I ran this version of the patch against the entire Linux kernel
>> history, as I figured this has a large batch of C code to try and spot
>> any issues.
>>
>> I ran something like the following command in bash
>>
>> $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git
>> show --format="commit %H" --no-compaction-heuristic $rev) <(git show
>> --format="commit %H" --compaction-heuristic $rev); done >
>> heuristic.patch
>
> My earlier tests with the perl script were all done with "git log -p",
> which will not show anything at all for merges (and my script wouldn't
> know how to deal with combined diffs anyway). But I think this new patch
> _will_ kick in for combined diffs (because it is built on individual
> diffs). It will be interesting to see if this has any effect there, and
> what it looks like.
>
> We should be able to see it (on a small enough repository) with:
>
>   git log --format='commit %H' --cc --merges
>
> and comparing the before/after.
>
>> I've attached the file that I generated for the Linux history, it's
>> rather large so hopefully I can get some help to spot any differences.
>> The above approach will work for pretty much any repository, and works
>> better than trying to generate the entire thing first and then diff
>> (since that runs out of memory pretty fast).
>
> I don't think there is much point in generating a complete diff between
> the patches for every commit, when nobody can look at the whole thing.
> Unless we have automated tooling to find "interesting" bits (and
> certainly a tool to remove the boring "a comment got shifted by one"
> lines would help; those are all known improvements, but it's the _other_
> stuff we want to look).
>
> But if we are not using automated tooling to find the needle in the
> haystack, we might as well using sampling to make the dataset more
> manageable. Adding "--since=1.year.ago" is one way, though we may want
> to sample more randomly across time.
>
>> So far, I haven't spotted anything that would want me to disable it,
>> while I've spotted several cases where I felt that readability was
>> improved. It's somewhat difficult to spot though.
>
> I did find one case that I think is worse. Look at 857942fd1a in the
> kernel. It has a pattern like this:
>
>   ... surrounding code ...
>
>   function_one();
>   ... more surrounding code ...
>
> which becomes:
>
>   ... surrounding code ...
>
>   function_two();
>
>   ... more surrounding code
>
> Without the new heuristic, that looks like:
>
>   -function_one();
>   +function_two();
>   +
>
> but with it, it becomes:
>
>   +
>   +function_two();
>
>   -function_one();
>
> which is kind of weird. Having the two directly next to each other reads
> better to me. This is a pretty unusual diff, though, in that it did
> change the surrounding whitespace (and if you look further in the diff,
> the identical change is made elsewhere _without_ touching the
> whitespace). So this is kind of an anomaly. And IMHO the weirdness here
> is outweighed by the vast number of improvements elsewhere.

The new implementation supports the flags for ignoring white space, too.

>
> -Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-20  4:18     ` Jeff King
@ 2016-04-20  4:37       ` Jeff King
  2016-04-20  4:37       ` Stefan Beller
  2016-04-29 20:29       ` Junio C Hamano
  2 siblings, 0 replies; 45+ messages in thread
From: Jeff King @ 2016-04-20  4:37 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Stefan Beller, Junio C Hamano, Git mailing list, Jacob Keller

On Wed, Apr 20, 2016 at 12:18:27AM -0400, Jeff King wrote:

> My earlier tests with the perl script were all done with "git log -p",
> which will not show anything at all for merges (and my script wouldn't
> know how to deal with combined diffs anyway). But I think this new patch
> _will_ kick in for combined diffs (because it is built on individual
> diffs). It will be interesting to see if this has any effect there, and
> what it looks like.
> 
> We should be able to see it (on a small enough repository) with:
> 
>   git log --format='commit %H' --cc --merges
> 
> and comparing the before/after.

Add in "-p" if you are testing the tip of jk/diff-compact-heuristic. It
is based on the older maintenance track in which "--cc" does not imply
"-p".

Looking over the results, it's about what you'd expect (comment blocks
shifted by one as we want, and then there happens to be a one-line
conflict resolved later in the hunk).

The most interesting thing I found was db65f0fc3b1e. There we have two
functions being added in the same spot, and the resolution obviously is
to put one after the other. So both sides do the usual comment-block
thing, and the resulting combined diff carries through that improvement
as you'd expect.

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
       [not found]   ` <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com>
@ 2016-04-20  4:18     ` Jeff King
  2016-04-20  4:37       ` Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Jeff King @ 2016-04-20  4:18 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Stefan Beller, Junio C Hamano, Git mailing list, Jacob Keller

[your original probably didn't make it to the list because of its 5MB
 attachment; the list has a 100K limit; I'll try to quote liberally]

On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote:

> I ran this version of the patch against the entire Linux kernel
> history, as I figured this has a large batch of C code to try and spot
> any issues.
> 
> I ran something like the following command in bash
> 
> $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git
> show --format="commit %H" --no-compaction-heuristic $rev) <(git show
> --format="commit %H" --compaction-heuristic $rev); done >
> heuristic.patch

My earlier tests with the perl script were all done with "git log -p",
which will not show anything at all for merges (and my script wouldn't
know how to deal with combined diffs anyway). But I think this new patch
_will_ kick in for combined diffs (because it is built on individual
diffs). It will be interesting to see if this has any effect there, and
what it looks like.

We should be able to see it (on a small enough repository) with:

  git log --format='commit %H' --cc --merges

and comparing the before/after.

> I've attached the file that I generated for the Linux history, it's
> rather large so hopefully I can get some help to spot any differences.
> The above approach will work for pretty much any repository, and works
> better than trying to generate the entire thing first and then diff
> (since that runs out of memory pretty fast).

I don't think there is much point in generating a complete diff between
the patches for every commit, when nobody can look at the whole thing.
Unless we have automated tooling to find "interesting" bits (and
certainly a tool to remove the boring "a comment got shifted by one"
lines would help; those are all known improvements, but it's the _other_
stuff we want to look).

But if we are not using automated tooling to find the needle in the
haystack, we might as well using sampling to make the dataset more
manageable. Adding "--since=1.year.ago" is one way, though we may want
to sample more randomly across time.

> So far, I haven't spotted anything that would want me to disable it,
> while I've spotted several cases where I felt that readability was
> improved. It's somewhat difficult to spot though.

I did find one case that I think is worse. Look at 857942fd1a in the
kernel. It has a pattern like this:

  ... surrounding code ...

  function_one();
  ... more surrounding code ...

which becomes:

  ... surrounding code ...

  function_two();

  ... more surrounding code

Without the new heuristic, that looks like:

  -function_one();
  +function_two();
  +

but with it, it becomes:

  +
  +function_two();

  -function_one();

which is kind of weird. Having the two directly next to each other reads
better to me. This is a pretty unusual diff, though, in that it did
change the surrounding whitespace (and if you look further in the diff,
the identical change is made elsewhere _without_ touching the
whitespace). So this is kind of an anomaly. And IMHO the weirdness here
is outweighed by the vast number of improvements elsewhere.

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-19 15:21 [PATCHv5 0/2] " Stefan Beller
@ 2016-04-19 15:21 ` Stefan Beller
       [not found]   ` <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com>
  0 siblings, 1 reply; 45+ messages in thread
From: Stefan Beller @ 2016-04-19 15:21 UTC (permalink / raw)
  To: gitster, git, jacob.keller, peff; +Cc: Stefan Beller, Jacob Keller

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Implement the following heuristic to (optionally) produce the desired
output.

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. However, as
with any heuristic it is not really known whether this will always be
more optimal. Thus, it can be disabled via diff.compactionHeuristic.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-config.txt  |  5 +++++
 Documentation/diff-options.txt |  6 ++++++
 diff.c                         | 11 +++++++++++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 26 ++++++++++++++++++++++++++
 5 files changed, 50 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index 6eaa452..9bf3e92 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -166,6 +166,11 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.compactionHeuristic::
+	Set this option to enable an experimental heuristic that
+	shifts the hunk boundary in an attempt to make the resulting
+	patch easier to read.
+
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 3ad6404..b513023 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--compaction-heuristic::
+--no-compaction-heuristic::
+	These are to help debugging and tuning an experimental
+	heuristic that shifts the hunk boundary in an attempt to
+	make the resulting patch easier to read.
+
 --minimal::
 	Spend extra time to make sure the smallest possible
 	diff is produced.
diff --git a/diff.c b/diff.c
index f62b7f7..05ca3ce 100644
--- a/diff.c
+++ b/diff.c
@@ -25,6 +25,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_compaction_heuristic = 1;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -181,6 +182,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 	}
 	if (!strcmp(var, "diff.renames")) {
 		diff_detect_rename_default = git_config_rename(var, value);
+		return 0;
+	}
+	if (!strcmp(var, "diff.compactionheuristic")) {
+		diff_compaction_heuristic = git_config_bool(var, value);
 		return 0;
 	}
 	if (!strcmp(var, "diff.autorefreshindex")) {
@@ -3235,6 +3240,8 @@ void diff_setup(struct diff_options *options)
 	options->use_color = diff_use_color_default;
 	options->detect_rename = diff_detect_rename_default;
 	options->xdl_opts |= diff_algorithm;
+	if (diff_compaction_heuristic)
+		DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
 
 	options->orderfile = diff_order_file_cfg;
 
@@ -3712,6 +3719,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--compaction-heuristic"))
+		DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
+	else if (!strcmp(arg, "--no-compaction-heuristic"))
+		DIFF_XDL_CLR(options, COMPACTION_HEURISTIC);
 	else if (!strcmp(arg, "--patience"))
 		options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
 	else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index c033991..d1dbb27 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6 +41,8 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
+#define XDF_COMPACTION_HEURISTIC (1 << 8)
+
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_COMMON (1 << 1)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 748eeb9..b3c6848 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,11 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
+static int is_blank_line(xrecord_t **recs, long ix, long flags)
+{
+	return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags);
+}
+
 static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 {
 	return (recs[ixs]->ha == recs[ix]->ha &&
@@ -411,6 +416,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
+	unsigned int blank_lines;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -444,6 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 
 		do {
 			grpsiz = ix - ixs;
+			blank_lines = 0;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -478,6 +485,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the group.
 			 */
 			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
+				blank_lines += is_blank_line(recs, ix, flags);
+
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
 
@@ -503,6 +512,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			rchg[--ixs] = 1;
 			rchg[--ix] = 0;
 			while (rchgo[--ixo]);
+		}
+
+		/*
+		 * If a group can be moved back and forth, see if there is a
+		 * blank line in the moving space. If there is a blank line,
+		 * make sure the last blank line is the end of the group.
+		 *
+		 * As we already shifted the group forward as far as possible
+		 * in the earlier loop, we need to shift it back only if at all.
+		 */
+		if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
+			while (ixs > 0 &&
+			       !is_blank_line(recs, ix - 1, flags) &&
+			       recs_match(recs, ixs - 1, ix - 1, flags)) {
+				rchg[--ixs] = 1;
+				rchg[--ix] = 0;
+			}
 		}
 	}
 
-- 
2.4.11.2.g96ed4e5.dirty

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-18 19:22       ` Junio C Hamano
@ 2016-04-18 19:33         ` Stefan Beller
  0 siblings, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-18 19:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jacob Keller, Git mailing list, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

On Mon, Apr 18, 2016 at 12:22 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
>
>> I think we're going to make use of xdl_blankline instead of this or
>> our own "is_emptyline"
>
> OK, so perhaps either of you two can do a final version people can
> start having fun with?

Junios proposal seems to be on top of my latest series sent out,
I'll squash it in and send it out as a final version if you don't mind
(though I'll do it later today; currently diving into Gerrits Java)

>
> By the way, I really do not want to see something this low-level to
> be end-user tweakable with "one bit enable/disable"; the end users
> shouldn't have to bother [1].

Ok. Thanks for fixing that mistake.

> I left it in but renamed after "what"
> it enables/disables, not "how" the enabled thing works, to clarify
> that we have this only as a developers' aid.


>
> *1* I am fine with --compaction-heuristic=(shortest|blank|...) that
> allows a choice among many as a developers' aid, but I do not think
> this topic is there yet.

This doesn't bode well with
    > +--compaction-heuristic::
    > +--no-compaction-heuristic::

in the future? I'd rather have
    +--compaction-heuristic=none
    +--compaction-heuristic=lastEmptyLine
such that we don't have to worry about further experiments (or matured
heuristics) later?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-16  1:07     ` Jacob Keller
@ 2016-04-18 19:22       ` Junio C Hamano
  2016-04-18 19:33         ` Stefan Beller
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-04-18 19:22 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Stefan Beller, Git mailing list, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

Jacob Keller <jacob.keller@gmail.com> writes:

> I think we're going to make use of xdl_blankline instead of this or
> our own "is_emptyline"

OK, so perhaps either of you two can do a final version people can
start having fun with?

By the way, I really do not want to see something this low-level to
be end-user tweakable with "one bit enable/disable"; the end users
shouldn't have to bother [1].  I left it in but renamed after "what"
it enables/disables, not "how" the enabled thing works, to clarify
that we have this only as a developers' aid.

*1* I am fine with --compaction-heuristic=(shortest|blank|...) that
allows a choice among many as a developers' aid, but I do not think
this topic is there yet.

 Documentation/diff-config.txt  |  9 ++++-----
 Documentation/diff-options.txt | 10 +++++-----
 diff.c                         | 18 +++++++++---------
 xdiff/xdiff.h                  |  2 +-
 xdiff/xdiffi.c                 | 22 ++++++++++------------
 5 files changed, 29 insertions(+), 32 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index c62745b..9bf3e92 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -166,11 +166,10 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
-diff.shortestLineHeuristic::
-	Set this option to true to enable the shortest line chunk heuristic when
-	producing diff output. This heuristic will attempt to shift hunks such
-	that the last shortest common line occurs below the hunk with the rest of
-	the context above it.
+diff.compactionHeuristic::
+	Set this option to enable an experimental heuristic that
+	shifts the hunk boundary in an attempt to make the resulting
+	patch easier to read.
 
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 238f39c..b513023 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,11 +63,11 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
---shortest-line-heuristic::
---no-shortest-line-heuristic::
-	When possible, shift common shortest line in diff hunks below the hunk
-	such that the last common shortest line for each hunk is below, with the
-	rest of the context above the hunk.
+--compaction-heuristic::
+--no-compaction-heuristic::
+	These are to help debugging and tuning an experimental
+	heuristic that shifts the hunk boundary in an attempt to
+	make the resulting patch easier to read.
 
 --minimal::
 	Spend extra time to make sure the smallest possible
diff --git a/diff.c b/diff.c
index 276174c..02c75c3 100644
--- a/diff.c
+++ b/diff.c
@@ -25,7 +25,7 @@
 #endif
 
 static int diff_detect_rename_default;
-static int diff_shortest_line_heuristic = 0;
+static int diff_compaction_heuristic = 1;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -184,8 +184,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_detect_rename_default = git_config_rename(var, value);
 		return 0;
 	}
-	if (!strcmp(var, "diff.shortestlineheuristic")) {
-		diff_shortest_line_heuristic = git_config_bool(var, value);
+	if (!strcmp(var, "diff.compactionheuristic")) {
+		diff_compaction_heuristic = git_config_bool(var, value);
 		return 0;
 	}
 	if (!strcmp(var, "diff.autorefreshindex")) {
@@ -3240,8 +3240,8 @@ void diff_setup(struct diff_options *options)
 	options->use_color = diff_use_color_default;
 	options->detect_rename = diff_detect_rename_default;
 	options->xdl_opts |= diff_algorithm;
-	if (diff_shortest_line_heuristic)
-		DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
+	if (diff_compaction_heuristic)
+		DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
 
 	options->orderfile = diff_order_file_cfg;
 
@@ -3719,10 +3719,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
-	else if (!strcmp(arg, "--shortest-line-heuristic"))
-		DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
-	else if (!strcmp(arg, "--no-shortest-line-heuristic"))
-		DIFF_XDL_CLR(options, SHORTEST_LINE_HEURISTIC);
+	else if (!strcmp(arg, "--compaction-heuristic"))
+		DIFF_XDL_SET(options, COMPACTION_HEURISTIC);
+	else if (!strcmp(arg, "--no-compaction-heuristic"))
+		DIFF_XDL_CLR(options, COMPACTION_HEURISTIC);
 	else if (!strcmp(arg, "--patience"))
 		options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
 	else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 968ac62..d1dbb27 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,7 +41,7 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
-#define XDF_SHORTEST_LINE_HEURISTIC (1 << 8)
+#define XDF_COMPACTION_HEURISTIC (1 << 8)
 
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_COMMON (1 << 1)
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 7d15b26..1ec46e0 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,10 +400,9 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
-static int line_length(const char *recs)
+static int is_blank_line(xrecord_t **recs, long ix, long flags)
 {
-	char *s = strchr(recs, '\n');
-	return s ? s - recs : strlen(recs);
+	return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags);
 }
 
 static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
@@ -417,7 +416,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
-	unsigned int shortest_line;
+	unsigned int blank_lines;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -451,7 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 
 		do {
 			grpsiz = ix - ixs;
-			shortest_line = UINT_MAX;
+			blank_lines = 0;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -486,9 +485,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the group.
 			 */
 			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
-				int l = line_length(recs[ix]->ptr);
-				if (l < shortest_line)
-					shortest_line = l;
+
+				blank_lines += is_blank_line(recs, ix, flags);
 
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
@@ -519,15 +517,15 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 
 		/*
 		 * If a group can be moved back and forth, see if there is an
-		 * empty line in the moving space. If there is an empty line,
-		 * make sure the last empty line is the end of the group.
+		 * blank line in the moving space. If there is a blank line,
+		 * make sure the last blank line is the end of the group.
 		 *
 		 * As we shifted the group forward as far as possible, we only
 		 * need to shift it back if at all.
 		 */
-		if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) {
+		if ((flags & XDF_COMPACTION_HEURISTIC)) {
 			while (ixs > 0 &&
-			       line_length(recs[ix - 1]->ptr) > shortest_line &&
+			       !is_blank_line(recs, ix - 1, flags) &&
 			       recs_match(recs, ixs - 1, ix - 1, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
-- 
2.8.1-399-g96b3b3a

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-16  0:49   ` Junio C Hamano
  2016-04-16  0:59     ` Stefan Beller
@ 2016-04-16  1:07     ` Jacob Keller
  2016-04-18 19:22       ` Junio C Hamano
  1 sibling, 1 reply; 45+ messages in thread
From: Jacob Keller @ 2016-04-16  1:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Stefan Beller, Git mailing list, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

On Fri, Apr 15, 2016 at 5:49 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +static int line_length(const char *recs)
>> +{
>> +     char *s = strchr(recs, '\n');
>> +     return s ? s - recs : strlen(recs);
>> +}
>
> It seems that you guys are discarding this "number of bytes on a
> line, no matter what these bytes are" idea, so this may be moot, but
> is there a guarantee that reading through recs until you happen to
> see a NUL is safe?
>
> Shouldn't the code that accesses a "line" be using the same "from
> here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
> having to scan the underlying string in an unbounded way?
>
>

I think we're going to make use of xdl_blankline instead of this or
our own "is_emptyline"

Thanks,
Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-16  0:49   ` Junio C Hamano
@ 2016-04-16  0:59     ` Stefan Beller
  2016-04-16  1:07     ` Jacob Keller
  1 sibling, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-16  0:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jacob Keller, git, Jeff King, Jens Lehmann, Davide Libenzi, Jacob Keller

On Fri, Apr 15, 2016 at 5:49 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> +static int line_length(const char *recs)
>> +{
>> +     char *s = strchr(recs, '\n');
>> +     return s ? s - recs : strlen(recs);
>> +}
>
> It seems that you guys are discarding this "number of bytes on a
> line, no matter what these bytes are" idea, so this may be moot, but
> is there a guarantee that reading through recs until you happen to
> see a NUL is safe?

We discarded this idea as it produces to many errors.
(We'd be back at the 50:50 case, "is it really worth it?")

We will go back to the "empty line" heuristic, which will be solved
via xdl_blankline(rec[i]->ptr, rec[i]->size, flags); which could be inlined.
That will solve the CRLF issue as a CR is covered as a whitespace
(with CRLF you'd have to specify diff to ignore white spaces).

For the safety I assumed
* there is always a \n even on the last line by convention.
* in case it is not, the string is null terminated, hence
  strchr and strlen for the rescue.

>
> Shouldn't the code that accesses a "line" be using the same "from
> here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
> having to scan the underlying string in an unbounded way?

xdl_blankline will use ->size, so we'll be holding it right.

Thanks,
Stefan

>
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller
  2016-04-15 23:05   ` Jacob Keller
@ 2016-04-16  0:49   ` Junio C Hamano
  2016-04-16  0:59     ` Stefan Beller
  2016-04-16  1:07     ` Jacob Keller
  1 sibling, 2 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-04-16  0:49 UTC (permalink / raw)
  To: Stefan Beller
  Cc: jacob.keller, git, peff, Jens.Lehmann, davidel, Jacob Keller

Stefan Beller <sbeller@google.com> writes:

> +static int line_length(const char *recs)
> +{
> +	char *s = strchr(recs, '\n');
> +	return s ? s - recs : strlen(recs);
> +}

It seems that you guys are discarding this "number of bytes on a
line, no matter what these bytes are" idea, so this may be moot, but
is there a guarantee that reading through recs until you happen to
see a NUL is safe?

Shouldn't the code that accesses a "line" be using the same "from
here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid
having to scan the underlying string in an unbounded way?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 23:32     ` Jacob Keller
@ 2016-04-15 23:45       ` Stefan Beller
  0 siblings, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-15 23:45 UTC (permalink / raw)
  To: Jacob Keller
  Cc: Git mailing list, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

On Fri, Apr 15, 2016 at 4:32 PM, Jacob Keller <jacob.keller@gmail.com> wrote:
> On Fri, Apr 15, 2016 at 4:05 PM, Jacob Keller <jacob.keller@gmail.com> wrote:
>> There's a few places that will need cleaning up (comments and such)
>> that mention empty line still, but that's not surprising. I am going
>> to test this for a bit on my local repos, and see if it makes any
>> difference to the old heuristic as well.
>>
>> Thanks,
>> Jake
>>
>
> I ran this heuristic on git.git and it produces tons of false positive
> transforms which are much lease readable (to me at least), far more
> than those produced by the newline/blank link heuristic did.
>
> I think we should stick with the empty line heuristic instead of this
> version, even if it's easier to implement this version.

I agree. The heuristic is worse as we often have these 50:50 chances
of messing stuff up.

>
> We still would need to figure out how to handle CRLF properly but it's
> worth resolving that than this heuristic is.
>
> Thanks,
> Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 23:05   ` Jacob Keller
@ 2016-04-15 23:32     ` Jacob Keller
  2016-04-15 23:45       ` Stefan Beller
  0 siblings, 1 reply; 45+ messages in thread
From: Jacob Keller @ 2016-04-15 23:32 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git mailing list, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

On Fri, Apr 15, 2016 at 4:05 PM, Jacob Keller <jacob.keller@gmail.com> wrote:
> There's a few places that will need cleaning up (comments and such)
> that mention empty line still, but that's not surprising. I am going
> to test this for a bit on my local repos, and see if it makes any
> difference to the old heuristic as well.
>
> Thanks,
> Jake
>

I ran this heuristic on git.git and it produces tons of false positive
transforms which are much lease readable (to me at least), far more
than those produced by the newline/blank link heuristic did.

I think we should stick with the empty line heuristic instead of this
version, even if it's easier to implement this version.

We still would need to figure out how to handle CRLF properly but it's
worth resolving that than this heuristic is.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller
@ 2016-04-15 23:05   ` Jacob Keller
  2016-04-15 23:32     ` Jacob Keller
  2016-04-16  0:49   ` Junio C Hamano
  1 sibling, 1 reply; 45+ messages in thread
From: Jacob Keller @ 2016-04-15 23:05 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Git mailing list, Junio C Hamano, Jeff King, Jens Lehmann,
	Davide Libenzi, Jacob Keller

On Fri, Apr 15, 2016 at 4:01 PM, Stefan Beller <sbeller@google.com> wrote:
> In order to produce the smallest possible diff and combine several diff
> hunks together, we implement a heuristic from GNU Diff which moves diff
> hunks forward as far as possible when we find common context above and
> below a diff hunk. This sometimes produces less readable diffs when
> writing C, Shell, or other programming languages, ie:
>
> ...
>  /*
> + *
> + *
> + */
> +
> +/*
> ...
>
> instead of the more readable equivalent of
>
> ...
> +/*
> + *
> + *
> + */
> +
>  /*
> ...
>
> Original discussion and testing found the following heuristic to be
> producing the desired output:
>
>   If there are diff chunks which can be shifted around, shift each hunk
>   such that the last common empty line is below the chunk with the rest
>   of the context above.
>
> This heuristic appears to resolve the above example and several other
> common issues without producing significantly weird results. When
> implementing this heuristic the handling of empty lines was awkward as
> it is unclear what an empty line is. ('\n' or do we include "\r\n" as it
> is common on Windows?) Instead we implement a slightly different heuristic:
>
>   If there are diff chunks which can be shifted around, find the shortest
>   line in the overlapping parts. Use the line with the shortest length that
>   occurs last as the last line of the chunk with the rest
>   of the context above.
>
> However, as with any heuristic it is not really known whether this will
> always be more optimal. Thus, leave the heuristic disabled by default.
>
> Add an XDIFF flag to enable this heuristic only conditionally. Add
> a diff command line option and diff configuration option to allow users
> to enable this option when desired.
>
> TODO:
> * Add tests
> * Add better/more documentation explaining the heuristic, possibly with
>   examples(?)
> * better name(?)
>

There's a few places that will need cleaning up (comments and such)
that mention empty line still, but that's not surprising. I am going
to test this for a bit on my local repos, and see if it makes any
difference to the old heuristic as well.

Thanks,
Jake

> Signed-off-by: Stefan Beller <sbeller@google.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>  Documentation/diff-config.txt  |  6 ++++++
>  Documentation/diff-options.txt |  6 ++++++
>  diff.c                         | 11 +++++++++++
>  xdiff/xdiff.h                  |  2 ++
>  xdiff/xdiffi.c                 | 29 +++++++++++++++++++++++++++++
>  5 files changed, 54 insertions(+)
>
> diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
> index edba565..3d99a90 100644
> --- a/Documentation/diff-config.txt
> +++ b/Documentation/diff-config.txt
> @@ -170,6 +170,12 @@ diff.tool::
>
>  include::mergetools-diff.txt[]
>
> +diff.shortestLineHeuristic::
> +       Set this option to true to enable the shortest line chunk heuristic when
> +       producing diff output. This heuristic will attempt to shift hunks such
> +       that the last shortest common line occurs below the hunk with the rest of
> +       the context above it.
> +
>  diff.algorithm::
>         Choose a diff algorithm.  The variants are as follows:
>  +
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index 4b0318e..b1ca83d 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -63,6 +63,12 @@ ifndef::git-format-patch[]
>         Synonym for `-p --raw`.
>  endif::git-format-patch[]
>
> +--shortest-line-heuristic::
> +--no-shortest-line-heuristic::
> +       When possible, shift common shortest line in diff hunks below the hunk
> +       such that the last common shortest line for each hunk is below, with the
> +       rest of the context above the hunk.
> +
>  --minimal::
>         Spend extra time to make sure the smallest possible
>         diff is produced.
> diff --git a/diff.c b/diff.c
> index 4dfe660..a02aff9 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -26,6 +26,7 @@
>  #endif
>
>  static int diff_detect_rename_default;
> +static int diff_shortest_line_heuristic = 0;
>  static int diff_rename_limit_default = 400;
>  static int diff_suppress_blank_empty;
>  static int diff_use_color_default = -1;
> @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
>                 diff_detect_rename_default = git_config_rename(var, value);
>                 return 0;
>         }
> +       if (!strcmp(var, "diff.shortestlineheuristic")) {
> +               diff_shortest_line_heuristic = git_config_bool(var, value);
> +               return 0;
> +       }
>         if (!strcmp(var, "diff.autorefreshindex")) {
>                 diff_auto_refresh_index = git_config_bool(var, value);
>                 return 0;
> @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
>         options->use_color = diff_use_color_default;
>         options->detect_rename = diff_detect_rename_default;
>         options->xdl_opts |= diff_algorithm;
> +       if (diff_shortest_line_heuristic)
> +               DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
>
>         options->orderfile = diff_order_file_cfg;
>
> @@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
>                 DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
>         else if (!strcmp(arg, "--ignore-blank-lines"))
>                 DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
> +       else if (!strcmp(arg, "--shortest-line-heuristic"))
> +               DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
> +       else if (!strcmp(arg, "--no-shortest-line-heuristic"))
> +               DIFF_XDL_CLR(options, SHORTEST_LINE_HEURISTIC);
>         else if (!strcmp(arg, "--patience"))
>                 options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
>         else if (!strcmp(arg, "--histogram"))
> diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
> index 4fb7e79..e1f8ec0 100644
> --- a/xdiff/xdiff.h
> +++ b/xdiff/xdiff.h
> @@ -41,6 +41,8 @@ extern "C" {
>
>  #define XDF_IGNORE_BLANK_LINES (1 << 7)
>
> +#define XDF_SHORTEST_LINE_HEURISTIC (1 << 8)
> +
>  #define XDL_EMIT_FUNCNAMES (1 << 0)
>  #define XDL_EMIT_FUNCCONTEXT (1 << 2)
>
> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index 748eeb9..7d15b26 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -400,6 +400,12 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
>  }
>
>
> +static int line_length(const char *recs)
> +{
> +       char *s = strchr(recs, '\n');
> +       return s ? s - recs : strlen(recs);
> +}
> +
>  static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
>  {
>         return (recs[ixs]->ha == recs[ix]->ha &&
> @@ -411,6 +417,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
>  int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>         long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
>         char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
> +       unsigned int shortest_line;
>         xrecord_t **recs = xdf->recs;
>
>         /*
> @@ -444,6 +451,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>
>                 do {
>                         grpsiz = ix - ixs;
> +                       shortest_line = UINT_MAX;
>
>                         /*
>                          * If the line before the current change group, is equal to
> @@ -478,6 +486,10 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>                          * the group.
>                          */
>                         while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
> +                               int l = line_length(recs[ix]->ptr);
> +                               if (l < shortest_line)
> +                                       shortest_line = l;
> +
>                                 rchg[ixs++] = 0;
>                                 rchg[ix++] = 1;
>
> @@ -504,6 +516,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
>                         rchg[--ix] = 0;
>                         while (rchgo[--ixo]);
>                 }
> +
> +               /*
> +                * If a group can be moved back and forth, see if there is an
> +                * empty line in the moving space. If there is an empty line,
> +                * make sure the last empty line is the end of the group.
> +                *
> +                * As we shifted the group forward as far as possible, we only
> +                * need to shift it back if at all.
> +                */
> +               if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) {
> +                       while (ixs > 0 &&
> +                              line_length(recs[ix - 1]->ptr) > shortest_line &&
> +                              recs_match(recs, ixs - 1, ix - 1, flags)) {
> +                               rchg[--ixs] = 1;
> +                               rchg[--ix] = 0;
> +                       }
> +               }
>         }
>
>         return 0;
> --
> 2.8.1.189.gd13d43c
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH 2/2] xdiff: implement empty line chunk heuristic
  2016-04-15 23:01 [RFC PATCH, WAS: "weird diff output?" v3a 0/2] implement shortest line diff " Stefan Beller
@ 2016-04-15 23:01 ` Stefan Beller
  2016-04-15 23:05   ` Jacob Keller
  2016-04-16  0:49   ` Junio C Hamano
  0 siblings, 2 replies; 45+ messages in thread
From: Stefan Beller @ 2016-04-15 23:01 UTC (permalink / raw)
  To: jacob.keller
  Cc: git, gitster, peff, Jens.Lehmann, davidel, Stefan Beller, Jacob Keller

In order to produce the smallest possible diff and combine several diff
hunks together, we implement a heuristic from GNU Diff which moves diff
hunks forward as far as possible when we find common context above and
below a diff hunk. This sometimes produces less readable diffs when
writing C, Shell, or other programming languages, ie:

...
 /*
+ *
+ *
+ */
+
+/*
...

instead of the more readable equivalent of

...
+/*
+ *
+ *
+ */
+
 /*
...

Original discussion and testing found the following heuristic to be
producing the desired output:

  If there are diff chunks which can be shifted around, shift each hunk
  such that the last common empty line is below the chunk with the rest
  of the context above.

This heuristic appears to resolve the above example and several other
common issues without producing significantly weird results. When
implementing this heuristic the handling of empty lines was awkward as
it is unclear what an empty line is. ('\n' or do we include "\r\n" as it
is common on Windows?) Instead we implement a slightly different heuristic:

  If there are diff chunks which can be shifted around, find the shortest
  line in the overlapping parts. Use the line with the shortest length that
  occurs last as the last line of the chunk with the rest
  of the context above.

However, as with any heuristic it is not really known whether this will
always be more optimal. Thus, leave the heuristic disabled by default.

Add an XDIFF flag to enable this heuristic only conditionally. Add
a diff command line option and diff configuration option to allow users
to enable this option when desired.

TODO:
* Add tests
* Add better/more documentation explaining the heuristic, possibly with
  examples(?)
* better name(?)

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 Documentation/diff-config.txt  |  6 ++++++
 Documentation/diff-options.txt |  6 ++++++
 diff.c                         | 11 +++++++++++
 xdiff/xdiff.h                  |  2 ++
 xdiff/xdiffi.c                 | 29 +++++++++++++++++++++++++++++
 5 files changed, 54 insertions(+)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index edba565..3d99a90 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -170,6 +170,12 @@ diff.tool::
 
 include::mergetools-diff.txt[]
 
+diff.shortestLineHeuristic::
+	Set this option to true to enable the shortest line chunk heuristic when
+	producing diff output. This heuristic will attempt to shift hunks such
+	that the last shortest common line occurs below the hunk with the rest of
+	the context above it.
+
 diff.algorithm::
 	Choose a diff algorithm.  The variants are as follows:
 +
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 4b0318e..b1ca83d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -63,6 +63,12 @@ ifndef::git-format-patch[]
 	Synonym for `-p --raw`.
 endif::git-format-patch[]
 
+--shortest-line-heuristic::
+--no-shortest-line-heuristic::
+	When possible, shift common shortest line in diff hunks below the hunk
+	such that the last common shortest line for each hunk is below, with the
+	rest of the context above the hunk.
+
 --minimal::
 	Spend extra time to make sure the smallest possible
 	diff is produced.
diff --git a/diff.c b/diff.c
index 4dfe660..a02aff9 100644
--- a/diff.c
+++ b/diff.c
@@ -26,6 +26,7 @@
 #endif
 
 static int diff_detect_rename_default;
+static int diff_shortest_line_heuristic = 0;
 static int diff_rename_limit_default = 400;
 static int diff_suppress_blank_empty;
 static int diff_use_color_default = -1;
@@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb)
 		diff_detect_rename_default = git_config_rename(var, value);
 		return 0;
 	}
+	if (!strcmp(var, "diff.shortestlineheuristic")) {
+		diff_shortest_line_heuristic = git_config_bool(var, value);
+		return 0;
+	}
 	if (!strcmp(var, "diff.autorefreshindex")) {
 		diff_auto_refresh_index = git_config_bool(var, value);
 		return 0;
@@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options)
 	options->use_color = diff_use_color_default;
 	options->detect_rename = diff_detect_rename_default;
 	options->xdl_opts |= diff_algorithm;
+	if (diff_shortest_line_heuristic)
+		DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
 
 	options->orderfile = diff_order_file_cfg;
 
@@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options,
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--ignore-blank-lines"))
 		DIFF_XDL_SET(options, IGNORE_BLANK_LINES);
+	else if (!strcmp(arg, "--shortest-line-heuristic"))
+		DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC);
+	else if (!strcmp(arg, "--no-shortest-line-heuristic"))
+		DIFF_XDL_CLR(options, SHORTEST_LINE_HEURISTIC);
 	else if (!strcmp(arg, "--patience"))
 		options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF);
 	else if (!strcmp(arg, "--histogram"))
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 4fb7e79..e1f8ec0 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -41,6 +41,8 @@ extern "C" {
 
 #define XDF_IGNORE_BLANK_LINES (1 << 7)
 
+#define XDF_SHORTEST_LINE_HEURISTIC (1 << 8)
+
 #define XDL_EMIT_FUNCNAMES (1 << 0)
 #define XDL_EMIT_FUNCCONTEXT (1 << 2)
 
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 748eeb9..7d15b26 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,6 +400,12 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
 }
 
 
+static int line_length(const char *recs)
+{
+	char *s = strchr(recs, '\n');
+	return s ? s - recs : strlen(recs);
+}
+
 static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 {
 	return (recs[ixs]->ha == recs[ix]->ha &&
@@ -411,6 +417,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
 int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
+	unsigned int shortest_line;
 	xrecord_t **recs = xdf->recs;
 
 	/*
@@ -444,6 +451,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 
 		do {
 			grpsiz = ix - ixs;
+			shortest_line = UINT_MAX;
 
 			/*
 			 * If the line before the current change group, is equal to
@@ -478,6 +486,10 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			 * the group.
 			 */
 			while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
+				int l = line_length(recs[ix]->ptr);
+				if (l < shortest_line)
+					shortest_line = l;
+
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
 
@@ -504,6 +516,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 			rchg[--ix] = 0;
 			while (rchgo[--ixo]);
 		}
+
+		/*
+		 * If a group can be moved back and forth, see if there is an
+		 * empty line in the moving space. If there is an empty line,
+		 * make sure the last empty line is the end of the group.
+		 *
+		 * As we shifted the group forward as far as possible, we only
+		 * need to shift it back if at all.
+		 */
+		if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) {
+			while (ixs > 0 &&
+			       line_length(recs[ix - 1]->ptr) > shortest_line &&
+			       recs_match(recs, ixs - 1, ix - 1, flags)) {
+				rchg[--ixs] = 1;
+				rchg[--ix] = 0;
+			}
+		}
 	}
 
 	return 0;
-- 
2.8.1.189.gd13d43c

^ permalink raw reply related	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-05-03 17:55 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-18 21:12 [PATCH 0/2 v4] xdiff: implement empty line chunk heuristic Stefan Beller
2016-04-18 21:12 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller
2016-04-18 21:12 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller
2016-04-18 22:04   ` Jacob Keller
2016-04-18 22:24     ` Junio C Hamano
2016-04-19  5:03   ` Jeff King
2016-04-19  6:47     ` Stefan Beller
2016-04-19  7:00       ` Jeff King
2016-04-19  7:05         ` Stefan Beller
2016-04-19 15:17     ` Stefan Beller
2016-04-19 17:06       ` Jeff King
2016-04-19 23:02         ` Jacob Keller
2016-04-19 23:07           ` Junio C Hamano
2016-04-20 13:12             ` Michael S. Tsirkin
2016-04-20 16:09               ` Junio C Hamano
2016-04-20 16:17                 ` Jeff King
2016-04-20  6:00         ` Junio C Hamano
2016-04-19 16:51     ` Junio C Hamano
2016-04-18 21:22 ` [PATCH 0/2 v4] " Junio C Hamano
2016-04-18 23:53   ` Stefan Beller
  -- strict thread matches above, loose matches on Subject: below --
2016-04-19 15:21 [PATCHv5 0/2] " Stefan Beller
2016-04-19 15:21 ` [PATCH 2/2] " Stefan Beller
     [not found]   ` <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com>
2016-04-20  4:18     ` Jeff King
2016-04-20  4:37       ` Jeff King
2016-04-20  4:37       ` Stefan Beller
2016-04-29 20:29       ` Junio C Hamano
2016-04-29 20:59         ` Jacob Keller
2016-04-29 22:18           ` Junio C Hamano
2016-04-29 22:35             ` Stefan Beller
2016-04-29 22:39               ` Keller, Jacob E
2016-04-29 22:44                 ` Stefan Beller
2016-04-29 22:48                   ` Keller, Jacob E
2016-05-02 17:40                     ` Junio C Hamano
2016-05-02 17:45                       ` Stefan Beller
2016-05-02 18:02                       ` Jeff King
2016-05-03 17:55                         ` Jacob Keller
2016-04-30  3:06               ` Jeff King
2016-04-15 23:01 [RFC PATCH, WAS: "weird diff output?" v3a 0/2] implement shortest line diff " Stefan Beller
2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller
2016-04-15 23:05   ` Jacob Keller
2016-04-15 23:32     ` Jacob Keller
2016-04-15 23:45       ` Stefan Beller
2016-04-16  0:49   ` Junio C Hamano
2016-04-16  0:59     ` Stefan Beller
2016-04-16  1:07     ` Jacob Keller
2016-04-18 19:22       ` Junio C Hamano
2016-04-18 19:33         ` Stefan Beller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.