* [PATCHv5 0/2] xdiff: implement empty line chunk heuristic @ 2016-04-19 15:21 Stefan Beller 2016-04-19 15:21 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller ` (2 more replies) 0 siblings, 3 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-19 15:21 UTC (permalink / raw) To: gitster, git, jacob.keller, peff; +Cc: Stefan Beller Thanks Jeff for pointing out issues in the comment! Thanks, Stefan diff to origin/jk/diff-compact-heuristic: diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 5a02b15..b3c6848 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -515,12 +515,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { } /* - * If a group can be moved back and forth, see if there is an + * If a group can be moved back and forth, see if there is a * blank line in the moving space. If there is a blank line, * make sure the last blank line is the end of the group. * - * As we shifted the group forward as far as possible, we only - * need to shift it back if at all. + * As we already shifted the group forward as far as possible + * in the earlier loop, we need to shift it back only if at all. */ if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) { while (ixs > 0 && Jacob Keller (1): xdiff: add recs_match helper function Stefan Beller (1): xdiff: implement empty line chunk heuristic Documentation/diff-config.txt | 5 +++++ Documentation/diff-options.txt | 6 ++++++ diff.c | 11 +++++++++++ xdiff/xdiff.h | 2 ++ xdiff/xdiffi.c | 40 ++++++++++++++++++++++++++++++++++++---- 5 files changed, 60 insertions(+), 4 deletions(-) -- 2.4.11.2.g96ed4e5.dirty ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 1/2] xdiff: add recs_match helper function 2016-04-19 15:21 [PATCHv5 0/2] xdiff: implement empty line chunk heuristic Stefan Beller @ 2016-04-19 15:21 ` Stefan Beller 2016-04-19 15:21 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller 2016-04-19 16:23 ` [PATCHv5 0/2] " Junio C Hamano 2 siblings, 0 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-19 15:21 UTC (permalink / raw) To: gitster, git, jacob.keller, peff; +Cc: Jacob Keller, Stefan Beller From: Jacob Keller <jacob.keller@gmail.com> It is a common pattern in xdl_change_compact to check that hashes and strings match. The resulting code to perform this change causes very long lines and makes it hard to follow the intention. Introduce a helper function recs_match which performs both checks to increase code readability. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- xdiff/xdiffi.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 2358a2d..748eeb9 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -400,6 +400,14 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, } +static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) +{ + return (recs[ixs]->ha == recs[ix]->ha && + xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, + recs[ix]->ptr, recs[ix]->size, + flags)); +} + int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec; char *rchg = xdf->rchg, *rchgo = xdfo->rchg; @@ -442,8 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { * the last line of the current change group, shift backward * the group. */ - while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha && - xdl_recmatch(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) { + while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, flags)) { rchg[--ixs] = 1; rchg[--ix] = 0; @@ -470,8 +477,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { * the line next of the current change group, shift forward * the group. */ - while (ix < nrec && recs[ixs]->ha == recs[ix]->ha && - xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) { + while (ix < nrec && recs_match(recs, ixs, ix, flags)) { rchg[ixs++] = 0; rchg[ix++] = 1; -- 2.4.11.2.g96ed4e5.dirty ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 15:21 [PATCHv5 0/2] xdiff: implement empty line chunk heuristic Stefan Beller 2016-04-19 15:21 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller @ 2016-04-19 15:21 ` Stefan Beller [not found] ` <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com> 2016-04-19 16:23 ` [PATCHv5 0/2] " Junio C Hamano 2 siblings, 1 reply; 44+ messages in thread From: Stefan Beller @ 2016-04-19 15:21 UTC (permalink / raw) To: gitster, git, jacob.keller, peff; +Cc: Stefan Beller, Jacob Keller In order to produce the smallest possible diff and combine several diff hunks together, we implement a heuristic from GNU Diff which moves diff hunks forward as far as possible when we find common context above and below a diff hunk. This sometimes produces less readable diffs when writing C, Shell, or other programming languages, ie: ... /* + * + * + */ + +/* ... instead of the more readable equivalent of ... +/* + * + * + */ + /* ... Implement the following heuristic to (optionally) produce the desired output. If there are diff chunks which can be shifted around, shift each hunk such that the last common empty line is below the chunk with the rest of the context above. This heuristic appears to resolve the above example and several other common issues without producing significantly weird results. However, as with any heuristic it is not really known whether this will always be more optimal. Thus, it can be disabled via diff.compactionHeuristic. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Stefan Beller <sbeller@google.com> --- Documentation/diff-config.txt | 5 +++++ Documentation/diff-options.txt | 6 ++++++ diff.c | 11 +++++++++++ xdiff/xdiff.h | 2 ++ xdiff/xdiffi.c | 26 ++++++++++++++++++++++++++ 5 files changed, 50 insertions(+) diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt index 6eaa452..9bf3e92 100644 --- a/Documentation/diff-config.txt +++ b/Documentation/diff-config.txt @@ -166,6 +166,11 @@ diff.tool:: include::mergetools-diff.txt[] +diff.compactionHeuristic:: + Set this option to enable an experimental heuristic that + shifts the hunk boundary in an attempt to make the resulting + patch easier to read. + diff.algorithm:: Choose a diff algorithm. The variants are as follows: + diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 3ad6404..b513023 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -63,6 +63,12 @@ ifndef::git-format-patch[] Synonym for `-p --raw`. endif::git-format-patch[] +--compaction-heuristic:: +--no-compaction-heuristic:: + These are to help debugging and tuning an experimental + heuristic that shifts the hunk boundary in an attempt to + make the resulting patch easier to read. + --minimal:: Spend extra time to make sure the smallest possible diff is produced. diff --git a/diff.c b/diff.c index f62b7f7..05ca3ce 100644 --- a/diff.c +++ b/diff.c @@ -25,6 +25,7 @@ #endif static int diff_detect_rename_default; +static int diff_compaction_heuristic = 1; static int diff_rename_limit_default = 400; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; @@ -181,6 +182,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb) } if (!strcmp(var, "diff.renames")) { diff_detect_rename_default = git_config_rename(var, value); + return 0; + } + if (!strcmp(var, "diff.compactionheuristic")) { + diff_compaction_heuristic = git_config_bool(var, value); return 0; } if (!strcmp(var, "diff.autorefreshindex")) { @@ -3235,6 +3240,8 @@ void diff_setup(struct diff_options *options) options->use_color = diff_use_color_default; options->detect_rename = diff_detect_rename_default; options->xdl_opts |= diff_algorithm; + if (diff_compaction_heuristic) + DIFF_XDL_SET(options, COMPACTION_HEURISTIC); options->orderfile = diff_order_file_cfg; @@ -3712,6 +3719,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac) DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL); else if (!strcmp(arg, "--ignore-blank-lines")) DIFF_XDL_SET(options, IGNORE_BLANK_LINES); + else if (!strcmp(arg, "--compaction-heuristic")) + DIFF_XDL_SET(options, COMPACTION_HEURISTIC); + else if (!strcmp(arg, "--no-compaction-heuristic")) + DIFF_XDL_CLR(options, COMPACTION_HEURISTIC); else if (!strcmp(arg, "--patience")) options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF); else if (!strcmp(arg, "--histogram")) diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index c033991..d1dbb27 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -41,6 +41,8 @@ extern "C" { #define XDF_IGNORE_BLANK_LINES (1 << 7) +#define XDF_COMPACTION_HEURISTIC (1 << 8) + #define XDL_EMIT_FUNCNAMES (1 << 0) #define XDL_EMIT_COMMON (1 << 1) #define XDL_EMIT_FUNCCONTEXT (1 << 2) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 748eeb9..b3c6848 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -400,6 +400,11 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, } +static int is_blank_line(xrecord_t **recs, long ix, long flags) +{ + return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags); +} + static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) { return (recs[ixs]->ha == recs[ix]->ha && @@ -411,6 +416,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec; char *rchg = xdf->rchg, *rchgo = xdfo->rchg; + unsigned int blank_lines; xrecord_t **recs = xdf->recs; /* @@ -444,6 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { do { grpsiz = ix - ixs; + blank_lines = 0; /* * If the line before the current change group, is equal to @@ -478,6 +485,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { * the group. */ while (ix < nrec && recs_match(recs, ixs, ix, flags)) { + blank_lines += is_blank_line(recs, ix, flags); + rchg[ixs++] = 0; rchg[ix++] = 1; @@ -503,6 +512,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { rchg[--ixs] = 1; rchg[--ix] = 0; while (rchgo[--ixo]); + } + + /* + * If a group can be moved back and forth, see if there is a + * blank line in the moving space. If there is a blank line, + * make sure the last blank line is the end of the group. + * + * As we already shifted the group forward as far as possible + * in the earlier loop, we need to shift it back only if at all. + */ + if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) { + while (ixs > 0 && + !is_blank_line(recs, ix - 1, flags) && + recs_match(recs, ixs - 1, ix - 1, flags)) { + rchg[--ixs] = 1; + rchg[--ix] = 0; + } } } -- 2.4.11.2.g96ed4e5.dirty ^ permalink raw reply related [flat|nested] 44+ messages in thread
[parent not found: <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com>]
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic [not found] ` <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com> @ 2016-04-20 4:18 ` Jeff King 2016-04-20 4:37 ` Jeff King ` (2 more replies) 0 siblings, 3 replies; 44+ messages in thread From: Jeff King @ 2016-04-20 4:18 UTC (permalink / raw) To: Jacob Keller Cc: Stefan Beller, Junio C Hamano, Git mailing list, Jacob Keller [your original probably didn't make it to the list because of its 5MB attachment; the list has a 100K limit; I'll try to quote liberally] On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote: > I ran this version of the patch against the entire Linux kernel > history, as I figured this has a large batch of C code to try and spot > any issues. > > I ran something like the following command in bash > > $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git > show --format="commit %H" --no-compaction-heuristic $rev) <(git show > --format="commit %H" --compaction-heuristic $rev); done > > heuristic.patch My earlier tests with the perl script were all done with "git log -p", which will not show anything at all for merges (and my script wouldn't know how to deal with combined diffs anyway). But I think this new patch _will_ kick in for combined diffs (because it is built on individual diffs). It will be interesting to see if this has any effect there, and what it looks like. We should be able to see it (on a small enough repository) with: git log --format='commit %H' --cc --merges and comparing the before/after. > I've attached the file that I generated for the Linux history, it's > rather large so hopefully I can get some help to spot any differences. > The above approach will work for pretty much any repository, and works > better than trying to generate the entire thing first and then diff > (since that runs out of memory pretty fast). I don't think there is much point in generating a complete diff between the patches for every commit, when nobody can look at the whole thing. Unless we have automated tooling to find "interesting" bits (and certainly a tool to remove the boring "a comment got shifted by one" lines would help; those are all known improvements, but it's the _other_ stuff we want to look). But if we are not using automated tooling to find the needle in the haystack, we might as well using sampling to make the dataset more manageable. Adding "--since=1.year.ago" is one way, though we may want to sample more randomly across time. > So far, I haven't spotted anything that would want me to disable it, > while I've spotted several cases where I felt that readability was > improved. It's somewhat difficult to spot though. I did find one case that I think is worse. Look at 857942fd1a in the kernel. It has a pattern like this: ... surrounding code ... function_one(); ... more surrounding code ... which becomes: ... surrounding code ... function_two(); ... more surrounding code Without the new heuristic, that looks like: -function_one(); +function_two(); + but with it, it becomes: + +function_two(); -function_one(); which is kind of weird. Having the two directly next to each other reads better to me. This is a pretty unusual diff, though, in that it did change the surrounding whitespace (and if you look further in the diff, the identical change is made elsewhere _without_ touching the whitespace). So this is kind of an anomaly. And IMHO the weirdness here is outweighed by the vast number of improvements elsewhere. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-20 4:18 ` Jeff King @ 2016-04-20 4:37 ` Jeff King 2016-04-20 4:37 ` Stefan Beller 2016-04-29 20:29 ` Junio C Hamano 2 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2016-04-20 4:37 UTC (permalink / raw) To: Jacob Keller Cc: Stefan Beller, Junio C Hamano, Git mailing list, Jacob Keller On Wed, Apr 20, 2016 at 12:18:27AM -0400, Jeff King wrote: > My earlier tests with the perl script were all done with "git log -p", > which will not show anything at all for merges (and my script wouldn't > know how to deal with combined diffs anyway). But I think this new patch > _will_ kick in for combined diffs (because it is built on individual > diffs). It will be interesting to see if this has any effect there, and > what it looks like. > > We should be able to see it (on a small enough repository) with: > > git log --format='commit %H' --cc --merges > > and comparing the before/after. Add in "-p" if you are testing the tip of jk/diff-compact-heuristic. It is based on the older maintenance track in which "--cc" does not imply "-p". Looking over the results, it's about what you'd expect (comment blocks shifted by one as we want, and then there happens to be a one-line conflict resolved later in the hunk). The most interesting thing I found was db65f0fc3b1e. There we have two functions being added in the same spot, and the resolution obviously is to put one after the other. So both sides do the usual comment-block thing, and the resulting combined diff carries through that improvement as you'd expect. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-20 4:18 ` Jeff King 2016-04-20 4:37 ` Jeff King @ 2016-04-20 4:37 ` Stefan Beller 2016-04-29 20:29 ` Junio C Hamano 2 siblings, 0 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-20 4:37 UTC (permalink / raw) To: Jeff King; +Cc: Jacob Keller, Junio C Hamano, Git mailing list, Jacob Keller On Tue, Apr 19, 2016 at 9:18 PM, Jeff King <peff@peff.net> wrote: > [your original probably didn't make it to the list because of its 5MB > attachment; the list has a 100K limit; I'll try to quote liberally] > > On Tue, Apr 19, 2016 at 04:17:50PM -0700, Jacob Keller wrote: > >> I ran this version of the patch against the entire Linux kernel >> history, as I figured this has a large batch of C code to try and spot >> any issues. >> >> I ran something like the following command in bash >> >> $git rev-list HEAD | while read -r rev; do diff -F ^commit -u <(git >> show --format="commit %H" --no-compaction-heuristic $rev) <(git show >> --format="commit %H" --compaction-heuristic $rev); done > >> heuristic.patch > > My earlier tests with the perl script were all done with "git log -p", > which will not show anything at all for merges (and my script wouldn't > know how to deal with combined diffs anyway). But I think this new patch > _will_ kick in for combined diffs (because it is built on individual > diffs). It will be interesting to see if this has any effect there, and > what it looks like. > > We should be able to see it (on a small enough repository) with: > > git log --format='commit %H' --cc --merges > > and comparing the before/after. > >> I've attached the file that I generated for the Linux history, it's >> rather large so hopefully I can get some help to spot any differences. >> The above approach will work for pretty much any repository, and works >> better than trying to generate the entire thing first and then diff >> (since that runs out of memory pretty fast). > > I don't think there is much point in generating a complete diff between > the patches for every commit, when nobody can look at the whole thing. > Unless we have automated tooling to find "interesting" bits (and > certainly a tool to remove the boring "a comment got shifted by one" > lines would help; those are all known improvements, but it's the _other_ > stuff we want to look). > > But if we are not using automated tooling to find the needle in the > haystack, we might as well using sampling to make the dataset more > manageable. Adding "--since=1.year.ago" is one way, though we may want > to sample more randomly across time. > >> So far, I haven't spotted anything that would want me to disable it, >> while I've spotted several cases where I felt that readability was >> improved. It's somewhat difficult to spot though. > > I did find one case that I think is worse. Look at 857942fd1a in the > kernel. It has a pattern like this: > > ... surrounding code ... > > function_one(); > ... more surrounding code ... > > which becomes: > > ... surrounding code ... > > function_two(); > > ... more surrounding code > > Without the new heuristic, that looks like: > > -function_one(); > +function_two(); > + > > but with it, it becomes: > > + > +function_two(); > > -function_one(); > > which is kind of weird. Having the two directly next to each other reads > better to me. This is a pretty unusual diff, though, in that it did > change the surrounding whitespace (and if you look further in the diff, > the identical change is made elsewhere _without_ touching the > whitespace). So this is kind of an anomaly. And IMHO the weirdness here > is outweighed by the vast number of improvements elsewhere. The new implementation supports the flags for ignoring white space, too. > > -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-20 4:18 ` Jeff King 2016-04-20 4:37 ` Jeff King 2016-04-20 4:37 ` Stefan Beller @ 2016-04-29 20:29 ` Junio C Hamano 2016-04-29 20:59 ` Jacob Keller 2 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2016-04-29 20:29 UTC (permalink / raw) To: Jeff King; +Cc: Jacob Keller, Stefan Beller, Git mailing list, Jacob Keller Jeff King <peff@peff.net> writes: > ... Having the two directly next to each other reads > better to me. This is a pretty unusual diff, though, in that it did > change the surrounding whitespace (and if you look further in the diff, > the identical change is made elsewhere _without_ touching the > whitespace). So this is kind of an anomaly. And IMHO the weirdness here > is outweighed by the vast number of improvements elsewhere. So... is everybody happy with the result and now we can drop the tweaking knob added to help experimentation before merging the result to 'master'? I am pretty happy with the end result myself. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-29 20:29 ` Junio C Hamano @ 2016-04-29 20:59 ` Jacob Keller 2016-04-29 22:18 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Jacob Keller @ 2016-04-29 20:59 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, Stefan Beller, Git mailing list, Jacob Keller On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano <gitster@pobox.com> wrote: > Jeff King <peff@peff.net> writes: > >> ... Having the two directly next to each other reads >> better to me. This is a pretty unusual diff, though, in that it did >> change the surrounding whitespace (and if you look further in the diff, >> the identical change is made elsewhere _without_ touching the >> whitespace). So this is kind of an anomaly. And IMHO the weirdness here >> is outweighed by the vast number of improvements elsewhere. > > So... is everybody happy with the result and now we can drop the > tweaking knob added to help experimentation before merging the > result to 'master'? > > I am pretty happy with the end result myself. I am very happy with it. I haven't had any issues, and I think we'll find better traction by enabling it at this point and seeing when/if someone complains. I think for most it won't be noticed and for those that do it will likely be positive. Thanks, Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-29 20:59 ` Jacob Keller @ 2016-04-29 22:18 ` Junio C Hamano 2016-04-29 22:35 ` Stefan Beller 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2016-04-29 22:18 UTC (permalink / raw) To: Jacob Keller; +Cc: Jeff King, Stefan Beller, Git mailing list, Jacob Keller Jacob Keller <jacob.keller@gmail.com> writes: > On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano <gitster@pobox.com> wrote: >> Jeff King <peff@peff.net> writes: >> >>> ... Having the two directly next to each other reads >>> better to me. This is a pretty unusual diff, though, in that it did >>> change the surrounding whitespace (and if you look further in the diff, >>> the identical change is made elsewhere _without_ touching the >>> whitespace). So this is kind of an anomaly. And IMHO the weirdness here >>> is outweighed by the vast number of improvements elsewhere. >> >> So... is everybody happy with the result and now we can drop the >> tweaking knob added to help experimentation before merging the >> result to 'master'? >> >> I am pretty happy with the end result myself. > > I am very happy with it. I haven't had any issues, and I think we'll > find better traction by enabling it at this point and seeing when/if > someone complains. > > I think for most it won't be noticed and for those that do it will > likely be positive. I am doing this only to prepare in case we have a concensus, i.e. this is not to declare that I do not care what other people say. Here is a patch to remove the experimentation knob. Let's say we keep this patch out of tree for now and keep the topic in 'next' so that people can further play with it for several more weeks, and then apply this on top and merge the result to 'master' early in the next cycle. -- >8 -- diff: enable "compaction heuristics" and lose experimentation knob It seems that the new "find a good hunk boundary by locating a blank line" heuristics gives much more pleasant result without much noticeable downsides. Let's make it the new algorithm for real, without the opt-out knob we added while experimenting with it. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/diff-config.txt | 5 ----- Documentation/diff-options.txt | 6 ------ diff.c | 11 ----------- xdiff/xdiff.h | 2 -- xdiff/xdiffi.c | 2 +- 5 files changed, 1 insertion(+), 25 deletions(-) diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt index 9bf3e92..6eaa452 100644 --- a/Documentation/diff-config.txt +++ b/Documentation/diff-config.txt @@ -166,11 +166,6 @@ diff.tool:: include::mergetools-diff.txt[] -diff.compactionHeuristic:: - Set this option to enable an experimental heuristic that - shifts the hunk boundary in an attempt to make the resulting - patch easier to read. - diff.algorithm:: Choose a diff algorithm. The variants are as follows: + diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index b513023..3ad6404 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -63,12 +63,6 @@ ifndef::git-format-patch[] Synonym for `-p --raw`. endif::git-format-patch[] ---compaction-heuristic:: ---no-compaction-heuristic:: - These are to help debugging and tuning an experimental - heuristic that shifts the hunk boundary in an attempt to - make the resulting patch easier to read. - --minimal:: Spend extra time to make sure the smallest possible diff is produced. diff --git a/diff.c b/diff.c index 05ca3ce..f62b7f7 100644 --- a/diff.c +++ b/diff.c @@ -25,7 +25,6 @@ #endif static int diff_detect_rename_default; -static int diff_compaction_heuristic = 1; static int diff_rename_limit_default = 400; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; @@ -184,10 +183,6 @@ int git_diff_ui_config(const char *var, const char *value, void *cb) diff_detect_rename_default = git_config_rename(var, value); return 0; } - if (!strcmp(var, "diff.compactionheuristic")) { - diff_compaction_heuristic = git_config_bool(var, value); - return 0; - } if (!strcmp(var, "diff.autorefreshindex")) { diff_auto_refresh_index = git_config_bool(var, value); return 0; @@ -3240,8 +3235,6 @@ void diff_setup(struct diff_options *options) options->use_color = diff_use_color_default; options->detect_rename = diff_detect_rename_default; options->xdl_opts |= diff_algorithm; - if (diff_compaction_heuristic) - DIFF_XDL_SET(options, COMPACTION_HEURISTIC); options->orderfile = diff_order_file_cfg; @@ -3719,10 +3712,6 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac) DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL); else if (!strcmp(arg, "--ignore-blank-lines")) DIFF_XDL_SET(options, IGNORE_BLANK_LINES); - else if (!strcmp(arg, "--compaction-heuristic")) - DIFF_XDL_SET(options, COMPACTION_HEURISTIC); - else if (!strcmp(arg, "--no-compaction-heuristic")) - DIFF_XDL_CLR(options, COMPACTION_HEURISTIC); else if (!strcmp(arg, "--patience")) options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF); else if (!strcmp(arg, "--histogram")) diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index d1dbb27..c033991 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -41,8 +41,6 @@ extern "C" { #define XDF_IGNORE_BLANK_LINES (1 << 7) -#define XDF_COMPACTION_HEURISTIC (1 << 8) - #define XDL_EMIT_FUNCNAMES (1 << 0) #define XDL_EMIT_COMMON (1 << 1) #define XDL_EMIT_FUNCCONTEXT (1 << 2) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index b3c6848..574f83c 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -522,7 +522,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { * As we already shifted the group forward as far as possible * in the earlier loop, we need to shift it back only if at all. */ - if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) { + if (blank_lines) { while (ixs > 0 && !is_blank_line(recs, ix - 1, flags) && recs_match(recs, ixs - 1, ix - 1, flags)) { ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-29 22:18 ` Junio C Hamano @ 2016-04-29 22:35 ` Stefan Beller 2016-04-29 22:39 ` Keller, Jacob E 2016-04-30 3:06 ` Jeff King 0 siblings, 2 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-29 22:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jacob Keller, Jeff King, Git mailing list, Jacob Keller On Fri, Apr 29, 2016 at 3:18 PM, Junio C Hamano <gitster@pobox.com> wrote: > Jacob Keller <jacob.keller@gmail.com> writes: > >> On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano <gitster@pobox.com> wrote: >>> Jeff King <peff@peff.net> writes: >>> >>>> ... Having the two directly next to each other reads >>>> better to me. This is a pretty unusual diff, though, in that it did >>>> change the surrounding whitespace (and if you look further in the diff, >>>> the identical change is made elsewhere _without_ touching the >>>> whitespace). So this is kind of an anomaly. And IMHO the weirdness here >>>> is outweighed by the vast number of improvements elsewhere. >>> >>> So... is everybody happy with the result and now we can drop the >>> tweaking knob added to help experimentation before merging the >>> result to 'master'? >>> >>> I am pretty happy with the end result myself. >> >> I am very happy with it. I haven't had any issues, and I think we'll >> find better traction by enabling it at this point and seeing when/if >> someone complains. >> >> I think for most it won't be noticed and for those that do it will >> likely be positive. > > I am doing this only to prepare in case we have a concensus, > i.e. this is not to declare that I do not care what other people > say. Here is a patch to remove the experimentation knob. > > Let's say we keep this patch out of tree for now and keep the topic > in 'next' so that people can further play with it for several more > weeks, and then apply this on top and merge the result to 'master' > early in the next cycle. > > -- >8 -- > diff: enable "compaction heuristics" and lose experimentation knob > > It seems that the new "find a good hunk boundary by locating a blank > line" heuristics gives much more pleasant result without much > noticeable downsides. Let's make it the new algorithm for real, > without the opt-out knob we added while experimenting with it. I would remove the opt-out knob much later in the game, i.e. 1) make a patch that removes the documentation only before the next release (i.e. before 2.9) 2) make a patch to remove the actual (unlabeled) knobs, merge into master before 2.10 (i.e. just after the 2.9 release) Then we get the most of the community to test it with the 2.9 release and still have an emergency knob in case some major headaches show up. After one release cycle we'll be much more confident about its usage and its short comings and do not need the emergency turn off. If the community doesn't like it for some reason we can document it and debate the default setting? I agree we want the knob gone eventually. Making it an undocumented feature is as good as that from a users point of view? > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > Documentation/diff-config.txt | 5 ----- > Documentation/diff-options.txt | 6 ------ > diff.c | 11 ----------- > xdiff/xdiff.h | 2 -- > xdiff/xdiffi.c | 2 +- > 5 files changed, 1 insertion(+), 25 deletions(-) > > diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt > index 9bf3e92..6eaa452 100644 > --- a/Documentation/diff-config.txt > +++ b/Documentation/diff-config.txt > @@ -166,11 +166,6 @@ diff.tool:: > > include::mergetools-diff.txt[] > > -diff.compactionHeuristic:: > - Set this option to enable an experimental heuristic that > - shifts the hunk boundary in an attempt to make the resulting > - patch easier to read. > - > diff.algorithm:: > Choose a diff algorithm. The variants are as follows: > + > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > index b513023..3ad6404 100644 > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -63,12 +63,6 @@ ifndef::git-format-patch[] > Synonym for `-p --raw`. > endif::git-format-patch[] > > ---compaction-heuristic:: > ---no-compaction-heuristic:: > - These are to help debugging and tuning an experimental > - heuristic that shifts the hunk boundary in an attempt to > - make the resulting patch easier to read. > - > --minimal:: > Spend extra time to make sure the smallest possible > diff is produced. > diff --git a/diff.c b/diff.c > index 05ca3ce..f62b7f7 100644 > --- a/diff.c > +++ b/diff.c > @@ -25,7 +25,6 @@ > #endif > > static int diff_detect_rename_default; > -static int diff_compaction_heuristic = 1; > static int diff_rename_limit_default = 400; > static int diff_suppress_blank_empty; > static int diff_use_color_default = -1; > @@ -184,10 +183,6 @@ int git_diff_ui_config(const char *var, const char *value, void *cb) > diff_detect_rename_default = git_config_rename(var, value); > return 0; > } > - if (!strcmp(var, "diff.compactionheuristic")) { > - diff_compaction_heuristic = git_config_bool(var, value); > - return 0; > - } > if (!strcmp(var, "diff.autorefreshindex")) { > diff_auto_refresh_index = git_config_bool(var, value); > return 0; > @@ -3240,8 +3235,6 @@ void diff_setup(struct diff_options *options) > options->use_color = diff_use_color_default; > options->detect_rename = diff_detect_rename_default; > options->xdl_opts |= diff_algorithm; > - if (diff_compaction_heuristic) > - DIFF_XDL_SET(options, COMPACTION_HEURISTIC); > > options->orderfile = diff_order_file_cfg; > > @@ -3719,10 +3712,6 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac) > DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL); > else if (!strcmp(arg, "--ignore-blank-lines")) > DIFF_XDL_SET(options, IGNORE_BLANK_LINES); > - else if (!strcmp(arg, "--compaction-heuristic")) > - DIFF_XDL_SET(options, COMPACTION_HEURISTIC); > - else if (!strcmp(arg, "--no-compaction-heuristic")) > - DIFF_XDL_CLR(options, COMPACTION_HEURISTIC); > else if (!strcmp(arg, "--patience")) > options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF); > else if (!strcmp(arg, "--histogram")) > diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h > index d1dbb27..c033991 100644 > --- a/xdiff/xdiff.h > +++ b/xdiff/xdiff.h > @@ -41,8 +41,6 @@ extern "C" { > > #define XDF_IGNORE_BLANK_LINES (1 << 7) > > -#define XDF_COMPACTION_HEURISTIC (1 << 8) > - > #define XDL_EMIT_FUNCNAMES (1 << 0) > #define XDL_EMIT_COMMON (1 << 1) > #define XDL_EMIT_FUNCCONTEXT (1 << 2) > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index b3c6848..574f83c 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -522,7 +522,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > * As we already shifted the group forward as far as possible > * in the earlier loop, we need to shift it back only if at all. > */ > - if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) { > + if (blank_lines) { > while (ixs > 0 && > !is_blank_line(recs, ix - 1, flags) && > recs_match(recs, ixs - 1, ix - 1, flags)) { ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-29 22:35 ` Stefan Beller @ 2016-04-29 22:39 ` Keller, Jacob E 2016-04-29 22:44 ` Stefan Beller 2016-04-30 3:06 ` Jeff King 1 sibling, 1 reply; 44+ messages in thread From: Keller, Jacob E @ 2016-04-29 22:39 UTC (permalink / raw) To: gitster, sbeller; +Cc: git, peff, jacob.keller On Fri, 2016-04-29 at 15:35 -0700, Stefan Beller wrote: > On Fri, Apr 29, 2016 at 3:18 PM, Junio C Hamano <gitster@pobox.com> > wrote: > > > > Jacob Keller <jacob.keller@gmail.com> writes: > > > > > > > > On Fri, Apr 29, 2016 at 1:29 PM, Junio C Hamano <gitster@pobox.co > > > m> wrote: > > > > > > > > Jeff King <peff@peff.net> writes: > > > > > > > > > > > > > > ... Having the two directly next to each other reads > > > > > better to me. This is a pretty unusual diff, though, in that > > > > > it did > > > > > change the surrounding whitespace (and if you look further in > > > > > the diff, > > > > > the identical change is made elsewhere _without_ touching the > > > > > whitespace). So this is kind of an anomaly. And IMHO the > > > > > weirdness here > > > > > is outweighed by the vast number of improvements elsewhere. > > > > So... is everybody happy with the result and now we can drop > > > > the > > > > tweaking knob added to help experimentation before merging the > > > > result to 'master'? > > > > > > > > I am pretty happy with the end result myself. > > > I am very happy with it. I haven't had any issues, and I think > > > we'll > > > find better traction by enabling it at this point and seeing > > > when/if > > > someone complains. > > > > > > I think for most it won't be noticed and for those that do it > > > will > > > likely be positive. > > I am doing this only to prepare in case we have a concensus, > > i.e. this is not to declare that I do not care what other people > > say. Here is a patch to remove the experimentation knob. > > > > Let's say we keep this patch out of tree for now and keep the topic > > in 'next' so that people can further play with it for several more > > weeks, and then apply this on top and merge the result to 'master' > > early in the next cycle. > > > > -- >8 -- > > diff: enable "compaction heuristics" and lose experimentation knob > > > > It seems that the new "find a good hunk boundary by locating a > > blank > > line" heuristics gives much more pleasant result without much > > noticeable downsides. Let's make it the new algorithm for real, > > without the opt-out knob we added while experimenting with it. > I would remove the opt-out knob much later in the game, i.e. > > 1) make a patch that removes the documentation only > before the next release (i.e. before 2.9) > 2) make a patch to remove the actual (unlabeled) knobs, > merge into master before 2.10 (i.e. just after the 2.9 > release) > > Then we get the most of the community to test it with the 2.9 release > and still have an emergency knob in case some major headaches > show up. After one release cycle we'll be much more confident > about its usage and its short comings and do not need the > emergency turn off. If the community doesn't like it for some reason > we can document it and debate the default setting? > > I agree we want the knob gone eventually. > Making it an undocumented feature is as good as that from > a users point of view? > Currently it's an "opt in" knob, so this doesn't make sense to me. If we remove the entire knob as is, we can always (fairly easily) add it back. I would keep the code inside xdiff as a knob, but set it to enable default so that the user config has no knob at the top level but the xdiff machinery does (this making a "disable" be relatively small patch). Thanks, Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-29 22:39 ` Keller, Jacob E @ 2016-04-29 22:44 ` Stefan Beller 2016-04-29 22:48 ` Keller, Jacob E 0 siblings, 1 reply; 44+ messages in thread From: Stefan Beller @ 2016-04-29 22:44 UTC (permalink / raw) To: Keller, Jacob E; +Cc: gitster, git, peff, jacob.keller > Currently it's an "opt in" knob, so this doesn't make sense to me. +static int diff_compaction_heuristic = 1; It's rather an opt-out knob going by the current origin/jk/diff-compact-heuristic > If > we remove the entire knob as is, we can always (fairly easily) add it > back. I would keep the code inside xdiff as a knob, but set it to > enable default so that the user config has no knob at the top level but > the xdiff machinery does (this making a "disable" be relatively small > patch). When writing my reply, I thought about people using Git from a binary distribution with little to no admin rights. They want to have an emergency knob to disable this thing, but cannot patch/recompile Git. If you can patch and compile your version of Git, then reverting is easy, so in that case Junios patch looks good to me. Thanks, Stefan > > Thanks, > Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-29 22:44 ` Stefan Beller @ 2016-04-29 22:48 ` Keller, Jacob E 2016-05-02 17:40 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Keller, Jacob E @ 2016-04-29 22:48 UTC (permalink / raw) To: sbeller; +Cc: git, gitster, peff, jacob.keller On Fri, 2016-04-29 at 15:44 -0700, Stefan Beller wrote: > > > > Currently it's an "opt in" knob, so this doesn't make sense to me. > +static int diff_compaction_heuristic = 1; > Oops didn't know we'd made it default at some point. (all my versions had it disabled by default) > It's rather an opt-out knob going by the current > origin/jk/diff-compact-heuristic > Yea in that case, we could keep it. > > > > > If > > we remove the entire knob as is, we can always (fairly easily) add > > it > > back. I would keep the code inside xdiff as a knob, but set it to > > enable default so that the user config has no knob at the top level > > but > > the xdiff machinery does (this making a "disable" be relatively > > small > > patch). > When writing my reply, I thought about people using Git from a binary > distribution with little to no admin rights. They want to have an > emergency > knob to disable this thing, but cannot patch/recompile Git. > > If you can patch and compile your version of Git, then reverting is > easy, so > in that case Junios patch looks good to me. > > Thanks, > Stefan True. I think the chances that it needs such a thing are quite minor, and if an undocumented knob gets exposed it would have to become documented and maintained, so I'd prefer to avoid it. Given that the risk is pretty small I think that's ok. Thanks, Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-29 22:48 ` Keller, Jacob E @ 2016-05-02 17:40 ` Junio C Hamano 2016-05-02 17:45 ` Stefan Beller 2016-05-02 18:02 ` Jeff King 0 siblings, 2 replies; 44+ messages in thread From: Junio C Hamano @ 2016-05-02 17:40 UTC (permalink / raw) To: Keller, Jacob E; +Cc: sbeller, git, peff, jacob.keller "Keller, Jacob E" <jacob.e.keller@intel.com> writes: > True. I think the chances that it needs such a thing are quite minor, > and if an undocumented knob gets exposed it would have to become > documented and maintained, so I'd prefer to avoid it. Given that the > risk is pretty small I think that's ok. OK, then let's do only the "documentation" part. -- >8 -- Subject: [PATCH] diff: undocument the compaction heuristic knobs for experimentation It seems that people around here are all happy with the updated heuristics used to decide where the hunks are separated. Let's keep that as the default. Even though we do not expect too much trouble from the difference between the old and the new algorithms, just in case let's leave the implementation of the knobs to turn it off for emergencies. There is no longer need for documenting them, though. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- Documentation/diff-config.txt | 5 ----- Documentation/diff-options.txt | 6 ------ 2 files changed, 11 deletions(-) diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt index 9bf3e92..6eaa452 100644 --- a/Documentation/diff-config.txt +++ b/Documentation/diff-config.txt @@ -166,11 +166,6 @@ diff.tool:: include::mergetools-diff.txt[] -diff.compactionHeuristic:: - Set this option to enable an experimental heuristic that - shifts the hunk boundary in an attempt to make the resulting - patch easier to read. - diff.algorithm:: Choose a diff algorithm. The variants are as follows: + diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index b513023..3ad6404 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -63,12 +63,6 @@ ifndef::git-format-patch[] Synonym for `-p --raw`. endif::git-format-patch[] ---compaction-heuristic:: ---no-compaction-heuristic:: - These are to help debugging and tuning an experimental - heuristic that shifts the hunk boundary in an attempt to - make the resulting patch easier to read. - --minimal:: Spend extra time to make sure the smallest possible diff is produced. -- 2.8.2-458-gacc1066 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-05-02 17:40 ` Junio C Hamano @ 2016-05-02 17:45 ` Stefan Beller 2016-05-02 18:02 ` Jeff King 1 sibling, 0 replies; 44+ messages in thread From: Stefan Beller @ 2016-05-02 17:45 UTC (permalink / raw) To: Junio C Hamano; +Cc: Keller, Jacob E, git, peff, jacob.keller On Mon, May 2, 2016 at 10:40 AM, Junio C Hamano <gitster@pobox.com> wrote: > "Keller, Jacob E" <jacob.e.keller@intel.com> writes: > >> True. I think the chances that it needs such a thing are quite minor, >> and if an undocumented knob gets exposed it would have to become >> documented and maintained, so I'd prefer to avoid it. Given that the >> risk is pretty small I think that's ok. > > OK, then let's do only the "documentation" part. The patch below looks good to me. Thanks, Stefan > > -- >8 -- > Subject: [PATCH] diff: undocument the compaction heuristic knobs for experimentation > > It seems that people around here are all happy with the updated > heuristics used to decide where the hunks are separated. Let's keep > that as the default. Even though we do not expect too much trouble > from the difference between the old and the new algorithms, just in > case let's leave the implementation of the knobs to turn it off for > emergencies. There is no longer need for documenting them, though. > > Signed-off-by: Junio C Hamano <gitster@pobox.com> > --- > Documentation/diff-config.txt | 5 ----- > Documentation/diff-options.txt | 6 ------ > 2 files changed, 11 deletions(-) > > diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt > index 9bf3e92..6eaa452 100644 > --- a/Documentation/diff-config.txt > +++ b/Documentation/diff-config.txt > @@ -166,11 +166,6 @@ diff.tool:: > > include::mergetools-diff.txt[] > > -diff.compactionHeuristic:: > - Set this option to enable an experimental heuristic that > - shifts the hunk boundary in an attempt to make the resulting > - patch easier to read. > - > diff.algorithm:: > Choose a diff algorithm. The variants are as follows: > + > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > index b513023..3ad6404 100644 > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -63,12 +63,6 @@ ifndef::git-format-patch[] > Synonym for `-p --raw`. > endif::git-format-patch[] > > ---compaction-heuristic:: > ---no-compaction-heuristic:: > - These are to help debugging and tuning an experimental > - heuristic that shifts the hunk boundary in an attempt to > - make the resulting patch easier to read. > - > --minimal:: > Spend extra time to make sure the smallest possible > diff is produced. > -- > 2.8.2-458-gacc1066 > ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-05-02 17:40 ` Junio C Hamano 2016-05-02 17:45 ` Stefan Beller @ 2016-05-02 18:02 ` Jeff King 2016-05-03 17:55 ` Jacob Keller 1 sibling, 1 reply; 44+ messages in thread From: Jeff King @ 2016-05-02 18:02 UTC (permalink / raw) To: Junio C Hamano; +Cc: Keller, Jacob E, sbeller, git, jacob.keller On Mon, May 02, 2016 at 10:40:28AM -0700, Junio C Hamano wrote: > "Keller, Jacob E" <jacob.e.keller@intel.com> writes: > > > True. I think the chances that it needs such a thing are quite minor, > > and if an undocumented knob gets exposed it would have to become > > documented and maintained, so I'd prefer to avoid it. Given that the > > risk is pretty small I think that's ok. > > OK, then let's do only the "documentation" part. > > -- >8 -- > Subject: [PATCH] diff: undocument the compaction heuristic knobs for experimentation > > It seems that people around here are all happy with the updated > heuristics used to decide where the hunks are separated. Let's keep > that as the default. Even though we do not expect too much trouble > from the difference between the old and the new algorithms, just in > case let's leave the implementation of the knobs to turn it off for > emergencies. There is no longer need for documenting them, though. I agree with this reasoning. Thanks. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-05-02 18:02 ` Jeff King @ 2016-05-03 17:55 ` Jacob Keller 0 siblings, 0 replies; 44+ messages in thread From: Jacob Keller @ 2016-05-03 17:55 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Keller, Jacob E, sbeller, git On Mon, May 2, 2016 at 11:02 AM, Jeff King <peff@peff.net> wrote: > On Mon, May 02, 2016 at 10:40:28AM -0700, Junio C Hamano wrote: > >> "Keller, Jacob E" <jacob.e.keller@intel.com> writes: >> >> > True. I think the chances that it needs such a thing are quite minor, >> > and if an undocumented knob gets exposed it would have to become >> > documented and maintained, so I'd prefer to avoid it. Given that the >> > risk is pretty small I think that's ok. >> >> OK, then let's do only the "documentation" part. >> >> -- >8 -- >> Subject: [PATCH] diff: undocument the compaction heuristic knobs for experimentation >> >> It seems that people around here are all happy with the updated >> heuristics used to decide where the hunks are separated. Let's keep >> that as the default. Even though we do not expect too much trouble >> from the difference between the old and the new algorithms, just in >> case let's leave the implementation of the knobs to turn it off for >> emergencies. There is no longer need for documenting them, though. > > I agree with this reasoning. Thanks. > > -Peff I think I agree too. Thanks, Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-29 22:35 ` Stefan Beller 2016-04-29 22:39 ` Keller, Jacob E @ 2016-04-30 3:06 ` Jeff King 1 sibling, 0 replies; 44+ messages in thread From: Jeff King @ 2016-04-30 3:06 UTC (permalink / raw) To: Stefan Beller Cc: Junio C Hamano, Jacob Keller, Git mailing list, Jacob Keller On Fri, Apr 29, 2016 at 03:35:54PM -0700, Stefan Beller wrote: > > -- >8 -- > > diff: enable "compaction heuristics" and lose experimentation knob > > > > It seems that the new "find a good hunk boundary by locating a blank > > line" heuristics gives much more pleasant result without much > > noticeable downsides. Let's make it the new algorithm for real, > > without the opt-out knob we added while experimenting with it. > > I would remove the opt-out knob much later in the game, i.e. > > 1) make a patch that removes the documentation only > before the next release (i.e. before 2.9) > 2) make a patch to remove the actual (unlabeled) knobs, > merge into master before 2.10 (i.e. just after the 2.9 release) Yeah, I think it might be a good idea to keep some sort of undocumented safety valve in the release, at least for a cycle or two. The heuristic won't _really_ see wide use until it is in a released version of git, as much as we would like it to be otherwise. So I am anticipating a possible conversation where somebody reports that the new output looks bad, and it would be nice to say "try it with this flag (or environment variable, or whatever) and see if that looks better". And then based on that conversation we can decide what the right next is (a real user-visible flag, or reversion, or deciding the user's case isn't worth it). Or maybe if we're lucky that conversation never happens. The "whatever" in the instructions can obviously be "build with this patch" or "try with an older version of git", but we're a lot more likely to get a good response if it's easy for the user to do. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCHv5 0/2] xdiff: implement empty line chunk heuristic 2016-04-19 15:21 [PATCHv5 0/2] xdiff: implement empty line chunk heuristic Stefan Beller 2016-04-19 15:21 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller 2016-04-19 15:21 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller @ 2016-04-19 16:23 ` Junio C Hamano 2 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2016-04-19 16:23 UTC (permalink / raw) To: Stefan Beller; +Cc: git, jacob.keller, peff Stefan Beller <sbeller@google.com> writes: > Thanks Jeff for pointing out issues in the comment! > > Thanks, > Stefan > > diff to origin/jk/diff-compact-heuristic: > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index 5a02b15..b3c6848 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -515,12 +515,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > } > > /* > - * If a group can be moved back and forth, see if there is an > + * If a group can be moved back and forth, see if there is a > * blank line in the moving space. If there is a blank line, > * make sure the last blank line is the end of the group. I guess this is mine to blame. Thanks for catching and rerolling. Will re-queue. ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 0/2 v4] xdiff: implement empty line chunk heuristic
@ 2016-04-18 21:12 Stefan Beller
2016-04-18 21:12 ` [PATCH 2/2] " Stefan Beller
0 siblings, 1 reply; 44+ messages in thread
From: Stefan Beller @ 2016-04-18 21:12 UTC (permalink / raw)
To: gitster; +Cc: git, jacob.keller, Stefan Beller
> OK, so perhaps either of you two can do a final version people can
> start having fun with?
Here we go. I squashed in your patch, although with a minor change:
- if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) {
+ if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
We did not need that in the "shortest line" heuristic as we know
a line with the shortest line length must exist. We do not know about
empty lines though.
Thanks,
Stefan
Jacob Keller (1):
xdiff: add recs_match helper function
Stefan Beller (1):
xdiff: implement empty line chunk heuristic
Documentation/diff-config.txt | 5 +++++
Documentation/diff-options.txt | 6 ++++++
diff.c | 11 +++++++++++
xdiff/xdiff.h | 2 ++
xdiff/xdiffi.c | 40 ++++++++++++++++++++++++++++++++++++----
5 files changed, 60 insertions(+), 4 deletions(-)
--
2.8.0.26.gba39a1b.dirty
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-18 21:12 [PATCH 0/2 v4] " Stefan Beller @ 2016-04-18 21:12 ` Stefan Beller 2016-04-18 22:04 ` Jacob Keller 2016-04-19 5:03 ` Jeff King 0 siblings, 2 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-18 21:12 UTC (permalink / raw) To: gitster; +Cc: git, jacob.keller, Stefan Beller, Jacob Keller In order to produce the smallest possible diff and combine several diff hunks together, we implement a heuristic from GNU Diff which moves diff hunks forward as far as possible when we find common context above and below a diff hunk. This sometimes produces less readable diffs when writing C, Shell, or other programming languages, ie: ... /* + * + * + */ + +/* ... instead of the more readable equivalent of ... +/* + * + * + */ + /* ... Implement the following heuristic to (optionally) produce the desired output. If there are diff chunks which can be shifted around, shift each hunk such that the last common empty line is below the chunk with the rest of the context above. This heuristic appears to resolve the above example and several other common issues without producing significantly weird results. However, as with any heuristic it is not really known whether this will always be more optimal. Thus, it can be disabled via diff.compactionHeuristic. Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Stefan Beller <sbeller@google.com> --- Documentation/diff-config.txt | 5 +++++ Documentation/diff-options.txt | 6 ++++++ diff.c | 11 +++++++++++ xdiff/xdiff.h | 2 ++ xdiff/xdiffi.c | 26 ++++++++++++++++++++++++++ 5 files changed, 50 insertions(+) diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt index edba565..a9f4b57 100644 --- a/Documentation/diff-config.txt +++ b/Documentation/diff-config.txt @@ -170,6 +170,11 @@ diff.tool:: include::mergetools-diff.txt[] +diff.compactionHeuristic:: + Set this option to enable an experimental heuristic that + shifts the hunk boundary in an attempt to make the resulting + patch easier to read. + diff.algorithm:: Choose a diff algorithm. The variants are as follows: + diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 4b0318e..0993742 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -63,6 +63,12 @@ ifndef::git-format-patch[] Synonym for `-p --raw`. endif::git-format-patch[] +--compaction-heuristic:: +--no-compaction-heuristic:: + These are to help debugging and tuning an experimental + heuristic that shifts the hunk boundary in an attempt to + make the resulting patch easier to read. + --minimal:: Spend extra time to make sure the smallest possible diff is produced. diff --git a/diff.c b/diff.c index 4dfe660..d3734d3 100644 --- a/diff.c +++ b/diff.c @@ -26,6 +26,7 @@ #endif static int diff_detect_rename_default; +static int diff_compaction_heuristic = 1; static int diff_rename_limit_default = 400; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb) diff_detect_rename_default = git_config_rename(var, value); return 0; } + if (!strcmp(var, "diff.compactionheuristic")) { + diff_compaction_heuristic = git_config_bool(var, value); + return 0; + } if (!strcmp(var, "diff.autorefreshindex")) { diff_auto_refresh_index = git_config_bool(var, value); return 0; @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options) options->use_color = diff_use_color_default; options->detect_rename = diff_detect_rename_default; options->xdl_opts |= diff_algorithm; + if (diff_compaction_heuristic) + DIFF_XDL_SET(options, COMPACTION_HEURISTIC); options->orderfile = diff_order_file_cfg; @@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options, DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL); else if (!strcmp(arg, "--ignore-blank-lines")) DIFF_XDL_SET(options, IGNORE_BLANK_LINES); + else if (!strcmp(arg, "--compaction-heuristic")) + DIFF_XDL_SET(options, COMPACTION_HEURISTIC); + else if (!strcmp(arg, "--no-compaction-heuristic")) + DIFF_XDL_CLR(options, COMPACTION_HEURISTIC); else if (!strcmp(arg, "--patience")) options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF); else if (!strcmp(arg, "--histogram")) diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 4fb7e79..7423f77 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -41,6 +41,8 @@ extern "C" { #define XDF_IGNORE_BLANK_LINES (1 << 7) +#define XDF_COMPACTION_HEURISTIC (1 << 8) + #define XDL_EMIT_FUNCNAMES (1 << 0) #define XDL_EMIT_FUNCCONTEXT (1 << 2) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 748eeb9..5a02b15 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -400,6 +400,11 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, } +static int is_blank_line(xrecord_t **recs, long ix, long flags) +{ + return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags); +} + static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) { return (recs[ixs]->ha == recs[ix]->ha && @@ -411,6 +416,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec; char *rchg = xdf->rchg, *rchgo = xdfo->rchg; + unsigned int blank_lines; xrecord_t **recs = xdf->recs; /* @@ -444,6 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { do { grpsiz = ix - ixs; + blank_lines = 0; /* * If the line before the current change group, is equal to @@ -478,6 +485,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { * the group. */ while (ix < nrec && recs_match(recs, ixs, ix, flags)) { + blank_lines += is_blank_line(recs, ix, flags); + rchg[ixs++] = 0; rchg[ix++] = 1; @@ -504,6 +513,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { rchg[--ix] = 0; while (rchgo[--ixo]); } + + /* + * If a group can be moved back and forth, see if there is an + * blank line in the moving space. If there is a blank line, + * make sure the last blank line is the end of the group. + * + * As we shifted the group forward as far as possible, we only + * need to shift it back if at all. + */ + if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) { + while (ixs > 0 && + !is_blank_line(recs, ix - 1, flags) && + recs_match(recs, ixs - 1, ix - 1, flags)) { + rchg[--ixs] = 1; + rchg[--ix] = 0; + } + } } return 0; -- 2.8.0.26.gba39a1b.dirty ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-18 21:12 ` [PATCH 2/2] " Stefan Beller @ 2016-04-18 22:04 ` Jacob Keller 2016-04-18 22:24 ` Junio C Hamano 2016-04-19 5:03 ` Jeff King 1 sibling, 1 reply; 44+ messages in thread From: Jacob Keller @ 2016-04-18 22:04 UTC (permalink / raw) To: Stefan Beller; +Cc: Junio C Hamano, Git mailing list, Jacob Keller On Mon, Apr 18, 2016 at 2:12 PM, Stefan Beller <sbeller@google.com> wrote: > In order to produce the smallest possible diff and combine several diff > hunks together, we implement a heuristic from GNU Diff which moves diff > hunks forward as far as possible when we find common context above and > below a diff hunk. This sometimes produces less readable diffs when > writing C, Shell, or other programming languages, ie: > > ... > /* > + * > + * > + */ > + > +/* > ... > > instead of the more readable equivalent of > > ... > +/* > + * > + * > + */ > + > /* > ... > > Implement the following heuristic to (optionally) produce the desired > output. > > If there are diff chunks which can be shifted around, shift each hunk > such that the last common empty line is below the chunk with the rest > of the context above. > > This heuristic appears to resolve the above example and several other > common issues without producing significantly weird results. However, as > with any heuristic it is not really known whether this will always be > more optimal. Thus, it can be disabled via diff.compactionHeuristic. > > Signed-off-by: Stefan Beller <sbeller@google.com> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> > Signed-off-by: Stefan Beller <sbeller@google.com> > --- Thanks Stephan and Junio, this looks pretty good. I think before it's merged we'd probably want to implement some sort of attributes which allows per-path configuration, incase it needs to be configured at all. I've got it applied to my local git, and I'm going to try to run a diff between enabled vs disabled on a large section of the Linux kernel history and a few other projects to see if I spot anything odd. Thanks, Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-18 22:04 ` Jacob Keller @ 2016-04-18 22:24 ` Junio C Hamano 0 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2016-04-18 22:24 UTC (permalink / raw) To: Jacob Keller; +Cc: Stefan Beller, Git mailing list, Jacob Keller Jacob Keller <jacob.keller@gmail.com> writes: > Thanks Stephan and Junio, this looks pretty good. I think before it's > merged we'd probably want to implement some sort of attributes which > allows per-path configuration, incase it needs to be configured at > all. My take on it is that we'd want to make sure that the shift with blank line heuristics is "good enough", i.e. there is no need for end-user configuration or attributes, and then remove the tentative option, configuration and its documentation, before this is merged. If we really want to add knobs to handle different kind of payloads in vastly different way, the right place to do so is to add a set of bits "use this and that heuristics" to userdiff driver, I would say, but in the compaction codepath it does not seem to be enough room to have that many knobs to be tweaked in the first place to me. > I've got it applied to my local git, and I'm going to try to run a > diff between enabled vs disabled on a large section of the Linux > kernel history and a few other projects to see if I spot anything odd. Thanks. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-18 21:12 ` [PATCH 2/2] " Stefan Beller 2016-04-18 22:04 ` Jacob Keller @ 2016-04-19 5:03 ` Jeff King 2016-04-19 6:47 ` Stefan Beller ` (2 more replies) 1 sibling, 3 replies; 44+ messages in thread From: Jeff King @ 2016-04-19 5:03 UTC (permalink / raw) To: Stefan Beller; +Cc: gitster, git, jacob.keller, Jacob Keller On Mon, Apr 18, 2016 at 02:12:30PM -0700, Stefan Beller wrote: > + > + /* > + * If a group can be moved back and forth, see if there is an > + * blank line in the moving space. If there is a blank line, > + * make sure the last blank line is the end of the group. s/an/a/ on the first line > + * As we shifted the group forward as far as possible, we only > + * need to shift it back if at all. Maybe because I'm reading it as a diff that only contains this hunk and not the whole rest of the function, but the "we" here confused me. You mean the earlier, existing loop in xdl_change_compact, right? Maybe something like: As we already shifted the group forward as far as possible in the earlier loop... would help. > + if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) { > + while (ixs > 0 && > + !is_blank_line(recs, ix - 1, flags) && > + recs_match(recs, ixs - 1, ix - 1, flags)) { > + rchg[--ixs] = 1; > + rchg[--ix] = 0; > + } > + } This turned out to be delightfully simple (especially compared to the perl monstrosity). I tried comparing the output to the perl one, but it's not quite the same. In that one we had to work with the existing hunks and context lines, so any hunk that got shifted ended up with extra context on one side, and too little on the other. But here, we can actually bump the context lines to give the correct amount on both sides, which is good. I guess this will invalidate old patch-ids, but there's not much to be done about that. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 5:03 ` Jeff King @ 2016-04-19 6:47 ` Stefan Beller 2016-04-19 7:00 ` Jeff King 2016-04-19 15:17 ` Stefan Beller 2016-04-19 16:51 ` Junio C Hamano 2 siblings, 1 reply; 44+ messages in thread From: Stefan Beller @ 2016-04-19 6:47 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote: > On Mon, Apr 18, 2016 at 02:12:30PM -0700, Stefan Beller wrote: > >> + >> + /* >> + * If a group can be moved back and forth, see if there is an >> + * blank line in the moving space. If there is a blank line, >> + * make sure the last blank line is the end of the group. > > s/an/a/ on the first line So it looks like I'll be resending another version for this series tomorrow. Thanks for pointing this out! > >> + * As we shifted the group forward as far as possible, we only >> + * need to shift it back if at all. > > Maybe because I'm reading it as a diff that only contains this hunk and > not the whole rest of the function, but the "we" here confused me. You > mean the earlier, existing loop in xdl_change_compact, right? > > Maybe something like: > > As we already shifted the group forward as far as possible in the > earlier loop... > > would help. I'll see to get rid of the 'we', otherwise I'll stick with your suggestion. > >> + if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) { >> + while (ixs > 0 && >> + !is_blank_line(recs, ix - 1, flags) && >> + recs_match(recs, ixs - 1, ix - 1, flags)) { >> + rchg[--ixs] = 1; >> + rchg[--ix] = 0; >> + } >> + } > > This turned out to be delightfully simple (especially compared to the > perl monstrosity). > > I tried comparing the output to the perl one, but it's not quite the > same. In that one we had to work with the existing hunks and context > lines, so any hunk that got shifted ended up with extra context on one > side, and too little on the other. But here, we can actually bump the > context lines to give the correct amount on both sides, which is good. > > I guess this will invalidate old patch-ids, but there's not much to be > done about that. For the record: I thought about "optimal hunk separation" for a while, specially during my bike commute. And while this heuristic seems to be a good fit for most of the cases inspected, we can do better (in the future). I am convinced the better way to do it is like this: Calculate the entropy for each line and take the last line with the lowest entropy as the last line of the hunk. That heuristic requires more compute though as it will be hard to compute the entropy for the line. To do that I would imagine, we'd need to loop over the whole file and count the occurrences for each char (byte) and then take the negative log of (#number of that byte / #number of bytes in file) [1]. This would model our actual goal a bit more closely to split at parts, where there is low information density (the definition of entropy). One example Jacob pointed out was a thing like /** * Comment here. Over * more lines. * + * Add line here with a blank line + * + * in between and a trailing blank after. + * */ I think we had cases like this in the kernel tree and else where, and for a human it is clear to break after the last "empty line" (which for comments starts with " * "). To detect those we can use the entropy as it doesn't convey lots of information. (git show e1f7037167323461c0415447676262dcb) It also keeps the false positives out, Jacob pointed at 85ed2f32064b82e541fc7dcf2b0049a05 IIRC, which was bad with the shortest lines only, but I'd imagine the entropy based heuristic will do better there. [1] https://en.wikipedia.org/wiki/Entropy_(information_theory) Thanks for the review, Stefan > > -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 6:47 ` Stefan Beller @ 2016-04-19 7:00 ` Jeff King 2016-04-19 7:05 ` Stefan Beller 0 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2016-04-19 7:00 UTC (permalink / raw) To: Stefan Beller; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller On Mon, Apr 18, 2016 at 11:47:52PM -0700, Stefan Beller wrote: > I am convinced the better way to do it is like this: > > Calculate the entropy for each line and take the last line with the > lowest entropy as the last line of the hunk. I'll be curious to see the results, but I think sometimes predictable and stupid may be the best route with these sorts of things. In particular, I'd worry that a content-independent measure of entropy might miss some subtleties of a particular language (e.g., that "*" is more or less meaningful than some other character). But we'll see. :) -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 7:00 ` Jeff King @ 2016-04-19 7:05 ` Stefan Beller 0 siblings, 0 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-19 7:05 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller On Tue, Apr 19, 2016 at 12:00 AM, Jeff King <peff@peff.net> wrote: > On Mon, Apr 18, 2016 at 11:47:52PM -0700, Stefan Beller wrote: > >> I am convinced the better way to do it is like this: >> >> Calculate the entropy for each line and take the last line with the >> lowest entropy as the last line of the hunk. > > I'll be curious to see the results, but I think sometimes predictable > and stupid may be the best route with these sorts of things. In > particular, I'd worry that a content-independent measure of entropy > might miss some subtleties of a particular language (e.g., that "*" is > more or less meaningful than some other character). But we'll see. :) I would assume that the "*" would have little entropy when there are lots of comments, i.e. it just "feels" like an empty line. If there are no "*", then the entropy is high as it is unusual. And unusual things should not be at the border of a hunk I would assume. So m prediction is that the 'subtleties of a particular language' correlate highly with the actual use of characters. Anyway, the experiment can be carried out later. :) Thanks, Stefan > > -Peff > -- > To unsubscribe from this list: send the line "unsubscribe git" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 5:03 ` Jeff King 2016-04-19 6:47 ` Stefan Beller @ 2016-04-19 15:17 ` Stefan Beller 2016-04-19 17:06 ` Jeff King 2016-04-19 16:51 ` Junio C Hamano 2 siblings, 1 reply; 44+ messages in thread From: Stefan Beller @ 2016-04-19 15:17 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote: > I guess this will invalidate old patch-ids, but there's not much to be > done about that. What do you mean by that? (What consequences do you imagine?) I think diffs with any kind of heuristic can still be applied, no? Thanks, Stefan > > -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 15:17 ` Stefan Beller @ 2016-04-19 17:06 ` Jeff King 2016-04-19 23:02 ` Jacob Keller 2016-04-20 6:00 ` Junio C Hamano 0 siblings, 2 replies; 44+ messages in thread From: Jeff King @ 2016-04-19 17:06 UTC (permalink / raw) To: Stefan Beller; +Cc: Junio C Hamano, git, Jacob Keller, Jacob Keller On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote: > On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote: > > > I guess this will invalidate old patch-ids, but there's not much to be > > done about that. > > What do you mean by that? (What consequences do you imagine?) > I think diffs with any kind of heuristic can still be applied, no? I mean that if you save any old patch-ids from "git patch-id", they won't match up when compared with new versions of git. We can probably ignore it, though. This isn't the first time that patch-ids might have changed, and I think the advice is already that one should not count on them to be stable in the long term. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 17:06 ` Jeff King @ 2016-04-19 23:02 ` Jacob Keller 2016-04-19 23:07 ` Junio C Hamano 2016-04-20 6:00 ` Junio C Hamano 1 sibling, 1 reply; 44+ messages in thread From: Jacob Keller @ 2016-04-19 23:02 UTC (permalink / raw) To: Jeff King; +Cc: Stefan Beller, Junio C Hamano, git, Jacob Keller On Tue, Apr 19, 2016 at 10:06 AM, Jeff King <peff@peff.net> wrote: > On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote: > >> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote: >> >> > I guess this will invalidate old patch-ids, but there's not much to be >> > done about that. >> >> What do you mean by that? (What consequences do you imagine?) >> I think diffs with any kind of heuristic can still be applied, no? > > I mean that if you save any old patch-ids from "git patch-id", they > won't match up when compared with new versions of git. We can probably > ignore it, though. This isn't the first time that patch-ids might have > changed, and I think the advice is already that one should not count on > them to be stable in the long term. > > -Peff Plus they'll be stable within a version of Git, it's only recorded patch ids that change, which hopefully isn't done very much if at all. Thanks, Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 23:02 ` Jacob Keller @ 2016-04-19 23:07 ` Junio C Hamano 2016-04-20 13:12 ` Michael S. Tsirkin 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2016-04-19 23:07 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jacob Keller, Jeff King, Stefan Beller, git, Jacob Keller Jacob Keller <jacob.keller@gmail.com> writes: > On Tue, Apr 19, 2016 at 10:06 AM, Jeff King <peff@peff.net> wrote: >> On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote: >> >>> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote: >>> >>> > I guess this will invalidate old patch-ids, but there's not much to be >>> > done about that. >>> >>> What do you mean by that? (What consequences do you imagine?) >>> I think diffs with any kind of heuristic can still be applied, no? >> >> I mean that if you save any old patch-ids from "git patch-id", they >> won't match up when compared with new versions of git. We can probably >> ignore it, though. This isn't the first time that patch-ids might have >> changed, and I think the advice is already that one should not count on >> them to be stable in the long term. >> >> -Peff > > Plus they'll be stable within a version of Git, it's only recorded > patch ids that change, which hopefully isn't done very much if at all. > > Thanks, > Jake Some people, like those who did things like 30e12b92 (patch-id: make it stable against hunk reordering, 2014-04-27), _may_ care. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 23:07 ` Junio C Hamano @ 2016-04-20 13:12 ` Michael S. Tsirkin 2016-04-20 16:09 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Michael S. Tsirkin @ 2016-04-20 13:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jacob Keller, Jeff King, Stefan Beller, git, Jacob Keller On Tue, Apr 19, 2016 at 04:07:35PM -0700, Junio C Hamano wrote: > Jacob Keller <jacob.keller@gmail.com> writes: > > > On Tue, Apr 19, 2016 at 10:06 AM, Jeff King <peff@peff.net> wrote: > >> On Tue, Apr 19, 2016 at 08:17:38AM -0700, Stefan Beller wrote: > >> > >>> On Mon, Apr 18, 2016 at 10:03 PM, Jeff King <peff@peff.net> wrote: > >>> > >>> > I guess this will invalidate old patch-ids, but there's not much to be > >>> > done about that. > >>> > >>> What do you mean by that? (What consequences do you imagine?) > >>> I think diffs with any kind of heuristic can still be applied, no? > >> > >> I mean that if you save any old patch-ids from "git patch-id", they > >> won't match up when compared with new versions of git. We can probably > >> ignore it, though. This isn't the first time that patch-ids might have > >> changed, and I think the advice is already that one should not count on > >> them to be stable in the long term. > >> > >> -Peff > > > > Plus they'll be stable within a version of Git, it's only recorded > > patch ids that change, which hopefully isn't done very much if at all. > > > > Thanks, > > Jake > > Some people, like those who did things like 30e12b92 (patch-id: make > it stable against hunk reordering, 2014-04-27), _may_ care. > FWIW IIRC what that commit is about is ability to reorder the chunks in a patch without changing patch-id. Not about keeping id stable across git revisions. -- MST ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-20 13:12 ` Michael S. Tsirkin @ 2016-04-20 16:09 ` Junio C Hamano 2016-04-20 16:17 ` Jeff King 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2016-04-20 16:09 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Jacob Keller, Jeff King, Stefan Beller, git, Jacob Keller "Michael S. Tsirkin" <mst@redhat.com> writes: > FWIW IIRC what that commit is about is ability to reorder the chunks in > a patch without changing patch-id. Not about keeping id stable across > git revisions. OK, but "reorder the chunks" is not meant to stay to be the _ONLY_ purpose for an option whose name is a broad "--[un]stable", but merely one (and only) possible cause of patch-id instability that happened to be noticed as an issue back then and was dealt with that commit, no? In other words, the intent of the "--stable" feature is to give a stable ID that is not affected by random end-user settings (e.g. diff.orderfile) and if somebody invents a new configurable knob in the future, they are supposed to pay attention to the "--stable" feature or existing users who do use "--stable" will be broken, no? I can still buy "--stable is not about stability across versions of Git"--it makes our job easier ;-) I just want to make sure that "--stable is about stability inside a single version of Git that patch ID for the same commit will stay the same and unaffected by random end-user configuration knobs". Which in turn would mean that we won't have to worry about this option in patch-id as long as we remove the diff.compactionheuristic configuration and command line option once the developers are done experimenting with their heuristics code. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-20 16:09 ` Junio C Hamano @ 2016-04-20 16:17 ` Jeff King 0 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2016-04-20 16:17 UTC (permalink / raw) To: Junio C Hamano Cc: Michael S. Tsirkin, Jacob Keller, Stefan Beller, git, Jacob Keller On Wed, Apr 20, 2016 at 09:09:53AM -0700, Junio C Hamano wrote: > "Michael S. Tsirkin" <mst@redhat.com> writes: > > > FWIW IIRC what that commit is about is ability to reorder the chunks in > > a patch without changing patch-id. Not about keeping id stable across > > git revisions. > > OK, but "reorder the chunks" is not meant to stay to be the _ONLY_ > purpose for an option whose name is a broad "--[un]stable", but > merely one (and only) possible cause of patch-id instability that > happened to be noticed as an issue back then and was dealt with that > commit, no? In other words, the intent of the "--stable" feature is > to give a stable ID that is not affected by random end-user settings > (e.g. diff.orderfile) and if somebody invents a new configurable knob > in the future, they are supposed to pay attention to the "--stable" > feature or existing users who do use "--stable" will be broken, no? I forgot that we added "--stable". Evne if it is not meant to be about stability across versions, is there any reason _not_ to turn off this heuristic for --stable (or for patch-ids in general)? I guess maybe that creates some inconsistency between generating a patch-id directly, and making one from a diff given on stdin (though I don't know that we can promise much about the latter in the general case; we can fix file ordering, but we don't have enough information to tweak other aspects). -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 17:06 ` Jeff King 2016-04-19 23:02 ` Jacob Keller @ 2016-04-20 6:00 ` Junio C Hamano 1 sibling, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2016-04-20 6:00 UTC (permalink / raw) To: Jeff King; +Cc: Stefan Beller, git, Jacob Keller, Jacob Keller Jeff King <peff@peff.net> writes: > I mean that if you save any old patch-ids from "git patch-id", they > won't match up when compared with new versions of git. We can probably > ignore it, though. This isn't the first time that patch-ids might have > changed, and I think the advice is already that one should not count on > them to be stable in the long term. Another thing that this *will* break is the patch signature upload protocol k.org uses to allow Linus, Greg, et al. on the road with limited hotel wifi bandwidth to prepare patch-X-test1.gz and patch-X-test1.sign file. They can locally tag X-test1, prepare "git diff X X-test1 | gzip -n >patch-X-test1.gz" and sign the result, and upload _only_ the detached signature after pushing. They can tell k.org, when uploading the detached signature, to recreate the patchfile by running the same "git diff" to save the bandwidth of sending the same thing twice (as they have to "push" anyway, having to send the generated patch is a pure overhead). Having said all that, kup(1) users are already warned that the textual diff produced by "git diff-tree -p" (which is mentioned in the documentation of the tool) varies across versions of Git and the above "optimization" would not work unless both ends have the same version of Git, so it may not be too big an issue for them. They have already been burned once when we corrected "git archive" output in the past (they obviously have the same optimization to sign tarballs, and the kup(1) mechanism relies to have byte-for-byte identical output). ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-19 5:03 ` Jeff King 2016-04-19 6:47 ` Stefan Beller 2016-04-19 15:17 ` Stefan Beller @ 2016-04-19 16:51 ` Junio C Hamano 2 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2016-04-19 16:51 UTC (permalink / raw) To: Jeff King; +Cc: Stefan Beller, git, jacob.keller, Jacob Keller Jeff King <peff@peff.net> writes: > I guess this will invalidate old patch-ids, but there's not much to be > done about that. If we really cared, we could disable this (and any future) change to the compaction logic to "patch-id --[un]stable" option. I am not sure if it is worth the effort, though ;-) ^ permalink raw reply [flat|nested] 44+ messages in thread
* [RFC PATCH, WAS: "weird diff output?" v3a 0/2] implement shortest line diff chunk heuristic @ 2016-04-15 23:01 Stefan Beller 2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller 0 siblings, 1 reply; 44+ messages in thread From: Stefan Beller @ 2016-04-15 23:01 UTC (permalink / raw) To: jacob.keller; +Cc: git, gitster, peff, Jens.Lehmann, davidel, Stefan Beller This is a version based on Jacobs v2, with the same fixes as in his v3 (hopefully), changing the heuristic, such that CRLF confusion might be gone. TODO: * add some tests * think about whether we need a git attribute or not (I did some thinking, and if we do need to configure this at all, this is where I would put it) Later on we want to have git attributes I'd think. For now let's just keep the `git config diff.shortestlineheuristic true` config option for testing? Changes since Jacobs v2: * s/empty line/shortest line/g That new heuristic is a superset of the empty line heuristic as empty lines are shortest lines. This solves the "What is an empty line?" question (Think of CRLF vs LF) * fixed Jacobs rebase mistake (which is also fixed in Jacobs v3) Changes since my v1: * rename xdl_hash_and_recmatch to recs_match * remove counting empty lines in the first section of the looping Changes since Stefan's v1: * Added a patch to implement xdl_hash_and_recmatch as Junio suggested. * Fixed a segfault in Stefan's patch * Added XDL flag to configure the behavior * Used an int and counted empty lines via += instead of |= * Renamed starts_with_emptyline to is_emptyline * Added diff command line and config options Jacob Keller (1): xdiff: add recs_match helper function Stefan Beller (1): xdiff: implement empty line chunk heuristic Documentation/diff-config.txt | 6 ++++++ Documentation/diff-options.txt | 6 ++++++ diff.c | 11 +++++++++++ xdiff/xdiff.h | 2 ++ xdiff/xdiffi.c | 43 ++++++++++++++++++++++++++++++++++++++---- 5 files changed, 64 insertions(+), 4 deletions(-) -- 2.8.1.189.gd13d43c ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-15 23:01 [RFC PATCH, WAS: "weird diff output?" v3a 0/2] implement shortest line diff " Stefan Beller @ 2016-04-15 23:01 ` Stefan Beller 2016-04-15 23:05 ` Jacob Keller 2016-04-16 0:49 ` Junio C Hamano 0 siblings, 2 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-15 23:01 UTC (permalink / raw) To: jacob.keller Cc: git, gitster, peff, Jens.Lehmann, davidel, Stefan Beller, Jacob Keller In order to produce the smallest possible diff and combine several diff hunks together, we implement a heuristic from GNU Diff which moves diff hunks forward as far as possible when we find common context above and below a diff hunk. This sometimes produces less readable diffs when writing C, Shell, or other programming languages, ie: ... /* + * + * + */ + +/* ... instead of the more readable equivalent of ... +/* + * + * + */ + /* ... Original discussion and testing found the following heuristic to be producing the desired output: If there are diff chunks which can be shifted around, shift each hunk such that the last common empty line is below the chunk with the rest of the context above. This heuristic appears to resolve the above example and several other common issues without producing significantly weird results. When implementing this heuristic the handling of empty lines was awkward as it is unclear what an empty line is. ('\n' or do we include "\r\n" as it is common on Windows?) Instead we implement a slightly different heuristic: If there are diff chunks which can be shifted around, find the shortest line in the overlapping parts. Use the line with the shortest length that occurs last as the last line of the chunk with the rest of the context above. However, as with any heuristic it is not really known whether this will always be more optimal. Thus, leave the heuristic disabled by default. Add an XDIFF flag to enable this heuristic only conditionally. Add a diff command line option and diff configuration option to allow users to enable this option when desired. TODO: * Add tests * Add better/more documentation explaining the heuristic, possibly with examples(?) * better name(?) Signed-off-by: Stefan Beller <sbeller@google.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Stefan Beller <sbeller@google.com> --- Documentation/diff-config.txt | 6 ++++++ Documentation/diff-options.txt | 6 ++++++ diff.c | 11 +++++++++++ xdiff/xdiff.h | 2 ++ xdiff/xdiffi.c | 29 +++++++++++++++++++++++++++++ 5 files changed, 54 insertions(+) diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt index edba565..3d99a90 100644 --- a/Documentation/diff-config.txt +++ b/Documentation/diff-config.txt @@ -170,6 +170,12 @@ diff.tool:: include::mergetools-diff.txt[] +diff.shortestLineHeuristic:: + Set this option to true to enable the shortest line chunk heuristic when + producing diff output. This heuristic will attempt to shift hunks such + that the last shortest common line occurs below the hunk with the rest of + the context above it. + diff.algorithm:: Choose a diff algorithm. The variants are as follows: + diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 4b0318e..b1ca83d 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -63,6 +63,12 @@ ifndef::git-format-patch[] Synonym for `-p --raw`. endif::git-format-patch[] +--shortest-line-heuristic:: +--no-shortest-line-heuristic:: + When possible, shift common shortest line in diff hunks below the hunk + such that the last common shortest line for each hunk is below, with the + rest of the context above the hunk. + --minimal:: Spend extra time to make sure the smallest possible diff is produced. diff --git a/diff.c b/diff.c index 4dfe660..a02aff9 100644 --- a/diff.c +++ b/diff.c @@ -26,6 +26,7 @@ #endif static int diff_detect_rename_default; +static int diff_shortest_line_heuristic = 0; static int diff_rename_limit_default = 400; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb) diff_detect_rename_default = git_config_rename(var, value); return 0; } + if (!strcmp(var, "diff.shortestlineheuristic")) { + diff_shortest_line_heuristic = git_config_bool(var, value); + return 0; + } if (!strcmp(var, "diff.autorefreshindex")) { diff_auto_refresh_index = git_config_bool(var, value); return 0; @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options) options->use_color = diff_use_color_default; options->detect_rename = diff_detect_rename_default; options->xdl_opts |= diff_algorithm; + if (diff_shortest_line_heuristic) + DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC); options->orderfile = diff_order_file_cfg; @@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options, DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL); else if (!strcmp(arg, "--ignore-blank-lines")) DIFF_XDL_SET(options, IGNORE_BLANK_LINES); + else if (!strcmp(arg, "--shortest-line-heuristic")) + DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC); + else if (!strcmp(arg, "--no-shortest-line-heuristic")) + DIFF_XDL_CLR(options, SHORTEST_LINE_HEURISTIC); else if (!strcmp(arg, "--patience")) options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF); else if (!strcmp(arg, "--histogram")) diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 4fb7e79..e1f8ec0 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -41,6 +41,8 @@ extern "C" { #define XDF_IGNORE_BLANK_LINES (1 << 7) +#define XDF_SHORTEST_LINE_HEURISTIC (1 << 8) + #define XDL_EMIT_FUNCNAMES (1 << 0) #define XDL_EMIT_FUNCCONTEXT (1 << 2) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 748eeb9..7d15b26 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -400,6 +400,12 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, } +static int line_length(const char *recs) +{ + char *s = strchr(recs, '\n'); + return s ? s - recs : strlen(recs); +} + static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) { return (recs[ixs]->ha == recs[ix]->ha && @@ -411,6 +417,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec; char *rchg = xdf->rchg, *rchgo = xdfo->rchg; + unsigned int shortest_line; xrecord_t **recs = xdf->recs; /* @@ -444,6 +451,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { do { grpsiz = ix - ixs; + shortest_line = UINT_MAX; /* * If the line before the current change group, is equal to @@ -478,6 +486,10 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { * the group. */ while (ix < nrec && recs_match(recs, ixs, ix, flags)) { + int l = line_length(recs[ix]->ptr); + if (l < shortest_line) + shortest_line = l; + rchg[ixs++] = 0; rchg[ix++] = 1; @@ -504,6 +516,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { rchg[--ix] = 0; while (rchgo[--ixo]); } + + /* + * If a group can be moved back and forth, see if there is an + * empty line in the moving space. If there is an empty line, + * make sure the last empty line is the end of the group. + * + * As we shifted the group forward as far as possible, we only + * need to shift it back if at all. + */ + if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) { + while (ixs > 0 && + line_length(recs[ix - 1]->ptr) > shortest_line && + recs_match(recs, ixs - 1, ix - 1, flags)) { + rchg[--ixs] = 1; + rchg[--ix] = 0; + } + } } return 0; -- 2.8.1.189.gd13d43c ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller @ 2016-04-15 23:05 ` Jacob Keller 2016-04-15 23:32 ` Jacob Keller 2016-04-16 0:49 ` Junio C Hamano 1 sibling, 1 reply; 44+ messages in thread From: Jacob Keller @ 2016-04-15 23:05 UTC (permalink / raw) To: Stefan Beller Cc: Git mailing list, Junio C Hamano, Jeff King, Jens Lehmann, Davide Libenzi, Jacob Keller On Fri, Apr 15, 2016 at 4:01 PM, Stefan Beller <sbeller@google.com> wrote: > In order to produce the smallest possible diff and combine several diff > hunks together, we implement a heuristic from GNU Diff which moves diff > hunks forward as far as possible when we find common context above and > below a diff hunk. This sometimes produces less readable diffs when > writing C, Shell, or other programming languages, ie: > > ... > /* > + * > + * > + */ > + > +/* > ... > > instead of the more readable equivalent of > > ... > +/* > + * > + * > + */ > + > /* > ... > > Original discussion and testing found the following heuristic to be > producing the desired output: > > If there are diff chunks which can be shifted around, shift each hunk > such that the last common empty line is below the chunk with the rest > of the context above. > > This heuristic appears to resolve the above example and several other > common issues without producing significantly weird results. When > implementing this heuristic the handling of empty lines was awkward as > it is unclear what an empty line is. ('\n' or do we include "\r\n" as it > is common on Windows?) Instead we implement a slightly different heuristic: > > If there are diff chunks which can be shifted around, find the shortest > line in the overlapping parts. Use the line with the shortest length that > occurs last as the last line of the chunk with the rest > of the context above. > > However, as with any heuristic it is not really known whether this will > always be more optimal. Thus, leave the heuristic disabled by default. > > Add an XDIFF flag to enable this heuristic only conditionally. Add > a diff command line option and diff configuration option to allow users > to enable this option when desired. > > TODO: > * Add tests > * Add better/more documentation explaining the heuristic, possibly with > examples(?) > * better name(?) > There's a few places that will need cleaning up (comments and such) that mention empty line still, but that's not surprising. I am going to test this for a bit on my local repos, and see if it makes any difference to the old heuristic as well. Thanks, Jake > Signed-off-by: Stefan Beller <sbeller@google.com> > Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> > Signed-off-by: Stefan Beller <sbeller@google.com> > --- > Documentation/diff-config.txt | 6 ++++++ > Documentation/diff-options.txt | 6 ++++++ > diff.c | 11 +++++++++++ > xdiff/xdiff.h | 2 ++ > xdiff/xdiffi.c | 29 +++++++++++++++++++++++++++++ > 5 files changed, 54 insertions(+) > > diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt > index edba565..3d99a90 100644 > --- a/Documentation/diff-config.txt > +++ b/Documentation/diff-config.txt > @@ -170,6 +170,12 @@ diff.tool:: > > include::mergetools-diff.txt[] > > +diff.shortestLineHeuristic:: > + Set this option to true to enable the shortest line chunk heuristic when > + producing diff output. This heuristic will attempt to shift hunks such > + that the last shortest common line occurs below the hunk with the rest of > + the context above it. > + > diff.algorithm:: > Choose a diff algorithm. The variants are as follows: > + > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt > index 4b0318e..b1ca83d 100644 > --- a/Documentation/diff-options.txt > +++ b/Documentation/diff-options.txt > @@ -63,6 +63,12 @@ ifndef::git-format-patch[] > Synonym for `-p --raw`. > endif::git-format-patch[] > > +--shortest-line-heuristic:: > +--no-shortest-line-heuristic:: > + When possible, shift common shortest line in diff hunks below the hunk > + such that the last common shortest line for each hunk is below, with the > + rest of the context above the hunk. > + > --minimal:: > Spend extra time to make sure the smallest possible > diff is produced. > diff --git a/diff.c b/diff.c > index 4dfe660..a02aff9 100644 > --- a/diff.c > +++ b/diff.c > @@ -26,6 +26,7 @@ > #endif > > static int diff_detect_rename_default; > +static int diff_shortest_line_heuristic = 0; > static int diff_rename_limit_default = 400; > static int diff_suppress_blank_empty; > static int diff_use_color_default = -1; > @@ -189,6 +190,10 @@ int git_diff_ui_config(const char *var, const char *value, void *cb) > diff_detect_rename_default = git_config_rename(var, value); > return 0; > } > + if (!strcmp(var, "diff.shortestlineheuristic")) { > + diff_shortest_line_heuristic = git_config_bool(var, value); > + return 0; > + } > if (!strcmp(var, "diff.autorefreshindex")) { > diff_auto_refresh_index = git_config_bool(var, value); > return 0; > @@ -3278,6 +3283,8 @@ void diff_setup(struct diff_options *options) > options->use_color = diff_use_color_default; > options->detect_rename = diff_detect_rename_default; > options->xdl_opts |= diff_algorithm; > + if (diff_shortest_line_heuristic) > + DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC); > > options->orderfile = diff_order_file_cfg; > > @@ -3798,6 +3805,10 @@ int diff_opt_parse(struct diff_options *options, > DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL); > else if (!strcmp(arg, "--ignore-blank-lines")) > DIFF_XDL_SET(options, IGNORE_BLANK_LINES); > + else if (!strcmp(arg, "--shortest-line-heuristic")) > + DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC); > + else if (!strcmp(arg, "--no-shortest-line-heuristic")) > + DIFF_XDL_CLR(options, SHORTEST_LINE_HEURISTIC); > else if (!strcmp(arg, "--patience")) > options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF); > else if (!strcmp(arg, "--histogram")) > diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h > index 4fb7e79..e1f8ec0 100644 > --- a/xdiff/xdiff.h > +++ b/xdiff/xdiff.h > @@ -41,6 +41,8 @@ extern "C" { > > #define XDF_IGNORE_BLANK_LINES (1 << 7) > > +#define XDF_SHORTEST_LINE_HEURISTIC (1 << 8) > + > #define XDL_EMIT_FUNCNAMES (1 << 0) > #define XDL_EMIT_FUNCCONTEXT (1 << 2) > > diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c > index 748eeb9..7d15b26 100644 > --- a/xdiff/xdiffi.c > +++ b/xdiff/xdiffi.c > @@ -400,6 +400,12 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, > } > > > +static int line_length(const char *recs) > +{ > + char *s = strchr(recs, '\n'); > + return s ? s - recs : strlen(recs); > +} > + > static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) > { > return (recs[ixs]->ha == recs[ix]->ha && > @@ -411,6 +417,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) > int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec; > char *rchg = xdf->rchg, *rchgo = xdfo->rchg; > + unsigned int shortest_line; > xrecord_t **recs = xdf->recs; > > /* > @@ -444,6 +451,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > > do { > grpsiz = ix - ixs; > + shortest_line = UINT_MAX; > > /* > * If the line before the current change group, is equal to > @@ -478,6 +486,10 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > * the group. > */ > while (ix < nrec && recs_match(recs, ixs, ix, flags)) { > + int l = line_length(recs[ix]->ptr); > + if (l < shortest_line) > + shortest_line = l; > + > rchg[ixs++] = 0; > rchg[ix++] = 1; > > @@ -504,6 +516,23 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { > rchg[--ix] = 0; > while (rchgo[--ixo]); > } > + > + /* > + * If a group can be moved back and forth, see if there is an > + * empty line in the moving space. If there is an empty line, > + * make sure the last empty line is the end of the group. > + * > + * As we shifted the group forward as far as possible, we only > + * need to shift it back if at all. > + */ > + if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) { > + while (ixs > 0 && > + line_length(recs[ix - 1]->ptr) > shortest_line && > + recs_match(recs, ixs - 1, ix - 1, flags)) { > + rchg[--ixs] = 1; > + rchg[--ix] = 0; > + } > + } > } > > return 0; > -- > 2.8.1.189.gd13d43c > ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-15 23:05 ` Jacob Keller @ 2016-04-15 23:32 ` Jacob Keller 2016-04-15 23:45 ` Stefan Beller 0 siblings, 1 reply; 44+ messages in thread From: Jacob Keller @ 2016-04-15 23:32 UTC (permalink / raw) To: Stefan Beller Cc: Git mailing list, Junio C Hamano, Jeff King, Jens Lehmann, Davide Libenzi, Jacob Keller On Fri, Apr 15, 2016 at 4:05 PM, Jacob Keller <jacob.keller@gmail.com> wrote: > There's a few places that will need cleaning up (comments and such) > that mention empty line still, but that's not surprising. I am going > to test this for a bit on my local repos, and see if it makes any > difference to the old heuristic as well. > > Thanks, > Jake > I ran this heuristic on git.git and it produces tons of false positive transforms which are much lease readable (to me at least), far more than those produced by the newline/blank link heuristic did. I think we should stick with the empty line heuristic instead of this version, even if it's easier to implement this version. We still would need to figure out how to handle CRLF properly but it's worth resolving that than this heuristic is. Thanks, Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-15 23:32 ` Jacob Keller @ 2016-04-15 23:45 ` Stefan Beller 0 siblings, 0 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-15 23:45 UTC (permalink / raw) To: Jacob Keller Cc: Git mailing list, Junio C Hamano, Jeff King, Jens Lehmann, Davide Libenzi, Jacob Keller On Fri, Apr 15, 2016 at 4:32 PM, Jacob Keller <jacob.keller@gmail.com> wrote: > On Fri, Apr 15, 2016 at 4:05 PM, Jacob Keller <jacob.keller@gmail.com> wrote: >> There's a few places that will need cleaning up (comments and such) >> that mention empty line still, but that's not surprising. I am going >> to test this for a bit on my local repos, and see if it makes any >> difference to the old heuristic as well. >> >> Thanks, >> Jake >> > > I ran this heuristic on git.git and it produces tons of false positive > transforms which are much lease readable (to me at least), far more > than those produced by the newline/blank link heuristic did. > > I think we should stick with the empty line heuristic instead of this > version, even if it's easier to implement this version. I agree. The heuristic is worse as we often have these 50:50 chances of messing stuff up. > > We still would need to figure out how to handle CRLF properly but it's > worth resolving that than this heuristic is. > > Thanks, > Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller 2016-04-15 23:05 ` Jacob Keller @ 2016-04-16 0:49 ` Junio C Hamano 2016-04-16 0:59 ` Stefan Beller 2016-04-16 1:07 ` Jacob Keller 1 sibling, 2 replies; 44+ messages in thread From: Junio C Hamano @ 2016-04-16 0:49 UTC (permalink / raw) To: Stefan Beller Cc: jacob.keller, git, peff, Jens.Lehmann, davidel, Jacob Keller Stefan Beller <sbeller@google.com> writes: > +static int line_length(const char *recs) > +{ > + char *s = strchr(recs, '\n'); > + return s ? s - recs : strlen(recs); > +} It seems that you guys are discarding this "number of bytes on a line, no matter what these bytes are" idea, so this may be moot, but is there a guarantee that reading through recs until you happen to see a NUL is safe? Shouldn't the code that accesses a "line" be using the same "from here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid having to scan the underlying string in an unbounded way? ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-16 0:49 ` Junio C Hamano @ 2016-04-16 0:59 ` Stefan Beller 2016-04-16 1:07 ` Jacob Keller 1 sibling, 0 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-16 0:59 UTC (permalink / raw) To: Junio C Hamano Cc: Jacob Keller, git, Jeff King, Jens Lehmann, Davide Libenzi, Jacob Keller On Fri, Apr 15, 2016 at 5:49 PM, Junio C Hamano <gitster@pobox.com> wrote: > Stefan Beller <sbeller@google.com> writes: > >> +static int line_length(const char *recs) >> +{ >> + char *s = strchr(recs, '\n'); >> + return s ? s - recs : strlen(recs); >> +} > > It seems that you guys are discarding this "number of bytes on a > line, no matter what these bytes are" idea, so this may be moot, but > is there a guarantee that reading through recs until you happen to > see a NUL is safe? We discarded this idea as it produces to many errors. (We'd be back at the 50:50 case, "is it really worth it?") We will go back to the "empty line" heuristic, which will be solved via xdl_blankline(rec[i]->ptr, rec[i]->size, flags); which could be inlined. That will solve the CRLF issue as a CR is covered as a whitespace (with CRLF you'd have to specify diff to ignore white spaces). For the safety I assumed * there is always a \n even on the last line by convention. * in case it is not, the string is null terminated, hence strchr and strlen for the rescue. > > Shouldn't the code that accesses a "line" be using the same "from > here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid > having to scan the underlying string in an unbounded way? xdl_blankline will use ->size, so we'll be holding it right. Thanks, Stefan > > ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-16 0:49 ` Junio C Hamano 2016-04-16 0:59 ` Stefan Beller @ 2016-04-16 1:07 ` Jacob Keller 2016-04-18 19:22 ` Junio C Hamano 1 sibling, 1 reply; 44+ messages in thread From: Jacob Keller @ 2016-04-16 1:07 UTC (permalink / raw) To: Junio C Hamano Cc: Stefan Beller, Git mailing list, Jeff King, Jens Lehmann, Davide Libenzi, Jacob Keller On Fri, Apr 15, 2016 at 5:49 PM, Junio C Hamano <gitster@pobox.com> wrote: > Stefan Beller <sbeller@google.com> writes: > >> +static int line_length(const char *recs) >> +{ >> + char *s = strchr(recs, '\n'); >> + return s ? s - recs : strlen(recs); >> +} > > It seems that you guys are discarding this "number of bytes on a > line, no matter what these bytes are" idea, so this may be moot, but > is there a guarantee that reading through recs until you happen to > see a NUL is safe? > > Shouldn't the code that accesses a "line" be using the same "from > here to there", i.e. recs[]->ptr, recs[]->size, interface to avoid > having to scan the underlying string in an unbounded way? > > I think we're going to make use of xdl_blankline instead of this or our own "is_emptyline" Thanks, Jake ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-16 1:07 ` Jacob Keller @ 2016-04-18 19:22 ` Junio C Hamano 2016-04-18 19:33 ` Stefan Beller 0 siblings, 1 reply; 44+ messages in thread From: Junio C Hamano @ 2016-04-18 19:22 UTC (permalink / raw) To: Jacob Keller Cc: Stefan Beller, Git mailing list, Jeff King, Jens Lehmann, Davide Libenzi, Jacob Keller Jacob Keller <jacob.keller@gmail.com> writes: > I think we're going to make use of xdl_blankline instead of this or > our own "is_emptyline" OK, so perhaps either of you two can do a final version people can start having fun with? By the way, I really do not want to see something this low-level to be end-user tweakable with "one bit enable/disable"; the end users shouldn't have to bother [1]. I left it in but renamed after "what" it enables/disables, not "how" the enabled thing works, to clarify that we have this only as a developers' aid. *1* I am fine with --compaction-heuristic=(shortest|blank|...) that allows a choice among many as a developers' aid, but I do not think this topic is there yet. Documentation/diff-config.txt | 9 ++++----- Documentation/diff-options.txt | 10 +++++----- diff.c | 18 +++++++++--------- xdiff/xdiff.h | 2 +- xdiff/xdiffi.c | 22 ++++++++++------------ 5 files changed, 29 insertions(+), 32 deletions(-) diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt index c62745b..9bf3e92 100644 --- a/Documentation/diff-config.txt +++ b/Documentation/diff-config.txt @@ -166,11 +166,10 @@ diff.tool:: include::mergetools-diff.txt[] -diff.shortestLineHeuristic:: - Set this option to true to enable the shortest line chunk heuristic when - producing diff output. This heuristic will attempt to shift hunks such - that the last shortest common line occurs below the hunk with the rest of - the context above it. +diff.compactionHeuristic:: + Set this option to enable an experimental heuristic that + shifts the hunk boundary in an attempt to make the resulting + patch easier to read. diff.algorithm:: Choose a diff algorithm. The variants are as follows: diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt index 238f39c..b513023 100644 --- a/Documentation/diff-options.txt +++ b/Documentation/diff-options.txt @@ -63,11 +63,11 @@ ifndef::git-format-patch[] Synonym for `-p --raw`. endif::git-format-patch[] ---shortest-line-heuristic:: ---no-shortest-line-heuristic:: - When possible, shift common shortest line in diff hunks below the hunk - such that the last common shortest line for each hunk is below, with the - rest of the context above the hunk. +--compaction-heuristic:: +--no-compaction-heuristic:: + These are to help debugging and tuning an experimental + heuristic that shifts the hunk boundary in an attempt to + make the resulting patch easier to read. --minimal:: Spend extra time to make sure the smallest possible diff --git a/diff.c b/diff.c index 276174c..02c75c3 100644 --- a/diff.c +++ b/diff.c @@ -25,7 +25,7 @@ #endif static int diff_detect_rename_default; -static int diff_shortest_line_heuristic = 0; +static int diff_compaction_heuristic = 1; static int diff_rename_limit_default = 400; static int diff_suppress_blank_empty; static int diff_use_color_default = -1; @@ -184,8 +184,8 @@ int git_diff_ui_config(const char *var, const char *value, void *cb) diff_detect_rename_default = git_config_rename(var, value); return 0; } - if (!strcmp(var, "diff.shortestlineheuristic")) { - diff_shortest_line_heuristic = git_config_bool(var, value); + if (!strcmp(var, "diff.compactionheuristic")) { + diff_compaction_heuristic = git_config_bool(var, value); return 0; } if (!strcmp(var, "diff.autorefreshindex")) { @@ -3240,8 +3240,8 @@ void diff_setup(struct diff_options *options) options->use_color = diff_use_color_default; options->detect_rename = diff_detect_rename_default; options->xdl_opts |= diff_algorithm; - if (diff_shortest_line_heuristic) - DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC); + if (diff_compaction_heuristic) + DIFF_XDL_SET(options, COMPACTION_HEURISTIC); options->orderfile = diff_order_file_cfg; @@ -3719,10 +3719,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac) DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL); else if (!strcmp(arg, "--ignore-blank-lines")) DIFF_XDL_SET(options, IGNORE_BLANK_LINES); - else if (!strcmp(arg, "--shortest-line-heuristic")) - DIFF_XDL_SET(options, SHORTEST_LINE_HEURISTIC); - else if (!strcmp(arg, "--no-shortest-line-heuristic")) - DIFF_XDL_CLR(options, SHORTEST_LINE_HEURISTIC); + else if (!strcmp(arg, "--compaction-heuristic")) + DIFF_XDL_SET(options, COMPACTION_HEURISTIC); + else if (!strcmp(arg, "--no-compaction-heuristic")) + DIFF_XDL_CLR(options, COMPACTION_HEURISTIC); else if (!strcmp(arg, "--patience")) options->xdl_opts = DIFF_WITH_ALG(options, PATIENCE_DIFF); else if (!strcmp(arg, "--histogram")) diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h index 968ac62..d1dbb27 100644 --- a/xdiff/xdiff.h +++ b/xdiff/xdiff.h @@ -41,7 +41,7 @@ extern "C" { #define XDF_IGNORE_BLANK_LINES (1 << 7) -#define XDF_SHORTEST_LINE_HEURISTIC (1 << 8) +#define XDF_COMPACTION_HEURISTIC (1 << 8) #define XDL_EMIT_FUNCNAMES (1 << 0) #define XDL_EMIT_COMMON (1 << 1) diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c index 7d15b26..1ec46e0 100644 --- a/xdiff/xdiffi.c +++ b/xdiff/xdiffi.c @@ -400,10 +400,9 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, } -static int line_length(const char *recs) +static int is_blank_line(xrecord_t **recs, long ix, long flags) { - char *s = strchr(recs, '\n'); - return s ? s - recs : strlen(recs); + return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags); } static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) @@ -417,7 +416,7 @@ static int recs_match(xrecord_t **recs, long ixs, long ix, long flags) int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec; char *rchg = xdf->rchg, *rchgo = xdfo->rchg; - unsigned int shortest_line; + unsigned int blank_lines; xrecord_t **recs = xdf->recs; /* @@ -451,7 +450,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { do { grpsiz = ix - ixs; - shortest_line = UINT_MAX; + blank_lines = 0; /* * If the line before the current change group, is equal to @@ -486,9 +485,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { * the group. */ while (ix < nrec && recs_match(recs, ixs, ix, flags)) { - int l = line_length(recs[ix]->ptr); - if (l < shortest_line) - shortest_line = l; + + blank_lines += is_blank_line(recs, ix, flags); rchg[ixs++] = 0; rchg[ix++] = 1; @@ -519,15 +517,15 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) { /* * If a group can be moved back and forth, see if there is an - * empty line in the moving space. If there is an empty line, - * make sure the last empty line is the end of the group. + * blank line in the moving space. If there is a blank line, + * make sure the last blank line is the end of the group. * * As we shifted the group forward as far as possible, we only * need to shift it back if at all. */ - if ((flags & XDF_SHORTEST_LINE_HEURISTIC)) { + if ((flags & XDF_COMPACTION_HEURISTIC)) { while (ixs > 0 && - line_length(recs[ix - 1]->ptr) > shortest_line && + !is_blank_line(recs, ix - 1, flags) && recs_match(recs, ixs - 1, ix - 1, flags)) { rchg[--ixs] = 1; rchg[--ix] = 0; -- 2.8.1-399-g96b3b3a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] xdiff: implement empty line chunk heuristic 2016-04-18 19:22 ` Junio C Hamano @ 2016-04-18 19:33 ` Stefan Beller 0 siblings, 0 replies; 44+ messages in thread From: Stefan Beller @ 2016-04-18 19:33 UTC (permalink / raw) To: Junio C Hamano Cc: Jacob Keller, Git mailing list, Jeff King, Jens Lehmann, Davide Libenzi, Jacob Keller On Mon, Apr 18, 2016 at 12:22 PM, Junio C Hamano <gitster@pobox.com> wrote: > Jacob Keller <jacob.keller@gmail.com> writes: > >> I think we're going to make use of xdl_blankline instead of this or >> our own "is_emptyline" > > OK, so perhaps either of you two can do a final version people can > start having fun with? Junios proposal seems to be on top of my latest series sent out, I'll squash it in and send it out as a final version if you don't mind (though I'll do it later today; currently diving into Gerrits Java) > > By the way, I really do not want to see something this low-level to > be end-user tweakable with "one bit enable/disable"; the end users > shouldn't have to bother [1]. Ok. Thanks for fixing that mistake. > I left it in but renamed after "what" > it enables/disables, not "how" the enabled thing works, to clarify > that we have this only as a developers' aid. > > *1* I am fine with --compaction-heuristic=(shortest|blank|...) that > allows a choice among many as a developers' aid, but I do not think > this topic is there yet. This doesn't bode well with > +--compaction-heuristic:: > +--no-compaction-heuristic:: in the future? I'd rather have +--compaction-heuristic=none +--compaction-heuristic=lastEmptyLine such that we don't have to worry about further experiments (or matured heuristics) later? ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2016-05-03 17:55 UTC | newest] Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-04-19 15:21 [PATCHv5 0/2] xdiff: implement empty line chunk heuristic Stefan Beller 2016-04-19 15:21 ` [PATCH 1/2] xdiff: add recs_match helper function Stefan Beller 2016-04-19 15:21 ` [PATCH 2/2] xdiff: implement empty line chunk heuristic Stefan Beller [not found] ` <CA+P7+xoqn3fxEZGn02ST1XV-2UpQGr3iwV-37R8pakFJy_9n0w@mail.gmail.com> 2016-04-20 4:18 ` Jeff King 2016-04-20 4:37 ` Jeff King 2016-04-20 4:37 ` Stefan Beller 2016-04-29 20:29 ` Junio C Hamano 2016-04-29 20:59 ` Jacob Keller 2016-04-29 22:18 ` Junio C Hamano 2016-04-29 22:35 ` Stefan Beller 2016-04-29 22:39 ` Keller, Jacob E 2016-04-29 22:44 ` Stefan Beller 2016-04-29 22:48 ` Keller, Jacob E 2016-05-02 17:40 ` Junio C Hamano 2016-05-02 17:45 ` Stefan Beller 2016-05-02 18:02 ` Jeff King 2016-05-03 17:55 ` Jacob Keller 2016-04-30 3:06 ` Jeff King 2016-04-19 16:23 ` [PATCHv5 0/2] " Junio C Hamano -- strict thread matches above, loose matches on Subject: below -- 2016-04-18 21:12 [PATCH 0/2 v4] " Stefan Beller 2016-04-18 21:12 ` [PATCH 2/2] " Stefan Beller 2016-04-18 22:04 ` Jacob Keller 2016-04-18 22:24 ` Junio C Hamano 2016-04-19 5:03 ` Jeff King 2016-04-19 6:47 ` Stefan Beller 2016-04-19 7:00 ` Jeff King 2016-04-19 7:05 ` Stefan Beller 2016-04-19 15:17 ` Stefan Beller 2016-04-19 17:06 ` Jeff King 2016-04-19 23:02 ` Jacob Keller 2016-04-19 23:07 ` Junio C Hamano 2016-04-20 13:12 ` Michael S. Tsirkin 2016-04-20 16:09 ` Junio C Hamano 2016-04-20 16:17 ` Jeff King 2016-04-20 6:00 ` Junio C Hamano 2016-04-19 16:51 ` Junio C Hamano 2016-04-15 23:01 [RFC PATCH, WAS: "weird diff output?" v3a 0/2] implement shortest line diff " Stefan Beller 2016-04-15 23:01 ` [PATCH 2/2] xdiff: implement empty line " Stefan Beller 2016-04-15 23:05 ` Jacob Keller 2016-04-15 23:32 ` Jacob Keller 2016-04-15 23:45 ` Stefan Beller 2016-04-16 0:49 ` Junio C Hamano 2016-04-16 0:59 ` Stefan Beller 2016-04-16 1:07 ` Jacob Keller 2016-04-18 19:22 ` Junio C Hamano 2016-04-18 19:33 ` Stefan Beller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.