All of lore.kernel.org
 help / color / mirror / Atom feed
* git diff too slow for a file
@ 2010-03-29  1:42 SungHyun Nam
  2010-04-17 15:52 ` René Scharfe
  0 siblings, 1 reply; 12+ messages in thread
From: SungHyun Nam @ 2010-03-29  1:42 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

Hello,

If I run a attached script for bunzipped attached files, I get:
(To reduce size, I removed many lines and bzipped.)

     $ ./mk.sh
     time diff -u x3 x4 >/dev/null 2>&1

     real	0m0.011s
     user	0m0.000s
     sys	0m0.010s

     time git diff >/dev/null 2>&1

     real	0m0.193s
     user	0m0.190s
     sys	0m0.000s

     $ git version
     git version 1.7.0.2.273.gc2413

     $ diff --version
     diff (GNU diffutils) 2.8.1
     ...

Well, though the files are ascii file, they includes a random
hexa-decimal datas, so that I don't interest the diff result at
all.  But the real problem is 'rebasing took so long if the file
was changed'.  Because the git tree includes several such a file,
if they changed, rebase took some miniutes for every branch.
Such a branch includes a few lines of changes for a C source file,
though.  Now I'm waiting an hour to finish rebasing all the
branches and yet a rebasing script is running... :-(

Please help!
Thanks,
namsh

The original file has more than 180000 lines and the result is:
     $ ./mk.sh
     time diff -u x3 x4 >/dev/null 2>&1

     real	0m0.759s
     user	0m0.740s
     sys	0m0.010s

     time git diff >/dev/null 2>&1

     real	0m44.460s
     user	0m44.390s
     sys	0m0.030s


[-- Attachment #2: x3.bz2 --]
[-- Type: application/octet-stream, Size: 3348 bytes --]

[-- Attachment #3: x4.bz2 --]
[-- Type: application/octet-stream, Size: 2088 bytes --]

[-- Attachment #4: mk.sh --]
[-- Type: text/plain, Size: 298 bytes --]

#!/bin/bash

run_command()
{
    echo "$@"
    eval "$@"
}

run_command 'time diff -u x3 x4 >/dev/null 2>&1'

{
    rm -rf u
    mkdir u
    cd u
    cp ../x3 x
    git init
    git add .
    git ci -m x
    cp ../x4 x
} >/dev/null 2>&1

echo ''
run_command 'time git diff >/dev/null 2>&1'

exit 0

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-03-29  1:42 git diff too slow for a file SungHyun Nam
@ 2010-04-17 15:52 ` René Scharfe
  2010-04-17 17:10   ` Junio C Hamano
  2010-04-19  0:43   ` SungHyun Nam
  0 siblings, 2 replies; 12+ messages in thread
From: René Scharfe @ 2010-04-17 15:52 UTC (permalink / raw)
  To: SungHyun Nam; +Cc: git

Am 29.03.2010 03:42, schrieb SungHyun Nam:
> Hello,
> 
> If I run a attached script for bunzipped attached files, I get:
> (To reduce size, I removed many lines and bzipped.)
> 
>     $ ./mk.sh
>     time diff -u x3 x4 >/dev/null 2>&1
> 
>     real    0m0.011s
>     user    0m0.000s
>     sys    0m0.010s
> 
>     time git diff >/dev/null 2>&1
> 
>     real    0m0.193s
>     user    0m0.190s
>     sys    0m0.000s
> 
>     $ git version
>     git version 1.7.0.2.273.gc2413
> 
>     $ diff --version
>     diff (GNU diffutils) 2.8.1
>     ...
> 
> Well, though the files are ascii file, they includes a random
> hexa-decimal datas, so that I don't interest the diff result at
> all.  But the real problem is 'rebasing took so long if the file
> was changed'.  Because the git tree includes several such a file,
> if they changed, rebase took some miniutes for every branch.
> Such a branch includes a few lines of changes for a C source file,
> though.  Now I'm waiting an hour to finish rebasing all the
> branches and yet a rebasing script is running... :-(

I can reproduce it; I concatenated your example files five times to get
meaningful timings (x1 = five times x3, x2 = five times x4).

The difference between GNU diff and git diff is that the latter is trying
hard to minimize the size of the diff.  Each user of the xdiff library in
git turns on the XDF_NEED_MINIMAL flag, which makes it very expensive
(specifically the function xdl_split()).

The following patch is not meant for inclusion, but rather to start a
dicussion.  Is XDF_NEED_MINIMAL a good default to have?

The patch removes XDF_NEED_MINIMAL and replaces it with XDF_QUICK, with
reversed meaning.  XDF_QUICK is only set if the new option --quick is
given, so without it the old behaviour is retained.  Some numbers:

$ time diff ../x1 ../x2 | diffstat
 unknown |34208 ++++++++++++++++++++++++++++++++--------------------------------
 1 file changed, 17104 insertions(+), 17104 deletions(-)

real	0m0.043s
user	0m0.030s
sys	0m0.010s

$ time git diff --stat
 x |32082 ++++++++++++++++++++++++++++++++++----------------------------------
 1 files changed, 16041 insertions(+), 16041 deletions(-)

real	0m2.176s
user	0m2.170s
sys	0m0.010s

$ time git diff --quick --stat
 x |34208 ++++++++++++++++++++++++++++++++++----------------------------------
 1 files changed, 17104 insertions(+), 17104 deletions(-)

real	0m0.064s
user	0m0.060s
sys	0m0.010s

---
 builtin/blame.c      |    2 +-
 builtin/merge-file.c |    2 +-
 builtin/merge-tree.c |    2 +-
 builtin/rerere.c     |    2 +-
 combine-diff.c       |    2 +-
 diff.c               |   12 +++++++-----
 merge-file.c         |    2 +-
 xdiff/xdiff.h        |    2 +-
 xdiff/xdiffi.c       |    2 +-
 9 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index fc15863..8deeee1 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -39,7 +39,7 @@ static int show_root;
 static int reverse;
 static int blank_boundary;
 static int incremental;
-static int xdl_opts = XDF_NEED_MINIMAL;
+static int xdl_opts;
 
 static enum date_mode blame_date_mode = DATE_ISO8601;
 static size_t blame_date_width;
diff --git a/builtin/merge-file.c b/builtin/merge-file.c
index 610849a..b8e9e5b 100644
--- a/builtin/merge-file.c
+++ b/builtin/merge-file.c
@@ -25,7 +25,7 @@ int cmd_merge_file(int argc, const char **argv, const char *prefix)
 	const char *names[3] = { NULL, NULL, NULL };
 	mmfile_t mmfs[3];
 	mmbuffer_t result = {NULL, 0};
-	xmparam_t xmp = {{XDF_NEED_MINIMAL}};
+	xmparam_t xmp = {{0}};
 	int ret = 0, i = 0, to_stdout = 0;
 	int quiet = 0;
 	int nongit;
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index a4a4f2c..fc00d79 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -106,7 +106,7 @@ static void show_diff(struct merge_list *entry)
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	ecb.outf = show_outf;
diff --git a/builtin/rerere.c b/builtin/rerere.c
index 34f9ace..0048f9e 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -89,7 +89,7 @@ static int diff_two(const char *file1, const char *label1,
 	printf("--- a/%s\n+++ b/%s\n", label1, label2);
 	fflush(stdout);
 	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	ecb.outf = outf;
diff --git a/combine-diff.c b/combine-diff.c
index 6162691..29779be 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -221,7 +221,7 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 	parent_file.ptr = grab_blob(parent, mode, &sz);
 	parent_file.size = sz;
 	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	memset(&state, 0, sizeof(state));
 	state.nmask = nmask;
diff --git a/diff.c b/diff.c
index a1bf1e9..e32b47b 100644
--- a/diff.c
+++ b/diff.c
@@ -718,7 +718,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	memset(&xecfg, 0, sizeof(xecfg));
 	diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
 	diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	/* as only the hunk header will be parsed, we need a 0-context */
 	xecfg.ctxlen = 0;
 	xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
@@ -1744,7 +1744,7 @@ static void builtin_diff(const char *name_a,
 			check_blank_at_eof(&mf1, &mf2, &ecbdata);
 		ecbdata.file = o->file;
 		ecbdata.header = header.len ? &header : NULL;
-		xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
+		xpp.flags = o->xdl_opts;
 		xecfg.ctxlen = o->context;
 		xecfg.interhunkctxlen = o->interhunkcontext;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -1834,7 +1834,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
-		xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
+		xpp.flags = o->xdl_opts;
 		xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
 			      &xpp, &xecfg, &ecb);
 	}
@@ -1883,7 +1883,7 @@ static void builtin_checkdiff(const char *name_a, const char *name_b,
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xecfg.ctxlen = 1; /* at least one context line */
-		xpp.flags = XDF_NEED_MINIMAL;
+		xpp.flags = 0;
 		xdi_diff_outf(&mf1, &mf2, checkdiff_consume, &data,
 			      &xpp, &xecfg, &ecb);
 
@@ -2816,6 +2816,8 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_XDL_SET(options, IGNORE_WHITESPACE_AT_EOL);
 	else if (!strcmp(arg, "--patience"))
 		DIFF_XDL_SET(options, PATIENCE_DIFF);
+	else if (!strcmp(arg, "--quick"))
+		DIFF_XDL_SET(options, QUICK);
 
 	/* flags options */
 	else if (!strcmp(arg, "--binary")) {
@@ -3438,7 +3440,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
 					len2, p->two->path);
 		git_SHA1_Update(&ctx, buffer, len1);
 
-		xpp.flags = XDF_NEED_MINIMAL;
+		xpp.flags = 0;
 		xecfg.ctxlen = 3;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
 		xdi_diff_outf(&mf1, &mf2, patch_id_consume, &data,
diff --git a/merge-file.c b/merge-file.c
index c336c93..db4d0d5 100644
--- a/merge-file.c
+++ b/merge-file.c
@@ -66,7 +66,7 @@ static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2)
 	xdemitcb_t ecb;
 
 	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	xecfg.flags = XDL_EMIT_COMMON;
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 711048e..3b8a962 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -28,7 +28,7 @@ extern "C" {
 #endif /* #ifdef __cplusplus */
 
 
-#define XDF_NEED_MINIMAL (1 << 1)
+#define XDF_QUICK (1 << 1)
 #define XDF_IGNORE_WHITESPACE (1 << 2)
 #define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
 #define XDF_IGNORE_WHITESPACE_AT_EOL (1 << 4)
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index da67c04..1f688cf 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -367,7 +367,7 @@ int xdl_do_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	dd2.rindex = xe->xdf2.rindex;
 
 	if (xdl_recs_cmp(&dd1, 0, dd1.nrec, &dd2, 0, dd2.nrec,
-			 kvdf, kvdb, (xpp->flags & XDF_NEED_MINIMAL) != 0, &xenv) < 0) {
+			 kvdf, kvdb, (xpp->flags & XDF_QUICK) == 0, &xenv) < 0) {
 
 		xdl_free(kvd);
 		xdl_free_env(xe);

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-04-17 15:52 ` René Scharfe
@ 2010-04-17 17:10   ` Junio C Hamano
  2010-04-18 18:01     ` René Scharfe
  2010-04-19  0:43   ` SungHyun Nam
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2010-04-17 17:10 UTC (permalink / raw)
  To: René Scharfe; +Cc: SungHyun Nam, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Am 29.03.2010 03:42, schrieb SungHyun Nam:
>>  ...
>> Well, though the files are ascii file, they includes a random
>> hexa-decimal datas, so that I don't interest the diff result at
>> all.
> ...
> The following patch is not meant for inclusion, but rather to start a
> dicussion.  Is XDF_NEED_MINIMAL a good default to have?

This is a very valid question to ask.  The choice of the default was done
without any benchmarking nor analysis on performance impact at all.

What we should do next would be to:

 - see how much performance impact we have been getting from more normal
   set of files (say, "git log -p" in the kernel archive) by our use of
   MINIMAL;  I suspect that git.git itself is too small to observe any
   meaningful difference.  We already _know_ that MINIMAL is more
   expensive, so this is not very important, but it would be good to
   know.

 - inspect the difference of the quality of output for not using MINIMAL,
   again for more normal set of files.  We know that the quality does not
   matter for pathological cases like the one in this thread --- the user
   is not even "interested in the diff result at all".

Thanks for starting this.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-04-17 17:10   ` Junio C Hamano
@ 2010-04-18 18:01     ` René Scharfe
  2010-04-20  7:40       ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: René Scharfe @ 2010-04-18 18:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SungHyun Nam, git

Am 17.04.2010 19:10, schrieb Junio C Hamano:
> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> 
> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> 
>> Am 29.03.2010 03:42, schrieb SungHyun Nam:
>>>  ...
>>> Well, though the files are ascii file, they includes a random
>>> hexa-decimal datas, so that I don't interest the diff result at
>>> all.
>> ...
>> The following patch is not meant for inclusion, but rather to start a
>> dicussion.  Is XDF_NEED_MINIMAL a good default to have?
> 
> This is a very valid question to ask.  The choice of the default was done
> without any benchmarking nor analysis on performance impact at all.
> 
> What we should do next would be to:
> 
>  - see how much performance impact we have been getting from more normal
>    set of files (say, "git log -p" in the kernel archive) by our use of
>    MINIMAL;  I suspect that git.git itself is too small to observe any
>    meaningful difference.  We already _know_ that MINIMAL is more
>    expensive, so this is not very important, but it would be good to
>    know.
> 
>  - inspect the difference of the quality of output for not using MINIMAL,
>    again for more normal set of files.  We know that the quality does not
>    matter for pathological cases like the one in this thread --- the user
>    is not even "interested in the diff result at all".

To see the what difference it makes to the output, I did the following:

	git rev-list --no-merges HEAD |
	while read commit
	do
		a=$(git show $commit | md5sum)
		b=$(git show --quick $commit | md5sum)
		test "$a" = "$b" || echo $commit
	done

For git, it reports those 2 out of 18095 total commits as being
different with and without XDF_NEED_MINIMAL:

  f522c9b5  717b8311

For Linux, these 161 out of 178107 commits are affected:

  90d49b4f  83f3c715  3b5dd52a  e97bd974  4e092d11  96b3c83d  4c96e893
  1210db95  ca97b838  52bbe3c7  a3a4f7e1  de0710aa  8e730c15  b54f78a8
  71034ba8  120a5d0d  1532ecea  0a18d7b5  3d0f8bc7  35c1b462  0f29f587
  2cf71d2e  4d295db0  4953550a  c0a5962f  5176fae4  51f94a7b  5d4f98a2
  cb6492e4  4f774513  da0436e9  3772a991  20b09c29  dd4969a8  b4a4d568
  a7fefd10  d93f87c8  9b7895ef  a72bdb1c  96b8e145  b960074f  56bec294
  f74df6fb  89cb7e7f  e8324357  5ef3041e  833dfbe7  65f9f619  02227c28
  56afef56  5db8dcc9  6261ab3a  deee7c81  e7594072  2b82032c  f1dc5600
  060ae855  5503ac56  54168ed7  e2d1d6c0  1a40e23b  d1310b2e  9c6bd790
  8feceb67  9ac19a90  60f8b39c  ef1a628d  95b86a9a  0391c828  a7707adf
  3f038d80  f5b728a1  5915eb53  287ac01a  c41f8cbd  b497549a  6802e340
  f8d79e79  34f80b04  c18487ee  c9df406f  03718234  d10c2e46  584fffc8
  624d5c51  6446a860  f1410647  1c45607a  b21a15f6  0e078e2f  8e09f215
  99ca4e58  8bf5e5ca  402aa76a  195a4ef6  a9de9248  20510f2f  ae0b78d0
  51219358  1a4f550a  b2c258fb  e2ebc74d  27c868c2  c3a2f0df  8e18257d
  05ffdd7b  328d5752  92d7f7b0  39279cc3  4b19fcc3  f23a06f0  7699acd1
  a1005012  eea221ce  dcd0538f  48b4554a  56b6aeb0  e05d723f  3cee5a60
  4d0b4af9  9e89dde2  483dfdb6  02f1175c  a8dea4ec  28a6d671  933a27d3
  572d3138  1450e6be  4ac4360b  9c7f852e  16a53ecc  5bb0b55a  2722971c
  6a2900b6  c752666c  5c04a7b8  c92f222e  a966f3e7  3ebc284d  a1a5ea70
  df694daa  7a88488b  afbf30a2  b095c381  ea2b26e0  12d30d89  09af7b44
  5e83d430  37448f7d  544393fe  d203a7ec  73a25462  bd4c625c  330a115a
  22e2c507  e9edcee0  303b86d9  47b5d69c  2d7edb92  cb624029  f4f051eb

I have briefly looked at a few of them.  They were big and not obvious
with or without XDF_NEED_MINIMAL, but the flag clearly helped to cut
them down a bit.


XDF_NEED_MINIMAL doesn't seem to affect the overall runtime for the
Linux repo:

	time git log --stat HEAD >/dev/null

	real	4m37.378s
	user	4m28.070s
	sys	0m9.310s

	time ../git/git log --stat --quick >/dev/null

	real	4m37.239s
	user	4m26.590s
	sys	0m10.620s

The difference between the times for git's own repo are in the noise, too.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-04-17 15:52 ` René Scharfe
  2010-04-17 17:10   ` Junio C Hamano
@ 2010-04-19  0:43   ` SungHyun Nam
  1 sibling, 0 replies; 12+ messages in thread
From: SungHyun Nam @ 2010-04-19  0:43 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

René Scharfe wrote:
> Am 29.03.2010 03:42, schrieb SungHyun Nam:
>> Hello,
>>
>> If I run a attached script for bunzipped attached files, I get:
>> (To reduce size, I removed many lines and bzipped.)
>>
>>      $ ./mk.sh
>>      time diff -u x3 x4>/dev/null 2>&1
>>
>>      real    0m0.011s
>>      user    0m0.000s
>>      sys    0m0.010s
>>
>>      time git diff>/dev/null 2>&1
>>
>>      real    0m0.193s
>>      user    0m0.190s
>>      sys    0m0.000s
>>
>>      $ git version
>>      git version 1.7.0.2.273.gc2413
>>
>>      $ diff --version
>>      diff (GNU diffutils) 2.8.1
>>      ...
>>
>> Well, though the files are ascii file, they includes a random
>> hexa-decimal datas, so that I don't interest the diff result at
>> all.  But the real problem is 'rebasing took so long if the file
>> was changed'.  Because the git tree includes several such a file,
>> if they changed, rebase took some miniutes for every branch.
>> Such a branch includes a few lines of changes for a C source file,
>> though.  Now I'm waiting an hour to finish rebasing all the
>> branches and yet a rebasing script is running... :-(
>
> I can reproduce it; I concatenated your example files five times to get
> meaningful timings (x1 = five times x3, x2 = five times x4).
>
> The difference between GNU diff and git diff is that the latter is trying
> hard to minimize the size of the diff.  Each user of the xdiff library in
> git turns on the XDF_NEED_MINIMAL flag, which makes it very expensive
> (specifically the function xdl_split()).
>
> The following patch is not meant for inclusion, but rather to start a
> dicussion.  Is XDF_NEED_MINIMAL a good default to have?
>
> The patch removes XDF_NEED_MINIMAL and replaces it with XDF_QUICK, with
> reversed meaning.  XDF_QUICK is only set if the new option --quick is
> given, so without it the old behaviour is retained.  Some numbers:

The patch is great for me.  Thanks!

Added 'time git diff --quick' to the mk.sh and ran with a
original file (about 180000 lines):

     $ ./mk.sh
     time diff -u x3 x4 >/dev/null 2>&1

     real	0m0.794s
     user	0m0.720s
     sys	0m0.010s

     time git diff >/dev/null 2>&1

     real	0m44.687s
     user	0m44.670s
     sys	0m0.020s

     time git diff --quick >/dev/null 2>&1

     real	0m1.853s
     user	0m1.840s
     sys	0m0.010s

Thanks!
namsh

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-04-18 18:01     ` René Scharfe
@ 2010-04-20  7:40       ` Junio C Hamano
  2010-04-20 21:15         ` René Scharfe
  2010-05-02 13:04         ` René Scharfe
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2010-04-20  7:40 UTC (permalink / raw)
  To: René Scharfe; +Cc: SungHyun Nam, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> For Linux, these 161 out of 178107 commits are affected:
>
>   90d49b4f  83f3c715  3b5dd52a  e97bd974  4e092d11  96b3c83d  4c96e893
>   ...
>   22e2c507  e9edcee0  303b86d9  47b5d69c  2d7edb92  cb624029  f4f051eb
>
> I have briefly looked at a few of them.  They were big and not obvious
> with or without XDF_NEED_MINIMAL, but the flag clearly helped to cut
> them down a bit.

Thanks.

I am getting the same impression after staring some output.

Probably we should at least try to get rid of the use of MINIMAL
immediately after 1.7.1 and if nobody finds large discrepancies, aim to
ship 1.7.2 (and possibly 1.7.1.1) without even --quick/--slow options.

I expect that there will also be some differences in the blame output.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-04-20  7:40       ` Junio C Hamano
@ 2010-04-20 21:15         ` René Scharfe
  2010-04-21  2:49           ` Junio C Hamano
  2010-05-02 13:04         ` René Scharfe
  1 sibling, 1 reply; 12+ messages in thread
From: René Scharfe @ 2010-04-20 21:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SungHyun Nam, git

Am 20.04.2010 09:40, schrieb Junio C Hamano:
> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
> 
>> For Linux, these 161 out of 178107 commits are affected:
>>
>>   90d49b4f  83f3c715  3b5dd52a  e97bd974  4e092d11  96b3c83d  4c96e893
>>   ...
>>   22e2c507  e9edcee0  303b86d9  47b5d69c  2d7edb92  cb624029  f4f051eb
>>
>> I have briefly looked at a few of them.  They were big and not obvious
>> with or without XDF_NEED_MINIMAL, but the flag clearly helped to cut
>> them down a bit.
> 
> Thanks.
> 
> I am getting the same impression after staring some output.
> 
> Probably we should at least try to get rid of the use of MINIMAL
> immediately after 1.7.1 and if nobody finds large discrepancies, aim to
> ship 1.7.2 (and possibly 1.7.1.1) without even --quick/--slow options.

Turning XDF_NEED_MINIMAL off by default looks like the sane thing to do
in order to help the fringe cases without hurting the normal ones.

A --slow/--minimal/--try-harder option for git diff could come in handy
for longer patches, though.  GNU diff has it, too (-d/--minimal).

> I expect that there will also be some differences in the blame output.

I haven't looked at the impact on blame, but additionally patch IDs are
going to change (for those patches where XDF_NEED_MINIMAL makes a
difference).  Are they stored somewhere?  Do we need to worry about them?

René

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-04-20 21:15         ` René Scharfe
@ 2010-04-21  2:49           ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2010-04-21  2:49 UTC (permalink / raw)
  To: René Scharfe; +Cc: SungHyun Nam, git, Paolo Bonzini

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Turning XDF_NEED_MINIMAL off by default looks like the sane thing to do
> in order to help the fringe cases without hurting the normal ones.
>
> A --slow/--minimal/--try-harder option for git diff could come in handy
> for longer patches, though.  GNU diff has it, too (-d/--minimal).
>
>> I expect that there will also be some differences in the blame output.
>
> I haven't looked at the impact on blame, but additionally patch IDs are
> going to change (for those patches where XDF_NEED_MINIMAL makes a
> difference).  Are they stored somewhere?  Do we need to worry about them?

Patch-IDs are purely transient inside core-git as far as I know, but I
wouldn't surprised if people keep database of patch-IDs and corresponding
commits to somehow speed up change look-ups.

Theoretically, conflict IDs used to index the rerere database would also
be affected, but I don't think it is such a huge issue.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-04-20  7:40       ` Junio C Hamano
  2010-04-20 21:15         ` René Scharfe
@ 2010-05-02 13:04         ` René Scharfe
  2010-05-02 15:10           ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: René Scharfe @ 2010-05-02 13:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SungHyun Nam, git

Am 20.04.2010 09:40, schrieb Junio C Hamano:
> Probably we should at least try to get rid of the use of MINIMAL
> immediately after 1.7.1 and if nobody finds large discrepancies, aim to
> ship 1.7.2 (and possibly 1.7.1.1) without even --quick/--slow options.

-- >8 --
Ever since the xdiff library had been introduced to git, all its callers
have used the flag XDF_NEED_MINIMAL.  It makes sure that the smallest
possible diff is produced, but that takes quite some time if there are
lots of differences that can be expressed in multiple ways.

This flag makes a difference for only 0.1% of the non-merge commits in
the git repo of Linux, both in terms of diff size and execution time.
The patches there are mostly nice and small.

SungHyun Nam however reported a case in a different repo where a diff
took more than 20 times longer to generate with XDF_NEED_MINIMAL than
without.  Rebasing became really slow.

This patch removes this flag from all callers.  The default of xdiff is
saner because it has minimal to no impact in the normal case of small
diffs and doesn't incur that much of a speed penalty for large ones.

A follow-up patch may introduce a command line option to set the flag if
the user needs it, similar to GNU diff's -d/--minimal.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 builtin/blame.c      |    2 +-
 builtin/merge-file.c |    2 +-
 builtin/merge-tree.c |    2 +-
 builtin/rerere.c     |    2 +-
 combine-diff.c       |    2 +-
 diff.c               |   10 +++++-----
 merge-file.c         |    2 +-
 7 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index fc15863..8deeee1 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -39,7 +39,7 @@ static int show_root;
 static int reverse;
 static int blank_boundary;
 static int incremental;
-static int xdl_opts = XDF_NEED_MINIMAL;
+static int xdl_opts;
 
 static enum date_mode blame_date_mode = DATE_ISO8601;
 static size_t blame_date_width;
diff --git a/builtin/merge-file.c b/builtin/merge-file.c
index 610849a..b8e9e5b 100644
--- a/builtin/merge-file.c
+++ b/builtin/merge-file.c
@@ -25,7 +25,7 @@ int cmd_merge_file(int argc, const char **argv, const char *prefix)
 	const char *names[3] = { NULL, NULL, NULL };
 	mmfile_t mmfs[3];
 	mmbuffer_t result = {NULL, 0};
-	xmparam_t xmp = {{XDF_NEED_MINIMAL}};
+	xmparam_t xmp = {{0}};
 	int ret = 0, i = 0, to_stdout = 0;
 	int quiet = 0;
 	int nongit;
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index a4a4f2c..fc00d79 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -106,7 +106,7 @@ static void show_diff(struct merge_list *entry)
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	ecb.outf = show_outf;
diff --git a/builtin/rerere.c b/builtin/rerere.c
index 34f9ace..0048f9e 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -89,7 +89,7 @@ static int diff_two(const char *file1, const char *label1,
 	printf("--- a/%s\n+++ b/%s\n", label1, label2);
 	fflush(stdout);
 	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	ecb.outf = outf;
diff --git a/combine-diff.c b/combine-diff.c
index 6162691..29779be 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -221,7 +221,7 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 	parent_file.ptr = grab_blob(parent, mode, &sz);
 	parent_file.size = sz;
 	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	memset(&state, 0, sizeof(state));
 	state.nmask = nmask;
diff --git a/diff.c b/diff.c
index a1bf1e9..29e608b 100644
--- a/diff.c
+++ b/diff.c
@@ -718,7 +718,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	memset(&xecfg, 0, sizeof(xecfg));
 	diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
 	diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	/* as only the hunk header will be parsed, we need a 0-context */
 	xecfg.ctxlen = 0;
 	xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
@@ -1744,7 +1744,7 @@ static void builtin_diff(const char *name_a,
 			check_blank_at_eof(&mf1, &mf2, &ecbdata);
 		ecbdata.file = o->file;
 		ecbdata.header = header.len ? &header : NULL;
-		xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
+		xpp.flags = o->xdl_opts;
 		xecfg.ctxlen = o->context;
 		xecfg.interhunkctxlen = o->interhunkcontext;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -1834,7 +1834,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
-		xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
+		xpp.flags = o->xdl_opts;
 		xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
 			      &xpp, &xecfg, &ecb);
 	}
@@ -1883,7 +1883,7 @@ static void builtin_checkdiff(const char *name_a, const char *name_b,
 		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xecfg.ctxlen = 1; /* at least one context line */
-		xpp.flags = XDF_NEED_MINIMAL;
+		xpp.flags = 0;
 		xdi_diff_outf(&mf1, &mf2, checkdiff_consume, &data,
 			      &xpp, &xecfg, &ecb);
 
@@ -3438,7 +3438,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
 					len2, p->two->path);
 		git_SHA1_Update(&ctx, buffer, len1);
 
-		xpp.flags = XDF_NEED_MINIMAL;
+		xpp.flags = 0;
 		xecfg.ctxlen = 3;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
 		xdi_diff_outf(&mf1, &mf2, patch_id_consume, &data,
diff --git a/merge-file.c b/merge-file.c
index c336c93..db4d0d5 100644
--- a/merge-file.c
+++ b/merge-file.c
@@ -66,7 +66,7 @@ static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2)
 	xdemitcb_t ecb;
 
 	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = XDF_NEED_MINIMAL;
+	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	xecfg.flags = XDL_EMIT_COMMON;
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-05-02 13:04         ` René Scharfe
@ 2010-05-02 15:10           ` Junio C Hamano
  2010-05-04 20:16             ` René Scharfe
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2010-05-02 15:10 UTC (permalink / raw)
  To: René Scharfe; +Cc: SungHyun Nam, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Ever since the xdiff library had been introduced to git, all its callers
> have used the flag XDF_NEED_MINIMAL.  It makes sure that the smallest
> possible diff is produced, but that takes quite some time if there are
> lots of differences that can be expressed in multiple ways.
>
> This flag makes a difference for only 0.1% of the non-merge commits in
> the git repo of Linux, both in terms of diff size and execution time.
> The patches there are mostly nice and small.
>
> SungHyun Nam however reported a case in a different repo where a diff
> took more than 20 times longer to generate with XDF_NEED_MINIMAL than
> without.  Rebasing became really slow.
>
> This patch removes this flag from all callers.  The default of xdiff is
> saner because it has minimal to no impact in the normal case of small
> diffs and doesn't incur that much of a speed penalty for large ones.

Thanks, will queue.

> diff --git a/merge-file.c b/merge-file.c
> index c336c93..db4d0d5 100644
> --- a/merge-file.c
> +++ b/merge-file.c
> @@ -66,7 +66,7 @@ static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2)
>  	xdemitcb_t ecb;
>  
>  	memset(&xpp, 0, sizeof(xpp));
> -	xpp.flags = XDF_NEED_MINIMAL;
> +	xpp.flags = 0;
>  	memset(&xecfg, 0, sizeof(xecfg));
>  	xecfg.ctxlen = 3;
>  	xecfg.flags = XDL_EMIT_COMMON;

When I wrote the message you are responding to, I tried to decide which is
better, to replace the assigned value like your patch does, or to remove
the assignment altogether as we have memset(&xpp, 0, sizeof(xpp)).  And I
was somewhat torn.

While it expresses what the patch wants to do a lot clearer (and it also
marks the places the "later patch" needs to touch), the resulting code
becomes slightly harder to read, because future readers of the code are
left with an obvious "why do we assign 0 after clearing the whole thing?
is there anything subtle going on?" unanswered.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-05-02 15:10           ` Junio C Hamano
@ 2010-05-04 20:16             ` René Scharfe
  2010-05-04 22:56               ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: René Scharfe @ 2010-05-04 20:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SungHyun Nam, git

Am 02.05.2010 17:10, schrieb Junio C Hamano:
> René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:
>> diff --git a/merge-file.c b/merge-file.c
>> index c336c93..db4d0d5 100644
>> --- a/merge-file.c
>> +++ b/merge-file.c
>> @@ -66,7 +66,7 @@ static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2)
>>  	xdemitcb_t ecb;
>>  
>>  	memset(&xpp, 0, sizeof(xpp));
>> -	xpp.flags = XDF_NEED_MINIMAL;
>> +	xpp.flags = 0;
>>  	memset(&xecfg, 0, sizeof(xecfg));
>>  	xecfg.ctxlen = 3;
>>  	xecfg.flags = XDL_EMIT_COMMON;
> 
> When I wrote the message you are responding to, I tried to decide which is
> better, to replace the assigned value like your patch does, or to remove
> the assignment altogether as we have memset(&xpp, 0, sizeof(xpp)).  And I
> was somewhat torn.
> 
> While it expresses what the patch wants to do a lot clearer (and it also
> marks the places the "later patch" needs to touch), the resulting code
> becomes slightly harder to read, because future readers of the code are
> left with an obvious "why do we assign 0 after clearing the whole thing?
> is there anything subtle going on?" unanswered.

Well, I didn't do that because of a mix of laziness and caution.  A
mechanical replacement is much less likely to introduce bugs..

But when I take a closer look at the surrounding code, I can't help but
ask if the flags really have be passed in such a complicated way.

How about the following, which makes xdi_diff*() take a simple flag
parameter instead, moving the code to handle xpparam_t into
xdiff-interface.c, which seems to be the proper place for it?

René


---
 builtin/blame.c      |   10 ++--------
 builtin/merge-tree.c |    4 +---
 builtin/rerere.c     |    5 +----
 combine-diff.c       |    5 +----
 diff.c               |   25 +++++--------------------
 merge-file.c         |    5 +----
 xdiff-interface.c    |   18 +++++++++++-------
 xdiff-interface.h    |    8 ++++----
 8 files changed, 26 insertions(+), 54 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 8deeee1..1c1c9e4 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -734,7 +734,6 @@ static int pass_blame_to_parent(struct scoreboard *sb,
 	int last_in_target;
 	mmfile_t file_p, file_o;
 	struct blame_chunk_cb_data d = { sb, target, parent, 0, 0 };
-	xpparam_t xpp;
 	xdemitconf_t xecfg;
 
 	last_in_target = find_last_in_target(sb, target);
@@ -745,11 +744,9 @@ static int pass_blame_to_parent(struct scoreboard *sb,
 	fill_origin_blob(target, &file_o);
 	num_get_patch++;
 
-	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = xdl_opts;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 0;
-	xdi_diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, &xpp, &xecfg);
+	xdi_diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, &xecfg, xdl_opts);
 	/* The rest (i.e. anything after tlno) are the same as the parent */
 	blame_chunk(sb, d.tlno, d.plno, last_in_target, target, parent);
 
@@ -876,7 +873,6 @@ static void find_copy_in_blob(struct scoreboard *sb,
 	int cnt;
 	mmfile_t file_o;
 	struct handle_split_cb_data d = { sb, ent, parent, split, 0, 0 };
-	xpparam_t xpp;
 	xdemitconf_t xecfg;
 
 	/*
@@ -896,12 +892,10 @@ static void find_copy_in_blob(struct scoreboard *sb,
 	 * file_o is a part of final image we are annotating.
 	 * file_p partially may match that image.
 	 */
-	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = xdl_opts;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 1;
 	memset(split, 0, sizeof(struct blame_entry [3]));
-	xdi_diff_hunks(file_p, &file_o, handle_split_cb, &d, &xpp, &xecfg);
+	xdi_diff_hunks(file_p, &file_o, handle_split_cb, &d, &xecfg, xdl_opts);
 	/* remainder, if any, all match the preimage */
 	handle_split(sb, ent, d.tlno, d.plno, ent->num_lines, parent, split);
 }
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index fc00d79..d95cd1c 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -102,11 +102,9 @@ static void show_diff(struct merge_list *entry)
 {
 	unsigned long size;
 	mmfile_t src, dst;
-	xpparam_t xpp;
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
-	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	ecb.outf = show_outf;
@@ -120,7 +118,7 @@ static void show_diff(struct merge_list *entry)
 	if (!dst.ptr)
 		size = 0;
 	dst.size = size;
-	xdi_diff(&src, &dst, &xpp, &xecfg, &ecb);
+	xdi_diff(&src, &dst, &xecfg, &ecb, 0);
 	free(src.ptr);
 	free(dst.ptr);
 }
diff --git a/builtin/rerere.c b/builtin/rerere.c
index 0048f9e..43b75fa 100644
--- a/builtin/rerere.c
+++ b/builtin/rerere.c
@@ -78,7 +78,6 @@ static int outf(void *dummy, mmbuffer_t *ptr, int nbuf)
 static int diff_two(const char *file1, const char *label1,
 		const char *file2, const char *label2)
 {
-	xpparam_t xpp;
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 	mmfile_t minus, plus;
@@ -88,12 +87,10 @@ static int diff_two(const char *file1, const char *label1,
 
 	printf("--- a/%s\n+++ b/%s\n", label1, label2);
 	fflush(stdout);
-	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	ecb.outf = outf;
-	xdi_diff(&minus, &plus, &xpp, &xecfg, &ecb);
+	xdi_diff(&minus, &plus, &xecfg, &ecb, 0);
 
 	free(minus.ptr);
 	free(plus.ptr);
diff --git a/combine-diff.c b/combine-diff.c
index 29779be..5b29226 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -208,7 +208,6 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 {
 	unsigned int p_lno, lno;
 	unsigned long nmask = (1UL << n);
-	xpparam_t xpp;
 	xdemitconf_t xecfg;
 	mmfile_t parent_file;
 	xdemitcb_t ecb;
@@ -220,8 +219,6 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 
 	parent_file.ptr = grab_blob(parent, mode, &sz);
 	parent_file.size = sz;
-	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	memset(&state, 0, sizeof(state));
 	state.nmask = nmask;
@@ -231,7 +228,7 @@ static void combine_diff(const unsigned char *parent, unsigned int mode,
 	state.n = n;
 
 	xdi_diff_outf(&parent_file, result_file, consume_line, &state,
-		      &xpp, &xecfg, &ecb);
+		      &xecfg, &ecb, 0);
 	free(parent_file.ptr);
 
 	/* Assign line numbers for this parent.
diff --git a/diff.c b/diff.c
index 29e608b..d95a928 100644
--- a/diff.c
+++ b/diff.c
@@ -698,7 +698,6 @@ static void diff_words_fill(struct diff_words_buffer *buffer, mmfile_t *out,
 /* this executes the word diff on the accumulated buffers */
 static void diff_words_show(struct diff_words_data *diff_words)
 {
-	xpparam_t xpp;
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 	mmfile_t minus, plus;
@@ -714,15 +713,13 @@ static void diff_words_show(struct diff_words_data *diff_words)
 
 	diff_words->current_plus = diff_words->plus.text.ptr;
 
-	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
 	diff_words_fill(&diff_words->minus, &minus, diff_words->word_regex);
 	diff_words_fill(&diff_words->plus, &plus, diff_words->word_regex);
-	xpp.flags = 0;
 	/* as only the hunk header will be parsed, we need a 0-context */
 	xecfg.ctxlen = 0;
 	xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
-		      &xpp, &xecfg, &ecb);
+		      &xecfg, &ecb, 0);
 	free(minus.ptr);
 	free(plus.ptr);
 	if (diff_words->current_plus != diff_words->plus.text.ptr +
@@ -1703,7 +1700,6 @@ static void builtin_diff(const char *name_a,
 	else {
 		/* Crazy xdl interfaces.. */
 		const char *diffopts = getenv("GIT_DIFF_OPTS");
-		xpparam_t xpp;
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;
 		struct emit_callback ecbdata;
@@ -1733,7 +1729,6 @@ static void builtin_diff(const char *name_a,
 		if (!pe)
 			pe = diff_funcname_pattern(two);
 
-		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		memset(&ecbdata, 0, sizeof(ecbdata));
 		ecbdata.label_path = lbl;
@@ -1744,7 +1739,6 @@ static void builtin_diff(const char *name_a,
 			check_blank_at_eof(&mf1, &mf2, &ecbdata);
 		ecbdata.file = o->file;
 		ecbdata.header = header.len ? &header : NULL;
-		xpp.flags = o->xdl_opts;
 		xecfg.ctxlen = o->context;
 		xecfg.interhunkctxlen = o->interhunkcontext;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
@@ -1777,7 +1771,7 @@ static void builtin_diff(const char *name_a,
 			}
 		}
 		xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
-			      &xpp, &xecfg, &ecb);
+			      &xecfg, &ecb, o->xdl_opts);
 		if (DIFF_OPT_TST(o, COLOR_DIFF_WORDS))
 			free_diff_words_data(&ecbdata);
 		if (textconv_one)
@@ -1828,15 +1822,12 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		data->deleted = mf1.size;
 	} else {
 		/* Crazy xdl interfaces.. */
-		xpparam_t xpp;
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;
 
-		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
-		xpp.flags = o->xdl_opts;
 		xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
-			      &xpp, &xecfg, &ecb);
+			      &xecfg, &ecb, o->xdl_opts);
 	}
 
  free_and_return:
@@ -1876,16 +1867,13 @@ static void builtin_checkdiff(const char *name_a, const char *name_b,
 		goto free_and_return;
 	else {
 		/* Crazy xdl interfaces.. */
-		xpparam_t xpp;
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;
 
-		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xecfg.ctxlen = 1; /* at least one context line */
-		xpp.flags = 0;
 		xdi_diff_outf(&mf1, &mf2, checkdiff_consume, &data,
-			      &xpp, &xecfg, &ecb);
+			      &xecfg, &ecb, 0);
 
 		if (data.ws_rule & WS_BLANK_AT_EOF) {
 			struct emit_callback ecbdata;
@@ -3378,14 +3366,12 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
 	data.ctx = &ctx;
 
 	for (i = 0; i < q->nr; i++) {
-		xpparam_t xpp;
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;
 		mmfile_t mf1, mf2;
 		struct diff_filepair *p = q->queue[i];
 		int len1, len2;
 
-		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		if (p->status == 0)
 			return error("internal diff status error");
@@ -3438,11 +3424,10 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
 					len2, p->two->path);
 		git_SHA1_Update(&ctx, buffer, len1);
 
-		xpp.flags = 0;
 		xecfg.ctxlen = 3;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
 		xdi_diff_outf(&mf1, &mf2, patch_id_consume, &data,
-			      &xpp, &xecfg, &ecb);
+			      &xecfg, &ecb, 0);
 	}
 
 	git_SHA1_Final(sha1, &ctx);
diff --git a/merge-file.c b/merge-file.c
index db4d0d5..6521561 100644
--- a/merge-file.c
+++ b/merge-file.c
@@ -61,12 +61,9 @@ static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2)
 {
 	unsigned long size = f1->size < f2->size ? f1->size : f2->size;
 	void *ptr = xmalloc(size);
-	xpparam_t xpp;
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
-	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = 0;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
 	xecfg.flags = XDL_EMIT_COMMON;
@@ -76,7 +73,7 @@ static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2)
 	res->size = 0;
 
 	ecb.priv = res;
-	return xdi_diff(f1, f2, &xpp, &xecfg, &ecb);
+	return xdi_diff(f1, f2, &xecfg, &ecb, 0);
 }
 
 void *merge_file(const char *path, struct blob *base, struct blob *our, struct blob *their, unsigned long *size)
diff --git a/xdiff-interface.c b/xdiff-interface.c
index ca5e3fb..6457a5b 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -126,20 +126,24 @@ static void trim_common_tail(mmfile_t *a, mmfile_t *b, long ctx)
 	b->size -= trimmed - recovered;
 }
 
-int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *xecb)
+int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xdemitconf_t const *xecfg,
+	     xdemitcb_t *xecb, int xpp_flags)
 {
+	xpparam_t xpp;
 	mmfile_t a = *mf1;
 	mmfile_t b = *mf2;
 
+	memset(&xpp, 0, sizeof(xpp));
+	xpp.flags = xpp_flags;
+
 	trim_common_tail(&a, &b, xecfg->ctxlen);
 
-	return xdl_diff(&a, &b, xpp, xecfg, xecb);
+	return xdl_diff(&a, &b, &xpp, xecfg, xecb);
 }
 
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 		  xdiff_emit_consume_fn fn, void *consume_callback_data,
-		  xpparam_t const *xpp,
-		  xdemitconf_t const *xecfg, xdemitcb_t *xecb)
+		  xdemitconf_t const *xecfg, xdemitcb_t *xecb, int xpp_flags)
 {
 	int ret;
 	struct xdiff_emit_state state;
@@ -150,7 +154,7 @@ int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 	xecb->outf = xdiff_outf;
 	xecb->priv = &state;
 	strbuf_init(&state.remainder, 0);
-	ret = xdi_diff(mf1, mf2, xpp, xecfg, xecb);
+	ret = xdi_diff(mf1, mf2, xecfg, xecb, xpp_flags);
 	strbuf_release(&state.remainder);
 	return ret;
 }
@@ -185,7 +189,7 @@ static int process_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 
 int xdi_diff_hunks(mmfile_t *mf1, mmfile_t *mf2,
 		   xdiff_emit_hunk_consume_fn fn, void *consume_callback_data,
-		   xpparam_t const *xpp, xdemitconf_t *xecfg)
+		   xdemitconf_t *xecfg, int xpp_flags)
 {
 	struct xdiff_emit_hunk_state state;
 	xdemitcb_t ecb;
@@ -196,7 +200,7 @@ int xdi_diff_hunks(mmfile_t *mf1, mmfile_t *mf2,
 	state.consume_callback_data = consume_callback_data;
 	xecfg->emit_func = (void (*)())process_diff;
 	ecb.priv = &state;
-	return xdi_diff(mf1, mf2, xpp, xecfg, &ecb);
+	return xdi_diff(mf1, mf2, xecfg, &ecb, xpp_flags);
 }
 
 int read_mmfile(mmfile_t *ptr, const char *filename)
diff --git a/xdiff-interface.h b/xdiff-interface.h
index abba70c..05adeb3 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -6,14 +6,14 @@
 typedef void (*xdiff_emit_consume_fn)(void *, char *, unsigned long);
 typedef void (*xdiff_emit_hunk_consume_fn)(void *, long, long, long);
 
-int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
+int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xdemitconf_t const *xecfg,
+	     xdemitcb_t *ecb, int xpp_flags);
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 		  xdiff_emit_consume_fn fn, void *consume_callback_data,
-		  xpparam_t const *xpp,
-		  xdemitconf_t const *xecfg, xdemitcb_t *xecb);
+		  xdemitconf_t const *xecfg, xdemitcb_t *xecb, int xpp_flags);
 int xdi_diff_hunks(mmfile_t *mf1, mmfile_t *mf2,
 		   xdiff_emit_hunk_consume_fn fn, void *consume_callback_data,
-		   xpparam_t const *xpp, xdemitconf_t *xecfg);
+		   xdemitconf_t *xecfg, int xpp_flags);
 int parse_hunk_header(char *line, int len,
 		      int *ob, int *on,
 		      int *nb, int *nn);
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: git diff too slow for a file
  2010-05-04 20:16             ` René Scharfe
@ 2010-05-04 22:56               ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2010-05-04 22:56 UTC (permalink / raw)
  To: René Scharfe; +Cc: SungHyun Nam, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> But when I take a closer look at the surrounding code, I can't help but
> ask if the flags really have be passed in such a complicated way.
>
> How about the following, which makes xdi_diff*() take a simple flag
> parameter instead, moving the code to handle xpparam_t into
> xdiff-interface.c, which seems to be the proper place for it?

This looks very sensible.  Your patch doesn't touch xdiff/ proper but only
the thin interface layer, so we don't have to worry about deviating from
the upstream even further.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-05-04 22:57 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-29  1:42 git diff too slow for a file SungHyun Nam
2010-04-17 15:52 ` René Scharfe
2010-04-17 17:10   ` Junio C Hamano
2010-04-18 18:01     ` René Scharfe
2010-04-20  7:40       ` Junio C Hamano
2010-04-20 21:15         ` René Scharfe
2010-04-21  2:49           ` Junio C Hamano
2010-05-02 13:04         ` René Scharfe
2010-05-02 15:10           ` Junio C Hamano
2010-05-04 20:16             ` René Scharfe
2010-05-04 22:56               ` Junio C Hamano
2010-04-19  0:43   ` SungHyun Nam

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.