All of lore.kernel.org
 help / color / mirror / Atom feed
* Patch id tests and diff performance optimization
@ 2010-09-19  9:59 Clemens Buchacher
  2010-09-19  9:59 ` [PATCH 1/3] add rebase patch id tests Clemens Buchacher
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Clemens Buchacher @ 2010-09-19  9:59 UTC (permalink / raw)
  To: git; +Cc: gitster

Hi,

I finally came around to de-bashify my patch ID test script. Here we go.

[PATCH 1/3] add rebase patch id tests
[PATCH 2/3] do not search functions for patch ID
[PATCH 3/3] use cache for function names in hunk headers

The first patch adds correctness and (optional) performance tests for the patch
"hash binary sha1 into patch id"
http://thread.gmane.org/gmane.comp.version-control.git/153468/focus=155919 .

The test reveals a performance problem with the search for function names for
the hunk headers. This is fixed for patch ID computation by the second patch
and for diff in general by the third patch.

Clemens
---

 diff.c                     |    2 +-
 t/t3419-rebase-patch-id.sh |  109 ++++++++++++++++++++++++++++++++++++++++++++
 xdiff/xemit.c              |   44 +++++++++++++-----
 3 files changed, 142 insertions(+), 13 deletions(-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] add rebase patch id tests
  2010-09-19  9:59 Patch id tests and diff performance optimization Clemens Buchacher
@ 2010-09-19  9:59 ` Clemens Buchacher
  2010-09-19  9:59 ` [PATCH 2/3] do not search functions for patch ID Clemens Buchacher
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Clemens Buchacher @ 2010-09-19  9:59 UTC (permalink / raw)
  To: git; +Cc: gitster, Clemens Buchacher


Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---
 t/t3419-rebase-patch-id.sh |  109 ++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 109 insertions(+), 0 deletions(-)
 create mode 100755 t/t3419-rebase-patch-id.sh

diff --git a/t/t3419-rebase-patch-id.sh b/t/t3419-rebase-patch-id.sh
new file mode 100755
index 0000000..1aee483
--- /dev/null
+++ b/t/t3419-rebase-patch-id.sh
@@ -0,0 +1,109 @@
+#!/bin/bash
+
+test_description='git rebase - test patch id computation'
+
+. ./test-lib.sh
+
+test_set_prereq NOT_EXPENSIVE
+test -n "$GIT_PATCHID_TIMING_TESTS" && test_set_prereq EXPENSIVE
+test -x /usr/bin/time && test_set_prereq USR_BIN_TIME
+
+count()
+{
+	i=0
+	while test $i -lt $1
+	do
+		echo "$i"
+		i=$(($i+1))
+	done
+}
+
+scramble()
+{
+	i=0
+	while read x
+	do
+		if test $i -ne 0
+		then
+			echo "$x"
+		fi
+		i=$(((i+1) % 10))
+	done < "$1" > "$1.new"
+	mv -f "$1.new" "$1"
+}
+
+run()
+{
+	echo \$ "$@"
+	/usr/bin/time "$@" >/dev/null
+}
+
+test_expect_success 'setup' '
+	git commit --allow-empty -m initial
+	git tag root
+'
+
+do_tests()
+{
+	pr=$1
+	nlines=$2
+
+	test_expect_success $pr "setup: $nlines lines" "
+		rm -f .gitattributes &&
+		git checkout -q -f master &&
+		git reset --hard root &&
+		count $nlines >file &&
+		git add file &&
+		git commit -q -m initial &&
+		git branch -f other &&
+
+		scramble file &&
+		git add file &&
+		git commit -q -m 'change big file' &&
+
+		git checkout -q other &&
+		: >newfile &&
+		git add newfile &&
+		git commit -q -m 'add small file' &&
+
+		git cherry-pick master >/dev/null 2>&1
+	"
+
+	test_debug "
+		run git diff master^\!
+	"
+
+	test_expect_success $pr 'setup attributes' "
+		echo 'file binary' >.gitattributes
+	"
+
+	test_debug "
+		run git format-patch --stdout master &&
+		run git format-patch --stdout --ignore-if-in-upstream master
+	"
+
+	test_expect_success $pr 'detect upstream patch' "
+		git checkout -q master &&
+		scramble file &&
+		git add file &&
+		git commit -q -m 'change big file again' &&
+		git checkout -q other^{} &&
+		git rebase master &&
+		test_must_fail test -n \"\$(git rev-list master...HEAD~)\"
+	"
+
+	test_expect_success $pr 'do not drop patch' "
+		git branch -f squashed master &&
+		git checkout -q -f squashed &&
+		git reset -q --soft HEAD~2 &&
+		git commit -q -m squashed &&
+		git checkout -q other^{} &&
+		test_must_fail git rebase squashed &&
+		rm -rf .git/rebase-apply
+	"
+}
+
+do_tests NOT_EXPENSIVE 500
+do_tests EXPENSIVE 50000
+
+test_done
-- 
1.7.1.571.gba4d01

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] do not search functions for patch ID
  2010-09-19  9:59 Patch id tests and diff performance optimization Clemens Buchacher
  2010-09-19  9:59 ` [PATCH 1/3] add rebase patch id tests Clemens Buchacher
@ 2010-09-19  9:59 ` Clemens Buchacher
  2010-09-19  9:59 ` [PATCH 3/3] use cache for function names in hunk headers Clemens Buchacher
  2010-09-23  7:04 ` [PATCH 3/3 v2] " Clemens Buchacher
  3 siblings, 0 replies; 13+ messages in thread
From: Clemens Buchacher @ 2010-09-19  9:59 UTC (permalink / raw)
  To: git; +Cc: gitster, Clemens Buchacher

Visual aids, such as the function name in the hunk
header, are not necessary for the purposes of
computing a patch ID.

This is a performance optimization.

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---
 diff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/diff.c b/diff.c
index 9a5c77c..454b9c9 100644
--- a/diff.c
+++ b/diff.c
@@ -3865,7 +3865,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
 
 		xpp.flags = 0;
 		xecfg.ctxlen = 3;
-		xecfg.flags = XDL_EMIT_FUNCNAMES;
+		xecfg.flags = 0;
 		xdi_diff_outf(&mf1, &mf2, patch_id_consume, &data,
 			      &xpp, &xecfg);
 	}
-- 
1.7.1.571.gba4d01

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] use cache for function names in hunk headers
  2010-09-19  9:59 Patch id tests and diff performance optimization Clemens Buchacher
  2010-09-19  9:59 ` [PATCH 1/3] add rebase patch id tests Clemens Buchacher
  2010-09-19  9:59 ` [PATCH 2/3] do not search functions for patch ID Clemens Buchacher
@ 2010-09-19  9:59 ` Clemens Buchacher
  2010-09-19 20:24   ` Sverre Rabbelier
  2010-09-23  7:04 ` [PATCH 3/3 v2] " Clemens Buchacher
  3 siblings, 1 reply; 13+ messages in thread
From: Clemens Buchacher @ 2010-09-19  9:59 UTC (permalink / raw)
  To: git; +Cc: gitster, Clemens Buchacher

For each hunk, xdl_find_func searches the preimage
for a function name until the beginning of the
file. If the file does not contain any function
names, this search has complexity O(n^2) in the
number of hunks n.

Instead of searching the entire file for each hunk
individually, cache and reuse the search result
from previous hunks.

Diff performance for the 50000 line test in t3419
before and after this patch:

2.78user 0.01system 0:02.82elapsed 99%CPU
0.05user 0.01system 0:00.06elapsed 96%CPU

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---
 xdiff/xemit.c |   44 ++++++++++++++++++++++++++++++++------------
 1 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index c4bedf0..349bd6b 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -85,8 +85,15 @@ static long def_ff(const char *rec, long len, char *buf, long sz, void *priv)
 	return -1;
 }
 
-static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
-		find_func_t ff, void *ff_priv) {
+struct xdl_find_func_cache {
+	char buf[80];
+	long len;
+	xdfile_t *xf;
+	int line;
+};
+
+static void xdl_find_func(xdfile_t *xf, long line, find_func_t ff,
+		          void *ff_priv, struct xdl_find_func_cache *cache) {
 
 	/*
 	 * Be quite stupid about this for now.  Find a line in the old file
@@ -96,13 +103,28 @@ static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
 
 	const char *rec;
 	long len;
+	int i, l;
 
-	while (i-- > 0) {
-		len = xdl_get_rec(xf, i, &rec);
-		if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0)
+	if (line < cache->line)
+		cache->xf = 0;
+
+	i = line;
+	l = -1;
+	while (--i >= 0 && l < 0) {
+		if (xf == cache->xf && i < cache->line) {
+			cache->line = line;
 			return;
+		}
+
+		len = xdl_get_rec(xf, i, &rec);
+		l = ff(rec, len, cache->buf, sizeof(cache->buf), ff_priv);
 	}
-	*ll = 0;
+	if (l < 0)
+		l = 0;
+
+	cache->xf = xf;
+	cache->len = l;
+	cache->line = line;
 }
 
 
@@ -125,8 +147,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg) {
 	long s1, s2, e1, e2, lctx;
 	xdchange_t *xch, *xche;
-	char funcbuf[80];
-	long funclen = 0;
+	struct xdl_find_func_cache func_cache = { "", 0, NULL, -1 };
 	find_func_t ff = xecfg->find_func ?  xecfg->find_func : def_ff;
 
 	if (xecfg->flags & XDL_EMIT_COMMON)
@@ -150,12 +171,11 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		 */
 
 		if (xecfg->flags & XDL_EMIT_FUNCNAMES) {
-			xdl_find_func(&xe->xdf1, s1, funcbuf,
-				      sizeof(funcbuf), &funclen,
-				      ff, xecfg->find_func_priv);
+			xdl_find_func(&xe->xdf1, s1, ff, xecfg->find_func_priv,
+					&func_cache);
 		}
 		if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,
-				      funcbuf, funclen, ecb) < 0)
+				      func_cache.buf, func_cache.len, ecb) < 0)
 			return -1;
 
 		/*
-- 
1.7.1.571.gba4d01

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] use cache for function names in hunk headers
  2010-09-19  9:59 ` [PATCH 3/3] use cache for function names in hunk headers Clemens Buchacher
@ 2010-09-19 20:24   ` Sverre Rabbelier
  2010-09-20 17:36     ` Clemens Buchacher
  0 siblings, 1 reply; 13+ messages in thread
From: Sverre Rabbelier @ 2010-09-19 20:24 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: git, gitster

Heya,

On Sun, Sep 19, 2010 at 11:59, Clemens Buchacher <drizzd@aon.at> wrote:
> 2.78user 0.01system 0:02.82elapsed 99%CPU
> 0.05user 0.01system 0:00.06elapsed 96%CPU

Very nice. It would improve the commit message if you could explain
what exactly it is this optimizes though, saving the reader from
having to read through t3419 to find out.

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] use cache for function names in hunk headers
  2010-09-19 20:24   ` Sverre Rabbelier
@ 2010-09-20 17:36     ` Clemens Buchacher
  2010-09-20 19:15       ` Sverre Rabbelier
  0 siblings, 1 reply; 13+ messages in thread
From: Clemens Buchacher @ 2010-09-20 17:36 UTC (permalink / raw)
  To: Sverre Rabbelier; +Cc: git, gitster

[-- Attachment #1: Type: text/plain, Size: 675 bytes --]

On Sun, Sep 19, 2010 at 10:24:52PM +0200, Sverre Rabbelier wrote:
> 
> On Sun, Sep 19, 2010 at 11:59, Clemens Buchacher <drizzd@aon.at> wrote:
> > 2.78user 0.01system 0:02.82elapsed 99%CPU
> > 0.05user 0.01system 0:00.06elapsed 96%CPU
> 
> Very nice. It would improve the commit message if you could explain
> what exactly it is this optimizes though, saving the reader from
> having to read through t3419 to find out.

Ok.

The test creates a file with 50000 lines and a one-line change
every 10 lines, i.e., about 5000 hunks. Since none of the lines
matches a function definition, previous to this optimization, the
file was searched 5000 times.

Clemens

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] use cache for function names in hunk headers
  2010-09-20 17:36     ` Clemens Buchacher
@ 2010-09-20 19:15       ` Sverre Rabbelier
  0 siblings, 0 replies; 13+ messages in thread
From: Sverre Rabbelier @ 2010-09-20 19:15 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: git, gitster

Heya,

On Mon, Sep 20, 2010 at 19:36, Clemens Buchacher <drizzd@aon.at> wrote:
> The test creates a file with 50000 lines and a one-line change
> every 10 lines, i.e., about 5000 hunks. Since none of the lines
> matches a function definition, previous to this optimization, the
> file was searched 5000 times.

Ah, that makes sense, please add (something like) that to the commit message :).

-- 
Cheers,

Sverre Rabbelier

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 3/3 v2] use cache for function names in hunk headers
  2010-09-19  9:59 Patch id tests and diff performance optimization Clemens Buchacher
                   ` (2 preceding siblings ...)
  2010-09-19  9:59 ` [PATCH 3/3] use cache for function names in hunk headers Clemens Buchacher
@ 2010-09-23  7:04 ` Clemens Buchacher
  2010-09-26 16:26   ` René Scharfe
  3 siblings, 1 reply; 13+ messages in thread
From: Clemens Buchacher @ 2010-09-23  7:04 UTC (permalink / raw)
  To: git; +Cc: gitster, Clemens Buchacher

For each hunk, xdl_find_func searches the preimage
for a function name until the beginning of the
file. If the file does not contain any function
names, this search has complexity O(n^2) in the
number of hunks n.

The timing test in t3419 creates a file with 50000
lines and a one-line change every 10 lines, i.e.,
about 5000 hunks. Since none of the lines matches
a function definition the file is searched 5000
times.

Instead of searching the entire file for each hunk
individually, cache and reuse the search result
from previous hunks.

Diff performance for the test described above
before and after this optimization:

2.78user 0.01system 0:02.82elapsed 99%CPU
0.05user 0.01system 0:00.06elapsed 96%CPU

Signed-off-by: Clemens Buchacher <drizzd@aon.at>
---

I have added the test description as requested by
Sverre. There are no code changes with respect to
the previous version of the patch.

The test scenario might seem quite obscure.  But I
have this case in some text files which are not
edited directly by humans.

Clemens

 xdiff/xemit.c |   44 ++++++++++++++++++++++++++++++++------------
 1 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index c4bedf0..349bd6b 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -85,8 +85,15 @@ static long def_ff(const char *rec, long len, char *buf, long sz, void *priv)
 	return -1;
 }
 
-static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
-		find_func_t ff, void *ff_priv) {
+struct xdl_find_func_cache {
+	char buf[80];
+	long len;
+	xdfile_t *xf;
+	int line;
+};
+
+static void xdl_find_func(xdfile_t *xf, long line, find_func_t ff,
+		          void *ff_priv, struct xdl_find_func_cache *cache) {
 
 	/*
 	 * Be quite stupid about this for now.  Find a line in the old file
@@ -96,13 +103,28 @@ static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
 
 	const char *rec;
 	long len;
+	int i, l;
 
-	while (i-- > 0) {
-		len = xdl_get_rec(xf, i, &rec);
-		if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0)
+	if (line < cache->line)
+		cache->xf = 0;
+
+	i = line;
+	l = -1;
+	while (--i >= 0 && l < 0) {
+		if (xf == cache->xf && i < cache->line) {
+			cache->line = line;
 			return;
+		}
+
+		len = xdl_get_rec(xf, i, &rec);
+		l = ff(rec, len, cache->buf, sizeof(cache->buf), ff_priv);
 	}
-	*ll = 0;
+	if (l < 0)
+		l = 0;
+
+	cache->xf = xf;
+	cache->len = l;
+	cache->line = line;
 }
 
 
@@ -125,8 +147,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg) {
 	long s1, s2, e1, e2, lctx;
 	xdchange_t *xch, *xche;
-	char funcbuf[80];
-	long funclen = 0;
+	struct xdl_find_func_cache func_cache = { "", 0, NULL, -1 };
 	find_func_t ff = xecfg->find_func ?  xecfg->find_func : def_ff;
 
 	if (xecfg->flags & XDL_EMIT_COMMON)
@@ -150,12 +171,11 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		 */
 
 		if (xecfg->flags & XDL_EMIT_FUNCNAMES) {
-			xdl_find_func(&xe->xdf1, s1, funcbuf,
-				      sizeof(funcbuf), &funclen,
-				      ff, xecfg->find_func_priv);
+			xdl_find_func(&xe->xdf1, s1, ff, xecfg->find_func_priv,
+					&func_cache);
 		}
 		if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,
-				      funcbuf, funclen, ecb) < 0)
+				      func_cache.buf, func_cache.len, ecb) < 0)
 			return -1;
 
 		/*
-- 
1.7.1.571.gba4d01


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3 v2] use cache for function names in hunk headers
  2010-09-23  7:04 ` [PATCH 3/3 v2] " Clemens Buchacher
@ 2010-09-26 16:26   ` René Scharfe
  2010-09-26 20:43     ` Clemens Buchacher
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: René Scharfe @ 2010-09-26 16:26 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: git, gitster

Am 23.09.2010 09:04, schrieb Clemens Buchacher:
> For each hunk, xdl_find_func searches the preimage
> for a function name until the beginning of the
> file. If the file does not contain any function
> names, this search has complexity O(n2) in the
> number of hunks n.
> 
> The timing test in t3419 creates a file with 50000
> lines and a one-line change every 10 lines, i.e.,
> about 5000 hunks. Since none of the lines matches
> a function definition the file is searched 5000
> times.
> 
> Instead of searching the entire file for each hunk
> individually, cache and reuse the search result
> from previous hunks.
> 
> Diff performance for the test described above
> before and after this optimization:
> 
> 2.78user 0.01system 0:02.82elapsed 99%CPU
> 0.05user 0.01system 0:00.06elapsed 96%CPU
> 
> Signed-off-by: Clemens Buchacher <drizzd@aon.at>
> ---
> 
> I have added the test description as requested by
> Sverre. There are no code changes with respect to
> the previous version of the patch.
> 
> The test scenario might seem quite obscure.  But I
> have this case in some text files which are not
> edited directly by humans.
> 
> Clemens
> 
>  xdiff/xemit.c |   44 ++++++++++++++++++++++++++++++++------------
>  1 files changed, 32 insertions(+), 12 deletions(-)
> 
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index c4bedf0..349bd6b 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -85,8 +85,15 @@ static long def_ff(const char *rec, long len, char *buf, long sz, void *priv)
>  	return -1;
>  }
>  
> -static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
> -		find_func_t ff, void *ff_priv) {
> +struct xdl_find_func_cache {
> +	char buf[80];
> +	long len;
> +	xdfile_t *xf;
> +	int line;
> +};

Is xf needed?  Does xdl_emit_diff() handle multiple files in one go?

If you inline xdl_find_func() the struct isn't needed anymore.

> +
> +static void xdl_find_func(xdfile_t *xf, long line, find_func_t ff,
> +		          void *ff_priv, struct xdl_find_func_cache *cache) {
>  
>  	/*
>  	 * Be quite stupid about this for now.  Find a line in the old file
> @@ -96,13 +103,28 @@ static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
>  
>  	const char *rec;
>  	long len;
> +	int i, l;
>  
> -	while (i-- > 0) {
> -		len = xdl_get_rec(xf, i, &rec);
> -		if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0)
> +	if (line < cache->line)
> +		cache->xf = 0;

NULL is preferred.

> +
> +	i = line;
> +	l = -1;
> +	while (--i >= 0 && l < 0) {
> +		if (xf == cache->xf && i < cache->line) {
> +			cache->line = line;
>  			return;
> +		}
> +
> +		len = xdl_get_rec(xf, i, &rec);
> +		l = ff(rec, len, cache->buf, sizeof(cache->buf), ff_priv);
>  	}
> -	*ll = 0;
> +	if (l < 0)
> +		l = 0;
> +
> +	cache->xf = xf;
> +	cache->len = l;
> +	cache->line = line;
>  }
>  
>  
> @@ -125,8 +147,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>  		  xdemitconf_t const *xecfg) {
>  	long s1, s2, e1, e2, lctx;
>  	xdchange_t *xch, *xche;
> -	char funcbuf[80];
> -	long funclen = 0;
> +	struct xdl_find_func_cache func_cache = { "", 0, NULL, -1 };
>  	find_func_t ff = xecfg->find_func ?  xecfg->find_func : def_ff;
>  
>  	if (xecfg->flags & XDL_EMIT_COMMON)
> @@ -150,12 +171,11 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>  		 */
>  
>  		if (xecfg->flags & XDL_EMIT_FUNCNAMES) {
> -			xdl_find_func(&xe->xdf1, s1, funcbuf,
> -				      sizeof(funcbuf), &funclen,
> -				      ff, xecfg->find_func_priv);
> +			xdl_find_func(&xe->xdf1, s1, ff, xecfg->find_func_priv,
> +					&func_cache);
>  		}
>  		if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,
> -				      funcbuf, funclen, ecb) < 0)
> +				      func_cache.buf, func_cache.len, ecb) < 0)
>  			return -1;
>  
>  		/*

How about something like this?  It also removes an outdated comment.  The
inlining part should probably split out in its own patch..

 xdiff/xemit.c |   38 ++++++++++++++------------------------
 1 files changed, 14 insertions(+), 24 deletions(-)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index c4bedf0..a663f21 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -85,27 +85,6 @@ static long def_ff(const char *rec, long len, char *buf, long sz, void *priv)
 	return -1;
 }
 
-static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
-		find_func_t ff, void *ff_priv) {
-
-	/*
-	 * Be quite stupid about this for now.  Find a line in the old file
-	 * before the start of the hunk (and context) which starts with a
-	 * plausible character.
-	 */
-
-	const char *rec;
-	long len;
-
-	while (i-- > 0) {
-		len = xdl_get_rec(xf, i, &rec);
-		if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0)
-			return;
-	}
-	*ll = 0;
-}
-
-
 static int xdl_emit_common(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
                            xdemitconf_t const *xecfg) {
 	xdfile_t *xdf = &xe->xdf1;
@@ -127,6 +106,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 	xdchange_t *xch, *xche;
 	char funcbuf[80];
 	long funclen = 0;
+	long funclineprev = -1;
 	find_func_t ff = xecfg->find_func ?  xecfg->find_func : def_ff;
 
 	if (xecfg->flags & XDL_EMIT_COMMON)
@@ -150,9 +130,19 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		 */
 
 		if (xecfg->flags & XDL_EMIT_FUNCNAMES) {
-			xdl_find_func(&xe->xdf1, s1, funcbuf,
-				      sizeof(funcbuf), &funclen,
-				      ff, xecfg->find_func_priv);
+			long l;
+			for (l = s1 - 1; l >= 0 && l > funclineprev; l--) {
+				const char *rec;
+				long reclen = xdl_get_rec(&xe->xdf1, l, &rec);
+				long newfunclen = ff(rec, reclen, funcbuf,
+						     sizeof(funcbuf),
+						     xecfg->find_func_priv);
+				if (newfunclen >= 0) {
+					funclen = newfunclen;
+					break;
+				}
+			}
+			funclineprev = s1 - 1;
 		}
 		if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,
 				      funcbuf, funclen, ecb) < 0)

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3 v2] use cache for function names in hunk headers
  2010-09-26 16:26   ` René Scharfe
@ 2010-09-26 20:43     ` Clemens Buchacher
  2010-09-26 22:17       ` René Scharfe
  2010-09-27 17:52     ` Junio C Hamano
  2010-09-30 18:33     ` Junio C Hamano
  2 siblings, 1 reply; 13+ messages in thread
From: Clemens Buchacher @ 2010-09-26 20:43 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, gitster

[-- Attachment #1: Type: text/plain, Size: 908 bytes --]

On Sun, Sep 26, 2010 at 06:26:56PM +0200, René Scharfe wrote:
> 
> Is xf needed?  Does xdl_emit_diff() handle multiple files in one go?

Right now it does not.

> If you inline xdl_find_func() the struct isn't needed anymore.
[...]
>
> How about something like this?

Yes, that's better. Thanks.

> -	/*
> -	 * Be quite stupid about this for now.  Find a line in the old file
> -	 * before the start of the hunk (and context) which starts with a
> -	 * plausible character.
> -	 */
>
> It also removes an outdated comment.

Actually, in the default case, the comment is still correct and
helpful IMO.

> The inlining part should probably split out in its own patch..

I am not sure what you mean here. Do you want to add the parameter
funclineprev to the function and remove the function in the next
page, or do you want to refactor the below into an inline function?

Clemens

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3 v2] use cache for function names in hunk headers
  2010-09-26 20:43     ` Clemens Buchacher
@ 2010-09-26 22:17       ` René Scharfe
  0 siblings, 0 replies; 13+ messages in thread
From: René Scharfe @ 2010-09-26 22:17 UTC (permalink / raw)
  To: Clemens Buchacher; +Cc: git, gitster

Am 26.09.2010 22:43, schrieb Clemens Buchacher:
> On Sun, Sep 26, 2010 at 06:26:56PM +0200, René Scharfe wrote:
>> -	/*
>> -	 * Be quite stupid about this for now.  Find a line in the old file
>> -	 * before the start of the hunk (and context) which starts with a
>> -	 * plausible character.
>> -	 */
>>
>> It also removes an outdated comment.
> 
> Actually, in the default case, the comment is still correct and
> helpful IMO.

The first sentence was probably written before the match function became
configurable.  The second one could be moved to def_ff().

>> The inlining part should probably split out in its own patch..
> 
> I am not sure what you mean here. Do you want to add the parameter
> funclineprev to the function and remove the function in the next
> page, or do you want to refactor the below into an inline function?

I'd suggest having a first patch for folding xdl_find_func() into
xdl_emit_diff() without any functional change (manually inlining it) and
a second one for adding funclineprev etc..

René

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3 v2] use cache for function names in hunk headers
  2010-09-26 16:26   ` René Scharfe
  2010-09-26 20:43     ` Clemens Buchacher
@ 2010-09-27 17:52     ` Junio C Hamano
  2010-09-30 18:33     ` Junio C Hamano
  2 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2010-09-27 17:52 UTC (permalink / raw)
  To: René Scharfe; +Cc: Clemens Buchacher, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> How about something like this?  It also removes an outdated comment.  The
> inlining part should probably split out in its own patch..
>
>  xdiff/xemit.c |   38 ++++++++++++++------------------------
>  1 files changed, 14 insertions(+), 24 deletions(-)
>
> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index c4bedf0..a663f21 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -85,27 +85,6 @@ static long def_ff(const char *rec, long len, char *buf, long sz, void *priv)
>  	return -1;
>  }
>  
> -static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
> -		find_func_t ff, void *ff_priv) {
> -
> -	/*
> -	 * Be quite stupid about this for now.  Find a line in the old file
> -	 * before the start of the hunk (and context) which starts with a
> -	 * plausible character.
> -	 */
> -
> -	const char *rec;
> -	long len;
> -
> -	while (i-- > 0) {
> -		len = xdl_get_rec(xf, i, &rec);
> -		if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0)
> -			return;
> -	}
> -	*ll = 0;
> -}
> -
> -
>  static int xdl_emit_common(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>                             xdemitconf_t const *xecfg) {
>  	xdfile_t *xdf = &xe->xdf1;
> @@ -127,6 +106,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>  	xdchange_t *xch, *xche;
>  	char funcbuf[80];
>  	long funclen = 0;
> +	long funclineprev = -1;
>  	find_func_t ff = xecfg->find_func ?  xecfg->find_func : def_ff;
>  
>  	if (xecfg->flags & XDL_EMIT_COMMON)
> @@ -150,9 +130,19 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>  		 */
>  
>  		if (xecfg->flags & XDL_EMIT_FUNCNAMES) {
> -			xdl_find_func(&xe->xdf1, s1, funcbuf,
> -				      sizeof(funcbuf), &funclen,
> -				      ff, xecfg->find_func_priv);
> +			long l;
> +			for (l = s1 - 1; l >= 0 && l > funclineprev; l--) {
> +				const char *rec;
> +				long reclen = xdl_get_rec(&xe->xdf1, l, &rec);
> +				long newfunclen = ff(rec, reclen, funcbuf,
> +						     sizeof(funcbuf),
> +						     xecfg->find_func_priv);
> +				if (newfunclen >= 0) {
> +					funclen = newfunclen;
> +					break;
> +				}
> +			}
> +			funclineprev = s1 - 1;
>  		}
>  		if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,
>  				      funcbuf, funclen, ecb) < 0)

Looks much more straightforward ;-)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3 v2] use cache for function names in hunk headers
  2010-09-26 16:26   ` René Scharfe
  2010-09-26 20:43     ` Clemens Buchacher
  2010-09-27 17:52     ` Junio C Hamano
@ 2010-09-30 18:33     ` Junio C Hamano
  2 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2010-09-30 18:33 UTC (permalink / raw)
  To: René Scharfe; +Cc: Clemens Buchacher, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Am 23.09.2010 09:04, schrieb Clemens Buchacher:
>> For each hunk, xdl_find_func searches the preimage
>> for a function name until the beginning of the
>> file. If the file does not contain any function
>> names, this search has complexity O(n2) in the
>> number of hunks n.
> ...
> How about something like this?  It also removes an outdated comment.  The
> inlining part should probably split out in its own patch..
>
>  xdiff/xemit.c |   38 ++++++++++++++------------------------
>  1 files changed, 14 insertions(+), 24 deletions(-)

I had to wonder if this optimization closes a door for us to extend the
function header patterns to allow inspecting multiple lines.

Suppose that a rule says that a header line is a first non-blank line that
follows two or more blank lines.  E.g.

    ... the end of a paragraph that is not followed by two blank lines.
    (blank line)
    This is not a header, but is part of the previous paragraph group.
    ... body of the paragraph here ...
    and this is the tail of the paragraph group.
    (blank line)
    (blank line)
    This begins a new paragraph group.
    And this is the second line of it.

Under such a rule, "This begins a new paragraph group." is a header line.
If the first hunk begins at the second blank line before that header, we
will scan down to the beginning of the file and will find a header that
comes before that header line, and remember it in funclen/funcbuf[].  The
next scan will stop after looking at one blank line before "This begins a
new paragraph group.", instead of going down to the beginning, and would
not to have a chance to notice that "This begins a new paragraph group."
indeed has two blank lines before it.  It would instead keep returning the
header we found before this header.

I do not think this is such a big issue, though.  In order to allow such a
multi-line matching, the loop in xdl_find_func(), now inlined, needs to be
restructured anyway, and whoever is doing such an enhancement hopefully
will notice that s/he needs to look behind beyond (s1-1) at that point.

So an Ack from me.

Thanks.

> diff --git a/xdiff/xemit.c b/xdiff/xemit.c
> index c4bedf0..a663f21 100644
> --- a/xdiff/xemit.c
> +++ b/xdiff/xemit.c
> @@ -85,27 +85,6 @@ static long def_ff(const char *rec, long len, char *buf, long sz, void *priv)
>  	return -1;
>  }
>  
> -static void xdl_find_func(xdfile_t *xf, long i, char *buf, long sz, long *ll,
> -		find_func_t ff, void *ff_priv) {
> -
> -	/*
> -	 * Be quite stupid about this for now.  Find a line in the old file
> -	 * before the start of the hunk (and context) which starts with a
> -	 * plausible character.
> -	 */
> -
> -	const char *rec;
> -	long len;
> -
> -	while (i-- > 0) {
> -		len = xdl_get_rec(xf, i, &rec);
> -		if ((*ll = ff(rec, len, buf, sz, ff_priv)) >= 0)
> -			return;
> -	}
> -	*ll = 0;
> -}
> -
> -
>  static int xdl_emit_common(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>                             xdemitconf_t const *xecfg) {
>  	xdfile_t *xdf = &xe->xdf1;
> @@ -127,6 +106,7 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>  	xdchange_t *xch, *xche;
>  	char funcbuf[80];
>  	long funclen = 0;
> +	long funclineprev = -1;
>  	find_func_t ff = xecfg->find_func ?  xecfg->find_func : def_ff;
>  
>  	if (xecfg->flags & XDL_EMIT_COMMON)
> @@ -150,9 +130,19 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
>  		 */
>  
>  		if (xecfg->flags & XDL_EMIT_FUNCNAMES) {
> -			xdl_find_func(&xe->xdf1, s1, funcbuf,
> -				      sizeof(funcbuf), &funclen,
> -				      ff, xecfg->find_func_priv);
> +			long l;
> +			for (l = s1 - 1; l >= 0 && l > funclineprev; l--) {
> +				const char *rec;
> +				long reclen = xdl_get_rec(&xe->xdf1, l, &rec);
> +				long newfunclen = ff(rec, reclen, funcbuf,
> +						     sizeof(funcbuf),
> +						     xecfg->find_func_priv);
> +				if (newfunclen >= 0) {
> +					funclen = newfunclen;
> +					break;
> +				}
> +			}
> +			funclineprev = s1 - 1;
>  		}
>  		if (xdl_emit_hunk_hdr(s1 + 1, e1 - s1, s2 + 1, e2 - s2,
>  				      funcbuf, funclen, ecb) < 0)

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2010-09-30 18:33 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-19  9:59 Patch id tests and diff performance optimization Clemens Buchacher
2010-09-19  9:59 ` [PATCH 1/3] add rebase patch id tests Clemens Buchacher
2010-09-19  9:59 ` [PATCH 2/3] do not search functions for patch ID Clemens Buchacher
2010-09-19  9:59 ` [PATCH 3/3] use cache for function names in hunk headers Clemens Buchacher
2010-09-19 20:24   ` Sverre Rabbelier
2010-09-20 17:36     ` Clemens Buchacher
2010-09-20 19:15       ` Sverre Rabbelier
2010-09-23  7:04 ` [PATCH 3/3 v2] " Clemens Buchacher
2010-09-26 16:26   ` René Scharfe
2010-09-26 20:43     ` Clemens Buchacher
2010-09-26 22:17       ` René Scharfe
2010-09-27 17:52     ` Junio C Hamano
2010-09-30 18:33     ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.