git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] More git blame speed improvements
@ 2008-08-21 23:21 Brian Downing
  2008-08-21 23:21 ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff Brian Downing
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Downing @ 2008-08-21 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Brian Downing

This patch series contains more git blame speed improvements.  It's
really in two parts; patches 1-2 are the first improvement, and patches
3-5 are the second (which depend on the first, at least textually.)

There are some things I'm not entirely happy about in here, but overall
it's not too bad for a first effort.  I'm interested in hearing what
people have to say about it, especially since I'm adding new interfaces
to the xdiff core for some of this.

Performance summary for my (proprietary, alas) test case:
:; time git-blame -M -C -C -p --incremental server.c >/dev/null
Before (master):
79.62user 0.10system 1:19.81elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41189minor)pagefaults 0swaps
After:
29.68user 0.22system 0:29.98elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+37897minor)pagefaults 0swaps

I have an additional patch (not included here) that caches the actual
entry lines from the blame, but the performance gain was not exciting
(the entries passed as the second file to the compare_buffer() call in
find_copy_in_blob() are generally quite small), and it results in more
page faults and a fair amount more code:
:; time git-blame -M -C -C -p --incremental server.c >/dev/null
28.82user 0.07system 0:28.93elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+40268minor)pagefaults 0swaps

 [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff
 [PATCH 2/5] Bypass textual patch generation and parsing in git blame
 [PATCH 3/5] Always initialize xpparam_t to 0
 [PATCH 4/5] Allow xdiff machinery to cache hash results for a file
 [PATCH 5/5] Use xdiff caching to improve git blame performance

 builtin-blame.c  |  103 +++++++++++++++++++++--------------------------------
 builtin-rerere.c |    1 +
 combine-diff.c   |    1 +
 diff.c           |    5 +++
 merge-file.c     |    1 +
 xdiff/xdiff.h    |   12 ++++++
 xdiff/xdiffi.c   |    4 ++-
 xdiff/xemit.c    |    3 +-
 xdiff/xemit.h    |    3 ++
 xdiff/xprepare.c |   59 +++++++++++++++++++++++++++----
 xdiff/xtypes.h   |    1 +
 11 files changed, 121 insertions(+), 72 deletions(-)

-bcd

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff
  2008-08-21 23:21 [PATCH 0/5] More git blame speed improvements Brian Downing
@ 2008-08-21 23:21 ` Brian Downing
  2008-08-21 23:21   ` [PATCH 2/5] Bypass textual patch generation and parsing in git blame Brian Downing
  2008-08-23  8:15   ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff René Scharfe
  0 siblings, 2 replies; 17+ messages in thread
From: Brian Downing @ 2008-08-21 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Brian Downing

For some users (e.g. git blame), getting textual patch output is just
extra work, as they can get all the information they need from the low-
level diff structures.  Allow for an alternate low-level emit function
to be defined to allow bypassing the textual patch generation; set
xemitconf_t's emit_func member to enable this.

The (void (*)()) type is pretty ugly, but the alternative would be to
include most of the private xdiff headers in xdiff.h to get the types
required for the "proper" function prototype.  Also, a (void *) won't
work, as ANSI C doesn't allow a function pointer to be cast to an
object pointer.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 xdiff/xdiff.h  |    1 +
 xdiff/xdiffi.c |    4 +++-
 xdiff/xemit.c  |    3 +--
 xdiff/xemit.h  |    3 +++
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 413082e..281fc0b 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -81,6 +81,7 @@ typedef struct s_xdemitconf {
 	unsigned long flags;
 	find_func_t find_func;
 	void *find_func_priv;
+	void (*emit_func)();
 } xdemitconf_t;
 
 typedef struct s_bdiffparam {
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 1bad846..9d0324a 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -538,6 +538,8 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
 	xdfenv_t xe;
+	emit_func_t ef = xecfg->emit_func ?
+		(emit_func_t)xecfg->emit_func : xdl_emit_diff;
 
 	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
 
@@ -551,7 +553,7 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		return -1;
 	}
 	if (xscr) {
-		if (xdl_emit_diff(&xe, xscr, ecb, xecfg) < 0) {
+		if (ef(&xe, xscr, ecb, xecfg) < 0) {
 
 			xdl_free_script(xscr);
 			xdl_free_env(&xe);
diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index d3d9c84..4625c1b 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -27,7 +27,6 @@
 
 static long xdl_get_rec(xdfile_t *xdf, long ri, char const **rec);
 static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb);
-static xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg);
 
 
 
@@ -58,7 +57,7 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
  * Starting at the passed change atom, find the latest change atom to be included
  * inside the differential hunk according to the specified configuration.
  */
-static xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg) {
+xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg) {
 	xdchange_t *xch, *xchp;
 
 	for (xchp = xscr, xch = xscr->next; xch; xchp = xch, xch = xch->next)
diff --git a/xdiff/xemit.h b/xdiff/xemit.h
index 440a739..c2e2e83 100644
--- a/xdiff/xemit.h
+++ b/xdiff/xemit.h
@@ -24,7 +24,10 @@
 #define XEMIT_H
 
 
+typedef int (*emit_func_t)(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
+			   xdemitconf_t const *xecfg);
 
+xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg);
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg);
 
-- 
1.5.6.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] Bypass textual patch generation and parsing in git blame
  2008-08-21 23:21 ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff Brian Downing
@ 2008-08-21 23:21   ` Brian Downing
  2008-08-21 23:21     ` [PATCH 3/5] Always initialize xpparam_t to 0 Brian Downing
  2008-08-23  8:15   ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff René Scharfe
  1 sibling, 1 reply; 17+ messages in thread
From: Brian Downing @ 2008-08-21 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Brian Downing

This uses the new xdiff emit_func feature to directly generate the
patch/chunk information from the low-level diff output, rather than
generating and parsing a patch.  This improves performance considerably
for certain test cases:

:; time git-blame -M -C -C -p --incremental server.c >/dev/null
Before:
79.62user 0.10system 1:19.81elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+41189minor)pagefaults 0swaps
After:
48.66user 0.08system 0:48.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36961minor)pagefaults 0swaps

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 builtin-blame.c |   90 +++++++++++++++++++------------------------------------
 1 files changed, 31 insertions(+), 59 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index e4d12de..60f70bf 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -19,6 +19,10 @@
 #include "string-list.h"
 #include "mailmap.h"
 #include "parse-options.h"
+#include "xdiff/xtypes.h"
+#include "xdiff/xdiffi.h"
+#include "xdiff/xemit.h"
+#include "xdiff/xmacros.h"
 
 static char blame_usage[] = "git blame [options] [rev-opts] [rev] [--] file";
 
@@ -464,62 +468,36 @@ struct patch {
 	int num;
 };
 
-struct blame_diff_state {
-	struct patch *ret;
-	unsigned hunk_post_context;
-	unsigned hunk_in_pre_context : 1;
-};
-
-static void process_u_diff(void *state_, char *line, unsigned long len)
+static int process_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
+			xdemitconf_t const *xecfg)
 {
-	struct blame_diff_state *state = state_;
+	struct patch *patch = ecb->priv;
+	long s1, s2;
+	xdchange_t *xch, *xche;
 	struct chunk *chunk;
-	int off1, off2, len1, len2, num;
 
-	num = state->ret->num;
-	if (len < 4 || line[0] != '@' || line[1] != '@') {
-		if (state->hunk_in_pre_context && line[0] == ' ')
-			state->ret->chunks[num - 1].same++;
-		else {
-			state->hunk_in_pre_context = 0;
-			if (line[0] == ' ')
-				state->hunk_post_context++;
-			else
-				state->hunk_post_context = 0;
-		}
-		return;
-	}
+	for (xch = xche = xscr; xch; xch = xche->next) {
+		xche = xdl_get_hunk(xch, xecfg);
 
-	if (num && state->hunk_post_context) {
-		chunk = &state->ret->chunks[num - 1];
-		chunk->p_next -= state->hunk_post_context;
-		chunk->t_next -= state->hunk_post_context;
-	}
-	state->ret->num = ++num;
-	state->ret->chunks = xrealloc(state->ret->chunks,
-				      sizeof(struct chunk) * num);
-	chunk = &state->ret->chunks[num - 1];
-	if (parse_hunk_header(line, len, &off1, &len1, &off2, &len2)) {
-		state->ret->num--;
-		return;
-	}
-
-	/* Line numbers in patch output are one based. */
-	off1--;
-	off2--;
+		s1 = XDL_MAX(xch->i1 - xecfg->ctxlen, 0);
+		s2 = XDL_MAX(xch->i2 - xecfg->ctxlen, 0);
 
-	chunk->same = len2 ? off2 : (off2 + 1);
+		++patch->num;
+		patch->chunks = xrealloc(patch->chunks,
+					 sizeof(struct chunk) * patch->num);
+		chunk = &patch->chunks[patch->num - 1];
+		chunk->same = s2 + XDL_MAX(xch->i1 - s1, 0);
+		chunk->p_next = xche->i1 + xche->chg1;
+		chunk->t_next = xche->i2 + xche->chg2;
+	}
 
-	chunk->p_next = off1 + (len1 ? len1 : 1);
-	chunk->t_next = chunk->same + len2;
-	state->hunk_in_pre_context = 1;
-	state->hunk_post_context = 0;
+	return 0;
 }
 
 static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
 				    int context)
 {
-	struct blame_diff_state state;
+	struct patch *patch;
 	xpparam_t xpp;
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
@@ -527,20 +505,14 @@ static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
 	xpp.flags = xdl_opts;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = context;
-	memset(&state, 0, sizeof(state));
-	state.ret = xmalloc(sizeof(struct patch));
-	state.ret->chunks = NULL;
-	state.ret->num = 0;
-
-	xdi_diff_outf(file_p, file_o, process_u_diff, &state, &xpp, &xecfg, &ecb);
+	patch = xmalloc(sizeof(struct patch));
+	patch->chunks = NULL;
+	patch->num = 0;
+	xecfg.emit_func = (void (*)())process_diff;
+	ecb.priv = patch;
+	xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);
 
-	if (state.ret->num) {
-		struct chunk *chunk;
-		chunk = &state.ret->chunks[state.ret->num - 1];
-		chunk->p_next -= state.hunk_post_context;
-		chunk->t_next -= state.hunk_post_context;
-	}
-	return state.ret;
+	return patch;
 }
 
 /*
-- 
1.5.6.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] Always initialize xpparam_t to 0
  2008-08-21 23:21   ` [PATCH 2/5] Bypass textual patch generation and parsing in git blame Brian Downing
@ 2008-08-21 23:21     ` Brian Downing
  2008-08-21 23:22       ` [PATCH 4/5] Allow xdiff machinery to cache hash results for a file Brian Downing
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Downing @ 2008-08-21 23:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Brian Downing

We're going to be adding some parameters to this, so we can't have
any uninitialized data in it.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 builtin-blame.c  |    1 +
 builtin-rerere.c |    1 +
 combine-diff.c   |    1 +
 diff.c           |    5 +++++
 merge-file.c     |    1 +
 5 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 60f70bf..66b7d15 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -502,6 +502,7 @@ static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = xdl_opts;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = context;
diff --git a/builtin-rerere.c b/builtin-rerere.c
index dd4573f..d4dec6b 100644
--- a/builtin-rerere.c
+++ b/builtin-rerere.c
@@ -98,6 +98,7 @@ static int diff_two(const char *file1, const char *label1,
 
 	printf("--- a/%s\n+++ b/%s\n", label1, label2);
 	fflush(stdout);
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = XDF_NEED_MINIMAL;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
diff --git a/combine-diff.c b/combine-diff.c
index 31ec0c5..70d5aad 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -213,6 +213,7 @@ static void combine_diff(const unsigned char *parent, mmfile_t *result_file,
 
 	parent_file.ptr = grab_blob(parent, &sz);
 	parent_file.size = sz;
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = XDF_NEED_MINIMAL;
 	memset(&xecfg, 0, sizeof(xecfg));
 	memset(&state, 0, sizeof(state));
diff --git a/diff.c b/diff.c
index 5923fe2..52346fe 100644
--- a/diff.c
+++ b/diff.c
@@ -446,6 +446,7 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	mmfile_t minus, plus;
 	int i;
 
+	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
 	minus.size = diff_words->minus.text.size;
 	minus.ptr = xmalloc(minus.size);
@@ -1508,6 +1509,7 @@ static void builtin_diff(const char *name_a,
 		if (!funcname_pattern)
 			funcname_pattern = diff_funcname_pattern(two);
 
+		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		memset(&ecbdata, 0, sizeof(ecbdata));
 		ecbdata.label_path = lbl;
@@ -1581,6 +1583,7 @@ static void builtin_diffstat(const char *name_a, const char *name_b,
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;
 
+		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
 		xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
@@ -1627,6 +1630,7 @@ static void builtin_checkdiff(const char *name_a, const char *name_b,
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;
 
+		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xecfg.ctxlen = 1; /* at least one context line */
 		xpp.flags = XDF_NEED_MINIMAL;
@@ -3072,6 +3076,7 @@ static int diff_get_patch_id(struct diff_options *options, unsigned char *sha1)
 		struct diff_filepair *p = q->queue[i];
 		int len1, len2;
 
+		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		if (p->status == 0)
 			return error("internal diff status error");
diff --git a/merge-file.c b/merge-file.c
index 2a939c9..3120a95 100644
--- a/merge-file.c
+++ b/merge-file.c
@@ -61,6 +61,7 @@ static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2)
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;
 
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = XDF_NEED_MINIMAL;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
-- 
1.5.6.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] Allow xdiff machinery to cache hash results for a file
  2008-08-21 23:21     ` [PATCH 3/5] Always initialize xpparam_t to 0 Brian Downing
@ 2008-08-21 23:22       ` Brian Downing
  2008-08-21 23:22         ` [PATCH 5/5] Use xdiff caching to improve git blame performance Brian Downing
  0 siblings, 1 reply; 17+ messages in thread
From: Brian Downing @ 2008-08-21 23:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Brian Downing

When generating diffs against the same file multiple times, it is a
waste of work to regenerate the hash values for each line each time.
Instead, allow a cache pointer to be passed in xpparam_t; set mf1_cache
to the cache for the first mmfile_t to xdl_diff, and mf2_cache for the
second.

This works like:

	xdcache_t cache;
	memset(cache, 0, sizeof(cache));
	/* ... */
	xpp.mf1_cache = &cache;
	xdl_diff(file1, file2, &xpp, &xecfg, &ecb);
	/* ...later... */
	xpp.mf1_cache = &cache;
	xdl_diff(file1, file3, &xpp, &xecfg, &ecb);
	/* The cache for file1 will be reused. */
	xdl_cache_free(&cache);

Note that this isn't compatible with xdi_diff as-is, as getting a
complete cache is incompatible with tail trimming.

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 xdiff/xdiff.h    |   11 ++++++++++
 xdiff/xprepare.c |   59 +++++++++++++++++++++++++++++++++++++++++++++++------
 xdiff/xtypes.h   |    1 +
 3 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 281fc0b..6fd922b 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -65,8 +65,17 @@ typedef struct s_mmbuffer {
 	long size;
 } mmbuffer_t;
 
+typedef struct s_xdcache_int {
+	long nrec;
+	unsigned long flags;
+	unsigned long *ha;
+	long *size;
+} xdcache_t;
+
 typedef struct s_xpparam {
 	unsigned long flags;
+	xdcache_t *mf1_cache;
+	xdcache_t *mf2_cache;
 } xpparam_t;
 
 typedef struct s_xdemitcb {
@@ -104,6 +113,8 @@ int xdl_merge(mmfile_t *orig, mmfile_t *mf1, const char *name1,
 		mmfile_t *mf2, const char *name2,
 		xpparam_t const *xpp, int level, mmbuffer_t *result);
 
+void xdl_cache_free(xdcache_t *cache);
+
 #ifdef __cplusplus
 }
 #endif /* #ifdef __cplusplus */
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index e87ab57..291caf9 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -54,7 +54,8 @@ static void xdl_free_classifier(xdlclassifier_t *cf);
 static int xdl_classify_record(xdlclassifier_t *cf, xrecord_t **rhash, unsigned int hbits,
 			       xrecord_t *rec);
 static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
-			   xdlclassifier_t *cf, xdfile_t *xdf);
+			   xdlclassifier_t *cf, xdfile_t *xdf,
+			   xdcache_t *cache);
 static void xdl_free_ctx(xdfile_t *xdf);
 static int xdl_clean_mmatch(char const *dis, long i, long s, long e);
 static int xdl_cleanup_records(xdfile_t *xdf1, xdfile_t *xdf2);
@@ -135,7 +136,8 @@ static int xdl_classify_record(xdlclassifier_t *cf, xrecord_t **rhash, unsigned
 
 
 static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
-			   xdlclassifier_t *cf, xdfile_t *xdf) {
+			   xdlclassifier_t *cf, xdfile_t *xdf,
+			   xdcache_t *cache) {
 	unsigned int hbits;
 	long i, nrec, hsize, bsize;
 	unsigned long hav;
@@ -177,7 +179,12 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 				top = blk + bsize;
 			}
 			prev = cur;
-			hav = xdl_hash_record(&cur, top, xpp->flags);
+			if (cache && cache->ha) {
+				hav = cache->ha[nrec];
+				cur += cache->size[nrec];
+			} else {
+				hav = xdl_hash_record(&cur, top, xpp->flags);
+			}
 			if (nrec >= narec) {
 				narec *= 2;
 				if (!(rrecs = (xrecord_t **) xdl_realloc(recs, narec * sizeof(xrecord_t *)))) {
@@ -199,6 +206,7 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 			crec->ptr = prev;
 			crec->size = (long) (cur - prev);
 			crec->ha = hav;
+			crec->original_ha = hav;
 			recs[nrec++] = crec;
 
 			if (xdl_classify_record(cf, rhash, hbits, crec) < 0) {
@@ -249,6 +257,23 @@ static int xdl_prepare_ctx(mmfile_t *mf, long narec, xpparam_t const *xpp,
 	xdf->dstart = 0;
 	xdf->dend = nrec - 1;
 
+	if (cache && !cache->ha) {
+		cache->nrec = nrec;
+		cache->ha = xdl_malloc(nrec * sizeof(unsigned long));
+		if (!cache->ha)
+			return 0;
+		cache->size = xdl_malloc(nrec * sizeof(long));
+		if (!cache->size) {
+			xdl_free(cache->ha);
+			cache->ha = NULL;
+			return 0;
+		}
+		for (i = 0; i < nrec; ++i) {
+			cache->ha[i] = recs[i]->original_ha;
+			cache->size[i] = recs[i]->size;
+		}
+	}
+
 	return 0;
 }
 
@@ -268,21 +293,34 @@ int xdl_prepare_env(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		    xdfenv_t *xe) {
 	long enl1, enl2;
 	xdlclassifier_t cf;
+	xdcache_t *c1 = xpp->mf1_cache;
+	xdcache_t *c2 = xpp->mf2_cache;
 
-	enl1 = xdl_guess_lines(mf1) + 1;
-	enl2 = xdl_guess_lines(mf2) + 1;
+	if (c1) {
+		if (c1->flags != xpp->flags)
+			xdl_cache_free(c1);
+		c1->flags = xpp->flags;
+	}
+	if (c2) {
+		if (c2->flags != xpp->flags)
+			xdl_cache_free(c2);
+		c2->flags = xpp->flags;
+	}
+
+	enl1 = c1 && c1->nrec ? c1->nrec : (xdl_guess_lines(mf1) + 1);
+	enl2 = c2 && c2->nrec ? c2->nrec : (xdl_guess_lines(mf2) + 1);
 
 	if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
 
 		return -1;
 	}
 
-	if (xdl_prepare_ctx(mf1, enl1, xpp, &cf, &xe->xdf1) < 0) {
+	if (xdl_prepare_ctx(mf1, enl1, xpp, &cf, &xe->xdf1, c1) < 0) {
 
 		xdl_free_classifier(&cf);
 		return -1;
 	}
-	if (xdl_prepare_ctx(mf2, enl2, xpp, &cf, &xe->xdf2) < 0) {
+	if (xdl_prepare_ctx(mf2, enl2, xpp, &cf, &xe->xdf2, c2) < 0) {
 
 		xdl_free_ctx(&xe->xdf1);
 		xdl_free_classifier(&cf);
@@ -309,6 +347,13 @@ void xdl_free_env(xdfenv_t *xe) {
 }
 
 
+void xdl_cache_free(xdcache_t *cache) {
+	xdl_free(cache->ha);
+	xdl_free(cache->size);
+	memset(cache, 0, sizeof(xdcache_t));
+}
+
+
 static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
 	long r, rdis0, rpdis0, rdis1, rpdis1;
 
diff --git a/xdiff/xtypes.h b/xdiff/xtypes.h
index 2511aef..e6f6890 100644
--- a/xdiff/xtypes.h
+++ b/xdiff/xtypes.h
@@ -43,6 +43,7 @@ typedef struct s_xrecord {
 	char const *ptr;
 	long size;
 	unsigned long ha;
+	unsigned long original_ha;
 } xrecord_t;
 
 typedef struct s_xdfile {
-- 
1.5.6.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] Use xdiff caching to improve git blame performance
  2008-08-21 23:22       ` [PATCH 4/5] Allow xdiff machinery to cache hash results for a file Brian Downing
@ 2008-08-21 23:22         ` Brian Downing
  0 siblings, 0 replies; 17+ messages in thread
From: Brian Downing @ 2008-08-21 23:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Brian Downing

git blame -C -C diffs all entries against the same file.  To avoid having
to recompute the line hashes for this file over and over again, use the
xddiff cache feature to store the hashes in struct origin.

Note that because this bypasses the xdi_diff tail trimming, it can
return different (but still valid) blame results for certain cases.
See http://article.gmane.org/gmane.comp.version-control.git/93112 for
more details.

This yields another significant speed improvement for some cases:

:; time git-blame -M -C -C -p --incremental server.c >/dev/null
Before:
48.66user 0.08system 0:48.75elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+36961minor)pagefaults 0swaps
After:
29.68user 0.22system 0:29.98elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+37897minor)pagefaults 0swaps

Signed-off-by: Brian Downing <bdowning@lavos.net>
---
 builtin-blame.c |   14 ++++++++++----
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 66b7d15..3e90668 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -80,6 +80,7 @@ struct origin {
 	int refcnt;
 	struct commit *commit;
 	mmfile_t file;
+	xdcache_t xdcache;
 	unsigned char blob_sha1[20];
 	char path[FLEX_ARRAY];
 };
@@ -119,6 +120,7 @@ static inline struct origin *origin_incref(struct origin *o)
 static void origin_decref(struct origin *o)
 {
 	if (o && --o->refcnt <= 0) {
+	        xdl_cache_free(&o->xdcache);
 		free(o->file.ptr);
 		free(o);
 	}
@@ -494,7 +496,8 @@ static int process_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 	return 0;
 }
 
-static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
+static struct patch *compare_buffer(mmfile_t *file_p, xdcache_t *cache_p,
+				    mmfile_t *file_o, xdcache_t *cache_o,
 				    int context)
 {
 	struct patch *patch;
@@ -504,6 +507,8 @@ static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
 
 	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = xdl_opts;
+	xpp.mf1_cache = cache_p;
+	xpp.mf2_cache = cache_o;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = context;
 	patch = xmalloc(sizeof(struct patch));
@@ -511,7 +516,7 @@ static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
 	patch->num = 0;
 	xecfg.emit_func = (void (*)())process_diff;
 	ecb.priv = patch;
-	xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);
+	xdl_diff(file_p, file_o, &xpp, &xecfg, &ecb);
 
 	return patch;
 }
@@ -530,7 +535,8 @@ static struct patch *get_patch(struct origin *parent, struct origin *origin)
 	fill_origin_blob(origin, &file_o);
 	if (!file_p.ptr || !file_o.ptr)
 		return NULL;
-	patch = compare_buffer(&file_p, &file_o, 0);
+	patch = compare_buffer(&file_p, &parent->xdcache,
+			       &file_o, &origin->xdcache, 0);
 	num_get_patch++;
 	return patch;
 }
@@ -937,7 +943,7 @@ static void find_copy_in_blob(struct scoreboard *sb,
 	}
 	file_o.size = cp - file_o.ptr;
 
-	patch = compare_buffer(file_p, &file_o, 1);
+	patch = compare_buffer(file_p, &parent->xdcache, &file_o, NULL, 1);
 
 	/*
 	 * file_o is a part of final image we are annotating.
-- 
1.5.6.1

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff
  2008-08-21 23:21 ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff Brian Downing
  2008-08-21 23:21   ` [PATCH 2/5] Bypass textual patch generation and parsing in git blame Brian Downing
@ 2008-08-23  8:15   ` René Scharfe
  2008-08-23  9:03     ` Junio C Hamano
  2008-08-24  8:12     ` Brian Downing
  1 sibling, 2 replies; 17+ messages in thread
From: René Scharfe @ 2008-08-23  8:15 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

Brian Downing schrieb:
> For some users (e.g. git blame), getting textual patch output is just
> extra work, as they can get all the information they need from the low-
> level diff structures.  Allow for an alternate low-level emit function
> to be defined to allow bypassing the textual patch generation; set
> xemitconf_t's emit_func member to enable this.
> 
> The (void (*)()) type is pretty ugly, but the alternative would be to
> include most of the private xdiff headers in xdiff.h to get the types
> required for the "proper" function prototype.  Also, a (void *) won't
> work, as ANSI C doesn't allow a function pointer to be cast to an
> object pointer.

Could we move more code into the library code to avoid that ugliness?

AFAICS, compare_buffer() builds a struct patch with an array of
struct chunks, whose members are then fed one by one into either
blame_chunk() or handle_split().  Could we avoid the allocation
altogether by using a different interface?

E.g. have a callback like this:

	static void handle_split_cb(long same, long p_next, long t_next,
			void *data)
	{
		struct chunk_cb_data *d = data;
		handle_split(d->sb, d->ent, d->tlno, d->plno, same,
				d->parent, d->split);
		d->plno = p_next;
		d->tlno = t_next;
	}

And use it like this:

	struct chunk_cb_data d = {sb, ent, 0, 0, parent, split};
        xpparam_t xpp;
        xdemitconf_t xecfg;

        xpp.flags = xdl_opts;
        memset(&xecfg, 0, sizeof(xecfg));
        xecfg.ctxlen = context;
	xdi_diff_chunks(file_p, file_o, &xpp, &xecfg, handle_split_cb, &d);
        handle_split(sb, ent, d.tlno, d.plno, ent->num_lines,
			parent, split);

Makes sense?

René

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff
  2008-08-23  8:15   ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff René Scharfe
@ 2008-08-23  9:03     ` Junio C Hamano
  2008-08-24  8:12     ` Brian Downing
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2008-08-23  9:03 UTC (permalink / raw)
  To: René Scharfe; +Cc: Brian Downing, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Could we move more code into the library code to avoid that ugliness?
>
> AFAICS, compare_buffer() builds a struct patch with an array of
> struct chunks, whose members are then fed one by one into either
> blame_chunk() or handle_split().  Could we avoid the allocation
> altogether by using a different interface?
>
> E.g. have a callback like this:
>
> 	static void handle_split_cb(long same, long p_next, long t_next,
> 			void *data)
> 	{
> 		struct chunk_cb_data *d = data;
> 		handle_split(d->sb, d->ent, d->tlno, d->plno, same,
> 				d->parent, d->split);
> 		d->plno = p_next;
> 		d->tlno = t_next;
> 	}
>
> And use it like this:
>
> 	struct chunk_cb_data d = {sb, ent, 0, 0, parent, split};
>         xpparam_t xpp;
>         xdemitconf_t xecfg;
>
>         xpp.flags = xdl_opts;
>         memset(&xecfg, 0, sizeof(xecfg));
>         xecfg.ctxlen = context;
> 	xdi_diff_chunks(file_p, file_o, &xpp, &xecfg, handle_split_cb, &d);
>         handle_split(sb, ent, d.tlno, d.plno, ent->num_lines,
> 			parent, split);
>
> Makes sense?

Absolutely; very well formulated.  Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff
  2008-08-23  8:15   ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff René Scharfe
  2008-08-23  9:03     ` Junio C Hamano
@ 2008-08-24  8:12     ` Brian Downing
  2008-09-03 22:29       ` René Scharfe
  1 sibling, 1 reply; 17+ messages in thread
From: Brian Downing @ 2008-08-24  8:12 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, git

On Sat, Aug 23, 2008 at 10:15:59AM +0200, René Scharfe wrote:
> Could we move more code into the library code to avoid that ugliness?
> 
> AFAICS, compare_buffer() builds a struct patch with an array of
> struct chunks, whose members are then fed one by one into either
> blame_chunk() or handle_split().  Could we avoid the allocation
> altogether by using a different interface?

Thanks, I think this is a good idea.  I'll try to work up something like
this, but it may be a few days before I have any appreciable hacking
time to do so.

-bcd 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff
  2008-08-24  8:12     ` Brian Downing
@ 2008-09-03 22:29       ` René Scharfe
  2008-10-25 13:30         ` [PATCH 1/5] blame: inline get_patch() René Scharfe
                           ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: René Scharfe @ 2008-09-03 22:29 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

Brian Downing schrieb:
> On Sat, Aug 23, 2008 at 10:15:59AM +0200, René Scharfe wrote:
>> Could we move more code into the library code to avoid that ugliness?
>>
>> AFAICS, compare_buffer() builds a struct patch with an array of
>> struct chunks, whose members are then fed one by one into either
>> blame_chunk() or handle_split().  Could we avoid the allocation
>> altogether by using a different interface?
> 
> Thanks, I think this is a good idea.  I'll try to work up something like
> this, but it may be a few days before I have any appreciable hacking
> time to do so.

Here's a patch on top of the one I'm replying to, which in turn is
b3779280ca4881252069fa9d1c7d2069a69c4a52 in pu.  While it removes more
code than it adds and has a slightly nicer interface, it doesn't speed
up blame.  The following test case:

   $ git-blame -M -C -C -p --incremental master -- Makefile >/dev/null

loses a few calls to mmap, munmap and brk, as strace tells me, but any
speed up that might result from that is lost in the noise.

I haven't had time to think about how to combine it with the cache you
introduced in patches 2-5, though, and I won't get to it before the
weekend (if at all). :-/

Thanks,
René


 builtin-blame.c   |  178 ++++++++++++++++------------------------------------
 xdiff-interface.c |   50 +++++++++++++++-
 xdiff-interface.h |    5 ++
 3 files changed, 109 insertions(+), 124 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 60f70bf..c9783dc 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -19,10 +19,6 @@
 #include "string-list.h"
 #include "mailmap.h"
 #include "parse-options.h"
-#include "xdiff/xtypes.h"
-#include "xdiff/xdiffi.h"
-#include "xdiff/xemit.h"
-#include "xdiff/xmacros.h"
 
 static char blame_usage[] = "git blame [options] [rev-opts] [rev] [--] file";
 
@@ -448,99 +444,6 @@ static struct origin *find_rename(struct scoreboard *sb,
 }
 
 /*
- * Parsing of patch chunks...
- */
-struct chunk {
-	/* line number in postimage; up to but not including this
-	 * line is the same as preimage
-	 */
-	int same;
-
-	/* preimage line number after this chunk */
-	int p_next;
-
-	/* postimage line number after this chunk */
-	int t_next;
-};
-
-struct patch {
-	struct chunk *chunks;
-	int num;
-};
-
-static int process_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
-			xdemitconf_t const *xecfg)
-{
-	struct patch *patch = ecb->priv;
-	long s1, s2;
-	xdchange_t *xch, *xche;
-	struct chunk *chunk;
-
-	for (xch = xche = xscr; xch; xch = xche->next) {
-		xche = xdl_get_hunk(xch, xecfg);
-
-		s1 = XDL_MAX(xch->i1 - xecfg->ctxlen, 0);
-		s2 = XDL_MAX(xch->i2 - xecfg->ctxlen, 0);
-
-		++patch->num;
-		patch->chunks = xrealloc(patch->chunks,
-					 sizeof(struct chunk) * patch->num);
-		chunk = &patch->chunks[patch->num - 1];
-		chunk->same = s2 + XDL_MAX(xch->i1 - s1, 0);
-		chunk->p_next = xche->i1 + xche->chg1;
-		chunk->t_next = xche->i2 + xche->chg2;
-	}
-
-	return 0;
-}
-
-static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
-				    int context)
-{
-	struct patch *patch;
-	xpparam_t xpp;
-	xdemitconf_t xecfg;
-	xdemitcb_t ecb;
-
-	xpp.flags = xdl_opts;
-	memset(&xecfg, 0, sizeof(xecfg));
-	xecfg.ctxlen = context;
-	patch = xmalloc(sizeof(struct patch));
-	patch->chunks = NULL;
-	patch->num = 0;
-	xecfg.emit_func = (void (*)())process_diff;
-	ecb.priv = patch;
-	xdi_diff(file_p, file_o, &xpp, &xecfg, &ecb);
-
-	return patch;
-}
-
-/*
- * Run diff between two origins and grab the patch output, so that
- * we can pass blame for lines origin is currently suspected for
- * to its parent.
- */
-static struct patch *get_patch(struct origin *parent, struct origin *origin)
-{
-	mmfile_t file_p, file_o;
-	struct patch *patch;
-
-	fill_origin_blob(parent, &file_p);
-	fill_origin_blob(origin, &file_o);
-	if (!file_p.ptr || !file_o.ptr)
-		return NULL;
-	patch = compare_buffer(&file_p, &file_o, 0);
-	num_get_patch++;
-	return patch;
-}
-
-static void free_patch(struct patch *p)
-{
-	free(p->chunks);
-	free(p);
-}
-
-/*
  * Link in a new blame entry to the scoreboard.  Entries that cover the
  * same line range have been removed from the scoreboard previously.
  */
@@ -786,6 +689,22 @@ static void blame_chunk(struct scoreboard *sb,
 	}
 }
 
+struct blame_chunk_cb_data {
+	struct scoreboard *sb;
+	struct origin *target;
+	struct origin *parent;
+	long plno;
+	long tlno;
+};
+
+static void blame_chunk_cb(void *data, long same, long p_next, long t_next)
+{
+	struct blame_chunk_cb_data *d = data;
+	blame_chunk(d->sb, d->tlno, d->plno, same, d->target, d->parent);
+	d->plno = p_next;
+	d->tlno = t_next;
+}
+
 /*
  * We are looking at the origin 'target' and aiming to pass blame
  * for the lines it is suspected to its parent.  Run diff to find
@@ -795,26 +714,28 @@ static int pass_blame_to_parent(struct scoreboard *sb,
 				struct origin *target,
 				struct origin *parent)
 {
-	int i, last_in_target, plno, tlno;
-	struct patch *patch;
+	int last_in_target;
+	mmfile_t file_p, file_o;
+	struct blame_chunk_cb_data d = { sb, target, parent, 0, 0 };
+	xpparam_t xpp;
+	xdemitconf_t xecfg;
 
 	last_in_target = find_last_in_target(sb, target);
 	if (last_in_target < 0)
 		return 1; /* nothing remains for this target */
 
-	patch = get_patch(parent, target);
-	plno = tlno = 0;
-	for (i = 0; i < patch->num; i++) {
-		struct chunk *chunk = &patch->chunks[i];
+	fill_origin_blob(parent, &file_p);
+	fill_origin_blob(target, &file_o);
 
-		blame_chunk(sb, tlno, plno, chunk->same, target, parent);
-		plno = chunk->p_next;
-		tlno = chunk->t_next;
-	}
+	num_get_patch++;
+
+	xpp.flags = xdl_opts;
+	memset(&xecfg, 0, sizeof(xecfg));
+	xecfg.ctxlen = 0;
+	xdi_diff_chunks(&file_p, &file_o, blame_chunk_cb, &d, &xpp, &xecfg);
 	/* The rest (i.e. anything after tlno) are the same as the parent */
-	blame_chunk(sb, tlno, plno, last_in_target, target, parent);
+	blame_chunk(sb, d.tlno, d.plno, last_in_target, target, parent);
 
-	free_patch(patch);
 	return 0;
 }
 
@@ -906,6 +827,23 @@ static void handle_split(struct scoreboard *sb,
 	}
 }
 
+struct handle_split_cb_data {
+	struct scoreboard *sb;
+	struct blame_entry *ent;
+	struct origin *parent;
+	struct blame_entry *split;
+	long plno;
+	long tlno;
+};
+
+static void handle_split_cb(void *data, long same, long p_next, long t_next)
+{
+	struct handle_split_cb_data *d = data;
+	handle_split(d->sb, d->ent, d->tlno, d->plno, same, d->parent, d->split);
+	d->plno = p_next;
+	d->tlno = t_next;
+}
+
 /*
  * Find the lines from parent that are the same as ent so that
  * we can pass blames to it.  file_p has the blob contents for
@@ -920,8 +858,9 @@ static void find_copy_in_blob(struct scoreboard *sb,
 	const char *cp;
 	int cnt;
 	mmfile_t file_o;
-	struct patch *patch;
-	int i, plno, tlno;
+	struct handle_split_cb_data d = { sb, ent, parent, split, 0, 0 };
+	xpparam_t xpp;
+	xdemitconf_t xecfg;
 
 	/*
 	 * Prepare mmfile that contains only the lines in ent.
@@ -936,24 +875,17 @@ static void find_copy_in_blob(struct scoreboard *sb,
 	}
 	file_o.size = cp - file_o.ptr;
 
-	patch = compare_buffer(file_p, &file_o, 1);
-
 	/*
 	 * file_o is a part of final image we are annotating.
 	 * file_p partially may match that image.
 	 */
+	xpp.flags = xdl_opts;
+	memset(&xecfg, 0, sizeof(xecfg));
+	xecfg.ctxlen = 1;
 	memset(split, 0, sizeof(struct blame_entry [3]));
-	plno = tlno = 0;
-	for (i = 0; i < patch->num; i++) {
-		struct chunk *chunk = &patch->chunks[i];
-
-		handle_split(sb, ent, tlno, plno, chunk->same, parent, split);
-		plno = chunk->p_next;
-		tlno = chunk->t_next;
-	}
+	xdi_diff_chunks(file_p, &file_o, handle_split_cb, &d, &xpp, &xecfg);
 	/* remainder, if any, all match the preimage */
-	handle_split(sb, ent, tlno, plno, ent->num_lines, parent, split);
-	free_patch(patch);
+	handle_split(sb, ent, d.tlno, d.plno, ent->num_lines, parent, split);
 }
 
 /*
diff --git a/xdiff-interface.c b/xdiff-interface.c
index 944ad98..a7cfdab 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -1,6 +1,9 @@
 #include "cache.h"
 #include "xdiff-interface.h"
-#include "strbuf.h"
+#include "xdiff/xtypes.h"
+#include "xdiff/xdiffi.h"
+#include "xdiff/xemit.h"
+#include "xdiff/xmacros.h"
 
 struct xdiff_emit_state {
 	xdiff_emit_consume_fn consume;
@@ -153,6 +156,51 @@ int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 	return ret;
 }
 
+struct xdiff_emit_chunk_state {
+	xdiff_emit_chunk_consume_fn consume;
+	void *consume_callback_data;
+};
+
+static int process_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
+			xdemitconf_t const *xecfg)
+{
+	long s1, s2, same, p_next, t_next;
+	xdchange_t *xch, *xche;
+	struct xdiff_emit_chunk_state *state = ecb->priv;
+	xdiff_emit_chunk_consume_fn fn = state->consume;
+	void *consume_callback_data = state->consume_callback_data;
+
+
+	for (xch = xche = xscr; xch; xch = xche->next) {
+		xche = xdl_get_hunk(xch, xecfg);
+
+		s1 = XDL_MAX(xch->i1 - xecfg->ctxlen, 0);
+		s2 = XDL_MAX(xch->i2 - xecfg->ctxlen, 0);
+		same = s2 + XDL_MAX(xch->i1 - s1, 0);
+		p_next = xche->i1 + xche->chg1;
+		t_next = xche->i2 + xche->chg2;
+
+		fn(consume_callback_data, same, p_next, t_next);
+	}
+	return 0;
+}
+
+int xdi_diff_chunks(mmfile_t *mf1, mmfile_t *mf2,
+		    xdiff_emit_chunk_consume_fn fn, void *consume_callback_data,
+		    xpparam_t const *xpp, xdemitconf_t *xecfg)
+{
+	struct xdiff_emit_chunk_state state;
+	xdemitcb_t ecb;
+
+	memset(&state, 0, sizeof(state));
+	memset(&ecb, 0, sizeof(ecb));
+	state.consume = fn;
+	state.consume_callback_data = consume_callback_data;
+	xecfg->emit_func = (void (*)())process_diff;
+	ecb.priv = &state;
+	return xdi_diff(mf1, mf2, xpp, xecfg, &ecb);
+}
+
 int read_mmfile(mmfile_t *ptr, const char *filename)
 {
 	struct stat st;
diff --git a/xdiff-interface.h b/xdiff-interface.h
index 558492b..1f7d985 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -4,12 +4,17 @@
 #include "xdiff/xdiff.h"
 
 typedef void (*xdiff_emit_consume_fn)(void *, char *, unsigned long);
+typedef void (*xdiff_emit_chunk_consume_fn)(void *, long, long, long);
 
 int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 		  xdiff_emit_consume_fn fn, void *consume_callback_data,
 		  xpparam_t const *xpp,
 		  xdemitconf_t const *xecfg, xdemitcb_t *xecb);
+int xdi_diff_chunks(mmfile_t *mf1, mmfile_t *mf2,
+		    xdiff_emit_chunk_consume_fn fn, void *consume_callback_data,
+		    xpparam_t const *xpp, xdemitconf_t *xecfg);
+
 int parse_hunk_header(char *line, int len,
 		      int *ob, int *on,
 		      int *nb, int *nn);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 1/5] blame: inline get_patch()
  2008-09-03 22:29       ` René Scharfe
@ 2008-10-25 13:30         ` René Scharfe
  2008-10-25 13:30         ` [PATCH 2/5] Always initialize xpparam_t to 0 René Scharfe
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: René Scharfe @ 2008-10-25 13:30 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

Inline get_patch() to its only call site as a preparation for getting rid
of struct patch.  Also we don't need to check the ptr members because
fill_origin_blob() already did, and the caller didn't check for NULL
anyway, so drop the test.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 builtin-blame.c |   31 +++++++++++--------------------
 1 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 48cc0c1..593b539 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -542,25 +542,6 @@ static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
 	return state.ret;
 }
 
-/*
- * Run diff between two origins and grab the patch output, so that
- * we can pass blame for lines origin is currently suspected for
- * to its parent.
- */
-static struct patch *get_patch(struct origin *parent, struct origin *origin)
-{
-	mmfile_t file_p, file_o;
-	struct patch *patch;
-
-	fill_origin_blob(parent, &file_p);
-	fill_origin_blob(origin, &file_o);
-	if (!file_p.ptr || !file_o.ptr)
-		return NULL;
-	patch = compare_buffer(&file_p, &file_o, 0);
-	num_get_patch++;
-	return patch;
-}
-
 static void free_patch(struct patch *p)
 {
 	free(p->chunks);
@@ -824,12 +805,22 @@ static int pass_blame_to_parent(struct scoreboard *sb,
 {
 	int i, last_in_target, plno, tlno;
 	struct patch *patch;
+	mmfile_t file_p, file_o;
 
 	last_in_target = find_last_in_target(sb, target);
 	if (last_in_target < 0)
 		return 1; /* nothing remains for this target */
 
-	patch = get_patch(parent, target);
+	/*
+	 * Run diff between two origins and grab the patch output, so that
+	 * we can pass blame for lines origin is currently suspected for
+	 * to its parent.
+	 */
+	fill_origin_blob(parent, &file_p);
+	fill_origin_blob(target, &file_o);
+	patch = compare_buffer(&file_p, &file_o, 0);
+	num_get_patch++;
+
 	plno = tlno = 0;
 	for (i = 0; i < patch->num; i++) {
 		struct chunk *chunk = &patch->chunks[i];
-- 
1.6.0.3.514.g2f91b

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/5] Always initialize xpparam_t to 0
  2008-09-03 22:29       ` René Scharfe
  2008-10-25 13:30         ` [PATCH 1/5] blame: inline get_patch() René Scharfe
@ 2008-10-25 13:30         ` René Scharfe
  2008-10-25 13:30         ` [PATCH 3/5] Allow alternate "low-level" emit function from xdl_diff René Scharfe
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 17+ messages in thread
From: René Scharfe @ 2008-10-25 13:30 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

From: Brian Downing <bdowning@lavos.net>

We're going to be adding some parameters to this, so we can't have
any uninitialized data in it.

Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Taken from pu.  This patch series doesn't add anything to the struct,
but it's a good idea to future-proof its initialization anyway.

 builtin-blame.c  |    1 +
 builtin-rerere.c |    1 +
 combine-diff.c   |    1 +
 diff.c           |    5 +++++
 merge-file.c     |    1 +
 5 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 593b539..5ca7065 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -523,6 +523,7 @@ static struct patch *compare_buffer(mmfile_t
*file_p, mmfile_t *file_o,
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;

+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = xdl_opts;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = context;
diff --git a/builtin-rerere.c b/builtin-rerere.c
index dd4573f..d4dec6b 100644
--- a/builtin-rerere.c
+++ b/builtin-rerere.c
@@ -98,6 +98,7 @@ static int diff_two(const char *file1, const char *label1,

 	printf("--- a/%s\n+++ b/%s\n", label1, label2);
 	fflush(stdout);
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = XDF_NEED_MINIMAL;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
diff --git a/combine-diff.c b/combine-diff.c
index 5aa1104..ec8df39 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -213,6 +213,7 @@ static void combine_diff(const unsigned char
*parent, mmfile_t *result_file,

 	parent_file.ptr = grab_blob(parent, &sz);
 	parent_file.size = sz;
+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = XDF_NEED_MINIMAL;
 	memset(&xecfg, 0, sizeof(xecfg));
 	memset(&state, 0, sizeof(state));
diff --git a/diff.c b/diff.c
index e368fef..1918b73 100644
--- a/diff.c
+++ b/diff.c
@@ -400,6 +400,7 @@ static void diff_words_show(struct diff_words_data
*diff_words)
 	mmfile_t minus, plus;
 	int i;

+	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
 	minus.size = diff_words->minus.text.size;
 	minus.ptr = xmalloc(minus.size);
@@ -1416,6 +1417,7 @@ static void builtin_diff(const char *name_a,
 		if (!pe)
 			pe = diff_funcname_pattern(two);

+		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		memset(&ecbdata, 0, sizeof(ecbdata));
 		ecbdata.label_path = lbl;
@@ -1489,6 +1491,7 @@ static void builtin_diffstat(const char *name_a,
const char *name_b,
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;

+		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
 		xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat,
@@ -1535,6 +1538,7 @@ static void builtin_checkdiff(const char *name_a,
const char *name_b,
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;

+		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		xecfg.ctxlen = 1; /* at least one context line */
 		xpp.flags = XDF_NEED_MINIMAL;
@@ -2958,6 +2962,7 @@ static int diff_get_patch_id(struct diff_options
*options, unsigned char *sha1)
 		struct diff_filepair *p = q->queue[i];
 		int len1, len2;

+		memset(&xpp, 0, sizeof(xpp));
 		memset(&xecfg, 0, sizeof(xecfg));
 		if (p->status == 0)
 			return error("internal diff status error");
diff --git a/merge-file.c b/merge-file.c
index 2a939c9..3120a95 100644
--- a/merge-file.c
+++ b/merge-file.c
@@ -61,6 +61,7 @@ static int generate_common_file(mmfile_t *res,
mmfile_t *f1, mmfile_t *f2)
 	xdemitconf_t xecfg;
 	xdemitcb_t ecb;

+	memset(&xpp, 0, sizeof(xpp));
 	xpp.flags = XDF_NEED_MINIMAL;
 	memset(&xecfg, 0, sizeof(xecfg));
 	xecfg.ctxlen = 3;
-- 
1.6.0.3.514.g2f91b

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 3/5] Allow alternate "low-level" emit function from xdl_diff
  2008-09-03 22:29       ` René Scharfe
  2008-10-25 13:30         ` [PATCH 1/5] blame: inline get_patch() René Scharfe
  2008-10-25 13:30         ` [PATCH 2/5] Always initialize xpparam_t to 0 René Scharfe
@ 2008-10-25 13:30         ` René Scharfe
  2008-10-25 13:31         ` [PATCH 4/5] add xdi_diff_hunks() for callers that only need hunk lengths René Scharfe
  2008-10-25 13:31         ` [PATCH 5/5] blame: use xdi_diff_hunks(), get rid of struct patch René Scharfe
  4 siblings, 0 replies; 17+ messages in thread
From: René Scharfe @ 2008-10-25 13:30 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

From: Brian Downing <bdowning@lavos.net>

For some users (e.g. git blame), getting textual patch output is just
extra work, as they can get all the information they need from the low-
level diff structures.  Allow for an alternate low-level emit function
to be defined to allow bypassing the textual patch generation; set
xemitconf_t's emit_func member to enable this.

The (void (*)()) type is pretty ugly, but the alternative would be to
include most of the private xdiff headers in xdiff.h to get the types
required for the "proper" function prototype.  Also, a (void *) won't
work, as ANSI C doesn't allow a function pointer to be cast to an
object pointer.

Signed-off-by: Brian Downing <bdowning@lavos.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
Taken from pu.

 xdiff/xdiff.h  |    1 +
 xdiff/xdiffi.c |    4 +++-
 xdiff/xemit.c  |    3 +--
 xdiff/xemit.h  |    3 +++
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index deebe02..84fff58 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -87,6 +87,7 @@ typedef struct s_xdemitconf {
 	unsigned long flags;
 	find_func_t find_func;
 	void *find_func_priv;
+	void (*emit_func)();
 } xdemitconf_t;
 
 typedef struct s_bdiffparam {
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 1bad846..9d0324a 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -538,6 +538,8 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
 	xdchange_t *xscr;
 	xdfenv_t xe;
+	emit_func_t ef = xecfg->emit_func ?
+		(emit_func_t)xecfg->emit_func : xdl_emit_diff;
 
 	if (xdl_do_diff(mf1, mf2, xpp, &xe) < 0) {
 
@@ -551,7 +553,7 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
 		return -1;
 	}
 	if (xscr) {
-		if (xdl_emit_diff(&xe, xscr, ecb, xecfg) < 0) {
+		if (ef(&xe, xscr, ecb, xecfg) < 0) {
 
 			xdl_free_script(xscr);
 			xdl_free_env(&xe);
diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index d3d9c84..4625c1b 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -27,7 +27,6 @@
 
 static long xdl_get_rec(xdfile_t *xdf, long ri, char const **rec);
 static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *ecb);
-static xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg);
 
 
 
@@ -58,7 +57,7 @@ static int xdl_emit_record(xdfile_t *xdf, long ri, char const *pre, xdemitcb_t *
  * Starting at the passed change atom, find the latest change atom to be included
  * inside the differential hunk according to the specified configuration.
  */
-static xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg) {
+xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg) {
 	xdchange_t *xch, *xchp;
 
 	for (xchp = xscr, xch = xscr->next; xch; xchp = xch, xch = xch->next)
diff --git a/xdiff/xemit.h b/xdiff/xemit.h
index 440a739..c2e2e83 100644
--- a/xdiff/xemit.h
+++ b/xdiff/xemit.h
@@ -24,7 +24,10 @@
 #define XEMIT_H
 
 
+typedef int (*emit_func_t)(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
+			   xdemitconf_t const *xecfg);
 
+xdchange_t *xdl_get_hunk(xdchange_t *xscr, xdemitconf_t const *xecfg);
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg);
 
-- 
1.6.0.3.514.g2f91b

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 4/5] add xdi_diff_hunks() for callers that only need hunk lengths
  2008-09-03 22:29       ` René Scharfe
                           ` (2 preceding siblings ...)
  2008-10-25 13:30         ` [PATCH 3/5] Allow alternate "low-level" emit function from xdl_diff René Scharfe
@ 2008-10-25 13:31         ` René Scharfe
  2008-10-25 13:31         ` [PATCH 5/5] blame: use xdi_diff_hunks(), get rid of struct patch René Scharfe
  4 siblings, 0 replies; 17+ messages in thread
From: René Scharfe @ 2008-10-25 13:31 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

Based on a patch by Brian Downing, this uses the xdiff emit_func feature
to implement xdi_diff_hunks().  It's a function that calls a callback for
each hunk of a diff, passing its lengths.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 xdiff-interface.c |   49 ++++++++++++++++++++++++++++++++++++++++++++++++-
 xdiff-interface.h |    4 ++++
 2 files changed, 52 insertions(+), 1 deletions(-)

diff --git a/xdiff-interface.c b/xdiff-interface.c
index 49e06af..e8ef46d 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -1,6 +1,9 @@
 #include "cache.h"
 #include "xdiff-interface.h"
-#include "strbuf.h"
+#include "xdiff/xtypes.h"
+#include "xdiff/xdiffi.h"
+#include "xdiff/xemit.h"
+#include "xdiff/xmacros.h"
 
 struct xdiff_emit_state {
 	xdiff_emit_consume_fn consume;
@@ -153,6 +156,50 @@ int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 	return ret;
 }
 
+struct xdiff_emit_hunk_state {
+	xdiff_emit_hunk_consume_fn consume;
+	void *consume_callback_data;
+};
+
+static int process_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
+			xdemitconf_t const *xecfg)
+{
+	long s1, s2, same, p_next, t_next;
+	xdchange_t *xch, *xche;
+	struct xdiff_emit_hunk_state *state = ecb->priv;
+	xdiff_emit_hunk_consume_fn fn = state->consume;
+	void *consume_callback_data = state->consume_callback_data;
+
+	for (xch = xscr; xch; xch = xche->next) {
+		xche = xdl_get_hunk(xch, xecfg);
+
+		s1 = XDL_MAX(xch->i1 - xecfg->ctxlen, 0);
+		s2 = XDL_MAX(xch->i2 - xecfg->ctxlen, 0);
+		same = s2 + XDL_MAX(xch->i1 - s1, 0);
+		p_next = xche->i1 + xche->chg1;
+		t_next = xche->i2 + xche->chg2;
+
+		fn(consume_callback_data, same, p_next, t_next);
+	}
+	return 0;
+}
+
+int xdi_diff_hunks(mmfile_t *mf1, mmfile_t *mf2,
+		   xdiff_emit_hunk_consume_fn fn, void *consume_callback_data,
+		   xpparam_t const *xpp, xdemitconf_t *xecfg)
+{
+	struct xdiff_emit_hunk_state state;
+	xdemitcb_t ecb;
+
+	memset(&state, 0, sizeof(state));
+	memset(&ecb, 0, sizeof(ecb));
+	state.consume = fn;
+	state.consume_callback_data = consume_callback_data;
+	xecfg->emit_func = (void (*)())process_diff;
+	ecb.priv = &state;
+	return xdi_diff(mf1, mf2, xpp, xecfg, &ecb);
+}
+
 int read_mmfile(mmfile_t *ptr, const char *filename)
 {
 	struct stat st;
diff --git a/xdiff-interface.h b/xdiff-interface.h
index eaf9cd3..7352b9a 100644
--- a/xdiff-interface.h
+++ b/xdiff-interface.h
@@ -4,12 +4,16 @@
 #include "xdiff/xdiff.h"
 
 typedef void (*xdiff_emit_consume_fn)(void *, char *, unsigned long);
+typedef void (*xdiff_emit_hunk_consume_fn)(void *, long, long, long);
 
 int xdi_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, xdemitconf_t const *xecfg, xdemitcb_t *ecb);
 int xdi_diff_outf(mmfile_t *mf1, mmfile_t *mf2,
 		  xdiff_emit_consume_fn fn, void *consume_callback_data,
 		  xpparam_t const *xpp,
 		  xdemitconf_t const *xecfg, xdemitcb_t *xecb);
+int xdi_diff_hunks(mmfile_t *mf1, mmfile_t *mf2,
+		   xdiff_emit_hunk_consume_fn fn, void *consume_callback_data,
+		   xpparam_t const *xpp, xdemitconf_t *xecfg);
 int parse_hunk_header(char *line, int len,
 		      int *ob, int *on,
 		      int *nb, int *nn);
-- 
1.6.0.3.514.g2f91b

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 5/5] blame: use xdi_diff_hunks(), get rid of struct patch
  2008-09-03 22:29       ` René Scharfe
                           ` (3 preceding siblings ...)
  2008-10-25 13:31         ` [PATCH 4/5] add xdi_diff_hunks() for callers that only need hunk lengths René Scharfe
@ 2008-10-25 13:31         ` René Scharfe
  2008-10-25 19:36           ` Junio C Hamano
  4 siblings, 1 reply; 17+ messages in thread
From: René Scharfe @ 2008-10-25 13:31 UTC (permalink / raw)
  To: Brian Downing; +Cc: Junio C Hamano, git

Based on a patch by Brian Downing, this replaces the struct patch based
code for blame passing with calls to xdi_diff_hunks().  This way we
avoid generating and then parsing patches; we only let the interesting
infos be passed to our callbacks instead.  This makes blame a bit faster:

   $ blame="./git blame -M -C -C -p --incremental v1.6.0"

   # master
   $ /usr/bin/time $blame Makefile >/dev/null
   1.38user 0.14system 0:01.52elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (0major+12226minor)pagefaults 0swaps
   $ /usr/bin/time $blame cache.h >/dev/null
   1.66user 0.13system 0:01.80elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (0major+12262minor)pagefaults 0swaps

   # this patch series
   $ /usr/bin/time $blame Makefile >/dev/null
   1.27user 0.12system 0:01.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (0major+11836minor)pagefaults 0swaps
   $ /usr/bin/time $blame cache.h >/dev/null
   1.52user 0.12system 0:01.70elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (0major+12052minor)pagefaults 0swaps

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
Brian, your numbers looked much more impressive.  Could you please clock
this code with your repository and the file server.c?  I wonder if this
callback mechanism is just too complicated or if your case simply benefits
lots more than the two files from git mentioned above.

The patch series ends here without adding xdiff caching, for two reasons.
It's quite easy to add it; patch 4 from your series applies unchanged and
patch 5 is just needs a few small changes to account for the absence of
compare_buffer().  More importantly, speed actually went down with caching
for the test case.  The common tail optimization (xdi_diff() vs. xdl_diff())
seems to beat caching for cache.h and Makefile..

 builtin-blame.c |  191 +++++++++++++++----------------------------------------
 1 files changed, 52 insertions(+), 139 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index 5ca7065..b6bc5cf 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -443,113 +443,6 @@ static struct origin *find_rename(struct scoreboard *sb,
 }
 
 /*
- * Parsing of patch chunks...
- */
-struct chunk {
-	/* line number in postimage; up to but not including this
-	 * line is the same as preimage
-	 */
-	int same;
-
-	/* preimage line number after this chunk */
-	int p_next;
-
-	/* postimage line number after this chunk */
-	int t_next;
-};
-
-struct patch {
-	struct chunk *chunks;
-	int num;
-};
-
-struct blame_diff_state {
-	struct patch *ret;
-	unsigned hunk_post_context;
-	unsigned hunk_in_pre_context : 1;
-};
-
-static void process_u_diff(void *state_, char *line, unsigned long len)
-{
-	struct blame_diff_state *state = state_;
-	struct chunk *chunk;
-	int off1, off2, len1, len2, num;
-
-	num = state->ret->num;
-	if (len < 4 || line[0] != '@' || line[1] != '@') {
-		if (state->hunk_in_pre_context && line[0] == ' ')
-			state->ret->chunks[num - 1].same++;
-		else {
-			state->hunk_in_pre_context = 0;
-			if (line[0] == ' ')
-				state->hunk_post_context++;
-			else
-				state->hunk_post_context = 0;
-		}
-		return;
-	}
-
-	if (num && state->hunk_post_context) {
-		chunk = &state->ret->chunks[num - 1];
-		chunk->p_next -= state->hunk_post_context;
-		chunk->t_next -= state->hunk_post_context;
-	}
-	state->ret->num = ++num;
-	state->ret->chunks = xrealloc(state->ret->chunks,
-				      sizeof(struct chunk) * num);
-	chunk = &state->ret->chunks[num - 1];
-	if (parse_hunk_header(line, len, &off1, &len1, &off2, &len2)) {
-		state->ret->num--;
-		return;
-	}
-
-	/* Line numbers in patch output are one based. */
-	off1--;
-	off2--;
-
-	chunk->same = len2 ? off2 : (off2 + 1);
-
-	chunk->p_next = off1 + (len1 ? len1 : 1);
-	chunk->t_next = chunk->same + len2;
-	state->hunk_in_pre_context = 1;
-	state->hunk_post_context = 0;
-}
-
-static struct patch *compare_buffer(mmfile_t *file_p, mmfile_t *file_o,
-				    int context)
-{
-	struct blame_diff_state state;
-	xpparam_t xpp;
-	xdemitconf_t xecfg;
-	xdemitcb_t ecb;
-
-	memset(&xpp, 0, sizeof(xpp));
-	xpp.flags = xdl_opts;
-	memset(&xecfg, 0, sizeof(xecfg));
-	xecfg.ctxlen = context;
-	memset(&state, 0, sizeof(state));
-	state.ret = xmalloc(sizeof(struct patch));
-	state.ret->chunks = NULL;
-	state.ret->num = 0;
-
-	xdi_diff_outf(file_p, file_o, process_u_diff, &state, &xpp, &xecfg, &ecb);
-
-	if (state.ret->num) {
-		struct chunk *chunk;
-		chunk = &state.ret->chunks[state.ret->num - 1];
-		chunk->p_next -= state.hunk_post_context;
-		chunk->t_next -= state.hunk_post_context;
-	}
-	return state.ret;
-}
-
-static void free_patch(struct patch *p)
-{
-	free(p->chunks);
-	free(p);
-}
-
-/*
  * Link in a new blame entry to the scoreboard.  Entries that cover the
  * same line range have been removed from the scoreboard previously.
  */
@@ -795,6 +688,22 @@ static void blame_chunk(struct scoreboard *sb,
 	}
 }
 
+struct blame_chunk_cb_data {
+	struct scoreboard *sb;
+	struct origin *target;
+	struct origin *parent;
+	long plno;
+	long tlno;
+};
+
+static void blame_chunk_cb(void *data, long same, long p_next, long t_next)
+{
+	struct blame_chunk_cb_data *d = data;
+	blame_chunk(d->sb, d->tlno, d->plno, same, d->target, d->parent);
+	d->plno = p_next;
+	d->tlno = t_next;
+}
+
 /*
  * We are looking at the origin 'target' and aiming to pass blame
  * for the lines it is suspected to its parent.  Run diff to find
@@ -804,36 +713,28 @@ static int pass_blame_to_parent(struct scoreboard *sb,
 				struct origin *target,
 				struct origin *parent)
 {
-	int i, last_in_target, plno, tlno;
-	struct patch *patch;
+	int last_in_target;
 	mmfile_t file_p, file_o;
+	struct blame_chunk_cb_data d = { sb, target, parent, 0, 0 };
+	xpparam_t xpp;
+	xdemitconf_t xecfg;
 
 	last_in_target = find_last_in_target(sb, target);
 	if (last_in_target < 0)
 		return 1; /* nothing remains for this target */
 
-	/*
-	 * Run diff between two origins and grab the patch output, so that
-	 * we can pass blame for lines origin is currently suspected for
-	 * to its parent.
-	 */
 	fill_origin_blob(parent, &file_p);
 	fill_origin_blob(target, &file_o);
-	patch = compare_buffer(&file_p, &file_o, 0);
 	num_get_patch++;
 
-	plno = tlno = 0;
-	for (i = 0; i < patch->num; i++) {
-		struct chunk *chunk = &patch->chunks[i];
-
-		blame_chunk(sb, tlno, plno, chunk->same, target, parent);
-		plno = chunk->p_next;
-		tlno = chunk->t_next;
-	}
+	memset(&xpp, 0, sizeof(xpp));
+	xpp.flags = xdl_opts;
+	memset(&xecfg, 0, sizeof(xecfg));
+	xecfg.ctxlen = 0;
+	xdi_diff_hunks(&file_p, &file_o, blame_chunk_cb, &d, &xpp, &xecfg);
 	/* The rest (i.e. anything after tlno) are the same as the parent */
-	blame_chunk(sb, tlno, plno, last_in_target, target, parent);
+	blame_chunk(sb, d.tlno, d.plno, last_in_target, target, parent);
 
-	free_patch(patch);
 	return 0;
 }
 
@@ -925,6 +826,23 @@ static void handle_split(struct scoreboard *sb,
 	}
 }
 
+struct handle_split_cb_data {
+	struct scoreboard *sb;
+	struct blame_entry *ent;
+	struct origin *parent;
+	struct blame_entry *split;
+	long plno;
+	long tlno;
+};
+
+static void handle_split_cb(void *data, long same, long p_next, long t_next)
+{
+	struct handle_split_cb_data *d = data;
+	handle_split(d->sb, d->ent, d->tlno, d->plno, same, d->parent, d->split);
+	d->plno = p_next;
+	d->tlno = t_next;
+}
+
 /*
  * Find the lines from parent that are the same as ent so that
  * we can pass blames to it.  file_p has the blob contents for
@@ -939,8 +857,9 @@ static void find_copy_in_blob(struct scoreboard *sb,
 	const char *cp;
 	int cnt;
 	mmfile_t file_o;
-	struct patch *patch;
-	int i, plno, tlno;
+	struct handle_split_cb_data d = { sb, ent, parent, split, 0, 0 };
+	xpparam_t xpp;
+	xdemitconf_t xecfg;
 
 	/*
 	 * Prepare mmfile that contains only the lines in ent.
@@ -955,24 +874,18 @@ static void find_copy_in_blob(struct scoreboard *sb,
 	}
 	file_o.size = cp - file_o.ptr;
 
-	patch = compare_buffer(file_p, &file_o, 1);
-
 	/*
 	 * file_o is a part of final image we are annotating.
 	 * file_p partially may match that image.
 	 */
+	memset(&xpp, 0, sizeof(xpp));
+	xpp.flags = xdl_opts;
+	memset(&xecfg, 0, sizeof(xecfg));
+	xecfg.ctxlen = 1;
 	memset(split, 0, sizeof(struct blame_entry [3]));
-	plno = tlno = 0;
-	for (i = 0; i < patch->num; i++) {
-		struct chunk *chunk = &patch->chunks[i];
-
-		handle_split(sb, ent, tlno, plno, chunk->same, parent, split);
-		plno = chunk->p_next;
-		tlno = chunk->t_next;
-	}
+	xdi_diff_hunks(file_p, &file_o, handle_split_cb, &d, &xpp, &xecfg);
 	/* remainder, if any, all match the preimage */
-	handle_split(sb, ent, tlno, plno, ent->num_lines, parent, split);
-	free_patch(patch);
+	handle_split(sb, ent, d.tlno, d.plno, ent->num_lines, parent, split);
 }
 
 /*
-- 
1.6.0.3.514.g2f91b

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] blame: use xdi_diff_hunks(), get rid of struct patch
  2008-10-25 13:31         ` [PATCH 5/5] blame: use xdi_diff_hunks(), get rid of struct patch René Scharfe
@ 2008-10-25 19:36           ` Junio C Hamano
  2008-10-26 22:20             ` René Scharfe
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2008-10-25 19:36 UTC (permalink / raw)
  To: René Scharfe; +Cc: Brian Downing, git

René Scharfe <rene.scharfe@lsrfire.ath.cx> writes:

> Based on a patch by Brian Downing, this replaces the struct patch based
> code for blame passing with calls to xdi_diff_hunks().  This way we
> avoid generating and then parsing patches; we only let the interesting
> infos be passed to our callbacks instead.  This makes blame a bit faster:
>
>    $ blame="./git blame -M -C -C -p --incremental v1.6.0"
>
>    # master
>    $ /usr/bin/time $blame Makefile >/dev/null
>    1.38user 0.14system 0:01.52elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
>    0inputs+0outputs (0major+12226minor)pagefaults 0swaps
>    $ /usr/bin/time $blame cache.h >/dev/null
>    1.66user 0.13system 0:01.80elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
>    0inputs+0outputs (0major+12262minor)pagefaults 0swaps
>
>    # this patch series
>    $ /usr/bin/time $blame Makefile >/dev/null
>    1.27user 0.12system 0:01.40elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
>    0inputs+0outputs (0major+11836minor)pagefaults 0swaps
>    $ /usr/bin/time $blame cache.h >/dev/null
>    1.52user 0.12system 0:01.70elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
>    0inputs+0outputs (0major+12052minor)pagefaults 0swaps
>
> Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>

The resulting series reads quite clean.  I like it.

> Brian, your numbers looked much more impressive.  Could you please clock
> this code with your repository and the file server.c?  I wonder if this
> callback mechanism is just too complicated or if your case simply benefits
> lots more than the two files from git mentioned above.
>
> The patch series ends here without adding xdiff caching, for two reasons.
> It's quite easy to add it; patch 4 from your series applies unchanged and
> patch 5 is just needs a few small changes to account for the absence of
> compare_buffer().  More importantly, speed actually went down with caching
> for the test case.  The common tail optimization (xdi_diff() vs. xdl_diff())
> seems to beat caching for cache.h and Makefile..

Perhaps revision.c in our history would be more interesting than cache.h
or Makefile, as there are more line migrations from different places to
that file.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH 5/5] blame: use xdi_diff_hunks(), get rid of struct patch
  2008-10-25 19:36           ` Junio C Hamano
@ 2008-10-26 22:20             ` René Scharfe
  0 siblings, 0 replies; 17+ messages in thread
From: René Scharfe @ 2008-10-26 22:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brian Downing, git

Junio C Hamano schrieb:
> Perhaps revision.c in our history would be more interesting than cache.h
> or Makefile, as there are more line migrations from different places to
> that file.

Indeed:

   # master
   $ /usr/bin/time $blame revision.c >/dev/null
   2.15user 0.27system 0:02.58elapsed 94%CPU (0avgtext+0avgdata 0maxresident)k
   3544inputs+0outputs (29major+13835minor)pagefaults 0swaps

   # this patch series
   $ /usr/bin/time $blame revision.c >/dev/null
   1.88user 0.14system 0:02.03elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
   0inputs+0outputs (0major+14068minor)pagefaults 0swaps

René

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2008-10-26 22:21 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-08-21 23:21 [PATCH 0/5] More git blame speed improvements Brian Downing
2008-08-21 23:21 ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff Brian Downing
2008-08-21 23:21   ` [PATCH 2/5] Bypass textual patch generation and parsing in git blame Brian Downing
2008-08-21 23:21     ` [PATCH 3/5] Always initialize xpparam_t to 0 Brian Downing
2008-08-21 23:22       ` [PATCH 4/5] Allow xdiff machinery to cache hash results for a file Brian Downing
2008-08-21 23:22         ` [PATCH 5/5] Use xdiff caching to improve git blame performance Brian Downing
2008-08-23  8:15   ` [PATCH 1/5] Allow alternate "low-level" emit function from xdl_diff René Scharfe
2008-08-23  9:03     ` Junio C Hamano
2008-08-24  8:12     ` Brian Downing
2008-09-03 22:29       ` René Scharfe
2008-10-25 13:30         ` [PATCH 1/5] blame: inline get_patch() René Scharfe
2008-10-25 13:30         ` [PATCH 2/5] Always initialize xpparam_t to 0 René Scharfe
2008-10-25 13:30         ` [PATCH 3/5] Allow alternate "low-level" emit function from xdl_diff René Scharfe
2008-10-25 13:31         ` [PATCH 4/5] add xdi_diff_hunks() for callers that only need hunk lengths René Scharfe
2008-10-25 13:31         ` [PATCH 5/5] blame: use xdi_diff_hunks(), get rid of struct patch René Scharfe
2008-10-25 19:36           ` Junio C Hamano
2008-10-26 22:20             ` René Scharfe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).