All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking
@ 2010-08-25  2:53 Elijah Newren
  2010-08-25  2:53 ` [PATCH 1/7] Add testcase showing how pathspecs are ignored with rev-list --objects Elijah Newren
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Elijah Newren @ 2010-08-25  2:53 UTC (permalink / raw)
  To: git; +Cc: gitster, Elijah Newren

While working on sparse clones[1], I discovered a minor bug in the
tree-walking machinery for rev-list when both --objects and paths are
specified.  In particular, such a combination of options would
(correctly) result in the paths being used to select relevant commits,
but (incorrectly) being ignored when walking the subtrees of those
commits.  While I was at it, I also cleaned a few things up and
provided some small optimizations.  Passes all the tests for me.

NOTE: The last two patches are here mostly to see if anyone knows of
any real uses of git with ginormous trees that are really deep and
have few entries per tree.  These two patches might help with such a
case, but otherwise they seem to make the code a bit uglier and don't
really help performance-wise (and I think may even hurt a little bit).

Performance-wise, ignoring the last two patches, I get the following
approximate speedups:
  (A -  0%) git rev-list --quiet HEAD
  (B -  4%) git rev-list --quiet HEAD -- Documentation/
  (C -  3%) git rev-list --quiet HEAD -- t/
  (D -  1%) git rev-list --objects HEAD > /dev/null
  (E - 66%) git rev-list --objects HEAD -- Documentation/ > /dev/null

Complete timings (in seconds) on my laptop (core 2 duo?):
               A      B      C      D      E
  maint       0.35   0.69   1.35   1.92   1.40
  Patch-1     0.35   0.70   1.35   1.92   1.40
  Patch-2     0.34   0.69   1.35   1.90   0.85
  Patch-3     0.35   0.69   1.35   1.90   0.84
  Patch-4     0.34   0.70   1.35   1.90   0.85
  Patch-5     0.35   0.66   1.31   1.90   0.84
  Patch-6     0.35   0.66   1.31   1.92   0.82
  Patch-7     0.35   0.66   1.31   1.91   0.81

Note that for each case, I ran 6 times and averaged the last 5 runs.
I've rerun the cases a few times to regenerate the above table; the
numbers seem to vary about +/- 0.01 seconds between runs so the
speedups are slightly noisy given the small values, but they are
consistently positive for me.  I sometimes see the last two patches
have a little more negative impact, though it's pretty close to noise.


Elijah Newren (7):
  Add testcase showing how pathspecs are ignored with rev-list
    --objects
  Fix ignoring of pathspecs with rev-list --objects
  tree-walk: Correct bitrotted comment about tree_entry()
  tree_entry_interesting(): Make return value more specific
  diff_tree(): Skip skip_uninteresting() when all remaining paths
    interesting
  list-objects.c: Avoid recomputing interesting-ness for subtrees when
    possible
  tree-diff.c: Avoid recomputing interesting-ness for subtrees when
    possible

 diff.h                   |    1 +
 list-objects.c           |   27 ++++++++++++++++++---
 t/t6000-rev-list-misc.sh |   23 ++++++++++++++++++
 tree-diff.c              |   58 +++++++++++++++++++++++-----------------------
 tree-walk.h              |    4 ++-
 5 files changed, 79 insertions(+), 34 deletions(-)
 create mode 100755 t/t6000-rev-list-misc.sh

[1] It looks like Nguyễn Thái Ngọc Duy is much further along than I am
on such work.  Perhaps the only point in the work I was doing was to
enable me to help review a few of his patches (which I will try to
find some time to do), though perhaps the different route I have taken
will end up helping some.  It's fun and educational either way.

-- 
1.7.2.2.39.gf7e23

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/7] Add testcase showing how pathspecs are ignored with rev-list --objects
  2010-08-25  2:53 [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking Elijah Newren
@ 2010-08-25  2:53 ` Elijah Newren
  2010-08-25  2:53 ` [PATCH 2/7] Fix ignoring of pathspecs " Elijah Newren
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2010-08-25  2:53 UTC (permalink / raw)
  To: git; +Cc: gitster, Elijah Newren


Signed-off-by: Elijah Newren <newren@gmail.com>
---
It looks like the 19 rev-list testcases (t6000*-t6019*) are all fairly
specific, and I didn't see one where this testcase seemed to make sense;
or perhaps I just missed it.  I just created this 20th file to hold all
other miscellaneous rev-list testcases; if it does make more sense in
one of the other existing files, just let me know.

 t/t6000-rev-list-misc.sh |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)
 create mode 100755 t/t6000-rev-list-misc.sh

diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
new file mode 100755
index 0000000..2c403ac
--- /dev/null
+++ b/t/t6000-rev-list-misc.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='miscellaneous rev-list tests'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+
+	echo content1 > wanted_file &&
+	echo content2 > unwanted_file &&
+	git add wanted_file unwanted_file &&
+	git commit -mone
+'
+
+test_expect_failure 'rev-list --objects heeds pathspecs' '
+
+	git rev-list --objects HEAD -- wanted_file >output &&
+	grep wanted_file output &&
+	! grep unwanted_file output
+
+'
+
+test_done
-- 
1.7.2.2.39.gf7e23

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/7] Fix ignoring of pathspecs with rev-list --objects
  2010-08-25  2:53 [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking Elijah Newren
  2010-08-25  2:53 ` [PATCH 1/7] Add testcase showing how pathspecs are ignored with rev-list --objects Elijah Newren
@ 2010-08-25  2:53 ` Elijah Newren
  2010-08-25 22:11   ` Nguyen Thai Ngoc Duy
  2010-08-25  2:53 ` [PATCH 3/7] tree-walk: Correct bitrotted comment about tree_entry() Elijah Newren
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Elijah Newren @ 2010-08-25  2:53 UTC (permalink / raw)
  To: git; +Cc: gitster, Elijah Newren

When traversing commits, the selection of commits would heed the list of
pathspecs passed, but subsequent walking of the trees of those commits
would not.  This resulted in 'rev-list --objects HEAD -- <paths>'
displaying objects at unwanted paths.

Have process_tree() call tree_entry_interesting() to determine which paths
are interesting and should be walked.

Naturally, this change can provide a large speedup when paths are specified
together with --objects, since many tree entries are now correctly ignored.
Interestingly, though, this change also gives me a small (~1%) but
repeatable speedup even when no paths are specified with --objects.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 diff.h                   |    1 +
 list-objects.c           |   20 +++++++++++++++++++-
 t/t6000-rev-list-misc.sh |    2 +-
 tree-diff.c              |    2 +-
 4 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/diff.h b/diff.h
index 6fff024..35130b8 100644
--- a/diff.h
+++ b/diff.h
@@ -169,6 +169,7 @@ extern int diff_tree_sha1(const unsigned char *old, const unsigned char *new,
 			  const char *base, struct diff_options *opt);
 extern int diff_root_tree_sha1(const unsigned char *new, const char *base,
                                struct diff_options *opt);
+extern int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt);
 
 struct combine_diff_path {
 	struct combine_diff_path *next;
diff --git a/list-objects.c b/list-objects.c
index 8953548..daa50bf 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -67,6 +67,7 @@ static void process_tree(struct rev_info *revs,
 	struct tree_desc desc;
 	struct name_entry entry;
 	struct name_path me;
+	int all_interesting = (revs->diffopt.nr_paths == 0);
 
 	if (!revs->tree_objects)
 		return;
@@ -84,7 +85,24 @@ static void process_tree(struct rev_info *revs,
 
 	init_tree_desc(&desc, tree->buffer, tree->size);
 
-	while (tree_entry(&desc, &entry)) {
+	for (; desc.size; update_tree_entry(&desc)) {
+		entry = desc.entry;
+
+		if (!all_interesting) {
+			char *full_path = path_name(path, name);
+			int full_path_len = strlen(full_path);
+			int showit = tree_entry_interesting(&desc, full_path, full_path_len,
+							    &revs->diffopt);
+			free(full_path);
+
+			if (showit < 0)
+				break;
+			else if (!showit)
+				continue;
+			else if (showit == 2)
+				all_interesting = 1;
+		}
+
 		if (S_ISDIR(entry.mode))
 			process_tree(revs,
 				     lookup_tree(entry.sha1),
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 2c403ac..1bc395c 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -12,7 +12,7 @@ test_expect_success setup '
 	git commit -mone
 '
 
-test_expect_failure 'rev-list --objects heeds pathspecs' '
+test_expect_success 'rev-list --objects heeds pathspecs' '
 
 	git rev-list --objects HEAD -- wanted_file >output &&
 	grep wanted_file output &&
diff --git a/tree-diff.c b/tree-diff.c
index cd659c6..2fb670b 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -91,7 +91,7 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
  *  - zero for no
  *  - negative for "no, and no subsequent entries will be either"
  */
-static int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt)
+int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen, struct diff_options *opt)
 {
 	const char *path;
 	const unsigned char *sha1;
-- 
1.7.2.2.39.gf7e23

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/7] tree-walk: Correct bitrotted comment about tree_entry()
  2010-08-25  2:53 [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking Elijah Newren
  2010-08-25  2:53 ` [PATCH 1/7] Add testcase showing how pathspecs are ignored with rev-list --objects Elijah Newren
  2010-08-25  2:53 ` [PATCH 2/7] Fix ignoring of pathspecs " Elijah Newren
@ 2010-08-25  2:53 ` Elijah Newren
  2010-08-25  2:53 ` [PATCH 4/7] tree_entry_interesting(): Make return value more specific Elijah Newren
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2010-08-25  2:53 UTC (permalink / raw)
  To: git; +Cc: gitster, Elijah Newren

There was a code comment that referred to the "above two functions" but
over time the functions immediately preceding the comment have changed.
Just mention the relevant functions by name.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 tree-walk.h |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/tree-walk.h b/tree-walk.h
index 42110a4..a5175fa 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -28,7 +28,9 @@ static inline int tree_entry_len(const char *name, const unsigned char *sha1)
 void update_tree_entry(struct tree_desc *);
 void init_tree_desc(struct tree_desc *desc, const void *buf, unsigned long size);
 
-/* Helper function that does both of the above and returns true for success */
+/* Helper function that does both tree_entry_extract() and update_tree_entry()
+ * and returns true for success
+ */
 int tree_entry(struct tree_desc *, struct name_entry *);
 
 void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
-- 
1.7.2.2.39.gf7e23

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/7] tree_entry_interesting(): Make return value more specific
  2010-08-25  2:53 [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking Elijah Newren
                   ` (2 preceding siblings ...)
  2010-08-25  2:53 ` [PATCH 3/7] tree-walk: Correct bitrotted comment about tree_entry() Elijah Newren
@ 2010-08-25  2:53 ` Elijah Newren
  2010-08-25  2:53 ` [PATCH 5/7] diff_tree(): Skip skip_uninteresting() when all remaining paths interesting Elijah Newren
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2010-08-25  2:53 UTC (permalink / raw)
  To: git; +Cc: gitster, Elijah Newren

tree_entry_interesting() can signal to its callers not only if the given
entry matches one of the specified paths, but whether all remaining paths
will (or will not) match.  When no paths are specified, all paths are
considered interesting, so intead of returning 1 (this path is interesting)
return 2 (all paths are interesting).

This will allow the caller to avoid calling tree_entry_interesting() again,
which theoretically should speed up tree walking.  I am not able to measure
any actual gains in practice, but it certainly can not hurt and makes more
sense to me.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 tree-diff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 2fb670b..a740a9c 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -101,7 +101,7 @@ int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen
 	int never_interesting = -1;
 
 	if (!opt->nr_paths)
-		return 1;
+		return 2;
 
 	sha1 = tree_entry_extract(desc, &path, &mode);
 
-- 
1.7.2.2.39.gf7e23

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 5/7] diff_tree(): Skip skip_uninteresting() when all remaining paths interesting
  2010-08-25  2:53 [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking Elijah Newren
                   ` (3 preceding siblings ...)
  2010-08-25  2:53 ` [PATCH 4/7] tree_entry_interesting(): Make return value more specific Elijah Newren
@ 2010-08-25  2:53 ` Elijah Newren
  2010-08-25 23:25   ` Junio C Hamano
  2010-08-25  2:53 ` [PATCH 6/7] list-objects.c: Avoid recomputing interesting-ness for subtrees when possible Elijah Newren
  2010-08-25  2:53 ` [PATCH 7/7] tree-diff.c: " Elijah Newren
  6 siblings, 1 reply; 16+ messages in thread
From: Elijah Newren @ 2010-08-25  2:53 UTC (permalink / raw)
  To: git; +Cc: gitster, Elijah Newren

In 1d848f6 (tree_entry_interesting(): allow it to say "everything is
interesting" 2007-03-21), both show_tree() and skip_uninteresting() were
modified to determine if all remaining tree entries were interesting.
However, the latter returns as soon as it finds the first interesting path,
without any way to signal to its caller (namely, diff_tree()) that all
remaining paths are interesting, making these extra checks useless.

Pass whether all remaining entries are interesting back to diff_tree(), and
whenever they are, have diff_tree() skip subsequent calls to
skip_uninteresting().

With this change, I measure speedups of 3-4% for the commands
  git rev-list --quiet HEAD -- Documentation/
  git rev-list --quiet HEAD -- t/
in git.git.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 tree-diff.c |   25 ++++++++++++-------------
 1 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index a740a9c..292cb6b 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -257,19 +257,12 @@ static void show_entry(struct diff_options *opt, const char *prefix, struct tree
 	}
 }
 
-static void skip_uninteresting(struct tree_desc *t, const char *base, int baselen, struct diff_options *opt)
+static void skip_uninteresting(struct tree_desc *t, const char *base, int baselen, struct diff_options *opt, int *all_interesting)
 {
-	int all_interesting = 0;
 	while (t->size) {
-		int show;
-
-		if (all_interesting)
-			show = 1;
-		else {
-			show = tree_entry_interesting(t, base, baselen, opt);
-			if (show == 2)
-				all_interesting = 1;
-		}
+		int show = tree_entry_interesting(t, base, baselen, opt);
+		if (show == 2)
+			*all_interesting = 1;
 		if (!show) {
 			update_tree_entry(t);
 			continue;
@@ -284,14 +277,20 @@ static void skip_uninteresting(struct tree_desc *t, const char *base, int basele
 int diff_tree(struct tree_desc *t1, struct tree_desc *t2, const char *base, struct diff_options *opt)
 {
 	int baselen = strlen(base);
+	int all_t1_interesting = 0;
+	int all_t2_interesting = 0;
 
 	for (;;) {
 		if (DIFF_OPT_TST(opt, QUICK) &&
 		    DIFF_OPT_TST(opt, HAS_CHANGES))
 			break;
 		if (opt->nr_paths) {
-			skip_uninteresting(t1, base, baselen, opt);
-			skip_uninteresting(t2, base, baselen, opt);
+			if (!all_t1_interesting)
+				skip_uninteresting(t1, base, baselen, opt,
+						   &all_t1_interesting);
+			if (!all_t2_interesting)
+				skip_uninteresting(t2, base, baselen, opt,
+						   &all_t2_interesting);
 		}
 		if (!t1->size) {
 			if (!t2->size)
-- 
1.7.2.2.39.gf7e23

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 6/7] list-objects.c: Avoid recomputing interesting-ness for subtrees when possible
  2010-08-25  2:53 [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking Elijah Newren
                   ` (4 preceding siblings ...)
  2010-08-25  2:53 ` [PATCH 5/7] diff_tree(): Skip skip_uninteresting() when all remaining paths interesting Elijah Newren
@ 2010-08-25  2:53 ` Elijah Newren
  2010-08-25  2:53 ` [PATCH 7/7] tree-diff.c: " Elijah Newren
  6 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2010-08-25  2:53 UTC (permalink / raw)
  To: git; +Cc: gitster, Elijah Newren

Weather balloon patch.  Doesn't seem to help much in benchmarks; in fact
I think it sometimes hurts a bit.  Is it worthwhile just in case someone
comes up with a ginormous tree that is really deep with few entries per
tree?  I'm leaning against it, but am sending the patch to at least show
others that it has been considered.

---
No signed-off-by, since I'm not sold on this patch and am somewhat leaning
against it.  It would need a better commit message anyway.  :-)

 list-objects.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index daa50bf..66e4ccc 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -61,13 +61,13 @@ static void process_tree(struct rev_info *revs,
 			 struct tree *tree,
 			 show_object_fn show,
 			 struct name_path *path,
-			 const char *name)
+			 const char *name,
+			 int all_interesting)
 {
 	struct object *obj = &tree->object;
 	struct tree_desc desc;
 	struct name_entry entry;
 	struct name_path me;
-	int all_interesting = (revs->diffopt.nr_paths == 0);
 
 	if (!revs->tree_objects)
 		return;
@@ -106,7 +106,8 @@ static void process_tree(struct rev_info *revs,
 		if (S_ISDIR(entry.mode))
 			process_tree(revs,
 				     lookup_tree(entry.sha1),
-				     show, &me, entry.path);
+				     show, &me, entry.path,
+				     all_interesting);
 		else if (S_ISGITLINK(entry.mode))
 			process_gitlink(revs, entry.sha1,
 					show, &me, entry.path);
@@ -182,7 +183,7 @@ void traverse_commit_list(struct rev_info *revs,
 		}
 		if (obj->type == OBJ_TREE) {
 			process_tree(revs, (struct tree *)obj, show_object,
-				     NULL, name);
+				     NULL, name, revs->diffopt.nr_paths == 0);
 			continue;
 		}
 		if (obj->type == OBJ_BLOB) {
-- 
1.7.2.2.39.gf7e23

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 7/7] tree-diff.c: Avoid recomputing interesting-ness for subtrees when possible
  2010-08-25  2:53 [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking Elijah Newren
                   ` (5 preceding siblings ...)
  2010-08-25  2:53 ` [PATCH 6/7] list-objects.c: Avoid recomputing interesting-ness for subtrees when possible Elijah Newren
@ 2010-08-25  2:53 ` Elijah Newren
  6 siblings, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2010-08-25  2:53 UTC (permalink / raw)
  To: git; +Cc: gitster, Elijah Newren

Weather balloon patch.  Doesn't seem to help much in benchmarks, and after
many runs I think it might occasionally hurt slightly.  It ought to help
in the case of a ginormous tree that is really deep with few entries per
tree, but that case seems kind of contrived.

Kind of makes the code look icky too.

---
No signed-off-by, since I'm not sold on this patch and am somewhat leaning
against it.  It would need a better commit message anyway.  :-)

 tree-diff.c |   29 +++++++++++++++--------------
 1 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 292cb6b..3c34c77 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -25,9 +25,9 @@ static char *malloc_fullname(const char *base, int baselen, const char *path, in
 }
 
 static void show_entry(struct diff_options *opt, const char *prefix, struct tree_desc *desc,
-		       const char *base, int baselen);
+		       const char *base, int baselen, int all_interesting);
 
-static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const char *base, int baselen, struct diff_options *opt)
+static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const char *base, int baselen, struct diff_options *opt, int all_t1_interesting, int all_t2_interesting)
 {
 	unsigned mode1, mode2;
 	const char *path1, *path2;
@@ -42,11 +42,11 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 	pathlen2 = tree_entry_len(path2, sha2);
 	cmp = base_name_compare(path1, pathlen1, mode1, path2, pathlen2, mode2);
 	if (cmp < 0) {
-		show_entry(opt, "-", t1, base, baselen);
+		show_entry(opt, "-", t1, base, baselen, all_t1_interesting);
 		return -1;
 	}
 	if (cmp > 0) {
-		show_entry(opt, "+", t2, base, baselen);
+		show_entry(opt, "+", t2, base, baselen, all_t2_interesting);
 		return 1;
 	}
 	if (!DIFF_OPT_TST(opt, FIND_COPIES_HARDER) && !hashcmp(sha1, sha2) && mode1 == mode2)
@@ -57,8 +57,8 @@ static int compare_tree_entry(struct tree_desc *t1, struct tree_desc *t2, const
 	 * file, we need to consider it a remove and an add.
 	 */
 	if (S_ISDIR(mode1) != S_ISDIR(mode2)) {
-		show_entry(opt, "-", t1, base, baselen);
-		show_entry(opt, "+", t2, base, baselen);
+		show_entry(opt, "-", t1, base, baselen, all_t1_interesting);
+		show_entry(opt, "+", t2, base, baselen, all_t2_interesting);
 		return 0;
 	}
 
@@ -197,9 +197,8 @@ int tree_entry_interesting(struct tree_desc *desc, const char *base, int baselen
 }
 
 /* A whole sub-tree went away or appeared */
-static void show_tree(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base, int baselen)
+static void show_tree(struct diff_options *opt, const char *prefix, struct tree_desc *desc, const char *base, int baselen, int all_interesting)
 {
-	int all_interesting = 0;
 	while (desc->size) {
 		int show;
 
@@ -214,14 +213,14 @@ static void show_tree(struct diff_options *opt, const char *prefix, struct tree_
 		if (show < 0)
 			break;
 		if (show)
-			show_entry(opt, prefix, desc, base, baselen);
+			show_entry(opt, prefix, desc, base, baselen, all_interesting);
 		update_tree_entry(desc);
 	}
 }
 
 /* A file entry went away or appeared */
 static void show_entry(struct diff_options *opt, const char *prefix, struct tree_desc *desc,
-		       const char *base, int baselen)
+		       const char *base, int baselen, int all_interesting)
 {
 	unsigned mode;
 	const char *path;
@@ -246,7 +245,7 @@ static void show_entry(struct diff_options *opt, const char *prefix, struct tree
 		}
 
 		init_tree_desc(&inner, tree, size);
-		show_tree(opt, prefix, &inner, newbase, baselen + 1 + pathlen);
+		show_tree(opt, prefix, &inner, newbase, baselen + 1 + pathlen, all_interesting);
 
 		free(tree);
 		free(newbase);
@@ -295,16 +294,18 @@ int diff_tree(struct tree_desc *t1, struct tree_desc *t2, const char *base, stru
 		if (!t1->size) {
 			if (!t2->size)
 				break;
-			show_entry(opt, "+", t2, base, baselen);
+			show_entry(opt, "+", t2, base, baselen, all_t2_interesting);
 			update_tree_entry(t2);
 			continue;
 		}
 		if (!t2->size) {
-			show_entry(opt, "-", t1, base, baselen);
+			show_entry(opt, "-", t1, base, baselen, all_t1_interesting);
 			update_tree_entry(t1);
 			continue;
 		}
-		switch (compare_tree_entry(t1, t2, base, baselen, opt)) {
+		switch (compare_tree_entry(t1, t2, base, baselen, opt,
+					   all_t1_interesting,
+					   all_t2_interesting)) {
 		case -1:
 			update_tree_entry(t1);
 			continue;
-- 
1.7.2.2.39.gf7e23

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/7] Fix ignoring of pathspecs with rev-list --objects
  2010-08-25  2:53 ` [PATCH 2/7] Fix ignoring of pathspecs " Elijah Newren
@ 2010-08-25 22:11   ` Nguyen Thai Ngoc Duy
  2010-08-25 23:50     ` Elijah Newren
  0 siblings, 1 reply; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-25 22:11 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, gitster

On Wed, Aug 25, 2010 at 12:53 PM, Elijah Newren <newren@gmail.com> wrote:
> @@ -84,7 +85,24 @@ static void process_tree(struct rev_info *revs,
>
>        init_tree_desc(&desc, tree->buffer, tree->size);
>
> -       while (tree_entry(&desc, &entry)) {
> +       for (; desc.size; update_tree_entry(&desc)) {
> +               entry = desc.entry;
> +
> +               if (!all_interesting) {
> +                       char *full_path = path_name(path, name);
> +                       int full_path_len = strlen(full_path);
> +                       int showit = tree_entry_interesting(&desc,
> full_path, full_path_len,
> +                                                           &revs->diffopt);
> +                       free(full_path);

I wonder if we can avoid xmalloc/free so many times here. If full_path
is unchanged, how about moving it outside the loop?

> diff --git a/tree-diff.c b/tree-diff.c
> index cd659c6..2fb670b 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -91,7 +91,7 @@ static int compare_tree_entry(struct tree_desc *t1, struct
> tree_desc *t2, const
>  *  - zero for no
>  *  - negative for "no, and no subsequent entries will be either"
>  */
> -static int tree_entry_interesting(struct tree_desc *desc, const char *base,
> int baselen, struct diff_options *opt)
> +int tree_entry_interesting(struct tree_desc *desc, const char *base, int
> baselen, struct diff_options *opt)
>  {
>        const char *path;
>        const unsigned char *sha1;

While at it, can you please also fix its comments? The comments say
pathspec while what it uses is actually path prefix. Maybe something
like this

diff --git a/tree-diff.c b/tree-diff.c
index 3c34c77..514dbca 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -83,7 +83,7 @@ static int compare_tree_entry(struct tree_desc *t1,
struct tree_desc *t2, const
 }

 /*
- * Is a tree entry interesting given the pathspec we have?
+ * Is a tree entry interesting given the path prefix we have?
  *
  * Return:
  *  - 2 for "yes, and all subsequent entries will be"


This patch is good stuff. Please be informed I will steal this patch
for my narrow use.
-- 
Duy

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] diff_tree(): Skip skip_uninteresting() when all remaining paths interesting
  2010-08-25  2:53 ` [PATCH 5/7] diff_tree(): Skip skip_uninteresting() when all remaining paths interesting Elijah Newren
@ 2010-08-25 23:25   ` Junio C Hamano
  2010-08-25 23:57     ` Elijah Newren
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2010-08-25 23:25 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

This and the previous patch seem to make sense to me, independent of the
rest.

As to mixing pathspecs with --objects, I would say it is unfair to Linus
to call it a bug that that particular use case has not been supported.  I'd
say it _is_ a bug that we didn't error out when --objects was given with
pathspec at the command line level, though, as that is a combination we
knowingly ignored to support.

It simply hasn't been deemed as a sane operation to produce a pack using
an output from rev-list with pathspec and --objects, as it will leave some
trees and blobs that the pack knows about in the result, without actually
having them in it.  In the context of "narrow clone", these wounds to
trees are deliberate, and the existence of these wounds alone is not the
reason why I called it insane to produce such a pack.

To make use of such a pack, however, you need to somehow cauterize these
deliberate wounds in trees so that fsck, a later run of pack-objects,
fetch-pack and friends will not choke on them.  We didn't plan to have
such an infrastructure so far, and that is what made such a pack with
thousands of cuts "insane".  As soon as "narrow clone" addresses that
issue, mixture of pathspecs with --objects stops being an insane use
case.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/7] Fix ignoring of pathspecs with rev-list --objects
  2010-08-25 22:11   ` Nguyen Thai Ngoc Duy
@ 2010-08-25 23:50     ` Elijah Newren
  2010-08-26  2:49       ` Elijah Newren
  2010-08-26 23:15       ` Nguyen Thai Ngoc Duy
  0 siblings, 2 replies; 16+ messages in thread
From: Elijah Newren @ 2010-08-25 23:50 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, gitster

On Wed, Aug 25, 2010 at 4:11 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
> On Wed, Aug 25, 2010 at 12:53 PM, Elijah Newren <newren@gmail.com> wrote:
>> @@ -84,7 +85,24 @@ static void process_tree(struct rev_info *revs,
>>
>>        init_tree_desc(&desc, tree->buffer, tree->size);
>>
>> -       while (tree_entry(&desc, &entry)) {
>> +       for (; desc.size; update_tree_entry(&desc)) {
>> +               entry = desc.entry;
>> +
>> +               if (!all_interesting) {
>> +                       char *full_path = path_name(path, name);
>> +                       int full_path_len = strlen(full_path);
>> +                       int showit = tree_entry_interesting(&desc,
>> full_path, full_path_len,
>> +                                                           &revs->diffopt);
>> +                       free(full_path);
>
> I wonder if we can avoid xmalloc/free so many times here. If full_path
> is unchanged, how about moving it outside the loop?

Good point.

>> diff --git a/tree-diff.c b/tree-diff.c
>> index cd659c6..2fb670b 100644
>> --- a/tree-diff.c
>> +++ b/tree-diff.c
>> @@ -91,7 +91,7 @@ static int compare_tree_entry(struct tree_desc *t1, struct
>> tree_desc *t2, const
>>  *  - zero for no
>>  *  - negative for "no, and no subsequent entries will be either"
>>  */
>> -static int tree_entry_interesting(struct tree_desc *desc, const char *base,
>> int baselen, struct diff_options *opt)
>> +int tree_entry_interesting(struct tree_desc *desc, const char *base, int
>> baselen, struct diff_options *opt)
>>  {
>>        const char *path;
>>        const unsigned char *sha1;
>
> While at it, can you please also fix its comments? The comments say
> pathspec while what it uses is actually path prefix. Maybe something
> like this
>
> diff --git a/tree-diff.c b/tree-diff.c
> index 3c34c77..514dbca 100644
> --- a/tree-diff.c
> +++ b/tree-diff.c
> @@ -83,7 +83,7 @@ static int compare_tree_entry(struct tree_desc *t1,
> struct tree_desc *t2, const
>  }
>
>  /*
> - * Is a tree entry interesting given the pathspec we have?
> + * Is a tree entry interesting given the path prefix we have?

I believe the comment is parsed thus: "tree entry" == combination of
desc, base, and baselen.  "pathspec" == paths and pathlens fields of
opt (which do provide a pathspec).  So I believe the original was
correct, though I can see how it's confusing at first.

>  *
>  * Return:
>  *  - 2 for "yes, and all subsequent entries will be"
>
>
> This patch is good stuff. Please be informed I will steal this patch
> for my narrow use.

Thanks, but I just found a bug with it; I'm investigating.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] diff_tree(): Skip skip_uninteresting() when all remaining paths interesting
  2010-08-25 23:25   ` Junio C Hamano
@ 2010-08-25 23:57     ` Elijah Newren
  2010-08-26  3:49       ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Elijah Newren @ 2010-08-25 23:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Aug 25, 2010 at 5:25 PM, Junio C Hamano <gitster@pobox.com> wrote:
> This and the previous patch seem to make sense to me, independent of the
> rest.

Thanks.

> As to mixing pathspecs with --objects, I would say it is unfair to Linus
> to call it a bug that that particular use case has not been supported.  I'd
> say it _is_ a bug that we didn't error out when --objects was given with
> pathspec at the command line level, though, as that is a combination we
> knowingly ignored to support.
>
> It simply hasn't been deemed as a sane operation to produce a pack using
> an output from rev-list with pathspec and --objects, as it will leave some
> trees and blobs that the pack knows about in the result, without actually
> having them in it.  In the context of "narrow clone", these wounds to
> trees are deliberate, and the existence of these wounds alone is not the
> reason why I called it insane to produce such a pack.
>
> To make use of such a pack, however, you need to somehow cauterize these
> deliberate wounds in trees so that fsck, a later run of pack-objects,
> fetch-pack and friends will not choke on them.  We didn't plan to have
> such an infrastructure so far, and that is what made such a pack with
> thousands of cuts "insane".  As soon as "narrow clone" addresses that
> issue, mixture of pathspecs with --objects stops being an insane use
> case.

Well, I did call it a minor bug, because I figured no one was using it
and so it wasn't hurting anyone.  But saying you knowingly ignored
supporting it is understandable.

I understand that for this combination of options to be useful, all
the other stuff you mention needs to be done.  Duy and I are both
working on separate ways of doing that; I just figured it made sense
to submit any independent pieces early and separately.  I don't think
supporting this would hurt anyone, as I don't think anyone could
accidentally use it at this point.  However, if you'd prefer that I
submitted a patch to just error out with this combination of options
for now, and resubmit this patch later with a full sparse clone
patchset, I can do that.  (Note though that my fix for supporting both
--objects and pathspec isn't correct; there's a bug I'm looking in
to.)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/7] Fix ignoring of pathspecs with rev-list --objects
  2010-08-25 23:50     ` Elijah Newren
@ 2010-08-26  2:49       ` Elijah Newren
  2010-08-26 23:15       ` Nguyen Thai Ngoc Duy
  1 sibling, 0 replies; 16+ messages in thread
From: Elijah Newren @ 2010-08-26  2:49 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git, gitster

On Wed, Aug 25, 2010 at 5:50 PM, Elijah Newren <newren@gmail.com> wrote:
> On Wed, Aug 25, 2010 at 4:11 PM, Nguyen Thai Ngoc Duy <pclouds@gmail.com> wrote:
>>
>> I wonder if we can avoid xmalloc/free so many times here. If full_path
>> is unchanged, how about moving it outside the loop?
>
> Good point.
>

<snip>

> Thanks, but I just found a bug with it; I'm investigating.

tree_entry_interesting assumes base[baselen-1] == '/' || baselen == 0;
it will fail to notice matches if this precondition is not met, and I
failed to meet it except for files at the toplevel tree.

I'll send out an updated series with a fix for this and the
improvement you suggested.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/7] diff_tree(): Skip skip_uninteresting() when all remaining paths interesting
  2010-08-25 23:57     ` Elijah Newren
@ 2010-08-26  3:49       ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2010-08-26  3:49 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git

Elijah Newren <newren@gmail.com> writes:

> ...  I don't think
> supporting this would hurt anyone, as I don't think anyone could
> accidentally use it at this point.

I agree 100%.  There is no harm in doing what the user asks us to do, so
producing list of objects that is path limited is fine [*1*].

After all, you can feed any random set of object names to pack-objects and
use the resulting pack as a way to sneakernet just a random set of objects
without any connectivity guarantee even today.

Also, since it is unlikely anybody would accidentally use the combination,
it wouldn't make much sense to retroactively add a sanity check that
errors out either, as that needs to be lifted later when you two are done.


[Footnote]

*1* As long as it is done correctly, of course.  Watch out for a case like
the same blob that is outside the pathspec (marked as "uninteresting")
appears inside (needs to be sent), i.e.

    mkdir two
    echo frotz >one
    cp one two/three
    git add one two/three
    git commit -m that
    git rev-list --objects two

which _should_ show the blob that records "frotz\n" because it appears at
"two/three" which is in the area, even though you happen to have the same
one at a path outside (i.e. "one").

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/7] Fix ignoring of pathspecs with rev-list --objects
  2010-08-25 23:50     ` Elijah Newren
  2010-08-26  2:49       ` Elijah Newren
@ 2010-08-26 23:15       ` Nguyen Thai Ngoc Duy
  2010-08-26 23:41         ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2010-08-26 23:15 UTC (permalink / raw)
  To: Elijah Newren; +Cc: git, gitster

On Thu, Aug 26, 2010 at 9:50 AM, Elijah Newren <newren@gmail.com> wrote:
>> While at it, can you please also fix its comments? The comments say
>> pathspec while what it uses is actually path prefix. Maybe something
>> like this
>>
>> diff --git a/tree-diff.c b/tree-diff.c
>> index 3c34c77..514dbca 100644
>> --- a/tree-diff.c
>> +++ b/tree-diff.c
>> @@ -83,7 +83,7 @@ static int compare_tree_entry(struct tree_desc *t1,
>> struct tree_desc *t2, const
>>  }
>>
>>  /*
>> - * Is a tree entry interesting given the pathspec we have?
>> + * Is a tree entry interesting given the path prefix we have?
>
> I believe the comment is parsed thus: "tree entry" == combination of
> desc, base, and baselen.  "pathspec" == paths and pathlens fields of
> opt (which do provide a pathspec).  So I believe the original was
> correct, though I can see how it's confusing at first.

Pathspec as in match_pathspec() in dir.c allows wildcards. Though it
seems only used with worktree or index-related operations. To me the
one with wildcard support is called pathspec. The one without is path
prefix. But it's probably just me.

There is also a GSoC suggestion about this [1] and I posted a related
patch a while back [2] but forgot it until now.

[1] https://git.wiki.kernel.org/index.php/SoC2010Ideas#Unify_Pathspec_Semantics

[2] http://mid.gmane.org/1243240924-5981-1-git-send-email-pclouds@gmail.com
-- 
Duy

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/7] Fix ignoring of pathspecs with rev-list --objects
  2010-08-26 23:15       ` Nguyen Thai Ngoc Duy
@ 2010-08-26 23:41         ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2010-08-26 23:41 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: Elijah Newren, git

Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:

> On Thu, Aug 26, 2010 at 9:50 AM, Elijah Newren <newren@gmail.com> wrote:
>>> While at it, can you please also fix its comments? The comments say
>>> pathspec while what it uses is actually path prefix...
>>> ...
>>> - * Is a tree entry interesting given the pathspec we have?
>>> + * Is a tree entry interesting given the path prefix we have?
>>
>> I believe the comment is parsed thus: "tree entry" == combination of
>> desc, base, and baselen... So I believe the original was
>> correct, though I can see how it's confusing at first.
>
> Pathspec as in match_pathspec() in dir.c allows wildcards...
>
> There is also a GSoC suggestion about this [1] and I posted a related
> patch a while back [2] but forgot it until now.
>
> [1] https://git.wiki.kernel.org/index.php/SoC2010Ideas#Unify_Pathspec_Semantics
>
> [2] http://mid.gmane.org/1243240924-5981-1-git-send-email-pclouds@gmail.com

As [1] you quoted says, both are called pathspecs and because our longer
term goal is to unify them, I think the original comment is fine.

I wonder why I was Cc'ed on this shed-coloring committee, though...

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2010-08-26 23:41 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-25  2:53 [PATCH 0/7] Minor bug fix and optimizations for revision/tree walking Elijah Newren
2010-08-25  2:53 ` [PATCH 1/7] Add testcase showing how pathspecs are ignored with rev-list --objects Elijah Newren
2010-08-25  2:53 ` [PATCH 2/7] Fix ignoring of pathspecs " Elijah Newren
2010-08-25 22:11   ` Nguyen Thai Ngoc Duy
2010-08-25 23:50     ` Elijah Newren
2010-08-26  2:49       ` Elijah Newren
2010-08-26 23:15       ` Nguyen Thai Ngoc Duy
2010-08-26 23:41         ` Junio C Hamano
2010-08-25  2:53 ` [PATCH 3/7] tree-walk: Correct bitrotted comment about tree_entry() Elijah Newren
2010-08-25  2:53 ` [PATCH 4/7] tree_entry_interesting(): Make return value more specific Elijah Newren
2010-08-25  2:53 ` [PATCH 5/7] diff_tree(): Skip skip_uninteresting() when all remaining paths interesting Elijah Newren
2010-08-25 23:25   ` Junio C Hamano
2010-08-25 23:57     ` Elijah Newren
2010-08-26  3:49       ` Junio C Hamano
2010-08-25  2:53 ` [PATCH 6/7] list-objects.c: Avoid recomputing interesting-ness for subtrees when possible Elijah Newren
2010-08-25  2:53 ` [PATCH 7/7] tree-diff.c: " Elijah Newren

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.