All of lore.kernel.org
 help / color / mirror / Atom feed
* Funny 'git describe --contains' output
@ 2012-08-29  4:48 Greg KH
  2012-08-29  5:57 ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Greg KH @ 2012-08-29  4:48 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds

Hi,

In the Linux kernel tree, commit 0136db586c028f71e7cc21cc183064ff0d5919
is a bit "odd".

If I go to look to see what release it was in, I normally do:
	$ git describe --contains 0136db586c028f71e7cc21cc183064ff0d5919
	v3.6-rc1~59^2~56^2~76

However, it really showed up first in the 3.5-rc1 kernel release, as can
be seen by doing the following:
	$ git tag --contains 0136db586c028f71e7cc21cc183064ff0d5919
	v3.5
	v3.5-rc1
	v3.5-rc2
	v3.5-rc3
	v3.5-rc4
	v3.5-rc5
	v3.5-rc6
	v3.5-rc7
	v3.6-rc1
	v3.6-rc2
	v3.6-rc3

This commit ended up coming into Linus's tree in two different places,
both in 3.5-rc1 and in 3.6-rc1, through different merge requests, so it
seems to be tricky to figure out when it "first" went in.

Asking Linus about this, he tried the following:

	$ git name-rev --tags 0136db586c028f71e7cc21cc183064ff0d5919
	0136db586c028f71e7cc21cc183064ff0d5919 tags/v3.6-rc1~59^2~56^2~76
	$ git rev-list 0136db586c028f71e7cc21cc183064ff0d5919..v3.5-rc1 | wc
	  11415   11415  468015
	$ git rev-list 0136db586c028f71e7cc21cc183064ff0d5919..v3.4-rc1 | wc
	  0       0       0
	$ git rev-list 0136db586c028f71e7cc21cc183064ff0d5919..v3.6-rc1 | wc
	  22279   22279  913439

which shows that there are "less" commits to get from this commit to
v3.5-rc1 instead of v3.6-rc1, so something odd is going on here.

Any ideas?

I can reproduce this right now with git version 1.7.12.116.g31e0100

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Funny 'git describe --contains' output
  2012-08-29  4:48 Funny 'git describe --contains' output Greg KH
@ 2012-08-29  5:57 ` Junio C Hamano
  2012-08-29  6:36   ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-08-29  5:57 UTC (permalink / raw)
  To: Greg KH; +Cc: git, Linus Torvalds

Greg KH <gregkh@linuxfoundation.org> writes:

> In the Linux kernel tree, commit 0136db586c028f71e7cc21cc183064ff0d5919
> is a bit "odd".
>
> If I go to look to see what release it was in, I normally do:
> 	$ git describe --contains 0136db586c028f71e7cc21cc183064ff0d5919
> 	v3.6-rc1~59^2~56^2~76
> ...
> Any ideas?

That is 59 + 1 + 56 + 1 + 76 = 193 steps away from the tag v3.6-rc1.

$ git name-rev --refs=refs/tags/v3.5-rc1 0136db58
0136db58 tags/v3.5-rc1~83^2~81^2~76

which is 83 + 1 + 81 + 1 + 76 = 242 steps away from that tag.

So it _is_ odd that the newly tagged tip merged a branch that had
smaller development since it merged the commit, but name-rev seems
to be measuring the steps it takes from the tags to reach the commit
and giving us the one that gives the shortest path correctly.

Obviously, that is not the same as "which tag is the oldest one
among the ones that can reach this commit?"

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Funny 'git describe --contains' output
  2012-08-29  5:57 ` Junio C Hamano
@ 2012-08-29  6:36   ` Junio C Hamano
  2012-08-29 18:17     ` Greg KH
  2012-08-29 21:17     ` [PATCH 0/3] "git name-rev --weight" Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-29  6:36 UTC (permalink / raw)
  To: Greg KH; +Cc: git, Linus Torvalds

Junio C Hamano <gitster@pobox.com> writes:

> Greg KH <gregkh@linuxfoundation.org> writes:
>
>> In the Linux kernel tree, commit 0136db586c028f71e7cc21cc183064ff0d5919
>> is a bit "odd".
>>
>> If I go to look to see what release it was in, I normally do:
>> 	$ git describe --contains 0136db586c028f71e7cc21cc183064ff0d5919
>> 	v3.6-rc1~59^2~56^2~76
>> ...
>> Any ideas?
>
> That is 59 + 1 + 56 + 1 + 76 = 193 steps away from the tag v3.6-rc1.
>
> $ git name-rev --refs=refs/tags/v3.5-rc1 0136db58
> 0136db58 tags/v3.5-rc1~83^2~81^2~76
>
> which is 83 + 1 + 81 + 1 + 76 = 242 steps away from that tag.
>
> So it _is_ odd that the newly tagged tip merged a branch that had
> smaller development since it merged the commit, but name-rev seems
> to be measuring the steps it takes from the tags to reach the commit
> and giving us the one that gives the shortest path correctly.
>
> Obviously, that is not the same as "which tag is the oldest one
> among the ones that can reach this commit?"

As is usual for what I say, the above is an explanation of what we
are seeing, not necessarily a justification.

Given a history of this shape:

        o---o---o---o TONS!!!
                     \
 ---o--o--o--o--o--Y--o---o---Z
     \   /               /
      \ /               /
       X---------------o

where Y is v3.5-rc1 and Z is v3.6-rc1, "name-rev X" measures the
distance of the shortest path between Z and X (Z^^2^ = 3 steps away)
and between Y and X (Y~3^2 = 4 steps away), and uses the tag with
the shortest path.

But in order to answer "which is the earlier tag that merges X",
what "name-rev" measures is not very interesting.

What we want to see is the tag whose "weight" (imagine these commits
are beads on strings, and you hold the tag between your fingers and
lift it, pulling all the commits behind it on the history) is the
smallest and reaches the commit X in question.  The distance on the
shortest path to X totally ignores tons of merges that went into the
mainline between Y and Z.  That is what makes name-rev not useful
for this purpose.

That "weight" is what Linus's "rev-list | wc -l" showed, but it is
fairly expensive to compute.  We do have a code that computes such
weight in the history bisection code (it computes this exact weight
for each and every commit that is still suspect, and picks the one
that is half-way).  We know how to compute it, but I suspect that
applying that code naively to name-rev would make it unusably slow.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Funny 'git describe --contains' output
  2012-08-29  6:36   ` Junio C Hamano
@ 2012-08-29 18:17     ` Greg KH
  2012-08-29 21:17     ` [PATCH 0/3] "git name-rev --weight" Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Greg KH @ 2012-08-29 18:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Tue, Aug 28, 2012 at 11:36:46PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Greg KH <gregkh@linuxfoundation.org> writes:
> >
> >> In the Linux kernel tree, commit 0136db586c028f71e7cc21cc183064ff0d5919
> >> is a bit "odd".
> >>
> >> If I go to look to see what release it was in, I normally do:
> >> 	$ git describe --contains 0136db586c028f71e7cc21cc183064ff0d5919
> >> 	v3.6-rc1~59^2~56^2~76
> >> ...
> >> Any ideas?
> >
> > That is 59 + 1 + 56 + 1 + 76 = 193 steps away from the tag v3.6-rc1.
> >
> > $ git name-rev --refs=refs/tags/v3.5-rc1 0136db58
> > 0136db58 tags/v3.5-rc1~83^2~81^2~76
> >
> > which is 83 + 1 + 81 + 1 + 76 = 242 steps away from that tag.
> >
> > So it _is_ odd that the newly tagged tip merged a branch that had
> > smaller development since it merged the commit, but name-rev seems
> > to be measuring the steps it takes from the tags to reach the commit
> > and giving us the one that gives the shortest path correctly.
> >
> > Obviously, that is not the same as "which tag is the oldest one
> > among the ones that can reach this commit?"
> 
> As is usual for what I say, the above is an explanation of what we
> are seeing, not necessarily a justification.
> 
> Given a history of this shape:
> 
>         o---o---o---o TONS!!!
>                      \
>  ---o--o--o--o--o--Y--o---o---Z
>      \   /               /
>       \ /               /
>        X---------------o
> 
> where Y is v3.5-rc1 and Z is v3.6-rc1, "name-rev X" measures the
> distance of the shortest path between Z and X (Z^^2^ = 3 steps away)
> and between Y and X (Y~3^2 = 4 steps away), and uses the tag with
> the shortest path.
> 
> But in order to answer "which is the earlier tag that merges X",
> what "name-rev" measures is not very interesting.
> 
> What we want to see is the tag whose "weight" (imagine these commits
> are beads on strings, and you hold the tag between your fingers and
> lift it, pulling all the commits behind it on the history) is the
> smallest and reaches the commit X in question.  The distance on the
> shortest path to X totally ignores tons of merges that went into the
> mainline between Y and Z.  That is what makes name-rev not useful
> for this purpose.
> 
> That "weight" is what Linus's "rev-list | wc -l" showed, but it is
> fairly expensive to compute.  We do have a code that computes such
> weight in the history bisection code (it computes this exact weight
> for each and every commit that is still suspect, and picks the one
> that is half-way).  We know how to compute it, but I suspect that
> applying that code naively to name-rev would make it unusably slow.

Thanks for the full explaination.  "Normally" this never is an issue for
me, as this is the first time, in the history of Linux stable kernel
releases, that I've ever noticed this.  And I agree, it's probably not
something that can easily be resolved in git, given how it's calculated.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 0/3] "git name-rev --weight"
  2012-08-29  6:36   ` Junio C Hamano
  2012-08-29 18:17     ` Greg KH
@ 2012-08-29 21:17     ` Junio C Hamano
  2012-08-29 21:17       ` [PATCH 1/3] name-rev: lose unnecessary typedef Junio C Hamano
                         ` (3 more replies)
  1 sibling, 4 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-29 21:17 UTC (permalink / raw)
  To: git; +Cc: Greg KH

So here is an attempt to teach "name-rev" a mode that tries to base
its name on oldest tag that can reach the commit.  It needs the
reset_revision_walk() call recently added to the revision traversal
API, and applies to bcc0a3e (v1.7.11-rc0~111^2~2) or newer.

Note that this can benefit from caching, as the "weight" of the tag
(rather, the commit that is tagged) will never change once a history
is made, but that part is left as an exercise to the reader.

It correctly names 0136db586c in the kernel history as based on
v3.5-rc1 as tags/v3.5-rc1~83^2~81^2~76, not on v3.6-rc1, as we saw
on the list recently.

Once it is verified to operate correctly and updated to perform
properly, we can start passing --weight when "describe --contains"
runs the command.

Junio C Hamano (3):
  name-rev: lose unnecessary typedef
  name_rev: clarify when a new tip-name is assigned to a commit
  name-rev: --weight option (WIP)

 builtin/name-rev.c | 142 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 120 insertions(+), 22 deletions(-)

-- 
1.7.12.285.ga3d5fc0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] name-rev: lose unnecessary typedef
  2012-08-29 21:17     ` [PATCH 0/3] "git name-rev --weight" Junio C Hamano
@ 2012-08-29 21:17       ` Junio C Hamano
  2012-08-29 21:17       ` [PATCH 2/3] name_rev: clarify when a new tip-name is assigned to a commit Junio C Hamano
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-29 21:17 UTC (permalink / raw)
  To: git; +Cc: Greg KH

Just spell it "struct rev_name"; it makes it more clear what is
going on.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/name-rev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 1b37458..8af2cfa 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -7,11 +7,11 @@
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
-typedef struct rev_name {
+struct rev_name {
 	const char *tip_name;
 	int generation;
 	int distance;
-} rev_name;
+};
 
 static long cutoff = LONG_MAX;
 
@@ -43,7 +43,7 @@ static void name_rev(struct commit *commit,
 	}
 
 	if (name == NULL) {
-		name = xmalloc(sizeof(rev_name));
+		name = xmalloc(sizeof(struct rev_name));
 		commit->util = name;
 		goto copy_data;
 	} else if (name->distance > distance) {
-- 
1.7.12.285.ga3d5fc0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] name_rev: clarify when a new tip-name is assigned to a commit
  2012-08-29 21:17     ` [PATCH 0/3] "git name-rev --weight" Junio C Hamano
  2012-08-29 21:17       ` [PATCH 1/3] name-rev: lose unnecessary typedef Junio C Hamano
@ 2012-08-29 21:17       ` Junio C Hamano
  2012-08-29 21:17       ` [PATCH 3/3] name-rev: --weight option (WIP) Junio C Hamano
  2012-08-30  7:06       ` [PATCH 0/3] "git name-rev --weight" Philip Oakley
  3 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-29 21:17 UTC (permalink / raw)
  To: git; +Cc: Greg KH

In preparation for the later changes, restructure the logic a little
bit to separate how the code decides to use the new "tip" for naming
a particular commit, and what happens based on the decision.

Also re-indent and correct style of this function while we are at it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/name-rev.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 8af2cfa..ebbf541 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -19,12 +19,13 @@ static long cutoff = LONG_MAX;
 #define MERGE_TRAVERSAL_WEIGHT 65535
 
 static void name_rev(struct commit *commit,
-		const char *tip_name, int generation, int distance,
-		int deref)
+		     const char *tip_name, int generation, int distance,
+		     int deref)
 {
 	struct rev_name *name = (struct rev_name *)commit->util;
 	struct commit_list *parents;
-	int parent_number = 1;
+	int parent_number;
+	int use_this_tip = 0;
 
 	if (!commit->object.parsed)
 		parse_commit(commit);
@@ -42,21 +43,26 @@ static void name_rev(struct commit *commit,
 			die("generation: %d, but deref?", generation);
 	}
 
-	if (name == NULL) {
-		name = xmalloc(sizeof(struct rev_name));
+	if (!name) {
+		name = xcalloc(1, sizeof(struct rev_name));
 		commit->util = name;
-		goto copy_data;
-	} else if (name->distance > distance) {
-copy_data:
-		name->tip_name = tip_name;
-		name->generation = generation;
-		name->distance = distance;
-	} else
+		use_this_tip = 1;
+	}
+
+	if (distance < name->distance)
+		use_this_tip = 1;
+
+	if (!use_this_tip)
 		return;
 
-	for (parents = commit->parents;
-			parents;
-			parents = parents->next, parent_number++) {
+	name->tip_name = tip_name;
+	name->generation = generation;
+	name->distance = distance;
+
+	/* Propagate our name to our parents */
+	for (parents = commit->parents, parent_number = 1;
+	     parents;
+	     parents = parents->next, parent_number++) {
 		if (parent_number > 1) {
 			int len = strlen(tip_name);
 			char *new_name = xmalloc(len +
@@ -68,16 +74,15 @@ copy_data:
 				len -= 2;
 			if (generation > 0)
 				sprintf(new_name, "%.*s~%d^%d", len, tip_name,
-						generation, parent_number);
+					generation, parent_number);
 			else
 				sprintf(new_name, "%.*s^%d", len, tip_name,
-						parent_number);
-
+					parent_number);
 			name_rev(parents->item, new_name, 0,
-				distance + MERGE_TRAVERSAL_WEIGHT, 0);
+				 distance + MERGE_TRAVERSAL_WEIGHT, 0);
 		} else {
 			name_rev(parents->item, tip_name, generation + 1,
-				distance + 1, 0);
+				 distance + 1, 0);
 		}
 	}
 }
-- 
1.7.12.285.ga3d5fc0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-29 21:17     ` [PATCH 0/3] "git name-rev --weight" Junio C Hamano
  2012-08-29 21:17       ` [PATCH 1/3] name-rev: lose unnecessary typedef Junio C Hamano
  2012-08-29 21:17       ` [PATCH 2/3] name_rev: clarify when a new tip-name is assigned to a commit Junio C Hamano
@ 2012-08-29 21:17       ` Junio C Hamano
  2012-08-29 23:37         ` Junio C Hamano
  2012-08-30  3:51         ` Jeff King
  2012-08-30  7:06       ` [PATCH 0/3] "git name-rev --weight" Philip Oakley
  3 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-29 21:17 UTC (permalink / raw)
  To: git; +Cc: Greg KH

Instead of naming a rev after a tip that is topologically closest,
use the tip that is the oldest one among those which contain the
rev.

The semantics "name-rev --weight" would give is closer to what
people expect from "describe --contains".

Note that this is fairly expensive (see NEEDSWORK comment in the
code).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/name-rev.c | 97 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index ebbf541..69da41d 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -4,6 +4,8 @@
 #include "tag.h"
 #include "refs.h"
 #include "parse-options.h"
+#include "diff.h"
+#include "revision.h"
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
@@ -11,8 +13,85 @@ struct rev_name {
 	const char *tip_name;
 	int generation;
 	int distance;
+	int weight;
 };
 
+/*
+ * Historically, "name-rev" named a rev based on the tip that is
+ * closest to it.
+ *
+ * It does not give a good answer to "what is the earliest tag that
+ * contains the commit?", however, because you can build a new commit
+ * on top of an ancient commit X, merge it to the tip and tag the
+ * result, which would make X reachable from the new tag in two hops,
+ * even though it appears in the part of the history that is contained
+ * in other ancient tags.
+ *
+ * In order to answer that question, "name-rev" can be told to name a
+ * rev based on the tip that has smallest number of commits behind it.
+ */
+static int use_weight;
+
+/*
+ * NEEDSWORK: the result of this computation must be cached to
+ * a dedicated notes tree, keyed by the commit object name.
+ */
+static int compute_tip_weight(struct commit *commit)
+{
+	struct rev_info revs;
+	int weight = 1; /* give root the weight of 1 */
+
+	reset_revision_walk();
+	init_revisions(&revs, NULL);
+	add_pending_object(&revs, (struct object *)commit, NULL);
+	prepare_revision_walk(&revs);
+	while (get_revision(&revs))
+		weight++;
+	return weight;
+}
+
+static int tip_weight(const char *tip, size_t reflen)
+{
+	struct strbuf buf = STRBUF_INIT;
+	unsigned char sha1[20];
+	struct commit *commit;
+	struct rev_name *name;
+
+	strbuf_add(&buf, tip, reflen);
+	if (get_sha1(buf.buf, sha1))
+		die("Internal error: cannot parse tip '%s'", tip);
+	strbuf_release(&buf);
+
+	commit = lookup_commit_reference_gently(sha1, 0);
+	if (!commit)
+		die("Internal error: cannot look up commit '%s'", tip);
+	name = commit->util;
+	if (!name)
+		die("Internal error: a tip without name '%s'", tip);
+	if (!name->weight)
+		name->weight = compute_tip_weight(commit);
+	return name->weight;
+}
+
+static int tip_weight_cmp(const char *a, const char *b)
+{
+	size_t reflen_a, reflen_b;
+	static const char traversal[] = "^~";
+
+	/*
+	 * A "tip" may look like <refname> followed by traversal
+	 * instruction (e.g. ^2~74).  We only are interested in
+	 * the weight of the ref part.
+	 */
+	reflen_a = strcspn(a, traversal);
+	reflen_b = strcspn(b, traversal);
+
+	if (reflen_a == reflen_b && !memcmp(a, b, reflen_a))
+		return 0;
+
+	return tip_weight(a, reflen_a) - tip_weight(b, reflen_b);
+}
+
 static long cutoff = LONG_MAX;
 
 /* How many generations are maximally preferred over _one_ merge traversal? */
@@ -49,8 +128,20 @@ static void name_rev(struct commit *commit,
 		use_this_tip = 1;
 	}
 
-	if (distance < name->distance)
-		use_this_tip = 1;
+	if (!use_weight) {
+		if (distance < name->distance)
+			use_this_tip = 1;
+	} else {
+		if (!name->tip_name)
+			use_this_tip = 1;
+		else {
+			int cmp = tip_weight_cmp(name->tip_name, tip_name);
+			if (0 < cmp)
+				use_this_tip = 1;
+			else if (!cmp && distance < name->distance)
+				use_this_tip = 1;
+		}
+	}
 
 	if (!use_this_tip)
 		return;
@@ -241,6 +332,8 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
 		OPT_BOOLEAN(0, "undefined", &allow_undefined, "allow to print `undefined` names"),
 		OPT_BOOLEAN(0, "always",     &always,
 			   "show abbreviated commit object as fallback"),
+		OPT_BOOLEAN(0, "weight", &use_weight,
+			    "name revs based on the oldest tip that contain them"),
 		OPT_END(),
 	};
 
-- 
1.7.12.285.ga3d5fc0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-29 21:17       ` [PATCH 3/3] name-rev: --weight option (WIP) Junio C Hamano
@ 2012-08-29 23:37         ` Junio C Hamano
  2012-08-30  3:36           ` Jeff King
  2012-08-30  3:51         ` Jeff King
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-08-29 23:37 UTC (permalink / raw)
  To: git; +Cc: Greg KH

Junio C Hamano <gitster@pobox.com> writes:

> Note that this is fairly expensive (see NEEDSWORK comment in the
> code).

And this is with the "notes-cache".

    (priming the cache from scratch)
    $ rm .git/refs/notes/name-rev-weight 
    $ /usr/bin/time ../git.git/git-name-rev --weight --tags 0136db586c
    0136db586c tags/v3.5-rc1~83^2~81^2~76
    6.06user 0.46system 0:06.54elapsed 99%CPU (0avgtext+0avgdata 1861456maxresident)k
    8inputs+16outputs (0major+128576minor)pagefaults 0swaps

    (with valid cache)
    $ /usr/bin/time ../git.git/git-name-rev --weight --tags 0136db586c
    0136db586c tags/v3.5-rc1~83^2~81^2~76
    0.50user 0.22system 0:00.72elapsed 100%CPU (0avgtext+0avgdata 244224maxresident)k
    0inputs+0outputs (0major+16062minor)pagefaults 0swaps

    (the old "shortest path" version)
    $ /usr/bin/time git name-rev --tags 0136db586c
    0136db586c tags/v3.6-rc1~59^2~56^2~76
    0.31user 0.01system 0:00.32elapsed 100%CPU (0avgtext+0avgdata 243488maxresident)k
    0inputs+0outputs (0major+16000minor)pagefaults 0swaps


 builtin/name-rev.c | 38 +++++++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git c/builtin/name-rev.c w/builtin/name-rev.c
index 69da41d..fdd087c 100644
--- c/builtin/name-rev.c
+++ w/builtin/name-rev.c
@@ -6,6 +6,7 @@
 #include "parse-options.h"
 #include "diff.h"
 #include "revision.h"
+#include "notes-cache.h"
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
@@ -32,10 +33,6 @@ struct rev_name {
  */
 static int use_weight;
 
-/*
- * NEEDSWORK: the result of this computation must be cached to
- * a dedicated notes tree, keyed by the commit object name.
- */
 static int compute_tip_weight(struct commit *commit)
 {
 	struct rev_info revs;
@@ -50,6 +47,31 @@ static int compute_tip_weight(struct commit *commit)
 	return weight;
 }
 
+static struct notes_cache weight_cache;
+static int weight_cache_updated;
+
+static int get_tip_weight(struct commit *commit)
+{
+	struct strbuf buf = STRBUF_INIT;
+	size_t sz;
+	int weight;
+	char *note = notes_cache_get(&weight_cache, commit->object.sha1, &sz);
+
+	if (note && !strtol_i(note, 10, &weight)) {
+		free(note);
+		return weight;
+	}
+	free(note);
+
+	weight = compute_tip_weight(commit);
+	strbuf_addf(&buf, "%d", weight);
+	notes_cache_put(&weight_cache, commit->object.sha1,
+			buf.buf, buf.len);
+	strbuf_release(&buf);
+	weight_cache_updated = 1;
+	return weight;
+}
+
 static int tip_weight(const char *tip, size_t reflen)
 {
 	struct strbuf buf = STRBUF_INIT;
@@ -69,7 +91,7 @@ static int tip_weight(const char *tip, size_t reflen)
 	if (!name)
 		die("Internal error: a tip without name '%s'", tip);
 	if (!name->weight)
-		name->weight = compute_tip_weight(commit);
+		name->weight = get_tip_weight(commit);
 	return name->weight;
 }
 
@@ -346,6 +368,9 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
 	if (all || transform_stdin)
 		cutoff = 0;
 
+	if (use_weight)
+		notes_cache_init(&weight_cache, "name-rev-weight", "2012-08-29");
+
 	for (; argc; argc--, argv++) {
 		unsigned char sha1[20];
 		struct object *o;
@@ -401,5 +426,8 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
 				  always, allow_undefined, data.name_only);
 	}
 
+	if (use_weight && weight_cache_updated)
+		notes_cache_write(&weight_cache);
+
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-29 23:37         ` Junio C Hamano
@ 2012-08-30  3:36           ` Jeff King
  2012-08-30  3:53             ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-08-30  3:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Greg KH

On Wed, Aug 29, 2012 at 04:37:06PM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Note that this is fairly expensive (see NEEDSWORK comment in the
> > code).
> 
> And this is with the "notes-cache".
> [...]
> +static int get_tip_weight(struct commit *commit)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +	size_t sz;
> +	int weight;
> +	char *note = notes_cache_get(&weight_cache, commit->object.sha1, &sz);
> +
> +	if (note && !strtol_i(note, 10, &weight)) {
> +		free(note);
> +		return weight;
> +	}
> +	free(note);
> +
> +	weight = compute_tip_weight(commit);
> +	strbuf_addf(&buf, "%d", weight);
> +	notes_cache_put(&weight_cache, commit->object.sha1,
> +			buf.buf, buf.len);
> +	strbuf_release(&buf);
> +	weight_cache_updated = 1;
> +	return weight;
> +}

It looks like you didn't update compute_tip_weight at all, so it will
still do the full traversal down to the roots. I wonder if you can
define the weight as a recursive function of the parents. Using the sum
of the weights of the parents is not right, because you would
double-count in this situation:

  A--B--C--D---M
      \       /
       E--F--G

That would double-count "A" and "B" in this example. But maybe there is
a clever way to define it that avoids that.

The advantage would be that you could cheaply find the weights of new
commits by only traversing back to the last cached one. I did something
similar with the generation number cache (but the recursive definition
is easier there).

> +	if (use_weight)
> +		notes_cache_init(&weight_cache, "name-rev-weight", "2012-08-29");

Is that a sufficient validity field? What about grafts or replace
objects? For the generation cache, I used a hash of the graft and
replace fields.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-29 21:17       ` [PATCH 3/3] name-rev: --weight option (WIP) Junio C Hamano
  2012-08-29 23:37         ` Junio C Hamano
@ 2012-08-30  3:51         ` Jeff King
  2012-08-30  4:09           ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-08-30  3:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Greg KH

On Wed, Aug 29, 2012 at 02:17:24PM -0700, Junio C Hamano wrote:

> Instead of naming a rev after a tip that is topologically closest,
> use the tip that is the oldest one among those which contain the
> rev.

When you wrote "oldest" here, I thought that meant you would do a
comparison on the taggerdate. But reading the implementation, you really
mean "topologically oldest".

I wonder, though, if the former would be sufficient for most people. Or
even just sorting based on the tag name. For example, taking Greg's
original example:

  $ commit=0136db586c028f71e7cc21cc183064ff0d5919
  $ oldest_tag=`git tag --contains $commit | sort -V | head -1`
  $ git name-rev --refs="refs/tags/$oldest_tag" $commit
  0136db586c028f71e7cc21cc183064ff0d5919 tags/v3.5~335^2~81^2~76

Of course "sort -V" is not portable, and it actually places -rc tags
after release tags (note that we found v3.5 here, not v3.5-rc1). But
that is an implementation detail that could be solved (either by a
better comparison function, or by just using taggerdate instead).

In some ways it is not as elegant (clock skew in your tag dates would be
relevant), but it is simple and performs well without needing to manage
a cache.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-30  3:36           ` Jeff King
@ 2012-08-30  3:53             ` Junio C Hamano
  2012-08-30  3:55               ` Jeff King
  2012-08-30 15:59               ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-30  3:53 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Greg KH

Jeff King <peff@peff.net> writes:

> I wonder if you can
> define the weight as a recursive function of the parents.

I do not think we can.  A merge Z between X (that has N commits
behind it) and Y (that has M commits behind it) has at most N+M+1
commits behind it (counting itself), but we cannot tell how many
among these N and M are shared.

> That would double-count "A" and "B" in this example. But maybe there is
> a clever way to define it that avoids that.

We've dealt with this issue long time ago when we optimized the
bisection count, which involves exactly the same issue.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-30  3:53             ` Junio C Hamano
@ 2012-08-30  3:55               ` Jeff King
  2012-08-30  4:10                 ` Junio C Hamano
  2012-08-30 15:59               ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Jeff King @ 2012-08-30  3:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Greg KH

On Wed, Aug 29, 2012 at 08:53:49PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I wonder if you can
> > define the weight as a recursive function of the parents.
> 
> I do not think we can.  A merge Z between X (that has N commits
> behind it) and Y (that has M commits behind it) has at most N+M+1
> commits behind it (counting itself), but we cannot tell how many
> among these N and M are shared.
> 
> > That would double-count "A" and "B" in this example. But maybe there is
> > a clever way to define it that avoids that.
> 
> We've dealt with this issue long time ago when we optimized the
> bisection count, which involves exactly the same issue.

OK. I didn't think too hard about it, so I'll trust you that it is not
easy. I wonder if using the generation number would be another way of
defining "oldest" that would be easier to calculate.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-30  3:51         ` Jeff King
@ 2012-08-30  4:09           ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-30  4:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Greg KH

Jeff King <peff@peff.net> writes:

> When you wrote "oldest" here, I thought that meant you would do a
> comparison on the taggerdate. But reading the implementation, you really
> mean "topologically oldest".
>
> I wonder, though, if the former would be sufficient for most people.

Even without clock skew, timestamps are inappropriate measure for
the purpose of measuring "this dates back to...", unless you are
limiting yourself to very linear history.  Even in our own history,
v1.7.6.6 is 4 months newer than v1.7.7, for example.  The same holds
true between v1.7.6.6^0 and v1.7.7^0, so the story does not change
whether you use tagger date or committer date.

Basing this based on tag names is an attractive idea, but we would
need to devise a way for people to pass project specific tagname
comparison function from outside.  v3.2 is newer than v3.2-rc1 but
v3.2-bis may probably be newer than v3.2 and I do not think we want
to collect the rules and cast the logic in our code in stone.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-30  3:55               ` Jeff King
@ 2012-08-30  4:10                 ` Junio C Hamano
  2012-08-30  4:15                   ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2012-08-30  4:10 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Greg KH

Jeff King <peff@peff.net> writes:

> OK. I didn't think too hard about it, so I'll trust you that it is not
> easy. I wonder if using the generation number would be another way of
> defining "oldest" that would be easier to calculate.

Go back to my illustration to Greg and think about the implication
of "TONS!" side branch in the picture has on the generation numbers.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-30  4:10                 ` Junio C Hamano
@ 2012-08-30  4:15                   ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-30  4:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Greg KH

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> OK. I didn't think too hard about it, so I'll trust you that it is not
>> easy. I wonder if using the generation number would be another way of
>> defining "oldest" that would be easier to calculate.
>
> Go back to my illustration to Greg and think about the implication
> of "TONS!" side branch in the picture has on the generation numbers.

That is, we want some number (I called it "weight") that grows when
"TONS!" side branch is heavy.  Generation numbers are about giving
larger numbers for _longer_ chains, but a long and thin side branch
does not contribute as much as a medium length but a very fat side
branch when merged.

So generation numbers might be a candidate for approximation, but I
do not think it will give us a _good_ approximation.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] "git name-rev --weight"
  2012-08-29 21:17     ` [PATCH 0/3] "git name-rev --weight" Junio C Hamano
                         ` (2 preceding siblings ...)
  2012-08-29 21:17       ` [PATCH 3/3] name-rev: --weight option (WIP) Junio C Hamano
@ 2012-08-30  7:06       ` Philip Oakley
  2012-08-30 15:54         ` Junio C Hamano
  3 siblings, 1 reply; 19+ messages in thread
From: Philip Oakley @ 2012-08-30  7:06 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Greg KH

From: "Junio C Hamano" <gitster@pobox.com>
Sent: Wednesday, August 29, 2012 10:17 PM
> So here is an attempt to teach "name-rev" a mode that tries to base
> its name on oldest tag that can reach the commit.  It needs the
> reset_revision_walk() call recently added to the revision traversal
> API, and applies to bcc0a3e (v1.7.11-rc0~111^2~2) or newer.
>
> Note that this can benefit from caching, as the "weight" of the tag
> (rather, the commit that is tagged) will never change once a history
> is made, but that part is left as an exercise to the reader.

Is "--weight" the right term to use for the user (cli) interface? 
Wouldn't '--oldest' (or similar) be a better statement of what is 
desired (absent clock skew).

While 'weight' may be a good internal technical description it didn't 
convey to me what was being sought (maybe -- deepest'?).

>
> It correctly names 0136db586c in the kernel history as based on
> v3.5-rc1 as tags/v3.5-rc1~83^2~81^2~76, not on v3.6-rc1, as we saw
> on the list recently.
>
> Once it is verified to operate correctly and updated to perform
> properly, we can start passing --weight when "describe --contains"
> runs the command.
>
> Junio C Hamano (3):
>  name-rev: lose unnecessary typedef
>  name_rev: clarify when a new tip-name is assigned to a commit
>  name-rev: --weight option (WIP)
>
> builtin/name-rev.c | 142 
> ++++++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 120 insertions(+), 22 deletions(-)
>
> -- 
> 1.7.12.285.ga3d5fc0
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/3] "git name-rev --weight"
  2012-08-30  7:06       ` [PATCH 0/3] "git name-rev --weight" Philip Oakley
@ 2012-08-30 15:54         ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-30 15:54 UTC (permalink / raw)
  To: Philip Oakley; +Cc: git, Greg KH, Jeff King

"Philip Oakley" <philipoakley@iee.org> writes:

> Is "--weight" the right term to use for the user (cli) interface?
> Wouldn't '--oldest' (or similar) be a better statement of what is
> desired (absent clock skew).
>
> While 'weight' may be a good internal technical description it didn't
> convey to me what was being sought (maybe -- deepest'?).

I agree with you that weight represents what it internally does.  I
however think that "oldest" is not quite good, as it still leaves
the source of possible confusion.  It has at least 3 (or 4,
depending on how you count) possible meanings.

 - Is it the one with the oldest timestamp (and if so, do we use the
   committer date, or do we use the tagger date that may be much
   newer than the committer date)?

 - Is it the one with its longest path down to the root is the
   shortest (i.e. with smallest generation number)?

 - Is it the one with the smallest number of ancestor commits?

For the purpose of "oldest tag that contains this commit", I think
the last one would give the most intuitive answer, but depending on
your use case, you may want to enhance the command to support other
definition of "oldest"; it does not feel quite right to have this
particular definition (the last one) squat on the generic "--oldest"
name.

We could punt to tautology and call it "--contains", meaning that is
the logic used to implement "describe --contains" ;-) but that is
not satisfactory, either.

I dunno.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] name-rev: --weight option (WIP)
  2012-08-30  3:53             ` Junio C Hamano
  2012-08-30  3:55               ` Jeff King
@ 2012-08-30 15:59               ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2012-08-30 15:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Greg KH

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> I wonder if you can
>> define the weight as a recursive function of the parents.
>
> I do not think we can.  A merge Z between X (that has N commits
> behind it) and Y (that has M commits behind it) has at most N+M+1
> commits behind it (counting itself), but we cannot tell how many
> among these N and M are shared.

You can theoretically take all the merge bases between X and Y,
magically come up with the "weight" of a fictitious merge across
these merge bases, and subtract that number from N+M to arrive at
the weight of Z, I suppose, but it is not clear what an efficient
implementation of that "magically" part looks like.  I think that is
where we stopped when we tried to optimize the "rev-list --bisect"
node weighting logic; it punts handling the merges, and only
optimizes single strand of pearls on top of a merge with a known
weight.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-08-30 16:00 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-29  4:48 Funny 'git describe --contains' output Greg KH
2012-08-29  5:57 ` Junio C Hamano
2012-08-29  6:36   ` Junio C Hamano
2012-08-29 18:17     ` Greg KH
2012-08-29 21:17     ` [PATCH 0/3] "git name-rev --weight" Junio C Hamano
2012-08-29 21:17       ` [PATCH 1/3] name-rev: lose unnecessary typedef Junio C Hamano
2012-08-29 21:17       ` [PATCH 2/3] name_rev: clarify when a new tip-name is assigned to a commit Junio C Hamano
2012-08-29 21:17       ` [PATCH 3/3] name-rev: --weight option (WIP) Junio C Hamano
2012-08-29 23:37         ` Junio C Hamano
2012-08-30  3:36           ` Jeff King
2012-08-30  3:53             ` Junio C Hamano
2012-08-30  3:55               ` Jeff King
2012-08-30  4:10                 ` Junio C Hamano
2012-08-30  4:15                   ` Junio C Hamano
2012-08-30 15:59               ` Junio C Hamano
2012-08-30  3:51         ` Jeff King
2012-08-30  4:09           ` Junio C Hamano
2012-08-30  7:06       ` [PATCH 0/3] "git name-rev --weight" Philip Oakley
2012-08-30 15:54         ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.