* [PATCH] tag: add -i and --introduced modifier for --contains
@ 2014-04-16 20:58 Luis R. Rodriguez
  2014-04-16 22:02 ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Luis R. Rodriguez @ 2014-04-16 20:58 UTC (permalink / raw)
  To: git; +Cc: Luis R. Rodriguez, Jiri Slaby, Andreas Schwab, Jan Kara

From: "Luis R. Rodriguez" <mcgrof@suse.com>

Upstream Linux kernel commit c5905afb was introduced on v3.4 but
git describe --contains yields v3.5, while git describe
--first-parent yields v3.3. The reason for this seems to be that the
merge commit that introduced c5905afb was based on v3.3. For
--contains it's unclear to me why we get v3.5; the result is not
intuitive. For --first-parent the issue is that the first parent
actually *is* v3.3. The easiest way to address this is to rely on the
git tag --contains implementation and add a modifier that specifies
you want the tag that first introduced the specified commit.

mcgrof@ergon ~/linux (git::master)$ git tag -i --contains c5905afb
v3.4

mcgrof@ergon ~/linux (git::master)$ git tag --introduced --contains c5905afb
v3.4

Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Andreas Schwab <schwab@suse.de>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
---
 builtin/tag.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/builtin/tag.c b/builtin/tag.c
index 6c7c6bd..65a939b 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -21,7 +21,7 @@
 static const char * const git_tag_usage[] = {
 	N_("git tag [-a|-s|-u <key-id>] [-f] [-m <msg>|-F <file>] <tagname> [<head>]"),
 	N_("git tag -d <tagname>..."),
-	N_("git tag -l [-n[<num>]] [--contains <commit>] [--points-at <object>] "
+	N_("git tag -l [-n[<num>]] [--contains <commit>] [ -i | --introduced --contains <commit> ] [--points-at <object>] "
 		"\n\t\t[<pattern>...]"),
 	N_("git tag -v <tagname>..."),
 	NULL
@@ -195,13 +195,18 @@ static int sort_by_version(const void *a_, const void *b_)
 }
 
 static int list_tags(const char **patterns, int lines,
-		     struct commit_list *with_commit, int sort)
+		     struct commit_list *with_commit, int sort,
+		     int introduced)
 {
 	struct tag_filter filter;
 
 	filter.patterns = patterns;
 	filter.lines = lines;
-	filter.sort = sort;
+	if (introduced) {
+		sort = VERCMP_SORT;
+		filter.sort = sort;
+	} else
+		filter.sort = sort;
 	filter.with_commit = with_commit;
 	memset(&filter.tags, 0, sizeof(filter.tags));
 	filter.tags.strdup_strings = 1;
@@ -216,8 +221,11 @@ static int list_tags(const char **patterns, int lines,
 			for (i = filter.tags.nr - 1; i >= 0; i--)
 				printf("%s\n", filter.tags.items[i].string);
 		else
-			for (i = 0; i < filter.tags.nr; i++)
+			for (i = 0; i < filter.tags.nr; i++) {
 				printf("%s\n", filter.tags.items[i].string);
+				if (introduced)
+					break;
+			}
 		string_list_clear(&filter.tags, 0);
 	}
 	return 0;
@@ -493,6 +501,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 	char *cleanup_arg = NULL;
 	int annotate = 0, force = 0, lines = -1;
 	int cmdmode = 0, sort = 0;
+	int introduced = 0;
 	const char *msgfile = NULL, *keyid = NULL;
 	struct msg_arg msg = { 0, STRBUF_INIT };
 	struct commit_list *with_commit = NULL;
@@ -511,6 +520,7 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 			     N_("tag message"), parse_msg_arg),
 		OPT_FILENAME('F', "file", &msgfile, N_("read message from file")),
 		OPT_BOOL('s', "sign", &opt.sign, N_("annotated and GPG-signed tag")),
+		OPT_BOOL('i', "introduced", &introduced, N_("print the first tag that introduced the commit")),
 		OPT_STRING(0, "cleanup", &cleanup_arg, N_("mode"),
 			N_("how to strip spaces and #comments from message")),
 		OPT_STRING('u', "local-user", &keyid, N_("key-id"),
@@ -576,7 +586,8 @@ int cmd_tag(int argc, const char **argv, const char *prefix)
 		}
 		if (lines != -1 && sort)
 			die(_("--sort and -n are incompatible"));
-		ret = list_tags(argv, lines == -1 ? 0 : lines, with_commit, sort);
+		ret = list_tags(argv, lines == -1 ? 0 : lines, with_commit,
+				sort, introduced);
 		if (column_active(colopts))
 			stop_column_filter();
 		return ret;
-- 
1.9.0


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-16 20:58 [PATCH] tag: add -i and --introduced modifier for --contains Luis R. Rodriguez
@ 2014-04-16 22:02 ` Junio C Hamano
  2014-04-16 22:35   ` Luis R. Rodriguez
  2014-04-17  7:17   ` Andreas Schwab
  0 siblings, 2 replies; 16+ messages in thread
From: Junio C Hamano @ 2014-04-16 22:02 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: git, Luis R. Rodriguez, Jiri Slaby, Andreas Schwab, Jan Kara

"Luis R. Rodriguez" <mcgrof@do-not-panic.com> writes:

> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>
> Upstream Linux kernel commit c5905afb was introduced on v3.4 but
> git describe --contains yields v3.5

Actually, "describe --contains" should yield v3.5-rc1~120^3~76^2,
not v3.5.

And you are right that the commit is contained in v3.4, so we also
should be able to describe it as v3.4~479^2~9^2 as well.

And between v3.4 and v3.5-rc1, the latter is a closer anchor point
for that commit (v3.5-rc1 only needs about 200 hops to reach the
commit, while from v3.4 you would need close to 500 hops), hence we
end up picking the latter as "a better answer".

Now, with the explanation of how/why this happens behind us, I see
two possible issues with this patch:

 - The reason a human user rejects v3.5-rc1~120^3~76^2 as the
   solution and favors v3.4~479^2~9^2 could be because of the -rc1
   part in the answer.  Perhaps we would want an option that affects
   which tags are to be used (and which tags are to be excluded) as
   anchoring points?

 - If we are truly interested in finding out the "earliest tag that
   contains the given commit", shouldn't we be ignoring the tagname
   and go with the tag with the oldest timestamp?  After all, there
   may be a fix merged to v7.0 first on April 1st, and then on a
   later date the same fix may be merged to the maintenance track to
   be tagged as v6.9.1 on May 5th, and in such a case, wouldn't you
   want to say that the fix first appeared on v7.0 on April 1st,
   instead of on May 5th?

Thanks.


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-16 22:02 ` Junio C Hamano
@ 2014-04-16 22:35   ` Luis R. Rodriguez
  2014-04-17 17:04     ` Junio C Hamano
  2014-04-17  7:17   ` Andreas Schwab
  1 sibling, 1 reply; 16+ messages in thread
From: Luis R. Rodriguez @ 2014-04-16 22:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Luis R. Rodriguez, Jiri Slaby, Andreas Schwab, Jan Kara

On Wed, Apr 16, 2014 at 3:02 PM, Junio C Hamano <gitster@pobox.com> wrote:
> "Luis R. Rodriguez" <mcgrof@do-not-panic.com> writes:
>
>> From: "Luis R. Rodriguez" <mcgrof@suse.com>
>>
>> Upstream Linux kernel commit c5905afb was introduced on v3.4 but
>> git describe --contains yields v3.5
>
> Actually, "describe --contains" should yield v3.5-rc1~120^3~76^2,
> not v3.5.

Yes, indeed thanks, sorry I should have been explicit.

> And you are right that the commit is contained in v3.4, so we also
> should be able to describe it as v3.4~479^2~9^2 as well.

That'd be swell :)

> And between v3.4 and v3.5-rc1, the latter is a closer anchor point
> for that commit (v3.5-rc1 only needs about 200 hops to reach the
> commit, while from v3.4 you would need close to 500 hops),

Ah! Thanks for explaining this mysterious puzzle to me. I'm still a bit
perplexed as to why. Can I trouble you for a little elaboration here?
How could a commit merged in v3.4 possibly be more commits away from
v3.4 than from v3.5? Is it because the counting starts at the
merge's parent (v3.3)?

> hence we
> end up picking the latter as "a better answer".
>
> Now, with the explanation of how/why this happens behind us, I see
> two possible issues with this patch:
>
>  - The reason a human user rejects v3.5-rc1~120^3~76^2 as the
>    solution and favors v3.4~479^2~9^2 could be because of the -rc1
>    part in the answer.  Perhaps we would want an option that affects
>    which tags are to be used (and which tags are to be excluded) as
>    anchoring points?

I'd take an rc release as a blessed point too, so I'm not sure. Come to
think of it, I'm now a bit perplexed why the results for my change did
not yield an rc1 as well.
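
One plausible explanation, offered as an assumption rather than something
stated in this thread: if the version sort selected by VERCMP_SORT orders
names the way glibc's strverscmp() does, then "v3.4" is a prefix of
"v3.4-rc1" and sorts before it, so printing only the first entry shows v3.4.
The sketch below uses a simplified, hypothetical comparator (version_cmp and
first_by_version are illustrative names, not git's own code) to show that
ordering.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Simplified version comparison (hypothetical, not git's implementation):
 * runs of digits compare as numbers, everything else compares bytewise,
 * and a name that is a prefix of the other sorts first.
 */
static int version_cmp(const char *a, const char *b)
{
	while (*a && *b) {
		if (isdigit((unsigned char)*a) && isdigit((unsigned char)*b)) {
			char *ea, *eb;
			unsigned long na = strtoul(a, &ea, 10);
			unsigned long nb = strtoul(b, &eb, 10);

			if (na != nb)
				return na < nb ? -1 : 1;
			a = ea;
			b = eb;
		} else {
			if (*a != *b)
				return (unsigned char)*a < (unsigned char)*b ? -1 : 1;
			a++;
			b++;
		}
	}
	return (*a != '\0') - (*b != '\0');
}

/* qsort() adapter for an array of tag-name strings. */
static int cmp_tags(const void *a, const void *b)
{
	return version_cmp(*(const char *const *)a, *(const char *const *)b);
}

/* Sort the tag names in place and return the one that sorts first. */
static const char *first_by_version(const char **tags, size_t n)
{
	assert(n > 0);
	qsort(tags, n, sizeof(*tags), cmp_tags);
	return tags[0];
}
```

Given the names v3.5-rc1, v3.4-rc1, v3.4, v3.4-rc2 and v3.5, this ordering
puts v3.4 first, ahead of every v3.4-rcN.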

>  - If we are truly interested in finding out the "earliest tag that
>    contains the given commit", shouldn't we be ignoring the tagname
>    and go with the tag with the oldest timestamp?  After all, there
>    may be a fix merged to v7.0 first on April 1st, and then on a
>    later date the same fix may be merged to the maintenance track to
>    be tagged as v6.9.1 on May 5th,

At least for Linux, the linux-3.X.y branches (for example linux-3.4.y) on
linux-stable have different commit IDs for patches cherry-picked from
Linus' tree, and each such patch just references the upstream commit from
Linus' tree in the commit log, but nothing more.

> and in such a case, wouldn't you  want to say that the fix first appeared on v7.0 on April 1st,
> instead of on May 5th?

Sure, but I'd expect the folks maintaining v6.9.x would just refer to
the upstream commit ID from v7.0.

  Luis


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-16 22:02 ` Junio C Hamano
  2014-04-16 22:35   ` Luis R. Rodriguez
@ 2014-04-17  7:17   ` Andreas Schwab
  2014-04-17 17:05     ` Junio C Hamano
  1 sibling, 1 reply; 16+ messages in thread
From: Andreas Schwab @ 2014-04-17  7:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Luis R. Rodriguez, git, Luis R. Rodriguez, Jiri Slaby, Jan Kara

Junio C Hamano <gitster@pobox.com> writes:

> And you are right that the commit is contained in v3.4, so we also
> should be able to describe it as v3.4~479^2~9^2 as well.

IMHO it should be described as v3.4-rc1~192^2~9^2, which is what git
describe --contains --match=v3.4\* returns.  This path is only a few
commits longer than v3.5-rc1~120^3~76^2.

Andreas.

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-16 22:35   ` Luis R. Rodriguez
@ 2014-04-17 17:04     ` Junio C Hamano
  2014-04-17 22:16       ` Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Junio C Hamano @ 2014-04-17 17:04 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: git, Luis R. Rodriguez, Jiri Slaby, Andreas Schwab, Jan Kara, Jeff King

"Luis R. Rodriguez" <mcgrof@do-not-panic.com> writes:

>> And between v3.4 and v3.5-rc1, the latter is a closer anchor point
>> for that commit (v3.5-rc1 only needs about 200 hops to reach the
>> commit, while from v3.4 you would need close to 500 hops),
>
> Ah! Thanks for explaining this mysterious puzzle to me. I'm still a bit
> perplexed as to why. Can I trouble you for a little elaboration here?
> How could a commit merged in v3.4 possibly be more commits away from
> v3.4 than from v3.5? Is it because the counting starts at the
> merge's parent (v3.3)?

The reason is very simple, once you realize that in a distributed
environment it is very common to fork off a new branch from an
ancient commit and then merging the result to a newer release
without merging it all the way down to older maintenance releases.

Try this sequence:

    1. start from say v3.4~1^2~2
    $ git checkout -b side v3.4~1^2~2

The history near v3.4 proper looks like this:

    $ git log --oneline -3 v3.4
    76e10d1 Linux 3.4
    d6c77973 Merge tag 'parisc-fixes' of git://git.kernel.o...
    5d12045 Merge branch 'x86/ld-fix' of git://git.kernel.o...

and the last merge before v3.4 brings three commits in to the
history:

    $ git log --oneline d6c77973^1..d6c77973^2
    b3cb867 [PARISC] fix panic on prefetch(NULL) on PA7300LC
    207f583 [PARISC] fix crash in flush_icache_page_asm on PA1.1
    5e18558 [PARISC] fix PA1.1 oops on boot

We just forked a new "side" branch off of the bottom one (5e18558).

    2. pretend a new development on this old codebase
    $ git commit --allow-empty -m "[PARISC] another"

    3. let's merge this to v3.5 and call the result v9.0
    $ git checkout v3.5
    $ git merge --no-edit side
    $ git tag -a -m 'Nine' v9.0

Think what just happened to v3.4~1^2~2, the fork-point of this new
side branch (I am not asking what *should* happen. This exercise is
only to illustrate how the commit v3.5-rc1~120^3~76^2 can be closer
to v3.5-rc1 than to v3.4 when it is reachable from both).

Here is what the topology looks like:

                   v3.4                  v9.0
             ---M---X---------------------Y
               /                         /
   ---A---B---C                         /
       \                               / 
        ------------------------------D (side)

where X is v3.4, M is d6c77973, A thru C are the PARISC patches,
D is the "another", and Y is the phoney version Nine we just made.
We are trying to "describe --contains" commit A.

If you start counting from the new tag v9.0, it is on the merged
side branch that brought in one new commit D, and in fact it is the
direct parent of it, so even without asking "describe --contains",
we know that it is v9.0^2~1.  That is 2 hops from v9.0 tag.  If you
count from v3.4, it is 4 hops.

And both of these tags X and Y contain the commit A.
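
The hop counts above can be checked with a small sketch. It models only the
commits in the diagram as a toy parent table (the enum names, the parents
array and hops() are illustrative assumptions, not git's data structures),
collapses v3.5 into X as the diagram does, and counts plain parent links with
a breadth-first walk. Git's real scoring for "describe --contains" may weigh
paths differently; this only mirrors the counting done in the message.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the diagram above; these are not git's own data structures. */
enum { A, B, C, M, X, D, Y, NCOMMITS };
#define NONE (-1)

/* parents[c] holds up to two parents of commit c, first parent first. */
static const int parents[NCOMMITS][2] = {
	[A] = { NONE, NONE },	/* older history omitted */
	[B] = { A, NONE },
	[C] = { B, NONE },
	[M] = { NONE, C },	/* d6c77973; its mainline parent is omitted */
	[X] = { M, NONE },	/* tagged v3.4 */
	[D] = { A, NONE },	/* "another", on the side branch */
	[Y] = { X, D },		/* tagged v9.0; v3.5 is collapsed into X */
};

/* Fewest parent links from `from` back to `target`, or -1 if unreachable. */
static int hops(int from, int target)
{
	int dist[NCOMMITS], queue[NCOMMITS], head = 0, tail = 0;

	assert(from >= 0 && from < NCOMMITS);
	memset(dist, -1, sizeof(dist));	/* -1 marks "not visited yet" */
	dist[from] = 0;
	queue[tail++] = from;
	while (head < tail) {
		int c = queue[head++];

		if (c == target)
			return dist[c];
		for (int i = 0; i < 2; i++) {
			int p = parents[c][i];

			if (p != NONE && dist[p] < 0) {
				dist[p] = dist[c] + 1;
				queue[tail++] = p;
			}
		}
	}
	return -1;
}
```

With this table, hops(Y, A) is 2 (Y, D, A) and hops(X, A) is 4 (X, M, C, B,
A), matching the counts given above.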

Now, as to what *SHOULD* happen, I think the above exercise shows us
a way to define what the desired semantics is, without resorting to
heuristics (e.g. "which tag has older timestamp?" or "which tag's
name sorts older under Linux version naming convention?").

Commit A can be described in terms of both v3.4 and v9.0, and it may
be closer to v9.0 than v3.4, and under that definition "we pick the
closest tag", the current "describe --contains" behaviour may be
correct, but from the human point of view, it is *WRONG*.

It is wrong because v9.0 can reach v3.4.  So perhaps the rule should
be updated to do something like:

    - find candidate tags that can be used to "describe --contains"
      the commit A, yielding v3.4, v3.5 (not shown), and v9.0;

    - among the candidate tags, cull the ones that contain another
      candidate tag, rejecting v3.5 (not shown) and v9.0;

    - among the surviving tags, pick the closest.

Hmm?
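
The three steps above can be sketched on the same toy topology. This is a
minimal illustration under stated assumptions, not a proposal for the actual
implementation: the parent table, hops() and pick_tag() are hypothetical
names, v3.5 is collapsed into X as in the diagram, and "closest" is taken to
mean the fewest parent links.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the earlier diagram; not git's own data structures. */
enum { A, B, C, M, X, D, Y, NCOMMITS };
#define NONE (-1)

static const int parents[NCOMMITS][2] = {
	[A] = { NONE, NONE }, [B] = { A, NONE }, [C] = { B, NONE },
	[M] = { NONE, C },	/* mainline parent omitted */
	[X] = { M, NONE },	/* tagged v3.4 */
	[D] = { A, NONE },
	[Y] = { X, D },		/* tagged v9.0 */
};

/* Fewest parent links from `from` back to `target`, or -1 if unreachable. */
static int hops(int from, int target)
{
	int dist[NCOMMITS], queue[NCOMMITS], head = 0, tail = 0;

	assert(from >= 0 && from < NCOMMITS);
	memset(dist, -1, sizeof(dist));
	dist[from] = 0;
	queue[tail++] = from;
	while (head < tail) {
		int c = queue[head++];

		if (c == target)
			return dist[c];
		for (int i = 0; i < 2; i++) {
			int p = parents[c][i];

			if (p != NONE && dist[p] < 0) {
				dist[p] = dist[c] + 1;
				queue[tail++] = p;
			}
		}
	}
	return -1;
}

/* Returns the chosen tag's commit, or NONE if no tag contains `commit`. */
static int pick_tag(const int *tags, int ntags, int commit)
{
	int best = NONE, best_hops = 0;

	for (int i = 0; i < ntags; i++) {
		int d = hops(tags[i], commit);
		int culled = 0;

		if (d < 0)
			continue;	/* step 1: not a candidate */
		for (int j = 0; j < ntags; j++)	/* step 2: contains another candidate? */
			if (j != i && hops(tags[j], commit) >= 0 &&
			    hops(tags[i], tags[j]) >= 0)
				culled = 1;
		if (culled)
			continue;
		if (best == NONE || d < best_hops) {	/* step 3: closest survivor */
			best = tags[i];
			best_hops = d;
		}
	}
	return best;
}
```

For the tags { X, Y } and commit A, Y is rejected because it can reach X, so
pick_tag() returns X (v3.4) even though Y is fewer hops away.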


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-17  7:17   ` Andreas Schwab
@ 2014-04-17 17:05     ` Junio C Hamano
  2014-04-17 17:30       ` Andreas Schwab
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2014-04-17 17:05 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Luis R. Rodriguez, git, Luis R. Rodriguez, Jiri Slaby, Jan Kara

Andreas Schwab <schwab@suse.de> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> And you are right that the commit is contained in v3.4, so we also
>> should be able to describe it as v3.4~479^2~9^2 as well.
>
> IMHO it should be described as v3.4-rc1~192^2~9^2, which is what git
> describe --contains --match=v3.4\* returns.  This path is only a few
> commits longer than v3.5-rc1~120^3~76^2.

Sure. In my response to Luis, I assumed that rc tags are not as
desirable as the final release points for his purpose for whatever
reason, as Luis compared between v3.4 and v3.5-rc1~120^3~76^2, not
with v3.4-rc1 or any later rc.

I also think this illustrates my earlier point. Depending on the
project and the expectation of the users, which tags are good
candidates as anchor points differ.  Your example using --match
probably shows a good direction to go in---somehow tell Git which
tags to base the description on, to reject names that the users do
not want.

When your project does not mind basing the description on rc tags,
between v3.4-rc1~192^2~9^2 and v3.5-rc1~120^3~76^2, I am not sure if
we would want to say that "the former is not much longer than the
latter, so use that", or what kind of heuristics to employ to reach
that conclusion.  Date-based selection (i.e. earliest first) is one
possibility.  Tagname-based selection has the issue of having to
configure "whose version numbering convention would you use when
sorting tags, and how you would tell Git that sorting order rule?"

For a possible cleaner alternative semantics, see the other message
I just sent to the thread.

Thanks.


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-17 17:05     ` Junio C Hamano
@ 2014-04-17 17:30       ` Andreas Schwab
  2014-04-17 18:49         ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Schwab @ 2014-04-17 17:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Luis R. Rodriguez, git, Luis R. Rodriguez, Jiri Slaby, Jan Kara

Junio C Hamano <gitster@pobox.com> writes:

> I also think this illustrates my earlier point. Depending on the
> project and the expectation of the users, which tags are good
> candidates as anchor points differ.  Your example using --match
> probably shows a good direction to go in---somehow tell Git which
> tags to base the description on, to reject names that the users do
> not want.

I've used --match only to force git describe to find a better match.

> When your project does not mind basing the description on rc tags,
> between v3.4-rc1~192^2~9^2 and v3.5-rc1~120^3~76^2, I am not sure if
> we would want to say that "the former is not much longer than the
> latter, so use that", or what kind of heuristics to employ to reach
> that conclusion.  Date-based selection (i.e. earliest first) is one
> possibility.  Tagname-based selection has the issue of having to
> configure "whose version numbering convention would you use when
> sorting tags, and how you would tell Git that sorting order rule?"

IMHO git should select based on topology: the first tag that isn't
contained in any other tag still containing the commit in question, only
when ambiguous it needs to fall back to other criteria.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-17 17:30       ` Andreas Schwab
@ 2014-04-17 18:49         ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2014-04-17 18:49 UTC (permalink / raw)
  To: Andreas Schwab
  Cc: Luis R. Rodriguez, git, Luis R. Rodriguez, Jiri Slaby, Jan Kara

Andreas Schwab <schwab@suse.de> writes:

> Junio C Hamano <gitster@pobox.com> writes:
> ...
>> When your project does not mind basing the description on rc tags,
>> between v3.4-rc1~192^2~9^2 and v3.5-rc1~120^3~76^2, I am not sure if
>> we would want to say that "the former is not much longer than the
>> latter, so use that", or what kind of heuristics to employ to reach
>> that conclusion.  Date-based selection (i.e. earliest first) is one
>> possibility.  Tagname-based selection has the issue of having to
>> configure "whose version numbering convention would you use when
>> sorting tags, and how you would tell Git that sorting order rule?"
>
> IMHO git should select based on topology: the first tag that isn't
> contained in any other tag still containing the commit in question, only
> when ambiguous it needs to fall back to other criteria.

I think we are in agreement.  In the part you chopped from your
quote, I said:

>> For a possible cleaner alternative semantics, see the other message
>> I just sent to the thread.

didn't I?


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-17 17:04     ` Junio C Hamano
@ 2014-04-17 22:16       ` Jeff King
  2014-04-18 16:26         ` Junio C Hamano
  2014-04-22  4:04         ` W. Trevor King
  2014-04-18 23:17       ` Luis R. Rodriguez
  2014-04-22 10:27       ` Jan Kara
  2 siblings, 2 replies; 16+ messages in thread
From: Jeff King @ 2014-04-17 22:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Luis R. Rodriguez, git, Luis R. Rodriguez, Jiri Slaby,
	Andreas Schwab, Jan Kara

On Thu, Apr 17, 2014 at 10:04:52AM -0700, Junio C Hamano wrote:

> Commit A can be described in terms of both v3.4 and v9.0, and it may
> be closer to v9.0 than v3.4, and under that definition "we pick the
> closest tag", the current "describe --contains" behaviour may be
> correct, but from the human point of view, it is *WRONG*.
> 
> It is wrong because v9.0 can reach v3.4.  So perhaps the rule should
> be updated to do something like:
> 
>     - find candidate tags that can be used to "describe --contains"
>       the commit A, yielding v3.4, v3.5 (not shown), and v9.0;
> 
>     - among the candidate tags, cull the ones that contain another
>       candidate tag, rejecting v3.5 (not shown) and v9.0;
> 
>     - among the surviving tags, pick the closest.
> 
> Hmm?

Interesting.  I think that would cover some cases, but there are others
in which the tags are not direct descendants. For example, imagine you
have both a "master" and a "maint" branch. You fork a topic from an old
commit that both branches contain, and then independently merge the
topic to each branch. You then cut a release for each. So your graph
might look like:

 ---A---B---C-----D---E---F (maint, v3.4)
     \   \       /
      \   ---G-----H---I (master, v4.0)
       \       /  /
        ------J---

The fix is J, and it got merged up to maint at D, and to master at H.
v4.0 does not contain v3.4. What's the best description of J?

By the rules above, we hit the third rule "pick the closest". Which
means we choose v3.4 or v4.0 based solely on how many commits are
between the topic's merge and the tag release. Which has nothing at all
to do with the topic itself.

In this case we'd show v4.0 (because "J-H-I" is shorter than "J-D-E-F").
But I suspect most users would want to know v3.4, because they want to
know the "oldest" release they can move up to that contains the commit.
But that notion of oldness is not conveyed by the graph above; it's only
an artifact of the tag names.
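
To make the point concrete, here is a sketch of the graph above as a toy
parent table, read as: J forks from A, D merges J into maint, H merges J into
master, and G forks from B. The names hops() and survivors() are illustrative
assumptions, not git functions. It applies the "cull tags that contain
another candidate" rule and shows that both tags survive, so only the hop
count decides.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the graph above; not git's own data structures. */
enum { A, B, C, D, E, F, G, H, I, J, NCOMMITS };
#define NONE (-1)

static const int parents[NCOMMITS][2] = {
	[A] = { NONE, NONE }, [B] = { A, NONE }, [C] = { B, NONE },
	[D] = { C, J },		/* maint merges the fix */
	[E] = { D, NONE },
	[F] = { E, NONE },	/* tagged v3.4 */
	[G] = { B, NONE },
	[H] = { G, J },		/* master merges the fix */
	[I] = { H, NONE },	/* tagged v4.0 */
	[J] = { A, NONE },	/* the fix */
};

/* Fewest parent links from `from` back to `target`, or -1 if unreachable. */
static int hops(int from, int target)
{
	int dist[NCOMMITS], queue[NCOMMITS], head = 0, tail = 0;

	assert(from >= 0 && from < NCOMMITS);
	memset(dist, -1, sizeof(dist));
	dist[from] = 0;
	queue[tail++] = from;
	while (head < tail) {
		int c = queue[head++];

		if (c == target)
			return dist[c];
		for (int i = 0; i < 2; i++) {
			int p = parents[c][i];

			if (p != NONE && dist[p] < 0) {
				dist[p] = dist[c] + 1;
				queue[tail++] = p;
			}
		}
	}
	return -1;
}

/*
 * Copy into `out` the tags that contain `commit` and do not contain
 * another such tag; returns how many were kept.
 */
static int survivors(const int *tags, int ntags, int commit, int *out)
{
	int n = 0;

	for (int i = 0; i < ntags; i++) {
		int culled = 0;

		if (hops(tags[i], commit) < 0)
			continue;
		for (int j = 0; j < ntags; j++)
			if (j != i && hops(tags[j], commit) >= 0 &&
			    hops(tags[i], tags[j]) >= 0)
				culled = 1;
		if (!culled)
			out[n++] = tags[i];
	}
	return n;
}
```

Called with the tags { F, I } and commit J, survivors() keeps both, because
neither tag can reach the other. hops(I, J) is 2 and hops(F, J) is 3, so a
"closest" tie-break reports v4.0.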

So you can solve this by actually representing the relationship with a
merge. IOW, by merging v3.4 into v4.0 to say "yes, v4.0 is a superset".
And that's generally what we do in git.git, merging maint into master
periodically. But I imagine there are other possible workflows where
people do not do that "merge up", and the maint and master branches
diverge (and maybe they even cherry-pick from each other, but sometimes
merge if the fix can be based on a common ancestor, as in this case).

-Peff


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-17 22:16       ` Jeff King
@ 2014-04-18 16:26         ` Junio C Hamano
  2014-04-22  4:04         ` W. Trevor King
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2014-04-18 16:26 UTC (permalink / raw)
  To: Jeff King
  Cc: Luis R. Rodriguez, git, Luis R. Rodriguez, Jiri Slaby,
	Andreas Schwab, Jan Kara

Jeff King <peff@peff.net> writes:

>  ---A---B---C-----D---E---F (maint, v3.4)
>      \   \       /
>       \   ---G-----H---I (master, v4.0)
>        \       /  /
>         ------J---
>
> The fix is J, and it got merged up to maint at D, and to master at H.
> v4.0 does not contain v3.4. What's the best description of J?
>
> By the rules above, we hit the third rule "pick the closest". Which
> means we choose v3.4 or v4.0 based solely on how many commits are
> between the topic's merge and the tag release. Which has nothing at all
> to do with the topic itself.

Even if J..F and J..I were of the same hop-count, there is no
fundamental reason to choose one over the other.

What is "best" at that point depends on what the user wants to see.

 - Luis's case that started this thread may want to favor v3.4 if
   only because that "sounds" the smaller, even though v3.4 and v4.0
   in the illustration cannot be compared.

 - I think the "closest" we have had is primarily a heuristic to
   favour the result that is textually shorter.

 - And as I alluded to, "which one has the earliest timestamp?", is
   another valid question to ask.

In other words, there is no single "correct" answer, once you have
multiple candidates that are all valid from a topological point of
view.

> In this case we'd show v4.0 (because "J-H-I" is shorter than "J-D-E-F").
> But I suspect most users would want to know v3.4, because they want to
> know the "oldest" release they can move up to that contains the commit.
> But that notion of oldness is not conveyed by the graph above; it's only
> an artifact of the tag names.

Yes, exactly.


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-17 17:04     ` Junio C Hamano
  2014-04-17 22:16       ` Jeff King
@ 2014-04-18 23:17       ` Luis R. Rodriguez
  2014-04-18 23:36         ` Junio C Hamano
  2014-04-22 10:27       ` Jan Kara
  2 siblings, 1 reply; 16+ messages in thread
From: Luis R. Rodriguez @ 2014-04-18 23:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jiri Slaby, Andreas Schwab, Jan Kara, Jeff King

On Thu, Apr 17, 2014 at 10:04 AM, Junio C Hamano <gitster@pobox.com> wrote:
> "Luis R. Rodriguez" <mcgrof@do-not-panic.com> writes:
>
>>> And between v3.4 and v3.5-rc1, the latter is a closer anchor point
>>> for that commit (v3.5-rc1 only needs about 200 hops to reach the
>>> commit, while from v3.4 you would need close to 500 hops),
>>
>> Ah! Thanks for explaining this mysterious puzzle to me. I'm still a bit
>> perplexed as to why. Can I trouble you for a little elaboration here?

< Junio gives a great huge example>

Phew! Thanks for the elaborate explanation, this makes perfect sense now!

> Now, as to what *SHOULD* happen, I think the above exercise shows us
> a way to define what the desired semantics is, without resorting to
> heuristics (e.g. "which tag has older timestamp?" or "which tag's
> name sorts older under Linux version naming convention?").

I think ultimately this reveals that given that tags *can* be
arbitrary and subjective, and given that clocks can also be pretty much
arbitrary, 'git describe --contains' can and probably should only do
best effort (TM). Perhaps one thing that would help is documenting this
issue well and providing a set of best practices that are supported for
tagging schemes. I can't describe how many libraries I've reviewed
for software versioning schemes; most of them support a huge
array of things and, funnily enough, the Linux versioning scheme was not
supported well, even for something as simple as a version sort. This is
ultimately why I had to implement my own sort solution for rel-html. If
we agree on this we could, for example, take on the Linux
versioning scheme as an enum and document that well both in code and on a
wiki. More on this below.

With regards to timestamps: care must be taken, given that we'd be
assuming that clocks are synchronized. This can likely yield incorrect
results in a distributed development environment with different time
zones, and it can also be easily cheated, which is why I was concerned
about using timestamps. It's still certainly something that can be
considered, but I've heard enough rants from a few maintainers about
crazy dates on patches to make me believe this could actually be
an issue, especially if we speed up development and need a higher degree
of resolution.

I understand the above example, but it's perhaps worth mentioning that
Linux does not follow the above development model for merging stable fixes
or changes, though that does not prevent folks from branching off of
older tags to do development which Linux will then pull. Ingo's
case, I think, then points to another mild issue: the
commit was developed on a v3.3 based tag, which is why 'git describe
--first-parent c5905afb' yields v3.3-rc1-41-gc5905af and not v3.4.
That *can also* be a bit perplexing if one does not understand that the
example you provided can be used as a development work flow for
code sent out to Linus. That said, since we don't follow the
model you laid out, it still reveals another issue, and I am not yet
sure I understand why --contains yields a v3.5 tag in that case,
since we ensured commits in v3.5 were already piled up on older
releases, or were being newly introduced in its own release. It smells
to me like the commit's first parent (which can be anything) is somehow
used here as a shortcut?

This doesn't mean we can't use the work flow above for merging changes
from say a v3.4.x onto a v3.5 -- but we don't -- and perhaps as part
of the documentation about a scheme for Linux, we should advise
against such practices. In any case the closest thing I see we can use
upstream on Linux is 'git cherry-pick -x <commit-id>' but Greg doesn't
seem to use this and instead appends the commit with the respective
commit ID of the upstream gitsum. Both strategies yield different
commit IDs anyway, so neither practice should interrupt the 'git
describe --contains' practice. In the stable branches, to find out when
a commit was introduced, one would not rely on the commit ID on the
stable branch but instead on the commit ID of the 'upstream
reference'.

> Commit A can be described in terms of both v3.4 and v9.0,

And in the real example case, why *would* c5905afb be described in
terms of v3.5 instead of v3.4?

> and it may
> be closer to v9.0 than v3.4, and under that definition "we pick the
> closest tag", the current "describe --contains" behaviour may be
> correct, but from the human point of view, it is *WRONG*.

Yeap, if a development work flow does not follow a strict pattern
(maybe a .git/config variable?) perhaps 'git describe --contains'
should spit out the few tags it does have?

> It is wrong because v9.0 can reach v3.4.  So perhaps the rule should
> be updated to do something like:
>
>     - find candidate tags that can be used to "describe --contains"
>       the commit A, yielding v3.4, v3.5 (not shown), and v9.0;

Sure.

>
>     - among the candidate tags, cull the ones that contain another
>       candidate tag, rejecting v3.5 (not shown) and v9.0;

Sounds good to me but that seems to stick the output to a scheme, ie,
would it support schemes without a v prefix for tags? In other words,
perhaps do this only for Linux scheme?

>     - among the surviving tags, pick the closest.
>
> Hmm?

Sounds good to me!

  Luis


* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-18 23:17       ` Luis R. Rodriguez
@ 2014-04-18 23:36         ` Junio C Hamano
  2014-04-22  0:38           ` Luis R. Rodriguez
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2014-04-18 23:36 UTC (permalink / raw)
  To: Luis R. Rodriguez; +Cc: git, Jiri Slaby, Andreas Schwab, Jan Kara, Jeff King

"Luis R. Rodriguez" <mcgrof@do-not-panic.com> writes:

> I think ultimately this reveals that given that tags *can* be
> arbitrary and subjective,...

Yes; see the part at the bottom.

>> Commit A can be described in terms of both v3.4 and v9.0,
>
> And in the real example case, why *would* c5905afb be described in
> terms of v3.5 instead of v3.4?

I am not interested in graphing that particular history between v3.4
and v3.5 myself.  If you are interested, I already gave you enough
information on how to figure that out.

>>     - find candidate tags that can be used to "describe --contains"
>>       the commit A, yielding v3.4, v3.5 (not shown), and v9.0;
>
>>     - among the candidate tags, cull the ones that contain another
>>       candidate tag, rejecting v3.5 (not shown) and v9.0;
>
>>     - among the surviving tags, pick the closest.
>>
>> Hmm?
>
> Sounds good to me!

Not so fast ;-)

My other message to Peff, in response to another example of his, has
an updated position on this.  "Reject candidates that can reach other
candidates" is universally correct, but after that point, there are
at least three but probably more options that suit the preferences of
different people and projects to break ties:

 - Your case that started this thread may want to favor v3.4 if only
   because v3.4 _sounds_ smaller than v4.0 (in Peff's example),
   even when v3.4 and v4.0 do not have an ancestry relationship.

 - The "closest" we have had is a heuristic to produce a result that
   is textually shorter.

 - And as I alluded to, "which one has the earliest timestamp?", is
   another valid question to ask.

And there may be more to appear.  A new command line option (and
possibly a new configuration) to choose from these three (and more
heuristics that will be added later) would be necessary.
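
The culling rule quoted above ("reject candidates that can reach other
candidates") can be sketched on Peff's example graph. This is only an
illustration in Python, not git's implementation: the PARENTS map, the
TAGS map and the helper names are assumptions made for this sketch, and
the v3.5 and v9.0 tags in the usage note sit on hypothetical commits K
and L that are not part of Peff's graph.

```python
from typing import Dict, List, Set

# Peff's example graph from this thread, written as commit -> parents.
PARENTS: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "J": ["A"],
    "D": ["C", "J"], "E": ["D"], "F": ["E"],   # maint, tagged v3.4 at F
    "G": ["B"], "H": ["G", "J"], "I": ["H"],   # master, tagged v4.0 at I
}
TAGS: Dict[str, str] = {"v3.4": "F", "v4.0": "I"}


def ancestors(commit: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def surviving_tags(target: str, tags: Dict[str, str],
                   parents: Dict[str, List[str]]) -> List[str]:
    """Steps 1 and 2 of the proposed rule; tie-breaking is left open."""
    reach = {name: ancestors(tip, parents) for name, tip in tags.items()}
    # Step 1: candidate tags are the ones that contain the target commit.
    candidates = [name for name in tags if target in reach[name]]
    # Step 2: drop a candidate if it can reach another candidate's commit.
    survivors = [
        name for name in candidates
        if not any(tags[other] != tags[name] and tags[other] in reach[name]
                   for other in candidates)
    ]
    return sorted(survivors)
```

On Peff's graph both v3.4 and v4.0 survive for commit J, because
neither tag contains the other. Adding a hypothetical v3.5 on top of F
and a hypothetical v9.0 that merges both branches does not change the
result: both extra tags are culled, and the tie between v3.4 and v4.0
still has to be broken by one of the heuristics listed above.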

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-18 23:36         ` Junio C Hamano
@ 2014-04-22  0:38           ` Luis R. Rodriguez
  0 siblings, 0 replies; 16+ messages in thread
From: Luis R. Rodriguez @ 2014-04-22  0:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jiri Slaby, Andreas Schwab, Jan Kara, Jeff King

On Fri, Apr 18, 2014 at 4:36 PM, Junio C Hamano <gitster@pobox.com> wrote:
> "Luis R. Rodriguez" <mcgrof@do-not-panic.com> writes:
>
>> I think ultimately this reveals that given that tags *can* be
>> arbitrary and subjective,...
>
> Yes; see the part at the bottom.
>
>>> Commit A can be described in terms of both v3.4 and v9.0,
>>
>> And in the real example case, why *would* c5905afb' be be described in
>> terms of v3.5 instead of v3.4 ?
>
> I am not interested in graphing that particular history between v3.4
> and v3.5 myself.  If you are interested, I already gave you enough
> information on how to figure that out.

I was alluding to another possible issue here. My concern was that the
commit's parent (which is not really the point at which it was merged,
but rather where the topic got forked off to be worked on) could be
used as a reference point, but clearly it's not, given how name-rev
was implemented. I still see some possible issues with its parent in
other commands (but I haven't studied their implementations) that
reveal some of my original concerns, but it's unclear if they are
related. I also found that if we don't want to rely on dates or start
defining naming conventions, we may want to reconsider the recursive
name_rev() implementation. I'll illustrate a few results that might
help show my concerns, both about other commands perhaps using the
parent erroneously and about a possible alternative implementation
for name_rev(), or at the very least for --contains.

[0] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.5 | grep ^commit | wc -l
24878
[1] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.4 | grep ^commit | wc -l
13106
[2] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.3 | grep ^commit | wc -l
1360

Now that I have reviewed name-rev.c I see that the recursive
name_rev() works top down from each tag, down to each v* tag object,
and pegs a name on each actual commit. How we rule out each tag under
this implementation is not that obvious to me, especially when results
like [0] and [1] reveal that v3.4 should be 'shorter' in terms of the
number of commits. I see now that we don't update a commit's name even
when other crucial information, such as what was discussed in this
thread, might be important to the user. I can see how this can help,
but an alternative approach, which is what I expected to see
implemented at least for 'git describe --contains', would be to count
the commits from the commit's *merged* upstream parent (not the actual
parent, which in c5905afb's case is based on v3.3, which is not where
it got merged). Taking the smallest number of commits under this
logic, and stopping when we don't find any commits, should yield the
base tag under which the commit was merged, without any heuristics on
dates. This applies to Linux, though, given that we don't merge
commits onto stable branches but rather create new commits that
reference the upstream sha1sum, a practice which also solves the
problem Jeff pointed out.

The results for command [2] above are however a bit surprising. I'd
take a look, but I should go back to other work; I figured I'd at
least bring it up now as it seems relevant.

>>>     - find candidate tags that can be used to "describe --contains"
>>>       the commit A, yielding v3.4, v3.5 (not shown), and v9.0;
>>
>>>     - among the candidate tags, cull the ones that contain another
>>>       candidate tag, rejecting v3.5 (not shown) and v9.0;
>>
>>>     - among the surviving tags, pick the closest.
>>>
>>> Hmm?
>>
>> Sounds good to me!
>
> Not so fast ;-)
>
> My other message to Peff in response to his another example has an
> updated position on this.  "Reject candidates that can reach other
> candidates" is universally correct, but after that point, there are
> at least three but probably more options that suit preference of
> different people and project to break ties:
>
>  - Your case that started this thread may want to favor v3.4 if only
>    because that v3.4 _sounds_ smaller than v4.0 (in Peff's example),
>    even when v3.4 and v4.0 do not have ancestry relationship.
>
>  - The "closest" we have had is a heuristic to produce a result that
>    is textually shorter.
>
>  - And as I alluded to, "which one has the earliest timestamp?", is
>    another valid question to ask.

The first one above can be subjective if and only if the Linux
upstream model of dealing with stable branches is not followed. In
other words, I think it's a non-issue if you create new commits on the
stable branches instead of merging stuff onto them. This however is a
technical practice that I guess not everyone follows.

> And there may be more to appear.  A new command line option (and
> possibly a new configuration) to choose from these three (and more
> heuristics that will be added later) would be necessary.

Yeah, this is rather complex. The resolutions you've described seem
reasonable to me, but I do wonder if this can be simplified by
reevaluating how the candidates are considered. You'd know better :)

 Luis

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-17 22:16       ` Jeff King
  2014-04-18 16:26         ` Junio C Hamano
@ 2014-04-22  4:04         ` W. Trevor King
  1 sibling, 0 replies; 16+ messages in thread
From: W. Trevor King @ 2014-04-22  4:04 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Junio C Hamano, git, Jiri Slaby, Andreas Schwab, Jan Kara, Jeff King


On Mon, Apr 21, 2014 at 05:38:34PM -0700, Luis R. Rodriguez wrote:
> [0] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.5 | grep ^commit | wc -l
> 24878
> [1] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.4 | grep ^commit | wc -l
> 13106
> [2] mcgrof@ergon ~/linux (git::master)$ git log c5905afb..v3.3 | grep ^commit | wc -l
> 1360

From gitrevisions(7), r1..r2 is “commits that are reachable from r2
excluding those that are reachable from r1”.  Using Peff's example:

On Thu, Apr 17, 2014 at 06:16:20PM -0400, Jeff King wrote:
>  ---A---B---C-----D---E---F (maint, v3.4)
>      \   \       /
>       \   ---G-----H---I (master, v4.0)
>        \       /  /
>         ------J---
> 
> The fix is J, and it got merged up to maint at D, and to master at H.
> v4.0 does not contain v3.4. What's the best description of J?

J..v3.4 is going to include B, C, D, E and F.  However, the “distance”
used by ‘git describe’ uses the shortest path between the commits
(J-D-E-F), which doesn't care about development between A and D.
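
The difference between the two measures can be checked on Peff's graph
with a small Python sketch. The PARENTS map and the function names are
assumptions made for illustration; the plain hop count below is a
simplification of the "shortest path" idea and is not claimed to match
git's own distance bookkeeping.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Peff's example graph, written as commit -> parents.
PARENTS: Dict[str, List[str]] = {
    "A": [], "B": ["A"], "C": ["B"], "J": ["A"],
    "D": ["C", "J"], "E": ["D"], "F": ["E"],   # F is v3.4
    "G": ["B"], "H": ["G", "J"], "I": ["H"],   # I is v4.0
}


def reachable(commit: str) -> Set[str]:
    """All commits reachable from `commit` via parent links, itself included."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def range_commits(r1: str, r2: str) -> Set[str]:
    """r1..r2: reachable from r2, excluding what is reachable from r1."""
    return reachable(r2) - reachable(r1)


def shortest_path_length(tip: str, target: str) -> Optional[int]:
    """Fewest parent hops from `tip` down to `target`, or None if unreachable."""
    queue = deque([(tip, 0)])
    seen = {tip}
    while queue:
        current, hops = queue.popleft()
        if current == target:
            return hops
        for parent in PARENTS[current]:
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, hops + 1))
    return None
```

Here range_commits("J", "F") contains the five commits B, C, D, E and
F, while shortest_path_length("F", "J") is 3 (F to E to D to J). Extra
work on the A-B-C side of the history inflates the first number without
changing the second, which is why counting 'git log A..tag' output does
not predict which tag is considered closest.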

> The results for command [2] above however a bit surprising, I'd take a
> look but I should go back to look at other stuff, figured I'd at least
> bring it up now as it seems relevant.

Here's a simplified graph with d1-* tags for the v3.5-rc1~120^3~76^2
description and d2-* tags for the v3.4~479^2~9^2 description [1]:

  * f8f5701 (tag: v3.5-rc1) Linux 3.5-rc1
  * 912afc3 (tag: d1-F) Merge tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm
  *   56edab3 (tag: d1-E) Merge branches 'perf-urgent-for-linus' and 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
  |\  
  | * ab0cce5 (tag: d1-D) Revert "sched, perf: Use a single callback into the scheduler"
  | * 26252ea (tag: d1-C-1, tag: d1-C) perf evlist: Show event attribute details
  | *   a385ec4 (tag: d1-C-64) Merge tag 'v3.4-rc2' into perf/core
  | |\
  | * \ 659c36f (tag: d1-C-65) Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
  | |\ \
  | | * | 5a7ed29 (tag: d1-C-65-2) perf record: Use sw counter only if hw pmu is not detected
  * | |/  76e10d1 (tag: v3.4) Linux 3.4
  | |/|  
  |/| |  
  * |/ dd775ae (tag: v3.4-rc1) Linux 3.4-rc1
  |/|  
  * |  c5bc437 Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
  |\|  
  | * 9521d83 (tag: d1-C-66) Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
  * |   9c2b957 (tag: d2-E) Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
  |\ \  
  | |/  
  | * bea95c1 (tag: d2-D, tag: d1-C-67) Merge branch 'perf/hw-branch-sampling' into perf/core
  | * f9b4eeb (tag: d2-C, tag: d1-C-68) perf/x86: Prettify pmu config literals
  | * a706d4f (tag: d2-B, tag: d1-C-76, tag: d1-B) Merge branch 'perf/jump-labels' into perf/core
  | * c5905af (tag: A) static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
  * | c16fa4f (tag: v3.3) Linux 3.3
  |/  
  * dcd6c92 (tag: v3.3-rc1) Linux 3.3-rc1

This shows the v3.4-rc1 bypass from 9521d83 (d1-C-66) to 659c36f
(d1-C-65) which sets up the v3.5-rc1~120^3~76 description.  It also
shows the c5905afb..v3.3 commits on the branch from c5905af's fork
(between v3.3-rc1 and v3.3) and v3.3.

Cheers,
Trevor

[1]: The simplified graph is from:

  $ git tag A c5905afb
  $ git tag d1-B v3.5-rc1~120^3~76
  $ git tag d1-C v3.5-rc1~120^3~1
  $ git tag d1-D v3.5-rc1~120^3
  $ git tag d1-E v3.5-rc1~120
  $ git tag d1-F v3.5-rc1~1
  $ for x in $(seq 76); do git tag d1-C-$x v3.5-rc1~120^3~$x; done
  $ git tag d1-C-65-2 d1-C-65^2
  $ git tag d2-B v3.4~479^2~9
  $ git tag d2-C v3.4~479^2~1
  $ git tag d2-D v3.4~479^2
  $ git tag d2-E v3.4~479
  $ git tag -d sound-fixes sound-3.4 v3.3-rc{2,3,4,5,6,7} v3.4-rc{2,3,4,5,6,7}
  $ git log --graph --topo-order --oneline --decorate --simplify-by-decoration v3.5-rc1
  …simplified graph…
  $ git tag -d A d1-{B,C,D,E,F} d2-{B,C,D,E} d1-C-65-2 
  $ for x in $(seq 76); do git tag -d d1-C-$x; done

With some additional tweaks to cull the d1-C-* bits we don't care
about and clear up the 659c36f (d1-C-65) merge.

-- 
This email may be signed or encrypted with GnuPG (http://www.gnupg.org).
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-17 17:04     ` Junio C Hamano
  2014-04-17 22:16       ` Jeff King
  2014-04-18 23:17       ` Luis R. Rodriguez
@ 2014-04-22 10:27       ` Jan Kara
  2014-04-22 17:58         ` Junio C Hamano
  2 siblings, 1 reply; 16+ messages in thread
From: Jan Kara @ 2014-04-22 10:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Luis R. Rodriguez, git, Luis R. Rodriguez, Jiri Slaby,
	Andreas Schwab, Jan Kara, Jeff King

On Thu 17-04-14 10:04:52, Junio C Hamano wrote:
> So perhaps the rule should be updated to do something like:
> 
>     - find candidate tags that can be used to "describe --contains"
>       the commit A, yielding v3.4, v3.5 (not shown), and v9.0;
> 
>     - among the candidate tags, cull the ones that contain another
>       candidate tag, rejecting v3.5 (not shown) and v9.0;
>
>     - among the surviving tags, pick the closest.
  I guess all parties agree with the first two points (and actually I would
prefer not to assume anything about tag names and consider v3.4-rc1 as good
as v3.4). As for which tag to select when several remain after the first
two steps, I would prefer to output all of them. As people have mentioned
in this thread, what people want to see varies a lot between projects (and
in some cases I can imagine people really *want* to see all the tags). So
printing all such tags would let them select the desired tag with grep or
some more elaborate scripting...
Just a thought.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] tag: add -i and --introduced modifier for --contains
  2014-04-22 10:27       ` Jan Kara
@ 2014-04-22 17:58         ` Junio C Hamano
  0 siblings, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2014-04-22 17:58 UTC (permalink / raw)
  To: Jan Kara
  Cc: Luis R. Rodriguez, git, Luis R. Rodriguez, Jiri Slaby,
	Andreas Schwab, Jeff King

Jan Kara <jack@suse.cz> writes:

> On Thu 17-04-14 10:04:52, Junio C Hamano wrote:
>> So perhaps the rule should be updated to do something like:
>> 
>>     - find candidate tags that can be used to "describe --contains"
>>       the commit A, yielding v3.4, v3.5 (not shown), and v9.0;
>> 
>>     - among the candidate tags, cull the ones that contain another
>>       candidate tag, rejecting v3.5 (not shown) and v9.0;
>>
>>     - among the surviving tags, pick the closest.
> ...
> Regarding the strategy what to select when there are several
> remaining tags after first two steps I would prefer to output all such
> tags.

Yes, as I mentioned in another subthread ($gmane/246488), different
projects want different tie-breaking rules at the third step, and
your "show all to give more information to the user" could be
another mode of operation.

I offhand do not think the current name-rev machinery is set up to
compute your variant easily, though.
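
Independently of name-rev, the tie-breaking modes discussed in this
thread could be expressed as a selectable strategy over the surviving
candidates. The Python sketch below is purely illustrative: the
Candidate type, the mode names and the sample distances and timestamps
are all hypothetical, and nothing here corresponds to an existing git
option.

```python
import re
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    name: str        # tag name, e.g. "v3.4"
    distance: int    # hops from the tag down to the described commit
    timestamp: int   # tag creation time (hypothetical values below)


def version_key(name: str) -> Tuple[int, ...]:
    """Order tags by the numbers in their name, so v3.4 sorts before v4.0."""
    return tuple(int(part) for part in re.findall(r"\d+", name))


def break_tie(candidates: List[Candidate], mode: str) -> List[str]:
    """Apply one of the tie-breaking modes to the surviving candidates."""
    if mode == "all":              # print every surviving tag
        return sorted(c.name for c in candidates)
    if mode == "closest":          # shortest distance wins
        best = min(candidates, key=lambda c: (c.distance, c.name))
    elif mode == "earliest":       # oldest timestamp wins
        best = min(candidates, key=lambda c: (c.timestamp, c.name))
    elif mode == "lowest-version": # the name that "sounds" smaller wins
        best = min(candidates, key=lambda c: version_key(c.name))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return [best.name]


# Sample survivors for commit J in Peff's graph; timestamps are made up.
SURVIVORS = [Candidate("v3.4", 3, 200), Candidate("v4.0", 2, 100)]
```

With these sample values, "closest" and "earliest" both pick v4.0,
"lowest-version" picks v3.4, and "all" returns both names so that the
choice can be made downstream with grep or a script.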

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-04-22 17:58 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-04-16 20:58 [PATCH] tag: add -i and --introduced modifier for --contains Luis R. Rodriguez
2014-04-16 22:02 ` Junio C Hamano
2014-04-16 22:35   ` Luis R. Rodriguez
2014-04-17 17:04     ` Junio C Hamano
2014-04-17 22:16       ` Jeff King
2014-04-18 16:26         ` Junio C Hamano
2014-04-22  4:04         ` W. Trevor King
2014-04-18 23:17       ` Luis R. Rodriguez
2014-04-18 23:36         ` Junio C Hamano
2014-04-22  0:38           ` Luis R. Rodriguez
2014-04-22 10:27       ` Jan Kara
2014-04-22 17:58         ` Junio C Hamano
2014-04-17  7:17   ` Andreas Schwab
2014-04-17 17:05     ` Junio C Hamano
2014-04-17 17:30       ` Andreas Schwab
2014-04-17 18:49         ` Junio C Hamano
