All of lore.kernel.org
 help / color / mirror / Atom feed
* history damage in linux.git
@ 2016-04-21 11:30 Olaf Hering
  2016-04-21 12:10 ` Matthieu Moy
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Olaf Hering @ 2016-04-21 11:30 UTC (permalink / raw)
  To: git

To track the changes in hyperv related files I created some scripts
years ago to automate the process of finding relevant commits in
linux.git. Part of that process is to record the tag when a commit
appeared in mainline. This worked fine, until very recently.

Suddenly years-old commits are declared as having-just-arrived in
linux.git. Look at this example:

  $ git log --oneline -- drivers/input/serio/hyperv-keyboard.c
  2048157 Drivers: hv: vmbus: fix the building warning with hyperv-keyboard
  62238f3 Input: hyperv-keyboard - register as a wakeup source
  c3c4d99 Input: hyperv-keyboard - pass through 0xE1 prefix
  aed06b9 Input: add a driver to support Hyper-V synthetic keyboard
  $ git describe --contains aed06b9
  v4.6-rc1~9^2~792
  $ git show aed06b9 | head
  commit aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
  Author: K. Y. Srinivasan <kys@microsoft.com>
  Date:   Wed Sep 18 12:50:42 2013 -0700

Obviously that and other commits are in the tree since a very long time.

How can I find out whats going on? Is my git(1) 2.8.1 broken, or did
Linus just pull some junk tree (and does he continue to do so)?

Olaf

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 11:30 history damage in linux.git Olaf Hering
@ 2016-04-21 12:10 ` Matthieu Moy
  2016-04-21 12:32   ` Olaf Hering
  2016-04-21 13:19 ` John Keeping
  2016-04-21 13:24 ` Andreas Schwab
  2 siblings, 1 reply; 24+ messages in thread
From: Matthieu Moy @ 2016-04-21 12:10 UTC (permalink / raw)
  To: Olaf Hering; +Cc: git

Olaf Hering <olaf@aepfle.de> writes:

> To track the changes in hyperv related files I created some scripts
> years ago to automate the process of finding relevant commits in
> linux.git. Part of that process is to record the tag when a commit
> appeared in mainline. This worked fine, until very recently.
>
> Suddenly years-old commits are declared as having-just-arrived in
> linux.git. Look at this example:
>
>   $ git log --oneline -- drivers/input/serio/hyperv-keyboard.c
>   2048157 Drivers: hv: vmbus: fix the building warning with hyperv-keyboard
>   62238f3 Input: hyperv-keyboard - register as a wakeup source
>   c3c4d99 Input: hyperv-keyboard - pass through 0xE1 prefix
>   aed06b9 Input: add a driver to support Hyper-V synthetic keyboard
>   $ git describe --contains aed06b9
>   v4.6-rc1~9^2~792
>   $ git show aed06b9 | head
>   commit aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
>   Author: K. Y. Srinivasan <kys@microsoft.com>
>   Date:   Wed Sep 18 12:50:42 2013 -0700
>
> Obviously that and other commits are in the tree since a very long time.

I cannot take that conclusion from the excerpts above. What I can
conclude is that the commit was made a long time ago (according to
https://github.com/torvalds/linux/commit/aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
the commit was initially made on Sep 18 2013 by K. Y. Srinivasan, and
then applied by Dmitry Torokhov the day after to "some" tree.

My guess is that this commit has been sitting for a long time in a
repo outside Linus' linux.git, and got merged only recently.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 12:10 ` Matthieu Moy
@ 2016-04-21 12:32   ` Olaf Hering
  2016-04-21 12:51     ` Matthieu Moy
  0 siblings, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2016-04-21 12:32 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

On Thu, Apr 21, Matthieu Moy wrote:

> My guess is that this commit has been sitting for a long time in a
> repo outside Linus' linux.git, and got merged only recently.

Thats what it looks like. And thats what I'm complaining about. But in
fact that file is there since v3.13-rc7 (if the tag is really correct in
my patch file), since at least end of 2014.



And after some checking its in linux-3.13-rc1 from 2013-11-22, which
indicates that such damage has already happend earlier. At least I have
a record from Nov 2014 for the patch file. There is a slim chance that I
tweaked that particular tag menually in the patch file.

This is a list of changed tags in my patch collection:

 Date: Wed, 18 Sep 2013 12:50:42 -0700
-Patch-mainline: v3.13-rc7
+Patch-mainline: v4.6-rc1
 Subject: Input: add a driver to support Hyper-V synthetic keyboard
 Git-commit: aed06b9cfcabf8644ac5f6f108c0b3d01522f88b

 Date: Sun, 12 Jan 2014 11:09:14 -0800
-Patch-mainline: v3.14
+Patch-mainline: v4.6-rc1
 Subject: Input: hyperv-keyboard - pass through 0xE1 prefix
 Git-commit: c3c4d99485ea51cd354ed3cd955a8310703456b6

 Date: Wed, 6 Aug 2014 13:33:54 -0700
-Patch-mainline: v3.17-rc1
+Patch-mainline: v4.6-rc1
 Subject: Input: hyperv-keyboard - register as a wakeup source
 Git-commit: 62238f3aadc9bc56da70100e19ec61b9f8d72a5f

 Date: Mon, 30 Nov 2015 19:22:13 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: drivers/hv: replace enum hv_message_type by u32
 Git-commit: 7797dcf63f11b6e1d34822daf2317223d0f4ad46

 Date: Mon, 30 Nov 2015 19:22:14 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: drivers/hv: Move HV_SYNIC_STIMER_COUNT into Hyper-V UAPI x86 header
 Git-commit: 4f39bcfd1c132522380138a323f9af7902766301

 Date: Mon, 30 Nov 2015 19:22:15 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: drivers/hv: Move struct hv_message into UAPI Hyper-V x86 header
 Git-commit: 5b423efe11e822e092e8c911a6bad17eadf718eb

 Date: Mon, 30 Nov 2015 19:22:16 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: drivers/hv: Move struct hv_timer_message_payload into UAPI Hyper-V x86 header
 Git-commit: c71acc4c74dddebbbbeede69fdd4f0b1a124f9df

 Date: Mon, 30 Nov 2015 19:22:21 +0300
-Patch-mainline: v4.5-rc1
+Patch-mainline: v4.5-rc2
 Subject: kvm/x86: Hyper-V SynIC timers
 Git-commit: 1f4b34f825e8cef6f493d06b46605384785b3d16

 Date: Tue, 2 Feb 2016 11:45:02 +0800
-Patch-mainline: v4.6-rc1
+Patch-mainline: v4.6-rc2
 Subject: x86/cpu: Convert printk(KERN_<LEVEL> ...) to pr_<level>(...)
 Git-commit: 1b74dde7c47c19a73ea3e9fac95ac27b5d3d50c5

Olaf

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 12:32   ` Olaf Hering
@ 2016-04-21 12:51     ` Matthieu Moy
  0 siblings, 0 replies; 24+ messages in thread
From: Matthieu Moy @ 2016-04-21 12:51 UTC (permalink / raw)
  To: Olaf Hering; +Cc: git

Olaf Hering <olaf@aepfle.de> writes:

> On Thu, Apr 21, Matthieu Moy wrote:
>
>> My guess is that this commit has been sitting for a long time in a
>> repo outside Linus' linux.git, and got merged only recently.
>
> Thats what it looks like. And thats what I'm complaining about. But in
> fact that file is there since v3.13-rc7 (if the tag is really correct in
> my patch file), since at least end of 2014.

Ah, indeed. It was merged right before 3.13. See "git tag --contains
aed06b9cfcab".

It's indeed weird that "git describe --contains" gives a named based on
tag v4.6-rc1, but it is not really incorrect since tags/v4.6-rc1~9^2~792
is indeed a correct name for aed06b9cfcabf8644ac5f6f108c0b3d01522f88b,
and actually uses a tag that follows it.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 11:30 history damage in linux.git Olaf Hering
  2016-04-21 12:10 ` Matthieu Moy
@ 2016-04-21 13:19 ` John Keeping
  2016-04-21 15:54   ` Olaf Hering
  2016-04-21 13:24 ` Andreas Schwab
  2 siblings, 1 reply; 24+ messages in thread
From: John Keeping @ 2016-04-21 13:19 UTC (permalink / raw)
  To: Olaf Hering; +Cc: git

On Thu, Apr 21, 2016 at 01:30:04PM +0200, Olaf Hering wrote:
> To track the changes in hyperv related files I created some scripts
> years ago to automate the process of finding relevant commits in
> linux.git. Part of that process is to record the tag when a commit
> appeared in mainline. This worked fine, until very recently.
> 
> Suddenly years-old commits are declared as having-just-arrived in
> linux.git. Look at this example:
> 
>   $ git log --oneline -- drivers/input/serio/hyperv-keyboard.c
>   2048157 Drivers: hv: vmbus: fix the building warning with hyperv-keyboard
>   62238f3 Input: hyperv-keyboard - register as a wakeup source
>   c3c4d99 Input: hyperv-keyboard - pass through 0xE1 prefix
>   aed06b9 Input: add a driver to support Hyper-V synthetic keyboard
>   $ git describe --contains aed06b9
>   v4.6-rc1~9^2~792
>   $ git show aed06b9 | head
>   commit aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
>   Author: K. Y. Srinivasan <kys@microsoft.com>
>   Date:   Wed Sep 18 12:50:42 2013 -0700
> 
> Obviously that and other commits are in the tree since a very long time.
> 
> How can I find out whats going on? Is my git(1) 2.8.1 broken, or did
> Linus just pull some junk tree (and does he continue to do so)?

I suspect it indicates that an old tree was pulled in such that the path
to v4.6-rc1 is shorter than to the older version.  The commit is clearly
in v3.13-rc1:

	$ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
	v3.13
	v3.13-rc1
	v3.13-rc2
	[snip]

The behaviour of describe is a bit clearer if you limit it to v3.*:

	$ git describe --match='v3.*' --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
	v3.13-rc7~9^2~14^2~42

	$ git describe --match='v3.13-rc1' --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
	v3.13-rc1~65^2^2~42

It seems that the path to v4.6-rc1 is "more direct" than to either of
these commits: there is only one second-parent merge transition.

>From a quick look, I think the problem is in commit c155c7492c9a ("Merge
branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input")
which merges a branch that has repeatedly had master merged back into it
but does not build on any recent releases.  The most recent tag on the
first-parent history of that branch is v3.0-rc4.

I think it is as simple as git-describe (or git-name-rev which is used
in the --contains case) preferring a less branchy path, which has been
introduced in v4.6 with the merge commit above.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 11:30 history damage in linux.git Olaf Hering
  2016-04-21 12:10 ` Matthieu Moy
  2016-04-21 13:19 ` John Keeping
@ 2016-04-21 13:24 ` Andreas Schwab
  2016-04-21 16:36   ` Linus Torvalds
  2 siblings, 1 reply; 24+ messages in thread
From: Andreas Schwab @ 2016-04-21 13:24 UTC (permalink / raw)
  To: Olaf Hering; +Cc: git

Olaf Hering <olaf@aepfle.de> writes:

> How can I find out whats going on? Is my git(1) 2.8.1 broken, or did
> Linus just pull some junk tree (and does he continue to do so)?

The branches recently pulled by Linus contain commits with rather old
author dates, eg 8cad489261c564d4ee1db4de4ac365af56807d8a or
281baf7a702693deaa45c98ef0c5161006b48257.  That will probably cause git
describe --contains to take a different path through the history.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 13:19 ` John Keeping
@ 2016-04-21 15:54   ` Olaf Hering
  2016-04-21 16:36     ` Matthieu Moy
  0 siblings, 1 reply; 24+ messages in thread
From: Olaf Hering @ 2016-04-21 15:54 UTC (permalink / raw)
  To: John Keeping; +Cc: git

On Thu, Apr 21, John Keeping wrote:

> 	$ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b

Thanks for that, I did not know this variant.

Unless git does not do it for me, I may hackup my script like that to
find the earlierst tag:

 for i in `git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b`
 do
  git log --max-count=1 --oneline --pretty=%ct:%h:$i $i | cat
 done | sort -n | sed 's@^.*:@@;q'

Olaf

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 15:54   ` Olaf Hering
@ 2016-04-21 16:36     ` Matthieu Moy
  0 siblings, 0 replies; 24+ messages in thread
From: Matthieu Moy @ 2016-04-21 16:36 UTC (permalink / raw)
  To: Olaf Hering; +Cc: John Keeping, git

Olaf Hering <olaf@aepfle.de> writes:

> On Thu, Apr 21, John Keeping wrote:
>
>> 	$ git tag --contains aed06b9cfcabf8644ac5f6f108c0b3d01522f88b
>
> Thanks for that, I did not know this variant.
>
> Unless git does not do it for me, I may hackup my script like that to
> find the earlierst tag:

"git tag" has a --sort key, so you can sort on dates, and "| head -n 1".
See also "git for-each-ref" which is essentially a superset of what "git
tag" does, and for-each-ref is plumbing, so safer to use in scripts (we
have a strong tradition of backward compatibility for plumbing).

This relies on the dates recorded in the commits, which may be wrong
(typically if someone commited on a machine with an incorrect clock).
But hopefully not.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 13:24 ` Andreas Schwab
@ 2016-04-21 16:36   ` Linus Torvalds
  2016-04-21 16:59     ` Junio C Hamano
  2016-04-21 17:00     ` Linus Torvalds
  0 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2016-04-21 16:36 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 6:24 AM, Andreas Schwab <schwab@linux-m68k.org> wrote:
>
> The branches recently pulled by Linus contain commits with rather old
> author dates, eg 8cad489261c564d4ee1db4de4ac365af56807d8a or
> 281baf7a702693deaa45c98ef0c5161006b48257.  That will probably cause git
> describe --contains to take a different path through the history.

Nothing in git should look at author dates (except for the obvious
code that then shows the date).

The committer date is in fact used for the traversal heuristics for
some cases, but those are different and recent in the examples you
note.

This seems to be a git bug.

That commit aed06b9 can also be described as

    v3.13-rc7~9^2~14^2~42

so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.

However did git decide to use that further-away name? That looks odd.

I'm wondering if it's because of the new root commit we have, and that
confuses the distance calculation somehow? That came in somewhat
recently (mid March) with the GPIO pull.

       Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 16:36   ` Linus Torvalds
@ 2016-04-21 16:59     ` Junio C Hamano
  2016-04-21 17:08       ` Jeff King
  2016-04-21 17:00     ` Linus Torvalds
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2016-04-21 16:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Schwab, Olaf Hering, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> That commit aed06b9 can also be described as
>
>     v3.13-rc7~9^2~14^2~42
>
> so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.

I seem to recall that name-rev had an unexplained heuristics to
strongly avoid following second parent changes (I see two ^2 in the
path from 3.13-rc7 above).

> However did git decide to use that further-away name? That looks odd.
>
> I'm wondering if it's because of the new root commit we have, and that
> confuses the distance calculation somehow? That came in somewhat
> recently (mid March) with the GPIO pull.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 16:36   ` Linus Torvalds
  2016-04-21 16:59     ` Junio C Hamano
@ 2016-04-21 17:00     ` Linus Torvalds
  2016-04-21 17:23       ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2016-04-21 17:00 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 9:36 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This seems to be a git bug.
>
> That commit aed06b9 can also be described as
>
>     v3.13-rc7~9^2~14^2~42
>
> so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.

Hmm. I think I see what's up. The git distance function has a special
hack for preferring first-parent traversal, introduced long long ago
with commit ac076c29ae8d ("name-rev: Fix non-shortest description").

Changing that

  #define MERGE_TRAVERSAL_WEIGHT 65535

to be a smaller value makes git find the shorter path.

I do not know what the correct fix is, though.

              Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 16:59     ` Junio C Hamano
@ 2016-04-21 17:08       ` Jeff King
  2016-04-21 17:23         ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Jeff King @ 2016-04-21 17:08 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Linus Torvalds, Andreas Schwab, Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 09:59:13AM -0700, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > That commit aed06b9 can also be described as
> >
> >     v3.13-rc7~9^2~14^2~42
> >
> > so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.
> 
> I seem to recall that name-rev had an unexplained heuristics to
> strongly avoid following second parent changes (I see two ^2 in the
> path from 3.13-rc7 above).

Right, because it makes the names longer. We give the second-parent
traversal a heuristic cost. If we drop that cost to "1", like:

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 092e03c..03be8be 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -17,7 +17,7 @@ typedef struct rev_name {
 static long cutoff = LONG_MAX;
 
 /* How many generations are maximally preferred over _one_ merge traversal? */
-#define MERGE_TRAVERSAL_WEIGHT 65535
+#define MERGE_TRAVERSAL_WEIGHT 1
 
 static void name_rev(struct commit *commit,
 		const char *tip_name, int generation, int distance,


then this case gives:

  v3.13~5^2~4^2~2^2~1^2~42

which is technically true, but kind of painful to read. It may be that a
reasonable weight is somewhere between "1" and "65535", though.

However, I think the more fundamental confusion with git-describe is
that people expect the shortest distance to be the "first" tag that
contained the commit, and that is clearly not true in a branchy history.

I actually think most people would be happy with an algorithm more like:

  1. Find the "oldest" tag (either by timestamp, or by version-sorting
     the tags) that contains the commit in question.

  2. Find the "simplest" path from that tag to the commit, where we
     are striving mostly for shortness of explanation, not of path (so
     "~500" is way better than "~20^2~30^2~14", even though the latter
     is technically a shorter path).

-Peff

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:00     ` Linus Torvalds
@ 2016-04-21 17:23       ` Junio C Hamano
  2016-04-21 17:43         ` Linus Torvalds
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2016-04-21 17:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Schwab, Olaf Hering, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, Apr 21, 2016 at 9:36 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> This seems to be a git bug.
>>
>> That commit aed06b9 can also be described as
>>
>>     v3.13-rc7~9^2~14^2~42
>>
>> so describing it as 'v4.6-rc1~9^2~792' is clearly not closer in any way.
>
> Hmm. I think I see what's up. The git distance function has a special
> hack for preferring first-parent traversal, introduced long long ago
> with commit ac076c29ae8d ("name-rev: Fix non-shortest description").
>
> Changing that
>
>   #define MERGE_TRAVERSAL_WEIGHT 65535
>
> to be a smaller value makes git find the shorter path.
>
> I do not know what the correct fix is, though.

I think avoiding side branches to describe with the weight is a
right thing to do, i.e. if you have this history:

    X---o---o---o---o---v4.6
     \             /
      o-----------o

you do not want to explain X as "v4.6~^2~2", and instead you want it
as "v4.6~5", even though the former is 4 hops while the latter is 5
hops (which is longer).

But when comparing a name based on v4.6 (which I think the algorithm
with the weight heuristics would choose v4.6~5) and another name
based on v3.13, I suspect that we compare them with number of hops
with the weight heuristics, and that is what gives us a wrong
result, isn't it?

I think it should instead compare the number of true hops.

v3.13-rc7~9^2~14^2~42 = 9 + 1 + 14 + 1 + 42 = 67 hops from v3.13
v4.6-rc1~9^2~792 = 9 + 1 + 792 = 802 hops from v4.6-rc1

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:08       ` Jeff King
@ 2016-04-21 17:23         ` Linus Torvalds
  2016-04-21 17:44           ` Stefan Beller
  2016-04-21 18:05           ` Jeff King
  0 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2016-04-21 17:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Andreas Schwab, Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 10:08 AM, Jeff King <peff@peff.net> wrote:
>
> Right, because it makes the names longer. We give the second-parent
> traversal a heuristic cost. If we drop that cost to "1", like:

So I dropped it to 500 (removed the two last digits), and it gave a
reasonable answer. With 1000, it gave the same "based on 4.6" answer
as the current 65536 value does.

> which is technically true, but kind of painful to read. It may be that a
> reasonable weight is somewhere between "1" and "65535", though.

Based on my tests, the "right" number is somewhere in the 500-1000
range for this particular case. But it's still a completely made up
number.

> However, I think the more fundamental confusion with git-describe is
> that people expect the shortest distance to be the "first" tag that
> contained the commit, and that is clearly not true in a branchy history.

Yeah.

And I don't think people care *too* much, because I'm sure this has
happened before, it's just that before when it happened it wasn't
quite _so_ far off the expected path..

> I actually think most people would be happy with an algorithm more like:
>
>   1. Find the "oldest" tag (either by timestamp, or by version-sorting
>      the tags) that contains the commit in question.

Yes, we might want to base the "distance" at least partly on the age
of the base commits.

>   2. Find the "simplest" path from that tag to the commit, where we
>      are striving mostly for shortness of explanation, not of path (so
>      "~500" is way better than "~20^2~30^2~14", even though the latter
>      is technically a shorter path).

Well, so the three different paths I've seen are:

 - standard git (65536), and 1000:
   aed06b9 tags/v4.6-rc1~9^2~792

 - non-first-parent cost: 500:
   aed06b9 tags/v3.13-rc7~9^2~14^2~42

 - non-first parent cost: 1:
   aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42

so there clearly are multiple valid answers.

I would actually claim that the middle one is the best one - but I
claim that based on your algorithm case #1. The last one may be the
shortest actual path, but it's a shorter path to a newer tag that is a
superset of the older tag, so the middle one is actually not just
better based on age, but is a better choice based on "minimal actual
history".

               Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:23       ` Junio C Hamano
@ 2016-04-21 17:43         ` Linus Torvalds
  2016-04-21 17:59           ` Linus Torvalds
  2016-04-21 19:27           ` Junio C Hamano
  0 siblings, 2 replies; 24+ messages in thread
From: Linus Torvalds @ 2016-04-21 17:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andreas Schwab, Olaf Hering, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 2478 bytes --]

On Thu, Apr 21, 2016 at 10:23 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> I think avoiding side branches to describe with the weight is a
> right thing to do, i.e. if you have this history:
>
>     X---o---o---o---o---v4.6
>      \             /
>       o-----------o
>
> you do not want to explain X as "v4.6~^2~2", and instead you want it
> as "v4.6~5", even though the former is 4 hops while the latter is 5
> hops (which is longer).

Yes. I have a new suggestion: make the "merge weight" depend on how
complex the name ends up being.

And that algorithm actually gives a completely new and improved path:

   aed06b9 tags/v3.13-rc2~32^2^2~47

which is still complex, but is actually the best one I've found so far.

To compare against the previous ones I looked at:

   v4.6-rc1~9^2~792    <- current git code
   v3.13-rc2~32^2^2~47     <- new with attached patch
   v3.13-rc7~9^2~14^2~42     <- merge weight 500
   v3.13~5^2~4^2~2^2~1^2~42   <- merge weight 1

that new one is actually the second-most dense, and uses the oldest
tag. So it's a good combination of denseness, but still gets the best
actual base tag.

The logic of the patch is that there are different "complexities" in
the naming, and it's not just whether you are following a second
parent, it's also if you're doing generational hops.

So when you do a first-parent change, the name stays simple (the last
number changes), and that means that the "distance" weight is low (so
that's the normal "+1" we do for first-parent.

But if it's not a first parent, there are two different cases:

 - generation > 0: we add "~%d^%d", and we approximate that with "four
characters", and use a cost that is four orders of magnitude higher
(so 10000).

 - generation == 0: we add just "^%d", so generally just two
characters. So use a cost that is just two orders of magnitude higher
(so 100).

In other words, I'm trying to convince people that my patch not only
gives a good result, but that the "weight numbers" I use make some
kind of conceptual sense from a complexity cost angle.

With that, the patch itself is attached.

I think it's better than what "git name-rev" does now. Is it optimal?
No, I think the *optimal* would use some kind of "does one tag contain
the other" logic, and discarding all base names that are just
supersets of another base that still reaches the target.

But this patch is small and simple, and has some excuses for its
behavior. What do people think?

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 1360 bytes --]

 builtin/name-rev.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 092e03c3cc9b..0354c8d222e1 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -16,9 +16,6 @@ typedef struct rev_name {
 
 static long cutoff = LONG_MAX;
 
-/* How many generations are maximally preferred over _one_ merge traversal? */
-#define MERGE_TRAVERSAL_WEIGHT 65535
-
 static void name_rev(struct commit *commit,
 		const char *tip_name, int generation, int distance,
 		int deref)
@@ -55,19 +52,26 @@ copy_data:
 			parents;
 			parents = parents->next, parent_number++) {
 		if (parent_number > 1) {
+			int weight;
 			size_t len;
 			char *new_name;
 
 			strip_suffix(tip_name, "^0", &len);
-			if (generation > 0)
+
+			// The extra merge traversal "weight" depends
+			// on how complex the resulting name is.
+			if (generation > 0) {
+				weight = 10000;
 				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
 						   generation, parent_number);
-			else
+			} else {
+				weight = 100;
 				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
 						   parent_number);
+			}
 
 			name_rev(parents->item, new_name, 0,
-				distance + MERGE_TRAVERSAL_WEIGHT, 0);
+				distance + weight, 0);
 		} else {
 			name_rev(parents->item, tip_name, generation + 1,
 				distance + 1, 0);

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:23         ` Linus Torvalds
@ 2016-04-21 17:44           ` Stefan Beller
  2016-04-21 22:16             ` Junio C Hamano
  2016-04-21 18:05           ` Jeff King
  1 sibling, 1 reply; 24+ messages in thread
From: Stefan Beller @ 2016-04-21 17:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, Junio C Hamano, Andreas Schwab, Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 10:23 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Apr 21, 2016 at 10:08 AM, Jeff King <peff@peff.net> wrote:
>>
>> Right, because it makes the names longer. We give the second-parent
>> traversal a heuristic cost. If we drop that cost to "1", like:
>
> So I dropped it to 500 (removed the two last digits), and it gave a
> reasonable answer. With 1000, it gave the same "based on 4.6" answer
> as the current 65536 value does.
>
>> which is technically true, but kind of painful to read. It may be that a
>> reasonable weight is somewhere between "1" and "65535", though.
>
> Based on my tests, the "right" number is somewhere in the 500-1000
> range for this particular case. But it's still a completely made up
> number.
>
>> However, I think the more fundamental confusion with git-describe is
>> that people expect the shortest distance to be the "first" tag that
>> contained the commit, and that is clearly not true in a branchy history.
>
> Yeah.
>
> And I don't think people care *too* much, because I'm sure this has
> happened before, it's just that before when it happened it wasn't
> quite _so_ far off the expected path..
>
>> I actually think most people would be happy with an algorithm more like:
>>
>>   1. Find the "oldest" tag (either by timestamp, or by version-sorting
>>      the tags) that contains the commit in question.
>
> Yes, we might want to base the "distance" at least partly on the age
> of the base commits.
>
>>   2. Find the "simplest" path from that tag to the commit, where we
>>      are striving mostly for shortness of explanation, not of path (so
>>      "~500" is way better than "~20^2~30^2~14", even though the latter
>>      is technically a shorter path).
>
> Well, so the three different paths I've seen are:
>
>  - standard git (65536), and 1000:
>    aed06b9 tags/v4.6-rc1~9^2~792
>
>  - non-first-parent cost: 500:
>    aed06b9 tags/v3.13-rc7~9^2~14^2~42
>
>  - non-first parent cost: 1:
>    aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42
>
> so there clearly are multiple valid answers.
>
> I would actually claim that the middle one is the best one - but I
> claim that based on your algorithm case #1. The last one may be the
> shortest actual path, but it's a shorter path to a newer tag that is a
> superset of the older tag, so the middle one is actually not just
> better based on age, but is a better choice based on "minimal actual
> history".
>
>                Linus

Combining Junios and Linus idea:

* We want to have the minimal history, i.e. that tag with the fewest
cummulative parent commits. (i.e. v3.13-rc7 is better than v3.13
because `git log --oneline v3.13-rc7 |wc -l` (414317) is smaller tha
`git log --oneline v3.13 |wc -l` (414530).
The difference is 213.

tags/v3.13-rc7~9^2~14^2~42 has 9 + 14 + 42 additional steps (65)

tags/v3.13~5^2~4^2~2^2~1^2~42 has 5 + 4 + 2 + 1 +42 steps (54)

tags/v3.13~5^2~4^2~2^2~1^2~42 has 9 less steps, but its base tag
has a higher weight by 213.

v4.6-rc1 has even more weight (588477).

So I guess what I propose is to take the weight of a tag into account
via `git log --oneline <tag> |wc -l` as that gives the tag which encloses
least history?

We also do not want to have "a lot of side traversals", so we could
punish each additional addendum by a heuristic.


> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:43         ` Linus Torvalds
@ 2016-04-21 17:59           ` Linus Torvalds
  2016-04-21 18:09             ` Jeff King
  2016-04-21 19:27           ` Junio C Hamano
  1 sibling, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2016-04-21 17:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andreas Schwab, Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 10:43 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> In other words, I'm trying to convince people that my patch not only
> gives a good result, but that the "weight numbers" I use make some
> kind of conceptual sense from a complexity cost angle.

Basically, the patch approximates the numerical representation of the
distance measure with the complexity of the suffix.

It's not a *true* length of the suffix (which does heavily favor
first-parent use, kind of like the current code), but I think it's
better than the pretty random "+65535" that the current git code has.
That number is clearly just completely made up. The new numbers at
least have some kind of logic behind them.

And the current code obviously does give really bad results. Picking
the v4.6-rc1 tag as a base just because it happens to get a lot of
first-parent traversals (800!) from one second-parent commit that is
close to 4.6-rc1 is just obscene.

So the more I look at my patch, the more I go "it's a real improvement
on the current situation".

That said, I do think that a much bigger conceptual change that
actually does full traversal and be much more complicated might be the
only "correct" solution.

So my patch is just a "improve heuristics" small fixlet rather than
something optimal.

                 Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:23         ` Linus Torvalds
  2016-04-21 17:44           ` Stefan Beller
@ 2016-04-21 18:05           ` Jeff King
  2016-04-21 18:18             ` Linus Torvalds
  1 sibling, 1 reply; 24+ messages in thread
From: Jeff King @ 2016-04-21 18:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Andreas Schwab, Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 10:23:10AM -0700, Linus Torvalds wrote:

> > which is technically true, but kind of painful to read. It may be that a
> > reasonable weight is somewhere between "1" and "65535", though.
> 
> Based on my tests, the "right" number is somewhere in the 500-1000
> range for this particular case. But it's still a completely made up
> number.

Yeah, exactly. I think if we're going to tweak the weight heuristic it
would be worth choosing a random sample of commits throughout history
and seeing how they look with various weights.

> > However, I think the more fundamental confusion with git-describe is
> > that people expect the shortest distance to be the "first" tag that
> > contained the commit, and that is clearly not true in a branchy history.
> 
> Yeah.
> 
> And I don't think people care *too* much, because I'm sure this has
> happened before, it's just that before when it happened it wasn't
> quite _so_ far off the expected path..

I think about once a year somebody complains to the list that
git-describe chose a bad name. I don't know how many confused users it
takes to muster one complain to the list, though. ;)

> > I actually think most people would be happy with an algorithm more like:
> >
> >   1. Find the "oldest" tag (either by timestamp, or by version-sorting
> >      the tags) that contains the commit in question.
> 
> Yes, we might want to base the "distance" at least partly on the age
> of the base commits.

I had actually meant my (1) and (2) to be part of the same algorithm.
That is, to literally* do a two-pass check over the history, where first
we find the "best" tag, and then compute the distance from that tag. The
first concern trumps the latter completely.

* Where "literally" only means that's the conceptual model. We probably
  could do it in one pass if we're clever, but it would behave as if
  we made the two passes.

Another way to find the "oldest" tag that I didn't mention is to find
all containing tags, and then eliminate any that contain another tag
(similar to the way we cull duplicate merge bases). That gives you an
answer based on the topology, which is more robust than timestamps or
tag names. But it doesn't necessarily provide a single answer, so you'd
still have to break ties with timestamps or name-sorting.

> >   2. Find the "simplest" path from that tag to the commit, where we
> >      are striving mostly for shortness of explanation, not of path (so
> >      "~500" is way better than "~20^2~30^2~14", even though the latter
> >      is technically a shorter path).
> 
> Well, so the three different paths I've seen are:
> 
>  - standard git (65536), and 1000:
>    aed06b9 tags/v4.6-rc1~9^2~792
> 
>  - non-first-parent cost: 500:
>    aed06b9 tags/v3.13-rc7~9^2~14^2~42
> 
>  - non-first parent cost: 1:
>    aed06b9 tags/v3.13~5^2~4^2~2^2~1^2~42
> 
> so there clearly are multiple valid answers.
> 
> I would actually claim that the middle one is the best one - but I
> claim that based on your algorithm case #1. The last one may be the
> shortest actual path, but it's a shorter path to a newer tag that is a
> superset of the older tag, so the middle one is actually not just
> better based on age, but is a better choice based on "minimal actual
> history".

Yeah, I'd agree the middle one is the best one, because the other tags
contain -rc7. Finding the best tag is much more important than the path
distance, because that's the part that humans read and care about. The
rest is mostly machine-readable to find the exact commit, so we want any
path that's accurate, and not too cumbersome to look at or cut and
paste (and obviously shorter path is better, too).

I actually think the best name for aed06b9 is probably:

  v3.13-rc1~65^2^2~42

which I found by picking the oldest tag from "git tag --contains" and
plugging it into "git describe --match". Sadly, neither git's internal
version-sorting nor GNU's "sort -V" knows that "v1.0-rc1" comes before
"v1.0", so I had to rely on "--sort=taggerdate".

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:59           ` Linus Torvalds
@ 2016-04-21 18:09             ` Jeff King
  0 siblings, 0 replies; 24+ messages in thread
From: Jeff King @ 2016-04-21 18:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Andreas Schwab, Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 10:59:52AM -0700, Linus Torvalds wrote:

> That said, I do think that a much bigger conceptual change that
> actually does full traversal and be much more complicated might be the
> only "correct" solution.
> 
> So my patch is just a "improve heuristics" small fixlet rather than
> something optimal.

Yeah, I'd agree with both points. Unless somebody is planning to work on
the bigger change in the near future, your patch is a strict improvement
to the heuristic, and is worth applying.

-Peff

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 18:05           ` Jeff King
@ 2016-04-21 18:18             ` Linus Torvalds
  2016-04-22 13:38               ` Johannes Schindelin
  0 siblings, 1 reply; 24+ messages in thread
From: Linus Torvalds @ 2016-04-21 18:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Andreas Schwab, Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 11:05 AM, Jeff King <peff@peff.net> wrote:
>
> I actually think the best name for aed06b9 is probably:
>
>   v3.13-rc1~65^2^2~42

Sounds likely. I don't know how to find that best match without a
complete rewrite, though.

My recent patch that got

  v3.13-rc2~32^2^2~47

comes close to that, and the complexity is similar but the numbers are
actually smaller, so I guess my heuristic did indeed find a "simpler"
name, but yes, the one based on 3.13-rc1 would definitely be the
better one.

> which I found by picking the oldest tag from "git tag --contains" and
> plugging it into "git describe --match".

Yeah, so you basically did the "let's figure out minimal inclusion" by hand.

> Sadly, neither git's internal
> version-sorting nor GNU's "sort -V" knows that "v1.0-rc1" comes before
> "v1.0", so I had to rely on "--sort=taggerdate".

I'm not seeing the "sadly".

I think "--sort=taggerdate" is pretty much the only sane sort there is
for tags, unless you do a true and full topological one (ie sort based
on by how many commits that tag encompasses, but also by how each tag
contains another tag).

              Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:43         ` Linus Torvalds
  2016-04-21 17:59           ` Linus Torvalds
@ 2016-04-21 19:27           ` Junio C Hamano
  2016-04-21 19:43             ` Linus Torvalds
  1 sibling, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2016-04-21 19:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Andreas Schwab, Olaf Hering, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But this patch is small and simple, and has some excuses for its
> behavior. What do people think?

I like it that you call it "excuse" not "rationale", as I couldn't
form a logical connection between your "4 (2) letters" and "10000
(100)" at all ;-)

Modulo the usual style issues (e.g. we frown upon patches in
attachement that makes it harder to quote and comment), I think this
is a strict improvement and is a good measure until somebody does a
full "topologically closest" solution.

>                  Linus
>
>  builtin/name-rev.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 092e03c3cc9b..0354c8d222e1 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -16,9 +16,6 @@ typedef struct rev_name {
>  
>  static long cutoff = LONG_MAX;
>  
> -/* How many generations are maximally preferred over _one_ merge traversal? */
> -#define MERGE_TRAVERSAL_WEIGHT 65535
> -
>  static void name_rev(struct commit *commit,
>  		const char *tip_name, int generation, int distance,
>  		int deref)
> @@ -55,19 +52,26 @@ copy_data:
>  			parents;
>  			parents = parents->next, parent_number++) {
>  		if (parent_number > 1) {
> +			int weight;
>  			size_t len;
>  			char *new_name;
>  
>  			strip_suffix(tip_name, "^0", &len);
> -			if (generation > 0)
> +
> +			// The extra merge traversal "weight" depends
> +			// on how complex the resulting name is.
> +			if (generation > 0) {
> +				weight = 10000;
>  				new_name = xstrfmt("%.*s~%d^%d", (int)len, tip_name,
>  						   generation, parent_number);
> -			else
> +			} else {
> +				weight = 100;
>  				new_name = xstrfmt("%.*s^%d", (int)len, tip_name,
>  						   parent_number);
> +			}
>  
>  			name_rev(parents->item, new_name, 0,
> -				distance + MERGE_TRAVERSAL_WEIGHT, 0);
> +				distance + weight, 0);
>  		} else {
>  			name_rev(parents->item, tip_name, generation + 1,
>  				distance + 1, 0);

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 19:27           ` Junio C Hamano
@ 2016-04-21 19:43             ` Linus Torvalds
  0 siblings, 0 replies; 24+ messages in thread
From: Linus Torvalds @ 2016-04-21 19:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Andreas Schwab, Olaf Hering, Git Mailing List

On Thu, Apr 21, 2016 at 12:27 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> But this patch is small and simple, and has some excuses for its
>> behavior. What do people think?
>
> I like it that you call it "excuse" not "rationale", as I couldn't
> form a logical connection between your "4 (2) letters" and "10000
> (100)" at all ;-)

Think of the distance number as a "order of magnitude in complexity",
and it actually makes a certain amount of sense.

It's not the same as the length of the string, but the "log()" of the
distance number really does give a kind of complexity value.

Think of it this way: if things are entirely linear (all just first
parenthood), there will be just a single simple number, and the
relationship between the simple distance number (that just increments
by one for each parent traversed) and the length of the string that
describes it will really be "log10(distance)". That's literally how
many characters you need to describe the linear distance number.

So a simple linear distance of 'n' commits will need on the order of
'log10(n)' digits to describe it (ie a number around a thousand will
need around three digits).

The "100" and "10000" are just extending that notion of distance to
the more complex cases., and expresses their complexity in the same
logarithmic units. The same way you need four digits to express a
_linear_ distance of 10000, you need four characters to express that
"~n^p" case of "merge parent p, n generations back".

And if you don't have the generation thing, you only need two
characters to express parent #'p': "^p".

So two characters really *are* equivalent to ~100 linear steps, and
four characters really *are* equivalent to ~10000 linear steps.

So it's not _just_ an excuse. There's an actual rationale for picking
those numbers, and why they are equivalent in a complexity measure.

             Linus

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 17:44           ` Stefan Beller
@ 2016-04-21 22:16             ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2016-04-21 22:16 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Linus Torvalds, Jeff King, Andreas Schwab, Olaf Hering, Git Mailing List

Stefan Beller <sbeller@google.com> writes:

> Combining Junios and Linus idea:
>
> * We want to have the minimal history, i.e. that tag with the fewest
> cummulative parent commits. (i.e. v3.13-rc7 is better than v3.13
> because `git log --oneline v3.13-rc7 |wc -l` (414317) is smaller tha
> `git log --oneline v3.13 |wc -l` (414530).
> The difference is 213.
>
> tags/v3.13-rc7~9^2~14^2~42 has 9 + 14 + 42 additional steps (65)
>
> tags/v3.13~5^2~4^2~2^2~1^2~42 has 5 + 4 + 2 + 1 +42 steps (54)
>
> tags/v3.13~5^2~4^2~2^2~1^2~42 has 9 less steps, but its base tag
> has a higher weight by 213.
>
> v4.6-rc1 has even more weight (588477).
>
> So I guess what I propose is to take the weight of a tag into account
> via `git log --oneline <tag> |wc -l` as that gives the tag which encloses
> least history?
>
> We also do not want to have "a lot of side traversals", so we could
> punish each additional addendum by a heuristic.

These are essentially shooting for what Linus meant "something
optimal", contrasting his "improved heuristics" version, and it is
good that you are thinking about these issues.

It may be a bit tricky to think the "optimum description" in terms
of "how many commits are behind these tags", though.  One thing that
commonly happens is:

 (1) A fix X is developed on an ancient codebase, e.g. on top of
     v1.0.0 release.

 (2) X is first merged to the current 'master' and becomes part of
     the next release v2.0.0.

 (3) X is later merged to the maintenance track and included in
     v1.0.1.

There are a few questions you would want to ask "describe --contains
X" and depending on the reason why you are asking the question, the
desired answer may be different:

 - In which release did the fix X first appear?  (answer: v2.0.0)
 - What is the oldest release with the fix X?    (answer: v1.0.1)

I happen to be a maintainer who prefers to have a tidy containment
relationships among his releases, and after the above three steps,
there will be this:

 (4) Later v1.0.1 is merged back to 'master' branch and a tag v2.0.1
     on the maintenance track for v2.0.x series would contain it.

But not all projects are run that way.  Often older maintenance
tracks will never merge back to the current development track
(i.e. fixes are only ported-back).

If the v1.0.x codebase had an unrelated problem in the code that no
longer exists in v2.0.x codebase, the maintenance track may have
quite a many commits that exist only there and not in 'master', and
when (3) happens, the "weight", i.e. the commits behind the tag, of
v1.0.1 may be greater than the weight of v2.0.0 in such a project.

In other words, an algorithm that uses "how many commits are behind
the tag" as one of the deciding factor to name X would choose
between v1.0.1 and v2.0.0 depending on what other things that have
nothing to do with X happend on these parallel development tracks,
which may not be desirable.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: history damage in linux.git
  2016-04-21 18:18             ` Linus Torvalds
@ 2016-04-22 13:38               ` Johannes Schindelin
  0 siblings, 0 replies; 24+ messages in thread
From: Johannes Schindelin @ 2016-04-22 13:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, Junio C Hamano, Andreas Schwab, Olaf Hering, Git Mailing List

Hi Linus,

On Thu, 21 Apr 2016, Linus Torvalds wrote:

> On Thu, Apr 21, 2016 at 11:05 AM, Jeff King <peff@peff.net> wrote:
> >
> > Sadly, neither git's internal version-sorting nor GNU's "sort -V"
> > knows that "v1.0-rc1" comes before "v1.0", so I had to rely on
> > "--sort=taggerdate".
> 
> I'm not seeing the "sadly".
> 
> I think "--sort=taggerdate" is pretty much the only sane sort there is
> for tags, unless you do a true and full topological one (ie sort based
> on by how many commits that tag encompasses, but also by how each tag
> contains another tag).

Turns out it is pretty easy to implement this in name-rev. Expect the
patch in your inbox in a minute.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-04-22 13:39 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-04-21 11:30 history damage in linux.git Olaf Hering
2016-04-21 12:10 ` Matthieu Moy
2016-04-21 12:32   ` Olaf Hering
2016-04-21 12:51     ` Matthieu Moy
2016-04-21 13:19 ` John Keeping
2016-04-21 15:54   ` Olaf Hering
2016-04-21 16:36     ` Matthieu Moy
2016-04-21 13:24 ` Andreas Schwab
2016-04-21 16:36   ` Linus Torvalds
2016-04-21 16:59     ` Junio C Hamano
2016-04-21 17:08       ` Jeff King
2016-04-21 17:23         ` Linus Torvalds
2016-04-21 17:44           ` Stefan Beller
2016-04-21 22:16             ` Junio C Hamano
2016-04-21 18:05           ` Jeff King
2016-04-21 18:18             ` Linus Torvalds
2016-04-22 13:38               ` Johannes Schindelin
2016-04-21 17:00     ` Linus Torvalds
2016-04-21 17:23       ` Junio C Hamano
2016-04-21 17:43         ` Linus Torvalds
2016-04-21 17:59           ` Linus Torvalds
2016-04-21 18:09             ` Jeff King
2016-04-21 19:27           ` Junio C Hamano
2016-04-21 19:43             ` Linus Torvalds

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.