All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Boeckel <ben.boeckel@kitware.com>
To: git@vger.kernel.org
Subject: [BUG] `git describe` doesn't traverse the graph in topological order
Date: Sat, 12 Aug 2023 15:36:56 -0400	[thread overview]
Message-ID: <ZNffWAgldUZdpQcr@farprobe> (raw)

[-- Attachment #1: Type: text/plain, Size: 4182 bytes --]

Hi,

I found an issue where `git describe` doesn't find a "closer" tag than
another tag as the correct one to base the description off of. I have a
reproducer, but I'll first give details of the real world issue.

Repository: https://gitlab.kitware.com/vtk/vtk.git
`master` as of: dedf87b3a1b7e5be5d8cdb46b37ad3030590b8ac

$ git rev-parse HEAD
dedf87b3a1b7e5be5d8cdb46b37ad3030590b8ac
$ git rev-parse HEAD~2^2
da2482f716310fc59ac4be42ce977f6badc6af95
$ git rev-prase v9.3.0.rc1
f150d52568f4e00aa9c8b1568a521a08ded8d4cb
$ git rev-prase v9.3.0.rc1^{commit}
da2482f716310fc59ac4be42ce977f6badc6af95
$ git rev-parse HEAD~2^2~2
0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5
$ git rev-prase v9.3.0.rc0
e5e13b14629d445bf65d5f8a181920ed9b97d54c
$ git rev-prase v9.3.0.rc0^{commit}
0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5
$ git describe HEAD
v9.3.0.rc0-56-gdedf87b3a1
$ git describe --matches v9.3.0.rc1 HEAD
v9.3.0.rc1-86876-gdedf87b3a1

As you can see:

- v9.3.0.rc1 is "closer" to `HEAD` than v9.3.0.rc0 (created as a
  workaround for this bug; v9.2.6 is otherwise reported)
- v9.3.0.rc0 is an ancestor of v9.3.0.rc1
- Both v9.3.0.rc0 and v9.3.0.rc1 are ancestors of `HEAD`
- `git describe` reports that `HEAD` is "closest" to v9.3.0.rc0
- Forcing the issue and asking for v9.3.0.rc1 shows that it thinks there
  are almost 87000 commits somehow not on that commit.

I have a reproducer script attached. It reproduces back to 2.9.0 and
probably before. 2.8.0 didn't support the structure hiding that newer
OpenSSL 1.1 has done and given that it's at least that old, I don't
think it matters too much for backporting or anything like that.

I instrumented `git describe` with some `printf` debugging (diff
attached) and found out that the commit traversal is not happening in
topological order. I suspect that this is the root cause of the issue:

    looking at commit 0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5
    depth of 0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5: 16
    find order of 0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5: 2
    setting flag 4 for commit 0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5
    flag for 0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5: 5
    pushing depth of da2482f716310fc59ac4be42ce977f6badc6af95 because of 0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5 (flag_within): 16
    setting flag 5 for commit d478d2e22e81ee2602035fe1d731b402d9b4eda7 due to ancestry

    looking at commit 1c3d839dac92761ae0866e23d89bdc8ee690de08
    depth of 1c3d839dac92761ae0866e23d89bdc8ee690de08: 17
    find order of 1c3d839dac92761ae0866e23d89bdc8ee690de08: 3
    setting flag 8 for commit 1c3d839dac92761ae0866e23d89bdc8ee690de08
    flag for 1c3d839dac92761ae0866e23d89bdc8ee690de08: b
    pushing depth of 0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5 because of 1c3d839dac92761ae0866e23d89bdc8ee690de08 (flag_within): 17
    setting flag f for commit 0a77d7cf4fdbf489ee5d38c6fec6517574cdaeb5 due to ancestry

It looks at 0a77d7cf4 before it looks at 1c3d839da (which is
v9.3.0.rc1~), but 1c3d839da~ *is* 0a77d7cf4. Because 0a77d7cf4 has
already passed on its presence flags to its parent(s), the update
performed when processing 1c3d839da has no effect. Therefore the
"entire" history is not seen as being reachable from da2482f71 and it
ends up not being the best match.

I will note that the authorship date of 1c3d839da is before that of
either 0a77d7cf4 or da2482f71 (due to a rebase that reordered the
commits to keep 1c3d839da on the release-only part of the branch), but
the reproducer script doesn't seem to care that much.

I suspect that building of the `commit_list` is the problem, probably by
using `commit_list_insert_by_date` instead of by topological sorting.
The reproducer script doesn't do anything (AFAICT) sneaky with dates
(e.g., rebasing and such) though, so I'm nowhere near 100% confident
about that.

Perhaps commits should be re-scheduled if their `flags` get updated
based on a newly discovered ancestor while traversing? I suspect that
depth tracking becomes more complicated in that case though because the
second pass on 0a77d7cf4 needs to subtract a depth from the relevant
tags with the new flag value. But it'd find the right tag at least…

Thanks,

--Ben

[-- Attachment #2: repro-git-describe.sh --]
[-- Type: application/x-sh, Size: 2100 bytes --]

[-- Attachment #3: describe-trace.diff --]
[-- Type: text/plain, Size: 2825 bytes --]

diff --git a/builtin/describe.c b/builtin/describe.c
index b28a4a1f82..5895d1af3a 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -264,8 +264,10 @@ static unsigned long finish_depth_computation(
 			}
 			if (!a)
 				break;
-		} else
+		} else {
 			best->depth++;
+			fprintf(stderr, "pushing depth of %s (finish_depth_computation): %d\n", oid_to_hex(&c->object.oid), best->depth);
+		}
 		while (parents) {
 			struct commit *p = parents->item;
 			repo_parse_commit(the_repository, p);
@@ -363,19 +365,24 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		struct commit_list *parents = c->parents;
 		struct commit_name **slot;
 
+		fprintf(stderr, "\n\nlooking at commit %s\n", oid_to_hex(&c->object.oid));
 		seen_commits++;
 		slot = commit_names_peek(&commit_names, c);
 		n = slot ? *slot : NULL;
 		if (n) {
 			if (!tags && !all && n->prio < 2) {
+				fprintf(stderr, "skipping unannotated tag %s\n", oid_to_hex(&c->object.oid));
 				unannotated_cnt++;
 			} else if (match_cnt < max_candidates) {
 				struct possible_tag *t = &all_matches[match_cnt++];
 				t->name = n;
 				t->depth = seen_commits - 1;
+				fprintf(stderr, "depth of %s: %d\n", oid_to_hex(&c->object.oid), t->depth);
 				t->flag_within = 1u << match_cnt;
 				t->found_order = match_cnt;
+				fprintf(stderr, "find order of %s: %d\n", oid_to_hex(&c->object.oid), t->found_order);
 				c->object.flags |= t->flag_within;
+				fprintf(stderr, "setting flag %x for commit %s\n", t->flag_within, oid_to_hex(&c->object.oid));
 				if (n->prio == 2)
 					annotated_cnt++;
 			}
@@ -386,11 +393,15 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		}
 		for (cur_match = 0; cur_match < match_cnt; cur_match++) {
 			struct possible_tag *t = &all_matches[cur_match];
-			if (!(c->object.flags & t->flag_within))
+			if (!(c->object.flags & t->flag_within)) {
 				t->depth++;
+				fprintf(stderr, "flag for %s: %x\n", oid_to_hex(&c->object.oid), c->object.flags);
+				fprintf(stderr, "pushing depth of %s because of %s (flag_within): %d\n", oid_to_hex(&t->name->peeled), oid_to_hex(&c->object.oid), t->depth);
+			}
 		}
 		/* Stop if last remaining path already covered by best candidate(s) */
 		if (annotated_cnt && !list) {
+			fprintf(stderr, "checking for best candidate\n");
 			int best_depth = INT_MAX;
 			unsigned best_within = 0;
 			for (cur_match = 0; cur_match < match_cnt; cur_match++) {
@@ -415,6 +426,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 			if (!(p->object.flags & SEEN))
 				commit_list_insert_by_date(p, &list);
 			p->object.flags |= c->object.flags;
+			fprintf(stderr, "setting flag %x for commit %s due to ancestry\n", p->object.flags, oid_to_hex(&p->object.oid));
 			parents = parents->next;
 
 			if (first_parent)

             reply	other threads:[~2023-08-12 19:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-12 19:36 Ben Boeckel [this message]
2023-09-22 15:39 ` [BUG] `git describe` doesn't traverse the graph in topological order Ben Boeckel
2023-09-22 16:13   ` rsbecker
2023-09-22 16:51     ` 'Ben Boeckel'
2023-09-22 17:14       ` rsbecker
2023-09-22 17:38         ` 'Ben Boeckel'
2023-09-22 17:51         ` Junio C Hamano
2023-09-22 18:12           ` rsbecker
2023-09-22 18:44             ` 'Ben Boeckel'
2023-09-22 18:49               ` rsbecker
2023-09-22 19:05                 ` 'Ben Boeckel'
2023-09-22 19:27                   ` rsbecker
2023-09-22 18:41           ` 'Ben Boeckel'
2023-09-23 12:32         ` 'Ben Boeckel'
2023-09-22 17:11     ` Kristoffer Haugsbakk
2023-09-22 17:35   ` Kristoffer Haugsbakk
2023-09-22 17:43     ` 'Ben Boeckel'

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZNffWAgldUZdpQcr@farprobe \
    --to=ben.boeckel@kitware.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.