All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Jacob Keller <jacob.e.keller@intel.com>, git@vger.kernel.org
Cc: Junio C Hamano <gitster@pobox.com>,
	Jacob Keller <jacob.keller@gmail.com>
Subject: Re: [PATCH] name-rev: use generation numbers if available
Date: Mon, 28 Feb 2022 14:50:55 -0500	[thread overview]
Message-ID: <e4096124-e566-0842-f17c-366645c3e37c@github.com> (raw)
In-Reply-To: <20220228190738.2112503-1-jacob.e.keller@intel.com>

On 2/28/2022 2:07 PM, Jacob Keller wrote:
> From: Jacob Keller <jacob.keller@gmail.com>
> 
> If a commit in a sequence of linear history has a non-monotonically
> increasing commit timestamp, git name-rev might not properly name the
> commit.
> 
> This occurs because name-rev uses a heuristic of the commit date to
> avoid searching down tags which lead to commits that are older than the
> named commit. This is intended to avoid work on larger repositories.
> 
> This heuristic impacts git name-rev, and by extension git describe
> --contains which is built on top of name-rev.
> 
> Further more, if --annotate-stdin is used, the heuristic is not enabled
> because the full history has to be analyzed anyways. This results in
> some confusion if a user sees that --annotate-stdin works but a normal
> name-rev does not.
> 
> If the repository has a commit graph, we can use the generation numbers
> instead of using the commit dates. This is essentially the same check
> except that generation numbers make it exact, where the commit date
> heuristic could be incorrect due to clock errors.
> 
> Add a test case which covers this behavior and shows how the commit
> graph makes the name-rev process work.
> 
> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> ---
> The initial implementation of this came from [1]. Should this have Stolee's
> sign-off?
> 
> [1]: https://lore.kernel.org/git/42d2a9fe-a3f2-f841-dcd1-27a0440521b6@github.com/

I think your implementation is sufficiently different (and better)
that you don't need my co-authorship _or_ sign-off.

> +static void set_commit_cutoff(struct commit *commit)
> +{
> +	timestamp_t generation;
> +
> +	if (cutoff > commit->date)
> +		cutoff = commit->date;
> +
> +	generation = commit_graph_generation(commit);
> +	if (generation_cutoff > generation)
> +		generation_cutoff = generation;
> +}

I appreciate that you split this out into its own method to isolate
the logic.

> +/* Check if a commit is before the cutoff. Prioritize generation numbers
> + * first, but use the commit timestamp if we lack generation data.
> + */
> +static int commit_is_before_cutoff(struct commit *commit)
> +{
> +	if (generation_cutoff < GENERATION_NUMBER_INFINITY)
> +		return commit_graph_generation(commit) < generation_cutoff;
> +
> +	return commit->date < cutoff;
> +}

There are two subtle things going on here when generation_cutoff is
zero:

1. In a commit-graph with topological levels _or_ generation numbers v2,
   commit_graph_generation(commit) will always be positive, so we don't
   need to do the lookup.

2. If the commit-graph was written by an older Git version before
   topological levels were implemented, then the generation of commits
   in the commit-graph are all zero(!). This means that the logic here
   would be incorrect for the "all" case.

The fix for both cases is to return 1 if generation_cutoff is zero:

	if (!generaton_cutoff)
		return 1;
> -	if (start_commit->date < cutoff)
> +	if (commit_is_before_cutoff(start_commit))

> -			if (parent->date < cutoff)
> +			if (commit_is_before_cutoff(parent))

Nice replacements.

> -	if (all || annotate_stdin)
> +	if (all || annotate_stdin) {
> +		generation_cutoff = 0;
>  		cutoff = 0;
> +	}

Good.

> -		if (commit) {
> -			if (cutoff > commit->date)
> -				cutoff = commit->date;
> -		}
> +		if (commit)
> +			set_commit_cutoff(commit);

Another nice replacement.

> +# A-B-C-D-E-main
> +#
> +# Where C has a non-monotonically increasing commit timestamp w.r.t. other
> +# commits
> +test_expect_success 'non-monotonic commit dates setup' '
> +	UNIX_EPOCH_ZERO="@0 +0000" &&
> +	git init non-monotonic &&
> +	test_commit -C non-monotonic A &&
> +	test_commit -C non-monotonic --no-tag B &&
> +	test_commit -C non-monotonic --no-tag --date "$UNIX_EPOCH_ZERO" C &&
> +	test_commit -C non-monotonic D &&
> +	test_commit -C non-monotonic E
> +'
> +
> +test_expect_success 'name-rev with commitGraph handles non-monotonic timestamps' '
> +	test_config -C non-monotonic core.commitGraph true &&
> +	(
> +		cd non-monotonic &&
> +
> +		# Ensure commit graph is up to date
> +		git -c gc.writeCommitGraph=true gc &&

"git commit-graph write --reachable" would suffice here.


> +
> +		echo "main~3 tags/D~2" >expect &&
> +		git name-rev --tags main~3 >actual &&
> +
> +		test_cmp expect actual
> +	)
> +'
> +
> +test_expect_success 'name-rev --all works with non-monotonic' '

This is working because of the commit-graph, right? We still have
it from the previous test, so we aren't testing that this works
when we only have the commit date as a cutoff.

> +	(
> +		cd non-monotonic &&
> +
> +		cat >expect <<-\EOF &&
> +		E
> +		D
> +		D~1
> +		D~2
> +		A
> +		EOF
> +
> +		git log --pretty=%H >revs &&
> +		git name-rev --tags --annotate-stdin --name-only <revs >actual &&
> +
> +		test_cmp expect actual
> +	)

Do you want to include a test showing the "expected" behavior
when there isn't a commit-graph file? You might need to delete
an existing commit-graph (it will exist in the case of
GIT_TEST_COMMIT_GRAPH=1).

I also see that you intended to test the "--all" option, which
is not included in your test. That's probably the real key to
getting this test to work correctly. Deleting the graph will
probably cause a failure on this test unless "--all" is added.

Thanks,
-Stolee


  reply	other threads:[~2022-02-28 19:51 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-28 19:07 [PATCH] name-rev: use generation numbers if available Jacob Keller
2022-02-28 19:50 ` Derrick Stolee [this message]
2022-02-28 20:20   ` Keller, Jacob E
2022-02-28 20:24     ` Derrick Stolee
2022-02-28 20:59       ` Keller, Jacob E
2022-02-28 21:50 [PATCH v2 0/1] " Jacob Keller
2022-02-28 21:50 ` [PATCH] " Jacob Keller
2022-03-01  2:36   ` Junio C Hamano
2022-03-01  7:08     ` Jacob Keller
2022-03-01  7:09       ` Jacob Keller
2022-03-01  7:33       ` Junio C Hamano
2022-03-01 15:09         ` Derrick Stolee
2022-03-01 19:52           ` Keller, Jacob E
2022-03-01 19:56             ` Derrick Stolee
2022-03-01 20:22               ` Junio C Hamano
2022-03-01 22:46                 ` Keller, Jacob E
2022-03-03  1:10                   ` Junio C Hamano
2022-03-07 20:22                     ` Jacob Keller
2022-03-07 20:26                       ` Derrick Stolee
2022-03-07 22:30                         ` Keller, Jacob E
2022-03-07 22:43                           ` Derrick Stolee
2022-03-07 22:52                           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e4096124-e566-0842-f17c-366645c3e37c@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jacob.e.keller@intel.com \
    --cc=jacob.keller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.