All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common()
Date: Tue, 1 May 2018 07:47:01 -0400	[thread overview]
Message-ID: <3e3ae7e4-9efd-3046-c0c2-7b3bf12d0d81@gmail.com> (raw)
In-Reply-To: <864ljsgwx1.fsf@gmail.com>

On 4/30/2018 6:19 PM, Jakub Narebski wrote:
> Derrick Stolee <dstolee@microsoft.com> writes:
>
>> When running 'git branch --contains', the in_merge_bases_many()
>> method calls paint_down_to_common() to discover if a specific
>> commit is reachable from a set of branches. Commits with lower
>> generation number are not needed to correctly answer the
>> containment query of in_merge_bases_many().
>>
>> Add a new parameter, min_generation, to paint_down_to_common() that
>> prevents walking commits with generation number strictly less than
>> min_generation. If 0 is given, then there is no functional change.
> This is thanks to the fact that generation numbers start at zero (as
> special case, though, with _ZERO), and we use strict inequality to avoid
> handling _ZERO etc. in a special way.
>
> As you wrote in response in previous version of this series, because
> paint_down_to_common() is file-local, there is no need to come up with
> symbolic name for GENERATION_NO_CUTOFF case.
>
> All right.
>
>> For in_merge_bases_many(), we can pass commit->generation as the
>> cutoff, and this saves time during 'git branch --contains' queries
>> that would otherwise walk "around" the commit we are inspecting.
> All right, and when using paint_down_to_common() to actually find merge
> bases, and not only check for containment, we cannot use cutoff.
> Therefore at least one call site needs to run it without functional
> change... which we can do.  Good.
>
>> For a copy of the Linux repository, where HEAD is checked out at
>> v4.13~100, we get the following performance improvement for
>> 'git branch --contains' over the previous commit:
>>
>> Before: 0.21s
>> After:  0.13s
>> Rel %: -38%
> Nice.
>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   commit.c | 18 ++++++++++++++----
>>   1 file changed, 14 insertions(+), 4 deletions(-)
> Let me reorder chunks a bit to make it easier to review.
>
>> diff --git a/commit.c b/commit.c
>> index 7bb007f56a..e2e16ea1a7 100644
>> --- a/commit.c
>> +++ b/commit.c
>> @@ -1070,7 +1080,7 @@ int in_merge_bases_many(struct commit *commit, int nr_reference, struct commit *
>>   	if (commit->generation > min_generation)
>>   		return ret;
>>   
>> -	bases = paint_down_to_common(commit, nr_reference, reference);
>> +	bases = paint_down_to_common(commit, nr_reference, reference, commit->generation);
>>   	if (commit->object.flags & PARENT2)
>>   		ret = 1;
>>   	clear_commit_marks(commit, all_flags);
>> @@ -808,11 +808,14 @@ static int queue_has_nonstale(struct prio_queue *queue)
>>   }
>>   
>>   /* all input commits in one and twos[] must have been parsed! */
>> -static struct commit_list *paint_down_to_common(struct commit *one, int n, struct commit **twos)
>> +static struct commit_list *paint_down_to_common(struct commit *one, int n,
>> +						struct commit **twos,
>> +						int min_generation)
>>   {
>>   	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
>>   	struct commit_list *result = NULL;
>>   	int i;
>> +	uint32_t last_gen = GENERATION_NUMBER_INFINITY;
>>   
>>   	one->object.flags |= PARENT1;
>>   	if (!n) {
>> @@ -831,6 +834,13 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n, struc
>>   		struct commit_list *parents;
>>   		int flags;
>>   
>> +		if (commit->generation > last_gen)
>> +			BUG("bad generation skip");
>> +		last_gen = commit->generation;
> Shouldn't we provide more information about where the problem is to the
> user, to make it easier to debug the repository / commit-graph data?
>
> Good to have this sanity check here.

This BUG() _should_ only be seen by developers who add callers which do 
not load commits from the commit-graph file. There is a chance that 
there are cases not covered by this patch and the added tests, though. 
Hopefully we catch them all by dogfooding the feature before turning it 
on by default.

I can add the following to help debug these bad situations:

+			BUG("bad generation skip %d > %d at %s",
+			    commit->generation, last_gen,
+			    oid_to_hex(&commit->object.oid));



>
>> +
>> +		if (commit->generation < min_generation)
>> +			break;
> So the reasoning for this, as far as I understand, is the following.
> Please correct me if I am wrong.
>
> The callsite with non-zero min_generation, in_merge_bases_many(), tries
> to find out if "commit" is an ancestor of one of the "references".  At
> least one of "references" is above "commit", so in_merge_bases_many()
> uses paint_down_to_common() - but is interested only if "commit" was
> painted as reachable from one of "references".
>
> Thus we can interrupt the walk if we know that none of [considered]
> commits in the queue can reach "commit"/"one" - as if they were all
> STALE.
>
> The search is done using priority queue (a bit like in Dijkstra
> algorithm), with newer commits - with larger generation numbers -
> considered first.  Thus if current commit has generation number less
> than min_generation cutoff, i.e. if it is below "commit", then all
> remaining commits in the queue are below cutoff.
>
> Good.
>
>> +
>>   		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
>>   		if (flags == (PARENT1 | PARENT2)) {
>>   			if (!(commit->object.flags & RESULT)) {
>> @@ -879,7 +889,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
>>   			return NULL;
>>   	}
>>   
>> -	list = paint_down_to_common(one, n, twos);
>> +	list = paint_down_to_common(one, n, twos, 0);
> When calculating merge bases there is no such possibility of an early
> return due to generation number cutoff.  All right then.
>
>>   
>>   	while (list) {
>>   		struct commit *commit = pop_commit(&list);
>> @@ -946,7 +956,7 @@ static int remove_redundant(struct commit **array, int cnt)
>>   			filled_index[filled] = j;
>>   			work[filled++] = array[j];
>>   		}
>> -		common = paint_down_to_common(array[i], filled, work);
>> +		common = paint_down_to_common(array[i], filled, work, 0);
> Here we are interested not only if "one"/array[i] is reachable from
> "twos"/work, but also if "twos" is reachable from "one".  Simple cutoff
> only works in one way, though I wonder if we couldn't use cutoff being
> minimum generation number of "one" and "twos" together.
>
> But that may be left for a separate commit (after checking that the
> above is correct).
>
> Not as simple and obvious as paint_down_to_common() used in
> in_merge_bases_any(), so it is all right.

Thanks for reporting this. Since we are only concerned about 
reachability in this method, it is a good candidate to use 
min_generation. It is also subtle enough that we should leave it as a 
separate commit. Also, we can measure performance improvements 
separately, as I will mention in my commit message (but I'll copy it here):

     For a copy of the Linux repository, we measured the following
     performance improvements:

     git merge-base v3.3 v4.5

     Before: 234 ms
      After: 208 ms
      Rel %: -11%

     git merge-base v4.3 v4.5

     Before: 102 ms
      After:  83 ms
      Rel %: -19%

     The experiments above were chosen to demonstrate that we are
     improving the filtering of the merge-base set. In the first
     example, more time is spent walking the history to find the
     set of merge bases before the remove_redundant() call. The
     starting commits are closer together in the second example,
     therefore more time is spent in remove_redundant(). The relative
     change in performance differs as expected.


>
>>   		if (array[i]->object.flags & PARENT2)
>>   			redundant[i] = 1;
>>   		for (j = 0; j < filled; j++)


  reply	other threads:[~2018-05-01 11:47 UTC|newest]

Thread overview: 162+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-03 16:51 [PATCH 0/6] Compute and consume generation numbers Derrick Stolee
2018-04-03 16:51 ` [PATCH 1/6] object.c: parse commit in graph first Derrick Stolee
2018-04-03 18:21   ` Jonathan Tan
2018-04-03 18:28     ` Jeff King
2018-04-03 18:32       ` Derrick Stolee
2018-04-03 16:51 ` [PATCH 2/6] commit: add generation number to struct commmit Derrick Stolee
2018-04-03 18:05   ` Brandon Williams
2018-04-03 18:28     ` Jeff King
2018-04-03 18:31       ` Derrick Stolee
2018-04-03 18:32       ` Brandon Williams
2018-04-03 18:44       ` Stefan Beller
2018-04-03 23:17       ` Ramsay Jones
2018-04-03 23:19         ` Jeff King
2018-04-03 18:24   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 3/6] commit-graph: compute generation numbers Derrick Stolee
2018-04-03 18:30   ` Jonathan Tan
2018-04-03 18:49     ` Stefan Beller
2018-04-03 16:51 ` [PATCH 4/6] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-03 18:31   ` Stefan Beller
2018-04-03 18:31   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 5/6] commit.c: use generation to halt paint walk Derrick Stolee
2018-04-03 19:01   ` Jonathan Tan
2018-04-03 16:51 ` [PATCH 6/6] commit-graph.txt: update future work Derrick Stolee
2018-04-03 19:04   ` Jonathan Tan
2018-04-03 16:56 ` [PATCH 0/6] Compute and consume generation numbers Derrick Stolee
2018-04-03 18:03 ` Brandon Williams
2018-04-03 18:29   ` Derrick Stolee
2018-04-03 18:47     ` Jeff King
2018-04-03 19:05       ` Jeff King
2018-04-04 15:45         ` [PATCH 7/6] ref-filter: use generation number for --contains Derrick Stolee
2018-04-04 15:45           ` [PATCH 8/6] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-04 15:48             ` Derrick Stolee
2018-04-04 17:01               ` Brandon Williams
2018-04-04 18:24               ` Jeff King
2018-04-04 18:53                 ` Derrick Stolee
2018-04-04 18:59                   ` Jeff King
2018-04-04 18:22           ` [PATCH 7/6] ref-filter: use generation number for --contains Jeff King
2018-04-04 19:06             ` Derrick Stolee
2018-04-04 19:16               ` Jeff King
2018-04-04 19:22                 ` Derrick Stolee
2018-04-04 19:42                   ` Jeff King
2018-04-04 19:45                     ` Derrick Stolee
2018-04-04 19:46                       ` Jeff King
2018-04-07 17:09     ` [PATCH 0/6] Compute and consume generation numbers Jakub Narebski
2018-04-07 16:55 ` Jakub Narebski
2018-04-08  1:06   ` Derrick Stolee
2018-04-11 19:32     ` Jakub Narebski
2018-04-11 19:58       ` Derrick Stolee
2018-04-14 16:52         ` Jakub Narebski
2018-04-21 20:44           ` Jakub Narebski
2018-04-23 13:54             ` Derrick Stolee
2018-04-09 16:41 ` [PATCH v2 00/10] " Derrick Stolee
2018-04-09 16:41   ` [PATCH v2 01/10] object.c: parse commit in graph first Derrick Stolee
2018-04-09 16:41   ` [PATCH v2 02/10] merge: check config before loading commits Derrick Stolee
2018-04-11  2:12     ` Junio C Hamano
2018-04-11 12:49       ` Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 03/10] commit: add generation number to struct commmit Derrick Stolee
2018-04-09 17:59     ` Stefan Beller
2018-04-11  2:31     ` Junio C Hamano
2018-04-11 12:57       ` Derrick Stolee
2018-04-11 23:28         ` Junio C Hamano
2018-04-09 16:42   ` [PATCH v2 04/10] commit-graph: compute generation numbers Derrick Stolee
2018-04-11  2:51     ` Junio C Hamano
2018-04-11 13:02       ` Derrick Stolee
2018-04-11 18:49         ` Stefan Beller
2018-04-11 19:26         ` Eric Sunshine
2018-04-09 16:42   ` [PATCH v2 05/10] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 06/10] commit.c: use generation to halt paint walk Derrick Stolee
2018-04-11  3:02     ` Junio C Hamano
2018-04-11 13:24       ` Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 07/10] commit-graph.txt: update future work Derrick Stolee
2018-04-12  9:12     ` Junio C Hamano
2018-04-12 11:35       ` Derrick Stolee
2018-04-13  9:53         ` Jakub Narebski
2018-04-09 16:42   ` [PATCH v2 08/10] ref-filter: use generation number for --contains Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 09/10] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-09 16:42   ` [PATCH v2 10/10] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-17 17:00   ` [PATCH v3 0/9] Compute and consume generation numbers Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 1/9] commit: add generation number to struct commmit Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 2/9] commit-graph: compute generation numbers Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 3/9] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-18 14:31       ` Jakub Narebski
2018-04-18 14:46         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 4/9] commit-graph.txt: update design document Derrick Stolee
2018-04-18 19:47       ` Jakub Narebski
2018-04-17 17:00     ` [PATCH v3 5/9] ref-filter: use generation number for --contains Derrick Stolee
2018-04-18 21:02       ` Jakub Narebski
2018-04-23 14:22         ` Derrick Stolee
2018-04-24 18:56           ` Jakub Narebski
2018-04-25 14:11             ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 6/9] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-18 22:15       ` Jakub Narebski
2018-04-23 14:31         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 7/9] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-18 23:19       ` Jakub Narebski
2018-04-23 14:40         ` Derrick Stolee
2018-04-23 21:38           ` Jakub Narebski
2018-04-24 12:31             ` Derrick Stolee
2018-04-19  8:32       ` Jakub Narebski
2018-04-17 17:00     ` [PATCH v3 8/9] commit-graph: always load commit-graph information Derrick Stolee
2018-04-17 17:50       ` Derrick Stolee
2018-04-19  0:02       ` Jakub Narebski
2018-04-23 14:49         ` Derrick Stolee
2018-04-17 17:00     ` [PATCH v3 9/9] merge: check config before loading commits Derrick Stolee
2018-04-19  0:04     ` [PATCH v3 0/9] Compute and consume generation numbers Jakub Narebski
2018-04-23 14:54       ` Derrick Stolee
2018-04-25 14:37     ` [PATCH v4 00/10] " Derrick Stolee
2018-04-25 14:37       ` [PATCH v4 01/10] ref-filter: fix outdated comment on in_commit_list Derrick Stolee
2018-04-28 17:54         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 02/10] commit: add generation number to struct commmit Derrick Stolee
2018-04-28 22:35         ` Jakub Narebski
2018-04-30 12:05           ` Derrick Stolee
2018-04-25 14:37       ` [PATCH v4 03/10] commit-graph: compute generation numbers Derrick Stolee
2018-04-26  2:35         ` Junio C Hamano
2018-04-26 12:58           ` Derrick Stolee
2018-04-26 13:49             ` Derrick Stolee
2018-04-29  9:08         ` Jakub Narebski
2018-05-01 12:10           ` Derrick Stolee
2018-05-02 16:15             ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 04/10] commit: use generations in paint_down_to_common() Derrick Stolee
2018-04-26  3:22         ` Junio C Hamano
2018-04-26  9:02           ` Jakub Narebski
2018-04-28 14:38             ` Jakub Narebski
2018-04-29 15:40         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 06/10] ref-filter: use generation number for --contains Derrick Stolee
2018-04-30 16:34         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 05/10] commit-graph: always load commit-graph information Derrick Stolee
2018-04-29 22:14         ` Jakub Narebski
2018-05-01 12:19           ` Derrick Stolee
2018-04-29 22:18         ` Jakub Narebski
2018-04-25 14:37       ` [PATCH v4 07/10] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-04-30 17:05         ` Jakub Narebski
2018-04-25 14:38       ` [PATCH v4 08/10] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-04-30 22:19         ` Jakub Narebski
2018-05-01 11:47           ` Derrick Stolee [this message]
2018-05-02 13:05             ` Jakub Narebski
2018-05-02 13:42               ` Derrick Stolee
2018-04-25 14:38       ` [PATCH v4 09/10] merge: check config before loading commits Derrick Stolee
2018-04-30 22:54         ` Jakub Narebski
2018-05-01 11:52           ` Derrick Stolee
2018-05-02 11:41             ` Jakub Narebski
2018-04-25 14:38       ` [PATCH v4 10/10] commit-graph.txt: update design document Derrick Stolee
2018-04-30 23:32         ` Jakub Narebski
2018-05-01 12:00           ` Derrick Stolee
2018-05-02  7:57             ` Jakub Narebski
2018-04-25 14:40       ` [PATCH v4 00/10] Compute and consume generation numbers Derrick Stolee
2018-04-28 17:28         ` Jakub Narebski
2018-05-01 12:47       ` [PATCH v5 00/11] " Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 01/11] ref-filter: fix outdated comment on in_commit_list Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 02/11] commit: add generation number to struct commmit Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 03/11] commit-graph: compute generation numbers Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 04/11] commit: use generations in paint_down_to_common() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 05/11] commit-graph: always load commit-graph information Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 06/11] ref-filter: use generation number for --contains Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 07/11] commit: use generation numbers for in_merge_bases() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 08/11] commit: add short-circuit to paint_down_to_common() Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 09/11] commit: use generation number in remove_redundant() Derrick Stolee
2018-05-01 15:37           ` Derrick Stolee
2018-05-03 18:45           ` Jakub Narebski
2018-05-01 12:47         ` [PATCH v5 10/11] merge: check config before loading commits Derrick Stolee
2018-05-01 12:47         ` [PATCH v5 11/11] commit-graph.txt: update design document Derrick Stolee
2018-05-03 11:18         ` [PATCH v5 00/11] Compute and consume generation numbers Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3e3ae7e4-9efd-3046-c0c2-7b3bf12d0d81@gmail.com \
    --to=stolee@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.