Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: jnareb@gmail.com (Jakub Narębski)
To: Abhishek Kumar <abhishekkumar8222@gmail.com>
Cc: git@vger.kernel.org,
	"Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Taylor Blau" <me@ttaylorr.com>,
	"Jakub Narębski" <jnareb@gmail.com>
Subject: Re: [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common()
Date: Thu, 03 Sep 2020 21:11:59 +0200
Message-ID: <85pn72ad4w.fsf@gmail.com> (raw)
In-Reply-To: <20200901100828.GA10388@Abhishek-Arch> (Abhishek Kumar's message of "Tue, 1 Sep 2020 15:38:28 +0530")

Hello,

Abhishek Kumar <abhishekkumar8222@gmail.com> writes:
> On Sat, Aug 22, 2020 at 09:09:21PM +0200, Jakub Narębski wrote:
>> 
>> "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:
>> 
>>> From: Abhishek Kumar <abhishekkumar8222@gmail.com>
>>>
>>> With corrected commit dates implemented, we no longer have to rely on
>>> commit date as a heuristic in paint_down_to_common().
>> 
>> All right, but it would be nice to have some benchmark data: what were
>> performance when using topological levels, what was performance when
>> using commit date heuristics (before this patch), what is performace now
>> when using corrected commit date.

All right, the new proposed commit message has this benchmark data.
Thanks.

>>> t6024-recursive-merge setups a unique repository where all commits have
>>> the same committer date without well-defined merge-base. As this has
>>> already caused problems (as noted in 859fdc0 (commit-graph: define
>>> GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph within the
>>> test script.
>> 
>> OK?
>
> In hindsight, that is a terrible explanation. Here's what I have revised
> this to:
>
>   With corrected commit dates implemented, we no longer have to rely on
>   commit date as a heuristic in paint_down_to_common().
>
>   While using corrected commit dates Git walks nearly the same number of
>   commits as commit date, the process is slower as for each comparision we
>   have to access the commit-slab (for corrected committer date) instead of
>   accessing struct member (for committer date).
>
>   For example, the command `git merge-base v4.8 v4.9` on the linux
>   repository walks 167468 commits, taking 0.135s for committer date and
>   167496 commits, taking 0.157s for corrected committer date respectively.
>
>   t6404-recursive-merge setups a unique repository where all commits have
>   the same committer date without well-defined merge-base. As this has
>   already caused problems (as noted in 859fdc0 (commit-graph: define
>   GIT_TEST_COMMIT_GRAPH, 2018-08-29)).
>
>   While running tests with GIT_TEST_COMMIT_GRAPH unset, we use committer
>   date as a heuristic in paint_down_to_common(). 6404.1 'combined merge
>   conflicts' merges commits in the order:
>   - Merge C with B to form a intermediate commit.
>   - Merge the intermediate commit with A.
>
>   With GIT_TEST_COMMIT_GRAPH=1, we write a commit-graph and subsequently
>   use the corrected committer date, which changes the order in which
>   commits are merged:
>   - Merge A with B to form a intermediate commit.
>   - Merge the intermediate commit with C.
>
>   While resulting repositories are equivalent, 6404.4 'virtual trees were
>   processed' fails with GIT_TEST_COMMIT_GRAPH=1 as we are selecting
>   different merge-bases and thus have different object ids for the
>   intermediate commits.
>
>   As this has already causes problems (as noted in 859fdc0 (commit-graph:
>   define GIT_TEST_COMMIT_GRAPH, 2018-08-29)), we disable commit graph
>   within t6404-recursive-merge.

Much better.  Thanks a lot.

[...]
>>> diff --git a/commit-graph.c b/commit-graph.c
>>> index c1292f8e08..6411068411 100644
>>> --- a/commit-graph.c
>>> +++ b/commit-graph.c
>>> @@ -703,6 +703,20 @@ int generation_numbers_enabled(struct repository *r)
>>>  	return !!first_generation;
>>>  }
>>>
>>> +int corrected_commit_dates_enabled(struct repository *r)
>>> +{
>>> +	struct commit_graph *g;
>>> +	if (!prepare_commit_graph(r))
>>> +		return 0;
>>> +
>>> +	g = r->objects->commit_graph;
>>> +
>>> +	if (!g->num_commits)
>>> +		return 0;
>>> +
>>> +	return !!g->chunk_generation_data;
>>> +}
>> 
>> The previous commit introduced validate_mixed_generation_chain(), which
>> walked whole split commit-graph chain, and set `read_generation_data`
>> field in `struct commit_graph` for all layers in the chain.
>> 
>> This function examines only the top layer, so it follows the assumption
>> that Git would behave in such way that oly topmost layers in the chai
>> can be GDAT-less.
>> 
>> Why the difference?  Couldn't validate_mixed_generation_chain() simply
>> call corrected_commit_dates_enabled()?
>
> The previous commit didn't need to walk the whole split commit-graph
> chain.

Errr... but `validate_mixed_generation_chain()` introduced in previous
commit in this patch series *does* walk all the layers of the whole
split commit-graph chain.

	static void validate_mixed_generation_chain(struct repository *r)
	{
		struct commit_graph *g = r->objects->commit_graph;
		int read_generation_data = 1;
	
		while (g) {
			if (!g->chunk_generation_data) {
				read_generation_data = 0;
				break;
			}
			g = g->base_graph;
		}
	
		g = r->objects->commit_graph;
	
		while (g) {
			g->read_generation_data = read_generation_data;
			g = g->base_graph;
		}
	}

Moreover it "marks up" the whole chain, actually walking it twice.

You wrote somewhere else (possibly after I wrote this post) that this
was needed to handle `git commit-graph validate`, if I remember it
correctly.

If it is true, then we need both approaches: the less expensive one
(relying on our assumptions) and the more expensive one.  But we need to
better explain both: why we need more expensive one, why we can use the
less expensive onne (how we ensure that the requirements are fulfilled).

>       Because of how we are handling writing in a mixed generation data
> chunk, if a layer has generation data chunk, all layers below it have a
> generation data chunk as well.
>
> So, there are two cases at hand:
>
> - Topmost layer has generation data chunk, so we know all layers below
>   it has generation data chunk and we can read values from it.
> - Topmost layer does not have generation data chunk, so we know we can't
>   read from generation data chunk.
>
> Just checking the topmost layer suffices - modified the previous commit.
>
> Then, this function is more or less the same as
> `g->read_generation_data` that is, if we are reading from generation
> data chunk, we are using corrected commit dates.

All right.  That explains how corrected_commit_dates_enabled() works,
but not why we need also validate_mixed_generation_chain() that sets
g->read_generation_data for every layer in the chain.

>> 
>>> +
>>>  static void close_commit_graph_one(struct commit_graph *g)
>>>  {
>>>  	if (!g)
>>> diff --git a/commit-reach.c b/commit-reach.c
>>> index 470bc80139..3a1b925274 100644
>>> --- a/commit-reach.c
>>> +++ b/commit-reach.c
>>> @@ -39,7 +39,7 @@ static struct commit_list *paint_down_to_common(struct repository *r,
>>>  	int i;
>>>  	timestamp_t last_gen = GENERATION_NUMBER_INFINITY;
>>>
>>> -	if (!min_generation)
>> 
>> This check was added in 091f4cf (commit: don't use generation numbers if
>> not needed, 2018-08-30) by Derrick Stolee, and its commit message
>> includes benchmark results for running 'git merge-base v4.8 v4.9' in
>> Linux kernel repository:
>> 
>>       v2.18.0: 0.122s    167,468 walked
>>   v2.19.0-rc1: 0.547s    635,579 walked
>>          HEAD: 0.127s
>> 
>>> +	if (!min_generation && !corrected_commit_dates_enabled(r))
>>>  		queue.compare = compare_commits_by_commit_date;
>> 
>> It would be nice to have similar benchmark for this change... unless of
>> course there is no change in performance, but I think then it needs to
>> be stated explicitly.  I think.
>> 
>
> Mentioned in the commit message - we walk (nearly) the same number of
> commits but take somewhat longer.

All right, the new proposed commit message has it.

Sidenote: this is outside of the scope of this patch series, but perhaps
we should think about bringing the `generation` field from the
commit-slab back as a member of the `struct commit`; this would need
profiling and benchmarking of the typical workload to get amortized
performance across many git commands.

>>>
>>>  	one->object.flags |= PARENT1;
>>> diff --git a/t/t6024-recursive-merge.sh b/t/t6024-recursive-merge.sh
>>> index 332cfc53fd..d3def66e7d 100755
>>> --- a/t/t6024-recursive-merge.sh
>>> +++ b/t/t6024-recursive-merge.sh

Note: this might be now t/t6404-recursive-merge.sh -- 6404 ot 6024.

>>> @@ -15,6 +15,8 @@ GIT_COMMITTER_DATE="2006-12-12 23:28:00 +0100"
>>>  export GIT_COMMITTER_DATE
>>>
>>>  test_expect_success 'setup tests' '
>>> +	GIT_TEST_COMMIT_GRAPH=0 &&
>>> +	export GIT_TEST_COMMIT_GRAPH &&
>>>  	echo 1 >a1 &&
>>>  	git add a1 &&
>>>  	GIT_AUTHOR_DATE="2006-12-12 23:00:00" git commit -m 1 a1 &&
>>> @@ -66,7 +68,7 @@ test_expect_success 'setup tests' '
>>>  '
>>>
>>>  test_expect_success 'combined merge conflicts' '
>>> -	test_must_fail env GIT_TEST_COMMIT_GRAPH=0 git merge -m final G
>>> +	test_must_fail git merge -m final G
>>>  '
>>>
>>>  test_expect_success 'result contains a conflict' '
>> 
>> OK, so instead of disabling commit-graph for this test, now we disable
>> it for the whole script.
>> 
>> Maybe this change should be in a separate patch?
>
> With the explanation in commit message, it's clear to see how using
> corrected commit dates leads to an (incorrectly) failing test. Does it
> still make sense to seperate them?

No, I think that the new commit message explains why those changes are
together.

On the other hand it might be a good idea to add a TODO comment to this
test to mark it as fragile (fixing it is certainly out of scope of this
patch series, but better have something to remind us about the issue).
Perhaps:

  # TODO: fragile test, relies on specific resolving of ambiguity

Or something like that.  The original commit that added
GIT_TEST_COMMIT_GRAPH=0 (for a single test) explained:

  There is one test in t6024-recursive-merge.sh that relies on the
  merge-base algorithm picking one of two ambiguous merge-bases, and
  the commit-graph feature changes which merge-base is picked.

I'm not sure of we could salvage some of this test as it is now adding
`env GIT_TEST_COMMIT_GRAPH=0` in more individual tests instead of
turning it off for the whole test script.  But that is something that we
can do later.

Best,
-- 
Jakub Narębski

  reply index

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  9:13 [PATCH 0/6] [GSoC] Implement Corrected Commit Date Abhishek Kumar via GitGitGadget
2020-07-28  9:13 ` [PATCH 1/6] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-07-28 15:28   ` Taylor Blau
2020-07-30  5:24     ` Abhishek Kumar
2020-08-04  0:46   ` Jakub Narębski
2020-08-04  0:56     ` Taylor Blau
2020-08-04 10:10       ` Jakub Narębski
2020-08-04  7:55     ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 2/6] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-07-28 13:00   ` Derrick Stolee
2020-07-28 15:30     ` Taylor Blau
2020-08-05 23:16   ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 3/6] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-07-28 13:14   ` Derrick Stolee
2020-07-28 15:19     ` René Scharfe
2020-07-28 15:58       ` Derrick Stolee
2020-07-28 16:01     ` Taylor Blau
2020-07-30  6:07     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 4/6] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-07-28 16:03   ` Taylor Blau
2020-07-28  9:13 ` [PATCH 5/6] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-07-28 16:12   ` Taylor Blau
2020-07-30  6:52     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 6/6] commit-graph: implement corrected commit date offset Abhishek Kumar via GitGitGadget
2020-07-28 15:55   ` Derrick Stolee
2020-07-28 16:23     ` Taylor Blau
2020-07-30  7:27     ` Abhishek Kumar
2020-07-28 14:54 ` [PATCH 0/6] [GSoC] Implement Corrected Commit Date Taylor Blau
2020-07-30  7:47   ` Abhishek Kumar
2020-07-28 16:35 ` Derrick Stolee
2020-08-09  2:53 ` [PATCH v2 00/10] " Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 01/10] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 04/10] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 05/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-10 16:28     ` Derrick Stolee
2020-08-11 11:03       ` Abhishek Kumar
2020-08-11 12:27         ` Derrick Stolee
2020-08-11 18:58           ` Taylor Blau
2020-08-09  2:53   ` [PATCH v2 06/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 07/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-10 14:23     ` Derrick Stolee
2020-08-14  4:59       ` Abhishek Kumar
2020-08-14 12:24         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 08/10] commit-graph: handle mixed generation commit chains Abhishek Kumar via GitGitGadget
2020-08-10 16:42     ` Derrick Stolee
2020-08-11 11:36       ` Abhishek Kumar
2020-08-11 12:43         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-10 16:47   ` [PATCH v2 00/10] [GSoC] Implement Corrected Commit Date Derrick Stolee
2020-08-15 16:39   ` [PATCH v3 00/11] " Abhishek Kumar via GitGitGadget
2020-08-15 16:39     ` [PATCH v3 01/11] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-17 22:30       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 02/11] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-18 14:18       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-19 17:54       ` Jakub Narębski
2020-08-21  4:11         ` Abhishek Kumar
2020-08-25 11:11           ` Jakub Narębski
2020-09-01 11:35             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 04/11] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-17 13:22       ` Derrick Stolee
2020-08-21 11:05       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 05/11] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-21 13:14       ` Jakub Narębski
2020-08-25  5:04         ` Abhishek Kumar
2020-08-25 12:18           ` Jakub Narębski
2020-09-01 12:06             ` Abhishek Kumar
2020-09-03 13:42               ` Jakub Narębski
2020-09-05 17:21                 ` Abhishek Kumar
2020-09-13 15:39                   ` Jakub Narębski
2020-09-28 21:48                     ` Jakub Narębski
2020-10-05  5:25                       ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 06/11] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-08-21 18:43       ` Jakub Narębski
2020-08-25  6:14         ` Abhishek Kumar
2020-08-25  7:33           ` Jakub Narębski
2020-08-25  7:56             ` Jakub Narębski
2020-09-01 10:26               ` Abhishek Kumar
2020-09-03  9:25                 ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 07/11] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-22  0:05       ` Jakub Narębski
2020-08-25  6:49         ` Abhishek Kumar
2020-08-25 10:07           ` Jakub Narębski
2020-09-01 11:01             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 08/11] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-22 13:09       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 09/11] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-08-22 17:14       ` Jakub Narębski
2020-08-26  7:15         ` Abhishek Kumar
2020-08-26 10:38           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-22 19:09       ` Jakub Narębski
2020-09-01 10:08         ` Abhishek Kumar
2020-09-03 19:11           ` Jakub Narębski [this message]
2020-08-15 16:39     ` [PATCH v3 11/11] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-22 22:20       ` Jakub Narębski
2020-08-27  6:39         ` Abhishek Kumar
2020-08-27 12:43           ` Jakub Narębski
2020-08-27 13:15           ` Derrick Stolee
2020-09-01 13:01             ` Abhishek Kumar
2020-08-17  0:13     ` [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date Jakub Narębski
     [not found]       ` <CANQwDwdKp7oKy9BeKdvKhwPUiq0R5MS8TCw-eWGCYCoMGv=G-g@mail.gmail.com>
2020-08-17  1:32         ` Fwd: " Taylor Blau
2020-08-17  7:56           ` Jakub Narębski
2020-08-18  6:12       ` Abhishek Kumar
2020-08-23 15:27       ` Jakub Narębski
2020-08-24  2:49         ` Abhishek Kumar
2020-10-07 14:09     ` [PATCH v4 00/10] " Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 01/10] commit-graph: fix regression when computing Bloom filters Abhishek Kumar via GitGitGadget
2020-10-24 23:16         ` Jakub Narębski
2020-10-25 20:58           ` Taylor Blau
2020-10-07 14:09       ` [PATCH v4 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-10-24 23:41         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-10-25 10:52         ` Jakub Narębski
2020-10-27  6:33           ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 04/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-10-25 13:48         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 05/10] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-10-25 22:17         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 06/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-10-27 18:53         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 07/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-10-30 12:45         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 08/10] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85pn72ad4w.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git