git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Junio C Hamano <gitster@pobox.com>
To: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, me@ttaylorr.com,
	abhishekkumar8222@gmail.com,
	Derrick Stolee <derrickstolee@github.com>
Subject: Re: [PATCH 2/7] commit-graph: fix ordering bug in generation numbers
Date: Thu, 24 Feb 2022 14:15:02 -0800	[thread overview]
Message-ID: <xmqqh78nkj2x.fsf@gitster.g> (raw)
In-Reply-To: <6e47ffed25795260c2b8614d4589fb58d892c8df.1645735117.git.gitgitgadget@gmail.com> (Derrick Stolee via GitGitGadget's message of "Thu, 24 Feb 2022 20:38:31 +0000")

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <derrickstolee@github.com>
>
> When computing the generation numbers for a commit-graph, we compute
> the corrected commit dates and then check if their offsets from the
> actual dates is too large to fit in the 32-bit Generation Data chunk.
> However, there is a problem with this approach: if we have parsed the
> generation data from the previous commit-graph, then we continue the
> loop because the corrected commit date is already computed.
>
> It is incorrect to add an increment to num_generation_data_overflows
> here, because we might start double-counting commits that are computed
> because of the depth-first search walk from a commit with an earlier
> OID.
>
> Instead, iterate over the full commit list at the end, checking the
> offsets to see how many grow beyond the maximum value.

Hmph, I can see how the new code correctly counts the commits that
require offsets that are too large, but I am not sure why the fix is
needed.  The overall loop structure is

    for each commit ctx->commits.list[i]:
        continue if generation number has been computed for it already

	set up a commit-list for depth first search
	while (we are still digging) {
		for each parent {
			if generation for the parent is not known yet:
				push it down and redo
			else
				compute max of parents' generation number
		}
                if (all parents' generation number is known) {
			set the generation number for ourselves
			count if we needed an offset that is too big
		}
	}
    }

The only case where we may double-count near the end of inner loop I
can think of is when we end up computing generation for the same
commit in the while () loop.  But isn't that "we dig the same thing
twice" by itself something we want to fix, regardless of the
double-counting issue?

IOW,

>  				if (current->date && current->date > max_corrected_commit_date)
>  					max_corrected_commit_date = current->date - 1;
>  				commit_graph_data_at(current)->generation = max_corrected_commit_date + 1;
> -
> -				if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX)
> -					ctx->num_generation_data_overflows++;
>  			}
>  		}
>  	}

here, before doing the assignment for the "current" commit's
generation number, if we added

		if (commit_graph_data_at(current)->generation !=
		    GENERATION_NUMBER_ZERO)
			BUG("why are we digging it twice?");

would it trigger?  If so, isn't that already a bug worth fixing?

Perhaps avoiding the second round, perhaps like this, may be a
better fix?

	while (list) {
		struct commit *current = list->item;
		struct commit_list *parent;
		int all_parents_computed = 1;
		timestamp_t max_corrected_commit_date = 0;

+		if (commit_graph_data_at(current)->generation !=
+		    GENERATION_NUMBER_ZERO) {
+			pop_commit(&list);
+			continue;
+		}
+
		for (parent = current->parents; parent; parent = parent->next) {

Or am I grossly misunderstanding why the original code is incorrect
to have the counting at this place?

> +
> +	for (i = 0; i < ctx->commits.nr; i++) {
> +		struct commit *c = ctx->commits.list[i];
> +		timestamp_t offset = commit_graph_data_at(c)->generation - c->date;
> +		if (offset > GENERATION_NUMBER_V2_OFFSET_MAX)
> +			ctx->num_generation_data_overflows++;
> +	}
>  	stop_progress(&ctx->progress);
>  }
>  
> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
> index 2b05026cf6d..f9bffe38013 100755
> --- a/t/t5318-commit-graph.sh
> +++ b/t/t5318-commit-graph.sh
> @@ -467,10 +467,10 @@ test_expect_success 'warn on improper hash version' '
>  	)
>  '
>  
> -test_expect_success 'lower layers have overflow chunk' '
> +test_expect_success TIME_IS_64BIT,TIME_T_IS_64BIT 'lower layers have overflow chunk' '
>  	cd "$TRASH_DIRECTORY/full" &&
>  	UNIX_EPOCH_ZERO="@0 +0000" &&
> -	FUTURE_DATE="@2147483646 +0000" &&
> +	FUTURE_DATE="@4147483646 +0000" &&
>  	rm -f .git/objects/info/commit-graph &&
>  	test_commit --date "$FUTURE_DATE" future-1 &&
>  	test_commit --date "$UNIX_EPOCH_ZERO" old-1 &&

  reply	other threads:[~2022-02-24 22:15 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-24 20:38 [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 1/7] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-24 20:38 ` [PATCH 2/7] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-24 22:15   ` Junio C Hamano [this message]
2022-02-25 13:51     ` Derrick Stolee
2022-02-25 17:35       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 3/7] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:18   ` Patrick Steinhardt
2022-02-28 16:23     ` Derrick Stolee
2022-02-28 16:59       ` Patrick Steinhardt
2022-02-28 18:44         ` Derrick Stolee
2022-03-01  9:46           ` Patrick Steinhardt
2022-03-01 10:35             ` Patrick Steinhardt
2022-03-01 14:06               ` Derrick Stolee
2022-03-01 14:53                 ` Patrick Steinhardt
2022-03-01 15:25                   ` Derrick Stolee
2022-03-02 13:57                     ` Patrick Steinhardt
2022-03-02 14:57                       ` Derrick Stolee
2022-03-02 18:15                         ` Junio C Hamano
2022-03-02 18:46                           ` Derrick Stolee
2022-03-02 22:42                             ` Junio C Hamano
2022-03-03 11:19                         ` Patrick Steinhardt
2022-03-03 16:00                           ` Derrick Stolee
2022-03-04 14:03                             ` Derrick Stolee
2022-03-07 10:34                               ` Patrick Steinhardt
2022-03-07 13:45                                 ` Derrick Stolee
2022-03-07 17:22                                   ` Junio C Hamano
2022-03-10 13:58                                   ` Patrick Steinhardt
2022-03-10 17:18                                     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 4/7] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-24 22:35   ` Junio C Hamano
2022-02-25 13:53     ` Derrick Stolee
2022-02-25 17:38       ` Junio C Hamano
2022-02-24 20:38 ` [PATCH 5/7] commit-graph: document file format v2 Derrick Stolee via GitGitGadget
2022-02-24 22:55   ` Junio C Hamano
2022-02-25 22:31   ` Ævar Arnfjörð Bjarmason
2022-02-28 13:44     ` Derrick Stolee
2022-02-28 14:27       ` Ævar Arnfjörð Bjarmason
2022-02-28 16:39         ` Derrick Stolee
2022-02-28 21:14           ` Ævar Arnfjörð Bjarmason
2022-03-01 14:19             ` Derrick Stolee
2022-03-01 14:29               ` Ævar Arnfjörð Bjarmason
2022-03-01 15:59                 ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 6/7] commit-graph: parse " Derrick Stolee via GitGitGadget
2022-02-24 23:01   ` Junio C Hamano
2022-02-25 13:54     ` Derrick Stolee
2022-02-24 20:38 ` [PATCH 7/7] commit-graph: write " Derrick Stolee via GitGitGadget
2022-02-24 21:42 ` [PATCH 0/7] Commit-graph: Generation Number v2 Fixes, v3 implementation Junio C Hamano
2022-02-24 23:06   ` Junio C Hamano
2022-02-25 13:55     ` Derrick Stolee
2022-02-28 13:53 ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Derrick Stolee via GitGitGadget
2022-02-28 13:53   ` [PATCH v2 1/4] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-02-28 15:22     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 2/4] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-02-28 15:25     ` Ævar Arnfjörð Bjarmason
2022-02-28 13:53   ` [PATCH v2 3/4] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-02-28 15:30     ` Ævar Arnfjörð Bjarmason
2022-02-28 16:43       ` Derrick Stolee
2022-02-28 13:53   ` [PATCH v2 4/4] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget
2022-02-28 15:40     ` Ævar Arnfjörð Bjarmason
2022-03-01 17:23   ` [PATCH v2 0/4] Commit-graph: Generation Number v2 Fixes Ævar Arnfjörð Bjarmason
2022-03-01 19:48   ` [PATCH v3 0/5] " Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 1/5] test-read-graph: include extra post-parse info Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 2/5] t5318: extract helpers to lib-commit-graph.sh Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 3/5] commit-graph: fix ordering bug in generation numbers Derrick Stolee via GitGitGadget
2022-03-01 20:13       ` Junio C Hamano
2022-03-01 20:30         ` Junio C Hamano
2022-03-02 14:13           ` Derrick Stolee
2022-03-01 19:48     ` [PATCH v3 4/5] commit-graph: start parsing generation v2 (again) Derrick Stolee via GitGitGadget
2022-03-01 19:48     ` [PATCH v3 5/5] commit-graph: fix generation number v2 overflow values Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=xmqqh78nkj2x.fsf@gitster.g \
    --to=gitster@pobox.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).