Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: jnareb@gmail.com (Jakub Narębski)
To: "Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, "Derrick Stolee" <stolee@gmail.com>,
	"Taylor Blau" <me@ttaylorr.com>,
	"Abhishek Kumar" <abhishekkumar8222@gmail.com>,
	"Jakub Narębski" <jnareb@gmail.com>
Subject: Re: [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info
Date: Wed, 19 Aug 2020 19:54:20 +0200
Message-ID: <857dtuo71v.fsf@gmail.com> (raw)
In-Reply-To: <18d5864f81e89585cc94cd12eca166a9d8b929a5.1597509583.git.gitgitgadget@gmail.com> (Abhishek Kumar via GitGitGadget's message of "Sat, 15 Aug 2020 16:39:35 +0000")

"Abhishek Kumar via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Abhishek Kumar <abhishekkumar8222@gmail.com>
>
> Both fill_commit_graph_info() and fill_commit_in_graph() parse
> information present in commit data chunk. Let's simplify the
> implementation by calling fill_commit_graph_info() within
> fill_commit_in_graph().
>
> The test 'generate tar with future mtime' creates a commit with commit
> time of (2 ^ 36 + 1) seconds since EPOCH. The commit time overflows into
> generation number (within CDAT chunk) and has undefined behavior.
>
> The test used to pass

Could you please tell us how does this test starts to fail without the
change to the test described there?  What is the error message, etc.?

>                       as fill_commit_in_graph() guarantees the values of
                           ^^^^^^^^^^^^^^^^^^^^^^

s/fill_commit_in_graph()/fill_commit_graph_info()/

It is fill_commit_graph_info() that changes its behavior in this patch.

> graph position and generation number, and did not load timestamp.
> However, with corrected commit date we will need load the timestamp as
> well to populate the generation number.
>
> Let's fix the test by setting a timestamp of (2 ^ 34 - 1) seconds.

I think this commit should be split into two commits:
- fix to the 'generate tar with future mtime' test
- simplify implementation of fill_commit_in_graph()

The test 'generate tar with future mtime' in t/t5000-tar-tree.sh creates
a commit with commit time of (2 ^ 36 - 1) seconds since EPOCH
(68719476737). However, the commit-graph file format version 1 provides
only 34-bits for storing committer date (32 + 2 bits), not 64-bits.
Therefore maximum far in the future commit time can only be at most
(2 ^ 34 - 1) seconds since EPOCH, as Stolee said in commet for v1
of this series.

This "limitation" is not a problem in practice, because the maximum
timestamp allowed takes place in the year 2514. I hope at that time
there would be no Git version in use that still crashes on changing the
version field in the commit-graph format -- then we can simply get rid
of storing topological levels (generation number v1) in those 30 bits of
CDAT chunk and use full 64 bits for committer date.

Git does not perform any bounds checking for committer date value in
write_graph_chunk_data():

	uint32_t packedDate[2];

	/* ... */

	if (sizeof((*list)->date) > 4)
		packedDate[0] = htonl(((*list)->date >> 32) & 0x3);
	else
		packedDate[0] = 0;

	packedDate[0] |= htonl(commit_graph_data_at(*list)->generation << 2);

	packedDate[1] = htonl((*list)->date);
	hashwrite(f, packedDate, 8);

This means that the date is trimmed to 34 bits on save discarding most
significant bits, assuming that unsigned overflow simply discards most
significant bits truncating the signed (?) value.

In this case running the test with GIT_TEST_COMMIT_GRAPH=1 would lead to
errors, as the committer date read from the commit graph would be
incorrect, and therefore generation number v2 would also be incorrect.


I don't quite understand however how second part of this patch in its
current iteration, namely simplifing the implementation of
fill_commit_in_graph() makes this bug / error shows...

Do I understand it correctly that before this change the committer date
would always be parsed out of the commit object, instead of reading it
from the commit-graph file?  However the only user of static
fill_commit_in_graph() is the parse_commit_in_graph(), which in turn is
used by parse_commit_gently(); but fill_commit_in_graph() read commit
date from commit-graph before this change... color me confused.

Ah, after the change fill_commit_graph_info() changes its behavior, not
fill_commit_in_graph() as said in the commit message. Before this commit
it used to only load graph position and generation number, and did not
load the timestamp. The function fill_commit_graph_info() is used in
turn by public-facing load_commit_graph_info():

  /*
   * It is possible that we loaded commit contents from the commit buffer,
   * but we also want to ensure the commit-graph content is correctly
   * checked and filled. Fill the graph_pos and generation members of
   * the given commit.
   */
  void load_commit_graph_info(struct repository *r, struct commit *item);

This function is used in turn by get_bloom_filter(), contains_tag_algo()
and parse_commit_buffer(), change in any of which behavior can lead to
failing 'generate tar with future mtime' test.

>
> Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com>
> ---
>  commit-graph.c      | 29 +++++++++++------------------
>  t/t5000-tar-tree.sh |  4 ++--
>  2 files changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index ace7400a1a..af8d9cc45e 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -725,15 +725,24 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g,
>  	const unsigned char *commit_data;
>  	struct commit_graph_data *graph_data;
>  	uint32_t lex_index;
> +	uint64_t date_high, date_low;
>
>  	while (pos < g->num_commits_in_base)
>  		g = g->base_graph;
>
> +	if (pos >= g->num_commits + g->num_commits_in_base)
> +		die(_("invalid commit position. commit-graph is likely corrupt"));
> +

All right, if we want to use fill_commit_graph_info() function to load
the graph data (graph position and generation number) in the
fill_commit_in_graph() we need to perform this check.

I'd think that this check should be here from the beginning, just in
case.


Sidenote: I wonder if it would be good idea to print more information in
the above error message, for example:

	die(_("invalid commit position %ld. commit-graph '%s' is likely corrupt"),
        pos, g->filename);

But this is unrelated thing, tangential to this change, and it might not
add anything useful.

>  	lex_index = pos - g->num_commits_in_base;
>  	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;
>
>  	graph_data = commit_graph_data_at(item);
>  	graph_data->graph_pos = pos;
> +
> +	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
> +	date_low = get_be32(commit_data + g->hash_len + 12);
> +	item->date = (timestamp_t)((date_high << 32) | date_low);
> +

I think this change, moving loading of commit date from commit-graph out
of fill_commit_in_graph() and into fill_commit_graph_info(), is in my
opinion a bit inadequatly described in the commit message. As I
understand it this change prepares fill_commit_graph_info() for
generation number v2, that is Corrected Commit Date, where loading
commit date from CDAT together with loading offset from GDAT would be
necessary to correctly set the 'generation' field of 'struct
commit_graph_data' (on the commit_graph_data_slab).

I'm not sure if it would be worth it splitting this refactoring change
(Move Statements into Function) into a separate patch -- it would split
this commit into three, changing 11 part series into 13 part series.


Note that we might want to update the description of
load_commit_graph_info() in commit-graph.h to include that it
incidentally loads commit date from the commit-graph.  Butthis might be
not worth it -- it is a side effect, not the major goal of this
function.

>  	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
>  }
>
> @@ -748,38 +757,22 @@ static int fill_commit_in_graph(struct repository *r,
>  {
>  	uint32_t edge_value;
>  	uint32_t *parent_data_ptr;
> -	uint64_t date_low, date_high;
>  	struct commit_list **pptr;
> -	struct commit_graph_data *graph_data;
>  	const unsigned char *commit_data;
>  	uint32_t lex_index;
>
>  	while (pos < g->num_commits_in_base)
>  		g = g->base_graph;
>
> -	if (pos >= g->num_commits + g->num_commits_in_base)
> -		die(_("invalid commit position. commit-graph is likely corrupt"));
> +	fill_commit_graph_info(item, g, pos);

All right, the check got moved into fill_commit_graph_info().

>
> -	/*
> -	 * Store the "full" position, but then use the
> -	 * "local" position for the rest of the calculation.
> -	 */
> -	graph_data = commit_graph_data_at(item);
> -	graph_data->graph_pos = pos;

All right, 'graph_pos' field in the graph data (on commit slab) got
filled by just called load_commit_graph_info().

>  	lex_index = pos - g->num_commits_in_base;
> -
> -	commit_data = g->chunk_commit_data + (g->hash_len + 16) * lex_index;
> +	commit_data = g->chunk_commit_data + GRAPH_DATA_WIDTH * lex_index;

All right, unrelated cleanup in the neighbourhood.

>
>  	item->object.parsed = 1;
>
>  	set_commit_tree(item, NULL);
>
> -	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
> -	date_low = get_be32(commit_data + g->hash_len + 12);
> -	item->date = (timestamp_t)((date_high << 32) | date_low);
> -

All right, this code got moved down the call chain into just called
load_commit_graph_info().

> -	graph_data->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> -


All right, 'generation' field in the graph data (on commit slab) got
filled by load_commit_graph_info() called at the beginning of the
function.



>  	pptr = &item->parents;
>
>  	edge_value = get_be32(commit_data + g->hash_len);
> diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> index 37655a237c..1986354fc3 100755
> --- a/t/t5000-tar-tree.sh
> +++ b/t/t5000-tar-tree.sh
> @@ -406,7 +406,7 @@ test_expect_success TIME_IS_64BIT 'set up repository with far-future commit' '
>  	rm -f .git/index &&
>  	echo content >file &&
>  	git add file &&
> -	GIT_COMMITTER_DATE="@68719476737 +0000" \
> +	GIT_COMMITTER_DATE="@17179869183 +0000" \
>  		git commit -m "tempori parendum"
>  '
>
> @@ -415,7 +415,7 @@ test_expect_success TIME_IS_64BIT 'generate tar with future mtime' '
>  '
>
>  test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read our future mtime' '
> -	echo 4147 >expect &&
> +	echo 2514 >expect &&
>  	tar_info future.tar | cut -d" " -f2 >actual &&
>  	test_cmp expect actual
>  '

Looks good to me.

Best,
-- 
Jakub Narębski

  reply index

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-28  9:13 [PATCH 0/6] [GSoC] Implement Corrected Commit Date Abhishek Kumar via GitGitGadget
2020-07-28  9:13 ` [PATCH 1/6] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-07-28 15:28   ` Taylor Blau
2020-07-30  5:24     ` Abhishek Kumar
2020-08-04  0:46   ` Jakub Narębski
2020-08-04  0:56     ` Taylor Blau
2020-08-04 10:10       ` Jakub Narębski
2020-08-04  7:55     ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 2/6] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-07-28 13:00   ` Derrick Stolee
2020-07-28 15:30     ` Taylor Blau
2020-08-05 23:16   ` Jakub Narębski
2020-07-28  9:13 ` [PATCH 3/6] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-07-28 13:14   ` Derrick Stolee
2020-07-28 15:19     ` René Scharfe
2020-07-28 15:58       ` Derrick Stolee
2020-07-28 16:01     ` Taylor Blau
2020-07-30  6:07     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 4/6] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-07-28 16:03   ` Taylor Blau
2020-07-28  9:13 ` [PATCH 5/6] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-07-28 16:12   ` Taylor Blau
2020-07-30  6:52     ` Abhishek Kumar
2020-07-28  9:13 ` [PATCH 6/6] commit-graph: implement corrected commit date offset Abhishek Kumar via GitGitGadget
2020-07-28 15:55   ` Derrick Stolee
2020-07-28 16:23     ` Taylor Blau
2020-07-30  7:27     ` Abhishek Kumar
2020-07-28 14:54 ` [PATCH 0/6] [GSoC] Implement Corrected Commit Date Taylor Blau
2020-07-30  7:47   ` Abhishek Kumar
2020-07-28 16:35 ` Derrick Stolee
2020-08-09  2:53 ` [PATCH v2 00/10] " Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 01/10] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 04/10] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 05/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-10 16:28     ` Derrick Stolee
2020-08-11 11:03       ` Abhishek Kumar
2020-08-11 12:27         ` Derrick Stolee
2020-08-11 18:58           ` Taylor Blau
2020-08-09  2:53   ` [PATCH v2 06/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 07/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-10 14:23     ` Derrick Stolee
2020-08-14  4:59       ` Abhishek Kumar
2020-08-14 12:24         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 08/10] commit-graph: handle mixed generation commit chains Abhishek Kumar via GitGitGadget
2020-08-10 16:42     ` Derrick Stolee
2020-08-11 11:36       ` Abhishek Kumar
2020-08-11 12:43         ` Derrick Stolee
2020-08-09  2:53   ` [PATCH v2 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-09  2:53   ` [PATCH v2 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-10 16:47   ` [PATCH v2 00/10] [GSoC] Implement Corrected Commit Date Derrick Stolee
2020-08-15 16:39   ` [PATCH v3 00/11] " Abhishek Kumar via GitGitGadget
2020-08-15 16:39     ` [PATCH v3 01/11] commit-graph: fix regression when computing bloom filter Abhishek Kumar via GitGitGadget
2020-08-17 22:30       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 02/11] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-08-18 14:18       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 03/11] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-08-19 17:54       ` Jakub Narębski [this message]
2020-08-21  4:11         ` Abhishek Kumar
2020-08-25 11:11           ` Jakub Narębski
2020-09-01 11:35             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 04/11] commit-graph: consolidate compare_commits_by_gen Abhishek Kumar via GitGitGadget
2020-08-17 13:22       ` Derrick Stolee
2020-08-21 11:05       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 05/11] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-08-21 13:14       ` Jakub Narębski
2020-08-25  5:04         ` Abhishek Kumar
2020-08-25 12:18           ` Jakub Narębski
2020-09-01 12:06             ` Abhishek Kumar
2020-09-03 13:42               ` Jakub Narębski
2020-09-05 17:21                 ` Abhishek Kumar
2020-09-13 15:39                   ` Jakub Narębski
2020-09-28 21:48                     ` Jakub Narębski
2020-10-05  5:25                       ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 06/11] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-08-21 18:43       ` Jakub Narębski
2020-08-25  6:14         ` Abhishek Kumar
2020-08-25  7:33           ` Jakub Narębski
2020-08-25  7:56             ` Jakub Narębski
2020-09-01 10:26               ` Abhishek Kumar
2020-09-03  9:25                 ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 07/11] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-08-22  0:05       ` Jakub Narębski
2020-08-25  6:49         ` Abhishek Kumar
2020-08-25 10:07           ` Jakub Narębski
2020-09-01 11:01             ` Abhishek Kumar
2020-08-15 16:39     ` [PATCH v3 08/11] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-08-22 13:09       ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 09/11] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-08-22 17:14       ` Jakub Narębski
2020-08-26  7:15         ` Abhishek Kumar
2020-08-26 10:38           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 10/11] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-08-22 19:09       ` Jakub Narębski
2020-09-01 10:08         ` Abhishek Kumar
2020-09-03 19:11           ` Jakub Narębski
2020-08-15 16:39     ` [PATCH v3 11/11] doc: add corrected commit date info Abhishek Kumar via GitGitGadget
2020-08-22 22:20       ` Jakub Narębski
2020-08-27  6:39         ` Abhishek Kumar
2020-08-27 12:43           ` Jakub Narębski
2020-08-27 13:15           ` Derrick Stolee
2020-09-01 13:01             ` Abhishek Kumar
2020-08-17  0:13     ` [PATCH v3 00/11] [GSoC] Implement Corrected Commit Date Jakub Narębski
     [not found]       ` <CANQwDwdKp7oKy9BeKdvKhwPUiq0R5MS8TCw-eWGCYCoMGv=G-g@mail.gmail.com>
2020-08-17  1:32         ` Fwd: " Taylor Blau
2020-08-17  7:56           ` Jakub Narębski
2020-08-18  6:12       ` Abhishek Kumar
2020-08-23 15:27       ` Jakub Narębski
2020-08-24  2:49         ` Abhishek Kumar
2020-10-07 14:09     ` [PATCH v4 00/10] " Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 01/10] commit-graph: fix regression when computing Bloom filters Abhishek Kumar via GitGitGadget
2020-10-24 23:16         ` Jakub Narębski
2020-10-25 20:58           ` Taylor Blau
2020-10-07 14:09       ` [PATCH v4 02/10] revision: parse parent in indegree_walk_step() Abhishek Kumar via GitGitGadget
2020-10-24 23:41         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 03/10] commit-graph: consolidate fill_commit_graph_info Abhishek Kumar via GitGitGadget
2020-10-25 10:52         ` Jakub Narębski
2020-10-27  6:33           ` Abhishek Kumar
2020-10-07 14:09       ` [PATCH v4 04/10] commit-graph: return 64-bit generation number Abhishek Kumar via GitGitGadget
2020-10-25 13:48         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 05/10] commit-graph: add a slab to store topological levels Abhishek Kumar via GitGitGadget
2020-10-25 22:17         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 06/10] commit-graph: implement corrected commit date Abhishek Kumar via GitGitGadget
2020-10-27 18:53         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 07/10] commit-graph: implement generation data chunk Abhishek Kumar via GitGitGadget
2020-10-30 12:45         ` Jakub Narębski
2020-10-07 14:09       ` [PATCH v4 08/10] commit-graph: use generation v2 only if entire chain does Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 09/10] commit-reach: use corrected commit dates in paint_down_to_common() Abhishek Kumar via GitGitGadget
2020-10-07 14:09       ` [PATCH v4 10/10] doc: add corrected commit date info Abhishek Kumar via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=857dtuo71v.fsf@gmail.com \
    --to=jnareb@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git