All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jakub Narebski <jnareb@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
	"gitster@pobox.com" <gitster@pobox.com>,
	"avarab@gmail.com" <avarab@gmail.com>,
	"marten.agren@gmail.com" <marten.agren@gmail.com>,
	"peff@peff.net" <peff@peff.net>
Subject: Re: [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup
Date: Wed, 30 May 2018 12:18:26 -0400	[thread overview]
Message-ID: <b34afb39-8df2-b825-773c-8526d93314b5@gmail.com> (raw)
In-Reply-To: <86in75w9m9.fsf@gmail.com>


On 5/30/2018 9:34 AM, Jakub Narebski wrote:
> Derrick Stolee <dstolee@microsoft.com> writes:
>
>> In the commit-graph file, the OID fanout chunk provides an index into
>> the OID lookup. The 'verify' subcommand should find incorrect values
>> in the fanout.
>>
>> Similarly, the 'verify' subcommand should find out-of-order values in
>> the OID lookup.
> O.K., the OID Lookup chunk is checked together with OID Fanout chunk
> because those two chunks are tightly connected: OID Fanout is fanout
> into OID Lookup.
>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>>   commit-graph.c          | 36 ++++++++++++++++++++++++++++++++++++
>>   t/t5318-commit-graph.sh | 22 ++++++++++++++++++++++
>>   2 files changed, 58 insertions(+)
>>
>> diff --git a/commit-graph.c b/commit-graph.c
>> index 06e3e4f9ba..cbd1aae514 100644
>> --- a/commit-graph.c
>> +++ b/commit-graph.c
>> @@ -855,6 +855,9 @@ static void graph_report(const char *fmt, ...)
>>   
>>   int verify_commit_graph(struct commit_graph *g)
>>   {
>> +	uint32_t i, cur_fanout_pos = 0;
>> +	struct object_id prev_oid, cur_oid;
> Minor nitpick about the naming convention: why cur_*, and not curr_*?
>
>> +
>>   	if (!g) {
>>   		graph_report("no commit-graph file loaded");
>>   		return 1;
>> @@ -869,5 +872,38 @@ int verify_commit_graph(struct commit_graph *g)
>>   	if (!g->chunk_commit_data)
>>   		graph_report("commit-graph is missing the Commit Data chunk");
>>   
>> +	if (verify_commit_graph_error)
>> +		return verify_commit_graph_error;
>> +
> Before checking that commits in OID Lookup are sorted, and that OID
> Fanout correctly points into OID Lookup (thus also checking that OID
> Lookup is sorted), it would be good to verify that sizes of those chunks
> are correct.
>
> Out of 4 standrd chunks, 1 of them (OIDF) has constant size, and 2 of
> them have size given by number of commits and hash size
>   - OID Fanout is (256 * 4 bytes)
>   - OID Lookup is (N * H bytes),
>     where N is number of commits, and H is hash size
>
> The one that is more significant is if number of commits gets corrupted
> upwards, and walking through OID Lookup would lead us out of bounds,
> outside the file size.
>
> IIRC we have checked that all chunks fit within file size, isn't it?
>
>> +	for (i = 0; i < g->num_commits; i++) {
>> +		hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
> Why do you use hashcpy() here, but oidcpy() below?

We are copying from a section of binary data, not from a struct object_id *.

>
>> +
>> +		if (i && oidcmp(&prev_oid, &cur_oid) >= 0)
> All right, OIDs needs to be sorted in ascending lexicographical order,
> and the above condition checks that this constraint is fullfilled.
>
>> +			graph_report("commit-graph has incorrect OID order: %s then %s",
>> +				     oid_to_hex(&prev_oid),
>> +				     oid_to_hex(&cur_oid));
> Nice informative error message.
>
>> +
>> +		oidcpy(&prev_oid, &cur_oid);
>> +
>> +		while (cur_oid.hash[0] > cur_fanout_pos) {
>> +			uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
>> +			if (i != fanout_value)
>> +				graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
>> +					     cur_fanout_pos, fanout_value, i);
>> +
>> +			cur_fanout_pos++;
>> +		}
> The k-th entry of fanout, F[k], stores the number of OIDs with first
> byte at most k.
>
> Let's examine it in detail on a simple example:
>
>     fanout                     lookup
>     ------                     ------
>     0 : 0  ---------------> 0: 1xxxx....
>     1 : 2  -----\           1: 1yyyy....
>     2 : 2  ------\--------> 2: 3xxxx....
>     3 : 3  ---------------> 3: 4xxxx....
>
> We are checking that after most significant byte (first byte) changes,
> then all fanout position up to the current first byte value (exclusive)
> point to current position in OID Lookup chunk.
>
> Seems all right; it would be nice to come up with rigorous proof that it
> is all right.
>
>> +	}
>> +
>> +	while (cur_fanout_pos < 256) {
>> +		uint32_t fanout_value = get_be32(g->chunk_oid_fanout + cur_fanout_pos);
>> +
>> +		if (g->num_commits != fanout_value)
>> +			graph_report("commit-graph has incorrect fanout value: fanout[%d] = %u != %u",
>> +				     cur_fanout_pos, fanout_value, i);
>> +
>> +		cur_fanout_pos++;
>> +	}
> All right, this checks that all remaining fanout entries, up and
> including the 255-ith entry store the total number of commits, N.
>
>> +
>>   	return verify_commit_graph_error;
>>   }
>> diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
>> index 4ef3fe3dc2..c050ef980b 100755
>> --- a/t/t5318-commit-graph.sh
>> +++ b/t/t5318-commit-graph.sh
>> @@ -247,6 +247,7 @@ test_expect_success 'git commit-graph verify' '
>>   	git commit-graph verify >output
>>   '
>>   
>> +HASH_LEN=20
>>   GRAPH_BYTE_VERSION=4
>>   GRAPH_BYTE_HASH=5
>>   GRAPH_BYTE_CHUNK_COUNT=6
>> @@ -258,6 +259,12 @@ GRAPH_BYTE_OID_LOOKUP_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
>>   			      1 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
>>   GRAPH_BYTE_COMMIT_DATA_ID=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
>>   				2 \* $GRAPH_CHUNK_LOOKUP_WIDTH`
>> +GRAPH_FANOUT_OFFSET=`expr $GRAPH_CHUNK_LOOKUP_OFFSET + \
>> +			  $GRAPH_CHUNK_LOOKUP_WIDTH \* $GRAPH_CHUNK_LOOKUP_ROWS`
>> +GRAPH_BYTE_FANOUT1=`expr $GRAPH_FANOUT_OFFSET + 4 \* 4`
>> +GRAPH_BYTE_FANOUT2=`expr $GRAPH_FANOUT_OFFSET + 4 \* 255`
>> +GRAPH_OID_LOOKUP_OFFSET=`expr $GRAPH_FANOUT_OFFSET + 4 \* 256`
>> +GRAPH_BYTE_OID_LOOKUP_ORDER=`expr $GRAPH_OID_LOOKUP_OFFSET + $HASH_LEN \* 8`
> Something that I forgot to write about in previous patch.
>
> POSIX shell includes $(( ... )) for arithmetic expansion, there is
> nowadays no need to use $(expr ...), and even more no need to use
> pre-POSIX `expr ...`.
>
> TLDR: use $(( ... )), not `expr ... `.

I'll use this convention in v4. Thanks!

>
>>   
>>   # usage: corrupt_graph_and_verify <position> <data> <string>
>>   # Manipulates the commit-graph file at the position
>> @@ -312,4 +319,19 @@ test_expect_success 'detect missing commit data chunk' '
>>   		"missing the Commit Data chunk"
>>   '
>>   
>> +test_expect_success 'detect incorrect fanout' '
>> +	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT1 "\01" \
> How can you be sure that this value is incorrect?

The repo is created using constant information (the commit/author dates 
are set by environment variables in the test environment). The 
commit-graph file that is generated by the test script is identical each 
time.


>
>> +		"fanout value"
>> +'
>> +
>> +test_expect_success 'detect incorrect fanout' '
>> +	corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \
>> +		"fanout value"
>> +'
> I would prefer if different tests had different description.  Those two
> are both 'detect incorrect layout'.  What is the difference between them?
I'll specify this one is for the final value.
>
>> +
>> +test_expect_success 'detect incorrect OID order' '
>> +	corrupt_graph_and_verify $GRAPH_BYTE_OID_LOOKUP_ORDER "\01" \
>> +		"incorrect OID order"
>> +'
> How can you be sure that this value is incorrect, or rather that it
> would be always incorrect?
>
>> +
>>   test_done

  reply	other threads:[~2018-05-30 16:18 UTC|newest]

Thread overview: 149+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-17 18:10 [RFC PATCH 00/12] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-04-17 18:10 ` [RFC PATCH 01/12] fixup! commit-graph: always load commit-graph information Derrick Stolee
2018-04-17 18:10 ` [RFC PATCH 02/12] commit-graph: add 'check' subcommand Derrick Stolee
2018-04-19 13:24   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 03/12] commit-graph: check file header information Derrick Stolee
2018-04-19 15:58   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 04/12] commit-graph: parse commit from chosen graph Derrick Stolee
2018-04-19 17:21   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 05/12] commit-graph: check fanout and lookup table Derrick Stolee
2018-04-20  7:27   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 06/12] commit: force commit to parse from object database Derrick Stolee
2018-04-20 12:13   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 07/12] commit-graph: load a root tree from specific graph Derrick Stolee
2018-04-20 12:18   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 08/12] commit-graph: verify commit contents against odb Derrick Stolee
2018-04-20 16:47   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 10/12] commit-graph: add '--reachable' option Derrick Stolee
2018-04-20 17:17   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 09/12] fsck: check commit-graph Derrick Stolee
2018-04-20 16:59   ` Jakub Narebski
2018-04-17 18:10 ` [RFC PATCH 11/12] gc: automatically write commit-graph files Derrick Stolee
2018-04-20 17:34   ` Jakub Narebski
2018-04-20 18:33     ` Ævar Arnfjörð Bjarmason
2018-04-17 18:10 ` [RFC PATCH 12/12] commit-graph: update design document Derrick Stolee
2018-04-20 19:10   ` Jakub Narebski
2018-04-17 18:50 ` [RFC PATCH 00/12] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-05-10 17:34 ` [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Derrick Stolee
2018-05-10 17:34   ` [PATCH 01/12] commit-graph: add 'verify' subcommand Derrick Stolee
2018-05-10 18:15     ` Martin Ågren
2018-05-10 17:34   ` [PATCH 02/12] commit-graph: verify file header information Derrick Stolee
2018-05-10 18:21     ` Martin Ågren
2018-05-10 17:34   ` [PATCH 03/12] commit-graph: parse commit from chosen graph Derrick Stolee
2018-05-10 17:34   ` [PATCH 04/12] commit-graph: verify fanout and lookup table Derrick Stolee
2018-05-10 18:29     ` Martin Ågren
2018-05-11 15:17       ` Derrick Stolee
2018-05-10 17:34   ` [PATCH 05/12] commit: force commit to parse from object database Derrick Stolee
2018-05-10 17:34   ` [PATCH 06/12] commit-graph: load a root tree from specific graph Derrick Stolee
2018-05-10 17:34   ` [PATCH 07/12] commit-graph: verify commit contents against odb Derrick Stolee
2018-05-10 17:34   ` [PATCH 08/12] fsck: verify commit-graph Derrick Stolee
2018-05-10 17:34   ` [PATCH 09/12] commit-graph: add '--reachable' option Derrick Stolee
2018-05-10 17:34   ` [PATCH 10/12] gc: automatically write commit-graph files Derrick Stolee
2018-05-10 17:34   ` [PATCH 11/12] fetch: compute commit-graph by default Derrick Stolee
2018-05-10 17:34   ` [PATCH 12/12] commit-graph: update design document Derrick Stolee
2018-05-10 19:05   ` [PATCH 00/12] Integrate commit-graph into fsck, gc, and fetch Martin Ågren
2018-05-10 19:22     ` Stefan Beller
2018-05-11 17:23       ` Derrick Stolee
2018-05-11 17:30         ` Martin Ågren
2018-05-10 19:17   ` Ævar Arnfjörð Bjarmason
2018-05-11 17:23     ` Derrick Stolee
2018-05-11 21:15   ` [PATCH v2 00/12] Integrate commit-graph into fsck and gc Derrick Stolee
2018-05-11 21:15     ` [PATCH v2 01/12] commit-graph: add 'verify' subcommand Derrick Stolee
2018-05-12 13:31       ` Martin Ågren
2018-05-14 13:27         ` Derrick Stolee
2018-05-20 12:10       ` Jakub Narebski
2018-05-11 21:15     ` [PATCH v2 02/12] commit-graph: verify file header information Derrick Stolee
2018-05-12 13:35       ` Martin Ågren
2018-05-14 13:31         ` Derrick Stolee
2018-05-20 20:00       ` Jakub Narebski
2018-05-11 21:15     ` [PATCH v2 03/12] commit-graph: test that 'verify' finds corruption Derrick Stolee
2018-05-12 13:43       ` Martin Ågren
2018-05-21 18:53       ` Jakub Narebski
2018-05-24 16:28         ` Derrick Stolee
2018-05-11 21:15     ` [PATCH v2 04/12] commit-graph: parse commit from chosen graph Derrick Stolee
2018-05-12 20:50       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 05/12] commit-graph: verify fanout and lookup table Derrick Stolee
2018-05-11 21:15     ` [PATCH v2 06/12] commit: force commit to parse from object database Derrick Stolee
2018-05-12 20:54       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 07/12] commit-graph: load a root tree from specific graph Derrick Stolee
2018-05-12 20:55       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 08/12] commit-graph: verify commit contents against odb Derrick Stolee
2018-05-12 21:17       ` Martin Ågren
2018-05-14 13:44         ` Derrick Stolee
2018-05-15 21:12       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 09/12] fsck: verify commit-graph Derrick Stolee
2018-05-17 18:13       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 10/12] commit-graph: add '--reachable' option Derrick Stolee
2018-05-17 18:16       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 11/12] gc: automatically write commit-graph files Derrick Stolee
2018-05-17 18:20       ` Martin Ågren
2018-05-11 21:15     ` [PATCH v2 12/12] commit-graph: update design document Derrick Stolee
2018-05-24 16:25     ` [PATCH v3 00/20] Integrate commit-graph into 'fsck' and 'gc' Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 01/20] commit-graph: UNLEAK before die() Derrick Stolee
2018-05-24 22:47         ` Stefan Beller
2018-05-25  0:08           ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 02/20] commit-graph: fix GRAPH_MIN_SIZE Derrick Stolee
2018-05-26 18:46         ` Jakub Narebski
2018-05-26 20:30           ` brian m. carlson
2018-06-02 19:43             ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 03/20] commit-graph: parse commit from chosen graph Derrick Stolee
2018-05-27 10:23         ` Jakub Narebski
2018-05-29 12:31           ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 04/20] commit: force commit to parse from object database Derrick Stolee
2018-05-27 18:04         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 05/20] commit-graph: load a root tree from specific graph Derrick Stolee
2018-05-27 19:12         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 06/20] commit-graph: add 'verify' subcommand Derrick Stolee
2018-05-27 22:55         ` Jakub Narebski
2018-05-30 16:07           ` Derrick Stolee
2018-06-02 21:19             ` Jakub Narebski
2018-06-04 11:30               ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 07/20] commit-graph: verify catches corrupt signature Derrick Stolee
2018-05-28 14:05         ` Jakub Narebski
2018-05-29 12:43           ` Derrick Stolee
2018-06-02 22:30             ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 08/20] commit-graph: verify required chunks are present Derrick Stolee
2018-05-28 17:11         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 09/20] commit-graph: verify corrupt OID fanout and lookup Derrick Stolee
2018-05-30 13:34         ` Jakub Narebski
2018-05-30 16:18           ` Derrick Stolee [this message]
2018-06-02  4:38         ` Duy Nguyen
2018-06-04 11:32           ` Derrick Stolee
2018-06-04 14:42             ` Duy Nguyen
2018-05-24 16:25       ` [PATCH v3 10/20] commit-graph: verify objects exist Derrick Stolee
2018-05-30 19:22         ` Jakub Narebski
2018-05-31 12:53           ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 11/20] commit-graph: verify root tree OIDs Derrick Stolee
2018-05-30 22:24         ` Jakub Narebski
2018-05-31 13:16           ` Derrick Stolee
2018-06-02 22:50             ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 12/20] commit-graph: verify parent list Derrick Stolee
2018-06-01 23:21         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 13/20] commit-graph: verify generation number Derrick Stolee
2018-06-02 12:23         ` Jakub Narebski
2018-06-04 11:47           ` Derrick Stolee
2018-05-24 16:25       ` [PATCH v3 14/20] commit-graph: verify commit date Derrick Stolee
2018-06-02 12:29         ` Jakub Narebski
2018-05-24 16:25       ` [PATCH v3 15/20] commit-graph: test for corrupted octopus edge Derrick Stolee
2018-06-02 12:39         ` Jakub Narebski
2018-06-04 13:08           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 16/20] commit-graph: verify contents match checksum Derrick Stolee
2018-05-30 12:35         ` SZEDER Gábor
2018-06-02 15:52         ` Jakub Narebski
2018-06-04 11:55           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 17/20] fsck: verify commit-graph Derrick Stolee
2018-06-02 16:17         ` Jakub Narebski
2018-06-04 11:59           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 18/20] commit-graph: add '--reachable' option Derrick Stolee
2018-06-02 17:34         ` Jakub Narebski
2018-06-04 12:44           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 19/20] gc: automatically write commit-graph files Derrick Stolee
2018-06-02 18:03         ` Jakub Narebski
2018-06-04 12:51           ` Derrick Stolee
2018-05-24 16:26       ` [PATCH v3 20/20] commit-graph: update design document Derrick Stolee
2018-06-02 18:27         ` Jakub Narebski
2018-05-24 21:15       ` [PATCH v3 00/20] Integrate commit-graph into 'fsck' and 'gc' Ævar Arnfjörð Bjarmason
2018-05-25  4:11       ` Junio C Hamano
2018-05-29  4:27       ` Junio C Hamano
2018-05-29 12:37         ` Derrick Stolee
2018-05-29 13:41           ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b34afb39-8df2-b825-773c-8526d93314b5@gmail.com \
    --to=stolee@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=marten.agren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.