All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6] read-cache: call verify_hdr() in a background thread
@ 2017-04-05 18:06 git
  2017-04-05 18:06 ` [PATCH v6] read-cache: force_verify_index_checksum git
  0 siblings, 1 reply; 5+ messages in thread
From: git @ 2017-04-05 18:06 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Version 6 of this patch series cleans up a little further
and uses test_when_finished in the unit test.

================
Version 5 of this patch series further simplifies the change to a
single global variable for use by fsck.  It eliminates the core
config setting.  Further discussion on the mailing list indicated
that the config setting only added confusion and probably would not
be used by anyone.


Jeff Hostetler (1):
  read-cache: force_verify_index_checksum

 builtin/fsck.c  |  1 +
 cache.h         |  2 ++
 read-cache.c    |  7 +++++++
 t/t1450-fsck.sh | 13 +++++++++++++
 4 files changed, 23 insertions(+)

-- 
2.9.3


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v6] read-cache: force_verify_index_checksum
  2017-04-05 18:06 [PATCH v6] read-cache: call verify_hdr() in a background thread git
@ 2017-04-05 18:06 ` git
  2017-04-05 18:25   ` Jeff King
  2017-04-05 18:41   ` Jonathan Nieder
  0 siblings, 2 replies; 5+ messages in thread
From: git @ 2017-04-05 18:06 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, Jeff Hostetler

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.

Teach fsck to force this verification.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

These effect can be seen using t/perf/p0002-read-cache.sh:

================
BEFORE:
$ ./p0002-read-cache.sh
0002.1: read_cache/discard_cache 1000 times   0.66(0.58+0.08)

AFTER:
$ ./p0002-read-cache.sh
0002.1: read_cache/discard_cache 1000 times   0.30(0.28+0.01)
================

Also, using the t/perf/p0004-read-tree.sh test that I created
in a parallel patch series.

https://public-inbox.org/git/20170404210847.50860-1-git@jeffhostetler.com/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fsck.c  |  1 +
 cache.h         |  2 ++
 read-cache.c    |  7 +++++++
 t/t1450-fsck.sh | 13 +++++++++++++
 4 files changed, 23 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1a5cacc..5512d06 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -771,6 +771,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	if (keep_cache_objects) {
+		verify_index_checksum = 1;
 		read_cache();
 		for (i = 0; i < active_nr; i++) {
 			unsigned int mode;
diff --git a/cache.h b/cache.h
index 80b6372..87f13bf 100644
--- a/cache.h
+++ b/cache.h
@@ -685,6 +685,8 @@ extern void update_index_if_able(struct index_state *, struct lock_file *);
 extern int hold_locked_index(struct lock_file *, int);
 extern void set_alternate_index_output(const char *);
 
+extern int verify_index_checksum;
+
 /* Environment bits from configuration mechanism */
 extern int trust_executable_bit;
 extern int trust_ctime;
diff --git a/read-cache.c b/read-cache.c
index 9054369..c4205aa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1371,6 +1371,9 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
+/* Allow fsck to force verification of the index checksum. */
+int verify_index_checksum;
+
 static int verify_hdr(struct cache_header *hdr, unsigned long size)
 {
 	git_SHA_CTX c;
@@ -1382,6 +1385,10 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
+
+	if (!verify_index_checksum)
+		return 0;
+
 	git_SHA1_Init(&c);
 	git_SHA1_Update(&c, hdr, size - 20);
 	git_SHA1_Final(sha1, &c);
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 33a51c9..fc1acb5 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,4 +689,17 @@ test_expect_success 'bogus head does not fallback to all heads' '
 	! grep $blob out
 '
 
+test_expect_success 'detect corrupt index file in fsck' '
+	cp .git/index .git/index.backup &&
+	test_when_finished "mv .git/index.backup .git/index" &&
+	echo zzzzzzzz >zzzzzzzz &&
+	test_when_finished "rm zzzzzzzz" &&
+	git add zzzzzzzz &&
+	sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
+	rm .git/index &&
+	mv .git/index.yyy .git/index &&
+	test_when_finished "rm .git/index" &&
+	test_must_fail git fsck --cache
+'
+
 test_done
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH v6] read-cache: force_verify_index_checksum
  2017-04-05 18:06 ` [PATCH v6] read-cache: force_verify_index_checksum git
@ 2017-04-05 18:25   ` Jeff King
  2017-04-05 19:06     ` Jeff Hostetler
  2017-04-05 18:41   ` Jonathan Nieder
  1 sibling, 1 reply; 5+ messages in thread
From: Jeff King @ 2017-04-05 18:25 UTC (permalink / raw)
  To: git; +Cc: git, gitster, Jeff Hostetler

On Wed, Apr 05, 2017 at 06:06:25PM +0000, git@jeffhostetler.com wrote:

> ---
>  builtin/fsck.c  |  1 +
>  cache.h         |  2 ++
>  read-cache.c    |  7 +++++++
>  t/t1450-fsck.sh | 13 +++++++++++++
>  4 files changed, 23 insertions(+)

This version is delightfully simple. Thanks for sticking with it.

> +test_expect_success 'detect corrupt index file in fsck' '
> +	cp .git/index .git/index.backup &&
> +	test_when_finished "mv .git/index.backup .git/index" &&
> +	echo zzzzzzzz >zzzzzzzz &&
> +	test_when_finished "rm zzzzzzzz" &&
> +	git add zzzzzzzz &&
> +	sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
> +	rm .git/index &&
> +	mv .git/index.yyy .git/index &&
> +	test_when_finished "rm .git/index" &&
> +	test_must_fail git fsck --cache
> +'

I was surprised by the second when-finished "rm". The earlier "mv"
should be enough to restore everything. I guess you're trying to
overcome the tendency of "mv" to complain about overwriting on some
platforms.

But wouldn't that make the first when-finished "mv" potentially a
problem, if it happens before the "rm .git/index" line?

Perhaps "mv -f" would make it all simpler (ordinarily I'd suggest just
"git reset --hard", but since we're corrupting the index, it may be
better not to rely on git for the restore). So maybe just:

  cp .git/index .git/index.backup &&
  test_when_finished "mv -f .git/index.backup .git/index" &&
  ...

I think the "rm zzzzzzzz" can go, too. We don't typically worry too much
about accumulating random files in the working tree. The important thing
is that the index is restored to a sane state.

-Peff

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v6] read-cache: force_verify_index_checksum
  2017-04-05 18:06 ` [PATCH v6] read-cache: force_verify_index_checksum git
  2017-04-05 18:25   ` Jeff King
@ 2017-04-05 18:41   ` Jonathan Nieder
  1 sibling, 0 replies; 5+ messages in thread
From: Jonathan Nieder @ 2017-04-05 18:41 UTC (permalink / raw)
  To: git; +Cc: git, gitster, peff, Jeff Hostetler

Hi,

git@jeffhostetler.com wrote:

> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -689,4 +689,17 @@ test_expect_success 'bogus head does not fallback to all heads' '
>  	! grep $blob out
>  '
>  
> +test_expect_success 'detect corrupt index file in fsck' '
> +	cp .git/index .git/index.backup &&
> +	test_when_finished "mv .git/index.backup .git/index" &&
> +	echo zzzzzzzz >zzzzzzzz &&
> +	test_when_finished "rm zzzzzzzz" &&
> +	git add zzzzzzzz &&
> +	sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
> +	rm .git/index &&
> +	mv .git/index.yyy .git/index &&

Why is the "rm" before "mv" needed?  I would have expected "mv" to
replace the file.

If that doesn't work, I expect there are a large number of other tests
that would need fixing...

> +	test_when_finished "rm .git/index" &&

Likewise --- this seems redundant next to the "mv" enqueued previously.

> +	test_must_fail git fsck --cache

That said,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH v6] read-cache: force_verify_index_checksum
  2017-04-05 18:25   ` Jeff King
@ 2017-04-05 19:06     ` Jeff Hostetler
  0 siblings, 0 replies; 5+ messages in thread
From: Jeff Hostetler @ 2017-04-05 19:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git, gitster, Jeff Hostetler



On 4/5/2017 2:25 PM, Jeff King wrote:
> On Wed, Apr 05, 2017 at 06:06:25PM +0000, git@jeffhostetler.com wrote:
>
>> ---
>>  builtin/fsck.c  |  1 +
>>  cache.h         |  2 ++
>>  read-cache.c    |  7 +++++++
>>  t/t1450-fsck.sh | 13 +++++++++++++
>>  4 files changed, 23 insertions(+)
>
> This version is delightfully simple. Thanks for sticking with it.
>
>> +test_expect_success 'detect corrupt index file in fsck' '
>> +	cp .git/index .git/index.backup &&
>> +	test_when_finished "mv .git/index.backup .git/index" &&
>> +	echo zzzzzzzz >zzzzzzzz &&
>> +	test_when_finished "rm zzzzzzzz" &&
>> +	git add zzzzzzzz &&
>> +	sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
>> +	rm .git/index &&
>> +	mv .git/index.yyy .git/index &&
>> +	test_when_finished "rm .git/index" &&
>> +	test_must_fail git fsck --cache
>> +'
>
> I was surprised by the second when-finished "rm". The earlier "mv"
> should be enough to restore everything. I guess you're trying to
> overcome the tendency of "mv" to complain about overwriting on some
> platforms.

Yeah, just being cautious. (I spend a lot of time on Windows.)

>
> But wouldn't that make the first when-finished "mv" potentially a
> problem, if it happens before the "rm .git/index" line?
>
> Perhaps "mv -f" would make it all simpler (ordinarily I'd suggest just
> "git reset --hard", but since we're corrupting the index, it may be
> better not to rely on git for the restore). So maybe just:
>
>   cp .git/index .git/index.backup &&
>   test_when_finished "mv -f .git/index.backup .git/index" &&
>   ...
>
> I think the "rm zzzzzzzz" can go, too. We don't typically worry too much
> about accumulating random files in the working tree. The important thing
> is that the index is restored to a sane state.
>
> -Peff
>

Good point.  Let me take another stab at it.
Thanks
Jeff

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2017-04-05 19:06 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-05 18:06 [PATCH v6] read-cache: call verify_hdr() in a background thread git
2017-04-05 18:06 ` [PATCH v6] read-cache: force_verify_index_checksum git
2017-04-05 18:25   ` Jeff King
2017-04-05 19:06     ` Jeff Hostetler
2017-04-05 18:41   ` Jonathan Nieder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.