* [PATCH v8] read-cache: call verify_hdr() in a background thread @ 2017-04-25 18:41 git 2017-04-25 18:41 ` [PATCH v8] read-cache: force_verify_index_checksum git 2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano 0 siblings, 2 replies; 7+ messages in thread From: git @ 2017-04-25 18:41 UTC (permalink / raw) To: git; +Cc: gitster, peff, j6t, Jeff Hostetler From: Jeff Hostetler <jeffhost@microsoft.com> Version 8 of this patch converts the unit test to use perl to corrupt the index checksum (rather than altering a filename) and also verifies the fsck error message as suggested in response to v7 on the mailing list. If there are no other suggestions, I think this version should be considered final. Jeff Hostetler (1): read-cache: force_verify_index_checksum builtin/fsck.c | 1 + cache.h | 2 ++ read-cache.c | 7 +++++++ t/t1450-fsck.sh | 32 ++++++++++++++++++++++++++++++++ 4 files changed, 42 insertions(+) -- 2.9.3 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v8] read-cache: force_verify_index_checksum 2017-04-25 18:41 [PATCH v8] read-cache: call verify_hdr() in a background thread git @ 2017-04-25 18:41 ` git 2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano 1 sibling, 0 replies; 7+ messages in thread From: git @ 2017-04-25 18:41 UTC (permalink / raw) To: git; +Cc: gitster, peff, j6t, Jeff Hostetler From: Jeff Hostetler <jeffhost@microsoft.com> Teach git to skip verification of the SHA1-1 checksum at the end of the index file in verify_hdr() which is called from read_index() unless the "force_verify_index_checksum" global variable is set. Teach fsck to force this verification. The checksum verification is for detecting disk corruption, and for small projects, the time it takes to compute SHA-1 is not that significant, but for gigantic repositories this calculation adds significant time to every command. These effect can be seen using t/perf/p0002-read-cache.sh: Test HEAD~1 HEAD -------------------------------------------------------------------------------------- 0002.1: read_cache/discard_cache 1000 times 0.66(0.44+0.20) 0.30(0.27+0.02) -54.5% Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com> --- builtin/fsck.c | 1 + cache.h | 2 ++ read-cache.c | 7 +++++++ t/t1450-fsck.sh | 32 ++++++++++++++++++++++++++++++++ 4 files changed, 42 insertions(+) diff --git a/builtin/fsck.c b/builtin/fsck.c index 1a5cacc..5512d06 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -771,6 +771,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) } if (keep_cache_objects) { + verify_index_checksum = 1; read_cache(); for (i = 0; i < active_nr; i++) { unsigned int mode; diff --git a/cache.h b/cache.h index 80b6372..87f13bf 100644 --- a/cache.h +++ b/cache.h @@ -685,6 +685,8 @@ extern void update_index_if_able(struct index_state *, struct lock_file *); extern int hold_locked_index(struct lock_file *, int); extern void set_alternate_index_output(const char *); +extern int verify_index_checksum; + /* Environment bits from configuration mechanism */ extern int trust_executable_bit; extern int trust_ctime; diff --git a/read-cache.c b/read-cache.c index 9054369..c4205aa 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1371,6 +1371,9 @@ struct ondisk_cache_entry_extended { ondisk_cache_entry_extended_size(ce_namelen(ce)) : \ ondisk_cache_entry_size(ce_namelen(ce))) +/* Allow fsck to force verification of the index checksum. */ +int verify_index_checksum; + static int verify_hdr(struct cache_header *hdr, unsigned long size) { git_SHA_CTX c; @@ -1382,6 +1385,10 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size) hdr_version = ntohl(hdr->hdr_version); if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version) return error("bad index version %d", hdr_version); + + if (!verify_index_checksum) + return 0; + git_SHA1_Init(&c); git_SHA1_Update(&c, hdr, size - 20); git_SHA1_Final(sha1, &c); diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 33a51c9..eff1cd6 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -689,4 +689,36 @@ test_expect_success 'bogus head does not fallback to all heads' ' ! grep $blob out ' +# Corrupt the checksum on the index. +# Add 1 to the last byte in the SHA. +corrupt_index_checksum () { + perl -w -e ' + use Fcntl ":seek"; + open my $fh, "+<", ".git/index" or die "open: $!"; + binmode $fh; + seek $fh, -1, SEEK_END or die "seek: $!"; + read $fh, my $in_byte, 1 or die "read: $!"; + + $in_value = unpack("C", $in_byte); + $out_value = ($in_value + 1) & 255; + + $out_byte = pack("C", $out_value); + + seek $fh, -1, SEEK_END or die "seek: $!"; + print $fh $out_byte; + close $fh or die "close: $!"; + ' +} + +# Corrupt the checksum on the index and then +# verify that only fsck notices. +test_expect_success 'detect corrupt index file in fsck' ' + cp .git/index .git/index.backup && + test_when_finished "mv .git/index.backup .git/index" && + corrupt_index_checksum && + test_must_fail git fsck --cache 2>expect && + grep "bad index file" expect && + git status +' + test_done -- 2.9.3 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread 2017-04-25 18:41 [PATCH v8] read-cache: call verify_hdr() in a background thread git 2017-04-25 18:41 ` [PATCH v8] read-cache: force_verify_index_checksum git @ 2017-04-26 4:11 ` Junio C Hamano 2017-04-26 4:34 ` Junio C Hamano 2017-04-26 13:03 ` Jeff Hostetler 1 sibling, 2 replies; 7+ messages in thread From: Junio C Hamano @ 2017-04-26 4:11 UTC (permalink / raw) To: git; +Cc: git, peff, j6t, Jeff Hostetler git@jeffhostetler.com writes: > From: Jeff Hostetler <jeffhost@microsoft.com> > > Version 8 of this patch converts the unit test to use > perl to corrupt the index checksum (rather than altering > a filename) and also verifies the fsck error message as > suggested in response to v7 on the mailing list. > > If there are no other suggestions, I think this version > should be considered final. Oops. The earlier one has already been in 'master' for a few days. Let's make this an incremental update. Is the description in the following something you are OK with (so that I can add your sign-off)? Thanks. -- >8 -- From: Jeff Hostetler <jeffhost@microsoft.com> Date: Tue, 25 Apr 2017 18:41:09 +0000 Subject: t1450: avoid use of "sed" on the index, which is a binary file The previous step added a path zzzzzzzz to the index, and then used "sed" to replace this string to yyyyyyyy to create a test case where the checksum at the end of the file does not match the contents. Unfortunately, use of "sed" on a non-text file is not portable. Instead, use a Perl script that seeks to the end and modifies the last byte of the file (where we _know_ stores the trailing checksum). --- t/t1450-fsck.sh | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 677e15a7a4..eff1cd68e9 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all heads' ' ! grep $blob out ' +# Corrupt the checksum on the index. +# Add 1 to the last byte in the SHA. +corrupt_index_checksum () { + perl -w -e ' + use Fcntl ":seek"; + open my $fh, "+<", ".git/index" or die "open: $!"; + binmode $fh; + seek $fh, -1, SEEK_END or die "seek: $!"; + read $fh, my $in_byte, 1 or die "read: $!"; + + $in_value = unpack("C", $in_byte); + $out_value = ($in_value + 1) & 255; + + $out_byte = pack("C", $out_value); + + seek $fh, -1, SEEK_END or die "seek: $!"; + print $fh $out_byte; + close $fh or die "close: $!"; + ' +} + +# Corrupt the checksum on the index and then +# verify that only fsck notices. test_expect_success 'detect corrupt index file in fsck' ' cp .git/index .git/index.backup && test_when_finished "mv .git/index.backup .git/index" && - echo zzzzzzzz >zzzzzzzz && - git add zzzzzzzz && - sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy && - mv .git/index.yyy .git/index && - # Confirm that fsck detects invalid checksum - test_must_fail git fsck --cache && - # Confirm that status no longer complains about invalid checksum + corrupt_index_checksum && + test_must_fail git fsck --cache 2>expect && + grep "bad index file" expect && git status ' ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread 2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano @ 2017-04-26 4:34 ` Junio C Hamano 2017-04-26 13:06 ` Jeff Hostetler 2017-04-26 13:03 ` Jeff Hostetler 1 sibling, 1 reply; 7+ messages in thread From: Junio C Hamano @ 2017-04-26 4:34 UTC (permalink / raw) To: git; +Cc: git, peff, j6t, Jeff Hostetler Junio C Hamano <gitster@pobox.com> writes: > git@jeffhostetler.com writes: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Version 8 of this patch converts the unit test to use >> perl to corrupt the index checksum (rather than altering >> a filename) and also verifies the fsck error message as >> suggested in response to v7 on the mailing list. >> >> If there are no other suggestions, I think this version >> should be considered final. > > Oops. The earlier one has already been in 'master' for a few days. > Let's make this an incremental update. > > Is the description in the following something you are OK with (so > that I can add your sign-off)? > > Thanks. By the way, I said I am fine with the two-liner version in Dscho's message, but I am also OK with this version that does not use a separate dd and instead does everything in a single invocation of Perl. As long as you've tested this version, there is no point replacing it with yet another one. Thanks. > -- >8 -- > From: Jeff Hostetler <jeffhost@microsoft.com> > Date: Tue, 25 Apr 2017 18:41:09 +0000 > Subject: t1450: avoid use of "sed" on the index, which is a binary file > > The previous step added a path zzzzzzzz to the index, and then used > "sed" to replace this string to yyyyyyyy to create a test case where > the checksum at the end of the file does not match the contents. > > Unfortunately, use of "sed" on a non-text file is not portable. > Instead, use a Perl script that seeks to the end and modifies the > last byte of the file (where we _know_ stores the trailing > checksum). > > > --- > t/t1450-fsck.sh | 33 ++++++++++++++++++++++++++------- > 1 file changed, 26 insertions(+), 7 deletions(-) > > diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh > index 677e15a7a4..eff1cd68e9 100755 > --- a/t/t1450-fsck.sh > +++ b/t/t1450-fsck.sh > @@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all heads' ' > ! grep $blob out > ' > > +# Corrupt the checksum on the index. > +# Add 1 to the last byte in the SHA. > +corrupt_index_checksum () { > + perl -w -e ' > + use Fcntl ":seek"; > + open my $fh, "+<", ".git/index" or die "open: $!"; > + binmode $fh; > + seek $fh, -1, SEEK_END or die "seek: $!"; > + read $fh, my $in_byte, 1 or die "read: $!"; > + > + $in_value = unpack("C", $in_byte); > + $out_value = ($in_value + 1) & 255; > + > + $out_byte = pack("C", $out_value); > + > + seek $fh, -1, SEEK_END or die "seek: $!"; > + print $fh $out_byte; > + close $fh or die "close: $!"; > + ' > +} > + > +# Corrupt the checksum on the index and then > +# verify that only fsck notices. > test_expect_success 'detect corrupt index file in fsck' ' > cp .git/index .git/index.backup && > test_when_finished "mv .git/index.backup .git/index" && > - echo zzzzzzzz >zzzzzzzz && > - git add zzzzzzzz && > - sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy && > - mv .git/index.yyy .git/index && > - # Confirm that fsck detects invalid checksum > - test_must_fail git fsck --cache && > - # Confirm that status no longer complains about invalid checksum > + corrupt_index_checksum && > + test_must_fail git fsck --cache 2>expect && > + grep "bad index file" expect && > git status > ' > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread 2017-04-26 4:34 ` Junio C Hamano @ 2017-04-26 13:06 ` Jeff Hostetler 2017-04-27 0:41 ` Junio C Hamano 0 siblings, 1 reply; 7+ messages in thread From: Jeff Hostetler @ 2017-04-26 13:06 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, peff, j6t, Jeff Hostetler On 4/26/2017 12:34 AM, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> git@jeffhostetler.com writes: >> >>> From: Jeff Hostetler <jeffhost@microsoft.com> >>> >>> Version 8 of this patch converts the unit test to use >>> perl to corrupt the index checksum (rather than altering >>> a filename) and also verifies the fsck error message as >>> suggested in response to v7 on the mailing list. >>> >>> If there are no other suggestions, I think this version >>> should be considered final. >> Oops. The earlier one has already been in 'master' for a few days. >> Let's make this an incremental update. >> >> Is the description in the following something you are OK with (so >> that I can add your sign-off)? >> >> Thanks. > By the way, I said I am fine with the two-liner version in Dscho's > message, but I am also OK with this version that does not use a > separate dd and instead does everything in a single invocation of > Perl. As long as you've tested this version, there is no point > replacing it with yet another one. Either version is fine. I'd say use my perl version as I have tested it and it is simple enough and already in the tree. I don't think it's worth the bother to switch it to the dd version. Thanks Jeff > Thanks. > >> -- >8 -- >> From: Jeff Hostetler <jeffhost@microsoft.com> >> Date: Tue, 25 Apr 2017 18:41:09 +0000 >> Subject: t1450: avoid use of "sed" on the index, which is a binary file >> >> The previous step added a path zzzzzzzz to the index, and then used >> "sed" to replace this string to yyyyyyyy to create a test case where >> the checksum at the end of the file does not match the contents. >> >> Unfortunately, use of "sed" on a non-text file is not portable. >> Instead, use a Perl script that seeks to the end and modifies the >> last byte of the file (where we _know_ stores the trailing >> checksum). >> >> >> --- >> t/t1450-fsck.sh | 33 ++++++++++++++++++++++++++------- >> 1 file changed, 26 insertions(+), 7 deletions(-) >> >> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh >> index 677e15a7a4..eff1cd68e9 100755 >> --- a/t/t1450-fsck.sh >> +++ b/t/t1450-fsck.sh >> @@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all heads' ' >> ! grep $blob out >> ' >> >> +# Corrupt the checksum on the index. >> +# Add 1 to the last byte in the SHA. >> +corrupt_index_checksum () { >> + perl -w -e ' >> + use Fcntl ":seek"; >> + open my $fh, "+<", ".git/index" or die "open: $!"; >> + binmode $fh; >> + seek $fh, -1, SEEK_END or die "seek: $!"; >> + read $fh, my $in_byte, 1 or die "read: $!"; >> + >> + $in_value = unpack("C", $in_byte); >> + $out_value = ($in_value + 1) & 255; >> + >> + $out_byte = pack("C", $out_value); >> + >> + seek $fh, -1, SEEK_END or die "seek: $!"; >> + print $fh $out_byte; >> + close $fh or die "close: $!"; >> + ' >> +} >> + >> +# Corrupt the checksum on the index and then >> +# verify that only fsck notices. >> test_expect_success 'detect corrupt index file in fsck' ' >> cp .git/index .git/index.backup && >> test_when_finished "mv .git/index.backup .git/index" && >> - echo zzzzzzzz >zzzzzzzz && >> - git add zzzzzzzz && >> - sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy && >> - mv .git/index.yyy .git/index && >> - # Confirm that fsck detects invalid checksum >> - test_must_fail git fsck --cache && >> - # Confirm that status no longer complains about invalid checksum >> + corrupt_index_checksum && >> + test_must_fail git fsck --cache 2>expect && >> + grep "bad index file" expect && >> git status >> ' >> ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread 2017-04-26 13:06 ` Jeff Hostetler @ 2017-04-27 0:41 ` Junio C Hamano 0 siblings, 0 replies; 7+ messages in thread From: Junio C Hamano @ 2017-04-27 0:41 UTC (permalink / raw) To: Jeff Hostetler; +Cc: git, peff, j6t, Jeff Hostetler Jeff Hostetler <git@jeffhostetler.com> writes: > Either version is fine. I'd say use my perl version as I have tested it and > it is simple enough and already in the tree. I don't think it's worth the > bother to switch it to the dd version. Thanks, I agree what you said is very sensible. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v8] read-cache: call verify_hdr() in a background thread 2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano 2017-04-26 4:34 ` Junio C Hamano @ 2017-04-26 13:03 ` Jeff Hostetler 1 sibling, 0 replies; 7+ messages in thread From: Jeff Hostetler @ 2017-04-26 13:03 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, peff, j6t, Jeff Hostetler On 4/26/2017 12:11 AM, Junio C Hamano wrote: > git@jeffhostetler.com writes: > >> From: Jeff Hostetler <jeffhost@microsoft.com> >> >> Version 8 of this patch converts the unit test to use >> perl to corrupt the index checksum (rather than altering >> a filename) and also verifies the fsck error message as >> suggested in response to v7 on the mailing list. >> >> If there are no other suggestions, I think this version >> should be considered final. > Oops. The earlier one has already been in 'master' for a few days. > Let's make this an incremental update. > > Is the description in the following something you are OK with (so > that I can add your sign-off)? Yes, this is fine. Thanks! Jeff > > Thanks. > > -- >8 -- > From: Jeff Hostetler <jeffhost@microsoft.com> > Date: Tue, 25 Apr 2017 18:41:09 +0000 > Subject: t1450: avoid use of "sed" on the index, which is a binary file > > The previous step added a path zzzzzzzz to the index, and then used > "sed" to replace this string to yyyyyyyy to create a test case where > the checksum at the end of the file does not match the contents. > > Unfortunately, use of "sed" on a non-text file is not portable. > Instead, use a Perl script that seeks to the end and modifies the > last byte of the file (where we _know_ stores the trailing > checksum). > > > --- > t/t1450-fsck.sh | 33 ++++++++++++++++++++++++++------- > 1 file changed, 26 insertions(+), 7 deletions(-) > > diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh > index 677e15a7a4..eff1cd68e9 100755 > --- a/t/t1450-fsck.sh > +++ b/t/t1450-fsck.sh > @@ -689,16 +689,35 @@ test_expect_success 'bogus head does not fallback to all heads' ' > ! grep $blob out > ' > > +# Corrupt the checksum on the index. > +# Add 1 to the last byte in the SHA. > +corrupt_index_checksum () { > + perl -w -e ' > + use Fcntl ":seek"; > + open my $fh, "+<", ".git/index" or die "open: $!"; > + binmode $fh; > + seek $fh, -1, SEEK_END or die "seek: $!"; > + read $fh, my $in_byte, 1 or die "read: $!"; > + > + $in_value = unpack("C", $in_byte); > + $out_value = ($in_value + 1) & 255; > + > + $out_byte = pack("C", $out_value); > + > + seek $fh, -1, SEEK_END or die "seek: $!"; > + print $fh $out_byte; > + close $fh or die "close: $!"; > + ' > +} > + > +# Corrupt the checksum on the index and then > +# verify that only fsck notices. > test_expect_success 'detect corrupt index file in fsck' ' > cp .git/index .git/index.backup && > test_when_finished "mv .git/index.backup .git/index" && > - echo zzzzzzzz >zzzzzzzz && > - git add zzzzzzzz && > - sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy && > - mv .git/index.yyy .git/index && > - # Confirm that fsck detects invalid checksum > - test_must_fail git fsck --cache && > - # Confirm that status no longer complains about invalid checksum > + corrupt_index_checksum && > + test_must_fail git fsck --cache 2>expect && > + grep "bad index file" expect && > git status > ' > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2017-04-27 0:41 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2017-04-25 18:41 [PATCH v8] read-cache: call verify_hdr() in a background thread git 2017-04-25 18:41 ` [PATCH v8] read-cache: force_verify_index_checksum git 2017-04-26 4:11 ` [PATCH v8] read-cache: call verify_hdr() in a background thread Junio C Hamano 2017-04-26 4:34 ` Junio C Hamano 2017-04-26 13:06 ` Jeff Hostetler 2017-04-27 0:41 ` Junio C Hamano 2017-04-26 13:03 ` Jeff Hostetler
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.