All of lore.kernel.org
 help / color / mirror / Atom feed
From: git@jeffhostetler.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net, j6t@kdbg.org,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: [PATCH v8] read-cache: force_verify_index_checksum
Date: Tue, 25 Apr 2017 18:41:09 +0000	[thread overview]
Message-ID: <20170425184109.46168-2-git@jeffhostetler.com> (raw)
In-Reply-To: <20170425184109.46168-1-git@jeffhostetler.com>

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach git to skip verification of the SHA1-1 checksum at the end of
the index file in verify_hdr() which is called from read_index()
unless the "force_verify_index_checksum" global variable is set.

Teach fsck to force this verification.

The checksum verification is for detecting disk corruption, and for
small projects, the time it takes to compute SHA-1 is not that
significant, but for gigantic repositories this calculation adds
significant time to every command.

These effect can be seen using t/perf/p0002-read-cache.sh:

Test                                          HEAD~1            HEAD
--------------------------------------------------------------------------------------
0002.1: read_cache/discard_cache 1000 times   0.66(0.44+0.20)   0.30(0.27+0.02) -54.5%

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 builtin/fsck.c  |  1 +
 cache.h         |  2 ++
 read-cache.c    |  7 +++++++
 t/t1450-fsck.sh | 32 ++++++++++++++++++++++++++++++++
 4 files changed, 42 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1a5cacc..5512d06 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -771,6 +771,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	}
 
 	if (keep_cache_objects) {
+		verify_index_checksum = 1;
 		read_cache();
 		for (i = 0; i < active_nr; i++) {
 			unsigned int mode;
diff --git a/cache.h b/cache.h
index 80b6372..87f13bf 100644
--- a/cache.h
+++ b/cache.h
@@ -685,6 +685,8 @@ extern void update_index_if_able(struct index_state *, struct lock_file *);
 extern int hold_locked_index(struct lock_file *, int);
 extern void set_alternate_index_output(const char *);
 
+extern int verify_index_checksum;
+
 /* Environment bits from configuration mechanism */
 extern int trust_executable_bit;
 extern int trust_ctime;
diff --git a/read-cache.c b/read-cache.c
index 9054369..c4205aa 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1371,6 +1371,9 @@ struct ondisk_cache_entry_extended {
 			    ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
 			    ondisk_cache_entry_size(ce_namelen(ce)))
 
+/* Allow fsck to force verification of the index checksum. */
+int verify_index_checksum;
+
 static int verify_hdr(struct cache_header *hdr, unsigned long size)
 {
 	git_SHA_CTX c;
@@ -1382,6 +1385,10 @@ static int verify_hdr(struct cache_header *hdr, unsigned long size)
 	hdr_version = ntohl(hdr->hdr_version);
 	if (hdr_version < INDEX_FORMAT_LB || INDEX_FORMAT_UB < hdr_version)
 		return error("bad index version %d", hdr_version);
+
+	if (!verify_index_checksum)
+		return 0;
+
 	git_SHA1_Init(&c);
 	git_SHA1_Update(&c, hdr, size - 20);
 	git_SHA1_Final(sha1, &c);
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 33a51c9..eff1cd6 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,4 +689,36 @@ test_expect_success 'bogus head does not fallback to all heads' '
 	! grep $blob out
 '
 
+# Corrupt the checksum on the index.
+# Add 1 to the last byte in the SHA.
+corrupt_index_checksum () {
+    perl -w -e '
+	use Fcntl ":seek";
+	open my $fh, "+<", ".git/index" or die "open: $!";
+	binmode $fh;
+	seek $fh, -1, SEEK_END or die "seek: $!";
+	read $fh, my $in_byte, 1 or die "read: $!";
+
+	$in_value = unpack("C", $in_byte);
+	$out_value = ($in_value + 1) & 255;
+
+	$out_byte = pack("C", $out_value);
+
+	seek $fh, -1, SEEK_END or die "seek: $!";
+	print $fh $out_byte;
+	close $fh or die "close: $!";
+    '
+}
+
+# Corrupt the checksum on the index and then
+# verify that only fsck notices.
+test_expect_success 'detect corrupt index file in fsck' '
+	cp .git/index .git/index.backup &&
+	test_when_finished "mv .git/index.backup .git/index" &&
+	corrupt_index_checksum &&
+	test_must_fail git fsck --cache 2>expect &&
+	grep "bad index file" expect &&
+	git status
+'
+
 test_done
-- 
2.9.3


  reply	other threads:[~2017-04-25 18:41 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-25 18:41 [PATCH v8] read-cache: call verify_hdr() in a background thread git
2017-04-25 18:41 ` git [this message]
2017-04-26  4:11 ` Junio C Hamano
2017-04-26  4:34   ` Junio C Hamano
2017-04-26 13:06     ` Jeff Hostetler
2017-04-27  0:41       ` Junio C Hamano
2017-04-26 13:03   ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170425184109.46168-2-git@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=jeffhost@microsoft.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.