git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] Add 'git multi-pack-index verify' command
@ 2018-09-05 14:46 Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
                   ` (11 more replies)
  0 siblings, 12 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano

The multi-pack-index file provides faster lookups in repos with many
packfiles by duplicating the information from multiple pack-indexes into a
single file. This series allows us to verify a multi-pack-index using 'git
multi-pack-index verify' and 'git fsck' (when core.multiPackIndex is true).

The pattern for the tests is similar to that found in t5318-commit-graph.sh.

During testing, I found a bug in how we check for the size of off_t (we are
not actually checking off_t, but instead uint32_t). See "multi-pack-index:
fix 32-bit vs 64-bit size check".

Thanks to Ævar [1], I added a commit that provides progress updates when
checking object offsets.

Based on ds/multi-pack-index

[1] 
https://public-inbox.org/git/20180904202729.13900-1-avarab@gmail.com/T/#u

Derrick Stolee (11):
  multi-pack-index: add 'verify' verb
  multi-pack-index: verify bad header
  multi-pack-index: verify corrupt chunk lookup table
  multi-pack-index: verify packname order
  multi-pack-index: verify missing pack
  multi-pack-index: verify oid fanout order
  multi-pack-index: verify oid lookup order
  multi-pack-index: fix 32-bit vs 64-bit size check
  multi-pack-index: verify object offsets
  multi-pack-index: report progress during 'verify'
  fsck: verify multi-pack-index

 Documentation/git-multi-pack-index.txt |  10 ++
 builtin/fsck.c                         |  18 ++++
 builtin/multi-pack-index.c             |   4 +-
 midx.c                                 | 112 ++++++++++++++++----
 midx.h                                 |   1 +
 t/t5319-multi-pack-index.sh            | 136 ++++++++++++++++++++++++-
 6 files changed, 261 insertions(+), 20 deletions(-)


base-commit: 6a22d521260f86dff8fe6f23ab329cebb62ba4f0
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-34%2Fderrickstolee%2Fmidx%2Fverify-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-34/derrickstolee/midx/verify-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/34
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 01/11] multi-pack-index: add 'verify' verb
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-05 18:59   ` Eric Sunshine
  2018-09-05 14:46 ` [PATCH 02/11] multi-pack-index: verify bad header Derrick Stolee via GitGitGadget
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The multi-pack-index builtin writes multi-pack-index files, and
uses a 'write' verb to do so. Add a 'verify' verb that checks this
file matches the contents of the pack-indexes it replaces.

The current implementation is a no-op, but will be extended in
small increments in later commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt | 10 ++++++++++
 builtin/multi-pack-index.c             |  4 +++-
 midx.c                                 | 13 +++++++++++++
 midx.h                                 |  1 +
 t/t5319-multi-pack-index.sh            |  8 ++++++++
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 1f97e79912..f7778a2c85 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -27,6 +27,10 @@ write::
 	When given as the verb, write a new MIDX file to
 	`<dir>/packs/multi-pack-index`.
 
+verify::
+	When given as the verb, verify the contents of the MIDX file
+	at `<dir>/packs/multi-pack-index`.
+
 
 EXAMPLES
 --------
@@ -43,6 +47,12 @@ $ git multi-pack-index write
 $ git multi-pack-index --object-dir <alt> write
 -----------------------------------------------
 
+* Verify the MIDX file for the packfiles in the current .git folder.
++
+-----------------------------------------------
+$ git multi-pack-index verify
+-----------------------------------------------
+
 
 SEE ALSO
 --------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 2633efd95d..7d567dafbc 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,7 +5,7 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [--object-dir=<dir>] write"),
+	N_("git multi-pack-index [--object-dir=<dir>] [write|verify]"),
 	NULL
 };
 
@@ -42,6 +42,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	if (!strcmp(argv[0], "write"))
 		return write_midx_file(opts.object_dir);
+	if (!strcmp(argv[0], "verify"))
+		return verify_midx_file(opts.object_dir);
 
 	die(_("unrecognized verb: %s"), argv[0]);
 }
diff --git a/midx.c b/midx.c
index f3e8dbc108..b253bed517 100644
--- a/midx.c
+++ b/midx.c
@@ -928,3 +928,16 @@ void clear_midx_file(const char *object_dir)
 
 	free(midx);
 }
+
+int verify_midx_error;
+
+int verify_midx_file(const char *object_dir)
+{
+	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	verify_midx_error = 0;
+
+	if (!m)
+		return 0;
+
+	return verify_midx_error;
+}
diff --git a/midx.h b/midx.h
index a210f1af2a..ce80b91c68 100644
--- a/midx.h
+++ b/midx.h
@@ -43,5 +43,6 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 
 int write_midx_file(const char *object_dir);
 void clear_midx_file(const char *object_dir);
+int verify_midx_file(const char *object_dir);
 
 #endif
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 6f56b38674..1c4e0e6d31 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -150,6 +150,10 @@ test_expect_success 'write midx with twelve packs' '
 
 compare_results_with_midx "twelve packs"
 
+test_expect_success 'verify multi-pack-index success' '
+	git multi-pack-index verify --object-dir=$objdir
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
@@ -214,4 +218,8 @@ test_expect_success 'force some 64-bit offsets with pack-objects' '
 	midx_read_expect 1 63 5 objects64 " large-offsets"
 '
 
+test_expect_success 'verify multi-pack-index with 64-bit offsets' '
+	git multi-pack-index verify --object-dir=objects64
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 02/11] multi-pack-index: verify bad header
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-07  0:27   ` Eric Sunshine
  2018-09-05 14:46 ` [PATCH 03/11] multi-pack-index: verify corrupt chunk lookup table Derrick Stolee via GitGitGadget
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When verifying if a multi-pack-index file is valid, we want the
command to fail to signal an invalid file. Previously, we wrote
an error to stderr and continued as if we had no multi-pack-index.
Now, die() instead of error().

Add tests that check corrupted headers in a few ways:

* Bad signature
* Bad file version
* Bad hash version
* Truncated hash count
* Extended hash count

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 18 +++++----------
 t/t5319-multi-pack-index.sh | 46 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/midx.c b/midx.c
index b253bed517..ec78254bb6 100644
--- a/midx.c
+++ b/midx.c
@@ -76,24 +76,18 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	m->local = local;
 
 	m->signature = get_be32(m->data);
-	if (m->signature != MIDX_SIGNATURE) {
-		error(_("multi-pack-index signature 0x%08x does not match signature 0x%08x"),
+	if (m->signature != MIDX_SIGNATURE)
+		die(_("multi-pack-index signature 0x%08x does not match signature 0x%08x"),
 		      m->signature, MIDX_SIGNATURE);
-		goto cleanup_fail;
-	}
 
 	m->version = m->data[MIDX_BYTE_FILE_VERSION];
-	if (m->version != MIDX_VERSION) {
-		error(_("multi-pack-index version %d not recognized"),
+	if (m->version != MIDX_VERSION)
+		die(_("multi-pack-index version %d not recognized"),
 		      m->version);
-		goto cleanup_fail;
-	}
 
 	hash_version = m->data[MIDX_BYTE_HASH_VERSION];
-	if (hash_version != MIDX_HASH_VERSION) {
-		error(_("hash version %u does not match"), hash_version);
-		goto cleanup_fail;
-	}
+	if (hash_version != MIDX_HASH_VERSION)
+		die(_("hash version %u does not match"), hash_version);
 	m->hash_len = MIDX_HASH_LEN;
 
 	m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS];
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 1c4e0e6d31..836d2bdb53 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -154,6 +154,51 @@ test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '
 
+# usage: corrupt_midx_and_verify <pos> <data> <objdir> <string>
+corrupt_midx_and_verify() {
+	POS=$1
+	DATA="${2:-\0}"
+	OBJDIR=$3
+	GREPSTR="$4"
+	FILE=$OBJDIR/pack/multi-pack-index &&
+	chmod a+w $FILE &&
+	test_when_finished mv midx-backup $FILE &&
+	cp $FILE midx-backup &&
+	printf "$DATA" | dd of="$FILE" bs=1 seek="$POS" conv=notrunc &&
+	test_must_fail git multi-pack-index verify --object-dir=$OBJDIR 2>test_err &&
+	grep -v "^+" test_err >err &&
+	test_i18ngrep "$GREPSTR" err
+}
+
+test_expect_success 'verify bad signature' '
+	corrupt_midx_and_verify 0 "\00" $objdir \
+		"multi-pack-index signature"
+'
+
+MIDX_BYTE_VERSION=4
+MIDX_BYTE_OID_VERSION=5
+MIDX_BYTE_CHUNK_COUNT=6
+
+test_expect_success 'verify bad version' '
+	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
+		"multi-pack-index version"
+'
+
+test_expect_success 'verify bad OID version' '
+	corrupt_midx_and_verify $MIDX_BYTE_OID_VERSION "\02" $objdir \
+		"hash version"
+'
+
+test_expect_success 'verify truncated chunk count' '
+	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
+		"missing required"
+'
+
+test_expect_success 'verify extended chunk count' '
+	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
+		"terminating multi-pack-index chunk id appears earlier than expected"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
@@ -191,7 +236,6 @@ test_expect_success 'multi-pack-index in an alternate' '
 
 compare_results_with_midx "with alternate (remote midx)"
 
-
 # usage: corrupt_data <file> <pos> [<data>]
 corrupt_data () {
 	file=$1
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 03/11] multi-pack-index: verify corrupt chunk lookup table
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 02/11] multi-pack-index: verify bad header Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 04/11] multi-pack-index: verify packname order Derrick Stolee via GitGitGadget
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      |  3 +++
 t/t5319-multi-pack-index.sh | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/midx.c b/midx.c
index ec78254bb6..8b054b39ab 100644
--- a/midx.c
+++ b/midx.c
@@ -100,6 +100,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 		uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 +
 						 MIDX_CHUNKLOOKUP_WIDTH * i);
 
+		if (chunk_offset >= m->data_len)
+			die(_("invalid chunk offset (too large)"));
+
 		switch (chunk_id) {
 			case MIDX_CHUNKID_PACKNAMES:
 				m->chunk_pack_names = m->data + chunk_offset;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 836d2bdb53..5c9c499aa6 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -178,6 +178,9 @@ test_expect_success 'verify bad signature' '
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
+MIDX_HEADER_SIZE=12
+MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
+MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -199,6 +202,16 @@ test_expect_success 'verify extended chunk count' '
 		"terminating multi-pack-index chunk id appears earlier than expected"
 '
 
+test_expect_success 'verify missing required chunk' '
+	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_ID "\01" $objdir \
+		"missing required"
+'
+
+test_expect_success 'verify invalid chunk offset' '
+	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
+		"invalid chunk offset (too large)"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 04/11] multi-pack-index: verify packname order
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 03/11] multi-pack-index: verify corrupt chunk lookup table Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-05 18:15   ` Stefan Beller
  2018-09-05 14:46 ` [PATCH 05/11] multi-pack-index: verify missing pack Derrick Stolee via GitGitGadget
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The final check we make while loading a multi-pack-index is that
the packfile names are in lexicographical order. Make this error
be a die() instead.

In order to test this condition, we need multiple packfiles.
Earlier in t5319-multi-pack-index.sh, we tested the interaction with
'git repack' but this limits us to one packfile in our object dir.
Move these repack tests until after the 'verify' tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      |  6 ++----
 t/t5319-multi-pack-index.sh | 10 ++++++++++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index 8b054b39ab..e655a15aed 100644
--- a/midx.c
+++ b/midx.c
@@ -157,12 +157,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 		cur_pack_name += strlen(cur_pack_name) + 1;
 
-		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) {
-			error(_("multi-pack-index pack names out of order: '%s' before '%s'"),
+		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
+			die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
 			      m->pack_names[i - 1],
 			      m->pack_names[i]);
-			goto cleanup_fail;
-		}
 	}
 
 	return m;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 5c9c499aa6..4a8f231d93 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -181,6 +181,11 @@ MIDX_BYTE_CHUNK_COUNT=6
 MIDX_HEADER_SIZE=12
 MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
 MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
+MIDX_NUM_CHUNKS=5
+MIDX_CHUNK_LOOKUP_WIDTH=12
+MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
+			 $MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
+MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -212,6 +217,11 @@ test_expect_success 'verify invalid chunk offset' '
 		"invalid chunk offset (too large)"
 '
 
+test_expect_success 'verify packnames out of order' '
+	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
+		"pack names out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 05/11] multi-pack-index: verify missing pack
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (3 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 04/11] multi-pack-index: verify packname order Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 06/11] multi-pack-index: verify oid fanout order Derrick Stolee via GitGitGadget
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 16 ++++++++++++++++
 t/t5319-multi-pack-index.sh |  5 +++++
 2 files changed, 21 insertions(+)

diff --git a/midx.c b/midx.c
index e655a15aed..a02b19efc1 100644
--- a/midx.c
+++ b/midx.c
@@ -926,13 +926,29 @@ void clear_midx_file(const char *object_dir)
 
 int verify_midx_error;
 
+static void midx_report(const char *fmt, ...)
+{
+	va_list ap;
+	verify_midx_error = 1;
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	fprintf(stderr, "\n");
+	va_end(ap);
+}
+
 int verify_midx_file(const char *object_dir)
 {
+	uint32_t i;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
 	verify_midx_error = 0;
 
 	if (!m)
 		return 0;
 
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(m, i))
+			midx_report("failed to load pack in position %d", i);
+	}
+
 	return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 4a8f231d93..52b7f7905b 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -222,6 +222,11 @@ test_expect_success 'verify packnames out of order' '
 		"pack names out of order"
 '
 
+test_expect_success 'verify packnames out of order' '
+	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
+		"failed to load pack"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 06/11] multi-pack-index: verify oid fanout order
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (4 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 05/11] multi-pack-index: verify missing pack Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 07/11] multi-pack-index: verify oid lookup order Derrick Stolee via GitGitGadget
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 9 +++++++++
 t/t5319-multi-pack-index.sh | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/midx.c b/midx.c
index a02b19efc1..dfd26b4d74 100644
--- a/midx.c
+++ b/midx.c
@@ -950,5 +950,14 @@ int verify_midx_file(const char *object_dir)
 			midx_report("failed to load pack in position %d", i);
 	}
 
+	for (i = 0; i < 255; i++) {
+		uint32_t oid_fanout1 = ntohl(m->chunk_oid_fanout[i]);
+		uint32_t oid_fanout2 = ntohl(m->chunk_oid_fanout[i + 1]);
+
+		if (oid_fanout1 > oid_fanout2)
+			midx_report(_("oid fanout out of order: fanout[%d] = %"PRIx32" > %"PRIx32" = fanout[%d]"),
+				    i, oid_fanout1, oid_fanout2, i + 1);
+	}
+
 	return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 52b7f7905b..95ac7a6edd 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -186,6 +186,9 @@ MIDX_CHUNK_LOOKUP_WIDTH=12
 MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
 			 $MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
 MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
+MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + 652))
+MIDX_OID_FANOUT_WIDTH=4
+MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + 1))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -227,6 +230,11 @@ test_expect_success 'verify packnames out of order' '
 		"failed to load pack"
 '
 
+test_expect_success 'verify oid fanout out of order' '
+	corrupt_midx_and_verify $MIDX_BYTE_OID_FANOUT_ORDER "\01" $objdir \
+		"oid fanout out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 07/11] multi-pack-index: verify oid lookup order
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (5 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 06/11] multi-pack-index: verify oid fanout order Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 08/11] multi-pack-index: fix 32-bit vs 64-bit size check Derrick Stolee via GitGitGadget
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 11 +++++++++++
 t/t5319-multi-pack-index.sh |  8 ++++++++
 2 files changed, 19 insertions(+)

diff --git a/midx.c b/midx.c
index dfd26b4d74..06d5cfc826 100644
--- a/midx.c
+++ b/midx.c
@@ -959,5 +959,16 @@ int verify_midx_file(const char *object_dir)
 				    i, oid_fanout1, oid_fanout2, i + 1);
 	}
 
+	for (i = 0; i < m->num_objects - 1; i++) {
+		struct object_id oid1, oid2;
+
+		nth_midxed_object_oid(&oid1, m, i);
+		nth_midxed_object_oid(&oid2, m, i + 1);
+
+		if (oidcmp(&oid1, &oid2) >= 0)
+			midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
+				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+	}
+
 	return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 95ac7a6edd..072b1d1a74 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -175,6 +175,7 @@ test_expect_success 'verify bad signature' '
 		"multi-pack-index signature"
 '
 
+HASH_LEN=20
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
@@ -189,6 +190,8 @@ MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
 MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + 652))
 MIDX_OID_FANOUT_WIDTH=4
 MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + 1))
+MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
+MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -235,6 +238,11 @@ test_expect_success 'verify oid fanout out of order' '
 		"oid fanout out of order"
 '
 
+test_expect_success 'verify oid lookup out of order' '
+	corrupt_midx_and_verify $MIDX_BYTE_OID_LOOKUP "\00" $objdir \
+		"oid lookup out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 08/11] multi-pack-index: fix 32-bit vs 64-bit size check
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (6 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 07/11] multi-pack-index: verify oid lookup order Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 09/11] multi-pack-index: verify object offsets Derrick Stolee via GitGitGadget
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When loading a 64-bit offset, we intend to check that off_t can store
the resulting offset. However, the condition accidentally checks the
32-bit offset to see if it is smaller than a 64-bit value. Fix it,
and this will be covered by a test in the 'git multi-pack-index verify'
command in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 06d5cfc826..80094c02a7 100644
--- a/midx.c
+++ b/midx.c
@@ -236,7 +236,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	offset32 = get_be32(offset_data + sizeof(uint32_t));
 
 	if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
-		if (sizeof(offset32) < sizeof(uint64_t))
+		if (sizeof(off_t) < sizeof(uint64_t))
 			die(_("multi-pack-index stores a 64-bit offset, but off_t is too small"));
 
 		offset32 ^= MIDX_LARGE_OFFSET_NEEDED;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 09/11] multi-pack-index: verify object offsets
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (7 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 08/11] multi-pack-index: fix 32-bit vs 64-bit size check Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-07  0:34   ` Eric Sunshine
  2018-09-05 14:46 ` [PATCH 10/11] multi-pack-index: report progress during 'verify' Derrick Stolee via GitGitGadget
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git multi-pack-index verify' command must verify the object
offsets stored in the multi-pack-index are correct. There are two
ways the offset chunk can be incorrect: the pack-int-id and the
object offset.

Replace the BUG() statement with a die() statement, now that we
may hit a bad pack-int-id during a 'verify' command on a corrupt
multi-pack-index, and it is covered by a test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 28 +++++++++++++++++++++++++++-
 t/t5319-multi-pack-index.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 80094c02a7..87708dc367 100644
--- a/midx.c
+++ b/midx.c
@@ -197,7 +197,7 @@ int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
 	struct strbuf pack_name = STRBUF_INIT;
 
 	if (pack_int_id >= m->num_packs)
-		BUG("bad pack-int-id");
+		die(_("bad pack-int-id"));
 
 	if (m->packs[pack_int_id])
 		return 0;
@@ -970,5 +970,31 @@ int verify_midx_file(const char *object_dir)
 				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
 	}
 
+	for (i = 0; i < m->num_objects; i++) {
+		struct object_id oid;
+		struct pack_entry e;
+		off_t m_offset, p_offset;
+
+		nth_midxed_object_oid(&oid, m, i);
+		if (!fill_midx_entry(&oid, &e, m)) {
+			midx_report(_("failed to load pack entry for oid[%d] = %s"),
+				    i, oid_to_hex(&oid));
+			continue;
+		}
+
+		if (open_pack_index(e.p)) {
+			midx_report(_("failed to load pack-index for packfile %s"),
+				    e.p->pack_name);
+			break;
+		}
+
+		m_offset = e.offset;
+		p_offset = find_pack_entry_one(oid.hash, e.p);
+
+		if (m_offset != p_offset)
+			midx_report(_("incorrect object offset for oid[%d] = %s: %"PRIx64" != %"PRIx64),
+				    i, oid_to_hex(&oid), m_offset, p_offset);
+	}
+
 	return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 072b1d1a74..b6c34631d3 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -176,6 +176,7 @@ test_expect_success 'verify bad signature' '
 '
 
 HASH_LEN=20
+NUM_OBJECTS=74
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
@@ -192,6 +193,10 @@ MIDX_OID_FANOUT_WIDTH=4
 MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + 1))
 MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
 MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
+MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
+MIDX_OFFSET_WIDTH=8
+MIDX_BYTE_PACK_INT_ID=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 2))
+MIDX_BYTE_OFFSET=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 6))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -243,6 +248,16 @@ test_expect_success 'verify oid lookup out of order' '
 		"oid lookup out of order"
 '
 
+test_expect_success 'verify incorrect pack-int-id' '
+	corrupt_midx_and_verify $MIDX_BYTE_PACK_INT_ID "\07" $objdir \
+		"bad pack-int-id"
+'
+
+test_expect_success 'verify incorrect offset' '
+	corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\07" $objdir \
+		"incorrect object offset"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
@@ -310,4 +325,16 @@ test_expect_success 'verify multi-pack-index with 64-bit offsets' '
 	git multi-pack-index verify --object-dir=objects64
 '
 
+NUM_OBJECTS=63
+MIDX_OFFSET_OID_FANOUT=$((MIDX_OFFSET_PACKNAMES + 54))
+MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
+MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
+MIDX_OFFSET_LARGE_OFFSETS=$(($MIDX_OFFSET_OBJECT_OFFSETS + $NUM_OBJECTS * $MIDX_OFFSET_WIDTH))
+MIDX_BYTE_LARGE_OFFSET=$(($MIDX_OFFSET_LARGE_OFFSETS + 3))
+
+test_expect_success 'verify incorrect 64-bit offset' '
+	corrupt_midx_and_verify $MIDX_BYTE_LARGE_OFFSET "\07" objects64 \
+		"incorrect object offset"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 10/11] multi-pack-index: report progress during 'verify'
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (8 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 09/11] multi-pack-index: verify object offsets Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-05 14:46 ` [PATCH 11/11] fsck: verify multi-pack-index Derrick Stolee via GitGitGadget
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When verifying a multi-pack-index, the only action that takes
significant time is checking the object offsets. For example,
to verify a multi-pack-index containing 6.2 million objects in
the Linux kernel repository takes 1.3 seconds on my machine.
99% of that time is spent looking up object offsets in each of
the packfiles and comparing them to the multi-pack-index offset.

Add a progress indicator for that section of the 'verify' verb.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/midx.c b/midx.c
index 87708dc367..489180a599 100644
--- a/midx.c
+++ b/midx.c
@@ -7,6 +7,7 @@
 #include "object-store.h"
 #include "sha1-lookup.h"
 #include "midx.h"
+#include "progress.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -939,6 +940,7 @@ static void midx_report(const char *fmt, ...)
 int verify_midx_file(const char *object_dir)
 {
 	uint32_t i;
+	struct progress *progress = NULL;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
 	verify_midx_error = 0;
 
@@ -970,6 +972,7 @@ int verify_midx_file(const char *object_dir)
 				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
 	}
 
+	progress = start_progress(_("Verifying object offsets"), m->num_objects);
 	for (i = 0; i < m->num_objects; i++) {
 		struct object_id oid;
 		struct pack_entry e;
@@ -994,7 +997,10 @@ int verify_midx_file(const char *object_dir)
 		if (m_offset != p_offset)
 			midx_report(_("incorrect object offset for oid[%d] = %s: %"PRIx64" != %"PRIx64),
 				    i, oid_to_hex(&oid), m_offset, p_offset);
+
+		display_progress(progress, i + 1);
 	}
+	stop_progress(&progress);
 
 	return verify_midx_error;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 11/11] fsck: verify multi-pack-index
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (9 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 10/11] multi-pack-index: report progress during 'verify' Derrick Stolee via GitGitGadget
@ 2018-09-05 14:46 ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
  11 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-05 14:46 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When core.multiPackIndex is true, we may have a multi-pack-index
in our object directory. Add calls to 'git multi-pack-index verify'
at the end of 'git fsck' if so.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fsck.c              | 18 ++++++++++++++++++
 t/t5319-multi-pack-index.sh | 13 ++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 63c8578cc1..06eb421720 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -848,5 +848,23 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (!git_config_get_bool("core.multipackindex", &i) && i) {
+		struct child_process midx_verify = CHILD_PROCESS_INIT;
+		const char *midx_argv[] = { "multi-pack-index", "verify", NULL, NULL, NULL };
+
+		midx_verify.argv = midx_argv;
+		midx_verify.git_cmd = 1;
+		if (run_command(&midx_verify))
+			errors_found |= ERROR_COMMIT_GRAPH;
+
+		prepare_alt_odb(the_repository);
+		for (alt =  the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+			midx_argv[2] = "--object-dir";
+			midx_argv[3] = alt->path;
+			if (run_command(&midx_verify))
+				errors_found |= ERROR_COMMIT_GRAPH;
+		}
+	}
+
 	return errors_found;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index b6c34631d3..63f8718fd7 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -160,12 +160,17 @@ corrupt_midx_and_verify() {
 	DATA="${2:-\0}"
 	OBJDIR=$3
 	GREPSTR="$4"
+	COMMAND="$5"
+	if test -z "$COMMAND"
+	then
+		COMMAND="git multi-pack-index verify --object-dir=$OBJDIR"
+	fi
 	FILE=$OBJDIR/pack/multi-pack-index &&
 	chmod a+w $FILE &&
 	test_when_finished mv midx-backup $FILE &&
 	cp $FILE midx-backup &&
 	printf "$DATA" | dd of="$FILE" bs=1 seek="$POS" conv=notrunc &&
-	test_must_fail git multi-pack-index verify --object-dir=$OBJDIR 2>test_err &&
+	test_must_fail $COMMAND 2>test_err &&
 	grep -v "^+" test_err >err &&
 	test_i18ngrep "$GREPSTR" err
 }
@@ -258,6 +263,12 @@ test_expect_success 'verify incorrect offset' '
 		"incorrect object offset"
 '
 
+test_expect_success 'git-fsck incorrect offset' '
+	corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\07" $objdir \
+		"incorrect object offset" \
+		"git -c core.multipackindex=true fsck"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/11] multi-pack-index: verify packname order
  2018-09-05 14:46 ` [PATCH 04/11] multi-pack-index: verify packname order Derrick Stolee via GitGitGadget
@ 2018-09-05 18:15   ` Stefan Beller
  2018-09-05 19:11     ` Derrick Stolee
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2018-09-05 18:15 UTC (permalink / raw)
  To: gitgitgadget
  Cc: git, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee

On Wed, Sep 5, 2018 at 7:46 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The final check we make while loading a multi-pack-index is that
> the packfile names are in lexicographical order. Make this error
> be a die() instead.

What is the advantage of having a die() here?
Would the midx work under normal operation when
not sorted correctly?

This sounds like a hack for easy testability in this context,
so could you clarify why this helps the regular user?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 01/11] multi-pack-index: add 'verify' verb
  2018-09-05 14:46 ` [PATCH 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
@ 2018-09-05 18:59   ` Eric Sunshine
  2018-09-05 19:37     ` Derrick Stolee
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Sunshine @ 2018-09-05 18:59 UTC (permalink / raw)
  To: gitgitgadget
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee

On Wed, Sep 5, 2018 at 10:46 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> The multi-pack-index builtin writes multi-pack-index files, and
> uses a 'write' verb to do so. Add a 'verify' verb that checks this
> file matches the contents of the pack-indexes it replaces.
> [...]
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
> @@ -5,7 +5,7 @@
>  static char const * const builtin_multi_pack_index_usage[] = {
> -       N_("git multi-pack-index [--object-dir=<dir>] write"),
> +       N_("git multi-pack-index [--object-dir=<dir>] [write|verify]"),

Nit: The square brackets make the verb optional. You probably want
parenthesis to indicate that one or the other is required:

    git multi-pack-index [--object-dir=<dir>] (write|verify)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/11] multi-pack-index: verify packname order
  2018-09-05 18:15   ` Stefan Beller
@ 2018-09-05 19:11     ` Derrick Stolee
  2018-09-05 19:14       ` Stefan Beller
  0 siblings, 1 reply; 34+ messages in thread
From: Derrick Stolee @ 2018-09-05 19:11 UTC (permalink / raw)
  To: Stefan Beller, gitgitgadget
  Cc: git, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee

On 9/5/2018 2:15 PM, Stefan Beller wrote:
> On Wed, Sep 5, 2018 at 7:46 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The final check we make while loading a multi-pack-index is that
>> the packfile names are in lexicographical order. Make this error
>> be a die() instead.
> What is the advantage of having a die() here?
> Would the midx work under normal operation when
> not sorted correctly?
>
> This sounds like a hack for easy testability in this context,
> so could you clarify why this helps the regular user?

The multi-pack-index will not work as expected, since 
midx_contains_pack() may provide incorrect results, and thus we will add 
the "missing" packfiles to the packed_git linked list.

This _should_ be a die(), as something unexpected occurred (the file 
doesn't match the format specification).


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/11] multi-pack-index: verify packname order
  2018-09-05 19:11     ` Derrick Stolee
@ 2018-09-05 19:14       ` Stefan Beller
  2018-09-05 19:28         ` Derrick Stolee
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Beller @ 2018-09-05 19:14 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: gitgitgadget, git, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Derrick Stolee

On Wed, Sep 5, 2018 at 12:11 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 9/5/2018 2:15 PM, Stefan Beller wrote:
> > On Wed, Sep 5, 2018 at 7:46 AM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >>
> >> The final check we make while loading a multi-pack-index is that
> >> the packfile names are in lexicographical order. Make this error
> >> be a die() instead.
> > What is the advantage of having a die() here?
> > Would the midx work under normal operation when
> > not sorted correctly?
> >
> > This sounds like a hack for easy testability in this context,
> > so could you clarify why this helps the regular user?
>
> The multi-pack-index will not work as expected, since
> midx_contains_pack() may provide incorrect results, and thus we will add
> the "missing" packfiles to the packed_git linked list.
>
> This _should_ be a die(), as something unexpected occurred (the file
> doesn't match the format specification).

Thanks for the clarification.

So this patch actually hits 2 birds with one stone, as it fixes
a bug as well as adds the check for such corruption to the
verify step?

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 04/11] multi-pack-index: verify packname order
  2018-09-05 19:14       ` Stefan Beller
@ 2018-09-05 19:28         ` Derrick Stolee
  0 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee @ 2018-09-05 19:28 UTC (permalink / raw)
  To: Stefan Beller
  Cc: gitgitgadget, git, Ævar Arnfjörð Bjarmason,
	Junio C Hamano, Derrick Stolee

On 9/5/2018 3:14 PM, Stefan Beller wrote:
> On Wed, Sep 5, 2018 at 12:11 PM Derrick Stolee <stolee@gmail.com> wrote:
>> On 9/5/2018 2:15 PM, Stefan Beller wrote:
>>> On Wed, Sep 5, 2018 at 7:46 AM Derrick Stolee via GitGitGadget
>>> <gitgitgadget@gmail.com> wrote:
>>>> From: Derrick Stolee <dstolee@microsoft.com>
>>>>
>>>> The final check we make while loading a multi-pack-index is that
>>>> the packfile names are in lexicographical order. Make this error
>>>> be a die() instead.
>>> What is the advantage of having a die() here?
>>> Would the midx work under normal operation when
>>> not sorted correctly?
>>>
>>> This sounds like a hack for easy testability in this context,
>>> so could you clarify why this helps the regular user?
>> The multi-pack-index will not work as expected, since
>> midx_contains_pack() may provide incorrect results, and thus we will add
>> the "missing" packfiles to the packed_git linked list.
>>
>> This _should_ be a die(), as something unexpected occurred (the file
>> doesn't match the format specification).
> Thanks for the clarification.
>
> So this patch actually hits 2 birds with one stone, as it fixes
> a bug as well as adds the check for such corruption to the
> verify step?
This is a common occurrence in this series (replacing error() with 
die()) as we are now exercising those conditions and clarifying what 
should happen.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 01/11] multi-pack-index: add 'verify' verb
  2018-09-05 18:59   ` Eric Sunshine
@ 2018-09-05 19:37     ` Derrick Stolee
  0 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee @ 2018-09-05 19:37 UTC (permalink / raw)
  To: Eric Sunshine, gitgitgadget
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee

On 9/5/2018 2:59 PM, Eric Sunshine wrote:
> On Wed, Sep 5, 2018 at 10:46 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> The multi-pack-index builtin writes multi-pack-index files, and
>> uses a 'write' verb to do so. Add a 'verify' verb that checks this
>> file matches the contents of the pack-indexes it replaces.
>> [...]
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>> diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
>> @@ -5,7 +5,7 @@
>>   static char const * const builtin_multi_pack_index_usage[] = {
>> -       N_("git multi-pack-index [--object-dir=<dir>] write"),
>> +       N_("git multi-pack-index [--object-dir=<dir>] [write|verify]"),
> Nit: The square brackets make the verb optional. You probably want
> parenthesis to indicate that one or the other is required:
>
>      git multi-pack-index [--object-dir=<dir>] (write|verify)
Thanks for catching this mistake! This would be easy to miss and keep 
around forever.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 02/11] multi-pack-index: verify bad header
  2018-09-05 14:46 ` [PATCH 02/11] multi-pack-index: verify bad header Derrick Stolee via GitGitGadget
@ 2018-09-07  0:27   ` Eric Sunshine
  0 siblings, 0 replies; 34+ messages in thread
From: Eric Sunshine @ 2018-09-07  0:27 UTC (permalink / raw)
  To: gitgitgadget
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee

On Wed, Sep 5, 2018 at 10:46 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Add tests that check corrupted headers in a few ways:
> [...]
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> @@ -154,6 +154,51 @@ test_expect_success 'verify multi-pack-index success' '
> +corrupt_midx_and_verify() {
> +       POS=$1
> +       DATA="${2:-\0}"
> +       OBJDIR=$3
> +       GREPSTR="$4"

If you happen to re-roll, perhaps make these assignments part of the
&&-chain to protect against someone coming along some day an inserting
code before them without realizing that the &&-chain is broken.

> +       FILE=$OBJDIR/pack/multi-pack-index &&
> +       chmod a+w $FILE &&
> +       test_when_finished mv midx-backup $FILE &&
> +       cp $FILE midx-backup &&
> +       printf "$DATA" | dd of="$FILE" bs=1 seek="$POS" conv=notrunc &&
> +       test_must_fail git multi-pack-index verify --object-dir=$OBJDIR 2>test_err &&
> +       grep -v "^+" test_err >err &&
> +       test_i18ngrep "$GREPSTR" err
> +}

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/11] multi-pack-index: verify object offsets
  2018-09-05 14:46 ` [PATCH 09/11] multi-pack-index: verify object offsets Derrick Stolee via GitGitGadget
@ 2018-09-07  0:34   ` Eric Sunshine
  2018-09-07 13:10     ` Derrick Stolee
  0 siblings, 1 reply; 34+ messages in thread
From: Eric Sunshine @ 2018-09-07  0:34 UTC (permalink / raw)
  To: gitgitgadget
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee

On Wed, Sep 5, 2018 at 10:46 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> Replace the BUG() statement with a die() statement, now that we
> may hit a bad pack-int-id during a 'verify' command on a corrupt
> multi-pack-index, and it is covered by a test.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
> diff --git a/midx.c b/midx.c
> @@ -197,7 +197,7 @@ int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
>         if (pack_int_id >= m->num_packs)
> -               BUG("bad pack-int-id");
> +               die(_("bad pack-int-id"));

For someone trying to diagnose this issue, would it be helpful to know
(that is, print out) the values of pack_int_id and num_packs?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 09/11] multi-pack-index: verify object offsets
  2018-09-07  0:34   ` Eric Sunshine
@ 2018-09-07 13:10     ` Derrick Stolee
  0 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee @ 2018-09-07 13:10 UTC (permalink / raw)
  To: Eric Sunshine, gitgitgadget
  Cc: Git List, Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Derrick Stolee

On 9/6/2018 8:34 PM, Eric Sunshine wrote:
> On Wed, Sep 5, 2018 at 10:46 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> Replace the BUG() statement with a die() statement, now that we
>> may hit a bad pack-int-id during a 'verify' command on a corrupt
>> multi-pack-index, and it is covered by a test.
>>
>> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
>> ---
>> diff --git a/midx.c b/midx.c
>> @@ -197,7 +197,7 @@ int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
>>          if (pack_int_id >= m->num_packs)
>> -               BUG("bad pack-int-id");
>> +               die(_("bad pack-int-id"));
> For someone trying to diagnose this issue, would it be helpful to know
> (that is, print out) the values of pack_int_id and num_packs?
Can do! Thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 00/11] Add 'git multi-pack-index verify' command
  2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                   ` (10 preceding siblings ...)
  2018-09-05 14:46 ` [PATCH 11/11] fsck: verify multi-pack-index Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02 ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
                     ` (10 more replies)
  11 siblings, 11 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano

The multi-pack-index file provides faster lookups in repos with many
packfiles by duplicating the information from multiple pack-indexes into a
single file. This series allows us to verify a multi-pack-index using 'git
multi-pack-index verify' and 'git fsck' (when core.multiPackIndex is true).

The pattern for the tests is similar to that found in t5318-commit-graph.sh.

During testing, I found a bug in how we check for the size of off_t (we are
not actually checking off_t, but instead uint32_t). See "multi-pack-index:
fix 32-bit vs 64-bit size check".

Thanks to Ævar [1], I added a commit that provides progress updates when
checking object offsets.

Based on ds/multi-pack-index

[1] 
https://public-inbox.org/git/20180904202729.13900-1-avarab@gmail.com/T/#u

Derrick Stolee (11):
  multi-pack-index: add 'verify' verb
  multi-pack-index: verify bad header
  multi-pack-index: verify corrupt chunk lookup table
  multi-pack-index: verify packname order
  multi-pack-index: verify missing pack
  multi-pack-index: verify oid fanout order
  multi-pack-index: verify oid lookup order
  multi-pack-index: fix 32-bit vs 64-bit size check
  multi-pack-index: verify object offsets
  multi-pack-index: report progress during 'verify'
  fsck: verify multi-pack-index

 Documentation/git-multi-pack-index.txt |  10 ++
 builtin/fsck.c                         |  18 ++++
 builtin/multi-pack-index.c             |   4 +-
 midx.c                                 | 113 ++++++++++++++++----
 midx.h                                 |   1 +
 t/t5319-multi-pack-index.sh            | 136 ++++++++++++++++++++++++-
 6 files changed, 262 insertions(+), 20 deletions(-)


base-commit: 6a22d521260f86dff8fe6f23ab329cebb62ba4f0
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-34%2Fderrickstolee%2Fmidx%2Fverify-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-34/derrickstolee/midx/verify-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/34

Range-diff vs v1:

  1:  8dc38afe2b !  1:  d8ffd84d67 multi-pack-index: add 'verify' verb
     @@ -47,7 +47,7 @@
       
       static char const * const builtin_multi_pack_index_usage[] = {
      -	N_("git multi-pack-index [--object-dir=<dir>] write"),
     -+	N_("git multi-pack-index [--object-dir=<dir>] [write|verify]"),
     ++	N_("git multi-pack-index [--object-dir=<dir>] (write|verify)"),
       	NULL
       };
       
  2:  787e1fb616 !  2:  9590895830 multi-pack-index: verify bad header
     @@ -61,10 +61,10 @@
       
      +# usage: corrupt_midx_and_verify <pos> <data> <objdir> <string>
      +corrupt_midx_and_verify() {
     -+	POS=$1
     -+	DATA="${2:-\0}"
     -+	OBJDIR=$3
     -+	GREPSTR="$4"
     ++	POS=$1 &&
     ++	DATA="${2:-\0}" &&
     ++	OBJDIR=$3 &&
     ++	GREPSTR="$4" &&
      +	FILE=$OBJDIR/pack/multi-pack-index &&
      +	chmod a+w $FILE &&
      +	test_when_finished mv midx-backup $FILE &&
  3:  b385aa2abf =  3:  2448173844 multi-pack-index: verify corrupt chunk lookup table
  4:  37ee24c82b =  4:  947241bfdc multi-pack-index: verify packname order
  5:  b747da415c =  5:  4058867380 multi-pack-index: verify missing pack
  6:  58e5c09468 =  6:  ea1c522702 multi-pack-index: verify oid fanout order
  7:  b21772d054 =  7:  511791de91 multi-pack-index: verify oid lookup order
  8:  b08d3f0055 =  8:  210649bf83 multi-pack-index: fix 32-bit vs 64-bit size check
  9:  e1498aea45 !  9:  ef20193d59 multi-pack-index: verify object offsets
     @@ -21,7 +21,8 @@
       
       	if (pack_int_id >= m->num_packs)
      -		BUG("bad pack-int-id");
     -+		die(_("bad pack-int-id"));
     ++		die(_("bad pack-int-id: %u (%u total packs"),
     ++		    pack_int_id, m->num_packs);
       
       	if (m->packs[pack_int_id])
       		return 0;
 10:  acf8cfd632 = 10:  29ebc17161 multi-pack-index: report progress during 'verify'
 11:  09d16aff20 ! 11:  406c88b456 fsck: verify multi-pack-index
     @@ -40,14 +40,14 @@
      --- a/t/t5319-multi-pack-index.sh
      +++ b/t/t5319-multi-pack-index.sh
      @@
     - 	DATA="${2:-\0}"
     - 	OBJDIR=$3
     - 	GREPSTR="$4"
     -+	COMMAND="$5"
     + 	DATA="${2:-\0}" &&
     + 	OBJDIR=$3 &&
     + 	GREPSTR="$4" &&
     ++	COMMAND="$5" &&
      +	if test -z "$COMMAND"
      +	then
      +		COMMAND="git multi-pack-index verify --object-dir=$OBJDIR"
     -+	fi
     ++	fi &&
       	FILE=$OBJDIR/pack/multi-pack-index &&
       	chmod a+w $FILE &&
       	test_when_finished mv midx-backup $FILE &&

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 01/11] multi-pack-index: add 'verify' verb
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 02/11] multi-pack-index: verify bad header Derrick Stolee via GitGitGadget
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The multi-pack-index builtin writes multi-pack-index files, and
uses a 'write' verb to do so. Add a 'verify' verb that checks this
file matches the contents of the pack-indexes it replaces.

The current implementation is a no-op, but will be extended in
small increments in later commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt | 10 ++++++++++
 builtin/multi-pack-index.c             |  4 +++-
 midx.c                                 | 13 +++++++++++++
 midx.h                                 |  1 +
 t/t5319-multi-pack-index.sh            |  8 ++++++++
 5 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 1f97e79912..f7778a2c85 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -27,6 +27,10 @@ write::
 	When given as the verb, write a new MIDX file to
 	`<dir>/packs/multi-pack-index`.
 
+verify::
+	When given as the verb, verify the contents of the MIDX file
+	at `<dir>/packs/multi-pack-index`.
+
 
 EXAMPLES
 --------
@@ -43,6 +47,12 @@ $ git multi-pack-index write
 $ git multi-pack-index --object-dir <alt> write
 -----------------------------------------------
 
+* Verify the MIDX file for the packfiles in the current .git folder.
++
+-----------------------------------------------
+$ git multi-pack-index verify
+-----------------------------------------------
+
 
 SEE ALSO
 --------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 2633efd95d..fca70f8e4f 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -5,7 +5,7 @@
 #include "midx.h"
 
 static char const * const builtin_multi_pack_index_usage[] = {
-	N_("git multi-pack-index [--object-dir=<dir>] write"),
+	N_("git multi-pack-index [--object-dir=<dir>] (write|verify)"),
 	NULL
 };
 
@@ -42,6 +42,8 @@ int cmd_multi_pack_index(int argc, const char **argv,
 
 	if (!strcmp(argv[0], "write"))
 		return write_midx_file(opts.object_dir);
+	if (!strcmp(argv[0], "verify"))
+		return verify_midx_file(opts.object_dir);
 
 	die(_("unrecognized verb: %s"), argv[0]);
 }
diff --git a/midx.c b/midx.c
index f3e8dbc108..b253bed517 100644
--- a/midx.c
+++ b/midx.c
@@ -928,3 +928,16 @@ void clear_midx_file(const char *object_dir)
 
 	free(midx);
 }
+
+int verify_midx_error;
+
+int verify_midx_file(const char *object_dir)
+{
+	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	verify_midx_error = 0;
+
+	if (!m)
+		return 0;
+
+	return verify_midx_error;
+}
diff --git a/midx.h b/midx.h
index a210f1af2a..ce80b91c68 100644
--- a/midx.h
+++ b/midx.h
@@ -43,5 +43,6 @@ int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, i
 
 int write_midx_file(const char *object_dir);
 void clear_midx_file(const char *object_dir);
+int verify_midx_file(const char *object_dir);
 
 #endif
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 6f56b38674..1c4e0e6d31 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -150,6 +150,10 @@ test_expect_success 'write midx with twelve packs' '
 
 compare_results_with_midx "twelve packs"
 
+test_expect_success 'verify multi-pack-index success' '
+	git multi-pack-index verify --object-dir=$objdir
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
@@ -214,4 +218,8 @@ test_expect_success 'force some 64-bit offsets with pack-objects' '
 	midx_read_expect 1 63 5 objects64 " large-offsets"
 '
 
+test_expect_success 'verify multi-pack-index with 64-bit offsets' '
+	git multi-pack-index verify --object-dir=objects64
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 02/11] multi-pack-index: verify bad header
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 03/11] multi-pack-index: verify corrupt chunk lookup table Derrick Stolee via GitGitGadget
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When verifying if a multi-pack-index file is valid, we want the
command to fail to signal an invalid file. Previously, we wrote
an error to stderr and continued as if we had no multi-pack-index.
Now, die() instead of error().

Add tests that check corrupted headers in a few ways:

* Bad signature
* Bad file version
* Bad hash version
* Truncated hash count
* Extended hash count

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 18 +++++----------
 t/t5319-multi-pack-index.sh | 46 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/midx.c b/midx.c
index b253bed517..ec78254bb6 100644
--- a/midx.c
+++ b/midx.c
@@ -76,24 +76,18 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 	m->local = local;
 
 	m->signature = get_be32(m->data);
-	if (m->signature != MIDX_SIGNATURE) {
-		error(_("multi-pack-index signature 0x%08x does not match signature 0x%08x"),
+	if (m->signature != MIDX_SIGNATURE)
+		die(_("multi-pack-index signature 0x%08x does not match signature 0x%08x"),
 		      m->signature, MIDX_SIGNATURE);
-		goto cleanup_fail;
-	}
 
 	m->version = m->data[MIDX_BYTE_FILE_VERSION];
-	if (m->version != MIDX_VERSION) {
-		error(_("multi-pack-index version %d not recognized"),
+	if (m->version != MIDX_VERSION)
+		die(_("multi-pack-index version %d not recognized"),
 		      m->version);
-		goto cleanup_fail;
-	}
 
 	hash_version = m->data[MIDX_BYTE_HASH_VERSION];
-	if (hash_version != MIDX_HASH_VERSION) {
-		error(_("hash version %u does not match"), hash_version);
-		goto cleanup_fail;
-	}
+	if (hash_version != MIDX_HASH_VERSION)
+		die(_("hash version %u does not match"), hash_version);
 	m->hash_len = MIDX_HASH_LEN;
 
 	m->num_chunks = m->data[MIDX_BYTE_NUM_CHUNKS];
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 1c4e0e6d31..e04b5f43a2 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -154,6 +154,51 @@ test_expect_success 'verify multi-pack-index success' '
 	git multi-pack-index verify --object-dir=$objdir
 '
 
+# usage: corrupt_midx_and_verify <pos> <data> <objdir> <string>
+corrupt_midx_and_verify() {
+	POS=$1 &&
+	DATA="${2:-\0}" &&
+	OBJDIR=$3 &&
+	GREPSTR="$4" &&
+	FILE=$OBJDIR/pack/multi-pack-index &&
+	chmod a+w $FILE &&
+	test_when_finished mv midx-backup $FILE &&
+	cp $FILE midx-backup &&
+	printf "$DATA" | dd of="$FILE" bs=1 seek="$POS" conv=notrunc &&
+	test_must_fail git multi-pack-index verify --object-dir=$OBJDIR 2>test_err &&
+	grep -v "^+" test_err >err &&
+	test_i18ngrep "$GREPSTR" err
+}
+
+test_expect_success 'verify bad signature' '
+	corrupt_midx_and_verify 0 "\00" $objdir \
+		"multi-pack-index signature"
+'
+
+MIDX_BYTE_VERSION=4
+MIDX_BYTE_OID_VERSION=5
+MIDX_BYTE_CHUNK_COUNT=6
+
+test_expect_success 'verify bad version' '
+	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
+		"multi-pack-index version"
+'
+
+test_expect_success 'verify bad OID version' '
+	corrupt_midx_and_verify $MIDX_BYTE_OID_VERSION "\02" $objdir \
+		"hash version"
+'
+
+test_expect_success 'verify truncated chunk count' '
+	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
+		"missing required"
+'
+
+test_expect_success 'verify extended chunk count' '
+	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
+		"terminating multi-pack-index chunk id appears earlier than expected"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
@@ -191,7 +236,6 @@ test_expect_success 'multi-pack-index in an alternate' '
 
 compare_results_with_midx "with alternate (remote midx)"
 
-
 # usage: corrupt_data <file> <pos> [<data>]
 corrupt_data () {
 	file=$1
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 03/11] multi-pack-index: verify corrupt chunk lookup table
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 02/11] multi-pack-index: verify bad header Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 04/11] multi-pack-index: verify packname order Derrick Stolee via GitGitGadget
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      |  3 +++
 t/t5319-multi-pack-index.sh | 13 +++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/midx.c b/midx.c
index ec78254bb6..8b054b39ab 100644
--- a/midx.c
+++ b/midx.c
@@ -100,6 +100,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 		uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 +
 						 MIDX_CHUNKLOOKUP_WIDTH * i);
 
+		if (chunk_offset >= m->data_len)
+			die(_("invalid chunk offset (too large)"));
+
 		switch (chunk_id) {
 			case MIDX_CHUNKID_PACKNAMES:
 				m->chunk_pack_names = m->data + chunk_offset;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index e04b5f43a2..c54b6e7188 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -178,6 +178,9 @@ test_expect_success 'verify bad signature' '
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
+MIDX_HEADER_SIZE=12
+MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
+MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -199,6 +202,16 @@ test_expect_success 'verify extended chunk count' '
 		"terminating multi-pack-index chunk id appears earlier than expected"
 '
 
+test_expect_success 'verify missing required chunk' '
+	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_ID "\01" $objdir \
+		"missing required"
+'
+
+test_expect_success 'verify invalid chunk offset' '
+	corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
+		"invalid chunk offset (too large)"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 04/11] multi-pack-index: verify packname order
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                     ` (2 preceding siblings ...)
  2018-09-13 18:02   ` [PATCH v2 03/11] multi-pack-index: verify corrupt chunk lookup table Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 05/11] multi-pack-index: verify missing pack Derrick Stolee via GitGitGadget
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The final check we make while loading a multi-pack-index is that
the packfile names are in lexicographical order. Make this error
be a die() instead.

In order to test this condition, we need multiple packfiles.
Earlier in t5319-multi-pack-index.sh, we tested the interaction with
'git repack' but this limits us to one packfile in our object dir.
Move these repack tests until after the 'verify' tests.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      |  6 ++----
 t/t5319-multi-pack-index.sh | 10 ++++++++++
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/midx.c b/midx.c
index 8b054b39ab..e655a15aed 100644
--- a/midx.c
+++ b/midx.c
@@ -157,12 +157,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 		cur_pack_name += strlen(cur_pack_name) + 1;
 
-		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0) {
-			error(_("multi-pack-index pack names out of order: '%s' before '%s'"),
+		if (i && strcmp(m->pack_names[i], m->pack_names[i - 1]) <= 0)
+			die(_("multi-pack-index pack names out of order: '%s' before '%s'"),
 			      m->pack_names[i - 1],
 			      m->pack_names[i]);
-			goto cleanup_fail;
-		}
 	}
 
 	return m;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index c54b6e7188..01a3cd6b00 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -181,6 +181,11 @@ MIDX_BYTE_CHUNK_COUNT=6
 MIDX_HEADER_SIZE=12
 MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
 MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
+MIDX_NUM_CHUNKS=5
+MIDX_CHUNK_LOOKUP_WIDTH=12
+MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
+			 $MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
+MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -212,6 +217,11 @@ test_expect_success 'verify invalid chunk offset' '
 		"invalid chunk offset (too large)"
 '
 
+test_expect_success 'verify packnames out of order' '
+	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "z" $objdir \
+		"pack names out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 05/11] multi-pack-index: verify missing pack
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                     ` (3 preceding siblings ...)
  2018-09-13 18:02   ` [PATCH v2 04/11] multi-pack-index: verify packname order Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 06/11] multi-pack-index: verify oid fanout order Derrick Stolee via GitGitGadget
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 16 ++++++++++++++++
 t/t5319-multi-pack-index.sh |  5 +++++
 2 files changed, 21 insertions(+)

diff --git a/midx.c b/midx.c
index e655a15aed..a02b19efc1 100644
--- a/midx.c
+++ b/midx.c
@@ -926,13 +926,29 @@ void clear_midx_file(const char *object_dir)
 
 int verify_midx_error;
 
+static void midx_report(const char *fmt, ...)
+{
+	va_list ap;
+	verify_midx_error = 1;
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	fprintf(stderr, "\n");
+	va_end(ap);
+}
+
 int verify_midx_file(const char *object_dir)
 {
+	uint32_t i;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
 	verify_midx_error = 0;
 
 	if (!m)
 		return 0;
 
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(m, i))
+			midx_report("failed to load pack in position %d", i);
+	}
+
 	return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 01a3cd6b00..0a566afb05 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -222,6 +222,11 @@ test_expect_success 'verify packnames out of order' '
 		"pack names out of order"
 '
 
+test_expect_success 'verify packnames out of order' '
+	corrupt_midx_and_verify $MIDX_BYTE_PACKNAME_ORDER "a" $objdir \
+		"failed to load pack"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 06/11] multi-pack-index: verify oid fanout order
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                     ` (4 preceding siblings ...)
  2018-09-13 18:02   ` [PATCH v2 05/11] multi-pack-index: verify missing pack Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 07/11] multi-pack-index: verify oid lookup order Derrick Stolee via GitGitGadget
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 9 +++++++++
 t/t5319-multi-pack-index.sh | 8 ++++++++
 2 files changed, 17 insertions(+)

diff --git a/midx.c b/midx.c
index a02b19efc1..dfd26b4d74 100644
--- a/midx.c
+++ b/midx.c
@@ -950,5 +950,14 @@ int verify_midx_file(const char *object_dir)
 			midx_report("failed to load pack in position %d", i);
 	}
 
+	for (i = 0; i < 255; i++) {
+		uint32_t oid_fanout1 = ntohl(m->chunk_oid_fanout[i]);
+		uint32_t oid_fanout2 = ntohl(m->chunk_oid_fanout[i + 1]);
+
+		if (oid_fanout1 > oid_fanout2)
+			midx_report(_("oid fanout out of order: fanout[%d] = %"PRIx32" > %"PRIx32" = fanout[%d]"),
+				    i, oid_fanout1, oid_fanout2, i + 1);
+	}
+
 	return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 0a566afb05..47a54e138d 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -186,6 +186,9 @@ MIDX_CHUNK_LOOKUP_WIDTH=12
 MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
 			 $MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
 MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
+MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + 652))
+MIDX_OID_FANOUT_WIDTH=4
+MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + 1))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -227,6 +230,11 @@ test_expect_success 'verify packnames out of order' '
 		"failed to load pack"
 '
 
+test_expect_success 'verify oid fanout out of order' '
+	corrupt_midx_and_verify $MIDX_BYTE_OID_FANOUT_ORDER "\01" $objdir \
+		"oid fanout out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 07/11] multi-pack-index: verify oid lookup order
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                     ` (5 preceding siblings ...)
  2018-09-13 18:02   ` [PATCH v2 06/11] multi-pack-index: verify oid fanout order Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 08/11] multi-pack-index: fix 32-bit vs 64-bit size check Derrick Stolee via GitGitGadget
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 11 +++++++++++
 t/t5319-multi-pack-index.sh |  8 ++++++++
 2 files changed, 19 insertions(+)

diff --git a/midx.c b/midx.c
index dfd26b4d74..06d5cfc826 100644
--- a/midx.c
+++ b/midx.c
@@ -959,5 +959,16 @@ int verify_midx_file(const char *object_dir)
 				    i, oid_fanout1, oid_fanout2, i + 1);
 	}
 
+	for (i = 0; i < m->num_objects - 1; i++) {
+		struct object_id oid1, oid2;
+
+		nth_midxed_object_oid(&oid1, m, i);
+		nth_midxed_object_oid(&oid2, m, i + 1);
+
+		if (oidcmp(&oid1, &oid2) >= 0)
+			midx_report(_("oid lookup out of order: oid[%d] = %s >= %s = oid[%d]"),
+				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
+	}
+
 	return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 47a54e138d..a968b9a959 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -175,6 +175,7 @@ test_expect_success 'verify bad signature' '
 		"multi-pack-index signature"
 '
 
+HASH_LEN=20
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
@@ -189,6 +190,8 @@ MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
 MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + 652))
 MIDX_OID_FANOUT_WIDTH=4
 MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + 1))
+MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
+MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -235,6 +238,11 @@ test_expect_success 'verify oid fanout out of order' '
 		"oid fanout out of order"
 '
 
+test_expect_success 'verify oid lookup out of order' '
+	corrupt_midx_and_verify $MIDX_BYTE_OID_LOOKUP "\00" $objdir \
+		"oid lookup out of order"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 08/11] multi-pack-index: fix 32-bit vs 64-bit size check
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                     ` (6 preceding siblings ...)
  2018-09-13 18:02   ` [PATCH v2 07/11] multi-pack-index: verify oid lookup order Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 09/11] multi-pack-index: verify object offsets Derrick Stolee via GitGitGadget
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When loading a 64-bit offset, we intend to check that off_t can store
the resulting offset. However, the condition accidentally checks the
32-bit offset to see if it is smaller than a 64-bit value. Fix it,
and this will be covered by a test in the 'git multi-pack-index verify'
command in a later commit.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 06d5cfc826..80094c02a7 100644
--- a/midx.c
+++ b/midx.c
@@ -236,7 +236,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
 	offset32 = get_be32(offset_data + sizeof(uint32_t));
 
 	if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
-		if (sizeof(offset32) < sizeof(uint64_t))
+		if (sizeof(off_t) < sizeof(uint64_t))
 			die(_("multi-pack-index stores a 64-bit offset, but off_t is too small"));
 
 		offset32 ^= MIDX_LARGE_OFFSET_NEEDED;
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 09/11] multi-pack-index: verify object offsets
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                     ` (7 preceding siblings ...)
  2018-09-13 18:02   ` [PATCH v2 08/11] multi-pack-index: fix 32-bit vs 64-bit size check Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 10/11] multi-pack-index: report progress during 'verify' Derrick Stolee via GitGitGadget
  2018-09-13 18:02   ` [PATCH v2 11/11] fsck: verify multi-pack-index Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The 'git multi-pack-index verify' command must verify the object
offsets stored in the multi-pack-index are correct. There are two
ways the offset chunk can be incorrect: the pack-int-id and the
object offset.

Replace the BUG() statement with a die() statement, now that we
may hit a bad pack-int-id during a 'verify' command on a corrupt
multi-pack-index, and it is covered by a test.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c                      | 29 ++++++++++++++++++++++++++++-
 t/t5319-multi-pack-index.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 80094c02a7..47e7e6113a 100644
--- a/midx.c
+++ b/midx.c
@@ -197,7 +197,8 @@ int prepare_midx_pack(struct multi_pack_index *m, uint32_t pack_int_id)
 	struct strbuf pack_name = STRBUF_INIT;
 
 	if (pack_int_id >= m->num_packs)
-		BUG("bad pack-int-id");
+		die(_("bad pack-int-id: %u (%u total packs"),
+		    pack_int_id, m->num_packs);
 
 	if (m->packs[pack_int_id])
 		return 0;
@@ -970,5 +971,31 @@ int verify_midx_file(const char *object_dir)
 				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
 	}
 
+	for (i = 0; i < m->num_objects; i++) {
+		struct object_id oid;
+		struct pack_entry e;
+		off_t m_offset, p_offset;
+
+		nth_midxed_object_oid(&oid, m, i);
+		if (!fill_midx_entry(&oid, &e, m)) {
+			midx_report(_("failed to load pack entry for oid[%d] = %s"),
+				    i, oid_to_hex(&oid));
+			continue;
+		}
+
+		if (open_pack_index(e.p)) {
+			midx_report(_("failed to load pack-index for packfile %s"),
+				    e.p->pack_name);
+			break;
+		}
+
+		m_offset = e.offset;
+		p_offset = find_pack_entry_one(oid.hash, e.p);
+
+		if (m_offset != p_offset)
+			midx_report(_("incorrect object offset for oid[%d] = %s: %"PRIx64" != %"PRIx64),
+				    i, oid_to_hex(&oid), m_offset, p_offset);
+	}
+
 	return verify_midx_error;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index a968b9a959..828c240389 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -176,6 +176,7 @@ test_expect_success 'verify bad signature' '
 '
 
 HASH_LEN=20
+NUM_OBJECTS=74
 MIDX_BYTE_VERSION=4
 MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
@@ -192,6 +193,10 @@ MIDX_OID_FANOUT_WIDTH=4
 MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + 1))
 MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
 MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
+MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
+MIDX_OFFSET_WIDTH=8
+MIDX_BYTE_PACK_INT_ID=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 2))
+MIDX_BYTE_OFFSET=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 6))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -243,6 +248,16 @@ test_expect_success 'verify oid lookup out of order' '
 		"oid lookup out of order"
 '
 
+test_expect_success 'verify incorrect pack-int-id' '
+	corrupt_midx_and_verify $MIDX_BYTE_PACK_INT_ID "\07" $objdir \
+		"bad pack-int-id"
+'
+
+test_expect_success 'verify incorrect offset' '
+	corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\07" $objdir \
+		"incorrect object offset"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
@@ -310,4 +325,16 @@ test_expect_success 'verify multi-pack-index with 64-bit offsets' '
 	git multi-pack-index verify --object-dir=objects64
 '
 
+NUM_OBJECTS=63
+MIDX_OFFSET_OID_FANOUT=$((MIDX_OFFSET_PACKNAMES + 54))
+MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
+MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
+MIDX_OFFSET_LARGE_OFFSETS=$(($MIDX_OFFSET_OBJECT_OFFSETS + $NUM_OBJECTS * $MIDX_OFFSET_WIDTH))
+MIDX_BYTE_LARGE_OFFSET=$(($MIDX_OFFSET_LARGE_OFFSETS + 3))
+
+test_expect_success 'verify incorrect 64-bit offset' '
+	corrupt_midx_and_verify $MIDX_BYTE_LARGE_OFFSET "\07" objects64 \
+		"incorrect object offset"
+'
+
 test_done
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 10/11] multi-pack-index: report progress during 'verify'
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                     ` (8 preceding siblings ...)
  2018-09-13 18:02   ` [PATCH v2 09/11] multi-pack-index: verify object offsets Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  2018-09-13 19:02     ` Ævar Arnfjörð Bjarmason
  2018-09-13 18:02   ` [PATCH v2 11/11] fsck: verify multi-pack-index Derrick Stolee via GitGitGadget
  10 siblings, 1 reply; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When verifying a multi-pack-index, the only action that takes
significant time is checking the object offsets. For example,
to verify a multi-pack-index containing 6.2 million objects in
the Linux kernel repository takes 1.3 seconds on my machine.
99% of that time is spent looking up object offsets in each of
the packfiles and comparing them to the multi-pack-index offset.

Add a progress indicator for that section of the 'verify' verb.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 midx.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/midx.c b/midx.c
index 47e7e6113a..4d4c930522 100644
--- a/midx.c
+++ b/midx.c
@@ -7,6 +7,7 @@
 #include "object-store.h"
 #include "sha1-lookup.h"
 #include "midx.h"
+#include "progress.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -940,6 +941,7 @@ static void midx_report(const char *fmt, ...)
 int verify_midx_file(const char *object_dir)
 {
 	uint32_t i;
+	struct progress *progress = NULL;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
 	verify_midx_error = 0;
 
@@ -971,6 +973,7 @@ int verify_midx_file(const char *object_dir)
 				    i, oid_to_hex(&oid1), oid_to_hex(&oid2), i + 1);
 	}
 
+	progress = start_progress(_("Verifying object offsets"), m->num_objects);
 	for (i = 0; i < m->num_objects; i++) {
 		struct object_id oid;
 		struct pack_entry e;
@@ -995,7 +998,10 @@ int verify_midx_file(const char *object_dir)
 		if (m_offset != p_offset)
 			midx_report(_("incorrect object offset for oid[%d] = %s: %"PRIx64" != %"PRIx64),
 				    i, oid_to_hex(&oid), m_offset, p_offset);
+
+		display_progress(progress, i + 1);
 	}
+	stop_progress(&progress);
 
 	return verify_midx_error;
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH v2 11/11] fsck: verify multi-pack-index
  2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
                     ` (9 preceding siblings ...)
  2018-09-13 18:02   ` [PATCH v2 10/11] multi-pack-index: report progress during 'verify' Derrick Stolee via GitGitGadget
@ 2018-09-13 18:02   ` Derrick Stolee via GitGitGadget
  10 siblings, 0 replies; 34+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2018-09-13 18:02 UTC (permalink / raw)
  To: git; +Cc: avarab, Junio C Hamano, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When core.multiPackIndex is true, we may have a multi-pack-index
in our object directory. Add calls to 'git multi-pack-index verify'
at the end of 'git fsck' if so.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/fsck.c              | 18 ++++++++++++++++++
 t/t5319-multi-pack-index.sh | 13 ++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 63c8578cc1..06eb421720 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -848,5 +848,23 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (!git_config_get_bool("core.multipackindex", &i) && i) {
+		struct child_process midx_verify = CHILD_PROCESS_INIT;
+		const char *midx_argv[] = { "multi-pack-index", "verify", NULL, NULL, NULL };
+
+		midx_verify.argv = midx_argv;
+		midx_verify.git_cmd = 1;
+		if (run_command(&midx_verify))
+			errors_found |= ERROR_COMMIT_GRAPH;
+
+		prepare_alt_odb(the_repository);
+		for (alt =  the_repository->objects->alt_odb_list; alt; alt = alt->next) {
+			midx_argv[2] = "--object-dir";
+			midx_argv[3] = alt->path;
+			if (run_command(&midx_verify))
+				errors_found |= ERROR_COMMIT_GRAPH;
+		}
+	}
+
 	return errors_found;
 }
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 828c240389..bd8e841b81 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -160,12 +160,17 @@ corrupt_midx_and_verify() {
 	DATA="${2:-\0}" &&
 	OBJDIR=$3 &&
 	GREPSTR="$4" &&
+	COMMAND="$5" &&
+	if test -z "$COMMAND"
+	then
+		COMMAND="git multi-pack-index verify --object-dir=$OBJDIR"
+	fi &&
 	FILE=$OBJDIR/pack/multi-pack-index &&
 	chmod a+w $FILE &&
 	test_when_finished mv midx-backup $FILE &&
 	cp $FILE midx-backup &&
 	printf "$DATA" | dd of="$FILE" bs=1 seek="$POS" conv=notrunc &&
-	test_must_fail git multi-pack-index verify --object-dir=$OBJDIR 2>test_err &&
+	test_must_fail $COMMAND 2>test_err &&
 	grep -v "^+" test_err >err &&
 	test_i18ngrep "$GREPSTR" err
 }
@@ -258,6 +263,12 @@ test_expect_success 'verify incorrect offset' '
 		"incorrect object offset"
 '
 
+test_expect_success 'git-fsck incorrect offset' '
+	corrupt_midx_and_verify $MIDX_BYTE_OFFSET "\07" $objdir \
+		"incorrect object offset" \
+		"git -c core.multipackindex=true fsck"
+'
+
 test_expect_success 'repack removes multi-pack-index' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
 	git repack -adf &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 10/11] multi-pack-index: report progress during 'verify'
  2018-09-13 18:02   ` [PATCH v2 10/11] multi-pack-index: report progress during 'verify' Derrick Stolee via GitGitGadget
@ 2018-09-13 19:02     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 34+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-09-13 19:02 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Junio C Hamano, Derrick Stolee


On Thu, Sep 13 2018, Derrick Stolee via GitGitGadget wrote:

> +	progress = start_progress(_("Verifying object offsets"), m->num_objects);

I think in the spirit of my "commit-graph {write,verify}: add progress
output" it would be better to say:

    "Verifying multi-pack-index object offsets"

I.e. both to make it clearer what's actually going on (without much
added verbosity), and also so that when the user turns on this feature
it's clear what gc / fsck etc. are doing for that feature in particular,
as with the commit-graph.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2018-09-13 19:02 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-05 14:46 [PATCH 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
2018-09-05 14:46 ` [PATCH 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
2018-09-05 18:59   ` Eric Sunshine
2018-09-05 19:37     ` Derrick Stolee
2018-09-05 14:46 ` [PATCH 02/11] multi-pack-index: verify bad header Derrick Stolee via GitGitGadget
2018-09-07  0:27   ` Eric Sunshine
2018-09-05 14:46 ` [PATCH 03/11] multi-pack-index: verify corrupt chunk lookup table Derrick Stolee via GitGitGadget
2018-09-05 14:46 ` [PATCH 04/11] multi-pack-index: verify packname order Derrick Stolee via GitGitGadget
2018-09-05 18:15   ` Stefan Beller
2018-09-05 19:11     ` Derrick Stolee
2018-09-05 19:14       ` Stefan Beller
2018-09-05 19:28         ` Derrick Stolee
2018-09-05 14:46 ` [PATCH 05/11] multi-pack-index: verify missing pack Derrick Stolee via GitGitGadget
2018-09-05 14:46 ` [PATCH 06/11] multi-pack-index: verify oid fanout order Derrick Stolee via GitGitGadget
2018-09-05 14:46 ` [PATCH 07/11] multi-pack-index: verify oid lookup order Derrick Stolee via GitGitGadget
2018-09-05 14:46 ` [PATCH 08/11] multi-pack-index: fix 32-bit vs 64-bit size check Derrick Stolee via GitGitGadget
2018-09-05 14:46 ` [PATCH 09/11] multi-pack-index: verify object offsets Derrick Stolee via GitGitGadget
2018-09-07  0:34   ` Eric Sunshine
2018-09-07 13:10     ` Derrick Stolee
2018-09-05 14:46 ` [PATCH 10/11] multi-pack-index: report progress during 'verify' Derrick Stolee via GitGitGadget
2018-09-05 14:46 ` [PATCH 11/11] fsck: verify multi-pack-index Derrick Stolee via GitGitGadget
2018-09-13 18:02 ` [PATCH v2 00/11] Add 'git multi-pack-index verify' command Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 01/11] multi-pack-index: add 'verify' verb Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 02/11] multi-pack-index: verify bad header Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 03/11] multi-pack-index: verify corrupt chunk lookup table Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 04/11] multi-pack-index: verify packname order Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 05/11] multi-pack-index: verify missing pack Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 06/11] multi-pack-index: verify oid fanout order Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 07/11] multi-pack-index: verify oid lookup order Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 08/11] multi-pack-index: fix 32-bit vs 64-bit size check Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 09/11] multi-pack-index: verify object offsets Derrick Stolee via GitGitGadget
2018-09-13 18:02   ` [PATCH v2 10/11] multi-pack-index: report progress during 'verify' Derrick Stolee via GitGitGadget
2018-09-13 19:02     ` Ævar Arnfjörð Bjarmason
2018-09-13 18:02   ` [PATCH v2 11/11] fsck: verify multi-pack-index Derrick Stolee via GitGitGadget

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).