git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX
@ 2022-08-19 21:30 Taylor Blau
  2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau
                   ` (7 more replies)
  0 siblings, 8 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

This series resolves a bug that was reported[1] by Johannes, and
investigated by him, Abhradeep, and Stolee in that same sub-thread.

The crux of the issue is that a MIDX bitmap can enter a corrupt state
when changing the preferred pack from its value in an existing MIDX in
certain circumstances as described in the first and final patches.

This series is structured as follows:

  - The first patch of this series adds a test which demonstrates the
    problem. (This is an improvement from the debugging in [1], where we
    only noticed the problem racily in an existing test, and only in
    SHA-256 mode).

  - The next small handful of patches refactor midx.c's
    `get_sorted_entries()` function to make fixing this bug more
    straightforward.

  - The final patch resolves the bug and updates the test to no longer
    expect failure.

A couple of meta-notes:

  - This bug has existed since the introduction of MIDX bitmaps, but
    probably wasn't noticed until now since it is only triggerable when
    the `--stdin-packs` mode *isn't* passed, so it never occurs when
    invoked via `git repack`.

  - We could likely change the behavior of 56d863e979 (midx: expose
    `write_midx_file_only()` publicly, 2021-09-28), which explicitly
    disables reusing the existing MIDX (by avoiding loading it
    altogether) when `--stdin-packs` is given.

    The rationale in the comment added by 56d863e979 is somewhat unclear,
    but I have a vague recollection of running into a bug that was
    squashed by avoiding reusing an existing MIDX to write one with
    bitmaps. This was likely that bug.

Thanks in advance for your review, and thanks to Johannes, Abhradeep,
and Stolee for investigating this bug while I was on vacation.

[1]: https://lore.kernel.org/git/p3r70610-8n52-s8q0-n641-onp4ps01330n@tzk.qr/

Taylor Blau (6):
  t5326: demonstrate potential bitmap corruption
  t/lib-bitmap.sh: avoid silencing stderr
  midx.c: extract `struct midx_fanout`
  midx.c: extract `midx_fanout_add_midx_fanout()`
  midx.c: extract `midx_fanout_add_pack_fanout()`
  midx.c: include preferred pack correctly with existing MIDX

 midx.c                        | 128 ++++++++++++++++++++++------------
 t/lib-bitmap.sh               |   2 +-
 t/t5326-multi-pack-bitmaps.sh |  43 ++++++++++++
 3 files changed, 127 insertions(+), 46 deletions(-)

-- 
2.37.0.1.g1379af2e9d

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 1/6] t5326: demonstrate potential bitmap corruption
  2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
@ 2022-08-19 21:30 ` Taylor Blau
  2022-08-22 16:09   ` Derrick Stolee
  2022-08-19 21:30 ` [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

It is possible to generate a corrupt MIDX bitmap when certain conditions
are met. This happens when the preferred pack "P" changes to one (say,
"Q") that:

  - "Q" has objects included in an existing MIDX,
  - but "Q" is different than "P",
  - and "Q" and "P" have some objects in common

When this is the case, not all objects from "Q" will be selected from
"Q" (ie., the generated MIDX will represent them as coming from a
different pack), despite "Q" being preferred.

This is an invariant violation, since all objects contained in the
MIDX's preferred pack are supposed to originate from the preferred pack.
In other words, all duplicate objects are resolved in favor of the copy
that comes from the MIDX's preferred pack, if any.

This violation results in a corrupt object order, which cannot be
interpreted by the pack-bitmap code, leading to broken clones and other
defects.

This test demonstrates the above problem by constructing a minimal
reproduction, and showing that the final `git clone` invocation fails.

The reproduction is mostly straightforward, except that the new pack
generated between MIDX writes (which is necessary in order to prevent
that operation from being a noop) must sort ahead of all existing packs
in order to prevent a different pack (neither "P" nor "Q") from
appearing as preferred (meaning all its objects appear in order at the
beginning of the pseudo-pack order).

Subsequent commits will first refactor the midx.c::get_sorted_entries()
function, and then fix this bug.

Reported-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 43 +++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c1..a60cec5cab 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -307,4 +307,47 @@ test_expect_success 'graceful fallback when missing reverse index' '
 	)
 '
 
+test_expect_success 'preferred pack change with existing MIDX bitmap' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+		test_commit other &&
+
+		git rev-list --objects --no-object-names base >p1.objects &&
+		git rev-list --objects --no-object-names other >p2.objects &&
+
+		p1="$(git pack-objects "$objdir/pack/pack" \
+			--delta-base-offset <p1.objects)" &&
+		p2="$(git pack-objects "$objdir/pack/pack" \
+			--delta-base-offset <p2.objects)" &&
+
+		# Generate a MIDX containing the first two packs, marking p1 as
+		# preferred, and ensure that it can be successfully cloned.
+		git multi-pack-index write --bitmap \
+			--preferred-pack="pack-$p1.pack" &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		git clone --no-local . clone1 &&
+
+		# Then generate a new pack which sorts ahead of any existing
+		# pack (by tweaking the pack prefix).
+		test_commit foo &&
+		git pack-objects --all --unpacked $objdir/pack/pack0 &&
+
+		# Generate a new MIDX which changes the preferred pack to a pack
+		# contained in the existing MIDX, such that not all objects from
+		# p2 that appear in the MIDX had their copy selected from p2.
+		git multi-pack-index write --bitmap \
+			--preferred-pack="pack-$p2.pack" &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test_must_fail git clone --no-local . clone2
+	)
+'
+
 test_done
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr
  2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
  2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau
@ 2022-08-19 21:30 ` Taylor Blau
  2022-08-20 16:44   ` Abhradeep Chakraborty
  2022-08-19 21:30 ` [PATCH 3/6] midx.c: extract `struct midx_fanout` Taylor Blau
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 29+ messages in thread
From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

The midx_bitmap_partial_tests() function is responsible for setting up a
state where some (but not all) packs in the repository are covered by a
MIDX (and bitmap).

This function has redirected the `git multi-pack-index write --bitmap`'s
stderr to a file "err" since its introduction back in c51f5a6437 (t5326:
test multi-pack bitmap behavior, 2021-08-31).

This was likely a stray change left over from a slightly different
version of this test, since the file "err" is never read after being
written. This leads to confusingly-missing output, especially when the
contents of stderr are important.

Resolve this confusion by avoiding silencing stderr in this case.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index a95537e759..f595937094 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -440,7 +440,7 @@ midx_bitmap_partial_tests () {
 		test_commit packed &&
 		git repack &&
 		test_commit loose &&
-		git multi-pack-index write --bitmap 2>err &&
+		git multi-pack-index write --bitmap &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 	'
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 3/6] midx.c: extract `struct midx_fanout`
  2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
  2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau
  2022-08-19 21:30 ` [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau
@ 2022-08-19 21:30 ` Taylor Blau
  2022-08-19 21:30 ` [PATCH 4/6] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

To build up a list of objects (along with their packs, and the offsets
within those packs that each object appears at), the MIDX code
implements `get_sorted_entries()` which builds up a list of candidates,
sorts them, and then removes duplicate entries.

To do this, it keeps an array of `pack_midx_entry` structures that it
builds up once for each fanout level (ie., for all possible values of
the first byte of each object's ID).

This array is a function-local variable of `get_sorted_entries()`. Since
it uses the ALLOC_GROW() macro, having the `alloc_fanout` variable also
be local to that function, and only modified within that function is
convenient.

However, subsequent changes will extract the two ways this array is
filled (from a pack at some fanout value, and from an existing MIDX at
some fanout value) into separate functions. Instead of passing around
pointers to the entries array, along with `nr_fanout` and
`alloc_fanout`, encapsulate these three into a structure instead. Then
pass around a pointer to this structure instead.

This patch does not yet extract the above two functions, but sets us up
to begin doing so in the following commit. For now, the implementation
of get_sorted_entries() is only modified to replace `entries_by_fanout`
with `fanout.entries`, `nr_fanout` with `fanout.nr`, and so on.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 54 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 19 deletions(-)

diff --git a/midx.c b/midx.c
index 4e956cacb7..cdb6c481c7 100644
--- a/midx.c
+++ b/midx.c
@@ -577,6 +577,22 @@ static void fill_pack_entry(uint32_t pack_int_id,
 	entry->preferred = !!preferred;
 }
 
+struct midx_fanout {
+	struct pack_midx_entry *entries;
+	uint32_t nr;
+	uint32_t alloc;
+};
+
+static void midx_fanout_grow(struct midx_fanout *fanout, uint32_t nr)
+{
+	ALLOC_GROW(fanout->entries, nr, fanout->alloc);
+}
+
+static void midx_fanout_sort(struct midx_fanout *fanout)
+{
+	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -595,8 +611,8 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						  int preferred_pack)
 {
 	uint32_t cur_fanout, cur_pack, cur_object;
-	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
-	struct pack_midx_entry *entries_by_fanout = NULL;
+	uint32_t alloc_objects, total_objects = 0;
+	struct midx_fanout fanout = { 0 };
 	struct pack_midx_entry *deduplicated_entries = NULL;
 	uint32_t start_pack = m ? m->num_packs : 0;
 
@@ -608,14 +624,14 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 	 * slices to be evenly distributed, with some noise. Hence,
 	 * allocate slightly more than one 256th.
 	 */
-	alloc_objects = alloc_fanout = total_objects > 3200 ? total_objects / 200 : 16;
+	alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
 
-	ALLOC_ARRAY(entries_by_fanout, alloc_fanout);
+	ALLOC_ARRAY(fanout.entries, fanout.alloc);
 	ALLOC_ARRAY(deduplicated_entries, alloc_objects);
 	*nr_objects = 0;
 
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
-		uint32_t nr_fanout = 0;
+		fanout.nr = 0;
 
 		if (m) {
 			uint32_t start = 0, end;
@@ -625,15 +641,15 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 			end = ntohl(m->chunk_oid_fanout[cur_fanout]);
 
 			for (cur_object = start; cur_object < end; cur_object++) {
-				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
+				midx_fanout_grow(&fanout, fanout.nr + 1);
 				nth_midxed_pack_midx_entry(m,
-							   &entries_by_fanout[nr_fanout],
+							   &fanout.entries[fanout.nr],
 							   cur_object);
 				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
-					entries_by_fanout[nr_fanout].preferred = 1;
+					fanout.entries[fanout.nr].preferred = 1;
 				else
-					entries_by_fanout[nr_fanout].preferred = 0;
-				nr_fanout++;
+					fanout.entries[fanout.nr].preferred = 0;
+				fanout.nr++;
 			}
 		}
 
@@ -646,36 +662,36 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 			end = get_pack_fanout(info[cur_pack].p, cur_fanout);
 
 			for (cur_object = start; cur_object < end; cur_object++) {
-				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
+				midx_fanout_grow(&fanout, fanout.nr + 1);
 				fill_pack_entry(cur_pack,
 						info[cur_pack].p,
 						cur_object,
-						&entries_by_fanout[nr_fanout],
+						&fanout.entries[fanout.nr],
 						preferred);
-				nr_fanout++;
+				fanout.nr++;
 			}
 		}
 
-		QSORT(entries_by_fanout, nr_fanout, midx_oid_compare);
+		midx_fanout_sort(&fanout);
 
 		/*
 		 * The batch is now sorted by OID and then mtime (descending).
 		 * Take only the first duplicate.
 		 */
-		for (cur_object = 0; cur_object < nr_fanout; cur_object++) {
-			if (cur_object && oideq(&entries_by_fanout[cur_object - 1].oid,
-						&entries_by_fanout[cur_object].oid))
+		for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
+			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
+						&fanout.entries[cur_object].oid))
 				continue;
 
 			ALLOC_GROW(deduplicated_entries, *nr_objects + 1, alloc_objects);
 			memcpy(&deduplicated_entries[*nr_objects],
-			       &entries_by_fanout[cur_object],
+			       &fanout.entries[cur_object],
 			       sizeof(struct pack_midx_entry));
 			(*nr_objects)++;
 		}
 	}
 
-	free(entries_by_fanout);
+	free(fanout.entries);
 	return deduplicated_entries;
 }
 
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 4/6] midx.c: extract `midx_fanout_add_midx_fanout()`
  2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
                   ` (2 preceding siblings ...)
  2022-08-19 21:30 ` [PATCH 3/6] midx.c: extract `struct midx_fanout` Taylor Blau
@ 2022-08-19 21:30 ` Taylor Blau
  2022-08-19 21:30 ` [PATCH 5/6] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

Extract a routine to add all objects whose object ID's first byte is
`cur_fanout` from an existing MIDX.

This function will only be called once, so extracting it is purely
cosmetic to improve the readability of `get_sorted_entries()` (its sole
caller) below.

The functionality is unchanged in this commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/midx.c b/midx.c
index cdb6c481c7..0d40089c4d 100644
--- a/midx.c
+++ b/midx.c
@@ -593,6 +593,31 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
 	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
 }
 
+static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
+					struct multi_pack_index *m,
+					int preferred_pack,
+					uint32_t cur_fanout)
+{
+	uint32_t start = 0, end;
+	uint32_t cur_object;
+
+	if (cur_fanout)
+		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
+	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
+
+	for (cur_object = start; cur_object < end; cur_object++) {
+		midx_fanout_grow(fanout, fanout->nr + 1);
+		nth_midxed_pack_midx_entry(m,
+					   &fanout->entries[fanout->nr],
+					   cur_object);
+		if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+			fanout->entries[fanout->nr].preferred = 1;
+		else
+			fanout->entries[fanout->nr].preferred = 0;
+		fanout->nr++;
+	}
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -633,25 +658,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		if (m) {
-			uint32_t start = 0, end;
-
-			if (cur_fanout)
-				start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
-			end = ntohl(m->chunk_oid_fanout[cur_fanout]);
-
-			for (cur_object = start; cur_object < end; cur_object++) {
-				midx_fanout_grow(&fanout, fanout.nr + 1);
-				nth_midxed_pack_midx_entry(m,
-							   &fanout.entries[fanout.nr],
-							   cur_object);
-				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
-					fanout.entries[fanout.nr].preferred = 1;
-				else
-					fanout.entries[fanout.nr].preferred = 0;
-				fanout.nr++;
-			}
-		}
+		if (m)
+			midx_fanout_add_midx_fanout(&fanout, m, preferred_pack,
+						    cur_fanout);
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			uint32_t start = 0, end;
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 5/6] midx.c: extract `midx_fanout_add_pack_fanout()`
  2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
                   ` (3 preceding siblings ...)
  2022-08-19 21:30 ` [PATCH 4/6] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau
@ 2022-08-19 21:30 ` Taylor Blau
  2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

Extract a routine to add all objects whose object ID's first byte is
`cur_fanout` from a given pack (identified by its index into the `struct
pack_info` array maintained by the MIDX writing routine).

Unlike the previous extraction (for `midx_fanout_add_midx_fanout()`),
this function will be called twice, once for all new packs, and again
for the preferred pack (if it appears in an existing MIDX). The latter
change is to resolve the bug described a few patches ago, and will be
made in the subsequent commit.

Similar to the previous refactoring, this function also enhances the
readability of its caller in `get_sorted_entries()`.

Its functionality is unchanged in this commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/midx.c b/midx.c
index 0d40089c4d..be8186eec2 100644
--- a/midx.c
+++ b/midx.c
@@ -618,6 +618,31 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 	}
 }
 
+static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
+					struct pack_info *info,
+					uint32_t cur_pack,
+					int preferred,
+					uint32_t cur_fanout)
+{
+	struct packed_git *pack = info[cur_pack].p;
+	uint32_t start = 0, end;
+	uint32_t cur_object;
+
+	if (cur_fanout)
+		start = get_pack_fanout(pack, cur_fanout - 1);
+	end = get_pack_fanout(pack, cur_fanout);
+
+	for (cur_object = start; cur_object < end; cur_object++) {
+		midx_fanout_grow(fanout, fanout->nr + 1);
+		fill_pack_entry(cur_pack,
+				info[cur_pack].p,
+				cur_object,
+				&fanout->entries[fanout->nr],
+				preferred);
+		fanout->nr++;
+	}
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -663,22 +688,10 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						    cur_fanout);
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
-			uint32_t start = 0, end;
 			int preferred = cur_pack == preferred_pack;
-
-			if (cur_fanout)
-				start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
-			end = get_pack_fanout(info[cur_pack].p, cur_fanout);
-
-			for (cur_object = start; cur_object < end; cur_object++) {
-				midx_fanout_grow(&fanout, fanout.nr + 1);
-				fill_pack_entry(cur_pack,
-						info[cur_pack].p,
-						cur_object,
-						&fanout.entries[fanout.nr],
-						preferred);
-				fanout.nr++;
-			}
+			midx_fanout_add_pack_fanout(&fanout,
+						    info, cur_pack,
+						    preferred, cur_fanout);
 		}
 
 		midx_fanout_sort(&fanout);
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX
  2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
                   ` (4 preceding siblings ...)
  2022-08-19 21:30 ` [PATCH 5/6] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau
@ 2022-08-19 21:30 ` Taylor Blau
  2022-08-20 18:40   ` Abhradeep Chakraborty
  2022-08-22 17:03   ` Derrick Stolee
  2022-08-22 17:04 ` [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
  7 siblings, 2 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

This patch resolves an issue where the object order used to generate a
MIDX bitmap would violate an invariant that all of the preferred pack's
objects are represented by that pack in the MIDX.

The problem arises when reusing an existing MIDX while generating a new
one, and occurs specifically when the identity of the preferred pack
changes from one MIDX to another, along with a few other conditions:

    - the new preferred pack must also be present in the existing MIDX

    - the new preferred pack must *not* have been the preferred pack in
      the existing MIDX

    - most importantly, there must be at least one object present in the
      physical preferred pack (ie., it shows up in that pack's index)
      but was selected from a *different* pack when the previous MIDX
      was generated

When the above conditions are all met, we end up (incorrectly)
discarding copies of some objects in the pack selected as the preferred
pack. This is because `get_sorted_entries()` adds objects to its list
by doing the following at each fanout level:

    - first, adding all objects from that fanout level from an existing
      MIDX

    - then, adding all objects from that fanout level in each pack *not*
      included in the existing MIDX

So if some object was not selected from the to-be-preferred pack when
writing the previous MIDX, then we will never consider it as a candidate
when generating the new MIDX. This means that it's possible for the
preferred pack to not include all of its objects in the MIDX's
pseudo-pack object order, which is an invariant violation of that order.

Resolve this by adding all objects from the preferred pack separately
when it appears in the existing MIDX (if one was present). This will
duplicate objects from that pack that *did* appear in the MIDX, but this
is fine, since get_sorted_entries() already handles duplicates. (A
future optimization in this area could avoid adding copies of objects
that we know already existing in the MIDX.)

Note that we no longer need to compute the preferred-ness of objects
added from the MIDX, since we only want to select the preferred objects
from a single source. (We could still mark these preferred bits, but
doing so is redundant and unnecessary).

This resolves the bug described in the first patch of this series.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                        | 14 +++++++-------
 t/t5326-multi-pack-bitmaps.sh |  2 +-
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/midx.c b/midx.c
index be8186eec2..bd1d27090e 100644
--- a/midx.c
+++ b/midx.c
@@ -595,7 +595,6 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
 
 static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 					struct multi_pack_index *m,
-					int preferred_pack,
 					uint32_t cur_fanout)
 {
 	uint32_t start = 0, end;
@@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 		nth_midxed_pack_midx_entry(m,
 					   &fanout->entries[fanout->nr],
 					   cur_object);
-		if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
-			fanout->entries[fanout->nr].preferred = 1;
-		else
-			fanout->entries[fanout->nr].preferred = 0;
+		fanout->entries[fanout->nr].preferred = 0;
 		fanout->nr++;
 	}
 }
@@ -684,8 +680,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 		fanout.nr = 0;
 
 		if (m)
-			midx_fanout_add_midx_fanout(&fanout, m, preferred_pack,
-						    cur_fanout);
+			midx_fanout_add_midx_fanout(&fanout, m, cur_fanout);
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			int preferred = cur_pack == preferred_pack;
@@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						    preferred, cur_fanout);
 		}
 
+		if (-1 < preferred_pack && preferred_pack < start_pack)
+			midx_fanout_add_pack_fanout(&fanout, info,
+						    preferred_pack, 1,
+						    cur_fanout);
+
 		midx_fanout_sort(&fanout);
 
 		/*
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index a60cec5cab..4f3841661a 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -346,7 +346,7 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' '
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		test_must_fail git clone --no-local . clone2
+		git clone --no-local . clone2
 	)
 '
 
-- 
2.37.0.1.g1379af2e9d

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr
  2022-08-19 21:30 ` [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau
@ 2022-08-20 16:44   ` Abhradeep Chakraborty
  2022-08-22 17:58     ` Taylor Blau
  0 siblings, 1 reply; 29+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-20 16:44 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Junio C Hamano, Johannes Schindelin, Derrick Stolee,
	jonathantanmy, Kaartic Sivaraam

On Sat, Aug 20, 2022 at 3:00 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> The midx_bitmap_partial_tests() function is responsible for setting up a
> state where some (but not all) packs in the repository are covered by a
> MIDX (and bitmap).
>
> This function has redirected the `git multi-pack-index write --bitmap`'s
> stderr to a file "err" since its introduction back in c51f5a6437 (t5326:
> test multi-pack bitmap behavior, 2021-08-31).
>
> This was likely a stray change left over from a slightly different
> version of this test, since the file "err" is never read after being
> written. This leads to confusingly-missing output, especially when the
> contents of stderr are important.
>
> Resolve this confusion by avoiding silencing stderr in this case.
>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
>  t/lib-bitmap.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> index a95537e759..f595937094 100644
> --- a/t/lib-bitmap.sh
> +++ b/t/lib-bitmap.sh
> @@ -440,7 +440,7 @@ midx_bitmap_partial_tests () {
>                 test_commit packed &&
>                 git repack &&
>                 test_commit loose &&
> -               git multi-pack-index write --bitmap 2>err &&
> +               git multi-pack-index write --bitmap &&
>                 test_path_is_file $midx &&
>                 test_path_is_file $midx-$(midx_checksum $objdir).bitmap
>         '

Thanks Taylor! I would say this is a very good change. It might have
been there for some reason when it was written, but that was resisting
us to debug what was going on ;-)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX
  2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau
@ 2022-08-20 18:40   ` Abhradeep Chakraborty
  2022-08-22 18:08     ` Taylor Blau
  2022-08-22 17:03   ` Derrick Stolee
  1 sibling, 1 reply; 29+ messages in thread
From: Abhradeep Chakraborty @ 2022-08-20 18:40 UTC (permalink / raw)
  To: Taylor Blau
  Cc: git, Junio C Hamano, Johannes Schindelin, Derrick Stolee,
	jonathantanmy, Kaartic Sivaraam

On Sat, Aug 20, 2022 at 3:00 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> +               if (-1 < preferred_pack && preferred_pack < start_pack)
> +                       midx_fanout_add_pack_fanout(&fanout, info,
> +                                                   preferred_pack, 1,
> +                                                   cur_fanout);
> +

All the other changes make sense to me but I have a question about
this particular change. Instead of adding all the preferred objects
again (but in this case these are being added from preferred pack) in
`fanout->entries`, will it be better if we call
`midx_fanout_add_pack_fanout()` function from
`midx_fanout_add_midx_fanout()` when above conditions are met?
Something like this -

    static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
                                        struct multi_pack_index *m,
                                        struct pack_info *info,
                                        uint32_t cur_pack,
                                        int preferred,
                                        uint32_t cur_fanout)
    {
     ...
          if (cur_fanout)
                start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
          end = ntohl(m->chunk_oid_fanout[cur_fanout]);
          if (preferred) {
                midx_fanout_add_pack_fanout(&fanout, info, cur_pack,

preferred, cur_fanout);
                return;
          }

          for (.....) {
          ........
    }

It may reduce some extra code work but also could decrease readability
of the function(may be). What's your thought here?

Thanks :)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] t5326: demonstrate potential bitmap corruption
  2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau
@ 2022-08-22 16:09   ` Derrick Stolee
  2022-08-22 17:57     ` Taylor Blau
  0 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2022-08-22 16:09 UTC (permalink / raw)
  To: Taylor Blau, git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On 8/19/2022 5:30 PM, Taylor Blau wrote:

> +test_expect_success 'preferred pack change with existing MIDX bitmap' '
> +	rm -fr repo &&

Does a previous test not delete 'repo' when necessary?

Or, do previous tests re-use 'repo' and now we are in a region
where we can safely clear that directory? Should we use a
different name?

> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&

nit: test_when_finished should be the first line of the test.

> +		# Generate a new MIDX which changes the preferred pack to a pack
> +		# contained in the existing MIDX, such that not all objects from
> +		# p2 that appear in the MIDX had their copy selected from p2.
> +		git multi-pack-index write --bitmap \
> +			--preferred-pack="pack-$p2.pack" &&
> +		test_path_is_file $midx &&
> +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +
> +		test_must_fail git clone --no-local . clone2

This section is demonstrating the bug. Perhaps we should have
comments indicating that this is not desired behavior, but is
being demonstrated so the bug can be fixed by a later change?

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX
  2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau
  2022-08-20 18:40   ` Abhradeep Chakraborty
@ 2022-08-22 17:03   ` Derrick Stolee
  2022-08-22 18:14     ` Taylor Blau
  1 sibling, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2022-08-22 17:03 UTC (permalink / raw)
  To: Taylor Blau, git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On 8/19/2022 5:30 PM, Taylor Blau wrote:

> Resolve this by adding all objects from the preferred pack separately
> when it appears in the existing MIDX (if one was present). This will
> duplicate objects from that pack that *did* appear in the MIDX, but this
> is fine, since get_sorted_entries() already handles duplicates. (A
> future optimization in this area could avoid adding copies of objects
> that we know already existing in the MIDX.)

...

> This resolves the bug described in the first patch of this series.

Thinking ahead to when this is a commit, perhaps this could instead
refer to the 'preferred pack change with existing MIDX bitmap' test
case?

> @@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
>  		nth_midxed_pack_midx_entry(m,
>  					   &fanout->entries[fanout->nr],
>  					   cur_object);
> -		if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
> -			fanout->entries[fanout->nr].preferred = 1;
> -		else
> -			fanout->entries[fanout->nr].preferred = 0;
> +		fanout->entries[fanout->nr].preferred = 0;
>  		fanout->nr++;

Here, we have lost the ability to set the 'preferred' bit from the
previous MIDX. Good.

> @@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
>  						    preferred, cur_fanout);
>  		}
>  
> +		if (-1 < preferred_pack && preferred_pack < start_pack)
> +			midx_fanout_add_pack_fanout(&fanout, info,
> +						    preferred_pack, 1,
> +						    cur_fanout);
> +

And here, when there is a preferred pack _in the previous MIDX_,
we add its objects a second time, but now with the preferred bit
on. If the preferred pack is _not_ in the previous MIDX, then the
'preferred_pack < start_pack' condition will fail and the bits
would have been set within the for loop.

> @@ -346,7 +346,7 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' '
>  		test_path_is_file $midx &&
>  		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
>  
> -		test_must_fail git clone --no-local . clone2
> +		git clone --no-local . clone2

I mentioned in patch 1 that this test could use some comments about
what is unexpected and what _is_ expected. I think this comment needs
an update in this patch:

	# Generate a new MIDX which changes the preferred pack to a pack
	# contained in the existing MIDX, such that not all objects from
	# p2 that appear in the MIDX had their copy selected from p2.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX
  2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
                   ` (5 preceding siblings ...)
  2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau
@ 2022-08-22 17:04 ` Derrick Stolee
  2022-08-22 19:44   ` Taylor Blau
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
  7 siblings, 1 reply; 29+ messages in thread
From: Derrick Stolee @ 2022-08-22 17:04 UTC (permalink / raw)
  To: Taylor Blau, git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On 8/19/2022 5:30 PM, Taylor Blau wrote:
> This series resolves a bug that was reported[1] by Johannes, and
> investigated by him, Abhradeep, and Stolee in that same sub-thread.
> 
> The crux of the issue is that a MIDX bitmap can enter a corrupt state
> when changing the preferred pack from its value in an existing MIDX in
> certain circumstances as described in the first and final patches.
> 
> This series is structured as follows:
> 
>   - The first patch of this series adds a test which demonstrates the
>     problem. (This is an improvement from the debugging in [1], where we
>     only noticed the problem racily in an existing test, and only in
>     SHA-256 mode).
> 
>   - The next small handful of patches refactor midx.c's
>     `get_sorted_entries()` function to make fixing this bug more
>     straightforward.
> 
>   - The final patch resolves the bug and updates the test to no longer
>     expect failure.

Thanks for putting this together. Definitely not an easy bug to find
and fix.

I mostly have nitpicks, but the overall structure is sound.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] t5326: demonstrate potential bitmap corruption
  2022-08-22 16:09   ` Derrick Stolee
@ 2022-08-22 17:57     ` Taylor Blau
  2022-08-22 19:31       ` Junio C Hamano
  0 siblings, 1 reply; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 17:57 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On Mon, Aug 22, 2022 at 12:09:55PM -0400, Derrick Stolee wrote:
> On 8/19/2022 5:30 PM, Taylor Blau wrote:
>
> > +test_expect_success 'preferred pack change with existing MIDX bitmap' '
> > +	rm -fr repo &&
>
> Does a previous test not delete 'repo' when necessary?
>
> Or, do previous tests re-use 'repo' and now we are in a region
> where we can safely clear that directory? Should we use a
> different name?
>
> > +	git init repo &&
> > +	test_when_finished "rm -fr repo" &&
>
> nit: test_when_finished should be the first line of the test.

The "rm-then-init-then-test_when_finished" is an (unfortunate) pattern
extended throughout t5326, mostly that some tests don't clean up "repo"
after deleting and recreating it.

But it's easy enough to just use a separate repository, and avoid
removing it altogether. Thanks for the suggestion!

> > +		# Generate a new MIDX which changes the preferred pack to a pack
> > +		# contained in the existing MIDX, such that not all objects from
> > +		# p2 that appear in the MIDX had their copy selected from p2.
> > +		git multi-pack-index write --bitmap \
> > +			--preferred-pack="pack-$p2.pack" &&
> > +		test_path_is_file $midx &&
> > +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> > +
> > +		test_must_fail git clone --no-local . clone2
>
> This section is demonstrating the bug. Perhaps we should have
> comments indicating that this is not desired behavior, but is
> being demonstrated so the bug can be fixed by a later change?

Yeah, good call. Thanks again!

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr
  2022-08-20 16:44   ` Abhradeep Chakraborty
@ 2022-08-22 17:58     ` Taylor Blau
  0 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 17:58 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: git, Junio C Hamano, Johannes Schindelin, Derrick Stolee,
	jonathantanmy, Kaartic Sivaraam

On Sat, Aug 20, 2022 at 10:14:03PM +0530, Abhradeep Chakraborty wrote:
> On Sat, Aug 20, 2022 at 3:00 AM Taylor Blau <me@ttaylorr.com> wrote:
> > diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> > index a95537e759..f595937094 100644
> > --- a/t/lib-bitmap.sh
> > +++ b/t/lib-bitmap.sh
> > @@ -440,7 +440,7 @@ midx_bitmap_partial_tests () {
> >                 test_commit packed &&
> >                 git repack &&
> >                 test_commit loose &&
> > -               git multi-pack-index write --bitmap 2>err &&
> > +               git multi-pack-index write --bitmap &&
> >                 test_path_is_file $midx &&
> >                 test_path_is_file $midx-$(midx_checksum $objdir).bitmap
> >         '
>
> Thanks Taylor! I would say this is a very good change. It might have
> been there for some reason when it was written, but that was resisting
> us to debug what was going on ;-)

:-). I was probably doing something with "err" when I originally wrote
this test. But I likely dropped whatever assertion I had written and
forgot to stop redirecting stderr.

Better late than never ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX
  2022-08-20 18:40   ` Abhradeep Chakraborty
@ 2022-08-22 18:08     ` Taylor Blau
  0 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 18:08 UTC (permalink / raw)
  To: Abhradeep Chakraborty
  Cc: git, Junio C Hamano, Johannes Schindelin, Derrick Stolee,
	jonathantanmy, Kaartic Sivaraam

On Sun, Aug 21, 2022 at 12:10:42AM +0530, Abhradeep Chakraborty wrote:
> On Sat, Aug 20, 2022 at 3:00 AM Taylor Blau <me@ttaylorr.com> wrote:
> >
> > +               if (-1 < preferred_pack && preferred_pack < start_pack)
> > +                       midx_fanout_add_pack_fanout(&fanout, info,
> > +                                                   preferred_pack, 1,
> > +                                                   cur_fanout);
> > +
>
> All the other changes make sense to me but I have a question about
> this particular change. Instead of adding all the preferred objects
> again (but in this case these are being added from preferred pack) in
> `fanout->entries`, will it be better if we call
> `midx_fanout_add_pack_fanout()` function from
> `midx_fanout_add_midx_fanout()` when above conditions are met?
> Something like this -
>
>     static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
>                                         struct multi_pack_index *m,
>                                         struct pack_info *info,
>                                         uint32_t cur_pack,
>                                         int preferred,
>                                         uint32_t cur_fanout)
>     {
>      ...
>           if (cur_fanout)
>                 start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
>           end = ntohl(m->chunk_oid_fanout[cur_fanout]);
>           if (preferred) {
>                 midx_fanout_add_pack_fanout(&fanout, info, cur_pack,
>
> preferred, cur_fanout);
>                 return;
>           }
>
>           for (.....) {
>           ........
>     }

A slightly simpler approach might be to see that the pack_midx_entry
structure we get back from calling nth_midxed_pack_midx_entry() contains
the pack from which each object is represented. So if we see one that
collides with the preferred pack, then we can just skip it, since we
know they'll be handled separately down below.

I'll add another patch on top which makes that optimization.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX
  2022-08-22 17:03   ` Derrick Stolee
@ 2022-08-22 18:14     ` Taylor Blau
  0 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 18:14 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On Mon, Aug 22, 2022 at 01:03:11PM -0400, Derrick Stolee wrote:
> On 8/19/2022 5:30 PM, Taylor Blau wrote:
>
> > Resolve this by adding all objects from the preferred pack separately
> > when it appears in the existing MIDX (if one was present). This will
> > duplicate objects from that pack that *did* appear in the MIDX, but this
> > is fine, since get_sorted_entries() already handles duplicates. (A
> > future optimization in this area could avoid adding copies of objects
> > that we know already existing in the MIDX.)
>
> ...
>
> > This resolves the bug described in the first patch of this series.
>
> Thinking ahead to when this is a commit, perhaps this could instead
> refer to the 'preferred pack change with existing MIDX bitmap' test
> case?

Good idea, thanks.

> > @@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
> >  		nth_midxed_pack_midx_entry(m,
> >  					   &fanout->entries[fanout->nr],
> >  					   cur_object);
> > -		if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
> > -			fanout->entries[fanout->nr].preferred = 1;
> > -		else
> > -			fanout->entries[fanout->nr].preferred = 0;
> > +		fanout->entries[fanout->nr].preferred = 0;
> >  		fanout->nr++;
>
> Here, we have lost the ability to set the 'preferred' bit from the
> previous MIDX. Good.

Yep, we don't want to propagate any of these bits forward when reusing
an existing MIDX. Thinking on it more, I think this is the only
legitimate use of MIDX reuse in the "I'm about to write bitmaps"
context.

I mentioned before the idea that we could use `--stdin-packs` now when
writing a MIDX bitmap where before it wasn't implemented (likely due to
problems caused by this bug). But the whole premise doesn't make a ton
of sense:

  - Every pack that's in the include_packs list would need to be handled
    separately.

  - And every pack that *isn't* in that list would be skipped.

Which means that it wouldn't help at all to reuse an existing MIDX. The
reason that we'd need to handle all included packs separately is subtle
and a little different from what's going on here, though. The problem
there is that if you have two packs, say P1 and P2, and P1 is in the
include list but P2 is not, then any objects duplicated between the two
and selected from P2 will disappear when writing the new MIDX.

Since the set of packs that are going into the new MIDX are precisely
equal to the set of packs that we'd need to handle separately, it
probably makes sense to continue to avoid using the existing MIDX when
writing a bitmap with the `--stdin-packs` option.

> > @@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
> >  						    preferred, cur_fanout);
> >  		}
> >
> > +		if (-1 < preferred_pack && preferred_pack < start_pack)
> > +			midx_fanout_add_pack_fanout(&fanout, info,
> > +						    preferred_pack, 1,
> > +						    cur_fanout);
> > +
>
> And here, when there is a preferred pack _in the previous MIDX_,
> we add its objects a second time, but now with the preferred bit
> on. If the preferred pack is _not_ in the previous MIDX, then the
> 'preferred_pack < start_pack' condition will fail and the bits
> would have been set within the for loop.

Exactly!

> > @@ -346,7 +346,7 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' '
> >  		test_path_is_file $midx &&
> >  		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> >
> > -		test_must_fail git clone --no-local . clone2
> > +		git clone --no-local . clone2
>
> I mentioned in patch 1 that this test could use some comments about
> what is unexpected and what _is_ expected. I think this comment needs
> an update in this patch:
>
> 	# Generate a new MIDX which changes the preferred pack to a pack
> 	# contained in the existing MIDX, such that not all objects from
> 	# p2 that appear in the MIDX had their copy selected from p2.

Good eyes, thanks for spotting. I updated the comment below, too (which
doesn't exist in this version of the patch, but you suggested adding to
the first patch in this series).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] t5326: demonstrate potential bitmap corruption
  2022-08-22 17:57     ` Taylor Blau
@ 2022-08-22 19:31       ` Junio C Hamano
  2022-08-22 19:41         ` Taylor Blau
  0 siblings, 1 reply; 29+ messages in thread
From: Junio C Hamano @ 2022-08-22 19:31 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Derrick Stolee, git, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

Taylor Blau <me@ttaylorr.com> writes:

>> > +	git init repo &&
>> > +	test_when_finished "rm -fr repo" &&
>>
>> nit: test_when_finished should be the first line of the test.
>
> The "rm-then-init-then-test_when_finished" is an (unfortunate) pattern
> extended throughout t5326, mostly that some tests don't clean up "repo"
> after deleting and recreating it.

I do not think it is so bad to be defensive to "prepare it cleanly
enough so that I would not be affected".  So "rm -fr repo && git
init repo" I would fully support.  "init && test_when_finished" is
totally indefensible.  It should be the other way around.

> But it's easy enough to just use a separate repository, and avoid
> removing it altogether. Thanks for the suggestion!

Those who run tests in a batch without "-i" would have more material
to study and find breakages if you did so.  I agree that is probably
something worth doing (unless in narrow corner cases where each test
repository consumes unusual amount of storage or somethinglike that).


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 1/6] t5326: demonstrate potential bitmap corruption
  2022-08-22 19:31       ` Junio C Hamano
@ 2022-08-22 19:41         ` Taylor Blau
  0 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:41 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, git, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On Mon, Aug 22, 2022 at 12:31:44PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >> > +	git init repo &&
> >> > +	test_when_finished "rm -fr repo" &&
> >>
> >> nit: test_when_finished should be the first line of the test.
> >
> > The "rm-then-init-then-test_when_finished" is an (unfortunate) pattern
> > extended throughout t5326, mostly that some tests don't clean up "repo"
> > after deleting and recreating it.
>
> I do not think it is so bad to be defensive to "prepare it cleanly
> enough so that I would not be affected".  So "rm -fr repo && git
> init repo" I would fully support.  "init && test_when_finished" is
> totally indefensible.  It should be the other way around.

Agreed. We should fix those up, probably once Abhradeep's topic has
landed, since doing so now would create an avoidable amount of merge
conflicts.

> > But it's easy enough to just use a separate repository, and avoid
> > removing it altogether. Thanks for the suggestion!
>
> Those who run tests in a batch without "-i" would have more material
> to study and find breakages if you did so.  I agree that is probably
> something worth doing (unless in narrow corner cases where each test
> repository consumes unusual amount of storage or somethinglike that).

Agreed here, too. It's much handier to get a broken state immediately,
instead of realizing that a test was broken, commenting out the
"test_when_finished" bits, and rerunning it again (not to mention hoping
that the failure is deterministic).

Luckily these repositories are small in size, so it doesn't hurt to keep
them around.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX
  2022-08-22 17:04 ` [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee
@ 2022-08-22 19:44   ` Taylor Blau
  0 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:44 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: git, gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On Mon, Aug 22, 2022 at 01:04:29PM -0400, Derrick Stolee wrote:
> Thanks for putting this together. Definitely not an easy bug to find
> and fix.
>
> I mostly have nitpicks, but the overall structure is sound.

Thanks very much for reviewing (both to you and Abhradeep). I have a new
version that I'll send now, which incorporates your suggestions.

It also adds a new patch on top that takes a slight optimization, by
avoiding adding objects from the MIDX that show up in the (new)
preferred pack, since we know we'll discard them anyway.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX
  2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
                   ` (6 preceding siblings ...)
  2022-08-22 17:04 ` [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee
@ 2022-08-22 19:50 ` Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Taylor Blau
                     ` (7 more replies)
  7 siblings, 8 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

Here is a small reroll of my series that resolves a bug that was
reported[1] by Johannes, and investigated by him, Abhradeep, and Stolee
in that same sub-thread.

As before: the crux of the issue is that a MIDX bitmap can enter a
corrupt state when changing the preferred pack from its value in an
existing MIDX in certain circumstances as described in the first and
final patches.

This version incorporates some cosmetic changes suggested by Stolee, and
adds a new patch on top which avoids adding objects from the MIDX that
were represented by the (new) preferred pack, since we know we'll end up
discarding those objects anyways. For convenience, a range-diff against
v1 is included below.

Thanks again for your review!

[1]: https://lore.kernel.org/git/p3r70610-8n52-s8q0-n641-onp4ps01330n@tzk.qr/

Taylor Blau (7):
  t5326: demonstrate potential bitmap corruption
  t/lib-bitmap.sh: avoid silencing stderr
  midx.c: extract `struct midx_fanout`
  midx.c: extract `midx_fanout_add_midx_fanout()`
  midx.c: extract `midx_fanout_add_pack_fanout()`
  midx.c: include preferred pack correctly with existing MIDX
  midx.c: avoid adding preferred objects twice

 midx.c                        | 139 +++++++++++++++++++++++-----------
 t/lib-bitmap.sh               |   2 +-
 t/t5326-multi-pack-bitmaps.sh |  44 +++++++++++
 3 files changed, 139 insertions(+), 46 deletions(-)

Range-diff against v1:
1:  3e30ab1a19 ! 1:  6b38bfcd2c t5326: demonstrate potential bitmap corruption
    @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'graceful fallback when missi
      '
      
     +test_expect_success 'preferred pack change with existing MIDX bitmap' '
    -+	rm -fr repo &&
    -+	git init repo &&
    -+	test_when_finished "rm -fr repo" &&
    ++	git init preferred-pack-with-existing &&
     +	(
    -+		cd repo &&
    ++		cd preferred-pack-with-existing &&
     +
     +		test_commit base &&
     +		test_commit other &&
    @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'graceful fallback when missi
     +		p2="$(git pack-objects "$objdir/pack/pack" \
     +			--delta-base-offset <p2.objects)" &&
     +
    -+		# Generate a MIDX containing the first two packs, marking p1 as
    -+		# preferred, and ensure that it can be successfully cloned.
    ++		# Generate a MIDX containing the first two packs,
    ++		# marking p1 as preferred, and ensure that it can be
    ++		# successfully cloned.
     +		git multi-pack-index write --bitmap \
     +			--preferred-pack="pack-$p1.pack" &&
     +		test_path_is_file $midx &&
     +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     +		git clone --no-local . clone1 &&
     +
    -+		# Then generate a new pack which sorts ahead of any existing
    -+		# pack (by tweaking the pack prefix).
    ++		# Then generate a new pack which sorts ahead of any
    ++		# existing pack (by tweaking the pack prefix).
     +		test_commit foo &&
     +		git pack-objects --all --unpacked $objdir/pack/pack0 &&
     +
    -+		# Generate a new MIDX which changes the preferred pack to a pack
    -+		# contained in the existing MIDX, such that not all objects from
    -+		# p2 that appear in the MIDX had their copy selected from p2.
    ++		# Generate a new MIDX which changes the preferred pack
    ++		# to a pack contained in the existing MIDX, such that
    ++		# not all objects from p2 that appear in the MIDX had
    ++		# their copy selected from p2.
     +		git multi-pack-index write --bitmap \
     +			--preferred-pack="pack-$p2.pack" &&
     +		test_path_is_file $midx &&
     +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
     +
    ++		# When the above circumstances are met, an existing bug
    ++		# in the MIDX machinery will cause the reverse index to
    ++		# be read incorrectly, resulting in failed clones (among
    ++		# other things).
     +		test_must_fail git clone --no-local . clone2
     +	)
     +'
2:  053045db14 = 2:  d6648ed88f t/lib-bitmap.sh: avoid silencing stderr
3:  2df8f1e884 = 3:  ae2077acb7 midx.c: extract `struct midx_fanout`
4:  92b82c83ea = 4:  2351a9fc27 midx.c: extract `midx_fanout_add_midx_fanout()`
5:  db1c6ea8e5 = 5:  845e1484b4 midx.c: extract `midx_fanout_add_pack_fanout()`
6:  4ddddc959b ! 6:  d301c4d87f midx.c: include preferred pack correctly with existing MIDX
    @@ Commit message
         from a single source. (We could still mark these preferred bits, but
         doing so is redundant and unnecessary).
     
    -    This resolves the bug described in the first patch of this series.
    +    This resolves the bug demonstrated by t5326.174 ("preferred pack change
    +    with existing MIDX bitmap").
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ midx.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pack_inde
     
      ## t/t5326-multi-pack-bitmaps.sh ##
     @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'preferred pack change with existing MIDX bitmap' '
    + 		git pack-objects --all --unpacked $objdir/pack/pack0 &&
    + 
    + 		# Generate a new MIDX which changes the preferred pack
    +-		# to a pack contained in the existing MIDX, such that
    +-		# not all objects from p2 that appear in the MIDX had
    +-		# their copy selected from p2.
    ++		# to a pack contained in the existing MIDX.
    + 		git multi-pack-index write --bitmap \
    + 			--preferred-pack="pack-$p2.pack" &&
      		test_path_is_file $midx &&
      		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
      
    +-		# When the above circumstances are met, an existing bug
    +-		# in the MIDX machinery will cause the reverse index to
    +-		# be read incorrectly, resulting in failed clones (among
    +-		# other things).
     -		test_must_fail git clone --no-local . clone2
    ++		# When the above circumstances are met, the preferred
    ++		# pack should change appropriately and clones should
    ++		# (still) succeed.
     +		git clone --no-local . clone2
      	)
      '
-:  ---------- > 7:  887ab9485f midx.c: avoid adding preferred objects twice
-- 
2.37.0.1.g1379af2e9d

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
@ 2022-08-22 19:50   ` Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

It is possible to generate a corrupt MIDX bitmap when certain conditions
are met. This happens when the preferred pack "P" changes to one (say,
"Q") that:

  - "Q" has objects included in an existing MIDX,
  - but "Q" is different than "P",
  - and "Q" and "P" have some objects in common

When this is the case, not all objects from "Q" will be selected from
"Q" (ie., the generated MIDX will represent them as coming from a
different pack), despite "Q" being preferred.

This is an invariant violation, since all objects contained in the
MIDX's preferred pack are supposed to originate from the preferred pack.
In other words, all duplicate objects are resolved in favor of the copy
that comes from the MIDX's preferred pack, if any.

This violation results in a corrupt object order, which cannot be
interpreted by the pack-bitmap code, leading to broken clones and other
defects.

This test demonstrates the above problem by constructing a minimal
reproduction, and showing that the final `git clone` invocation fails.

The reproduction is mostly straightforward, except that the new pack
generated between MIDX writes (which is necessary in order to prevent
that operation from being a noop) must sort ahead of all existing packs
in order to prevent a different pack (neither "P" nor "Q") from
appearing as preferred (meaning all its objects appear in order at the
beginning of the pseudo-pack order).

Subsequent commits will first refactor the midx.c::get_sorted_entries()
function, and then fix this bug.

Reported-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 47 +++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4fe57414c1..c364677ae8 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -307,4 +307,51 @@ test_expect_success 'graceful fallback when missing reverse index' '
 	)
 '
 
+test_expect_success 'preferred pack change with existing MIDX bitmap' '
+	git init preferred-pack-with-existing &&
+	(
+		cd preferred-pack-with-existing &&
+
+		test_commit base &&
+		test_commit other &&
+
+		git rev-list --objects --no-object-names base >p1.objects &&
+		git rev-list --objects --no-object-names other >p2.objects &&
+
+		p1="$(git pack-objects "$objdir/pack/pack" \
+			--delta-base-offset <p1.objects)" &&
+		p2="$(git pack-objects "$objdir/pack/pack" \
+			--delta-base-offset <p2.objects)" &&
+
+		# Generate a MIDX containing the first two packs,
+		# marking p1 as preferred, and ensure that it can be
+		# successfully cloned.
+		git multi-pack-index write --bitmap \
+			--preferred-pack="pack-$p1.pack" &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		git clone --no-local . clone1 &&
+
+		# Then generate a new pack which sorts ahead of any
+		# existing pack (by tweaking the pack prefix).
+		test_commit foo &&
+		git pack-objects --all --unpacked $objdir/pack/pack0 &&
+
+		# Generate a new MIDX which changes the preferred pack
+		# to a pack contained in the existing MIDX, such that
+		# not all objects from p2 that appear in the MIDX had
+		# their copy selected from p2.
+		git multi-pack-index write --bitmap \
+			--preferred-pack="pack-$p2.pack" &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		# When the above circumstances are met, an existing bug
+		# in the MIDX machinery will cause the reverse index to
+		# be read incorrectly, resulting in failed clones (among
+		# other things).
+		test_must_fail git clone --no-local . clone2
+	)
+'
+
 test_done
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Taylor Blau
@ 2022-08-22 19:50   ` Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 3/7] midx.c: extract `struct midx_fanout` Taylor Blau
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

The midx_bitmap_partial_tests() function is responsible for setting up a
state where some (but not all) packs in the repository are covered by a
MIDX (and bitmap).

This function has redirected the `git multi-pack-index write --bitmap`'s
stderr to a file "err" since its introduction back in c51f5a6437 (t5326:
test multi-pack bitmap behavior, 2021-08-31).

This was likely a stray change left over from a slightly different
version of this test, since the file "err" is never read after being
written. This leads to confusingly-missing output, especially when the
contents of stderr are important.

Resolve this confusion by avoiding silencing stderr in this case.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index a95537e759..f595937094 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -440,7 +440,7 @@ midx_bitmap_partial_tests () {
 		test_commit packed &&
 		git repack &&
 		test_commit loose &&
-		git multi-pack-index write --bitmap 2>err &&
+		git multi-pack-index write --bitmap &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 	'
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 3/7] midx.c: extract `struct midx_fanout`
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau
@ 2022-08-22 19:50   ` Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

To build up a list of objects (along with their packs, and the offsets
within those packs that each object appears at), the MIDX code
implements `get_sorted_entries()` which builds up a list of candidates,
sorts them, and then removes duplicate entries.

To do this, it keeps an array of `pack_midx_entry` structures that it
builds up once for each fanout level (ie., for all possible values of
the first byte of each object's ID).

This array is a function-local variable of `get_sorted_entries()`. Since
it uses the ALLOC_GROW() macro, having the `alloc_fanout` variable also
be local to that function, and only modified within that function is
convenient.

However, subsequent changes will extract the two ways this array is
filled (from a pack at some fanout value, and from an existing MIDX at
some fanout value) into separate functions. Instead of passing around
pointers to the entries array, along with `nr_fanout` and
`alloc_fanout`, encapsulate these three into a structure instead. Then
pass around a pointer to this structure instead.

This patch does not yet extract the above two functions, but sets us up
to begin doing so in the following commit. For now, the implementation
of get_sorted_entries() is only modified to replace `entries_by_fanout`
with `fanout.entries`, `nr_fanout` with `fanout.nr`, and so on.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 54 +++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 19 deletions(-)

diff --git a/midx.c b/midx.c
index 4e956cacb7..cdb6c481c7 100644
--- a/midx.c
+++ b/midx.c
@@ -577,6 +577,22 @@ static void fill_pack_entry(uint32_t pack_int_id,
 	entry->preferred = !!preferred;
 }
 
+struct midx_fanout {
+	struct pack_midx_entry *entries;
+	uint32_t nr;
+	uint32_t alloc;
+};
+
+static void midx_fanout_grow(struct midx_fanout *fanout, uint32_t nr)
+{
+	ALLOC_GROW(fanout->entries, nr, fanout->alloc);
+}
+
+static void midx_fanout_sort(struct midx_fanout *fanout)
+{
+	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -595,8 +611,8 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						  int preferred_pack)
 {
 	uint32_t cur_fanout, cur_pack, cur_object;
-	uint32_t alloc_fanout, alloc_objects, total_objects = 0;
-	struct pack_midx_entry *entries_by_fanout = NULL;
+	uint32_t alloc_objects, total_objects = 0;
+	struct midx_fanout fanout = { 0 };
 	struct pack_midx_entry *deduplicated_entries = NULL;
 	uint32_t start_pack = m ? m->num_packs : 0;
 
@@ -608,14 +624,14 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 	 * slices to be evenly distributed, with some noise. Hence,
 	 * allocate slightly more than one 256th.
 	 */
-	alloc_objects = alloc_fanout = total_objects > 3200 ? total_objects / 200 : 16;
+	alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16;
 
-	ALLOC_ARRAY(entries_by_fanout, alloc_fanout);
+	ALLOC_ARRAY(fanout.entries, fanout.alloc);
 	ALLOC_ARRAY(deduplicated_entries, alloc_objects);
 	*nr_objects = 0;
 
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
-		uint32_t nr_fanout = 0;
+		fanout.nr = 0;
 
 		if (m) {
 			uint32_t start = 0, end;
@@ -625,15 +641,15 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 			end = ntohl(m->chunk_oid_fanout[cur_fanout]);
 
 			for (cur_object = start; cur_object < end; cur_object++) {
-				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
+				midx_fanout_grow(&fanout, fanout.nr + 1);
 				nth_midxed_pack_midx_entry(m,
-							   &entries_by_fanout[nr_fanout],
+							   &fanout.entries[fanout.nr],
 							   cur_object);
 				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
-					entries_by_fanout[nr_fanout].preferred = 1;
+					fanout.entries[fanout.nr].preferred = 1;
 				else
-					entries_by_fanout[nr_fanout].preferred = 0;
-				nr_fanout++;
+					fanout.entries[fanout.nr].preferred = 0;
+				fanout.nr++;
 			}
 		}
 
@@ -646,36 +662,36 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 			end = get_pack_fanout(info[cur_pack].p, cur_fanout);
 
 			for (cur_object = start; cur_object < end; cur_object++) {
-				ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout);
+				midx_fanout_grow(&fanout, fanout.nr + 1);
 				fill_pack_entry(cur_pack,
 						info[cur_pack].p,
 						cur_object,
-						&entries_by_fanout[nr_fanout],
+						&fanout.entries[fanout.nr],
 						preferred);
-				nr_fanout++;
+				fanout.nr++;
 			}
 		}
 
-		QSORT(entries_by_fanout, nr_fanout, midx_oid_compare);
+		midx_fanout_sort(&fanout);
 
 		/*
 		 * The batch is now sorted by OID and then mtime (descending).
 		 * Take only the first duplicate.
 		 */
-		for (cur_object = 0; cur_object < nr_fanout; cur_object++) {
-			if (cur_object && oideq(&entries_by_fanout[cur_object - 1].oid,
-						&entries_by_fanout[cur_object].oid))
+		for (cur_object = 0; cur_object < fanout.nr; cur_object++) {
+			if (cur_object && oideq(&fanout.entries[cur_object - 1].oid,
+						&fanout.entries[cur_object].oid))
 				continue;
 
 			ALLOC_GROW(deduplicated_entries, *nr_objects + 1, alloc_objects);
 			memcpy(&deduplicated_entries[*nr_objects],
-			       &entries_by_fanout[cur_object],
+			       &fanout.entries[cur_object],
 			       sizeof(struct pack_midx_entry));
 			(*nr_objects)++;
 		}
 	}
 
-	free(entries_by_fanout);
+	free(fanout.entries);
 	return deduplicated_entries;
 }
 
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()`
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
                     ` (2 preceding siblings ...)
  2022-08-22 19:50   ` [PATCH v2 3/7] midx.c: extract `struct midx_fanout` Taylor Blau
@ 2022-08-22 19:50   ` Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

Extract a routine to add all objects whose object ID's first byte is
`cur_fanout` from an existing MIDX.

This function will only be called once, so extracting it is purely
cosmetic to improve the readability of `get_sorted_entries()` (its sole
caller) below.

The functionality is unchanged in this commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 47 ++++++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/midx.c b/midx.c
index cdb6c481c7..0d40089c4d 100644
--- a/midx.c
+++ b/midx.c
@@ -593,6 +593,31 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
 	QSORT(fanout->entries, fanout->nr, midx_oid_compare);
 }
 
+static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
+					struct multi_pack_index *m,
+					int preferred_pack,
+					uint32_t cur_fanout)
+{
+	uint32_t start = 0, end;
+	uint32_t cur_object;
+
+	if (cur_fanout)
+		start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
+	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
+
+	for (cur_object = start; cur_object < end; cur_object++) {
+		midx_fanout_grow(fanout, fanout->nr + 1);
+		nth_midxed_pack_midx_entry(m,
+					   &fanout->entries[fanout->nr],
+					   cur_object);
+		if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+			fanout->entries[fanout->nr].preferred = 1;
+		else
+			fanout->entries[fanout->nr].preferred = 0;
+		fanout->nr++;
+	}
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -633,25 +658,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 	for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) {
 		fanout.nr = 0;
 
-		if (m) {
-			uint32_t start = 0, end;
-
-			if (cur_fanout)
-				start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
-			end = ntohl(m->chunk_oid_fanout[cur_fanout]);
-
-			for (cur_object = start; cur_object < end; cur_object++) {
-				midx_fanout_grow(&fanout, fanout.nr + 1);
-				nth_midxed_pack_midx_entry(m,
-							   &fanout.entries[fanout.nr],
-							   cur_object);
-				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
-					fanout.entries[fanout.nr].preferred = 1;
-				else
-					fanout.entries[fanout.nr].preferred = 0;
-				fanout.nr++;
-			}
-		}
+		if (m)
+			midx_fanout_add_midx_fanout(&fanout, m, preferred_pack,
+						    cur_fanout);
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			uint32_t start = 0, end;
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()`
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
                     ` (3 preceding siblings ...)
  2022-08-22 19:50   ` [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau
@ 2022-08-22 19:50   ` Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX Taylor Blau
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

Extract a routine to add all objects whose object ID's first byte is
`cur_fanout` from a given pack (identified by its index into the `struct
pack_info` array maintained by the MIDX writing routine).

Unlike the previous extraction (for `midx_fanout_add_midx_fanout()`),
this function will be called twice, once for all new packs, and again
for the preferred pack (if it appears in an existing MIDX). The latter
change is to resolve the bug described a few patches ago, and will be
made in the subsequent commit.

Similar to the previous refactoring, this function also enhances the
readability of its caller in `get_sorted_entries()`.

Its functionality is unchanged in this commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 43 ++++++++++++++++++++++++++++---------------
 1 file changed, 28 insertions(+), 15 deletions(-)

diff --git a/midx.c b/midx.c
index 0d40089c4d..be8186eec2 100644
--- a/midx.c
+++ b/midx.c
@@ -618,6 +618,31 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 	}
 }
 
+static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout,
+					struct pack_info *info,
+					uint32_t cur_pack,
+					int preferred,
+					uint32_t cur_fanout)
+{
+	struct packed_git *pack = info[cur_pack].p;
+	uint32_t start = 0, end;
+	uint32_t cur_object;
+
+	if (cur_fanout)
+		start = get_pack_fanout(pack, cur_fanout - 1);
+	end = get_pack_fanout(pack, cur_fanout);
+
+	for (cur_object = start; cur_object < end; cur_object++) {
+		midx_fanout_grow(fanout, fanout->nr + 1);
+		fill_pack_entry(cur_pack,
+				info[cur_pack].p,
+				cur_object,
+				&fanout->entries[fanout->nr],
+				preferred);
+		fanout->nr++;
+	}
+}
+
 /*
  * It is possible to artificially get into a state where there are many
  * duplicate copies of objects. That can create high memory pressure if
@@ -663,22 +688,10 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						    cur_fanout);
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
-			uint32_t start = 0, end;
 			int preferred = cur_pack == preferred_pack;
-
-			if (cur_fanout)
-				start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1);
-			end = get_pack_fanout(info[cur_pack].p, cur_fanout);
-
-			for (cur_object = start; cur_object < end; cur_object++) {
-				midx_fanout_grow(&fanout, fanout.nr + 1);
-				fill_pack_entry(cur_pack,
-						info[cur_pack].p,
-						cur_object,
-						&fanout.entries[fanout.nr],
-						preferred);
-				fanout.nr++;
-			}
+			midx_fanout_add_pack_fanout(&fanout,
+						    info, cur_pack,
+						    preferred, cur_fanout);
 		}
 
 		midx_fanout_sort(&fanout);
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
                     ` (4 preceding siblings ...)
  2022-08-22 19:50   ` [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau
@ 2022-08-22 19:50   ` Taylor Blau
  2022-08-22 19:50   ` [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Taylor Blau
  2022-08-23 16:23   ` [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee
  7 siblings, 0 replies; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

This patch resolves an issue where the object order used to generate a
MIDX bitmap would violate an invariant that all of the preferred pack's
objects are represented by that pack in the MIDX.

The problem arises when reusing an existing MIDX while generating a new
one, and occurs specifically when the identity of the preferred pack
changes from one MIDX to another, along with a few other conditions:

    - the new preferred pack must also be present in the existing MIDX

    - the new preferred pack must *not* have been the preferred pack in
      the existing MIDX

    - most importantly, there must be at least one object present in the
      physical preferred pack (ie., it shows up in that pack's index)
      but was selected from a *different* pack when the previous MIDX
      was generated

When the above conditions are all met, we end up (incorrectly)
discarding copies of some objects in the pack selected as the preferred
pack. This is because `get_sorted_entries()` adds objects to its list
by doing the following at each fanout level:

    - first, adding all objects from that fanout level from an existing
      MIDX

    - then, adding all objects from that fanout level in each pack *not*
      included in the existing MIDX

So if some object was not selected from the to-be-preferred pack when
writing the previous MIDX, then we will never consider it as a candidate
when generating the new MIDX. This means that it's possible for the
preferred pack to not include all of its objects in the MIDX's
pseudo-pack object order, which is an invariant violation of that order.

Resolve this by adding all objects from the preferred pack separately
when it appears in the existing MIDX (if one was present). This will
duplicate objects from that pack that *did* appear in the MIDX, but this
is fine, since get_sorted_entries() already handles duplicates. (A
future optimization in this area could avoid adding copies of objects
that we know already existing in the MIDX.)

Note that we no longer need to compute the preferred-ness of objects
added from the MIDX, since we only want to select the preferred objects
from a single source. (We could still mark these preferred bits, but
doing so is redundant and unnecessary).

This resolves the bug demonstrated by t5326.174 ("preferred pack change
with existing MIDX bitmap").

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                        | 14 +++++++-------
 t/t5326-multi-pack-bitmaps.sh | 13 +++++--------
 2 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/midx.c b/midx.c
index be8186eec2..bd1d27090e 100644
--- a/midx.c
+++ b/midx.c
@@ -595,7 +595,6 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
 
 static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 					struct multi_pack_index *m,
-					int preferred_pack,
 					uint32_t cur_fanout)
 {
 	uint32_t start = 0, end;
@@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 		nth_midxed_pack_midx_entry(m,
 					   &fanout->entries[fanout->nr],
 					   cur_object);
-		if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
-			fanout->entries[fanout->nr].preferred = 1;
-		else
-			fanout->entries[fanout->nr].preferred = 0;
+		fanout->entries[fanout->nr].preferred = 0;
 		fanout->nr++;
 	}
 }
@@ -684,8 +680,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 		fanout.nr = 0;
 
 		if (m)
-			midx_fanout_add_midx_fanout(&fanout, m, preferred_pack,
-						    cur_fanout);
+			midx_fanout_add_midx_fanout(&fanout, m, cur_fanout);
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			int preferred = cur_pack == preferred_pack;
@@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 						    preferred, cur_fanout);
 		}
 
+		if (-1 < preferred_pack && preferred_pack < start_pack)
+			midx_fanout_add_pack_fanout(&fanout, info,
+						    preferred_pack, 1,
+						    cur_fanout);
+
 		midx_fanout_sort(&fanout);
 
 		/*
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index c364677ae8..89ecd1062c 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -338,19 +338,16 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' '
 		git pack-objects --all --unpacked $objdir/pack/pack0 &&
 
 		# Generate a new MIDX which changes the preferred pack
-		# to a pack contained in the existing MIDX, such that
-		# not all objects from p2 that appear in the MIDX had
-		# their copy selected from p2.
+		# to a pack contained in the existing MIDX.
 		git multi-pack-index write --bitmap \
 			--preferred-pack="pack-$p2.pack" &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
 
-		# When the above circumstances are met, an existing bug
-		# in the MIDX machinery will cause the reverse index to
-		# be read incorrectly, resulting in failed clones (among
-		# other things).
-		test_must_fail git clone --no-local . clone2
+		# When the above circumstances are met, the preferred
+		# pack should change appropriately and clones should
+		# (still) succeed.
+		git clone --no-local . clone2
 	)
 '
 
-- 
2.37.0.1.g1379af2e9d


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 7/7] midx.c: avoid adding preferred objects twice
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
                     ` (5 preceding siblings ...)
  2022-08-22 19:50   ` [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX Taylor Blau
@ 2022-08-22 19:50   ` Taylor Blau
  2022-08-23 16:22     ` Derrick Stolee
  2022-08-23 16:23   ` [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee
  7 siblings, 1 reply; 29+ messages in thread
From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw)
  To: git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	derrickstolee, jonathantanmy, kaartic.sivaraam

The last commit changes the behavior of midx.c's `get_sorted_objects()`
function to handle the case of writing a MIDX bitmap while reusing an
existing MIDX and changing the identity of the preferred pack
separately.

As part of this change, all objects from the (new) preferred pack are
added to the fanout table in a separate pass. Since these copies of the
objects all have their preferred bits set, any duplicates will be
resolved in their favor.

Importantly, this includes any copies of those same objects that come
from the existing MIDX. We know at the time of adding them that they'll
be redundant if their source pack is the (new) preferred one, so we can
avoid adding them to the list in this case.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index bd1d27090e..148ecc2f14 100644
--- a/midx.c
+++ b/midx.c
@@ -595,7 +595,8 @@ static void midx_fanout_sort(struct midx_fanout *fanout)
 
 static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 					struct multi_pack_index *m,
-					uint32_t cur_fanout)
+					uint32_t cur_fanout,
+					int preferred_pack)
 {
 	uint32_t start = 0, end;
 	uint32_t cur_object;
@@ -605,6 +606,15 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
 	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
 
 	for (cur_object = start; cur_object < end; cur_object++) {
+		if ((preferred_pack > -1) &&
+		    (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {
+			/*
+			 * Objects from preferred packs are added
+			 * separately.
+			 */
+			continue;
+		}
+
 		midx_fanout_grow(fanout, fanout->nr + 1);
 		nth_midxed_pack_midx_entry(m,
 					   &fanout->entries[fanout->nr],
@@ -680,7 +690,8 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 		fanout.nr = 0;
 
 		if (m)
-			midx_fanout_add_midx_fanout(&fanout, m, cur_fanout);
+			midx_fanout_add_midx_fanout(&fanout, m, cur_fanout,
+						    preferred_pack);
 
 		for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) {
 			int preferred = cur_pack == preferred_pack;
-- 
2.37.0.1.g1379af2e9d

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 7/7] midx.c: avoid adding preferred objects twice
  2022-08-22 19:50   ` [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Taylor Blau
@ 2022-08-23 16:22     ` Derrick Stolee
  0 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2022-08-23 16:22 UTC (permalink / raw)
  To: Taylor Blau, git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On 8/22/2022 3:50 PM, Taylor Blau wrote:
> The last commit changes the behavior of midx.c's `get_sorted_objects()`
> function to handle the case of writing a MIDX bitmap while reusing an
> existing MIDX and changing the identity of the preferred pack
> separately.
> 
> As part of this change, all objects from the (new) preferred pack are
> added to the fanout table in a separate pass. Since these copies of the
> objects all have their preferred bits set, any duplicates will be
> resolved in their favor.
> 
> Importantly, this includes any copies of those same objects that come
> from the existing MIDX. We know at the time of adding them that they'll
> be redundant if their source pack is the (new) preferred one, so we can
> avoid adding them to the list in this case.

Good call to reduce memory requirements.

> @@ -605,6 +606,15 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout,
>  	end = ntohl(m->chunk_oid_fanout[cur_fanout]);
>  
>  	for (cur_object = start; cur_object < end; cur_object++) {
> +		if ((preferred_pack > -1) &&
> +		    (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) {

nit: you don't need the extra parentheses here.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX
  2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
                     ` (6 preceding siblings ...)
  2022-08-22 19:50   ` [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Taylor Blau
@ 2022-08-23 16:23   ` Derrick Stolee
  7 siblings, 0 replies; 29+ messages in thread
From: Derrick Stolee @ 2022-08-23 16:23 UTC (permalink / raw)
  To: Taylor Blau, git
  Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79,
	jonathantanmy, kaartic.sivaraam

On 8/22/2022 3:50 PM, Taylor Blau wrote:
> Here is a small reroll of my series that resolves a bug that was
> reported[1] by Johannes, and investigated by him, Abhradeep, and Stolee
> in that same sub-thread.
> 
> As before: the crux of the issue is that a MIDX bitmap can enter a
> corrupt state when changing the preferred pack from its value in an
> existing MIDX in certain circumstances as described in the first and
> final patches.
> 
> This version incorporates some cosmetic changes suggested by Stolee, and
> adds a new patch on top which avoids adding objects from the MIDX that
> were represented by the (new) preferred pack, since we know we'll end up
> discarding those objects anyways. For convenience, a range-diff against
> v1 is included below.

You resolved the comments from the previous version well. I'm happy
with the changes.

I have one style nit in the new patch, but it doesn't merit a re-roll
on its own.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2022-08-23 18:12 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau
2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau
2022-08-22 16:09   ` Derrick Stolee
2022-08-22 17:57     ` Taylor Blau
2022-08-22 19:31       ` Junio C Hamano
2022-08-22 19:41         ` Taylor Blau
2022-08-19 21:30 ` [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau
2022-08-20 16:44   ` Abhradeep Chakraborty
2022-08-22 17:58     ` Taylor Blau
2022-08-19 21:30 ` [PATCH 3/6] midx.c: extract `struct midx_fanout` Taylor Blau
2022-08-19 21:30 ` [PATCH 4/6] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau
2022-08-19 21:30 ` [PATCH 5/6] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau
2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau
2022-08-20 18:40   ` Abhradeep Chakraborty
2022-08-22 18:08     ` Taylor Blau
2022-08-22 17:03   ` Derrick Stolee
2022-08-22 18:14     ` Taylor Blau
2022-08-22 17:04 ` [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee
2022-08-22 19:44   ` Taylor Blau
2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau
2022-08-22 19:50   ` [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Taylor Blau
2022-08-22 19:50   ` [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau
2022-08-22 19:50   ` [PATCH v2 3/7] midx.c: extract `struct midx_fanout` Taylor Blau
2022-08-22 19:50   ` [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau
2022-08-22 19:50   ` [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau
2022-08-22 19:50   ` [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX Taylor Blau
2022-08-22 19:50   ` [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Taylor Blau
2022-08-23 16:22     ` Derrick Stolee
2022-08-23 16:23   ` [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).