* [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX @ 2022-08-19 21:30 Taylor Blau 2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau ` (7 more replies) 0 siblings, 8 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam This series resolves a bug that was reported[1] by Johannes, and investigated by him, Abhradeep, and Stolee in that same sub-thread. The crux of the issue is that a MIDX bitmap can enter a corrupt state when changing the preferred pack from its value in an existing MIDX in certain circumstances as described in the first and final patches. This series is structured as follows: - The first patch of this series adds a test which demonstrates the problem. (This is an improvement from the debugging in [1], where we only noticed the problem racily in an existing test, and only in SHA-256 mode). - The next small handful of patches refactor midx.c's `get_sorted_entries()` function to make fixing this bug more straightforward. - The final patch resolves the bug and updates the test to no longer expect failure. A couple of meta-notes: - This bug has existed since the introduction of MIDX bitmaps, but probably wasn't noticed until now since it is only triggerable when the `--stdin-packs` mode *isn't* passed, so it never occurs when invoked via `git repack`. - We could likely change the behavior of 56d863e979 (midx: expose `write_midx_file_only()` publicly, 2021-09-28), which explicitly disables reusing the existing MIDX (by avoiding loading it altogether) when `--stdin-packs` is given. The rationale in the comment added by 56d863e979 is somewhat unclear, but I have a vague recollection of running into a bug that was squashed by avoiding reusing an existing MIDX to write one with bitmaps. This was likely that bug. Thanks in advance for your review, and thanks to Johannes, Abhradeep, and Stolee for investigating this bug while I was on vacation. [1]: https://lore.kernel.org/git/p3r70610-8n52-s8q0-n641-onp4ps01330n@tzk.qr/ Taylor Blau (6): t5326: demonstrate potential bitmap corruption t/lib-bitmap.sh: avoid silencing stderr midx.c: extract `struct midx_fanout` midx.c: extract `midx_fanout_add_midx_fanout()` midx.c: extract `midx_fanout_add_pack_fanout()` midx.c: include preferred pack correctly with existing MIDX midx.c | 128 ++++++++++++++++++++++------------ t/lib-bitmap.sh | 2 +- t/t5326-multi-pack-bitmaps.sh | 43 ++++++++++++ 3 files changed, 127 insertions(+), 46 deletions(-) -- 2.37.0.1.g1379af2e9d ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 1/6] t5326: demonstrate potential bitmap corruption 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau @ 2022-08-19 21:30 ` Taylor Blau 2022-08-22 16:09 ` Derrick Stolee 2022-08-19 21:30 ` [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau ` (6 subsequent siblings) 7 siblings, 1 reply; 29+ messages in thread From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam It is possible to generate a corrupt MIDX bitmap when certain conditions are met. This happens when the preferred pack "P" changes to one (say, "Q") that: - "Q" has objects included in an existing MIDX, - but "Q" is different than "P", - and "Q" and "P" have some objects in common When this is the case, not all objects from "Q" will be selected from "Q" (ie., the generated MIDX will represent them as coming from a different pack), despite "Q" being preferred. This is an invariant violation, since all objects contained in the MIDX's preferred pack are supposed to originate from the preferred pack. In other words, all duplicate objects are resolved in favor of the copy that comes from the MIDX's preferred pack, if any. This violation results in a corrupt object order, which cannot be interpreted by the pack-bitmap code, leading to broken clones and other defects. This test demonstrates the above problem by constructing a minimal reproduction, and showing that the final `git clone` invocation fails. The reproduction is mostly straightforward, except that the new pack generated between MIDX writes (which is necessary in order to prevent that operation from being a noop) must sort ahead of all existing packs in order to prevent a different pack (neither "P" nor "Q") from appearing as preferred (meaning all its objects appear in order at the beginning of the pseudo-pack order). Subsequent commits will first refactor the midx.c::get_sorted_entries() function, and then fix this bug. Reported-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/t5326-multi-pack-bitmaps.sh | 43 +++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index 4fe57414c1..a60cec5cab 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -307,4 +307,47 @@ test_expect_success 'graceful fallback when missing reverse index' ' ) ' +test_expect_success 'preferred pack change with existing MIDX bitmap' ' + rm -fr repo && + git init repo && + test_when_finished "rm -fr repo" && + ( + cd repo && + + test_commit base && + test_commit other && + + git rev-list --objects --no-object-names base >p1.objects && + git rev-list --objects --no-object-names other >p2.objects && + + p1="$(git pack-objects "$objdir/pack/pack" \ + --delta-base-offset <p1.objects)" && + p2="$(git pack-objects "$objdir/pack/pack" \ + --delta-base-offset <p2.objects)" && + + # Generate a MIDX containing the first two packs, marking p1 as + # preferred, and ensure that it can be successfully cloned. + git multi-pack-index write --bitmap \ + --preferred-pack="pack-$p1.pack" && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + git clone --no-local . clone1 && + + # Then generate a new pack which sorts ahead of any existing + # pack (by tweaking the pack prefix). + test_commit foo && + git pack-objects --all --unpacked $objdir/pack/pack0 && + + # Generate a new MIDX which changes the preferred pack to a pack + # contained in the existing MIDX, such that not all objects from + # p2 that appear in the MIDX had their copy selected from p2. + git multi-pack-index write --bitmap \ + --preferred-pack="pack-$p2.pack" && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + + test_must_fail git clone --no-local . clone2 + ) +' + test_done -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 1/6] t5326: demonstrate potential bitmap corruption 2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau @ 2022-08-22 16:09 ` Derrick Stolee 2022-08-22 17:57 ` Taylor Blau 0 siblings, 1 reply; 29+ messages in thread From: Derrick Stolee @ 2022-08-22 16:09 UTC (permalink / raw) To: Taylor Blau, git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On 8/19/2022 5:30 PM, Taylor Blau wrote: > +test_expect_success 'preferred pack change with existing MIDX bitmap' ' > + rm -fr repo && Does a previous test not delete 'repo' when necessary? Or, do previous tests re-use 'repo' and now we are in a region where we can safely clear that directory? Should we use a different name? > + git init repo && > + test_when_finished "rm -fr repo" && nit: test_when_finished should be the first line of the test. > + # Generate a new MIDX which changes the preferred pack to a pack > + # contained in the existing MIDX, such that not all objects from > + # p2 that appear in the MIDX had their copy selected from p2. > + git multi-pack-index write --bitmap \ > + --preferred-pack="pack-$p2.pack" && > + test_path_is_file $midx && > + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && > + > + test_must_fail git clone --no-local . clone2 This section is demonstrating the bug. Perhaps we should have comments indicating that this is not desired behavior, but is being demonstrated so the bug can be fixed by a later change? Thanks, -Stolee ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/6] t5326: demonstrate potential bitmap corruption 2022-08-22 16:09 ` Derrick Stolee @ 2022-08-22 17:57 ` Taylor Blau 2022-08-22 19:31 ` Junio C Hamano 0 siblings, 1 reply; 29+ messages in thread From: Taylor Blau @ 2022-08-22 17:57 UTC (permalink / raw) To: Derrick Stolee Cc: git, gitster, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On Mon, Aug 22, 2022 at 12:09:55PM -0400, Derrick Stolee wrote: > On 8/19/2022 5:30 PM, Taylor Blau wrote: > > > +test_expect_success 'preferred pack change with existing MIDX bitmap' ' > > + rm -fr repo && > > Does a previous test not delete 'repo' when necessary? > > Or, do previous tests re-use 'repo' and now we are in a region > where we can safely clear that directory? Should we use a > different name? > > > + git init repo && > > + test_when_finished "rm -fr repo" && > > nit: test_when_finished should be the first line of the test. The "rm-then-init-then-test_when_finished" is an (unfortunate) pattern extended throughout t5326, mostly that some tests don't clean up "repo" after deleting and recreating it. But it's easy enough to just use a separate repository, and avoid removing it altogether. Thanks for the suggestion! > > + # Generate a new MIDX which changes the preferred pack to a pack > > + # contained in the existing MIDX, such that not all objects from > > + # p2 that appear in the MIDX had their copy selected from p2. > > + git multi-pack-index write --bitmap \ > > + --preferred-pack="pack-$p2.pack" && > > + test_path_is_file $midx && > > + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && > > + > > + test_must_fail git clone --no-local . clone2 > > This section is demonstrating the bug. Perhaps we should have > comments indicating that this is not desired behavior, but is > being demonstrated so the bug can be fixed by a later change? Yeah, good call. Thanks again! Thanks, Taylor ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/6] t5326: demonstrate potential bitmap corruption 2022-08-22 17:57 ` Taylor Blau @ 2022-08-22 19:31 ` Junio C Hamano 2022-08-22 19:41 ` Taylor Blau 0 siblings, 1 reply; 29+ messages in thread From: Junio C Hamano @ 2022-08-22 19:31 UTC (permalink / raw) To: Taylor Blau Cc: Derrick Stolee, git, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam Taylor Blau <me@ttaylorr.com> writes: >> > + git init repo && >> > + test_when_finished "rm -fr repo" && >> >> nit: test_when_finished should be the first line of the test. > > The "rm-then-init-then-test_when_finished" is an (unfortunate) pattern > extended throughout t5326, mostly that some tests don't clean up "repo" > after deleting and recreating it. I do not think it is so bad to be defensive to "prepare it cleanly enough so that I would not be affected". So "rm -fr repo && git init repo" I would fully support. "init && test_when_finished" is totally indefensible. It should be the other way around. > But it's easy enough to just use a separate repository, and avoid > removing it altogether. Thanks for the suggestion! Those who run tests in a batch without "-i" would have more material to study and find breakages if you did so. I agree that is probably something worth doing (unless in narrow corner cases where each test repository consumes unusual amount of storage or somethinglike that). ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 1/6] t5326: demonstrate potential bitmap corruption 2022-08-22 19:31 ` Junio C Hamano @ 2022-08-22 19:41 ` Taylor Blau 0 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:41 UTC (permalink / raw) To: Junio C Hamano Cc: Derrick Stolee, git, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On Mon, Aug 22, 2022 at 12:31:44PM -0700, Junio C Hamano wrote: > Taylor Blau <me@ttaylorr.com> writes: > > >> > + git init repo && > >> > + test_when_finished "rm -fr repo" && > >> > >> nit: test_when_finished should be the first line of the test. > > > > The "rm-then-init-then-test_when_finished" is an (unfortunate) pattern > > extended throughout t5326, mostly that some tests don't clean up "repo" > > after deleting and recreating it. > > I do not think it is so bad to be defensive to "prepare it cleanly > enough so that I would not be affected". So "rm -fr repo && git > init repo" I would fully support. "init && test_when_finished" is > totally indefensible. It should be the other way around. Agreed. We should fix those up, probably once Abhradeep's topic has landed, since doing so now would create an avoidable amount of merge conflicts. > > But it's easy enough to just use a separate repository, and avoid > > removing it altogether. Thanks for the suggestion! > > Those who run tests in a batch without "-i" would have more material > to study and find breakages if you did so. I agree that is probably > something worth doing (unless in narrow corner cases where each test > repository consumes unusual amount of storage or somethinglike that). Agreed here, too. It's much handier to get a broken state immediately, instead of realizing that a test was broken, commenting out the "test_when_finished" bits, and rerunning it again (not to mention hoping that the failure is deterministic). Luckily these repositories are small in size, so it doesn't hurt to keep them around. Thanks, Taylor ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau 2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau @ 2022-08-19 21:30 ` Taylor Blau 2022-08-20 16:44 ` Abhradeep Chakraborty 2022-08-19 21:30 ` [PATCH 3/6] midx.c: extract `struct midx_fanout` Taylor Blau ` (5 subsequent siblings) 7 siblings, 1 reply; 29+ messages in thread From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam The midx_bitmap_partial_tests() function is responsible for setting up a state where some (but not all) packs in the repository are covered by a MIDX (and bitmap). This function has redirected the `git multi-pack-index write --bitmap`'s stderr to a file "err" since its introduction back in c51f5a6437 (t5326: test multi-pack bitmap behavior, 2021-08-31). This was likely a stray change left over from a slightly different version of this test, since the file "err" is never read after being written. This leads to confusingly-missing output, especially when the contents of stderr are important. Resolve this confusion by avoiding silencing stderr in this case. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/lib-bitmap.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh index a95537e759..f595937094 100644 --- a/t/lib-bitmap.sh +++ b/t/lib-bitmap.sh @@ -440,7 +440,7 @@ midx_bitmap_partial_tests () { test_commit packed && git repack && test_commit loose && - git multi-pack-index write --bitmap 2>err && + git multi-pack-index write --bitmap && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap ' -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr 2022-08-19 21:30 ` [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau @ 2022-08-20 16:44 ` Abhradeep Chakraborty 2022-08-22 17:58 ` Taylor Blau 0 siblings, 1 reply; 29+ messages in thread From: Abhradeep Chakraborty @ 2022-08-20 16:44 UTC (permalink / raw) To: Taylor Blau Cc: git, Junio C Hamano, Johannes Schindelin, Derrick Stolee, jonathantanmy, Kaartic Sivaraam On Sat, Aug 20, 2022 at 3:00 AM Taylor Blau <me@ttaylorr.com> wrote: > > The midx_bitmap_partial_tests() function is responsible for setting up a > state where some (but not all) packs in the repository are covered by a > MIDX (and bitmap). > > This function has redirected the `git multi-pack-index write --bitmap`'s > stderr to a file "err" since its introduction back in c51f5a6437 (t5326: > test multi-pack bitmap behavior, 2021-08-31). > > This was likely a stray change left over from a slightly different > version of this test, since the file "err" is never read after being > written. This leads to confusingly-missing output, especially when the > contents of stderr are important. > > Resolve this confusion by avoiding silencing stderr in this case. > > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > t/lib-bitmap.sh | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh > index a95537e759..f595937094 100644 > --- a/t/lib-bitmap.sh > +++ b/t/lib-bitmap.sh > @@ -440,7 +440,7 @@ midx_bitmap_partial_tests () { > test_commit packed && > git repack && > test_commit loose && > - git multi-pack-index write --bitmap 2>err && > + git multi-pack-index write --bitmap && > test_path_is_file $midx && > test_path_is_file $midx-$(midx_checksum $objdir).bitmap > ' Thanks Taylor! I would say this is a very good change. It might have been there for some reason when it was written, but that was resisting us to debug what was going on ;-) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr 2022-08-20 16:44 ` Abhradeep Chakraborty @ 2022-08-22 17:58 ` Taylor Blau 0 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 17:58 UTC (permalink / raw) To: Abhradeep Chakraborty Cc: git, Junio C Hamano, Johannes Schindelin, Derrick Stolee, jonathantanmy, Kaartic Sivaraam On Sat, Aug 20, 2022 at 10:14:03PM +0530, Abhradeep Chakraborty wrote: > On Sat, Aug 20, 2022 at 3:00 AM Taylor Blau <me@ttaylorr.com> wrote: > > diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh > > index a95537e759..f595937094 100644 > > --- a/t/lib-bitmap.sh > > +++ b/t/lib-bitmap.sh > > @@ -440,7 +440,7 @@ midx_bitmap_partial_tests () { > > test_commit packed && > > git repack && > > test_commit loose && > > - git multi-pack-index write --bitmap 2>err && > > + git multi-pack-index write --bitmap && > > test_path_is_file $midx && > > test_path_is_file $midx-$(midx_checksum $objdir).bitmap > > ' > > Thanks Taylor! I would say this is a very good change. It might have > been there for some reason when it was written, but that was resisting > us to debug what was going on ;-) :-). I was probably doing something with "err" when I originally wrote this test. But I likely dropped whatever assertion I had written and forgot to stop redirecting stderr. Better late than never ;-). Thanks, Taylor ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH 3/6] midx.c: extract `struct midx_fanout` 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau 2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau 2022-08-19 21:30 ` [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau @ 2022-08-19 21:30 ` Taylor Blau 2022-08-19 21:30 ` [PATCH 4/6] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau ` (4 subsequent siblings) 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam To build up a list of objects (along with their packs, and the offsets within those packs that each object appears at), the MIDX code implements `get_sorted_entries()` which builds up a list of candidates, sorts them, and then removes duplicate entries. To do this, it keeps an array of `pack_midx_entry` structures that it builds up once for each fanout level (ie., for all possible values of the first byte of each object's ID). This array is a function-local variable of `get_sorted_entries()`. Since it uses the ALLOC_GROW() macro, having the `alloc_fanout` variable also be local to that function, and only modified within that function is convenient. However, subsequent changes will extract the two ways this array is filled (from a pack at some fanout value, and from an existing MIDX at some fanout value) into separate functions. Instead of passing around pointers to the entries array, along with `nr_fanout` and `alloc_fanout`, encapsulate these three into a structure instead. Then pass around a pointer to this structure instead. This patch does not yet extract the above two functions, but sets us up to begin doing so in the following commit. For now, the implementation of get_sorted_entries() is only modified to replace `entries_by_fanout` with `fanout.entries`, `nr_fanout` with `fanout.nr`, and so on. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 54 +++++++++++++++++++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index 4e956cacb7..cdb6c481c7 100644 --- a/midx.c +++ b/midx.c @@ -577,6 +577,22 @@ static void fill_pack_entry(uint32_t pack_int_id, entry->preferred = !!preferred; } +struct midx_fanout { + struct pack_midx_entry *entries; + uint32_t nr; + uint32_t alloc; +}; + +static void midx_fanout_grow(struct midx_fanout *fanout, uint32_t nr) +{ + ALLOC_GROW(fanout->entries, nr, fanout->alloc); +} + +static void midx_fanout_sort(struct midx_fanout *fanout) +{ + QSORT(fanout->entries, fanout->nr, midx_oid_compare); +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -595,8 +611,8 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, int preferred_pack) { uint32_t cur_fanout, cur_pack, cur_object; - uint32_t alloc_fanout, alloc_objects, total_objects = 0; - struct pack_midx_entry *entries_by_fanout = NULL; + uint32_t alloc_objects, total_objects = 0; + struct midx_fanout fanout = { 0 }; struct pack_midx_entry *deduplicated_entries = NULL; uint32_t start_pack = m ? m->num_packs : 0; @@ -608,14 +624,14 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, * slices to be evenly distributed, with some noise. Hence, * allocate slightly more than one 256th. */ - alloc_objects = alloc_fanout = total_objects > 3200 ? total_objects / 200 : 16; + alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16; - ALLOC_ARRAY(entries_by_fanout, alloc_fanout); + ALLOC_ARRAY(fanout.entries, fanout.alloc); ALLOC_ARRAY(deduplicated_entries, alloc_objects); *nr_objects = 0; for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { - uint32_t nr_fanout = 0; + fanout.nr = 0; if (m) { uint32_t start = 0, end; @@ -625,15 +641,15 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, end = ntohl(m->chunk_oid_fanout[cur_fanout]); for (cur_object = start; cur_object < end; cur_object++) { - ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout); + midx_fanout_grow(&fanout, fanout.nr + 1); nth_midxed_pack_midx_entry(m, - &entries_by_fanout[nr_fanout], + &fanout.entries[fanout.nr], cur_object); if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - entries_by_fanout[nr_fanout].preferred = 1; + fanout.entries[fanout.nr].preferred = 1; else - entries_by_fanout[nr_fanout].preferred = 0; - nr_fanout++; + fanout.entries[fanout.nr].preferred = 0; + fanout.nr++; } } @@ -646,36 +662,36 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, end = get_pack_fanout(info[cur_pack].p, cur_fanout); for (cur_object = start; cur_object < end; cur_object++) { - ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout); + midx_fanout_grow(&fanout, fanout.nr + 1); fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, - &entries_by_fanout[nr_fanout], + &fanout.entries[fanout.nr], preferred); - nr_fanout++; + fanout.nr++; } } - QSORT(entries_by_fanout, nr_fanout, midx_oid_compare); + midx_fanout_sort(&fanout); /* * The batch is now sorted by OID and then mtime (descending). * Take only the first duplicate. */ - for (cur_object = 0; cur_object < nr_fanout; cur_object++) { - if (cur_object && oideq(&entries_by_fanout[cur_object - 1].oid, - &entries_by_fanout[cur_object].oid)) + for (cur_object = 0; cur_object < fanout.nr; cur_object++) { + if (cur_object && oideq(&fanout.entries[cur_object - 1].oid, + &fanout.entries[cur_object].oid)) continue; ALLOC_GROW(deduplicated_entries, *nr_objects + 1, alloc_objects); memcpy(&deduplicated_entries[*nr_objects], - &entries_by_fanout[cur_object], + &fanout.entries[cur_object], sizeof(struct pack_midx_entry)); (*nr_objects)++; } } - free(entries_by_fanout); + free(fanout.entries); return deduplicated_entries; } -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 4/6] midx.c: extract `midx_fanout_add_midx_fanout()` 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau ` (2 preceding siblings ...) 2022-08-19 21:30 ` [PATCH 3/6] midx.c: extract `struct midx_fanout` Taylor Blau @ 2022-08-19 21:30 ` Taylor Blau 2022-08-19 21:30 ` [PATCH 5/6] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau ` (3 subsequent siblings) 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam Extract a routine to add all objects whose object ID's first byte is `cur_fanout` from an existing MIDX. This function will only be called once, so extracting it is purely cosmetic to improve the readability of `get_sorted_entries()` (its sole caller) below. The functionality is unchanged in this commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 47 ++++++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index cdb6c481c7..0d40089c4d 100644 --- a/midx.c +++ b/midx.c @@ -593,6 +593,31 @@ static void midx_fanout_sort(struct midx_fanout *fanout) QSORT(fanout->entries, fanout->nr, midx_oid_compare); } +static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, + struct multi_pack_index *m, + int preferred_pack, + uint32_t cur_fanout) +{ + uint32_t start = 0, end; + uint32_t cur_object; + + if (cur_fanout) + start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); + end = ntohl(m->chunk_oid_fanout[cur_fanout]); + + for (cur_object = start; cur_object < end; cur_object++) { + midx_fanout_grow(fanout, fanout->nr + 1); + nth_midxed_pack_midx_entry(m, + &fanout->entries[fanout->nr], + cur_object); + if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) + fanout->entries[fanout->nr].preferred = 1; + else + fanout->entries[fanout->nr].preferred = 0; + fanout->nr++; + } +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -633,25 +658,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { fanout.nr = 0; - if (m) { - uint32_t start = 0, end; - - if (cur_fanout) - start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); - end = ntohl(m->chunk_oid_fanout[cur_fanout]); - - for (cur_object = start; cur_object < end; cur_object++) { - midx_fanout_grow(&fanout, fanout.nr + 1); - nth_midxed_pack_midx_entry(m, - &fanout.entries[fanout.nr], - cur_object); - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - fanout.entries[fanout.nr].preferred = 1; - else - fanout.entries[fanout.nr].preferred = 0; - fanout.nr++; - } - } + if (m) + midx_fanout_add_midx_fanout(&fanout, m, preferred_pack, + cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { uint32_t start = 0, end; -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 5/6] midx.c: extract `midx_fanout_add_pack_fanout()` 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau ` (3 preceding siblings ...) 2022-08-19 21:30 ` [PATCH 4/6] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau @ 2022-08-19 21:30 ` Taylor Blau 2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau ` (2 subsequent siblings) 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam Extract a routine to add all objects whose object ID's first byte is `cur_fanout` from a given pack (identified by its index into the `struct pack_info` array maintained by the MIDX writing routine). Unlike the previous extraction (for `midx_fanout_add_midx_fanout()`), this function will be called twice, once for all new packs, and again for the preferred pack (if it appears in an existing MIDX). The latter change is to resolve the bug described a few patches ago, and will be made in the subsequent commit. Similar to the previous refactoring, this function also enhances the readability of its caller in `get_sorted_entries()`. Its functionality is unchanged in this commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/midx.c b/midx.c index 0d40089c4d..be8186eec2 100644 --- a/midx.c +++ b/midx.c @@ -618,6 +618,31 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, } } +static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout, + struct pack_info *info, + uint32_t cur_pack, + int preferred, + uint32_t cur_fanout) +{ + struct packed_git *pack = info[cur_pack].p; + uint32_t start = 0, end; + uint32_t cur_object; + + if (cur_fanout) + start = get_pack_fanout(pack, cur_fanout - 1); + end = get_pack_fanout(pack, cur_fanout); + + for (cur_object = start; cur_object < end; cur_object++) { + midx_fanout_grow(fanout, fanout->nr + 1); + fill_pack_entry(cur_pack, + info[cur_pack].p, + cur_object, + &fanout->entries[fanout->nr], + preferred); + fanout->nr++; + } +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -663,22 +688,10 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { - uint32_t start = 0, end; int preferred = cur_pack == preferred_pack; - - if (cur_fanout) - start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1); - end = get_pack_fanout(info[cur_pack].p, cur_fanout); - - for (cur_object = start; cur_object < end; cur_object++) { - midx_fanout_grow(&fanout, fanout.nr + 1); - fill_pack_entry(cur_pack, - info[cur_pack].p, - cur_object, - &fanout.entries[fanout.nr], - preferred); - fanout.nr++; - } + midx_fanout_add_pack_fanout(&fanout, + info, cur_pack, + preferred, cur_fanout); } midx_fanout_sort(&fanout); -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau ` (4 preceding siblings ...) 2022-08-19 21:30 ` [PATCH 5/6] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau @ 2022-08-19 21:30 ` Taylor Blau 2022-08-20 18:40 ` Abhradeep Chakraborty 2022-08-22 17:03 ` Derrick Stolee 2022-08-22 17:04 ` [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau 7 siblings, 2 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-19 21:30 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam This patch resolves an issue where the object order used to generate a MIDX bitmap would violate an invariant that all of the preferred pack's objects are represented by that pack in the MIDX. The problem arises when reusing an existing MIDX while generating a new one, and occurs specifically when the identity of the preferred pack changes from one MIDX to another, along with a few other conditions: - the new preferred pack must also be present in the existing MIDX - the new preferred pack must *not* have been the preferred pack in the existing MIDX - most importantly, there must be at least one object present in the physical preferred pack (ie., it shows up in that pack's index) but was selected from a *different* pack when the previous MIDX was generated When the above conditions are all met, we end up (incorrectly) discarding copies of some objects in the pack selected as the preferred pack. This is because `get_sorted_entries()` adds objects to its list by doing the following at each fanout level: - first, adding all objects from that fanout level from an existing MIDX - then, adding all objects from that fanout level in each pack *not* included in the existing MIDX So if some object was not selected from the to-be-preferred pack when writing the previous MIDX, then we will never consider it as a candidate when generating the new MIDX. This means that it's possible for the preferred pack to not include all of its objects in the MIDX's pseudo-pack object order, which is an invariant violation of that order. Resolve this by adding all objects from the preferred pack separately when it appears in the existing MIDX (if one was present). This will duplicate objects from that pack that *did* appear in the MIDX, but this is fine, since get_sorted_entries() already handles duplicates. (A future optimization in this area could avoid adding copies of objects that we know already existing in the MIDX.) Note that we no longer need to compute the preferred-ness of objects added from the MIDX, since we only want to select the preferred objects from a single source. (We could still mark these preferred bits, but doing so is redundant and unnecessary). This resolves the bug described in the first patch of this series. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 14 +++++++------- t/t5326-multi-pack-bitmaps.sh | 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/midx.c b/midx.c index be8186eec2..bd1d27090e 100644 --- a/midx.c +++ b/midx.c @@ -595,7 +595,6 @@ static void midx_fanout_sort(struct midx_fanout *fanout) static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, struct multi_pack_index *m, - int preferred_pack, uint32_t cur_fanout) { uint32_t start = 0, end; @@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, nth_midxed_pack_midx_entry(m, &fanout->entries[fanout->nr], cur_object); - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - fanout->entries[fanout->nr].preferred = 1; - else - fanout->entries[fanout->nr].preferred = 0; + fanout->entries[fanout->nr].preferred = 0; fanout->nr++; } } @@ -684,8 +680,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, fanout.nr = 0; if (m) - midx_fanout_add_midx_fanout(&fanout, m, preferred_pack, - cur_fanout); + midx_fanout_add_midx_fanout(&fanout, m, cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { int preferred = cur_pack == preferred_pack; @@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, preferred, cur_fanout); } + if (-1 < preferred_pack && preferred_pack < start_pack) + midx_fanout_add_pack_fanout(&fanout, info, + preferred_pack, 1, + cur_fanout); + midx_fanout_sort(&fanout); /* diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index a60cec5cab..4f3841661a 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -346,7 +346,7 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' ' test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - test_must_fail git clone --no-local . clone2 + git clone --no-local . clone2 ) ' -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX 2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau @ 2022-08-20 18:40 ` Abhradeep Chakraborty 2022-08-22 18:08 ` Taylor Blau 2022-08-22 17:03 ` Derrick Stolee 1 sibling, 1 reply; 29+ messages in thread From: Abhradeep Chakraborty @ 2022-08-20 18:40 UTC (permalink / raw) To: Taylor Blau Cc: git, Junio C Hamano, Johannes Schindelin, Derrick Stolee, jonathantanmy, Kaartic Sivaraam On Sat, Aug 20, 2022 at 3:00 AM Taylor Blau <me@ttaylorr.com> wrote: > > + if (-1 < preferred_pack && preferred_pack < start_pack) > + midx_fanout_add_pack_fanout(&fanout, info, > + preferred_pack, 1, > + cur_fanout); > + All the other changes make sense to me but I have a question about this particular change. Instead of adding all the preferred objects again (but in this case these are being added from preferred pack) in `fanout->entries`, will it be better if we call `midx_fanout_add_pack_fanout()` function from `midx_fanout_add_midx_fanout()` when above conditions are met? Something like this - static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, struct multi_pack_index *m, struct pack_info *info, uint32_t cur_pack, int preferred, uint32_t cur_fanout) { ... if (cur_fanout) start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); end = ntohl(m->chunk_oid_fanout[cur_fanout]); if (preferred) { midx_fanout_add_pack_fanout(&fanout, info, cur_pack, preferred, cur_fanout); return; } for (.....) { ........ } It may reduce some extra code work but also could decrease readability of the function(may be). What's your thought here? Thanks :) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX 2022-08-20 18:40 ` Abhradeep Chakraborty @ 2022-08-22 18:08 ` Taylor Blau 0 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 18:08 UTC (permalink / raw) To: Abhradeep Chakraborty Cc: git, Junio C Hamano, Johannes Schindelin, Derrick Stolee, jonathantanmy, Kaartic Sivaraam On Sun, Aug 21, 2022 at 12:10:42AM +0530, Abhradeep Chakraborty wrote: > On Sat, Aug 20, 2022 at 3:00 AM Taylor Blau <me@ttaylorr.com> wrote: > > > > + if (-1 < preferred_pack && preferred_pack < start_pack) > > + midx_fanout_add_pack_fanout(&fanout, info, > > + preferred_pack, 1, > > + cur_fanout); > > + > > All the other changes make sense to me but I have a question about > this particular change. Instead of adding all the preferred objects > again (but in this case these are being added from preferred pack) in > `fanout->entries`, will it be better if we call > `midx_fanout_add_pack_fanout()` function from > `midx_fanout_add_midx_fanout()` when above conditions are met? > Something like this - > > static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, > struct multi_pack_index *m, > struct pack_info *info, > uint32_t cur_pack, > int preferred, > uint32_t cur_fanout) > { > ... > if (cur_fanout) > start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); > end = ntohl(m->chunk_oid_fanout[cur_fanout]); > if (preferred) { > midx_fanout_add_pack_fanout(&fanout, info, cur_pack, > > preferred, cur_fanout); > return; > } > > for (.....) { > ........ > } A slightly simpler approach might be to see that the pack_midx_entry structure we get back from calling nth_midxed_pack_midx_entry() contains the pack from which each object is represented. So if we see one that collides with the preferred pack, then we can just skip it, since we know they'll be handled separately down below. I'll add another patch on top which makes that optimization. Thanks, Taylor ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX 2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau 2022-08-20 18:40 ` Abhradeep Chakraborty @ 2022-08-22 17:03 ` Derrick Stolee 2022-08-22 18:14 ` Taylor Blau 1 sibling, 1 reply; 29+ messages in thread From: Derrick Stolee @ 2022-08-22 17:03 UTC (permalink / raw) To: Taylor Blau, git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On 8/19/2022 5:30 PM, Taylor Blau wrote: > Resolve this by adding all objects from the preferred pack separately > when it appears in the existing MIDX (if one was present). This will > duplicate objects from that pack that *did* appear in the MIDX, but this > is fine, since get_sorted_entries() already handles duplicates. (A > future optimization in this area could avoid adding copies of objects > that we know already existing in the MIDX.) ... > This resolves the bug described in the first patch of this series. Thinking ahead to when this is a commit, perhaps this could instead refer to the 'preferred pack change with existing MIDX bitmap' test case? > @@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, > nth_midxed_pack_midx_entry(m, > &fanout->entries[fanout->nr], > cur_object); > - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) > - fanout->entries[fanout->nr].preferred = 1; > - else > - fanout->entries[fanout->nr].preferred = 0; > + fanout->entries[fanout->nr].preferred = 0; > fanout->nr++; Here, we have lost the ability to set the 'preferred' bit from the previous MIDX. Good. > @@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, > preferred, cur_fanout); > } > > + if (-1 < preferred_pack && preferred_pack < start_pack) > + midx_fanout_add_pack_fanout(&fanout, info, > + preferred_pack, 1, > + cur_fanout); > + And here, when there is a preferred pack _in the previous MIDX_, we add its objects a second time, but now with the preferred bit on. If the preferred pack is _not_ in the previous MIDX, then the 'preferred_pack < start_pack' condition will fail and the bits would have been set within the for loop. > @@ -346,7 +346,7 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' ' > test_path_is_file $midx && > test_path_is_file $midx-$(midx_checksum $objdir).bitmap && > > - test_must_fail git clone --no-local . clone2 > + git clone --no-local . clone2 I mentioned in patch 1 that this test could use some comments about what is unexpected and what _is_ expected. I think this comment needs an update in this patch: # Generate a new MIDX which changes the preferred pack to a pack # contained in the existing MIDX, such that not all objects from # p2 that appear in the MIDX had their copy selected from p2. Thanks, -Stolee ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX 2022-08-22 17:03 ` Derrick Stolee @ 2022-08-22 18:14 ` Taylor Blau 0 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 18:14 UTC (permalink / raw) To: Derrick Stolee Cc: git, gitster, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On Mon, Aug 22, 2022 at 01:03:11PM -0400, Derrick Stolee wrote: > On 8/19/2022 5:30 PM, Taylor Blau wrote: > > > Resolve this by adding all objects from the preferred pack separately > > when it appears in the existing MIDX (if one was present). This will > > duplicate objects from that pack that *did* appear in the MIDX, but this > > is fine, since get_sorted_entries() already handles duplicates. (A > > future optimization in this area could avoid adding copies of objects > > that we know already existing in the MIDX.) > > ... > > > This resolves the bug described in the first patch of this series. > > Thinking ahead to when this is a commit, perhaps this could instead > refer to the 'preferred pack change with existing MIDX bitmap' test > case? Good idea, thanks. > > @@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, > > nth_midxed_pack_midx_entry(m, > > &fanout->entries[fanout->nr], > > cur_object); > > - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) > > - fanout->entries[fanout->nr].preferred = 1; > > - else > > - fanout->entries[fanout->nr].preferred = 0; > > + fanout->entries[fanout->nr].preferred = 0; > > fanout->nr++; > > Here, we have lost the ability to set the 'preferred' bit from the > previous MIDX. Good. Yep, we don't want to propagate any of these bits forward when reusing an existing MIDX. Thinking on it more, I think this is the only legitimate use of MIDX reuse in the "I'm about to write bitmaps" context. I mentioned before the idea that we could use `--stdin-packs` now when writing a MIDX bitmap where before it wasn't implemented (likely due to problems caused by this bug). But the whole premise doesn't make a ton of sense: - Every pack that's in the include_packs list would need to be handled separately. - And every pack that *isn't* in that list would be skipped. Which means that it wouldn't help at all to reuse an existing MIDX. The reason that we'd need to handle all included packs separately is subtle and a little different from what's going on here, though. The problem there is that if you have two packs, say P1 and P2, and P1 is in the include list but P2 is not, then any objects duplicated between the two and selected from P2 will disappear when writing the new MIDX. Since the set of packs that are going into the new MIDX are precisely equal to the set of packs that we'd need to handle separately, it probably makes sense to continue to avoid using the existing MIDX when writing a bitmap with the `--stdin-packs` option. > > @@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, > > preferred, cur_fanout); > > } > > > > + if (-1 < preferred_pack && preferred_pack < start_pack) > > + midx_fanout_add_pack_fanout(&fanout, info, > > + preferred_pack, 1, > > + cur_fanout); > > + > > And here, when there is a preferred pack _in the previous MIDX_, > we add its objects a second time, but now with the preferred bit > on. If the preferred pack is _not_ in the previous MIDX, then the > 'preferred_pack < start_pack' condition will fail and the bits > would have been set within the for loop. Exactly! > > @@ -346,7 +346,7 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' ' > > test_path_is_file $midx && > > test_path_is_file $midx-$(midx_checksum $objdir).bitmap && > > > > - test_must_fail git clone --no-local . clone2 > > + git clone --no-local . clone2 > > I mentioned in patch 1 that this test could use some comments about > what is unexpected and what _is_ expected. I think this comment needs > an update in this patch: > > # Generate a new MIDX which changes the preferred pack to a pack > # contained in the existing MIDX, such that not all objects from > # p2 that appear in the MIDX had their copy selected from p2. Good eyes, thanks for spotting. I updated the comment below, too (which doesn't exist in this version of the patch, but you suggested adding to the first patch in this series). Thanks, Taylor ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau ` (5 preceding siblings ...) 2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau @ 2022-08-22 17:04 ` Derrick Stolee 2022-08-22 19:44 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau 7 siblings, 1 reply; 29+ messages in thread From: Derrick Stolee @ 2022-08-22 17:04 UTC (permalink / raw) To: Taylor Blau, git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On 8/19/2022 5:30 PM, Taylor Blau wrote: > This series resolves a bug that was reported[1] by Johannes, and > investigated by him, Abhradeep, and Stolee in that same sub-thread. > > The crux of the issue is that a MIDX bitmap can enter a corrupt state > when changing the preferred pack from its value in an existing MIDX in > certain circumstances as described in the first and final patches. > > This series is structured as follows: > > - The first patch of this series adds a test which demonstrates the > problem. (This is an improvement from the debugging in [1], where we > only noticed the problem racily in an existing test, and only in > SHA-256 mode). > > - The next small handful of patches refactor midx.c's > `get_sorted_entries()` function to make fixing this bug more > straightforward. > > - The final patch resolves the bug and updates the test to no longer > expect failure. Thanks for putting this together. Definitely not an easy bug to find and fix. I mostly have nitpicks, but the overall structure is sound. Thanks, -Stolee ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX 2022-08-22 17:04 ` [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee @ 2022-08-22 19:44 ` Taylor Blau 0 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:44 UTC (permalink / raw) To: Derrick Stolee Cc: git, gitster, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On Mon, Aug 22, 2022 at 01:04:29PM -0400, Derrick Stolee wrote: > Thanks for putting this together. Definitely not an easy bug to find > and fix. > > I mostly have nitpicks, but the overall structure is sound. Thanks very much for reviewing (both to you and Abhradeep). I have a new version that I'll send now, which incorporates your suggestions. It also adds a new patch on top that takes a slight optimization, by avoiding adding objects from the MIDX that show up in the (new) preferred pack, since we know we'll discard them anyway. Thanks, Taylor ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau ` (6 preceding siblings ...) 2022-08-22 17:04 ` [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee @ 2022-08-22 19:50 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Taylor Blau ` (7 more replies) 7 siblings, 8 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam Here is a small reroll of my series that resolves a bug that was reported[1] by Johannes, and investigated by him, Abhradeep, and Stolee in that same sub-thread. As before: the crux of the issue is that a MIDX bitmap can enter a corrupt state when changing the preferred pack from its value in an existing MIDX in certain circumstances as described in the first and final patches. This version incorporates some cosmetic changes suggested by Stolee, and adds a new patch on top which avoids adding objects from the MIDX that were represented by the (new) preferred pack, since we know we'll end up discarding those objects anyways. For convenience, a range-diff against v1 is included below. Thanks again for your review! [1]: https://lore.kernel.org/git/p3r70610-8n52-s8q0-n641-onp4ps01330n@tzk.qr/ Taylor Blau (7): t5326: demonstrate potential bitmap corruption t/lib-bitmap.sh: avoid silencing stderr midx.c: extract `struct midx_fanout` midx.c: extract `midx_fanout_add_midx_fanout()` midx.c: extract `midx_fanout_add_pack_fanout()` midx.c: include preferred pack correctly with existing MIDX midx.c: avoid adding preferred objects twice midx.c | 139 +++++++++++++++++++++++----------- t/lib-bitmap.sh | 2 +- t/t5326-multi-pack-bitmaps.sh | 44 +++++++++++ 3 files changed, 139 insertions(+), 46 deletions(-) Range-diff against v1: 1: 3e30ab1a19 ! 1: 6b38bfcd2c t5326: demonstrate potential bitmap corruption @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'graceful fallback when missi ' +test_expect_success 'preferred pack change with existing MIDX bitmap' ' -+ rm -fr repo && -+ git init repo && -+ test_when_finished "rm -fr repo" && ++ git init preferred-pack-with-existing && + ( -+ cd repo && ++ cd preferred-pack-with-existing && + + test_commit base && + test_commit other && @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'graceful fallback when missi + p2="$(git pack-objects "$objdir/pack/pack" \ + --delta-base-offset <p2.objects)" && + -+ # Generate a MIDX containing the first two packs, marking p1 as -+ # preferred, and ensure that it can be successfully cloned. ++ # Generate a MIDX containing the first two packs, ++ # marking p1 as preferred, and ensure that it can be ++ # successfully cloned. + git multi-pack-index write --bitmap \ + --preferred-pack="pack-$p1.pack" && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + git clone --no-local . clone1 && + -+ # Then generate a new pack which sorts ahead of any existing -+ # pack (by tweaking the pack prefix). ++ # Then generate a new pack which sorts ahead of any ++ # existing pack (by tweaking the pack prefix). + test_commit foo && + git pack-objects --all --unpacked $objdir/pack/pack0 && + -+ # Generate a new MIDX which changes the preferred pack to a pack -+ # contained in the existing MIDX, such that not all objects from -+ # p2 that appear in the MIDX had their copy selected from p2. ++ # Generate a new MIDX which changes the preferred pack ++ # to a pack contained in the existing MIDX, such that ++ # not all objects from p2 that appear in the MIDX had ++ # their copy selected from p2. + git multi-pack-index write --bitmap \ + --preferred-pack="pack-$p2.pack" && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + ++ # When the above circumstances are met, an existing bug ++ # in the MIDX machinery will cause the reverse index to ++ # be read incorrectly, resulting in failed clones (among ++ # other things). + test_must_fail git clone --no-local . clone2 + ) +' 2: 053045db14 = 2: d6648ed88f t/lib-bitmap.sh: avoid silencing stderr 3: 2df8f1e884 = 3: ae2077acb7 midx.c: extract `struct midx_fanout` 4: 92b82c83ea = 4: 2351a9fc27 midx.c: extract `midx_fanout_add_midx_fanout()` 5: db1c6ea8e5 = 5: 845e1484b4 midx.c: extract `midx_fanout_add_pack_fanout()` 6: 4ddddc959b ! 6: d301c4d87f midx.c: include preferred pack correctly with existing MIDX @@ Commit message from a single source. (We could still mark these preferred bits, but doing so is redundant and unnecessary). - This resolves the bug described in the first patch of this series. + This resolves the bug demonstrated by t5326.174 ("preferred pack change + with existing MIDX bitmap"). Signed-off-by: Taylor Blau <me@ttaylorr.com> @@ midx.c: static struct pack_midx_entry *get_sorted_entries(struct multi_pack_inde ## t/t5326-multi-pack-bitmaps.sh ## @@ t/t5326-multi-pack-bitmaps.sh: test_expect_success 'preferred pack change with existing MIDX bitmap' ' + git pack-objects --all --unpacked $objdir/pack/pack0 && + + # Generate a new MIDX which changes the preferred pack +- # to a pack contained in the existing MIDX, such that +- # not all objects from p2 that appear in the MIDX had +- # their copy selected from p2. ++ # to a pack contained in the existing MIDX. + git multi-pack-index write --bitmap \ + --preferred-pack="pack-$p2.pack" && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap && +- # When the above circumstances are met, an existing bug +- # in the MIDX machinery will cause the reverse index to +- # be read incorrectly, resulting in failed clones (among +- # other things). - test_must_fail git clone --no-local . clone2 ++ # When the above circumstances are met, the preferred ++ # pack should change appropriately and clones should ++ # (still) succeed. + git clone --no-local . clone2 ) ' -: ---------- > 7: 887ab9485f midx.c: avoid adding preferred objects twice -- 2.37.0.1.g1379af2e9d ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau @ 2022-08-22 19:50 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau ` (6 subsequent siblings) 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam It is possible to generate a corrupt MIDX bitmap when certain conditions are met. This happens when the preferred pack "P" changes to one (say, "Q") that: - "Q" has objects included in an existing MIDX, - but "Q" is different than "P", - and "Q" and "P" have some objects in common When this is the case, not all objects from "Q" will be selected from "Q" (ie., the generated MIDX will represent them as coming from a different pack), despite "Q" being preferred. This is an invariant violation, since all objects contained in the MIDX's preferred pack are supposed to originate from the preferred pack. In other words, all duplicate objects are resolved in favor of the copy that comes from the MIDX's preferred pack, if any. This violation results in a corrupt object order, which cannot be interpreted by the pack-bitmap code, leading to broken clones and other defects. This test demonstrates the above problem by constructing a minimal reproduction, and showing that the final `git clone` invocation fails. The reproduction is mostly straightforward, except that the new pack generated between MIDX writes (which is necessary in order to prevent that operation from being a noop) must sort ahead of all existing packs in order to prevent a different pack (neither "P" nor "Q") from appearing as preferred (meaning all its objects appear in order at the beginning of the pseudo-pack order). Subsequent commits will first refactor the midx.c::get_sorted_entries() function, and then fix this bug. Reported-by: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com> Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/t5326-multi-pack-bitmaps.sh | 47 +++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index 4fe57414c1..c364677ae8 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -307,4 +307,51 @@ test_expect_success 'graceful fallback when missing reverse index' ' ) ' +test_expect_success 'preferred pack change with existing MIDX bitmap' ' + git init preferred-pack-with-existing && + ( + cd preferred-pack-with-existing && + + test_commit base && + test_commit other && + + git rev-list --objects --no-object-names base >p1.objects && + git rev-list --objects --no-object-names other >p2.objects && + + p1="$(git pack-objects "$objdir/pack/pack" \ + --delta-base-offset <p1.objects)" && + p2="$(git pack-objects "$objdir/pack/pack" \ + --delta-base-offset <p2.objects)" && + + # Generate a MIDX containing the first two packs, + # marking p1 as preferred, and ensure that it can be + # successfully cloned. + git multi-pack-index write --bitmap \ + --preferred-pack="pack-$p1.pack" && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + git clone --no-local . clone1 && + + # Then generate a new pack which sorts ahead of any + # existing pack (by tweaking the pack prefix). + test_commit foo && + git pack-objects --all --unpacked $objdir/pack/pack0 && + + # Generate a new MIDX which changes the preferred pack + # to a pack contained in the existing MIDX, such that + # not all objects from p2 that appear in the MIDX had + # their copy selected from p2. + git multi-pack-index write --bitmap \ + --preferred-pack="pack-$p2.pack" && + test_path_is_file $midx && + test_path_is_file $midx-$(midx_checksum $objdir).bitmap && + + # When the above circumstances are met, an existing bug + # in the MIDX machinery will cause the reverse index to + # be read incorrectly, resulting in failed clones (among + # other things). + test_must_fail git clone --no-local . clone2 + ) +' + test_done -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau 2022-08-22 19:50 ` [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Taylor Blau @ 2022-08-22 19:50 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 3/7] midx.c: extract `struct midx_fanout` Taylor Blau ` (5 subsequent siblings) 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam The midx_bitmap_partial_tests() function is responsible for setting up a state where some (but not all) packs in the repository are covered by a MIDX (and bitmap). This function has redirected the `git multi-pack-index write --bitmap`'s stderr to a file "err" since its introduction back in c51f5a6437 (t5326: test multi-pack bitmap behavior, 2021-08-31). This was likely a stray change left over from a slightly different version of this test, since the file "err" is never read after being written. This leads to confusingly-missing output, especially when the contents of stderr are important. Resolve this confusion by avoiding silencing stderr in this case. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- t/lib-bitmap.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh index a95537e759..f595937094 100644 --- a/t/lib-bitmap.sh +++ b/t/lib-bitmap.sh @@ -440,7 +440,7 @@ midx_bitmap_partial_tests () { test_commit packed && git repack && test_commit loose && - git multi-pack-index write --bitmap 2>err && + git multi-pack-index write --bitmap && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap ' -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 3/7] midx.c: extract `struct midx_fanout` 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau 2022-08-22 19:50 ` [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Taylor Blau 2022-08-22 19:50 ` [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau @ 2022-08-22 19:50 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau ` (4 subsequent siblings) 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam To build up a list of objects (along with their packs, and the offsets within those packs that each object appears at), the MIDX code implements `get_sorted_entries()` which builds up a list of candidates, sorts them, and then removes duplicate entries. To do this, it keeps an array of `pack_midx_entry` structures that it builds up once for each fanout level (ie., for all possible values of the first byte of each object's ID). This array is a function-local variable of `get_sorted_entries()`. Since it uses the ALLOC_GROW() macro, having the `alloc_fanout` variable also be local to that function, and only modified within that function is convenient. However, subsequent changes will extract the two ways this array is filled (from a pack at some fanout value, and from an existing MIDX at some fanout value) into separate functions. Instead of passing around pointers to the entries array, along with `nr_fanout` and `alloc_fanout`, encapsulate these three into a structure instead. Then pass around a pointer to this structure instead. This patch does not yet extract the above two functions, but sets us up to begin doing so in the following commit. For now, the implementation of get_sorted_entries() is only modified to replace `entries_by_fanout` with `fanout.entries`, `nr_fanout` with `fanout.nr`, and so on. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 54 +++++++++++++++++++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index 4e956cacb7..cdb6c481c7 100644 --- a/midx.c +++ b/midx.c @@ -577,6 +577,22 @@ static void fill_pack_entry(uint32_t pack_int_id, entry->preferred = !!preferred; } +struct midx_fanout { + struct pack_midx_entry *entries; + uint32_t nr; + uint32_t alloc; +}; + +static void midx_fanout_grow(struct midx_fanout *fanout, uint32_t nr) +{ + ALLOC_GROW(fanout->entries, nr, fanout->alloc); +} + +static void midx_fanout_sort(struct midx_fanout *fanout) +{ + QSORT(fanout->entries, fanout->nr, midx_oid_compare); +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -595,8 +611,8 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, int preferred_pack) { uint32_t cur_fanout, cur_pack, cur_object; - uint32_t alloc_fanout, alloc_objects, total_objects = 0; - struct pack_midx_entry *entries_by_fanout = NULL; + uint32_t alloc_objects, total_objects = 0; + struct midx_fanout fanout = { 0 }; struct pack_midx_entry *deduplicated_entries = NULL; uint32_t start_pack = m ? m->num_packs : 0; @@ -608,14 +624,14 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, * slices to be evenly distributed, with some noise. Hence, * allocate slightly more than one 256th. */ - alloc_objects = alloc_fanout = total_objects > 3200 ? total_objects / 200 : 16; + alloc_objects = fanout.alloc = total_objects > 3200 ? total_objects / 200 : 16; - ALLOC_ARRAY(entries_by_fanout, alloc_fanout); + ALLOC_ARRAY(fanout.entries, fanout.alloc); ALLOC_ARRAY(deduplicated_entries, alloc_objects); *nr_objects = 0; for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { - uint32_t nr_fanout = 0; + fanout.nr = 0; if (m) { uint32_t start = 0, end; @@ -625,15 +641,15 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, end = ntohl(m->chunk_oid_fanout[cur_fanout]); for (cur_object = start; cur_object < end; cur_object++) { - ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout); + midx_fanout_grow(&fanout, fanout.nr + 1); nth_midxed_pack_midx_entry(m, - &entries_by_fanout[nr_fanout], + &fanout.entries[fanout.nr], cur_object); if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - entries_by_fanout[nr_fanout].preferred = 1; + fanout.entries[fanout.nr].preferred = 1; else - entries_by_fanout[nr_fanout].preferred = 0; - nr_fanout++; + fanout.entries[fanout.nr].preferred = 0; + fanout.nr++; } } @@ -646,36 +662,36 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, end = get_pack_fanout(info[cur_pack].p, cur_fanout); for (cur_object = start; cur_object < end; cur_object++) { - ALLOC_GROW(entries_by_fanout, nr_fanout + 1, alloc_fanout); + midx_fanout_grow(&fanout, fanout.nr + 1); fill_pack_entry(cur_pack, info[cur_pack].p, cur_object, - &entries_by_fanout[nr_fanout], + &fanout.entries[fanout.nr], preferred); - nr_fanout++; + fanout.nr++; } } - QSORT(entries_by_fanout, nr_fanout, midx_oid_compare); + midx_fanout_sort(&fanout); /* * The batch is now sorted by OID and then mtime (descending). * Take only the first duplicate. */ - for (cur_object = 0; cur_object < nr_fanout; cur_object++) { - if (cur_object && oideq(&entries_by_fanout[cur_object - 1].oid, - &entries_by_fanout[cur_object].oid)) + for (cur_object = 0; cur_object < fanout.nr; cur_object++) { + if (cur_object && oideq(&fanout.entries[cur_object - 1].oid, + &fanout.entries[cur_object].oid)) continue; ALLOC_GROW(deduplicated_entries, *nr_objects + 1, alloc_objects); memcpy(&deduplicated_entries[*nr_objects], - &entries_by_fanout[cur_object], + &fanout.entries[cur_object], sizeof(struct pack_midx_entry)); (*nr_objects)++; } } - free(entries_by_fanout); + free(fanout.entries); return deduplicated_entries; } -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()` 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau ` (2 preceding siblings ...) 2022-08-22 19:50 ` [PATCH v2 3/7] midx.c: extract `struct midx_fanout` Taylor Blau @ 2022-08-22 19:50 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau ` (3 subsequent siblings) 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam Extract a routine to add all objects whose object ID's first byte is `cur_fanout` from an existing MIDX. This function will only be called once, so extracting it is purely cosmetic to improve the readability of `get_sorted_entries()` (its sole caller) below. The functionality is unchanged in this commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 47 ++++++++++++++++++++++++++++------------------- 1 file changed, 28 insertions(+), 19 deletions(-) diff --git a/midx.c b/midx.c index cdb6c481c7..0d40089c4d 100644 --- a/midx.c +++ b/midx.c @@ -593,6 +593,31 @@ static void midx_fanout_sort(struct midx_fanout *fanout) QSORT(fanout->entries, fanout->nr, midx_oid_compare); } +static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, + struct multi_pack_index *m, + int preferred_pack, + uint32_t cur_fanout) +{ + uint32_t start = 0, end; + uint32_t cur_object; + + if (cur_fanout) + start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); + end = ntohl(m->chunk_oid_fanout[cur_fanout]); + + for (cur_object = start; cur_object < end; cur_object++) { + midx_fanout_grow(fanout, fanout->nr + 1); + nth_midxed_pack_midx_entry(m, + &fanout->entries[fanout->nr], + cur_object); + if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) + fanout->entries[fanout->nr].preferred = 1; + else + fanout->entries[fanout->nr].preferred = 0; + fanout->nr++; + } +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -633,25 +658,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, for (cur_fanout = 0; cur_fanout < 256; cur_fanout++) { fanout.nr = 0; - if (m) { - uint32_t start = 0, end; - - if (cur_fanout) - start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]); - end = ntohl(m->chunk_oid_fanout[cur_fanout]); - - for (cur_object = start; cur_object < end; cur_object++) { - midx_fanout_grow(&fanout, fanout.nr + 1); - nth_midxed_pack_midx_entry(m, - &fanout.entries[fanout.nr], - cur_object); - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - fanout.entries[fanout.nr].preferred = 1; - else - fanout.entries[fanout.nr].preferred = 0; - fanout.nr++; - } - } + if (m) + midx_fanout_add_midx_fanout(&fanout, m, preferred_pack, + cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { uint32_t start = 0, end; -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()` 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau ` (3 preceding siblings ...) 2022-08-22 19:50 ` [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau @ 2022-08-22 19:50 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX Taylor Blau ` (2 subsequent siblings) 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam Extract a routine to add all objects whose object ID's first byte is `cur_fanout` from a given pack (identified by its index into the `struct pack_info` array maintained by the MIDX writing routine). Unlike the previous extraction (for `midx_fanout_add_midx_fanout()`), this function will be called twice, once for all new packs, and again for the preferred pack (if it appears in an existing MIDX). The latter change is to resolve the bug described a few patches ago, and will be made in the subsequent commit. Similar to the previous refactoring, this function also enhances the readability of its caller in `get_sorted_entries()`. Its functionality is unchanged in this commit. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/midx.c b/midx.c index 0d40089c4d..be8186eec2 100644 --- a/midx.c +++ b/midx.c @@ -618,6 +618,31 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, } } +static void midx_fanout_add_pack_fanout(struct midx_fanout *fanout, + struct pack_info *info, + uint32_t cur_pack, + int preferred, + uint32_t cur_fanout) +{ + struct packed_git *pack = info[cur_pack].p; + uint32_t start = 0, end; + uint32_t cur_object; + + if (cur_fanout) + start = get_pack_fanout(pack, cur_fanout - 1); + end = get_pack_fanout(pack, cur_fanout); + + for (cur_object = start; cur_object < end; cur_object++) { + midx_fanout_grow(fanout, fanout->nr + 1); + fill_pack_entry(cur_pack, + info[cur_pack].p, + cur_object, + &fanout->entries[fanout->nr], + preferred); + fanout->nr++; + } +} + /* * It is possible to artificially get into a state where there are many * duplicate copies of objects. That can create high memory pressure if @@ -663,22 +688,10 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { - uint32_t start = 0, end; int preferred = cur_pack == preferred_pack; - - if (cur_fanout) - start = get_pack_fanout(info[cur_pack].p, cur_fanout - 1); - end = get_pack_fanout(info[cur_pack].p, cur_fanout); - - for (cur_object = start; cur_object < end; cur_object++) { - midx_fanout_grow(&fanout, fanout.nr + 1); - fill_pack_entry(cur_pack, - info[cur_pack].p, - cur_object, - &fanout.entries[fanout.nr], - preferred); - fanout.nr++; - } + midx_fanout_add_pack_fanout(&fanout, + info, cur_pack, + preferred, cur_fanout); } midx_fanout_sort(&fanout); -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau ` (4 preceding siblings ...) 2022-08-22 19:50 ` [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau @ 2022-08-22 19:50 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Taylor Blau 2022-08-23 16:23 ` [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee 7 siblings, 0 replies; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam This patch resolves an issue where the object order used to generate a MIDX bitmap would violate an invariant that all of the preferred pack's objects are represented by that pack in the MIDX. The problem arises when reusing an existing MIDX while generating a new one, and occurs specifically when the identity of the preferred pack changes from one MIDX to another, along with a few other conditions: - the new preferred pack must also be present in the existing MIDX - the new preferred pack must *not* have been the preferred pack in the existing MIDX - most importantly, there must be at least one object present in the physical preferred pack (ie., it shows up in that pack's index) but was selected from a *different* pack when the previous MIDX was generated When the above conditions are all met, we end up (incorrectly) discarding copies of some objects in the pack selected as the preferred pack. This is because `get_sorted_entries()` adds objects to its list by doing the following at each fanout level: - first, adding all objects from that fanout level from an existing MIDX - then, adding all objects from that fanout level in each pack *not* included in the existing MIDX So if some object was not selected from the to-be-preferred pack when writing the previous MIDX, then we will never consider it as a candidate when generating the new MIDX. This means that it's possible for the preferred pack to not include all of its objects in the MIDX's pseudo-pack object order, which is an invariant violation of that order. Resolve this by adding all objects from the preferred pack separately when it appears in the existing MIDX (if one was present). This will duplicate objects from that pack that *did* appear in the MIDX, but this is fine, since get_sorted_entries() already handles duplicates. (A future optimization in this area could avoid adding copies of objects that we know already existing in the MIDX.) Note that we no longer need to compute the preferred-ness of objects added from the MIDX, since we only want to select the preferred objects from a single source. (We could still mark these preferred bits, but doing so is redundant and unnecessary). This resolves the bug demonstrated by t5326.174 ("preferred pack change with existing MIDX bitmap"). Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 14 +++++++------- t/t5326-multi-pack-bitmaps.sh | 13 +++++-------- 2 files changed, 12 insertions(+), 15 deletions(-) diff --git a/midx.c b/midx.c index be8186eec2..bd1d27090e 100644 --- a/midx.c +++ b/midx.c @@ -595,7 +595,6 @@ static void midx_fanout_sort(struct midx_fanout *fanout) static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, struct multi_pack_index *m, - int preferred_pack, uint32_t cur_fanout) { uint32_t start = 0, end; @@ -610,10 +609,7 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, nth_midxed_pack_midx_entry(m, &fanout->entries[fanout->nr], cur_object); - if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack) - fanout->entries[fanout->nr].preferred = 1; - else - fanout->entries[fanout->nr].preferred = 0; + fanout->entries[fanout->nr].preferred = 0; fanout->nr++; } } @@ -684,8 +680,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, fanout.nr = 0; if (m) - midx_fanout_add_midx_fanout(&fanout, m, preferred_pack, - cur_fanout); + midx_fanout_add_midx_fanout(&fanout, m, cur_fanout); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { int preferred = cur_pack == preferred_pack; @@ -694,6 +689,11 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, preferred, cur_fanout); } + if (-1 < preferred_pack && preferred_pack < start_pack) + midx_fanout_add_pack_fanout(&fanout, info, + preferred_pack, 1, + cur_fanout); + midx_fanout_sort(&fanout); /* diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh index c364677ae8..89ecd1062c 100755 --- a/t/t5326-multi-pack-bitmaps.sh +++ b/t/t5326-multi-pack-bitmaps.sh @@ -338,19 +338,16 @@ test_expect_success 'preferred pack change with existing MIDX bitmap' ' git pack-objects --all --unpacked $objdir/pack/pack0 && # Generate a new MIDX which changes the preferred pack - # to a pack contained in the existing MIDX, such that - # not all objects from p2 that appear in the MIDX had - # their copy selected from p2. + # to a pack contained in the existing MIDX. git multi-pack-index write --bitmap \ --preferred-pack="pack-$p2.pack" && test_path_is_file $midx && test_path_is_file $midx-$(midx_checksum $objdir).bitmap && - # When the above circumstances are met, an existing bug - # in the MIDX machinery will cause the reverse index to - # be read incorrectly, resulting in failed clones (among - # other things). - test_must_fail git clone --no-local . clone2 + # When the above circumstances are met, the preferred + # pack should change appropriately and clones should + # (still) succeed. + git clone --no-local . clone2 ) ' -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v2 7/7] midx.c: avoid adding preferred objects twice 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau ` (5 preceding siblings ...) 2022-08-22 19:50 ` [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX Taylor Blau @ 2022-08-22 19:50 ` Taylor Blau 2022-08-23 16:22 ` Derrick Stolee 2022-08-23 16:23 ` [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee 7 siblings, 1 reply; 29+ messages in thread From: Taylor Blau @ 2022-08-22 19:50 UTC (permalink / raw) To: git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, derrickstolee, jonathantanmy, kaartic.sivaraam The last commit changes the behavior of midx.c's `get_sorted_objects()` function to handle the case of writing a MIDX bitmap while reusing an existing MIDX and changing the identity of the preferred pack separately. As part of this change, all objects from the (new) preferred pack are added to the fanout table in a separate pass. Since these copies of the objects all have their preferred bits set, any duplicates will be resolved in their favor. Importantly, this includes any copies of those same objects that come from the existing MIDX. We know at the time of adding them that they'll be redundant if their source pack is the (new) preferred one, so we can avoid adding them to the list in this case. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/midx.c b/midx.c index bd1d27090e..148ecc2f14 100644 --- a/midx.c +++ b/midx.c @@ -595,7 +595,8 @@ static void midx_fanout_sort(struct midx_fanout *fanout) static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, struct multi_pack_index *m, - uint32_t cur_fanout) + uint32_t cur_fanout, + int preferred_pack) { uint32_t start = 0, end; uint32_t cur_object; @@ -605,6 +606,15 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, end = ntohl(m->chunk_oid_fanout[cur_fanout]); for (cur_object = start; cur_object < end; cur_object++) { + if ((preferred_pack > -1) && + (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) { + /* + * Objects from preferred packs are added + * separately. + */ + continue; + } + midx_fanout_grow(fanout, fanout->nr + 1); nth_midxed_pack_midx_entry(m, &fanout->entries[fanout->nr], @@ -680,7 +690,8 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m, fanout.nr = 0; if (m) - midx_fanout_add_midx_fanout(&fanout, m, cur_fanout); + midx_fanout_add_midx_fanout(&fanout, m, cur_fanout, + preferred_pack); for (cur_pack = start_pack; cur_pack < nr_packs; cur_pack++) { int preferred = cur_pack == preferred_pack; -- 2.37.0.1.g1379af2e9d ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCH v2 7/7] midx.c: avoid adding preferred objects twice 2022-08-22 19:50 ` [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Taylor Blau @ 2022-08-23 16:22 ` Derrick Stolee 0 siblings, 0 replies; 29+ messages in thread From: Derrick Stolee @ 2022-08-23 16:22 UTC (permalink / raw) To: Taylor Blau, git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On 8/22/2022 3:50 PM, Taylor Blau wrote: > The last commit changes the behavior of midx.c's `get_sorted_objects()` > function to handle the case of writing a MIDX bitmap while reusing an > existing MIDX and changing the identity of the preferred pack > separately. > > As part of this change, all objects from the (new) preferred pack are > added to the fanout table in a separate pass. Since these copies of the > objects all have their preferred bits set, any duplicates will be > resolved in their favor. > > Importantly, this includes any copies of those same objects that come > from the existing MIDX. We know at the time of adding them that they'll > be redundant if their source pack is the (new) preferred one, so we can > avoid adding them to the list in this case. Good call to reduce memory requirements. > @@ -605,6 +606,15 @@ static void midx_fanout_add_midx_fanout(struct midx_fanout *fanout, > end = ntohl(m->chunk_oid_fanout[cur_fanout]); > > for (cur_object = start; cur_object < end; cur_object++) { > + if ((preferred_pack > -1) && > + (preferred_pack == nth_midxed_pack_int_id(m, cur_object))) { nit: you don't need the extra parentheses here. Thanks, -Stolee ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau ` (6 preceding siblings ...) 2022-08-22 19:50 ` [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Taylor Blau @ 2022-08-23 16:23 ` Derrick Stolee 7 siblings, 0 replies; 29+ messages in thread From: Derrick Stolee @ 2022-08-23 16:23 UTC (permalink / raw) To: Taylor Blau, git Cc: gitster, Johannes.Schindelin, chakrabortyabhradeep79, jonathantanmy, kaartic.sivaraam On 8/22/2022 3:50 PM, Taylor Blau wrote: > Here is a small reroll of my series that resolves a bug that was > reported[1] by Johannes, and investigated by him, Abhradeep, and Stolee > in that same sub-thread. > > As before: the crux of the issue is that a MIDX bitmap can enter a > corrupt state when changing the preferred pack from its value in an > existing MIDX in certain circumstances as described in the first and > final patches. > > This version incorporates some cosmetic changes suggested by Stolee, and > adds a new patch on top which avoids adding objects from the MIDX that > were represented by the (new) preferred pack, since we know we'll end up > discarding those objects anyways. For convenience, a range-diff against > v1 is included below. You resolved the comments from the previous version well. I'm happy with the changes. I have one style nit in the new patch, but it doesn't merit a re-roll on its own. Thanks, -Stolee ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2022-08-23 18:12 UTC | newest] Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-08-19 21:30 [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Taylor Blau 2022-08-19 21:30 ` [PATCH 1/6] t5326: demonstrate potential bitmap corruption Taylor Blau 2022-08-22 16:09 ` Derrick Stolee 2022-08-22 17:57 ` Taylor Blau 2022-08-22 19:31 ` Junio C Hamano 2022-08-22 19:41 ` Taylor Blau 2022-08-19 21:30 ` [PATCH 2/6] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau 2022-08-20 16:44 ` Abhradeep Chakraborty 2022-08-22 17:58 ` Taylor Blau 2022-08-19 21:30 ` [PATCH 3/6] midx.c: extract `struct midx_fanout` Taylor Blau 2022-08-19 21:30 ` [PATCH 4/6] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau 2022-08-19 21:30 ` [PATCH 5/6] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau 2022-08-19 21:30 ` [PATCH 6/6] midx.c: include preferred pack correctly with existing MIDX Taylor Blau 2022-08-20 18:40 ` Abhradeep Chakraborty 2022-08-22 18:08 ` Taylor Blau 2022-08-22 17:03 ` Derrick Stolee 2022-08-22 18:14 ` Taylor Blau 2022-08-22 17:04 ` [PATCH 0/6] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee 2022-08-22 19:44 ` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 0/7] " Taylor Blau 2022-08-22 19:50 ` [PATCH v2 1/7] t5326: demonstrate potential bitmap corruption Taylor Blau 2022-08-22 19:50 ` [PATCH v2 2/7] t/lib-bitmap.sh: avoid silencing stderr Taylor Blau 2022-08-22 19:50 ` [PATCH v2 3/7] midx.c: extract `struct midx_fanout` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 4/7] midx.c: extract `midx_fanout_add_midx_fanout()` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 5/7] midx.c: extract `midx_fanout_add_pack_fanout()` Taylor Blau 2022-08-22 19:50 ` [PATCH v2 6/7] midx.c: include preferred pack correctly with existing MIDX Taylor Blau 2022-08-22 19:50 ` [PATCH v2 7/7] midx.c: avoid adding preferred objects twice Taylor Blau 2022-08-23 16:22 ` Derrick Stolee 2022-08-23 16:23 ` [PATCH v2 0/7] midx: permit changing the preferred pack when reusing the MIDX Derrick Stolee
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).