git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Abhradeep Chakraborty <chakrabortyabhradeep79@gmail.com>
Cc: "Derrick Stolee" <derrickstolee@github.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Abhradeep Chakraborty via GitGitGadget" <gitgitgadget@gmail.com>,
	git <git@vger.kernel.org>,
	"Kaartic Sivaram" <kaartic.sivaraam@gmail.com>,
	"Philip Oakley" <philipoakley@iee.email>,
	"Martin Ågren" <martin.agren@gmail.com>
Subject: Re: [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests
Date: Tue, 16 Aug 2022 17:57:57 -0400	[thread overview]
Message-ID: <YvwS5RlEMvgDm93m@nand.local> (raw)
In-Reply-To: <CAPOJW5z99b0_NGBYDbZUvmzbWECJKxGvB4RffoPJYszfFB0cEg@mail.gmail.com>

Hi Abhradeep,

On Sat, Aug 13, 2022 at 04:29:32PM +0530, Abhradeep Chakraborty wrote:
> One thing that really worries me is what if the failure is not related
> to calling `oe_map_new_pack()? I did all my work assuming that this
> function is the culprit. But I don't know if it is.

After much consternation, I was able to rule out `oe_map_new_pack()` as
the culprit.

(Your find that we call `add_packed_git()` with arguments corresponding
to pack(s) that we've already loaded is good, and I think that is
definitely something we can and should consider cleaning up. But it
ultimately doesn't affect correctness, just the memory efficiency of the
process).

When I took a close look at the process to generate MIDX bitmaps, I found a
couple of interesting things. The first more trivial fix is that we
incorrectly propagate the "preferred"-ness bit from packs in an existing
MIDX when generating a new one. If the identity of the preferred pack
changes, we should not drag forward those bits on objects already known
(and preferred) by the existing MIDX:

--- >8 ---

diff --git a/midx.c b/midx.c
index 3ff6e91e6e..40e520534c 100644
--- a/midx.c
+++ b/midx.c
@@ -619,6 +619,9 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,

 		if (m) {
 			uint32_t start = 0, end;
+			int orig_preferred_pack = -1;
+			if (0 <= preferred_pack && preferred_pack < m->num_packs)
+				orig_preferred_pack = info[preferred_pack].orig_pack_int_id;

 			if (cur_fanout)
 				start = ntohl(m->chunk_oid_fanout[cur_fanout - 1]);
@@ -629,7 +632,7 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
 				nth_midxed_pack_midx_entry(m,
 							   &entries_by_fanout[nr_fanout],
 							   cur_object);
-				if (nth_midxed_pack_int_id(m, cur_object) == preferred_pack)
+				if (nth_midxed_pack_int_id(m, cur_object) == orig_preferred_pack)
 					entries_by_fanout[nr_fanout].preferred = 1;
 				else
 					entries_by_fanout[nr_fanout].preferred = 0;

--- 8< ---

But a more interesting problem arose when I took a closer look at the
psuedo-pack order of objects generated according to
`prepare_midx_packing_data()`. With Johannes' fixed $test_tick value, I
was able to see the following in runs that succeeded:

    27bb4ecd3e96cd0b3bc37d92a78cb5cbf34c418afa67f74cc52517ff7df418e1 (12 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
    c68154d69c19f010afce786c6debe926ae6e7decfb946a4549085a792cf9de7e (202 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
    a0b85b314ede46aa9f9b5796a284a4cf0b86ebb8fa32f87ae246e21b5378b11c (392 in pack-63c460f99a5c08f631396b1828c64006170a9d543b064506fd11b504a62acf52.idx)
    [...]

and the following in runs that failed:

    46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (221 in pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx)
    67df8a01ac84cf5f028855c48384eac3336bb02a52603bac285c4b31d66b3ab5 (12 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx)
    1556b5f0ad7cb0c25a1fc47355fcffc00775e90d94ae8c511e5776b204796ce6 (200 in pack-2021cdedb33b542b244eacf3d009d1384471a53286b0c1235c91d124355dc818.idx)

In the successful case, pack 63c460f99a... is preferred, and its objects
appear in ascending order of their pack offsets. But in the other case,
pack 3fc052de67... is preferred, but its first object starts at offset
221. Huh? That's not right:

    $ git show-index <.git/objects/pack/pack-3fc052de674e3d48096af7cc5125675c0ae1082aa798eb9358de357b2655f9ad.idx
    221 46193a971f5045cb3ca6022957541f9ccddfbfe78591d8506e2d952f8113059b (1f4bd28e)
    12 4d332072f161629ffe4652ecd3ce377ef88447bec73f05ab0f3515f98bd061cf (fadf885b)

Indeed, there is another object there at offset 12. Missing that object
(since it comes from a preferred pack) is an invariant violation (since
all objects from the preferred pack should be selected when multiple
copies are available).

It's missing because the existing MIDX selects that object from a
different pack, and when we get to fanout 0x4d (the one which should
include that object), we skip over seeing its copy in the preferred pack
because that pack already appears in the existing MIDX, though it wasn't
preferred.

I think there are a couple of ways to fix this. The easiest thing to do
would be to force the identity of the preferred pack to be the same when
generating a MIDX bitmap *while reusing an existing MIDX*, since that is
the only time this bug can happen.

But that's a little magical for my taste. I think a more reasonable fix
would be to include copies of all objects from the preferred pack
including in the case where that pack was non-preferred in an existing
MIDX and at least one object in that pack was selected from a different
pack in the existing MIDX.

Abhradeep -- let me know if this is something you want to look into. I
think it's a very worthwhile bug to fix, since it is definitely
trigger-able in the wild (notably, only with `git multi-pack-index write
--bitmap` without `--stdin-packs` and only under certain circumstances),
and not just limited to SHA-256 mode.

If you are busy experimenting with CRoaring, that's no problem and I can
fix this up, too. Either way, it would be worth you and others weighing
in on which fix you think is worth pursuing.

Phew.

Thanks,
Taylor

  reply	other threads:[~2022-08-16 21:58 UTC|newest]

Thread overview: 162+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-20 12:33 [PATCH 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-06-20 12:33 ` [PATCH 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-06-20 16:56   ` Derrick Stolee
2022-06-20 17:09     ` Taylor Blau
2022-06-21  8:31       ` Abhradeep Chakraborty
2022-06-22 16:26         ` Taylor Blau
2022-06-21  8:23     ` Abhradeep Chakraborty
2022-06-20 17:21   ` Taylor Blau
2022-06-21  9:22     ` Abhradeep Chakraborty
2022-06-22 16:29       ` Taylor Blau
2022-06-22 16:45         ` Abhradeep Chakraborty
2022-06-20 20:21   ` Derrick Stolee
2022-06-21 10:08     ` Abhradeep Chakraborty
2022-06-22 16:30       ` Taylor Blau
2022-06-20 12:33 ` [PATCH 2/6] pack-bitmap: prepare to read " Abhradeep Chakraborty via GitGitGadget
2022-06-20 20:49   ` Derrick Stolee
2022-06-21 10:28     ` Abhradeep Chakraborty
2022-06-20 22:06   ` Taylor Blau
2022-06-21 11:52     ` Abhradeep Chakraborty
2022-06-22 16:49       ` Taylor Blau
2022-06-22 17:18         ` Abhradeep Chakraborty
2022-06-22 21:34           ` Taylor Blau
2022-06-20 12:33 ` [PATCH 3/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-06-20 22:16   ` Taylor Blau
2022-06-21 12:50     ` Abhradeep Chakraborty
2022-06-22 16:51       ` Taylor Blau
2022-06-20 12:33 ` [PATCH 4/6] builtin/pack-objects.c: learn pack.writeBitmapLookupTable Taylor Blau via GitGitGadget
2022-06-20 22:18   ` Taylor Blau
2022-06-20 12:33 ` [PATCH 5/6] bitmap-commit-table: add tests for the bitmap lookup table Abhradeep Chakraborty via GitGitGadget
2022-06-22 16:54   ` Taylor Blau
2022-06-20 12:33 ` [PATCH 6/6] bitmap-lookup-table: add performance tests Abhradeep Chakraborty via GitGitGadget
2022-06-22 17:14   ` Taylor Blau
2022-06-26 13:10 ` [PATCH v2 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-06-26 13:10   ` [PATCH v2 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-06-27 14:18     ` Derrick Stolee
2022-06-27 15:48       ` Taylor Blau
2022-06-27 16:51       ` Abhradeep Chakraborty
2022-06-26 13:10   ` [PATCH v2 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-06-27 14:35     ` Derrick Stolee
2022-06-27 16:12       ` Taylor Blau
2022-06-27 17:10       ` Abhradeep Chakraborty
2022-06-27 16:05     ` Taylor Blau
2022-06-27 18:29       ` Abhradeep Chakraborty
2022-06-26 13:10   ` [PATCH v2 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-06-27 14:43     ` Derrick Stolee
2022-06-27 17:42       ` Abhradeep Chakraborty
2022-06-27 17:49         ` Taylor Blau
2022-06-27 17:47     ` Taylor Blau
2022-06-27 18:39       ` Abhradeep Chakraborty
2022-06-29 20:11         ` Taylor Blau
2022-06-26 13:10   ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-06-27 15:12     ` Derrick Stolee
2022-06-27 18:06       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
2022-06-27 18:32         ` Derrick Stolee
2022-06-27 21:49       ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Taylor Blau
2022-06-28  8:59         ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table Abhradeep Chakraborty
2022-06-29 20:22           ` Taylor Blau
2022-06-30  6:58             ` [PATCH v2 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty
2022-06-27 21:38     ` Taylor Blau
2022-06-28 19:25       ` Abhradeep Chakraborty
2022-06-29 20:37         ` Taylor Blau
2022-06-29 20:41           ` Taylor Blau
2022-06-30  8:35           ` Abhradeep Chakraborty
2022-06-26 13:10   ` [PATCH v2 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-06-27 21:53     ` Taylor Blau
2022-06-28  7:58       ` Abhradeep Chakraborty
2022-06-29 20:40         ` Taylor Blau
2022-06-26 13:10   ` [PATCH v2 6/6] p5310-pack-bitmaps.sh: enable pack.writeReverseIndex for testing Abhradeep Chakraborty via GitGitGadget
2022-06-27 21:50     ` Taylor Blau
2022-06-28  8:01       ` Abhradeep Chakraborty
2022-07-04  8:46   ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-07-04  8:46     ` [PATCH v3 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-08 16:38       ` Philip Oakley
2022-07-09  7:53         ` Abhradeep Chakraborty
2022-07-10 15:01           ` Philip Oakley
2022-07-14 23:15             ` Taylor Blau
2022-07-15 10:36               ` Philip Oakley
2022-07-15 18:48             ` Abhradeep Chakraborty
2022-07-04  8:46     ` [PATCH v3 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-07-14 23:26       ` Taylor Blau
2022-07-15  2:22       ` Taylor Blau
2022-07-15 15:58         ` Abhradeep Chakraborty
2022-07-15 22:15           ` Taylor Blau
2022-07-16 11:50             ` Abhradeep Chakraborty
2022-07-26  0:34               ` Taylor Blau
2022-07-18  8:59       ` Martin Ågren
2022-07-04  8:46     ` [PATCH v3 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-07-04  8:46     ` [PATCH v3 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-15  2:46       ` Taylor Blau
2022-07-15 16:38         ` Abhradeep Chakraborty
2022-07-15 22:20           ` Taylor Blau
2022-07-18  9:06             ` Martin Ågren
2022-07-18 19:25               ` Abhradeep Chakraborty
2022-07-18 23:26                 ` Martin Ågren
2022-07-26  0:45               ` Taylor Blau
2022-07-04  8:46     ` [PATCH v3 5/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-07-15  2:53       ` Taylor Blau
2022-07-15 18:23         ` Abhradeep Chakraborty
2022-07-04  8:46     ` [PATCH v3 6/6] p5310-pack-bitmaps.sh: remove pack.writeReverseIndex Abhradeep Chakraborty via GitGitGadget
2022-07-04 16:35     ` [PATCH v3 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty
2022-07-06 19:21     ` Junio C Hamano
2022-07-07  8:48       ` Abhradeep Chakraborty
2022-07-07 18:09         ` Kaartic Sivaraam
2022-07-07 18:42           ` Abhradeep Chakraborty
2022-07-20 14:05     ` [PATCH v4 " Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
2022-07-20 14:05       ` [PATCH v4 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-07-20 18:38       ` [PATCH v5 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-07-20 18:38         ` [PATCH v5 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-20 18:38         ` [PATCH v5 2/6] pack-bitmap-write.c: write " Abhradeep Chakraborty via GitGitGadget
2022-07-26  0:52           ` Taylor Blau
2022-07-26 18:22             ` Abhradeep Chakraborty
2022-07-20 18:38         ` [PATCH v5 3/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-07-28 19:22           ` Johannes Schindelin
2022-08-02 12:40             ` Abhradeep Chakraborty
2022-08-02 15:35               ` Johannes Schindelin
2022-08-02 17:44                 ` Abhradeep Chakraborty
2022-08-08 13:06                   ` Johannes Schindelin
2022-08-08 13:58                     ` Abhradeep Chakraborty
2022-08-09  9:03                       ` Johannes Schindelin
2022-08-09 12:03                         ` Abhradeep Chakraborty
2022-08-09 12:07                           ` Abhradeep Chakraborty
2022-08-10  9:09                           ` Johannes Schindelin
2022-08-10  9:20                             ` Johannes Schindelin
2022-08-10 10:04                               ` Abhradeep Chakraborty
2022-08-10 17:51                                 ` Derrick Stolee
2022-08-12 18:51                                   ` Abhradeep Chakraborty
2022-08-12 19:22                                     ` Derrick Stolee
2022-08-13 10:59                                       ` Abhradeep Chakraborty
2022-08-16 21:57                                         ` Taylor Blau [this message]
2022-08-17 10:02                                           ` Abhradeep Chakraborty
2022-08-17 20:38                                             ` Taylor Blau
2022-08-19 21:49                                               ` Taylor Blau
2022-08-13 11:05                               ` Abhradeep Chakraborty
2022-08-16 18:47                             ` Taylor Blau
2022-07-20 18:38         ` [PATCH v5 4/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-07-26  1:13           ` Taylor Blau
2022-07-26 18:56             ` Abhradeep Chakraborty
2022-07-26 19:36             ` Eric Sunshine
2022-07-20 18:38         ` [PATCH v5 5/6] p5310-pack-bitmaps.sh: enable `pack.writeReverseIndex` Abhradeep Chakraborty via GitGitGadget
2022-07-26  1:18           ` Taylor Blau
2022-07-26  7:15             ` Ævar Arnfjörð Bjarmason
2022-07-26 13:32               ` Derrick Stolee
2022-07-26 13:54                 ` Ævar Arnfjörð Bjarmason
2022-07-26 18:17                   ` Abhradeep Chakraborty
2022-07-20 18:38         ` [PATCH v5 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55         ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 1/6] Documentation/technical: describe bitmap lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 2/6] bitmap: move `get commit positions` code to `bitmap_writer_finish` Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 3/6] pack-bitmap-write.c: write lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 4/6] pack-bitmap-write: learn pack.writeBitmapLookupTable and add tests Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 5/6] pack-bitmap: prepare to read lookup table extension Abhradeep Chakraborty via GitGitGadget
2022-08-14 16:55           ` [PATCH v6 6/6] bitmap-lookup-table: add performance tests for lookup table Abhradeep Chakraborty via GitGitGadget
2022-08-19 21:21           ` [PATCH v6 0/6] [GSoC] bitmap: integrate a lookup table extension to the bitmap format Junio C Hamano
2022-08-22 14:42             ` Johannes Schindelin
2022-08-22 14:48               ` Taylor Blau
2022-08-25 22:16           ` Taylor Blau
2022-08-26 16:02             ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YvwS5RlEMvgDm93m@nand.local \
    --to=me@ttaylorr.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=chakrabortyabhradeep79@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=kaartic.sivaraam@gmail.com \
    --cc=martin.agren@gmail.com \
    --cc=philipoakley@iee.email \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).