git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, gitster@pobox.com, avarab@gmail.com
Subject: [PATCH v2 3/8] midx: preliminary support for `--refs-snapshot`
Date: Wed, 15 Sep 2021 14:24:40 -0400	[thread overview]
Message-ID: <42f1ae9edeb55d8e1b0c8f2c3110e9c8326a5fc1.1631730270.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1631730270.git.me@ttaylorr.com>

To figure out which commits we can write a bitmap for, the multi-pack
index/bitmap code does a reachability traversal, marking any commit
which can be found in the MIDX as eligible to receive a bitmap.

This approach will cause a problem when multi-pack bitmaps are able to
be generated from `git repack`, since the reference tips can change
during the repack. Even though we ignore commits that don't exist in
the MIDX (when doing a scan of the ref tips), it's possible that a
commit in the MIDX reaches something that isn't.

This can happen when a multi-pack index contains some pack which refers
to loose objects (which by definition aren't included in the multi-pack
index).

By taking a snapshot of the references before we start repacking, we can
close that race window. In the above scenario (where we have a packed
object pointing at a loose one), we'll either (a) take a snapshot of the
references before seeing the packed one, or (b) take it after, at which
point we can guarantee that the loose object will be packed and included
in the MIDX.

This patch does just that. It writes a temporary "reference snapshot",
which is a list of OIDs that are at the ref tips before writing a
multi-pack bitmap. References that are "preferred" (i.e,. are a suffix
of at least one value of the 'pack.preferBitmapTips' configuration) are
marked with a special '+'.

The format is simple: one line per commit at each tip, with an optional
'+' at the beginning (for preferred references, as described above).

When provided, the reference snapshot is used to drive bitmap selection
instead of the MIDX code doing its own traversal. When it isn't
provided, the usual traversal takes place instead.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt | 15 +++++
 builtin/multi-pack-index.c             | 11 +++-
 builtin/repack.c                       |  2 +-
 midx.c                                 | 60 ++++++++++++++++---
 midx.h                                 |  6 +-
 t/t5326-multi-pack-bitmaps.sh          | 82 ++++++++++++++++++++++++++
 6 files changed, 163 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 009c989ef8..27f83932e4 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -49,6 +49,21 @@ write::
 	--stdin-packs::
 		Write a multi-pack index containing only the set of
 		line-delimited pack index basenames provided over stdin.
+
+	--refs-snapshot=<path>::
+		With `--bitmap`, optionally specify a file which
+		contains a "refs snapshot" taken prior to repacking.
++
+A reference snapshot is composed of line-delimited OIDs corresponding to
+the reference tips, usually taken by `git repack` prior to generating a
+new pack. A line may optionally start with a `+` character to indicate
+that the reference which corresponds to that OID is "preferred" (see
+linkgit:git-config[1]'s `pack.preferBitmapTips`.)
++
+The file given at `<path>` is expected to be readable, and can contain
+duplicates. (If a given OID is given more than once, it is marked as
+preferred if at least one instance of it begins with the special `+`
+marker).
 --
 
 verify::
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 047647b5f2..4b827a07c0 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -7,7 +7,8 @@
 #include "object-store.h"
 
 #define BUILTIN_MIDX_WRITE_USAGE \
-	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]")
+	N_("git multi-pack-index [<options>] write [--preferred-pack=<pack>]" \
+	   "[--refs-snapshot=<path>]")
 
 #define BUILTIN_MIDX_VERIFY_USAGE \
 	N_("git multi-pack-index [<options>] verify")
@@ -45,6 +46,7 @@ static char const * const builtin_multi_pack_index_usage[] = {
 static struct opts_multi_pack_index {
 	const char *object_dir;
 	const char *preferred_pack;
+	const char *refs_snapshot;
 	unsigned long batch_size;
 	unsigned flags;
 	int stdin_packs;
@@ -83,6 +85,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_BOOL(0, "stdin-packs", &opts.stdin_packs,
 			 N_("write multi-pack index containing only given indexes")),
+		OPT_FILENAME(0, "refs-snapshot", &opts.refs_snapshot,
+			     N_("refs snapshot for selecting bitmap commits")),
 		OPT_END(),
 	};
 
@@ -106,7 +110,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		read_packs_from_stdin(&packs);
 
 		ret = write_midx_file_only(opts.object_dir, &packs,
-					   opts.preferred_pack, opts.flags);
+					   opts.preferred_pack,
+					   opts.refs_snapshot, opts.flags);
 
 		string_list_clear(&packs, 0);
 
@@ -114,7 +119,7 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 
 	}
 	return write_midx_file(opts.object_dir, opts.preferred_pack,
-			       opts.flags);
+			       opts.refs_snapshot, opts.flags);
 }
 
 static int cmd_multi_pack_index_verify(int argc, const char **argv)
diff --git a/builtin/repack.c b/builtin/repack.c
index 82ab668272..27158a897b 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -733,7 +733,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		unsigned flags = 0;
 		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
 			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
-		write_midx_file(get_object_directory(), NULL, flags);
+		write_midx_file(get_object_directory(), NULL, NULL, flags);
 	}
 
 	string_list_clear(&names, 0);
diff --git a/midx.c b/midx.c
index 0330202fda..97ba3421f2 100644
--- a/midx.c
+++ b/midx.c
@@ -968,7 +968,42 @@ static void bitmap_show_commit(struct commit *commit, void *_data)
 	data->commits[data->commits_nr++] = commit;
 }
 
+static int read_refs_snapshot(const char *refs_snapshot,
+			      struct rev_info *revs)
+{
+	struct strbuf buf = STRBUF_INIT;
+	struct object_id oid;
+	FILE *f = xfopen(refs_snapshot, "r");
+
+	while (strbuf_getline(&buf, f) != EOF) {
+		struct object *object;
+		int preferred = 0;
+		char *hex = buf.buf;
+		const char *end = NULL;
+
+		if (buf.len && *buf.buf == '+') {
+			preferred = 1;
+			hex = &buf.buf[1];
+		}
+
+		if (parse_oid_hex(hex, &oid, &end) < 0)
+			die(_("could not parse line: %s"), buf.buf);
+		if (*end)
+			die(_("malformed line: %s"), buf.buf);
+
+		object = parse_object_or_die(&oid, NULL);
+		if (preferred)
+			object->flags |= NEEDS_BITMAP;
+
+		add_pending_object(revs, object, "");
+	}
+
+	fclose(f);
+	return 0;
+}
+
 static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    const char *refs_snapshot,
 						    struct write_midx_context *ctx)
 {
 	struct rev_info revs;
@@ -977,8 +1012,12 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
 	cb.ctx = ctx;
 
 	repo_init_revisions(the_repository, &revs, NULL);
-	setup_revisions(0, NULL, &revs, NULL);
-	for_each_ref(add_ref_to_pending, &revs);
+	if (refs_snapshot) {
+		read_refs_snapshot(refs_snapshot, &revs);
+	} else {
+		setup_revisions(0, NULL, &revs, NULL);
+		for_each_ref(add_ref_to_pending, &revs);
+	}
 
 	/*
 	 * Skipping promisor objects here is intentional, since it only excludes
@@ -1007,6 +1046,7 @@ static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr
 
 static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 			     struct write_midx_context *ctx,
+			     const char *refs_snapshot,
 			     unsigned flags)
 {
 	struct packing_data pdata;
@@ -1018,7 +1058,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
 
 	prepare_midx_packing_data(&pdata, ctx);
 
-	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
+	commits = find_commits_for_midx_bitmap(&commits_nr, refs_snapshot, ctx);
 
 	/*
 	 * Build the MIDX-order index based on pdata.objects (which is already
@@ -1066,6 +1106,7 @@ static int write_midx_internal(const char *object_dir,
 			       struct string_list *packs_to_include,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
+			       const char *refs_snapshot,
 			       unsigned flags)
 {
 	char *midx_name;
@@ -1359,7 +1400,8 @@ static int write_midx_internal(const char *object_dir,
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
-		if (write_midx_bitmap(midx_name, midx_hash, &ctx, flags) < 0) {
+		if (write_midx_bitmap(midx_name, midx_hash, &ctx,
+				      refs_snapshot, flags) < 0) {
 			error(_("could not write multi-pack bitmap"));
 			result = 1;
 			goto cleanup;
@@ -1394,19 +1436,21 @@ static int write_midx_internal(const char *object_dir,
 
 int write_midx_file(const char *object_dir,
 		    const char *preferred_pack_name,
+		    const char *refs_snapshot,
 		    unsigned flags)
 {
 	return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name,
-				   flags);
+				   refs_snapshot, flags);
 }
 
 int write_midx_file_only(const char *object_dir,
 			 struct string_list *packs_to_include,
 			 const char *preferred_pack_name,
+			 const char *refs_snapshot,
 			 unsigned flags)
 {
 	return write_midx_internal(object_dir, packs_to_include, NULL,
-				   preferred_pack_name, flags);
+				   preferred_pack_name, refs_snapshot, flags);
 }
 
 struct clear_midx_data {
@@ -1686,7 +1730,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla
 	free(count);
 
 	if (packs_to_drop.nr) {
-		result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, flags);
+		result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, NULL, flags);
 		m = NULL;
 	}
 
@@ -1877,7 +1921,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 		goto cleanup;
 	}
 
-	result = write_midx_internal(object_dir, NULL, NULL, NULL, flags);
+	result = write_midx_internal(object_dir, NULL, NULL, NULL, NULL, flags);
 	m = NULL;
 
 cleanup:
diff --git a/midx.h b/midx.h
index 80f502d39b..c0b4e21df4 100644
--- a/midx.h
+++ b/midx.h
@@ -62,10 +62,14 @@ int fill_midx_entry(struct repository *r, const struct object_id *oid, struct pa
 int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name);
 int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
 
-int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
+int write_midx_file(const char *object_dir,
+		    const char *preferred_pack_name,
+		    const char *refs_snapshot,
+		    unsigned flags);
 int write_midx_file_only(const char *object_dir,
 			 struct string_list *packs_to_include,
 			 const char *preferred_pack_name,
+			 const char *refs_snapshot,
 			 unsigned flags);
 void clear_midx_file(struct repository *r);
 int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags);
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 4ad7c2c969..069dab3e17 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -283,4 +283,86 @@ test_expect_success 'pack.preferBitmapTips' '
 	)
 '
 
+test_expect_success 'writing a bitmap with --refs-snapshot' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit one &&
+		test_commit two &&
+
+		git rev-parse one >snapshot &&
+
+		git repack -ad &&
+
+		# First, write a MIDX which see both refs/tags/one and
+		# refs/tags/two (causing both of those commits to receive
+		# bitmaps).
+		git multi-pack-index write --bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		grep "$(git rev-parse one)" bitmaps &&
+		grep "$(git rev-parse two)" bitmaps &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		# Then again, but with a refs snapshot which only sees
+		# refs/tags/one.
+		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		grep "$(git rev-parse one)" bitmaps &&
+		! grep "$(git rev-parse two)" bitmaps
+	)
+'
+
+test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk --message="%s" 103 &&
+
+		git log --format="%H" >commits.raw &&
+		sort <commits.raw >commits &&
+
+		git log --format="create refs/tags/%s %H" HEAD >refs &&
+		git update-ref --stdin <refs &&
+
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >before &&
+		test_line_count = 1 before &&
+
+		(
+			grep -vf before commits.raw &&
+			# mark missing commits as preferred
+			sed "s/^/+/" before
+		) >snapshot &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >after &&
+
+		! test_cmp before after
+	)
+'
+
 test_done
-- 
2.33.0.96.g73915697e6


  parent reply	other threads:[~2021-09-15 18:24 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-11  3:32 [PATCH 0/8] repack: introduce `--write-midx` Taylor Blau
2021-09-11  3:32 ` [PATCH 1/8] midx: expose 'write_midx_file_only()' publicly Taylor Blau
2021-09-11  5:00   ` Junio C Hamano
2021-09-11 16:17     ` Taylor Blau
2021-09-11 10:07   ` Ævar Arnfjörð Bjarmason
2021-09-11 16:21     ` Taylor Blau
2021-09-11  3:32 ` [PATCH 2/8] builtin/multi-pack-index.c: support --stdin-packs mode Taylor Blau
2021-09-11 10:05   ` Ævar Arnfjörð Bjarmason
2021-09-11 16:25     ` Taylor Blau
2021-09-11 16:28       ` Taylor Blau
2021-09-12  2:08       ` Eric Sunshine
2021-09-12  2:21         ` Taylor Blau
2021-09-12 15:15           ` Ævar Arnfjörð Bjarmason
2021-09-12 22:30             ` Junio C Hamano
2021-09-12 22:32               ` Ævar Arnfjörð Bjarmason
2021-09-14 19:02       ` Jeff King
2021-09-14 23:48         ` Taylor Blau
2021-09-15  1:55           ` Eric Sunshine
2021-09-11  3:32 ` [PATCH 3/8] midx: preliminary support for `--refs-snapshot` Taylor Blau
2021-09-11  3:32 ` [PATCH 4/8] builtin/repack.c: keep track of existing packs unconditionally Taylor Blau
2021-09-11  3:32 ` [PATCH 5/8] builtin/repack.c: extract showing progress to a variable Taylor Blau
2021-09-11  3:32 ` [PATCH 6/8] builtin/repack.c: support writing a MIDX while repacking Taylor Blau
2021-09-11  3:32 ` [PATCH 7/8] builtin/repack.c: make largest pack preferred Taylor Blau
2021-09-11 10:17   ` Ævar Arnfjörð Bjarmason
2021-09-11 16:35     ` Taylor Blau
2021-09-11  3:32 ` [PATCH 8/8] builtin/repack.c: pass `--refs-snapshot` when writing bitmaps Taylor Blau
2021-09-11 10:27   ` Ævar Arnfjörð Bjarmason
2021-09-11 11:19     ` Ævar Arnfjörð Bjarmason
2021-09-11 16:51       ` Taylor Blau
2021-09-14 18:55         ` Jeff King
2021-09-14 23:34           ` Taylor Blau
2021-09-14 23:56             ` Ævar Arnfjörð Bjarmason
2021-09-15  4:31               ` Taylor Blau
2021-09-11 16:49     ` Taylor Blau
2021-09-15 18:24 ` [PATCH v2 0/8] repack: introduce `--write-midx` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 1/8] midx: expose `write_midx_file_only()` publicly Taylor Blau
2021-09-22 23:14     ` Jonathan Tan
2021-09-23  3:09       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 2/8] builtin/multi-pack-index.c: support `--stdin-packs` mode Taylor Blau
2021-09-22 22:29     ` Josh Steadmon
2021-09-23  2:03       ` Taylor Blau
2021-09-22 23:11     ` Jonathan Tan
2021-09-23  2:06       ` Taylor Blau
2021-09-15 18:24   ` Taylor Blau [this message]
2021-09-22 22:34     ` [PATCH v2 3/8] midx: preliminary support for `--refs-snapshot` Josh Steadmon
2021-09-23  2:08       ` Taylor Blau
2021-09-22 23:00     ` Jonathan Tan
2021-09-23  2:18       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 4/8] builtin/repack.c: keep track of existing packs unconditionally Taylor Blau
2021-09-22 22:56     ` Jonathan Tan
2021-09-23  2:59       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 5/8] builtin/repack.c: extract showing progress to a variable Taylor Blau
2021-09-15 18:24   ` [PATCH v2 6/8] builtin/repack.c: support writing a MIDX while repacking Taylor Blau
2021-09-22 22:39     ` Jonathan Tan
2021-09-23  2:40       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 7/8] builtin/repack.c: make largest pack preferred Taylor Blau
2021-09-15 18:24   ` [PATCH v2 8/8] builtin/repack.c: pass `--refs-snapshot` when writing bitmaps Taylor Blau
2021-09-24 18:22     ` Jonathan Tan
2021-10-01 22:38       ` Taylor Blau
2021-09-15 19:22   ` [PATCH v2 0/8] repack: introduce `--write-midx` Junio C Hamano
2021-09-15 19:29     ` Junio C Hamano
2021-09-15 21:19       ` Taylor Blau
2021-09-16 22:16         ` Junio C Hamano
2021-09-29  1:54   ` [PATCH v3 0/9] " Taylor Blau
2021-09-29  1:55     ` [PATCH v3 1/9] midx: expose `write_midx_file_only()` publicly Taylor Blau
2021-09-29  1:55     ` [PATCH v3 2/9] builtin/multi-pack-index.c: support `--stdin-packs` mode Taylor Blau
2021-09-29  1:55     ` [PATCH v3 3/9] midx: preliminary support for `--refs-snapshot` Taylor Blau
2021-09-29  1:55     ` [PATCH v3 4/9] builtin/repack.c: keep track of existing packs unconditionally Taylor Blau
2021-09-29  1:55     ` [PATCH v3 5/9] builtin/repack.c: rename variables that deal with non-kept packs Taylor Blau
2021-09-29  1:55     ` [PATCH v3 6/9] builtin/repack.c: extract showing progress to a variable Taylor Blau
2021-09-29  1:55     ` [PATCH v3 7/9] builtin/repack.c: support writing a MIDX while repacking Taylor Blau
2021-09-29  1:55     ` [PATCH v3 8/9] builtin/repack.c: make largest pack preferred Taylor Blau
2021-09-29  1:55     ` [PATCH v3 9/9] builtin/repack.c: pass `--refs-snapshot` when writing bitmaps Taylor Blau
2021-09-29  4:24     ` [PATCH v3 0/9] repack: introduce `--write-midx` Junio C Hamano
2021-10-01 20:01     ` Jonathan Tan
2021-10-01 22:40       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=42f1ae9edeb55d8e1b0c8f2c3110e9c8326a5fc1.1631730270.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).