git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: peff@peff.net, avarab@gmail.com, gitster@pobox.com,
	jonathantanmy@google.com, steadmon@google.com
Subject: [PATCH v3 0/9] repack: introduce `--write-midx`
Date: Tue, 28 Sep 2021 21:54:37 -0400	[thread overview]
Message-ID: <cover.1632880469.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1631730270.git.me@ttaylorr.com>

Here is another small reroll of my series to implement the final component of
multi-pack reachability bitamps, which is to be able to write them from `git
repack`.

This version incorporates feedback from Jonathan Tan and others at Google who
graciously provided review. A range-diff is included below, but the major
changes are:

  - A comment to explain the relationship between 'ctx->m' and 'ctx->to_include'
    in midx.c:add_pack_to_midx().

  - A comment to explain the meaning of write_midx_file_only().

  - Clean-up of stray hunks, a missing strbuf_release().

  - Replacing the name `existing_packs` with `existing_nonkept_packs` to
    emphasize the relationship between it and the similarly-named
    `existing_kept_packs`.

Instead of depending on tb/multi-pack-bitmaps, this series now depends on the
tip of master.

Taylor Blau (9):
  midx: expose `write_midx_file_only()` publicly
  builtin/multi-pack-index.c: support `--stdin-packs` mode
  midx: preliminary support for `--refs-snapshot`
  builtin/repack.c: keep track of existing packs unconditionally
  builtin/repack.c: rename variables that deal with non-kept packs
  builtin/repack.c: extract showing progress to a variable
  builtin/repack.c: support writing a MIDX while repacking
  builtin/repack.c: make largest pack preferred
  builtin/repack.c: pass `--refs-snapshot` when writing bitmaps

 Documentation/git-multi-pack-index.txt |  19 ++
 Documentation/git-repack.txt           |  18 +-
 builtin/multi-pack-index.c             |  36 +++-
 builtin/repack.c                       | 279 ++++++++++++++++++++++---
 midx.c                                 | 110 ++++++++--
 midx.h                                 |  15 +-
 pack-bitmap.c                          |   2 +-
 pack-bitmap.h                          |   1 +
 t/helper/test-read-midx.c              |  25 ++-
 t/lib-midx.sh                          |   8 +
 t/t5319-multi-pack-index.sh            |  15 ++
 t/t5326-multi-pack-bitmaps.sh          |  82 ++++++++
 t/t7700-repack.sh                      |  96 +++++++++
 t/t7703-repack-geometric.sh            |  24 ++-
 14 files changed, 674 insertions(+), 56 deletions(-)
 create mode 100644 t/lib-midx.sh

Range-diff against v2:
 1:  03c1b2c4d3 !  1:  b7c72d0bf5 midx: expose `write_midx_file_only()` publicly
    @@ midx.c: struct write_midx_context {
      
      static void add_pack_to_midx(const char *full_path, size_t full_path_len,
     @@ midx.c: static void add_pack_to_midx(const char *full_path, size_t full_path_len,
    + 
    + 	if (ends_with(file_name, ".idx")) {
      		display_progress(ctx->progress, ++ctx->pack_paths_checked);
    ++		/*
    ++		 * Note that at most one of ctx->m and ctx->to_include are set,
    ++		 * so we are testing midx_contains_pack() and
    ++		 * string_list_has_string() independently (guarded by the
    ++		 * appropriate NULL checks).
    ++		 *
    ++		 * We could support passing to_include while reusing an existing
    ++		 * MIDX, but don't currently since the reuse process drags
    ++		 * forward all packs from an existing MIDX (without checking
    ++		 * whether or not they appear in the to_include list).
    ++		 *
    ++		 * If we added support for that, these next two conditional
    ++		 * should be performed independently (likely checking
    ++		 * to_include before the existing MIDX).
    ++		 */
      		if (ctx->m && midx_contains_pack(ctx->m, file_name))
      			return;
     +		else if (ctx->to_include &&
    @@ midx.c: static int write_midx_internal(const char *object_dir,
      		struct bitmap_index *bitmap_git;
      		int bitmap_exists;
      		int want_bitmap = flags & MIDX_WRITE_BITMAP;
    -@@ midx.c: static int write_midx_internal(const char *object_dir,
    - 
    - 	QSORT(ctx.info, ctx.nr, pack_info_compare);
    - 
    --	if (packs_to_drop && packs_to_drop->nr) {
    -+	if (ctx.m && packs_to_drop && packs_to_drop->nr) {
    - 		int drop_index = 0;
    - 		int missing_drops = 0;
    - 
     @@ midx.c: int write_midx_file(const char *object_dir,
      		    const char *preferred_pack_name,
      		    unsigned flags)
    @@ midx.h: int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pa
      int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
      
      int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
    ++/*
    ++ * Variant of write_midx_file which writes a MIDX containing only the packs
    ++ * specified in packs_to_include.
    ++ */
     +int write_midx_file_only(const char *object_dir,
     +			 struct string_list *packs_to_include,
     +			 const char *preferred_pack_name,
 2:  59556e5545 =  2:  986ef14f2a builtin/multi-pack-index.c: support `--stdin-packs` mode
 3:  42f1ae9ede !  3:  4e3769a4f3 midx: preliminary support for `--refs-snapshot`
    @@ Commit message
         commit in the MIDX reaches something that isn't.
     
         This can happen when a multi-pack index contains some pack which refers
    -    to loose objects (which by definition aren't included in the multi-pack
    -    index).
    +    to loose objects (e.g., if a pack was pushed after starting the repack
    +    but before generating the MIDX which depends on an object which is
    +    stored as loose in the repository, and by definition isn't included in
    +    the multi-pack index).
     
         By taking a snapshot of the references before we start repacking, we can
         close that race window. In the above scenario (where we have a packed
    @@ midx.c: static void bitmap_show_commit(struct commit *commit, void *_data)
     +	}
     +
     +	fclose(f);
    ++	strbuf_release(&buf);
     +	return 0;
     +}
     +
    @@ midx.h: int fill_midx_entry(struct repository *r, const struct object_id *oid, s
      int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local);
      
     -int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags);
    + /*
    +  * Variant of write_midx_file which writes a MIDX containing only the packs
    +  * specified in packs_to_include.
    +  */
     +int write_midx_file(const char *object_dir,
     +		    const char *preferred_pack_name,
     +		    const char *refs_snapshot,
 4:  c0d045a9de !  4:  1b3dd331ca builtin/repack.c: keep track of existing packs unconditionally
    @@ Commit message
     
      ## builtin/repack.c ##
     @@ builtin/repack.c: static void remove_pack_on_signal(int signo)
    -  * have a corresponding .keep file. These packs are not to
    -  * be kept if we are going to pack everything into one file.
    + }
    + 
    + /*
    +- * Adds all packs hex strings to the fname list, which do not
    +- * have a corresponding .keep file. These packs are not to
    +- * be kept if we are going to pack everything into one file.
    ++ * Adds all packs hex strings to either fname_list or fname_kept_list
    ++ * based on whether each pack has a corresponding .keep file or not.
    ++ * Packs without a .keep file are not to be kept if we are going to
    ++ * pack everything into one file.
       */
     -static void get_non_kept_pack_filenames(struct string_list *fname_list,
     -					const struct string_list *extra_keep)
 -:  ---------- >  5:  15831a201a builtin/repack.c: rename variables that deal with non-kept packs
 5:  09de03de47 =  6:  1a40161baf builtin/repack.c: extract showing progress to a variable
 6:  83dfdb8b12 !  7:  6854f0751d builtin/repack.c: support writing a MIDX while repacking
    @@ Commit message
         check we can make consistently when (1) telling the MIDX which packs we
         want to exclude, and (2) actually unlinking the redundant packs.
     
    +    There is also a tiny change to short-circuit the body of
    +    write_midx_included_packs() when no packs remain in the case of an empty
    +    repository. The MIDX code does not handle this, so avoid trying to
    +    generate a MIDX covering zero packs in the first place.
    +
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## Documentation/git-repack.txt ##
    @@ builtin/repack.c: static void clear_pack_geometry(struct pack_geometry *geometry
      }
      
     +static void midx_included_packs(struct string_list *include,
    -+				struct string_list *existing_packs,
    ++				struct string_list *existing_nonkept_packs,
     +				struct string_list *existing_kept_packs,
     +				struct string_list *names,
     +				struct pack_geometry *geometry)
    @@ builtin/repack.c: static void clear_pack_geometry(struct pack_geometry *geometry
     +			string_list_insert(include, strbuf_detach(&buf, NULL));
     +		}
     +	} else {
    -+		for_each_string_list_item(item, existing_packs) {
    ++		for_each_string_list_item(item, existing_nonkept_packs) {
     +			if (item->util)
     +				continue;
     +			string_list_insert(include, xstrfmt("%s.idx", item->string));
    @@ builtin/repack.c: static void clear_pack_geometry(struct pack_geometry *geometry
     +	FILE *in;
     +	int ret;
     +
    ++	if (!include->nr)
    ++		return 0;
    ++
     +	cmd.in = -1;
     +	cmd.git_cmd = 1;
     +
    @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix
     +	if (delete_redundant && pack_everything & ALL_INTO_ONE) {
     +		const int hexsz = the_hash_algo->hexsz;
     +		string_list_sort(&names);
    -+		for_each_string_list_item(item, &existing_packs) {
    ++		for_each_string_list_item(item, &existing_nonkept_packs) {
     +			char *sha1;
     +			size_t len = strlen(item->string);
     +			if (len < hexsz)
     +				continue;
     +			sha1 = item->string + len - hexsz;
    ++			/*
    ++			 * Mark this pack for deletion, which ensures that this
    ++			 * pack won't be included in a MIDX (if `--write-midx`
    ++			 * was given) and that we will actually delete this pack
    ++			 * (if `-d` was given).
    ++			 */
     +			item->util = (void*)(intptr_t)!string_list_has_string(&names, sha1);
     +		}
     +	}
     +
     +	if (write_midx) {
     +		struct string_list include = STRING_LIST_INIT_NODUP;
    -+		midx_included_packs(&include, &existing_packs,
    ++		midx_included_packs(&include, &existing_nonkept_packs,
     +				    &existing_kept_packs, &names, geometry);
     +
     +		ret = write_midx_included_packs(&include,
    @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix
     -		if (pack_everything & ALL_INTO_ONE) {
     -			const int hexsz = the_hash_algo->hexsz;
     -			string_list_sort(&names);
    --			for_each_string_list_item(item, &existing_packs) {
    +-			for_each_string_list_item(item, &existing_nonkept_packs) {
     -				char *sha1;
     -				size_t len = strlen(item->string);
     -				if (len < hexsz)
    @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix
     -				if (!string_list_has_string(&names, sha1))
     -					remove_redundant_pack(packdir, item->string);
     -			}
    -+		for_each_string_list_item(item, &existing_packs) {
    ++		for_each_string_list_item(item, &existing_nonkept_packs) {
     +			if (!item->util)
     +				continue;
     +			remove_redundant_pack(packdir, item->string);
    @@ t/t7700-repack.sh: test_expect_success 'auto-bitmaps do not complain if unavaila
     +'
     +
      test_done
    +
    + ## t/t7703-repack-geometric.sh ##
    +@@ t/t7703-repack-geometric.sh: test_expect_success '--geometric with no packs' '
    + 	(
    + 		cd geometric &&
    + 
    +-		git repack --geometric 2 >out &&
    ++		git repack --write-midx --geometric 2 >out &&
    + 		test_i18ngrep "Nothing new to pack" out
    + 	)
    + '
 7:  68bc49d8ae !  8:  3596c76daf builtin/repack.c: make largest pack preferred
    @@ builtin/repack.c: static int write_midx_included_packs(struct string_list *inclu
      	if (ret)
      		return ret;
     @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
    - 		midx_included_packs(&include, &existing_packs,
    + 		midx_included_packs(&include, &existing_nonkept_packs,
      				    &existing_kept_packs, &names, geometry);
      
     -		ret = write_midx_included_packs(&include,
 8:  eb24b308ec !  9:  d99f075321 builtin/repack.c: pass `--refs-snapshot` when writing bitmaps
    @@ builtin/repack.c: static void clear_pack_geometry(struct pack_geometry *geometry
     +}
     +
      static void midx_included_packs(struct string_list *include,
    - 				struct string_list *existing_packs,
    + 				struct string_list *existing_nonkept_packs,
      				struct string_list *existing_kept_packs,
     @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
      
-- 
2.33.0.96.g73915697e6

  parent reply	other threads:[~2021-09-29  1:55 UTC|newest]

Thread overview: 76+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-11  3:32 [PATCH 0/8] repack: introduce `--write-midx` Taylor Blau
2021-09-11  3:32 ` [PATCH 1/8] midx: expose 'write_midx_file_only()' publicly Taylor Blau
2021-09-11  5:00   ` Junio C Hamano
2021-09-11 16:17     ` Taylor Blau
2021-09-11 10:07   ` Ævar Arnfjörð Bjarmason
2021-09-11 16:21     ` Taylor Blau
2021-09-11  3:32 ` [PATCH 2/8] builtin/multi-pack-index.c: support --stdin-packs mode Taylor Blau
2021-09-11 10:05   ` Ævar Arnfjörð Bjarmason
2021-09-11 16:25     ` Taylor Blau
2021-09-11 16:28       ` Taylor Blau
2021-09-12  2:08       ` Eric Sunshine
2021-09-12  2:21         ` Taylor Blau
2021-09-12 15:15           ` Ævar Arnfjörð Bjarmason
2021-09-12 22:30             ` Junio C Hamano
2021-09-12 22:32               ` Ævar Arnfjörð Bjarmason
2021-09-14 19:02       ` Jeff King
2021-09-14 23:48         ` Taylor Blau
2021-09-15  1:55           ` Eric Sunshine
2021-09-11  3:32 ` [PATCH 3/8] midx: preliminary support for `--refs-snapshot` Taylor Blau
2021-09-11  3:32 ` [PATCH 4/8] builtin/repack.c: keep track of existing packs unconditionally Taylor Blau
2021-09-11  3:32 ` [PATCH 5/8] builtin/repack.c: extract showing progress to a variable Taylor Blau
2021-09-11  3:32 ` [PATCH 6/8] builtin/repack.c: support writing a MIDX while repacking Taylor Blau
2021-09-11  3:32 ` [PATCH 7/8] builtin/repack.c: make largest pack preferred Taylor Blau
2021-09-11 10:17   ` Ævar Arnfjörð Bjarmason
2021-09-11 16:35     ` Taylor Blau
2021-09-11  3:32 ` [PATCH 8/8] builtin/repack.c: pass `--refs-snapshot` when writing bitmaps Taylor Blau
2021-09-11 10:27   ` Ævar Arnfjörð Bjarmason
2021-09-11 11:19     ` Ævar Arnfjörð Bjarmason
2021-09-11 16:51       ` Taylor Blau
2021-09-14 18:55         ` Jeff King
2021-09-14 23:34           ` Taylor Blau
2021-09-14 23:56             ` Ævar Arnfjörð Bjarmason
2021-09-15  4:31               ` Taylor Blau
2021-09-11 16:49     ` Taylor Blau
2021-09-15 18:24 ` [PATCH v2 0/8] repack: introduce `--write-midx` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 1/8] midx: expose `write_midx_file_only()` publicly Taylor Blau
2021-09-22 23:14     ` Jonathan Tan
2021-09-23  3:09       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 2/8] builtin/multi-pack-index.c: support `--stdin-packs` mode Taylor Blau
2021-09-22 22:29     ` Josh Steadmon
2021-09-23  2:03       ` Taylor Blau
2021-09-22 23:11     ` Jonathan Tan
2021-09-23  2:06       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 3/8] midx: preliminary support for `--refs-snapshot` Taylor Blau
2021-09-22 22:34     ` Josh Steadmon
2021-09-23  2:08       ` Taylor Blau
2021-09-22 23:00     ` Jonathan Tan
2021-09-23  2:18       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 4/8] builtin/repack.c: keep track of existing packs unconditionally Taylor Blau
2021-09-22 22:56     ` Jonathan Tan
2021-09-23  2:59       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 5/8] builtin/repack.c: extract showing progress to a variable Taylor Blau
2021-09-15 18:24   ` [PATCH v2 6/8] builtin/repack.c: support writing a MIDX while repacking Taylor Blau
2021-09-22 22:39     ` Jonathan Tan
2021-09-23  2:40       ` Taylor Blau
2021-09-15 18:24   ` [PATCH v2 7/8] builtin/repack.c: make largest pack preferred Taylor Blau
2021-09-15 18:24   ` [PATCH v2 8/8] builtin/repack.c: pass `--refs-snapshot` when writing bitmaps Taylor Blau
2021-09-24 18:22     ` Jonathan Tan
2021-10-01 22:38       ` Taylor Blau
2021-09-15 19:22   ` [PATCH v2 0/8] repack: introduce `--write-midx` Junio C Hamano
2021-09-15 19:29     ` Junio C Hamano
2021-09-15 21:19       ` Taylor Blau
2021-09-16 22:16         ` Junio C Hamano
2021-09-29  1:54   ` Taylor Blau [this message]
2021-09-29  1:55     ` [PATCH v3 1/9] midx: expose `write_midx_file_only()` publicly Taylor Blau
2021-09-29  1:55     ` [PATCH v3 2/9] builtin/multi-pack-index.c: support `--stdin-packs` mode Taylor Blau
2021-09-29  1:55     ` [PATCH v3 3/9] midx: preliminary support for `--refs-snapshot` Taylor Blau
2021-09-29  1:55     ` [PATCH v3 4/9] builtin/repack.c: keep track of existing packs unconditionally Taylor Blau
2021-09-29  1:55     ` [PATCH v3 5/9] builtin/repack.c: rename variables that deal with non-kept packs Taylor Blau
2021-09-29  1:55     ` [PATCH v3 6/9] builtin/repack.c: extract showing progress to a variable Taylor Blau
2021-09-29  1:55     ` [PATCH v3 7/9] builtin/repack.c: support writing a MIDX while repacking Taylor Blau
2021-09-29  1:55     ` [PATCH v3 8/9] builtin/repack.c: make largest pack preferred Taylor Blau
2021-09-29  1:55     ` [PATCH v3 9/9] builtin/repack.c: pass `--refs-snapshot` when writing bitmaps Taylor Blau
2021-09-29  4:24     ` [PATCH v3 0/9] repack: introduce `--write-midx` Junio C Hamano
2021-10-01 20:01     ` Jonathan Tan
2021-10-01 22:40       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1632880469.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=peff@peff.net \
    --cc=steadmon@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).