From: "Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: Son Luong Ngoc <sluongng@gmail.com>
Subject: [PATCH v2 0/2] midx: apply gitconfig to midx repack
Date: Wed, 06 May 2020 09:43:12 +0000
Message-ID: <pull.626.v2.git.1588758194.gitgitgadget@gmail.com>
In-Reply-To: <pull.626.git.1588684003766.gitgitgadget@gmail.com>

Midx repack has largely been used on the client side, in Microsoft Scalar, to
optimize a repository's multi-pack state. However, when I tried to apply it on
the server side, I realized that it lacks certain features compared to git
repack. Most of these features are highly desirable on the server side for
creating the most optimized pack possible.

One example is delta_base_offset: comparing a midx repack with and without
delta_base_offset shows a significant size difference (about 3% in the run
below).

> du objects/pack/*pack
14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack

Latest 2.26.2 (without delta_base_offset)
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack

With delta_base_offset
> git version
git version 2.26.2.672.g232c24e857.dirty
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack
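
As a rough check on the two du figures above, the relative saving can be computed directly. The helper name `size_reduction_percent` is hypothetical and used only for illustration; the inputs are whatever block counts du reports.

```c
/*
 * Percentage by which `after` is smaller than `before`.
 * Both values are du block counts; the unit cancels out.
 * Returns 0.0 when there is no reduction or `before` is zero.
 */
double size_reduction_percent(unsigned long before, unsigned long after)
{
	if (before == 0 || after >= before)
		return 0.0;
	return 100.0 * (double)(before - after) / (double)before;
}
```

Called with 9446096 (without delta_base_offset) and 9152512 (with it), this gives a saving of roughly 3.1%.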

In this patch, I am intentionally leaving out repack.writeBitmaps, as I think
pack-objects might first need some updates to improve performance.

Derrick Stolee's following patch will address repack.packKeptObjects support.

Derrick Stolee (1):
  multi-pack-index: respect repack.packKeptObjects=false

Son Luong Ngoc (1):
  midx: apply gitconfig to midx repack

 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 36 ++++++++++++++++++++++----
 t/t5319-multi-pack-index.sh            | 26 +++++++++++++++++++
 3 files changed, 60 insertions(+), 5 deletions(-)


base-commit: b34789c0b0d3b137f0bb516b417bd8d75e0cb306
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/626

Range-diff vs v1:

 1:  215c882a503 ! 1:  21c648cc486 midx: apply gitconfig to midx repack
     @@ Commit message
          In this patch, I applies those flags into `git multi-pack-index repack`
          so that it respect the `repack.*` config series.
      
     -    Note: I left out `repack.packKeptObjects` intentionally as I dont think
     -    its relevant to midx repack use case.
     +    Note:
     +    - `repack.packKeptObjects` will be addressed by Derrick Stolee in
     +    the following patch
     +    - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
     +    requires `--all` to be passed onto `git pack-objects`, which is very
     +    slow. I think it would be nice to have this in a future patch.
      
          Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
      
       ## midx.c ##
     -@@ midx.c: static int fill_included_packs_batch(struct repository *r,
     - 	return 0;
     - }
     +@@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
     + 	struct child_process cmd = CHILD_PROCESS_INIT;
     + 	struct strbuf base_name = STRBUF_INIT;
     + 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
     ++	int delta_base_offset = 1;
     ++	int use_delta_islands;
       
     -+static int delta_base_offset = 1;
     -+static int write_bitmaps = -1;
     -+static int use_delta_islands;
     -+
     - int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
     - {
     - 	int result = 0;
     + 	if (!m)
     + 		return 0;
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
       	} else if (fill_included_packs_all(m, include_pack))
       		goto cleanup;
       
     -+  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
     -+  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
     -+  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
     ++	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
     ++	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
      +
       	argv_array_push(&cmd.args, "pack-objects");
       
     @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t bat
      +		argv_array_push(&cmd.args, "--delta-base-offset");
      +	if (use_delta_islands)
      +		argv_array_push(&cmd.args, "--delta-islands");
     -+	if (write_bitmaps > 0)
     -+		argv_array_push(&cmd.args, "--write-bitmap-index");
     -+	else if (write_bitmaps < 0)
     -+		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
      +
       	if (flags & MIDX_PROGRESS)
       		argv_array_push(&cmd.args, "--progress");
 -:  ----------- > 2:  3d7b334f5c6 multi-pack-index: respect repack.packKeptObjects=false
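
The mapping from the two config values to pack-objects arguments in the range-diff above can be sketched outside of git as follows. This is a minimal standalone illustration, not the code in midx.c: the struct `repack_settings` and the function `build_pack_objects_args` are hypothetical names, the other arguments midx_repack() passes are omitted, and treating an unset repack.useDeltaIslands as 0 is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for the two settings read in the diff above. */
struct repack_settings {
	int delta_base_offset;	/* repack.useDeltaBaseOffset; the patch initializes it to 1 */
	int use_delta_islands;	/* repack.useDeltaIslands; assumed 0 when unset in this sketch */
};

/*
 * Fill argv with the pack-objects arguments implied by the settings
 * and terminate the list with NULL. Returns the number of arguments
 * written (not counting the NULL), or 0 if cap is too small to hold
 * the largest possible list.
 */
size_t build_pack_objects_args(const struct repack_settings *s,
			       const char **argv, size_t cap)
{
	size_t n = 0;

	if (cap < 4)
		return 0;

	argv[n++] = "pack-objects";
	if (s->delta_base_offset)
		argv[n++] = "--delta-base-offset";
	if (s->use_delta_islands)
		argv[n++] = "--delta-islands";
	argv[n] = NULL;
	return n;
}
```

With the settings `{ 1, 0 }`, which correspond to the patch's default for delta_base_offset and no delta islands, the resulting list is "pack-objects", "--delta-base-offset".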

-- 
gitgitgadget
