All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] midx: apply gitconfig to midx repack
@ 2020-05-05 13:06 Son Luong Ngoc via GitGitGadget
  2020-05-05 13:50 ` Derrick Stolee
  2020-05-06  9:43 ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
  0 siblings, 2 replies; 26+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-05-05 13:06 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

Multi-Pack-Index repack is an incremental, repack solutions
that allows user to consolidate multiple packfiles in a non-disruptive
way. However the new packfile could be created without some of the
capabilities of a packfile that is created by calling `git repack`.

This is because with `git repack`, there are configuration that would
enable different flags to be passed down to `git pack-objects` plumbing.

In this patch, I applies those flags into `git multi-pack-index repack`
so that it respect the `repack.*` config series.

Note: I left out `repack.packKeptObjects` intentionally as I dont think
its relevant to midx repack use case.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
    midx: apply gitconfig to midx repack
    
    Midx repack has largely been used in Microsoft Scalar on the client side
    to optimize the repository multiple packs state. However when I tried to
    apply this onto the server-side, I realized that there are certain
    features that were lacking compare to git repack. Most of these features
    are highly desirable on the server-side to create the most optimized
    pack possible.
    
    One of the example is delta_base_offset, comparing an midx repack
    with/without delta_base_offset, we can observe significant size
    differences.
    
    > du objects/pack/*pack
    14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
    9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack
    
    Latest 2.26.2 (without delta_base_offset)
    > git multi-pack-index write
    > git multi-pack-index repack
    > git multi-pack-index expire
    > du objects/pack/*pack
    9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack
    
    With delta_base_offset
    > git version
    git version 2.26.2.672.g232c24e857.dirty
    > git multi-pack-index write
    > git multi-pack-index repack
    > git multi-pack-index expire
    > du objects/pack/*pack
    9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack
    
    In this patch, I intentionally leaving out repack.packKeptObjects as I
    don't think its very relevant to midx repack use case:
    
     * One could always exclude biggest packs with --batch-size option
       
       
     * For non-biggest-packs exclusion use case, its rather rare (unless you
       want to have a special pack with only commits and trees being
       excluded from repack to serve partial clone better?)
       
       
    
    Please let me know if anyone think that we should include that option
    for the sake of completions.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/626

 midx.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..88f16594268 100644
--- a/midx.c
+++ b/midx.c
@@ -1361,6 +1361,10 @@ static int fill_included_packs_batch(struct repository *r,
 	return 0;
 }
 
+static int delta_base_offset = 1;
+static int write_bitmaps = -1;
+static int use_delta_islands;
+
 int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
 {
 	int result = 0;
@@ -1381,12 +1385,25 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
+  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
+  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+	if (write_bitmaps > 0)
+		argv_array_push(&cmd.args, "--write-bitmap-index");
+	else if (write_bitmaps < 0)
+		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else

base-commit: b34789c0b0d3b137f0bb516b417bd8d75e0cb306
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] midx: apply gitconfig to midx repack
  2020-05-05 13:06 [PATCH] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
@ 2020-05-05 13:50 ` Derrick Stolee
  2020-05-05 16:03   ` Son Luong Ngoc
  2020-05-06  9:43 ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
  1 sibling, 1 reply; 26+ messages in thread
From: Derrick Stolee @ 2020-05-05 13:50 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget, git; +Cc: Son Luong Ngoc

On 5/5/2020 9:06 AM, Son Luong Ngoc via GitGitGadget wrote:
> From: Son Luong Ngoc <sluongng@gmail.com>
> 
> Multi-Pack-Index repack is an incremental, repack solutions
> that allows user to consolidate multiple packfiles in a non-disruptive
> way. However the new packfile could be created without some of the
> capabilities of a packfile that is created by calling `git repack`.
> 
> This is because with `git repack`, there are configuration that would
> enable different flags to be passed down to `git pack-objects` plumbing.
> 
> In this patch, I applies those flags into `git multi-pack-index repack`
> so that it respect the `repack.*` config series.

This is a good idea! The fact that these are specified by 'git repack'
and not 'git pack-objects' makes intervention here necessary.

However, I don't think that all of these will apply properly.

> Note: I left out `repack.packKeptObjects` intentionally as I dont think
> its relevant to midx repack use case.

I think it would be good to add this, but in a different way.

> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> ---
>     midx: apply gitconfig to midx repack
>     
>     Midx repack has largely been used in Microsoft Scalar on the client side
>     to optimize the repository multiple packs state. However when I tried to
>     apply this onto the server-side, I realized that there are certain
>     features that were lacking compare to git repack. Most of these features
>     are highly desirable on the server-side to create the most optimized
>     pack possible.
>     
>     One of the example is delta_base_offset, comparing an midx repack
>     with/without delta_base_offset, we can observe significant size
>     differences.
>     
>     > du objects/pack/*pack
>     14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
>     9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack
>     
>     Latest 2.26.2 (without delta_base_offset)
>     > git multi-pack-index write
>     > git multi-pack-index repack
>     > git multi-pack-index expire
>     > du objects/pack/*pack
>     9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack
>     
>     With delta_base_offset
>     > git version
>     git version 2.26.2.672.g232c24e857.dirty
>     > git multi-pack-index write
>     > git multi-pack-index repack
>     > git multi-pack-index expire
>     > du objects/pack/*pack
>     9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack
>     
>     In this patch, I intentionally leaving out repack.packKeptObjects as I
>     don't think its very relevant to midx repack use case:
>     
>      * One could always exclude biggest packs with --batch-size option
>        
>        
>      * For non-biggest-packs exclusion use case, its rather rare (unless you
>        want to have a special pack with only commits and trees being
>        excluded from repack to serve partial clone better?)
>        
>        
>     
>     Please let me know if anyone think that we should include that option
>     for the sake of completions.

In the scenario where there is a .keep pack _and_ it is small enough to get
picked up by the batch size, the 'git multi-pack-index repack' command will
create a new pack containing its objects (and objects from other packs) but
the 'git multi-pack-index expire' command will not delete the pack with .keep.

The good news is that after the first repack, the objects in the pack are
in a newer pack, so the multi-pack-index will not repack those objects from
that pack multiple times. However, this may be unintended behavior for the
user that specified the .keep pack.

I think the right thing to do to respect "repack.packKeptObjects = false" is
to ignore the packs when selecting the batch of objects. Instead of asking
you to do this, I added a patch below. Please take it into your v2, if you
don't mind.
 
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/626
> 
>  midx.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/midx.c b/midx.c
> index 9a61d3b37d9..88f16594268 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1361,6 +1361,10 @@ static int fill_included_packs_batch(struct repository *r,
>  	return 0;
>  }
>  
> +static int delta_base_offset = 1;
> +static int write_bitmaps = -1;
> +static int use_delta_islands;
> +

Why not make these local to the midx_repack method?

>  int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
>  {
>  	int result = 0;
> @@ -1381,12 +1385,25 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	} else if (fill_included_packs_all(m, include_pack))
>  		goto cleanup;
>  
> +  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
> +  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
> +  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
> +

It looks like you have some spacing issues here. Perhaps use tabs?

>  	argv_array_push(&cmd.args, "pack-objects");
>  
>  	strbuf_addstr(&base_name, object_dir);
>  	strbuf_addstr(&base_name, "/pack/pack");
>  	argv_array_push(&cmd.args, base_name.buf);
>  
> +	if (delta_base_offset)
> +		argv_array_push(&cmd.args, "--delta-base-offset");
> +	if (use_delta_islands)
> +		argv_array_push(&cmd.args, "--delta-islands");

These two probably make sense.

> +	if (write_bitmaps > 0)
> +		argv_array_push(&cmd.args, "--write-bitmap-index");
> +	else if (write_bitmaps < 0)
> +		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");

These make less sense. Unless --batch-size=0 and there are no .keep
packs (with the patch below) I'm not sure we _can_ write bitmap indexes
here. The pack-file is not necessarily closed under reachability. Or,
will supplying these arguments to 'git pack-objects' actually do that
closure?

I would be happy to special-case these options to the "--batch-size=0"
situation and otherwise ignore them. This then gets into enough
complication that we should update the documentation as in the patch
below.

At minimum, it would be good to have some tests that exercise these
code paths so we know they are behaving correctly.

Thanks,
-Stolee


-- >8 --
From 8a115191cbf21c553675a235c8c678affbca609b Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Tue, 5 May 2020 09:37:50 -0400
Subject: [PATCH] multi-pack-index: respect repack.packKeptObjects=false

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criteria. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 +++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 26 ++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b7..0c6619493c 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@ repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index 1527e464a7..d055bf3cd3 100644
--- a/midx.c
+++ b/midx.c
@@ -1280,15 +1280,26 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1299,6 +1310,9 @@ static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1325,6 +1339,8 @@ static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1365,7 +1381,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	argv_array_push(&cmd.args, "pack-objects");
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 43a7a66c9d..b2fece5d3d 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -533,6 +533,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&
-- 
2.26.2.vfs.1.2



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] midx: apply gitconfig to midx repack
  2020-05-05 13:50 ` Derrick Stolee
@ 2020-05-05 16:03   ` Son Luong Ngoc
  2020-05-06  8:56     ` Son Luong Ngoc
  0 siblings, 1 reply; 26+ messages in thread
From: Son Luong Ngoc @ 2020-05-05 16:03 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Son Luong Ngoc via GitGitGadget, git

Hi Derrick,

Thanks for a swift and comprehensive review.

> On May 5, 2020, at 15:50, Derrick Stolee <stolee@gmail.com> wrote:
> 
> In the scenario where there is a .keep pack _and_ it is small enough to get
> picked up by the batch size, the 'git multi-pack-index repack' command will
> create a new pack containing its objects (and objects from other packs) but
> the 'git multi-pack-index expire' command will not delete the pack with .keep.
> 
> The good news is that after the first repack, the objects in the pack are
> in a newer pack, so the multi-pack-index will not repack those objects from
> that pack multiple times. However, this may be unintended behavior for the
> user that specified the .keep pack.

Yup I experienced exactly this when trying to test midx repack/expire
with biggest pack file marked with `.keep`.
Luckily the storage size bump for duplicated objects was not noticeable in my case.
You worded the situation precisely.

> I think the right thing to do to respect "repack.packKeptObjects = false" is
> to ignore the packs when selecting the batch of objects. Instead of asking
> you to do this, I added a patch below. Please take it into your v2, if you
> don't mind.

Gladly.
This should help me a lot for re-rolling V2.

>> +static int delta_base_offset = 1;
>> +static int write_bitmaps = -1;
>> +static int use_delta_islands;
>> +
> 
> Why not make these local to the midx_repack method?

No practical reason except me shamelessly lifted those from builtin/repack.c.
I was a bit confused how `git repack` houses these logic in the builtin file,
while midx was having these logic in the midx.c instead of builtin/multi-pack-index.c.

I make them local in V2.

>> int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
>> {
>> 	int result = 0;
>> @@ -1381,12 +1385,25 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>> 	} else if (fill_included_packs_all(m, include_pack))
>> 		goto cleanup;
>> 
>> +  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
>> +  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
>> +  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
>> +
> 
> It looks like you have some spacing issues here. Perhaps use tabs?

Rookie mistake on my part. Will fix it in V2

>> +	if (write_bitmaps > 0)
>> +		argv_array_push(&cmd.args, "--write-bitmap-index");
>> +	else if (write_bitmaps < 0)
>> +		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
> 
> These make less sense. Unless --batch-size=0 and there are no .keep
> packs (with the patch below) I'm not sure we _can_ write bitmap indexes
> here. The pack-file is not necessarily closed under reachability. Or,
> will supplying these arguments to 'git pack-objects' actually do that
> closure?
> 
> I would be happy to special-case these options to the "--batch-size=0"
> situation and otherwise ignore them. This then gets into enough
> complication that we should update the documentation as in the patch
> below.

You make a great point here. 
I completely missed this as I have been largely testing with repacking only 2 packs,
effectively with --batch-size=0.

I think having the bitmaps index is highly desirable in `--batch-size=0` case.
I will try to include that in V2 (with Documentation).

> At minimum, it would be good to have some tests that exercise these
> code paths so we know they are behaving correctly.

I will do some readings with the current tests for repack and midx.
Hopefully I will have something for V2. (^_^ !)

> Thanks,
> -Stolee

Cheers,
Son Luong


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] midx: apply gitconfig to midx repack
  2020-05-05 16:03   ` Son Luong Ngoc
@ 2020-05-06  8:56     ` Son Luong Ngoc
  0 siblings, 0 replies; 26+ messages in thread
From: Son Luong Ngoc @ 2020-05-06  8:56 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Son Luong Ngoc via GitGitGadget, git

Hi,

> On May 5, 2020, at 18:03, Son Luong Ngoc <sluongng@gmail.com> wrote:
>> On May 5, 2020, at 15:50, Derrick Stolee <stolee@gmail.com> wrote:
>>> +	if (write_bitmaps > 0)
>>> +		argv_array_push(&cmd.args, "--write-bitmap-index");
>>> +	else if (write_bitmaps < 0)
>>> +		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
>> 
>> These make less sense. Unless --batch-size=0 and there are no .keep
>> packs (with the patch below) I'm not sure we _can_ write bitmap indexes
>> here. The pack-file is not necessarily closed under reachability. Or,
>> will supplying these arguments to 'git pack-objects' actually do that
>> closure?
>> 
>> I would be happy to special-case these options to the "--batch-size=0"
>> situation and otherwise ignore them. This then gets into enough
>> complication that we should update the documentation as in the patch
>> below.
> 
> You make a great point here. 
> I completely missed this as I have been largely testing with repacking only 2 packs,
> effectively with --batch-size=0.
> 
> I think having the bitmaps index is highly desirable in `--batch-size=0` case.
> I will try to include that in V2 (with Documentation).

Hmm, I just realized that there is a check for `--all` is being passed on pack-objects side.

	if (batch_size == 0) {
		argv_array_push(&cmd.args, "--all");
		if (write_bitmaps > 0)
			argv_array_push(&cmd.args, "--write-bitmap-index");
		else if (write_bitmaps < 0)
			argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
	}

If I do something like this, the midx repack will become tremendously slow as I think pack-objects
needs to scan for all revs (fed from midx) and all refs.
Perhaps special exception needed to be made on pack-objects side to trust that midx is feeding it
everything there is?

I think adding `write_bitmaps` support would be a bit out of my hand for now, so I will settle with
the delta configs and Derrick's patch for V2. (sending it later today)

>> Thanks,
>> -Stolee
> 
> Cheers,
> Son Luong
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 0/2] midx: apply gitconfig to midx repack
  2020-05-05 13:06 [PATCH] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2020-05-05 13:50 ` Derrick Stolee
@ 2020-05-06  9:43 ` Son Luong Ngoc via GitGitGadget
  2020-05-06  9:43   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
                     ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-05-06  9:43 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc

Midx repack has largely been used in Microsoft Scalar on the client side to
optimize the repository multiple packs state. However when I tried to apply
this onto the server-side, I realized that there are certain features that
were lacking compare to git repack. Most of these features are highly
desirable on the server-side to create the most optimized pack possible.

One of the example is delta_base_offset, comparing an midx repack
with/without delta_base_offset, we can observe significant size differences.

> du objects/pack/*pack
14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack

Latest 2.26.2 (without delta_base_offset)
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack

With delta_base_offset
> git version
git version 2.26.2.672.g232c24e857.dirty
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack

In this patch, I intentionally leaving out repack.writeBitmaps as I see that
it might need some update on pack-objects to improve the performance

Derrick Stolee following patch with address repack. packKeptObjects support.

Derrick Stolee (1):
  multi-pack-index: respect repack.packKeptObjects=false

Son Luong Ngoc (1):
  midx: apply gitconfig to midx repack

 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 36 ++++++++++++++++++++++----
 t/t5319-multi-pack-index.sh            | 26 +++++++++++++++++++
 3 files changed, 60 insertions(+), 5 deletions(-)


base-commit: b34789c0b0d3b137f0bb516b417bd8d75e0cb306
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/626

Range-diff vs v1:

 1:  215c882a503 ! 1:  21c648cc486 midx: apply gitconfig to midx repack
     @@ Commit message
          In this patch, I applies those flags into `git multi-pack-index repack`
          so that it respect the `repack.*` config series.
      
     -    Note: I left out `repack.packKeptObjects` intentionally as I dont think
     -    its relevant to midx repack use case.
     +    Note:
     +    - `repack.packKeptObjects` will be addressed by Derrick Stolee in
     +    the following patch
     +    - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
     +    requires `--all` to be passed onto `git pack-objects`, which is very
     +    slow. I think it would be nice to have this in a future patch.
      
          Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
      
       ## midx.c ##
     -@@ midx.c: static int fill_included_packs_batch(struct repository *r,
     - 	return 0;
     - }
     +@@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
     + 	struct child_process cmd = CHILD_PROCESS_INIT;
     + 	struct strbuf base_name = STRBUF_INIT;
     + 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
     ++	int delta_base_offset = 1;
     ++	int use_delta_islands;
       
     -+static int delta_base_offset = 1;
     -+static int write_bitmaps = -1;
     -+static int use_delta_islands;
     -+
     - int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, unsigned flags)
     - {
     - 	int result = 0;
     + 	if (!m)
     + 		return 0;
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
       	} else if (fill_included_packs_all(m, include_pack))
       		goto cleanup;
       
     -+  git_config_get_bool("repack.usedeltabaseoffset", &delta_base_offset);
     -+  git_config_get_bool("repack.writebitmaps", &write_bitmaps);
     -+  git_config_get_bool("repack.usedeltaislands", &use_delta_islands);
     ++	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
     ++	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
      +
       	argv_array_push(&cmd.args, "pack-objects");
       
     @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t bat
      +		argv_array_push(&cmd.args, "--delta-base-offset");
      +	if (use_delta_islands)
      +		argv_array_push(&cmd.args, "--delta-islands");
     -+	if (write_bitmaps > 0)
     -+		argv_array_push(&cmd.args, "--write-bitmap-index");
     -+	else if (write_bitmaps < 0)
     -+		argv_array_push(&cmd.args, "--write-bitmap-index-quiet");
      +
       	if (flags & MIDX_PROGRESS)
       		argv_array_push(&cmd.args, "--progress");
 -:  ----------- > 2:  3d7b334f5c6 multi-pack-index: respect repack.packKeptObjects=false

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/2] midx: apply gitconfig to midx repack
  2020-05-06  9:43 ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
@ 2020-05-06  9:43   ` Son Luong Ngoc via GitGitGadget
  2020-05-06 12:03     ` Derrick Stolee
  2020-05-06 17:03     ` Junio C Hamano
  2020-05-06  9:43   ` [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
  2020-05-09 14:24   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2 siblings, 2 replies; 26+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-05-06  9:43 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

Multi-Pack-Index repack is an incremental, repack solutions
that allows user to consolidate multiple packfiles in a non-disruptive
way. However the new packfile could be created without some of the
capabilities of a packfile that is created by calling `git repack`.

This is because with `git repack`, there are configuration that would
enable different flags to be passed down to `git pack-objects` plumbing.

In this patch, I applies those flags into `git multi-pack-index repack`
so that it respect the `repack.*` config series.

Note:
- `repack.packKeptObjects` will be addressed by Derrick Stolee in
the following patch
- `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
requires `--all` to be passed onto `git pack-objects`, which is very
slow. I think it would be nice to have this in a future patch.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 midx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..3348f8e569b 100644
--- a/midx.c
+++ b/midx.c
@@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct strbuf base_name = STRBUF_INIT;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	int delta_base_offset = 1;
+	int use_delta_islands;
 
 	if (!m)
 		return 0;
@@ -1381,12 +1383,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-06  9:43 ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
  2020-05-06  9:43   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
@ 2020-05-06  9:43   ` Derrick Stolee via GitGitGadget
  2020-05-06 16:18     ` Eric Sunshine
  2020-05-09 14:24   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-06  9:43 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criteria. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 +++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 26 ++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b72..0c6619493c1 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@ repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index 3348f8e569b..b8a52740832 100644
--- a/midx.c
+++ b/midx.c
@@ -1293,15 +1293,26 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1312,6 +1323,9 @@ static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1338,6 +1352,8 @@ static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1380,7 +1396,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 030a7222b2a..67afe1bb8d9 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -538,6 +538,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] midx: apply gitconfig to midx repack
  2020-05-06  9:43   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
@ 2020-05-06 12:03     ` Derrick Stolee
  2020-05-06 17:03     ` Junio C Hamano
  1 sibling, 0 replies; 26+ messages in thread
From: Derrick Stolee @ 2020-05-06 12:03 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget, git; +Cc: Son Luong Ngoc

On 5/6/2020 5:43 AM, Son Luong Ngoc via GitGitGadget wrote:
> From: Son Luong Ngoc <sluongng@gmail.com>
...
> - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
> requires `--all` to be passed onto `git pack-objects`, which is very
> slow. I think it would be nice to have this in a future patch.

Just my two cents here: the reachability bitmaps are really tied to the
idea of a single pack right now. To create bitmaps, I would currently
suggest using the 'git repack' builtin with the proper options. That
command deletes the multi-pack-index, unfortunately, but it also produces
a single pack and deletes the others (when creating bitmaps).

You are right that the `--all` option required to pack-objects is not
appropriate to add inside `git multi-pack-index repack` as that changes
the pattern. It requires loading all reachable objects, even if they are
not already in packs covered by the multi-pack-index. This at minimum
violates expectations with the --batch-size argument.

Integrating reachability bitmaps more closely with the multi-pack-index
is certainly on our radar, but is a large endeavor.

This new patch looks good to me.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-06  9:43   ` [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
@ 2020-05-06 16:18     ` Eric Sunshine
  2020-05-06 16:36       ` Derrick Stolee
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Sunshine @ 2020-05-06 16:18 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: Git List, Son Luong Ngoc, Derrick Stolee

On Wed, May 6, 2020 at 5:44 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> @@ -538,6 +538,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
> +test_expect_success 'repack respects repack.packKeptObjects=false' '
> +       test_when_finished rm -f dup/.git/objects/pack/*keep &&
> +       (
> +               [...]
> +               THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
> +               BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&

Taking jk/arith-expansion-coding-guidelines[1] into consideration,
perhaps write this as:

    BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&

[1]: https://lore.kernel.org/git/20200504160709.GB12842@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-06 16:18     ` Eric Sunshine
@ 2020-05-06 16:36       ` Derrick Stolee
  0 siblings, 0 replies; 26+ messages in thread
From: Derrick Stolee @ 2020-05-06 16:36 UTC (permalink / raw)
  To: Eric Sunshine, Derrick Stolee via GitGitGadget
  Cc: Git List, Son Luong Ngoc, Derrick Stolee

On 5/6/2020 12:18 PM, Eric Sunshine wrote:
> On Wed, May 6, 2020 at 5:44 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>> diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
>> @@ -538,6 +538,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
>> +test_expect_success 'repack respects repack.packKeptObjects=false' '
>> +       test_when_finished rm -f dup/.git/objects/pack/*keep &&
>> +       (
>> +               [...]
>> +               THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
>> +               BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
> 
> Taking jk/arith-expansion-coding-guidelines[1] into consideration,
> perhaps write this as:
> 
>     BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&

Thanks for pointing this out. This line is repeated in the test
after this one. That should be fixed, too.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] midx: apply gitconfig to midx repack
  2020-05-06  9:43   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
  2020-05-06 12:03     ` Derrick Stolee
@ 2020-05-06 17:03     ` Junio C Hamano
  2020-05-07  7:29       ` Son Luong Ngoc
  1 sibling, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2020-05-06 17:03 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc, Derrick Stolee

"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Multi-Pack-Index repack is an incremental, repack solutions
> that allows user to consolidate multiple packfiles in a non-disruptive
> way. However the new packfile could be created without some of the
> capabilities of a packfile that is created by calling `git repack`.

It may be clear to you who wrote the patch, but it is quite unclear
to readers how `repack` gets into the picture.  The first sentence
talks about what "git multi-pack-index repack" subcommand.  Unless
you mention that that "git multi-pack-index repack" subcommand calls
"git repack" under the hood in order to create a new packfile, the
second paragraph can be read as if you are pointing out a problem if
the user did

	$ git multi-pack-index repack
	$ git repack

and the explicit "repack" initiated by the user may create a
packfile that is somehow incompatible with what the previous repack
wanted to do, or something like that.

> This is because with `git repack`, there are configuration that would
> enable different flags to be passed down to `git pack-objects` plumbing.

And this does not help to clear the possible confusion, either.

I think all of the above is clearer if you rewrite the above
(including the title) like so:

    midx: teach "git multi-pack-index repack" honor "git repack" configuration

    When the "repack" subcommand of "git multi-pack-index" command
    creates new packfile(s), it does not call the "git repack"
    command but instead directly calls the "git pack-objects"
    command, and the configuration variables meant for the "git
    repack" command, like "repack.usedaeltabaseoffset", are ignored.

Now the problem description is behind us, let's see the description
of proposed solution.  We write this part in imperative mood, as if
we are giving an order to the codebase to "become like so".  We do
not say "I do X, I do Y".

> In this patch, I applies those flags into `git multi-pack-index repack`
> so that it respect the `repack.*` config series.

    Check the configuration variables used by "git repack" ourselves
    and pass the corresponding options to underlying "git pack-objects"
    in this codepath.


> Note:
> - `repack.packKeptObjects` will be addressed by Derrick Stolee in
> the following patch

This definitely does not belong to the commit log message.  It would
make a helpful note meant for the reviewers if written below the
three-dash line, though.

> - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
> requires `--all` to be passed onto `git pack-objects`, which is very
> slow. I think it would be nice to have this in a future patch.

The phrasing makes it hard to grok.  Do you want to say that the
repack.writeBitmaps configuration variable is ignored?

I think Derrick gave you the reason why bitmaps is not compatible
with midx in general, and that would be a better rationale to record
why the configuration is ignored.  Perhaps like

    Note that `repack.writeBitmaps` configuration is ignored, as the
    pack bitmap faciility is useful only with a single packfile.

or something like that?

Do we need to worry about the configuration variables understood by
the "git pack-objects" command to get in the way, by the way?
"pack.packsizelimit" may cause "git repack" to produce more than one
packfile, and if this codepath wants to avoid it (I do not know if
that is the case), it may have to override it from the command line,
for example.

> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
> ---
>  midx.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/midx.c b/midx.c
> index 9a61d3b37d9..3348f8e569b 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	struct child_process cmd = CHILD_PROCESS_INIT;
>  	struct strbuf base_name = STRBUF_INIT;
>  	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
> +	int delta_base_offset = 1;

By default we use delta-base-offset, so if repo_config_get_bool()
did not see the repack.usedeltabaseoffset configuration defined in
any configuration file, we still want to see 1 after it returns.

> +	int use_delta_islands;

What is the reason why it is safe to leave this uninitialized?  Did
you mean 

	int use_delta_islands = 0;

here?

> @@ -1381,12 +1383,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>  	} else if (fill_included_packs_all(m, include_pack))
>  		goto cleanup;
>  
> +	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
> +	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
> +
>  	argv_array_push(&cmd.args, "pack-objects");
>  
>  	strbuf_addstr(&base_name, object_dir);
>  	strbuf_addstr(&base_name, "/pack/pack");
>  	argv_array_push(&cmd.args, base_name.buf);
>  
> +	if (delta_base_offset)
> +		argv_array_push(&cmd.args, "--delta-base-offset");
> +	if (use_delta_islands)
> +		argv_array_push(&cmd.args, "--delta-islands");
> +

These look like good changes.

>  	if (flags & MIDX_PROGRESS)
>  		argv_array_push(&cmd.args, "--progress");
>  	else

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/2] midx: apply gitconfig to midx repack
  2020-05-06 17:03     ` Junio C Hamano
@ 2020-05-07  7:29       ` Son Luong Ngoc
  0 siblings, 0 replies; 26+ messages in thread
From: Son Luong Ngoc @ 2020-05-07  7:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Son Luong Ngoc via GitGitGadget, git, Derrick Stolee

Hi Junio,

Thanks for the feedbacks

> On May 6, 2020, at 19:03, Junio C Hamano <gitster@pobox.com> wrote:

...
> We write this part in imperative mood, as if
> we are giving an order to the codebase to "become like so".  We do
> not say "I do X, I do Y".

This is a great feedback.
I will try to include all of your suggestions and edit the message
before submitting V3.

>> Note:
>> - `repack.packKeptObjects` will be addressed by Derrick Stolee in
>> the following patch
> 
> This definitely does not belong to the commit log message.  It would
> make a helpful note meant for the reviewers if written below the
> three-dash line, though.

Duly noted.

> Do we need to worry about the configuration variables understood by
> the "git pack-objects" command to get in the way, by the way?
> "pack.packsizelimit" may cause "git repack" to produce more than one
> packfile, and if this codepath wants to avoid it (I do not know if
> that is the case), it may have to override it from the command line,
> for example.

I dont think we want to avoid the packsizelimit here.
The point of repacking with midx is to help
end users consolidate multiple packfile in a non-disruptive way.

If you wish to put a constraint (i.e. packsizelimit, packKeptObjects) on this process,
you should be able to.

>> Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
>> ---
>> midx.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>> 
>> diff --git a/midx.c b/midx.c
>> index 9a61d3b37d9..3348f8e569b 100644
>> --- a/midx.c
>> +++ b/midx.c
>> @@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
>> 	struct child_process cmd = CHILD_PROCESS_INIT;
>> 	struct strbuf base_name = STRBUF_INIT;
>> 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
>> +	int delta_base_offset = 1;
> 
> By default we use delta-base-offset, so if repo_config_get_bool()
> did not see the repack.usedeltabaseoffset configuration defined in
> any configuration file, we still want to see 1 after it returns.
> 
>> +	int use_delta_islands;
> 
> What is the reason why it is safe to leave this uninitialized?  Did
> you mean 
> 
> 	int use_delta_islands = 0;
> 
> here?

I think I totally misread how repo_config_get_bool() supposed to work
Your comment here made me re-read it and things got a lot clearer.

Will set the default value to 0 in next version.

> Thanks.

Much appreciate,
Son Luong.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 0/3] midx: apply gitconfig to midx repack
  2020-05-06  9:43 ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
  2020-05-06  9:43   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
  2020-05-06  9:43   ` [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
@ 2020-05-09 14:24   ` Son Luong Ngoc via GitGitGadget
  2020-05-09 14:24     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
                       ` (3 more replies)
  2 siblings, 4 replies; 26+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-05-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc

Midx repack has largely been used in Microsoft Scalar on the client side to
optimize the repository multiple packs state. However when I tried to apply
this onto the server-side, I realized that there are certain features that
were lacking compare to git repack. Most of these features are highly
desirable on the server-side to create the most optimized pack possible.

One of the example is delta_base_offset, comparing an midx repack
with/without delta_base_offset, we can observe significant size differences.

> du objects/pack/*pack
14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack

Latest 2.26.2 (without delta_base_offset)
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack

With delta_base_offset
> git version
git version 2.26.2.672.g232c24e857.dirty
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack

Note that repack.writeBitmaps configuration is ignored, as the pack bitmap
facility is useful only with a single packfile.

Derrick Stolee's following patch will address repack.packKeptObjects 
support.

Derrick Stolee (1):
  multi-pack-index: respect repack.packKeptObjects=false

Son Luong Ngoc (2):
  midx: teach "git multi-pack-index repack" honor "git repack"
    configurations
  Ensured t5319 follows arith expansion guideline

 Documentation/git-multi-pack-index.txt |  3 ++
 midx.c                                 | 36 ++++++++++++---
 t/t5319-multi-pack-index.sh            | 62 ++++++++++++++++++--------
 3 files changed, 78 insertions(+), 23 deletions(-)


base-commit: b994622632154fc3b17fb40a38819ad954a5fb88
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/626

Range-diff vs v2:

 1:  21c648cc486 ! 1:  a925307d4c5 midx: apply gitconfig to midx repack
     @@ Metadata
      Author: Son Luong Ngoc <sluongng@gmail.com>
      
       ## Commit message ##
     -    midx: apply gitconfig to midx repack
     +    midx: teach "git multi-pack-index repack" honor "git repack" configurations
      
     -    Multi-Pack-Index repack is an incremental, repack solutions
     -    that allows user to consolidate multiple packfiles in a non-disruptive
     -    way. However the new packfile could be created without some of the
     -    capabilities of a packfile that is created by calling `git repack`.
     +    Previously, when the "repack" subcommand of "git multi-pack-index" command
     +    creates new packfile(s), it does not call the "git repack" command but
     +    instead directly calls the "git pack-objects" command, and the
     +    configuration variables meant for the "git repack" command, like
     +    "repack.usedaeltabaseoffset", are ignored.
      
     -    This is because with `git repack`, there are configuration that would
     -    enable different flags to be passed down to `git pack-objects` plumbing.
     +    This patch ensured "git multi-pack-index" checks the configuration
     +    variables used by "git repack" and passes the corresponding options to
     +    the underlying "git pack-objects" command.
      
     -    In this patch, I applies those flags into `git multi-pack-index repack`
     -    so that it respect the `repack.*` config series.
     -
     -    Note:
     -    - `repack.packKeptObjects` will be addressed by Derrick Stolee in
     -    the following patch
     -    - `repack.writeBitmaps` when `--batch-size=0` was NOT adopted here as it
     -    requires `--all` to be passed onto `git pack-objects`, which is very
     -    slow. I think it would be nice to have this in a future patch.
     +    Note that `repack.writeBitmaps` configuration is ignored, as the
     +    pack bitmap facility is useful only with a single packfile.
      
          Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
      
     @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t bat
       	struct strbuf base_name = STRBUF_INIT;
       	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
      +	int delta_base_offset = 1;
     -+	int use_delta_islands;
     ++	int use_delta_islands = 0;
       
       	if (!m)
       		return 0;
 2:  3d7b334f5c6 = 2:  988697dd512 multi-pack-index: respect repack.packKeptObjects=false
 -:  ----------- > 3:  efeb3d7d132 Ensured t5319 follows arith expansion guideline

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations
  2020-05-09 14:24   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
@ 2020-05-09 14:24     ` Son Luong Ngoc via GitGitGadget
  2020-05-09 16:51       ` Junio C Hamano
  2020-05-09 14:24     ` [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-05-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

Previously, when the "repack" subcommand of "git multi-pack-index" command
creates new packfile(s), it does not call the "git repack" command but
instead directly calls the "git pack-objects" command, and the
configuration variables meant for the "git repack" command, like
"repack.usedaeltabaseoffset", are ignored.

This patch ensured "git multi-pack-index" checks the configuration
variables used by "git repack" and passes the corresponding options to
the underlying "git pack-objects" command.

Note that `repack.writeBitmaps` configuration is ignored, as the
pack bitmap facility is useful only with a single packfile.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 midx.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..1e76be56826 100644
--- a/midx.c
+++ b/midx.c
@@ -1369,6 +1369,8 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct strbuf base_name = STRBUF_INIT;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
+	int delta_base_offset = 1;
+	int use_delta_islands = 0;
 
 	if (!m)
 		return 0;
@@ -1381,12 +1383,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-09 14:24   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2020-05-09 14:24     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
@ 2020-05-09 14:24     ` Derrick Stolee via GitGitGadget
  2020-05-09 16:11       ` Đoàn Trần Công Danh
  2020-05-09 14:24     ` [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  3 siblings, 1 reply; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criteria. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 +++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 26 ++++++++++++++++++++++++++
 3 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b72..0c6619493c1 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@ repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index 1e76be56826..9b14d915db1 100644
--- a/midx.c
+++ b/midx.c
@@ -1293,15 +1293,26 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1312,6 +1323,9 @@ static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1338,6 +1352,8 @@ static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1380,7 +1396,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 030a7222b2a..67afe1bb8d9 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -538,6 +538,32 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline
  2020-05-09 14:24   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2020-05-09 14:24     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
  2020-05-09 14:24     ` [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
@ 2020-05-09 14:24     ` Son Luong Ngoc via GitGitGadget
  2020-05-09 16:55       ` Junio C Hamano
  2020-05-10 16:07     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  3 siblings, 1 reply; 26+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-05-09 14:24 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

As the old versions of dash is deprecated, dollar-sign inside
artihmetic expansion is no longer needed.
This ensures t5319 follows the coding guideline updated
in 'jk/arith-expansion-coding-guidelines' 6d4bf5813cd2c1a3b93fd4f0b231733f82133cce.

Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 t/t5319-multi-pack-index.sh | 38 ++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 67afe1bb8d9..065f48747f3 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -62,8 +62,8 @@ generate_objects () {
 	} >wide_delta_$iii &&
 	{
 		test-tool genrandom "foo"$i 100 &&
-		test-tool genrandom "foo"$(( $i + 1 )) 100 &&
-		test-tool genrandom "foo"$(( $i + 2 )) 100
+		test-tool genrandom "foo"$(( i + 1 )) 100 &&
+		test-tool genrandom "foo"$(( i + 2 )) 100
 	} >deep_delta_$iii &&
 	{
 		echo $iii &&
@@ -251,21 +251,21 @@ MIDX_BYTE_OID_VERSION=5
 MIDX_BYTE_CHUNK_COUNT=6
 MIDX_HEADER_SIZE=12
 MIDX_BYTE_CHUNK_ID=$MIDX_HEADER_SIZE
-MIDX_BYTE_CHUNK_OFFSET=$(($MIDX_HEADER_SIZE + 4))
+MIDX_BYTE_CHUNK_OFFSET=$((MIDX_HEADER_SIZE + 4))
 MIDX_NUM_CHUNKS=5
 MIDX_CHUNK_LOOKUP_WIDTH=12
-MIDX_OFFSET_PACKNAMES=$(($MIDX_HEADER_SIZE + \
-			 $MIDX_NUM_CHUNKS * $MIDX_CHUNK_LOOKUP_WIDTH))
-MIDX_BYTE_PACKNAME_ORDER=$(($MIDX_OFFSET_PACKNAMES + 2))
-MIDX_OFFSET_OID_FANOUT=$(($MIDX_OFFSET_PACKNAMES + $(test_oid packnameoff)))
+MIDX_OFFSET_PACKNAMES=$((MIDX_HEADER_SIZE + \
+			 MIDX_NUM_CHUNKS * MIDX_CHUNK_LOOKUP_WIDTH))
+MIDX_BYTE_PACKNAME_ORDER=$((MIDX_OFFSET_PACKNAMES + 2))
+MIDX_OFFSET_OID_FANOUT=$((MIDX_OFFSET_PACKNAMES + $(test_oid packnameoff)))
 MIDX_OID_FANOUT_WIDTH=4
-MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * $MIDX_OID_FANOUT_WIDTH + $(test_oid fanoutoff)))
-MIDX_OFFSET_OID_LOOKUP=$(($MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
-MIDX_BYTE_OID_LOOKUP=$(($MIDX_OFFSET_OID_LOOKUP + 16 * $HASH_LEN))
-MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
+MIDX_BYTE_OID_FANOUT_ORDER=$((MIDX_OFFSET_OID_FANOUT + 250 * MIDX_OID_FANOUT_WIDTH + $(test_oid fanoutoff)))
+MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * MIDX_OID_FANOUT_WIDTH))
+MIDX_BYTE_OID_LOOKUP=$((MIDX_OFFSET_OID_LOOKUP + 16 * HASH_LEN))
+MIDX_OFFSET_OBJECT_OFFSETS=$((MIDX_OFFSET_OID_LOOKUP + NUM_OBJECTS * HASH_LEN))
 MIDX_OFFSET_WIDTH=8
-MIDX_BYTE_PACK_INT_ID=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 2))
-MIDX_BYTE_OFFSET=$(($MIDX_OFFSET_OBJECT_OFFSETS + 16 * $MIDX_OFFSET_WIDTH + 6))
+MIDX_BYTE_PACK_INT_ID=$((MIDX_OFFSET_OBJECT_OFFSETS + 16 * MIDX_OFFSET_WIDTH + 2))
+MIDX_BYTE_OFFSET=$((MIDX_OFFSET_OBJECT_OFFSETS + 16 * MIDX_OFFSET_WIDTH + 6))
 
 test_expect_success 'verify bad version' '
 	corrupt_midx_and_verify $MIDX_BYTE_VERSION "\00" $objdir \
@@ -417,10 +417,10 @@ test_expect_success 'verify multi-pack-index with 64-bit offsets' '
 
 NUM_OBJECTS=63
 MIDX_OFFSET_OID_FANOUT=$((MIDX_OFFSET_PACKNAMES + 54))
-MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * $MIDX_OID_FANOUT_WIDTH))
-MIDX_OFFSET_OBJECT_OFFSETS=$(($MIDX_OFFSET_OID_LOOKUP + $NUM_OBJECTS * $HASH_LEN))
-MIDX_OFFSET_LARGE_OFFSETS=$(($MIDX_OFFSET_OBJECT_OFFSETS + $NUM_OBJECTS * $MIDX_OFFSET_WIDTH))
-MIDX_BYTE_LARGE_OFFSET=$(($MIDX_OFFSET_LARGE_OFFSETS + 3))
+MIDX_OFFSET_OID_LOOKUP=$((MIDX_OFFSET_OID_FANOUT + 256 * MIDX_OID_FANOUT_WIDTH))
+MIDX_OFFSET_OBJECT_OFFSETS=$((MIDX_OFFSET_OID_LOOKUP + NUM_OBJECTS * HASH_LEN))
+MIDX_OFFSET_LARGE_OFFSETS=$((MIDX_OFFSET_OBJECT_OFFSETS + NUM_OBJECTS * MIDX_OFFSET_WIDTH))
+MIDX_BYTE_LARGE_OFFSET=$((MIDX_OFFSET_LARGE_OFFSETS + 3))
 
 test_expect_success 'verify incorrect 64-bit offset' '
 	corrupt_midx_and_verify $MIDX_BYTE_LARGE_OFFSET "\07" objects64 \
@@ -555,7 +555,7 @@ test_expect_success 'repack respects repack.packKeptObjects=false' '
 		test-tool read-midx .git/objects | grep idx >midx-list &&
 		test_line_count = 5 midx-list &&
 		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
-		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
 		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
 		ls .git/objects/pack/*idx >idx-list &&
 		test_line_count = 5 idx-list &&
@@ -570,7 +570,7 @@ test_expect_success 'repack creates a new pack' '
 		ls .git/objects/pack/*idx >idx-list &&
 		test_line_count = 5 idx-list &&
 		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
-		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
+		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
 		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
 		ls .git/objects/pack/*idx >idx-list &&
 		test_line_count = 6 idx-list &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-09 14:24     ` [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
@ 2020-05-09 16:11       ` Đoàn Trần Công Danh
  2020-05-09 17:33         ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: Đoàn Trần Công Danh @ 2020-05-09 16:11 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, Son Luong Ngoc, Derrick Stolee

On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> +test_expect_success 'repack respects repack.packKeptObjects=false' '
> +	test_when_finished rm -f dup/.git/objects/pack/*keep &&
> +	(
> +		cd dup &&
> +		ls .git/objects/pack/*idx >idx-list &&

I think ls(1) is an overkill.
I think:

	echo .git/objects/pack/*idx

is more efficient.

> +		test_line_count = 5 idx-list &&
> +		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&

Likewise.

> +		for keep in $(cat keep-list)
> +		do
> +			touch $keep || return 1

Is this intended?
Since touch(1) accepts multiple files as argument.

> +		done &&
> +		git multi-pack-index repack --batch-size=0 &&
> +		ls .git/objects/pack/*idx >idx-list &&
> +		test_line_count = 5 idx-list &&
> +		test-tool read-midx .git/objects | grep idx >midx-list &&
> +		test_line_count = 5 midx-list &&
> +		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&

This line is overly long.
Should we write test-tool's output to temp file and process it?

And I think either

	sed -n '3{p;q}'

or:

	sed -n 3p

is cleaner than

	head -n 3 | tail -n 1

> +		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&

I think we're better to make this correct in this patch instead of
spend a dollar here, than take it back in the next patch.

> +		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
> +		ls .git/objects/pack/*idx >idx-list &&

-- 
Danh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations
  2020-05-09 14:24     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
@ 2020-05-09 16:51       ` Junio C Hamano
  2020-05-10 14:27         ` Son Luong Ngoc
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2020-05-09 16:51 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc

"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Son Luong Ngoc <sluongng@gmail.com>
>
> Previously, when the "repack" subcommand of "git multi-pack-index" command
> creates new packfile(s), it does not call the "git repack" command but
> instead directly calls the "git pack-objects" command, and the
> configuration variables meant for the "git repack" command, like
> "repack.usedaeltabaseoffset", are ignored.

When we talk about the current state of the code (i.e. before
applying this patch), we do not say "previously".  It's not like you
are complaining about a recent breakage, e.g. "previously X worked
like this but since change Y, it instead works like that, which
breaks Z".

> This patch ensured "git multi-pack-index" checks the configuration
> variables used by "git repack" and passes the corresponding options to
> the underlying "git pack-objects" command.

We write this part in imperative mood, as if we are giving an order
to the codebase to "become like so".  We do not give an observation
about the patch or the author ("This patch does X, this patch also
does Y", "I do X, I do Y").

Taking these two together, perhaps like:

    When the "repack" subcommand of "git multi-pack-index" command
    creates new packfile(s), it does not call the "git repack"
    command but instead directly calls the "git pack-objects"
    command, and the configuration variables meant for the "git
    repack" command, like "repack.usedaeltabaseoffset", are ignored.

    Check the configuration variables used by "git repack" ourselves
    in "git multi-index-pack" and pass the corresponding options to
    underlying "git pack-objects".

> Note that `repack.writeBitmaps` configuration is ignored, as the
> pack bitmap facility is useful only with a single packfile.

Good.

> +	int delta_base_offset = 1;
> +	int use_delta_islands = 0;

These give the default values for two configurations and over there
builtin/repack.c has these lines:

    17	static int delta_base_offset = 1;
    18	static int pack_kept_objects = -1;
    19	static int write_bitmaps = -1;
    20	static int use_delta_islands;
    21	static char *packdir, *packtmp;

When somebody is tempted to update these to change the default used
by "git repack", it should be easy to notice that such a change must
be accompanied by a matching change to the lines you are introducing
in this patch, or we'll be out of sync.

The easiest way to avoid such a problem may be to stop bypassing
"git repack" and calling "pack-objects" ourselves.  That is the
reason why the configuration variables honored by "git repack" are
ignored in this codepath in the first place.  But that is not the
approach we are taking, so we need a reasonable way to tell those
who update this file and builtin/repack.c to make matching changes.
At the very least, perhaps we should give a comment above these two
lines in this file, e.g.

	/*
	 * when updating the default for these configuration
	 * variables in builtin/repack.c, these must be adjusted
	 * to match.
	 */
	int delta_base_offset = 1;
	int use_delta_islands = 0;

or something like that.

With that, the rest of the patch makes sense.

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline
  2020-05-09 14:24     ` [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline Son Luong Ngoc via GitGitGadget
@ 2020-05-09 16:55       ` Junio C Hamano
  0 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2020-05-09 16:55 UTC (permalink / raw)
  To: Son Luong Ngoc via GitGitGadget; +Cc: git, Son Luong Ngoc

"Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Son Luong Ngoc <sluongng@gmail.com>
>
> As the old versions of dash is deprecated, dollar-sign inside
> artihmetic expansion is no longer needed.
> This ensures t5319 follows the coding guideline updated
> in 'jk/arith-expansion-coding-guidelines' 6d4bf5813cd2c1a3b93fd4f0b231733f82133cce.

That does not match my understanding of the guideline.  By removing
the "dollar required" rule and not adding a new "dollar forbidden"
rule, we pretty much declared that "we do not care much either way"
[*1*].

Even if we cared, "Once it _is_ in the tree, it's not really worth
the patch noise to go and fix it up." rule from the guidelines
applies here.

Thanks.


[Reference]

*1* https://lore.kernel.org/git/20200505210741.GB645290@coredump.intra.peff.net/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-09 16:11       ` Đoàn Trần Công Danh
@ 2020-05-09 17:33         ` Junio C Hamano
  2020-05-10  6:38           ` Đoàn Trần Công Danh
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2020-05-09 17:33 UTC (permalink / raw)
  To: Đoàn Trần Công Danh
  Cc: Derrick Stolee via GitGitGadget, git, Son Luong Ngoc, Derrick Stolee

Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:

> On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>> 
>> +test_expect_success 'repack respects repack.packKeptObjects=false' '
>> +	test_when_finished rm -f dup/.git/objects/pack/*keep &&
>> +	(
>> +		cd dup &&
>> +		ls .git/objects/pack/*idx >idx-list &&
>
> I think ls(1) is an overkill.
> I think:
>
> 	echo .git/objects/pack/*idx
>
> is more efficient.

When there is no file whose name ends with idx, what happens?

    $ ls *idx && echo OK
    ls: cannot access '*idx': No such file or directory
    $ echo *idx && echo OK
    *idx
    OK

>> +		test_line_count = 5 idx-list &&
>> +		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
>
> Likewise.

Likewise.

>> +		for keep in $(cat keep-list)
>> +		do
>> +			touch $keep || return 1
>
> Is this intended?
> Since touch(1) accepts multiple files as argument.

Good suggestion, but doesn't .keep file record why the pack is kept
in real life (i.e. not an empty file)?

>> +		done &&
>> +		git multi-pack-index repack --batch-size=0 &&
>> +		ls .git/objects/pack/*idx >idx-list &&
>> +		test_line_count = 5 idx-list &&
>> +		test-tool read-midx .git/objects | grep idx >midx-list &&
>> +		test_line_count = 5 midx-list &&
>> +		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
>
> This line is overly long.
> Should we write test-tool's output to temp file and process it?
>
> And I think either
>
> 	sed -n '3{p;q}'
>
> or:
>
> 	sed -n 3p
>
> is cleaner than
>
> 	head -n 3 | tail -n 1

"sed -n 3p" is the only valid way to write it ;-)


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-09 17:33         ` Junio C Hamano
@ 2020-05-10  6:38           ` Đoàn Trần Công Danh
  2020-05-10 15:52             ` Son Luong Ngoc
  0 siblings, 1 reply; 26+ messages in thread
From: Đoàn Trần Công Danh @ 2020-05-10  6:38 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, Son Luong Ngoc, Derrick Stolee

On 2020-05-09 10:33:30-0700, Junio C Hamano <gitster@pobox.com> wrote:
> Đoàn Trần Công Danh  <congdanhqx@gmail.com> writes:
> 
> > On 2020-05-09 14:24:29+0000, Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com> wrote:
> >> From: Derrick Stolee <dstolee@microsoft.com>
> >> 
> >> +test_expect_success 'repack respects repack.packKeptObjects=false' '
> >> +	test_when_finished rm -f dup/.git/objects/pack/*keep &&
> >> +	(
> >> +		cd dup &&
> >> +		ls .git/objects/pack/*idx >idx-list &&
> >
> > I think ls(1) is an overkill.
> > I think:
> >
> > 	echo .git/objects/pack/*idx
> >
> > is more efficient.
> 
> When there is no file whose name ends with idx, what happens?
> 
>     $ ls *idx && echo OK
>     ls: cannot access '*idx': No such file or directory
>     $ echo *idx && echo OK
>     *idx
>     OK

Yes, but I think the next line is checking for the number of lines.
This is better to fail faster.

(My suggestion was wrong anyway, it should be "printf "%s\\n" *idx)

> >> +		test_line_count = 5 idx-list &&
> >> +		for keep in $(cat keep-list)
> >> +		do
> >> +			touch $keep || return 1
> >
> > Is this intended?
> > Since touch(1) accepts multiple files as argument.
> 
> Good suggestion, but doesn't .keep file record why the pack is kept
> in real life (i.e. not an empty file)?

Yes, in real life, we usually provide a reason in this .keep file.
But, we also allow empty file with git-index-pack --keep
I think simple touch is fine for this test.

Missing piece for my previous command:
if `keep-list` is empty, we may want to fail fast,
touch with empty list will error out (at least in my system).


-- 
Danh

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations
  2020-05-09 16:51       ` Junio C Hamano
@ 2020-05-10 14:27         ` Son Luong Ngoc
  0 siblings, 0 replies; 26+ messages in thread
From: Son Luong Ngoc @ 2020-05-10 14:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Son Luong Ngoc via GitGitGadget, git

On Sat, May 09, 2020 at 09:51:08AM -0700, Junio C Hamano wrote:
> "Son Luong Ngoc via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> > From: Son Luong Ngoc <sluongng@gmail.com>
> >
> > Previously, when the "repack" subcommand of "git multi-pack-index" command
> > creates new packfile(s), it does not call the "git repack" command but
> > instead directly calls the "git pack-objects" command, and the
> > configuration variables meant for the "git repack" command, like
> > "repack.usedaeltabaseoffset", are ignored.
> 
> When we talk about the current state of the code (i.e. before
> applying this patch), we do not say "previously".  It's not like you
> are complaining about a recent breakage, e.g. "previously X worked
> like this but since change Y, it instead works like that, which
> breaks Z".
> 
> > This patch ensured "git multi-pack-index" checks the configuration
> > variables used by "git repack" and passes the corresponding options to
> > the underlying "git pack-objects" command.
> 
> We write this part in imperative mood, as if we are giving an order
> to the codebase to "become like so".  We do not give an observation
> about the patch or the author ("This patch does X, this patch also
> does Y", "I do X, I do Y").
> 
> Taking these two together, perhaps like:
> 
>     When the "repack" subcommand of "git multi-pack-index" command
>     creates new packfile(s), it does not call the "git repack"
>     command but instead directly calls the "git pack-objects"
>     command, and the configuration variables meant for the "git
>     repack" command, like "repack.usedaeltabaseoffset", are ignored.
> 
>     Check the configuration variables used by "git repack" ourselves
>     in "git multi-index-pack" and pass the corresponding options to
>     underlying "git pack-objects".

Thanks for this, it will take me a bit to adjust to this style of
writing but I do find it to be a lot clearer and practical.
Will update in next version.

> 
> > Note that `repack.writeBitmaps` configuration is ignored, as the
> > pack bitmap facility is useful only with a single packfile.
> 
> Good.
> 
> > +	int delta_base_offset = 1;
> > +	int use_delta_islands = 0;
> 
> These give the default values for two configurations and over there
> builtin/repack.c has these lines:
> 
>     17	static int delta_base_offset = 1;
>     18	static int pack_kept_objects = -1;
>     19	static int write_bitmaps = -1;
>     20	static int use_delta_islands;
>     21	static char *packdir, *packtmp;
> 
> When somebody is tempted to update these to change the default used
> by "git repack", it should be easy to notice that such a change must
> be accompanied by a matching change to the lines you are introducing
> in this patch, or we'll be out of sync.
> 
> The easiest way to avoid such a problem may be to stop bypassing
> "git repack" and calling "pack-objects" ourselves.  That is the
> reason why the configuration variables honored by "git repack" are
> ignored in this codepath in the first place.  But that is not the
> approach we are taking, so we need a reasonable way to tell those
> who update this file and builtin/repack.c to make matching changes.
> At the very least, perhaps we should give a comment above these two
> lines in this file, e.g.
> 
> 	/*
> 	 * when updating the default for these configuration
> 	 * variables in builtin/repack.c, these must be adjusted
> 	 * to match.
> 	 */
> 	int delta_base_offset = 1;
> 	int use_delta_islands = 0;
> 
> or something like that.

Will add the comments in next version.

> 
> With that, the rest of the patch makes sense.
> 
> Thanks.

Cheers,
Son Luong

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-10  6:38           ` Đoàn Trần Công Danh
@ 2020-05-10 15:52             ` Son Luong Ngoc
  0 siblings, 0 replies; 26+ messages in thread
From: Son Luong Ngoc @ 2020-05-10 15:52 UTC (permalink / raw)
  To: Đoàn Trần Công Danh
  Cc: Junio C Hamano, Derrick Stolee via GitGitGadget, git, Derrick Stolee

Hi,

Thanks Danh and Junio for the testing improvement suggestions.
I think these are the points I will adopt into next version:

- Remove the 3rd patch and keep the removal of dollar sign locally
  inside `repack respects repack.packKeptObjects=false`.

- Change `head -n -3 | tail -n -1` to `sed -n 3p`

- Apply test_line_count on keep-list for failing fast (before touch)

Cheers,
Son Luong.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 0/2] midx: apply gitconfig to midx repack
  2020-05-09 14:24   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
                       ` (2 preceding siblings ...)
  2020-05-09 14:24     ` [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline Son Luong Ngoc via GitGitGadget
@ 2020-05-10 16:07     ` Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07       ` [PATCH v4 1/2] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07       ` [PATCH v4 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
  3 siblings, 2 replies; 26+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-05-10 16:07 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc

Midx repack has largely been used in Microsoft Scalar on the client side to
optimize the repository multiple packs state. However when I tried to apply
this onto the server-side, I realized that there are certain features that
were lacking compare to git repack. Most of these features are highly
desirable on the server-side to create the most optimized pack possible.

One of the example is delta_base_offset, comparing an midx repack
with/without delta_base_offset, we can observe significant size differences.

> du objects/pack/*pack
14536   objects/pack/pack-08a017b424534c88191addda1aa5dd6f24bf7a29.pack
9435280 objects/pack/pack-8829c53ad1dca02e7311f8e5b404962ab242e8f1.pack

Latest 2.26.2 (without delta_base_offset)
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9446096 objects/pack/pack-366c75e2c2f987b9836d3bf0bf5e4a54b6975036.pack

With delta_base_offset
> git version
git version 2.26.2.672.g232c24e857.dirty
> git multi-pack-index write
> git multi-pack-index repack
> git multi-pack-index expire
> du objects/pack/*pack
9152512 objects/pack/pack-3bc8c1ec496ab95d26875f8367ff6807081e9e7d.pack

Note that repack.writeBitmaps configuration is ignored, as the pack bitmap
facility is useful only with a single packfile.

Derrick Stolee's following patch will address repack.packKeptObjects 
support.

Derrick Stolee (1):
  multi-pack-index: respect repack.packKeptObjects=false

Son Luong Ngoc (1):
  midx: teach "git multi-pack-index repack" honor "git repack"
    configurations

 Documentation/git-multi-pack-index.txt |  3 ++
 midx.c                                 | 42 +++++++++++++++++++++++---
 t/t5319-multi-pack-index.sh            | 27 +++++++++++++++++
 3 files changed, 67 insertions(+), 5 deletions(-)


base-commit: b994622632154fc3b17fb40a38819ad954a5fb88
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-626%2Fsluongng%2Fsluongngoc%2Fmidx-config-v4
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-626/sluongng/sluongngoc/midx-config-v4
Pull-Request: https://github.com/gitgitgadget/git/pull/626

Range-diff vs v3:

 1:  a925307d4c5 ! 1:  a8f75e34e5b midx: teach "git multi-pack-index repack" honor "git repack" configurations
     @@ Metadata
       ## Commit message ##
          midx: teach "git multi-pack-index repack" honor "git repack" configurations
      
     -    Previously, when the "repack" subcommand of "git multi-pack-index" command
     -    creates new packfile(s), it does not call the "git repack" command but
     -    instead directly calls the "git pack-objects" command, and the
     -    configuration variables meant for the "git repack" command, like
     -    "repack.usedaeltabaseoffset", are ignored.
     +    When the "repack" subcommand of "git multi-pack-index" command
     +    creates new packfile(s), it does not call the "git repack"
     +    command but instead directly calls the "git pack-objects"
     +    command, and the configuration variables meant for the "git
     +    repack" command, like "repack.usedaeltabaseoffset", are ignored.
      
     -    This patch ensured "git multi-pack-index" checks the configuration
     -    variables used by "git repack" and passes the corresponding options to
     -    the underlying "git pack-objects" command.
     +    Check the configuration variables used by "git repack" ourselves
     +    in "git multi-index-pack" and pass the corresponding options to
     +    underlying "git pack-objects".
      
          Note that `repack.writeBitmaps` configuration is ignored, as the
          pack bitmap facility is useful only with a single packfile.
     @@ Commit message
      
       ## midx.c ##
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
     - 	struct child_process cmd = CHILD_PROCESS_INIT;
       	struct strbuf base_name = STRBUF_INIT;
       	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
     + 
     ++	/*
     ++	 * When updating the default for these configuration
     ++	 * variables in builtin/repack.c, these must be adjusted
     ++	 * to match.
     ++	 */
      +	int delta_base_offset = 1;
      +	int use_delta_islands = 0;
     - 
     ++
       	if (!m)
       		return 0;
     + 
      @@ midx.c: int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
       	} else if (fill_included_packs_all(m, include_pack))
       		goto cleanup;
 2:  988697dd512 ! 2:  192fc785382 multi-pack-index: respect repack.packKeptObjects=false
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'repack with minimum size does
      +		ls .git/objects/pack/*idx >idx-list &&
      +		test_line_count = 5 idx-list &&
      +		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
     ++		test_line_count = 5 keep-list &&
      +		for keep in $(cat keep-list)
      +		do
      +			touch $keep || return 1
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'repack with minimum size does
      +		test_line_count = 5 idx-list &&
      +		test-tool read-midx .git/objects | grep idx >midx-list &&
      +		test_line_count = 5 midx-list &&
     -+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | head -n 3 | tail -n 1) &&
     -+		BATCH_SIZE=$(($THIRD_SMALLEST_SIZE + 1)) &&
     ++		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | sed -n 3p) &&
     ++		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
      +		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
      +		ls .git/objects/pack/*idx >idx-list &&
      +		test_line_count = 5 idx-list &&
 3:  efeb3d7d132 < -:  ----------- Ensured t5319 follows arith expansion guideline

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v4 1/2] midx: teach "git multi-pack-index repack" honor "git repack" configurations
  2020-05-10 16:07     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
@ 2020-05-10 16:07       ` Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07       ` [PATCH v4 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 26+ messages in thread
From: Son Luong Ngoc via GitGitGadget @ 2020-05-10 16:07 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Son Luong Ngoc

From: Son Luong Ngoc <sluongng@gmail.com>

When the "repack" subcommand of "git multi-pack-index" command
creates new packfile(s), it does not call the "git repack"
command but instead directly calls the "git pack-objects"
command, and the configuration variables meant for the "git
repack" command, like "repack.usedaeltabaseoffset", are ignored.

Check the configuration variables used by "git repack" ourselves
in "git multi-index-pack" and pass the corresponding options to
underlying "git pack-objects".

Note that `repack.writeBitmaps` configuration is ignored, as the
pack bitmap facility is useful only with a single packfile.

Signed-off-by: Son Luong Ngoc <sluongng@gmail.com>
---
 midx.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/midx.c b/midx.c
index 9a61d3b37d9..d2a43bd1a38 100644
--- a/midx.c
+++ b/midx.c
@@ -1370,6 +1370,14 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	struct strbuf base_name = STRBUF_INIT;
 	struct multi_pack_index *m = load_multi_pack_index(object_dir, 1);
 
+	/*
+	 * When updating the default for these configuration
+	 * variables in builtin/repack.c, these must be adjusted
+	 * to match.
+	 */
+	int delta_base_offset = 1;
+	int use_delta_islands = 0;
+
 	if (!m)
 		return 0;
 
@@ -1381,12 +1389,20 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	} else if (fill_included_packs_all(m, include_pack))
 		goto cleanup;
 
+	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
+	repo_config_get_bool(r, "repack.usedeltaislands", &use_delta_islands);
+
 	argv_array_push(&cmd.args, "pack-objects");
 
 	strbuf_addstr(&base_name, object_dir);
 	strbuf_addstr(&base_name, "/pack/pack");
 	argv_array_push(&cmd.args, base_name.buf);
 
+	if (delta_base_offset)
+		argv_array_push(&cmd.args, "--delta-base-offset");
+	if (use_delta_islands)
+		argv_array_push(&cmd.args, "--delta-islands");
+
 	if (flags & MIDX_PROGRESS)
 		argv_array_push(&cmd.args, "--progress");
 	else
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v4 2/2] multi-pack-index: respect repack.packKeptObjects=false
  2020-05-10 16:07     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
  2020-05-10 16:07       ` [PATCH v4 1/2] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
@ 2020-05-10 16:07       ` Derrick Stolee via GitGitGadget
  1 sibling, 0 replies; 26+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-05-10 16:07 UTC (permalink / raw)
  To: git; +Cc: Son Luong Ngoc, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

When selecting a batch of pack-files to repack in the "git
multi-pack-index repack" command, Git should respect the
repack.packKeptObjects config option. When false, this option says that
the pack-files with an associated ".keep" file should not be repacked.
This config value is "false" by default.

There are two cases for selecting a batch of objects. The first is the
case where the input batch-size is zero, which specifies "repack
everything". The second is with a non-zero batch size, which selects
pack-files using a greedy selection criteria. Both of these cases are
updated and tested.

Reported-by: Son Luong Ngoc <sluongng@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-multi-pack-index.txt |  3 +++
 midx.c                                 | 26 ++++++++++++++++++++-----
 t/t5319-multi-pack-index.sh            | 27 ++++++++++++++++++++++++++
 3 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index 642d9ac5b72..0c6619493c1 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -56,6 +56,9 @@ repack::
 	file is created, rewrite the multi-pack-index to reference the
 	new pack-file. A later run of 'git multi-pack-index expire' will
 	delete the pack-files that were part of this batch.
++
+If `repack.packKeptObjects` is `false`, then any pack-files with an
+associated `.keep` file will not be selected for the batch to repack.
 
 
 EXAMPLES
diff --git a/midx.c b/midx.c
index d2a43bd1a38..6d1584ca51d 100644
--- a/midx.c
+++ b/midx.c
@@ -1293,15 +1293,26 @@ static int compare_by_mtime(const void *a_, const void *b_)
 	return 0;
 }
 
-static int fill_included_packs_all(struct multi_pack_index *m,
+static int fill_included_packs_all(struct repository *r,
+				   struct multi_pack_index *m,
 				   unsigned char *include_pack)
 {
-	uint32_t i;
+	uint32_t i, count = 0;
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
+
+	for (i = 0; i < m->num_packs; i++) {
+		if (prepare_midx_pack(r, m, i))
+			continue;
+		if (!pack_kept_objects && m->packs[i]->pack_keep)
+			continue;
 
-	for (i = 0; i < m->num_packs; i++)
 		include_pack[i] = 1;
+		count++;
+	}
 
-	return m->num_packs < 2;
+	return count < 2;
 }
 
 static int fill_included_packs_batch(struct repository *r,
@@ -1312,6 +1323,9 @@ static int fill_included_packs_batch(struct repository *r,
 	uint32_t i, packs_to_repack;
 	size_t total_size;
 	struct repack_info *pack_info = xcalloc(m->num_packs, sizeof(struct repack_info));
+	int pack_kept_objects = 0;
+
+	repo_config_get_bool(r, "repack.packkeptobjects", &pack_kept_objects);
 
 	for (i = 0; i < m->num_packs; i++) {
 		pack_info[i].pack_int_id = i;
@@ -1338,6 +1352,8 @@ static int fill_included_packs_batch(struct repository *r,
 
 		if (!p)
 			continue;
+		if (!pack_kept_objects && p->pack_keep)
+			continue;
 		if (open_pack_index(p) || !p->num_objects)
 			continue;
 
@@ -1386,7 +1402,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size,
 	if (batch_size) {
 		if (fill_included_packs_batch(r, m, include_pack, batch_size))
 			goto cleanup;
-	} else if (fill_included_packs_all(m, include_pack))
+	} else if (fill_included_packs_all(r, m, include_pack))
 		goto cleanup;
 
 	repo_config_get_bool(r, "repack.usedeltabaseoffset", &delta_base_offset);
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 030a7222b2a..7214cab36c0 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -538,6 +538,33 @@ test_expect_success 'repack with minimum size does not alter existing packs' '
 	)
 '
 
+test_expect_success 'repack respects repack.packKeptObjects=false' '
+	test_when_finished rm -f dup/.git/objects/pack/*keep &&
+	(
+		cd dup &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		ls .git/objects/pack/*.pack | sed "s/\.pack/.keep/" >keep-list &&
+		test_line_count = 5 keep-list &&
+		for keep in $(cat keep-list)
+		do
+			touch $keep || return 1
+		done &&
+		git multi-pack-index repack --batch-size=0 &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list &&
+		THIRD_SMALLEST_SIZE=$(test-tool path-utils file-size .git/objects/pack/*pack | sort -n | sed -n 3p) &&
+		BATCH_SIZE=$((THIRD_SMALLEST_SIZE + 1)) &&
+		git multi-pack-index repack --batch-size=$BATCH_SIZE &&
+		ls .git/objects/pack/*idx >idx-list &&
+		test_line_count = 5 idx-list &&
+		test-tool read-midx .git/objects | grep idx >midx-list &&
+		test_line_count = 5 midx-list
+	)
+'
+
 test_expect_success 'repack creates a new pack' '
 	(
 		cd dup &&
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-05-10 16:07 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05 13:06 [PATCH] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
2020-05-05 13:50 ` Derrick Stolee
2020-05-05 16:03   ` Son Luong Ngoc
2020-05-06  8:56     ` Son Luong Ngoc
2020-05-06  9:43 ` [PATCH v2 0/2] " Son Luong Ngoc via GitGitGadget
2020-05-06  9:43   ` [PATCH v2 1/2] " Son Luong Ngoc via GitGitGadget
2020-05-06 12:03     ` Derrick Stolee
2020-05-06 17:03     ` Junio C Hamano
2020-05-07  7:29       ` Son Luong Ngoc
2020-05-06  9:43   ` [PATCH v2 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
2020-05-06 16:18     ` Eric Sunshine
2020-05-06 16:36       ` Derrick Stolee
2020-05-09 14:24   ` [PATCH v3 0/3] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
2020-05-09 14:24     ` [PATCH v3 1/3] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
2020-05-09 16:51       ` Junio C Hamano
2020-05-10 14:27         ` Son Luong Ngoc
2020-05-09 14:24     ` [PATCH v3 2/3] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget
2020-05-09 16:11       ` Đoàn Trần Công Danh
2020-05-09 17:33         ` Junio C Hamano
2020-05-10  6:38           ` Đoàn Trần Công Danh
2020-05-10 15:52             ` Son Luong Ngoc
2020-05-09 14:24     ` [PATCH v3 3/3] Ensured t5319 follows arith expansion guideline Son Luong Ngoc via GitGitGadget
2020-05-09 16:55       ` Junio C Hamano
2020-05-10 16:07     ` [PATCH v4 0/2] midx: apply gitconfig to midx repack Son Luong Ngoc via GitGitGadget
2020-05-10 16:07       ` [PATCH v4 1/2] midx: teach "git multi-pack-index repack" honor "git repack" configurations Son Luong Ngoc via GitGitGadget
2020-05-10 16:07       ` [PATCH v4 2/2] multi-pack-index: respect repack.packKeptObjects=false Derrick Stolee via GitGitGadget

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.