git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] builtin/repack.c: invalidate MIDX only when necessary
@ 2020-08-25  2:01 Taylor Blau
  2020-08-25  2:26 ` Jeff King
  2020-08-25  7:55 ` [PATCH] builtin/repack.c: invalidate MIDX only when necessary Son Luong Ngoc
  0 siblings, 2 replies; 78+ messages in thread
From: Taylor Blau @ 2020-08-25  2:01 UTC (permalink / raw)
  To: git; +Cc: dstolee

In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
learned to remove a multi-pack-index file if it added or removed a pack
from the object store.

This mechanism is a little over-eager, since it is only necessary to
drop a MIDX if 'git repack' removes a pack that the MIDX references.
Adding a pack outside of the MIDX does not require invalidating the
MIDX, and likewise for removing a pack the MIDX does not know about.

Teach 'git repack' to check for this by loading the MIDX, and checking
whether the to-be-removed pack is known to the MIDX. This requires a
slightly odd alternation to a test in t5319, which is explained with a
comment.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Something I noticed while working on a MIDX-related topic. Not critical
that we take this now since it's only dropping the MIDX too
aggressively, but not otherwise doing anything destructive. But, this
will make my somewhat large other series a little smaller.

 builtin/repack.c            | 12 +++++-------
 t/t5319-multi-pack-index.sh | 21 +++++++++++++++++++--
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 04c5ceaf7e..98fac03946 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	struct strbuf buf = STRBUF_INIT;
-	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
+	struct multi_pack_index *m = get_multi_pack_index(the_repository);
+	strbuf_addf(&buf, "%s.pack", base_name);
+	if (m && midx_contains_pack(m, buf.buf))
+		clear_midx_file(the_repository);
+	strbuf_insertf(&buf, 0, "%s/", dir_name);
 	unlink_pack_path(buf.buf, 1);
 	strbuf_release(&buf);
 }
@@ -286,7 +290,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int keep_unreachable = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	int no_update_server_info = 0;
-	int midx_cleared = 0;
 	struct pack_objects_args po_args = {NULL};

 	struct option builtin_repack_options[] = {
@@ -439,11 +442,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
 			char *fname, *fname_old;

-			if (!midx_cleared) {
-				clear_midx_file(the_repository);
-				midx_cleared = 1;
-			}
-
 			fname = mkpathdup("%s/pack-%s%s", packdir,
 						item->string, exts[ext].name);
 			if (!file_exists(fname)) {
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 43b1b5b2af..16a1ad040e 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -382,12 +382,29 @@ test_expect_success 'repack with the --no-progress option' '
 	test_line_count = 0 err
 '

-test_expect_success 'repack removes multi-pack-index' '
+test_expect_success 'repack removes multi-pack-index when deleting packs' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
-	GIT_TEST_MULTI_PACK_INDEX=0 git repack -adf &&
+	# Set GIT_TEST_MULTI_PACK_INDEX to 0 to avoid writing a new
+	# multi-pack-index after repacking, but set "core.multiPackIndex" to
+	# true so that "git repack" can read the existing MIDX.
+	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -adf &&
 	test_path_is_missing $objdir/pack/multi-pack-index
 '

+test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
+	git multi-pack-index write &&
+	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
+	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
+
+	# Write a new pack that is unknown to the multi-pack-index.
+	git hash-object -w </dev/null >blob &&
+	git pack-objects $objdir/pack/pack <blob &&
+
+	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
+	test_cmp_bin $objdir/pack/multi-pack-index \
+		$objdir/pack/multi-pack-index.bak
+'
+
 compare_results_with_midx "after repack"

 test_expect_success 'multi-pack-index and pack-bitmap' '
--
2.28.0.rc1.13.ge78abce653

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25  2:01 [PATCH] builtin/repack.c: invalidate MIDX only when necessary Taylor Blau
@ 2020-08-25  2:26 ` Jeff King
  2020-08-25  2:37   ` Taylor Blau
  2020-08-25 15:58   ` Junio C Hamano
  2020-08-25  7:55 ` [PATCH] builtin/repack.c: invalidate MIDX only when necessary Son Luong Ngoc
  1 sibling, 2 replies; 78+ messages in thread
From: Jeff King @ 2020-08-25  2:26 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee

On Mon, Aug 24, 2020 at 10:01:04PM -0400, Taylor Blau wrote:

> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> learned to remove a multi-pack-index file if it added or removed a pack
> from the object store.
> 
> This mechanism is a little over-eager, since it is only necessary to
> drop a MIDX if 'git repack' removes a pack that the MIDX references.
> Adding a pack outside of the MIDX does not require invalidating the
> MIDX, and likewise for removing a pack the MIDX does not know about.

Does "git repack" ever remove just one pack? Obviously "git repack -ad"
or "git repack -Ad" is going to pack everything and delete the old
packs. So I think we'd want to remove a midx there.

And "git repack -d" I think of as deleting only loose objects that we
just packed. But I guess it could also remove a pack that has now been
made redundant? That seems like a rare case in practice, but I suppose
is possible.

Not exactly related to your fix, but kind of the flip side of it: would
we ever need to retain a midx that mentions some packs that still exist?

E.g., imagine we have a midx that points to packs A and B, and
git-repack deletes B. By your logic above, we need to remove the midx
because now it points to objects in B which aren't accessible. But by
deleting it, could we be deleting the only thing that mentions the
objects in A?

I _think_ the answer is "no", because we never went all-in on midx and
allowed deleting the matching .idx files for contained packs. So we'd
still have that A.idx, and we could just use the pack as normal. But
it's an interesting corner case if we ever do go in that direction.

If you'll let me muse a bit more on midx-lifetime issues (which I've
never really thought about before just now):

I'm also a little curious how bad it is to have a midx whose pack has
gone away. I guess we'd answer queries for "yes, we have this object"
even if we don't, which is bad. Though in practice we'd only delete
those packs if we have their objects elsewhere. And the pack code is
pretty good about retrying other copies of objects that can't be
accessed. Alternatively, I wonder if the midx-loading code ought to
check that all of the constituent packs are available.

In that line of thinking, do we even need to delete midx files if one of
their packs goes away? The reading side probably ought to be able to
handle that gracefully.

And the more interesting case is when you repack everything with "-ad"
or similar, at which point you shouldn't even need to look up what's in
the midx to see if you deleted its packs. The point of your operation is
to put it all-into-one, so you know the old midx should be discarded.

> Teach 'git repack' to check for this by loading the MIDX, and checking
> whether the to-be-removed pack is known to the MIDX. This requires a
> slightly odd alternation to a test in t5319, which is explained with a
> comment.

My above musings aside, this seems like an obvious improvement.

> diff --git a/builtin/repack.c b/builtin/repack.c
> index 04c5ceaf7e..98fac03946 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> +	strbuf_addf(&buf, "%s.pack", base_name);
> +	if (m && midx_contains_pack(m, buf.buf))
> +		clear_midx_file(the_repository);
> +	strbuf_insertf(&buf, 0, "%s/", dir_name);

Makes sense. midx_contains_pack() is a binary search, so we'll spend
O(n log n) effort deleting the packs (I wondered if this might be
accidentally quadratic over the number of packs).

And after we clear, "m" will be NULL, so we'll do it at most once. Which
is why you can get rid of the manual "midx_cleared" flag from the
preimage.

So the patch looks good to me.

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25  2:26 ` Jeff King
@ 2020-08-25  2:37   ` Taylor Blau
  2020-08-25 13:14     ` Derrick Stolee
  2020-08-25 15:58   ` Junio C Hamano
  1 sibling, 1 reply; 78+ messages in thread
From: Taylor Blau @ 2020-08-25  2:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee

On Mon, Aug 24, 2020 at 10:26:14PM -0400, Jeff King wrote:
> On Mon, Aug 24, 2020 at 10:01:04PM -0400, Taylor Blau wrote:
>
> > In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> > learned to remove a multi-pack-index file if it added or removed a pack
> > from the object store.
> >
> > This mechanism is a little over-eager, since it is only necessary to
> > drop a MIDX if 'git repack' removes a pack that the MIDX references.
> > Adding a pack outside of the MIDX does not require invalidating the
> > MIDX, and likewise for removing a pack the MIDX does not know about.
>
> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> or "git repack -Ad" is going to pack everything and delete the old
> packs. So I think we'd want to remove a midx there.
>
> And "git repack -d" I think of as deleting only loose objects that we
> just packed. But I guess it could also remove a pack that has now been
> made redundant? That seems like a rare case in practice, but I suppose
> is possible.

Yeah, the patch message makes this sound more likely than it actually
is, which I agree is very rare. I often write 'git repack' instead of
'git pack-objects' to slurp up everything loose into a new pack without
having to list loose objects by name.

That's the case that I really care about here: purely adding a new pack
should not invalidate the existing MIDX.

> Not exactly related to your fix, but kind of the flip side of it: would
> we ever need to retain a midx that mentions some packs that still exist?
>
> E.g., imagine we have a midx that points to packs A and B, and
> git-repack deletes B. By your logic above, we need to remove the midx
> because now it points to objects in B which aren't accessible. But by
> deleting it, could we be deleting the only thing that mentions the
> objects in A?
>
> I _think_ the answer is "no", because we never went all-in on midx and
> allowed deleting the matching .idx files for contained packs. So we'd
> still have that A.idx, and we could just use the pack as normal. But
> it's an interesting corner case if we ever do go in that direction.

Agreed. Maybe a (admittedly somewhat large) #leftoverbits.

> If you'll let me muse a bit more on midx-lifetime issues (which I've
> never really thought about before just now):
>
> I'm also a little curious how bad it is to have a midx whose pack has
> gone away. I guess we'd answer queries for "yes, we have this object"
> even if we don't, which is bad. Though in practice we'd only delete
> those packs if we have their objects elsewhere. And the pack code is
> pretty good about retrying other copies of objects that can't be
> accessed. Alternatively, I wonder if the midx-loading code ought to
> check that all of the constituent packs are available.
>
> In that line of thinking, do we even need to delete midx files if one of
> their packs goes away? The reading side probably ought to be able to
> handle that gracefully.

I think that this is probably the right direction, although I've only
spend time in the MIDX code over the past couple of weeks, so I can't
say with authority. It seems like it would be pretty annoying, though.
For example, code that cares about listing all objects in a MIDX would
have to check first whether the pack they're in still exists before
emitting them. On top of that, there are more corner cases when object X
exists in more than one pack, but some strict subset of those packs
containing X have gone away.

I don't think that it couldn't be done, though.

> And the more interesting case is when you repack everything with "-ad"
> or similar, at which point you shouldn't even need to look up what's in
> the midx to see if you deleted its packs. The point of your operation is
> to put it all-into-one, so you know the old midx should be discarded.
>
> > Teach 'git repack' to check for this by loading the MIDX, and checking
> > whether the to-be-removed pack is known to the MIDX. This requires a
> > slightly odd alternation to a test in t5319, which is explained with a
> > comment.
>
> My above musings aside, this seems like an obvious improvement.
>
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index 04c5ceaf7e..98fac03946 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
> >  static void remove_redundant_pack(const char *dir_name, const char *base_name)
> >  {
> >  	struct strbuf buf = STRBUF_INIT;
> > -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> > +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> > +	strbuf_addf(&buf, "%s.pack", base_name);
> > +	if (m && midx_contains_pack(m, buf.buf))
> > +		clear_midx_file(the_repository);
> > +	strbuf_insertf(&buf, 0, "%s/", dir_name);
>
> Makes sense. midx_contains_pack() is a binary search, so we'll spend
> O(n log n) effort deleting the packs (I wondered if this might be
> accidentally quadratic over the number of packs).

Right. The MIDX stores packs in lexographic order, so checking them is
O(log n), which we do at most 'n' times.

> And after we clear, "m" will be NULL, so we'll do it at most once. Which
> is why you can get rid of the manual "midx_cleared" flag from the
> preimage.

Yep. I thought briefly about passing 'm' as a parameter, but then you
have to worry about a dangling reference to
'the_repository->objects->multi_pack_index' after calling
'clear_midx_file()', so it's easier to look it up each time.

> So the patch looks good to me.

Thanks.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25  2:01 [PATCH] builtin/repack.c: invalidate MIDX only when necessary Taylor Blau
  2020-08-25  2:26 ` Jeff King
@ 2020-08-25  7:55 ` Son Luong Ngoc
  2020-08-25 12:45   ` Derrick Stolee
  2020-08-25 14:45   ` Taylor Blau
  1 sibling, 2 replies; 78+ messages in thread
From: Son Luong Ngoc @ 2020-08-25  7:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, dstolee

Hi Taylor,

Thanks for working on this.

> On Aug 25, 2020, at 04:01, Taylor Blau <me@ttaylorr.com> wrote:
> 
> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> learned to remove a multi-pack-index file if it added or removed a pack
> from the object store.
> 
> This mechanism is a little over-eager, since it is only necessary to
> drop a MIDX if 'git repack' removes a pack that the MIDX references.
> Adding a pack outside of the MIDX does not require invalidating the
> MIDX, and likewise for removing a pack the MIDX does not know about.

I wonder if its worth to trigger write_midx_file() to update the midx instead of
just removing MIDX?

That is already the direction we are taking in the 'maintenance' patch series
whenever the multi-pack-index file was deemed invalid.

Or perhaps, we can check for 'core.multiPackIndex' value (which recently is 
'true' by default) and determine whether we should remove the MIDX or rewrite
it?

> 
> Teach 'git repack' to check for this by loading the MIDX, and checking
> whether the to-be-removed pack is known to the MIDX. This requires a
> slightly odd alternation to a test in t5319, which is explained with a
> comment.
> 
> Signed-off-by: Taylor Blau <me@ttaylorr.com>

Cheers,
Son Luong.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25  7:55 ` [PATCH] builtin/repack.c: invalidate MIDX only when necessary Son Luong Ngoc
@ 2020-08-25 12:45   ` Derrick Stolee
  2020-08-25 14:45   ` Taylor Blau
  1 sibling, 0 replies; 78+ messages in thread
From: Derrick Stolee @ 2020-08-25 12:45 UTC (permalink / raw)
  To: Son Luong Ngoc, Taylor Blau; +Cc: git, dstolee

On 8/25/2020 3:55 AM, Son Luong Ngoc wrote:
> Hi Taylor,
> 
> Thanks for working on this.
> 
>> On Aug 25, 2020, at 04:01, Taylor Blau <me@ttaylorr.com> wrote:
>>
>> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
>> learned to remove a multi-pack-index file if it added or removed a pack
>> from the object store.
>>
>> This mechanism is a little over-eager, since it is only necessary to
>> drop a MIDX if 'git repack' removes a pack that the MIDX references.
>> Adding a pack outside of the MIDX does not require invalidating the
>> MIDX, and likewise for removing a pack the MIDX does not know about.
> 
> I wonder if its worth to trigger write_midx_file() to update the midx instead of
> just removing MIDX?

It would be reasonable to have 'git repack' run write_midx_file() after updating
the pack-files, but it still needs to delete the existing multi-pack-index after
deleting a referenced pack, as the current multi-pack-index will be incorrect,
leading to possibly invalid data during the write. 'git multi-pack-index expire'
carefully handles deleting pack-files that have become redundant.

It may be possible to change the behavior of write_midx_file() to check for the
state of each referenced pack-file, but it would then need to re-scan the
pack-indexes to rebuild the set of objects and which pack-files contain them.

> That is already the direction we are taking in the 'maintenance' patch series
> whenever the multi-pack-index file was deemed invalid.

The 'maintenance' builtin definitely adds new MIDX-aware maintenance tasks.
By performing a repack using the multi-pack-index as the starting point, we
can get around these issues.

One thing I've noticed by talking with Taylor about this change is that
'git repack' is a bit like 'git gc' in that it does a LOT of things and it
can be hard to understand exactly when certain sub-tasks are run or not.
There are likely some interesting operations that could be separated into
maintenance tasks. For example, 'git repack -d' is very similar to
'git maintenance run --task=loose-objects'.

> Or perhaps, we can check for 'core.multiPackIndex' value (which recently is 
> 'true' by default) and determine whether we should remove the MIDX or rewrite
> it?

We currently rewrite it during 'git repack' when GIT_TEST_MULTI_PACK_INDEX=1
to maximize how often we have a multi-pack-index in the test suite. It would
be simple to update that to also check a config option. I don't think adding
that behavior to 'core.multiPackIndex' is a good direction, though.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25  2:37   ` Taylor Blau
@ 2020-08-25 13:14     ` Derrick Stolee
  2020-08-25 14:41       ` Taylor Blau
  0 siblings, 1 reply; 78+ messages in thread
From: Derrick Stolee @ 2020-08-25 13:14 UTC (permalink / raw)
  To: Taylor Blau, Jeff King; +Cc: git, dstolee

On 8/24/2020 10:37 PM, Taylor Blau wrote:
> On Mon, Aug 24, 2020 at 10:26:14PM -0400, Jeff King wrote:
>> On Mon, Aug 24, 2020 at 10:01:04PM -0400, Taylor Blau wrote:
>>
>>> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
>>> learned to remove a multi-pack-index file if it added or removed a pack
>>> from the object store.
>>>
>>> This mechanism is a little over-eager, since it is only necessary to
>>> drop a MIDX if 'git repack' removes a pack that the MIDX references.
>>> Adding a pack outside of the MIDX does not require invalidating the
>>> MIDX, and likewise for removing a pack the MIDX does not know about.
>>
>> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
>> or "git repack -Ad" is going to pack everything and delete the old
>> packs. So I think we'd want to remove a midx there.
>>
>> And "git repack -d" I think of as deleting only loose objects that we
>> just packed. But I guess it could also remove a pack that has now been
>> made redundant? That seems like a rare case in practice, but I suppose
>> is possible.
> 
> Yeah, the patch message makes this sound more likely than it actually
> is, which I agree is very rare. I often write 'git repack' instead of
> 'git pack-objects' to slurp up everything loose into a new pack without
> having to list loose objects by name.
> 
> That's the case that I really care about here: purely adding a new pack
> should not invalidate the existing MIDX.
> 
>> Not exactly related to your fix, but kind of the flip side of it: would
>> we ever need to retain a midx that mentions some packs that still exist?
>>
>> E.g., imagine we have a midx that points to packs A and B, and
>> git-repack deletes B. By your logic above, we need to remove the midx
>> because now it points to objects in B which aren't accessible. But by
>> deleting it, could we be deleting the only thing that mentions the
>> objects in A?
>>
>> I _think_ the answer is "no", because we never went all-in on midx and
>> allowed deleting the matching .idx files for contained packs. So we'd
>> still have that A.idx, and we could just use the pack as normal. But
>> it's an interesting corner case if we ever do go in that direction.
> 
> Agreed. Maybe a (admittedly somewhat large) #leftoverbits.
> 
>> If you'll let me muse a bit more on midx-lifetime issues (which I've
>> never really thought about before just now):
>>
>> I'm also a little curious how bad it is to have a midx whose pack has
>> gone away. I guess we'd answer queries for "yes, we have this object"
>> even if we don't, which is bad. Though in practice we'd only delete
>> those packs if we have their objects elsewhere. And the pack code is
>> pretty good about retrying other copies of objects that can't be
>> accessed. Alternatively, I wonder if the midx-loading code ought to
>> check that all of the constituent packs are available.
>>
>> In that line of thinking, do we even need to delete midx files if one of
>> their packs goes away? The reading side probably ought to be able to
>> handle that gracefully.
> 
> I think that this is probably the right direction, although I've only
> spend time in the MIDX code over the past couple of weeks, so I can't
> say with authority. It seems like it would be pretty annoying, though.
> For example, code that cares about listing all objects in a MIDX would
> have to check first whether the pack they're in still exists before
> emitting them. On top of that, there are more corner cases when object X
> exists in more than one pack, but some strict subset of those packs
> containing X have gone away.
> 
> I don't think that it couldn't be done, though.
> 
>> And the more interesting case is when you repack everything with "-ad"
>> or similar, at which point you shouldn't even need to look up what's in
>> the midx to see if you deleted its packs. The point of your operation is
>> to put it all-into-one, so you know the old midx should be discarded.
>>
>>> Teach 'git repack' to check for this by loading the MIDX, and checking
>>> whether the to-be-removed pack is known to the MIDX. This requires a
>>> slightly odd alternation to a test in t5319, which is explained with a
>>> comment.
>>
>> My above musings aside, this seems like an obvious improvement.
>>
>>> diff --git a/builtin/repack.c b/builtin/repack.c
>>> index 04c5ceaf7e..98fac03946 100644
>>> --- a/builtin/repack.c
>>> +++ b/builtin/repack.c
>>> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
>>>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>>>  {
>>>  	struct strbuf buf = STRBUF_INIT;
>>> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
>>> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
>>> +	strbuf_addf(&buf, "%s.pack", base_name);
>>> +	if (m && midx_contains_pack(m, buf.buf))
>>> +		clear_midx_file(the_repository);
>>> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
>>
>> Makes sense. midx_contains_pack() is a binary search, so we'll spend
>> O(n log n) effort deleting the packs (I wondered if this might be
>> accidentally quadratic over the number of packs).
> 
> Right. The MIDX stores packs in lexographic order, so checking them is
> O(log n), which we do at most 'n' times.
> 
>> And after we clear, "m" will be NULL, so we'll do it at most once. Which
>> is why you can get rid of the manual "midx_cleared" flag from the
>> preimage.
> 
> Yep. I thought briefly about passing 'm' as a parameter, but then you
> have to worry about a dangling reference to
> 'the_repository->objects->multi_pack_index' after calling
> 'clear_midx_file()', so it's easier to look it up each time.

The discussion in this thread matches my understanding of the
situation.

>> So the patch looks good to me.

The code in builtin/repack.c looks good for sure. I have a quick question
about this new test:

+test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
+	git multi-pack-index write &&
+	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
+	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
+
+	# Write a new pack that is unknown to the multi-pack-index.
+	git hash-object -w </dev/null >blob &&
+	git pack-objects $objdir/pack/pack <blob &&
+
+	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
+	test_cmp_bin $objdir/pack/multi-pack-index \
+		$objdir/pack/multi-pack-index.bak
+'
+

You create an arbitrary blob, and then add it to a pack-file. Do we
know that 'git repack' is definitely creating a new pack-file that makes
our manually-created pack-file redundant?

My suggestion is to have the test check itself:

+test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
+	git multi-pack-index write &&
+	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
+	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
+
+	# Write a new pack that is unknown to the multi-pack-index.
+	git hash-object -w </dev/null >blob &&
+	HASH=$(git pack-objects $objdir/pack/pack <blob) &&
+
+	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
+	test_cmp_bin $objdir/pack/multi-pack-index \
+		$objdir/pack/multi-pack-index.bak &&
+	test_path_is_missing $objdir/pack/pack-$HASH.pack
+'
+

This test fails for me, on the 'test_path_is_missing'. Likely, the
blob is seen as already in a pack-file so is just pruned by 'git repack'
instead. I thought that perhaps we need to add a new pack ourselves that
overrides the small pack. Here is my attempt:

test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
	git multi-pack-index write &&
	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&

	# Write a new pack that is unknown to the multi-pack-index.
	BLOB1=$(echo blob1 | git hash-object -w --stdin) &&
	BLOB2=$(echo blob2 | git hash-object -w --stdin) &&
	cat >blobs <<-EOF &&
	$BLOB1
	$BLOB2
	EOF
	HASH1=$(echo $BLOB1 | git pack-objects $objdir/pack/pack) &&
	HASH2=$(git pack-objects $objdir/pack/pack <blobs) &&
	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
	test_cmp_bin $objdir/pack/multi-pack-index \
		$objdir/pack/multi-pack-index.bak &&
	test_path_is_file $objdir/pack/pack-$HASH2.pack &&
	test_path_is_missing $objdir/pack/pack-$HASH1.pack
'

However, this _still_ fails on the "test_path_is_missing" line, so I'm not sure
how to make sure your logic is tested. I saw that 'git repack' was writing
"nothing new to pack" in the output, so I also tested adding a few commits and
trying to force it to repack reachable data, but I cannot seem to trigger it
to create a new pack that overrides only one pack that is not in the MIDX.

Likely, I just don't know how 'git rebase' works well enough to trigger this
behavior. But the test as-is is not testing what you want it to test.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 13:14     ` Derrick Stolee
@ 2020-08-25 14:41       ` Taylor Blau
  2020-08-25 15:14         ` Derrick Stolee
  0 siblings, 1 reply; 78+ messages in thread
From: Taylor Blau @ 2020-08-25 14:41 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Jeff King, git, dstolee

On Tue, Aug 25, 2020 at 09:14:19AM -0400, Derrick Stolee wrote:
> On 8/24/2020 10:37 PM, Taylor Blau wrote:
> > On Mon, Aug 24, 2020 at 10:26:14PM -0400, Jeff King wrote:
> >> On Mon, Aug 24, 2020 at 10:01:04PM -0400, Taylor Blau wrote:
> >>
> >>> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> >>> learned to remove a multi-pack-index file if it added or removed a pack
> >>> from the object store.
> >>>
> >>> This mechanism is a little over-eager, since it is only necessary to
> >>> drop a MIDX if 'git repack' removes a pack that the MIDX references.
> >>> Adding a pack outside of the MIDX does not require invalidating the
> >>> MIDX, and likewise for removing a pack the MIDX does not know about.
> >>
> >> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> >> or "git repack -Ad" is going to pack everything and delete the old
> >> packs. So I think we'd want to remove a midx there.
> >>
> >> And "git repack -d" I think of as deleting only loose objects that we
> >> just packed. But I guess it could also remove a pack that has now been
> >> made redundant? That seems like a rare case in practice, but I suppose
> >> is possible.
> >
> > Yeah, the patch message makes this sound more likely than it actually
> > is, which I agree is very rare. I often write 'git repack' instead of
> > 'git pack-objects' to slurp up everything loose into a new pack without
> > having to list loose objects by name.
> >
> > That's the case that I really care about here: purely adding a new pack
> > should not invalidate the existing MIDX.
> >
> >> Not exactly related to your fix, but kind of the flip side of it: would
> >> we ever need to retain a midx that mentions some packs that still exist?
> >>
> >> E.g., imagine we have a midx that points to packs A and B, and
> >> git-repack deletes B. By your logic above, we need to remove the midx
> >> because now it points to objects in B which aren't accessible. But by
> >> deleting it, could we be deleting the only thing that mentions the
> >> objects in A?
> >>
> >> I _think_ the answer is "no", because we never went all-in on midx and
> >> allowed deleting the matching .idx files for contained packs. So we'd
> >> still have that A.idx, and we could just use the pack as normal. But
> >> it's an interesting corner case if we ever do go in that direction.
> >
> > Agreed. Maybe a (admittedly somewhat large) #leftoverbits.
> >
> >> If you'll let me muse a bit more on midx-lifetime issues (which I've
> >> never really thought about before just now):
> >>
> >> I'm also a little curious how bad it is to have a midx whose pack has
> >> gone away. I guess we'd answer queries for "yes, we have this object"
> >> even if we don't, which is bad. Though in practice we'd only delete
> >> those packs if we have their objects elsewhere. And the pack code is
> >> pretty good about retrying other copies of objects that can't be
> >> accessed. Alternatively, I wonder if the midx-loading code ought to
> >> check that all of the constituent packs are available.
> >>
> >> In that line of thinking, do we even need to delete midx files if one of
> >> their packs goes away? The reading side probably ought to be able to
> >> handle that gracefully.
> >
> > I think that this is probably the right direction, although I've only
> > spend time in the MIDX code over the past couple of weeks, so I can't
> > say with authority. It seems like it would be pretty annoying, though.
> > For example, code that cares about listing all objects in a MIDX would
> > have to check first whether the pack they're in still exists before
> > emitting them. On top of that, there are more corner cases when object X
> > exists in more than one pack, but some strict subset of those packs
> > containing X have gone away.
> >
> > I don't think that it couldn't be done, though.
> >
> >> And the more interesting case is when you repack everything with "-ad"
> >> or similar, at which point you shouldn't even need to look up what's in
> >> the midx to see if you deleted its packs. The point of your operation is
> >> to put it all-into-one, so you know the old midx should be discarded.
> >>
> >>> Teach 'git repack' to check for this by loading the MIDX, and checking
> >>> whether the to-be-removed pack is known to the MIDX. This requires a
> >>> slightly odd alternation to a test in t5319, which is explained with a
> >>> comment.
> >>
> >> My above musings aside, this seems like an obvious improvement.
> >>
> >>> diff --git a/builtin/repack.c b/builtin/repack.c
> >>> index 04c5ceaf7e..98fac03946 100644
> >>> --- a/builtin/repack.c
> >>> +++ b/builtin/repack.c
> >>> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
> >>>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
> >>>  {
> >>>  	struct strbuf buf = STRBUF_INIT;
> >>> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> >>> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> >>> +	strbuf_addf(&buf, "%s.pack", base_name);
> >>> +	if (m && midx_contains_pack(m, buf.buf))
> >>> +		clear_midx_file(the_repository);
> >>> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
> >>
> >> Makes sense. midx_contains_pack() is a binary search, so we'll spend
> >> O(n log n) effort deleting the packs (I wondered if this might be
> >> accidentally quadratic over the number of packs).
> >
> > Right. The MIDX stores packs in lexographic order, so checking them is
> > O(log n), which we do at most 'n' times.
> >
> >> And after we clear, "m" will be NULL, so we'll do it at most once. Which
> >> is why you can get rid of the manual "midx_cleared" flag from the
> >> preimage.
> >
> > Yep. I thought briefly about passing 'm' as a parameter, but then you
> > have to worry about a dangling reference to
> > 'the_repository->objects->multi_pack_index' after calling
> > 'clear_midx_file()', so it's easier to look it up each time.
>
> The discussion in this thread matches my understanding of the
> situation.
>
> >> So the patch looks good to me.
>
> The code in builtin/repack.c looks good for sure. I have a quick question
> about this new test:
>
> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> +	git multi-pack-index write &&
> +	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> +	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> +
> +	# Write a new pack that is unknown to the multi-pack-index.
> +	git hash-object -w </dev/null >blob &&
> +	git pack-objects $objdir/pack/pack <blob &&
> +
> +	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> +	test_cmp_bin $objdir/pack/multi-pack-index \
> +		$objdir/pack/multi-pack-index.bak
> +'
> +
>
> You create an arbitrary blob, and then add it to a pack-file. Do we
> know that 'git repack' is definitely creating a new pack-file that makes
> our manually-created pack-file redundant?
>
> My suggestion is to have the test check itself:
>
> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> +	git multi-pack-index write &&
> +	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> +	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> +
> +	# Write a new pack that is unknown to the multi-pack-index.
> +	git hash-object -w </dev/null >blob &&
> +	HASH=$(git pack-objects $objdir/pack/pack <blob) &&
> +
> +	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> +	test_cmp_bin $objdir/pack/multi-pack-index \
> +		$objdir/pack/multi-pack-index.bak &&
> +	test_path_is_missing $objdir/pack/pack-$HASH.pack
> +'
> +
>
> This test fails for me, on the 'test_path_is_missing'. Likely, the
> blob is seen as already in a pack-file so is just pruned by 'git repack'
> instead. I thought that perhaps we need to add a new pack ourselves that
> overrides the small pack. Here is my attempt:
>
> test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> 	git multi-pack-index write &&
> 	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> 	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
>
> 	# Write a new pack that is unknown to the multi-pack-index.
> 	BLOB1=$(echo blob1 | git hash-object -w --stdin) &&
> 	BLOB2=$(echo blob2 | git hash-object -w --stdin) &&
> 	cat >blobs <<-EOF &&
> 	$BLOB1
> 	$BLOB2
> 	EOF
> 	HASH1=$(echo $BLOB1 | git pack-objects $objdir/pack/pack) &&
> 	HASH2=$(git pack-objects $objdir/pack/pack <blobs) &&
> 	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> 	test_cmp_bin $objdir/pack/multi-pack-index \
> 		$objdir/pack/multi-pack-index.bak &&
> 	test_path_is_file $objdir/pack/pack-$HASH2.pack &&
> 	test_path_is_missing $objdir/pack/pack-$HASH1.pack
> '
>
> However, this _still_ fails on the "test_path_is_missing" line, so I'm not sure
> how to make sure your logic is tested. I saw that 'git repack' was writing
> "nothing new to pack" in the output, so I also tested adding a few commits and
> trying to force it to repack reachable data, but I cannot seem to trigger it
> to create a new pack that overrides only one pack that is not in the MIDX.
>
> Likely, I just don't know how 'git rebase' works well enough to trigger this
> behavior. But the test as-is is not testing what you want it to test.

I think this case might actually be impossible to tickle in a test. I
thought that 'git repack -d' looked for existing packs whose objects are
a subset of some new pack generated. But, it's much simpler than that:
'-d' by itself just looks for packs that were already on disk with the
same SHA-1 as a new pack, and it removes the old one.

Note that 'git repack' uses 'git pack-objects' internally to find
objects and generate a packfile. When calling 'git pack-objects', 'git
repack -d' passes '--all' and '--unpacked', which means that there is no
way we'd generate a new pack with the same SHA-1 as an existing pack
ordinarily.

So, I think this case is impossible, or at least astronomically
unlikely. What is more interesting (and untested) is that adding a _new_
pack doesn't cause us to invalidate the MIDX. Here's a patch that does
that:

  diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
  index 16a1ad040e..620f2058d6 100755
  --- a/t/t5319-multi-pack-index.sh
  +++ b/t/t5319-multi-pack-index.sh
  @@ -391,18 +391,27 @@ test_expect_success 'repack removes multi-pack-index when deleting packs' '
          test_path_is_missing $objdir/pack/multi-pack-index
   '

  -test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
  -       git multi-pack-index write &&
  -       cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
  -       test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
  -
  -       # Write a new pack that is unknown to the multi-pack-index.
  -       git hash-object -w </dev/null >blob &&
  -       git pack-objects $objdir/pack/pack <blob &&
  -
  -       GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
  -       test_cmp_bin $objdir/pack/multi-pack-index \
  -               $objdir/pack/multi-pack-index.bak
  +test_expect_success 'repack preserves multi-pack-index when creating packs' '
  +       git init preserve &&
  +       test_when_finished "rm -fr preserve" &&
  +       (
  +               cd preserve &&
  +               midx=.git/objects/pack/multi-pack-index &&
  +
  +               test_commit "initial" &&
  +               git repack -ad &&
  +               git multi-pack-index write &&
  +               ls .git/objects/pack | grep "\.pack$" >before &&
  +
  +               cp $midx $midx.bak &&
  +
  +               test_commit "another" &&
  +               GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
  +               ls .git/objects/pack | grep "\.pack$" >after &&
  +
  +               test_cmp_bin $midx.bak $midx &&
  +               ! test_cmp before after
  +       )
   '

   compare_results_with_midx "after repack"

What do you think about applying this on top and then calling it a day?

> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25  7:55 ` [PATCH] builtin/repack.c: invalidate MIDX only when necessary Son Luong Ngoc
  2020-08-25 12:45   ` Derrick Stolee
@ 2020-08-25 14:45   ` Taylor Blau
  2020-08-25 16:04     ` [PATCH v2] " Taylor Blau
  2020-08-25 16:47     ` [PATCH] " Jeff King
  1 sibling, 2 replies; 78+ messages in thread
From: Taylor Blau @ 2020-08-25 14:45 UTC (permalink / raw)
  To: Son Luong Ngoc; +Cc: Taylor Blau, git, dstolee

On Tue, Aug 25, 2020 at 09:55:22AM +0200, Son Luong Ngoc wrote:
> Hi Taylor,
>
> Thanks for working on this.
>
> > On Aug 25, 2020, at 04:01, Taylor Blau <me@ttaylorr.com> wrote:
> >
> > In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> > learned to remove a multi-pack-index file if it added or removed a pack
> > from the object store.
> >
> > This mechanism is a little over-eager, since it is only necessary to
> > drop a MIDX if 'git repack' removes a pack that the MIDX references.
> > Adding a pack outside of the MIDX does not require invalidating the
> > MIDX, and likewise for removing a pack the MIDX does not know about.
>
> I wonder if its worth to trigger write_midx_file() to update the midx instead of
> just removing MIDX?

There's no reason that we couldn't do this, but I don't think that it's
a very good idea, especially if the new 'git maintenance' command will
be able to do something like this by itself.

I'm hesitant to add yet another option to 'git repack', which I have
always thought as a plumbing tool. That's important because callers
(like 'git maintenance' or user scripts) can 'git multi-pack-index write
...' after their 'git repack' to generate a new MIDX if they want one.

This becomes even trickier if 'git multi-pack-index write' were to gain
new options, at which point 'git repack' would *also* have to learn
those options to plumb them through.

Maybe there's a legitimate case that I'm overlooking, but I don't think
so. If there is, I'd be happy to come back to this, but this patch is
really just focused on fixing a bug.

> That is already the direction we are taking in the 'maintenance' patch series
> whenever the multi-pack-index file was deemed invalid.
>
> Or perhaps, we can check for 'core.multiPackIndex' value (which recently is
> 'true' by default) and determine whether we should remove the MIDX or rewrite
> it?
>
> >
> > Teach 'git repack' to check for this by loading the MIDX, and checking
> > whether the to-be-removed pack is known to the MIDX. This requires a
> > slightly odd alternation to a test in t5319, which is explained with a
> > comment.
> >
> > Signed-off-by: Taylor Blau <me@ttaylorr.com>

Thanks for your feedback.

> Cheers,
> Son Luong.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 14:41       ` Taylor Blau
@ 2020-08-25 15:14         ` Derrick Stolee
  2020-08-25 15:42           ` Taylor Blau
  0 siblings, 1 reply; 78+ messages in thread
From: Derrick Stolee @ 2020-08-25 15:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, dstolee

On 8/25/2020 10:41 AM, Taylor Blau wrote:
> On Tue, Aug 25, 2020 at 09:14:19AM -0400, Derrick Stolee wrote:
>> The code in builtin/repack.c looks good for sure. I have a quick question
>> about this new test:
>>
>> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
>> +	git multi-pack-index write &&
>> +	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
>> +	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
>> +
>> +	# Write a new pack that is unknown to the multi-pack-index.
>> +	git hash-object -w </dev/null >blob &&
>> +	git pack-objects $objdir/pack/pack <blob &&
>> +
>> +	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
>> +	test_cmp_bin $objdir/pack/multi-pack-index \
>> +		$objdir/pack/multi-pack-index.bak
>> +'
>> +
>>
>> You create an arbitrary blob, and then add it to a pack-file. Do we
>> know that 'git repack' is definitely creating a new pack-file that makes
>> our manually-created pack-file redundant?
>>
>> My suggestion is to have the test check itself:
>>
>> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
>> +	git multi-pack-index write &&
>> +	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
>> +	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
>> +
>> +	# Write a new pack that is unknown to the multi-pack-index.
>> +	git hash-object -w </dev/null >blob &&
>> +	HASH=$(git pack-objects $objdir/pack/pack <blob) &&
>> +
>> +	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
>> +	test_cmp_bin $objdir/pack/multi-pack-index \
>> +		$objdir/pack/multi-pack-index.bak &&
>> +	test_path_is_missing $objdir/pack/pack-$HASH.pack
>> +'
>> +
>>
>> This test fails for me, on the 'test_path_is_missing'. Likely, the
>> blob is seen as already in a pack-file so is just pruned by 'git repack'
>> instead. I thought that perhaps we need to add a new pack ourselves that
>> overrides the small pack. Here is my attempt:
>>
>> test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
>> 	git multi-pack-index write &&
>> 	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
>> 	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
>>
>> 	# Write a new pack that is unknown to the multi-pack-index.
>> 	BLOB1=$(echo blob1 | git hash-object -w --stdin) &&
>> 	BLOB2=$(echo blob2 | git hash-object -w --stdin) &&
>> 	cat >blobs <<-EOF &&
>> 	$BLOB1
>> 	$BLOB2
>> 	EOF
>> 	HASH1=$(echo $BLOB1 | git pack-objects $objdir/pack/pack) &&
>> 	HASH2=$(git pack-objects $objdir/pack/pack <blobs) &&
>> 	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
>> 	test_cmp_bin $objdir/pack/multi-pack-index \
>> 		$objdir/pack/multi-pack-index.bak &&
>> 	test_path_is_file $objdir/pack/pack-$HASH2.pack &&
>> 	test_path_is_missing $objdir/pack/pack-$HASH1.pack
>> '
>>
>> However, this _still_ fails on the "test_path_is_missing" line, so I'm not sure
>> how to make sure your logic is tested. I saw that 'git repack' was writing
>> "nothing new to pack" in the output, so I also tested adding a few commits and
>> trying to force it to repack reachable data, but I cannot seem to trigger it
>> to create a new pack that overrides only one pack that is not in the MIDX.
>>
>> Likely, I just don't know how 'git rebase' works well enough to trigger this
>> behavior. But the test as-is is not testing what you want it to test.
> 
> I think this case might actually be impossible to tickle in a test. I
> thought that 'git repack -d' looked for existing packs whose objects are
> a subset of some new pack generated. But, it's much simpler than that:
> '-d' by itself just looks for packs that were already on disk with the
> same SHA-1 as a new pack, and it removes the old one.

If 'git repack' never calls remove_redundant_pack() unless we are doing
a "full repack", then we could simplify this logic:

 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	struct strbuf buf = STRBUF_INIT;
-	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
+	struct multi_pack_index *m = get_multi_pack_index(the_repository);
+	strbuf_addf(&buf, "%s.pack", base_name);
+	if (m && midx_contains_pack(m, buf.buf))
+		clear_midx_file(the_repository);
+	strbuf_insertf(&buf, 0, "%s/", dir_name);
 	unlink_pack_path(buf.buf, 1);
 	strbuf_release(&buf);
 }

to

 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	struct strbuf buf = STRBUF_INIT;
	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
+	clear_midx_file(the_repository);
 	unlink_pack_path(buf.buf, 1);
 	strbuf_release(&buf);
 }

and get the same results as we are showing in these tests. This does
move us incrementally to a better situation: don't delete the MIDX
if we aren't deleting pack files. But, I think we can get around it.

> Note that 'git repack' uses 'git pack-objects' internally to find
> objects and generate a packfile. When calling 'git pack-objects', 'git
> repack -d' passes '--all' and '--unpacked', which means that there is no
> way we'd generate a new pack with the same SHA-1 as an existing pack
> ordinarily.
> 
> So, I think this case is impossible, or at least astronomically
> unlikely. What is more interesting (and untested) is that adding a _new_
> pack doesn't cause us to invalidate the MIDX. Here's a patch that does
> that:
> 
>   diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
>   index 16a1ad040e..620f2058d6 100755
>   --- a/t/t5319-multi-pack-index.sh
>   +++ b/t/t5319-multi-pack-index.sh
>   @@ -391,18 +391,27 @@ test_expect_success 'repack removes multi-pack-index when deleting packs' '
>           test_path_is_missing $objdir/pack/multi-pack-index
>    '
> 
>   -test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
>   -       git multi-pack-index write &&
>   -       cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
>   -       test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
>   -
>   -       # Write a new pack that is unknown to the multi-pack-index.
>   -       git hash-object -w </dev/null >blob &&
>   -       git pack-objects $objdir/pack/pack <blob &&
>   -
>   -       GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
>   -       test_cmp_bin $objdir/pack/multi-pack-index \
>   -               $objdir/pack/multi-pack-index.bak
>   +test_expect_success 'repack preserves multi-pack-index when creating packs' '
>   +       git init preserve &&
>   +       test_when_finished "rm -fr preserve" &&
>   +       (
>   +               cd preserve &&
>   +               midx=.git/objects/pack/multi-pack-index &&
>   +
>   +               test_commit "initial" &&
>   +               git repack -ad &&
>   +               git multi-pack-index write &&
>   +               ls .git/objects/pack | grep "\.pack$" >before &&
>   +
>   +               cp $midx $midx.bak &&
>   +
>   +               test_commit "another" &&
>   +               GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
>   +               ls .git/objects/pack | grep "\.pack$" >after &&
>   +
>   +               test_cmp_bin $midx.bak $midx &&
>   +               ! test_cmp before after
>   +       )
>    '

After looking at the callers to remote_redundant_pack() I noticed that it is only
called after inspecting the "names" struct, which contains the names of the packs
to group into a new pack-file. We can use a .keep file to preserve the pack-file(s) in
the MIDX but also to ensure multiple pack-files outside of the MIDX are repacked and
deleted. While this is very unlikely in the wild, it is definitely possible.

test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
	git init preserve &&
	test_when_finished "rm -fr preserve" &&
	(
		cd preserve &&
		midx=.git/objects/pack/multi-pack-index &&

		test_commit 1 &&
		HASH1=$(git pack-objects --all .git/objects/pack/pack) &&
		touch .git/objects/pack/pack-$HASH1.keep &&

		cat >pack-input <<-\EOF &&
		HEAD
		^HEAD~1
		EOF

		test_commit 2 &&
		HASH2=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&
		touch .git/objects/pack/pack-$HASH2.keep &&

		git multi-pack-index write &&
		cp $midx $midx.bak &&

		test_commit 3 &&
		HASH3=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&

		test_commit 4 &&
		HASH4=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&

		GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -ad &&
		test_path_is_file .git/objects/pack/pack-$HASH1.pack &&
		test_path_is_file .git/objects/pack/pack-$HASH2.pack &&
		test_path_is_missing .git/objects/pack/pack-$HASH3.pack &&
		test_path_is_missing .git/objects/pack/pack-$HASH4.pack
       )
'

I believe this checks your condition properly enough.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 15:14         ` Derrick Stolee
@ 2020-08-25 15:42           ` Taylor Blau
  2020-08-25 16:56             ` Jeff King
  0 siblings, 1 reply; 78+ messages in thread
From: Taylor Blau @ 2020-08-25 15:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Jeff King, git, dstolee

On Tue, Aug 25, 2020 at 11:14:41AM -0400, Derrick Stolee wrote:
> On 8/25/2020 10:41 AM, Taylor Blau wrote:
> > On Tue, Aug 25, 2020 at 09:14:19AM -0400, Derrick Stolee wrote:
> >> The code in builtin/repack.c looks good for sure. I have a quick question
> >> about this new test:
> >>
> >> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> >> +	git multi-pack-index write &&
> >> +	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> >> +	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> >> +
> >> +	# Write a new pack that is unknown to the multi-pack-index.
> >> +	git hash-object -w </dev/null >blob &&
> >> +	git pack-objects $objdir/pack/pack <blob &&
> >> +
> >> +	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> >> +	test_cmp_bin $objdir/pack/multi-pack-index \
> >> +		$objdir/pack/multi-pack-index.bak
> >> +'
> >> +
> >>
> >> You create an arbitrary blob, and then add it to a pack-file. Do we
> >> know that 'git repack' is definitely creating a new pack-file that makes
> >> our manually-created pack-file redundant?
> >>
> >> My suggestion is to have the test check itself:
> >>
> >> +test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> >> +	git multi-pack-index write &&
> >> +	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> >> +	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> >> +
> >> +	# Write a new pack that is unknown to the multi-pack-index.
> >> +	git hash-object -w </dev/null >blob &&
> >> +	HASH=$(git pack-objects $objdir/pack/pack <blob) &&
> >> +
> >> +	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> >> +	test_cmp_bin $objdir/pack/multi-pack-index \
> >> +		$objdir/pack/multi-pack-index.bak &&
> >> +	test_path_is_missing $objdir/pack/pack-$HASH.pack
> >> +'
> >> +
> >>
> >> This test fails for me, on the 'test_path_is_missing'. Likely, the
> >> blob is seen as already in a pack-file so is just pruned by 'git repack'
> >> instead. I thought that perhaps we need to add a new pack ourselves that
> >> overrides the small pack. Here is my attempt:
> >>
> >> test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> >> 	git multi-pack-index write &&
> >> 	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> >> 	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> >>
> >> 	# Write a new pack that is unknown to the multi-pack-index.
> >> 	BLOB1=$(echo blob1 | git hash-object -w --stdin) &&
> >> 	BLOB2=$(echo blob2 | git hash-object -w --stdin) &&
> >> 	cat >blobs <<-EOF &&
> >> 	$BLOB1
> >> 	$BLOB2
> >> 	EOF
> >> 	HASH1=$(echo $BLOB1 | git pack-objects $objdir/pack/pack) &&
> >> 	HASH2=$(git pack-objects $objdir/pack/pack <blobs) &&
> >> 	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> >> 	test_cmp_bin $objdir/pack/multi-pack-index \
> >> 		$objdir/pack/multi-pack-index.bak &&
> >> 	test_path_is_file $objdir/pack/pack-$HASH2.pack &&
> >> 	test_path_is_missing $objdir/pack/pack-$HASH1.pack
> >> '
> >>
> >> However, this _still_ fails on the "test_path_is_missing" line, so I'm not sure
> >> how to make sure your logic is tested. I saw that 'git repack' was writing
> >> "nothing new to pack" in the output, so I also tested adding a few commits and
> >> trying to force it to repack reachable data, but I cannot seem to trigger it
> >> to create a new pack that overrides only one pack that is not in the MIDX.
> >>
> >> Likely, I just don't know how 'git rebase' works well enough to trigger this
> >> behavior. But the test as-is is not testing what you want it to test.
> >
> > I think this case might actually be impossible to tickle in a test. I
> > thought that 'git repack -d' looked for existing packs whose objects are
> > a subset of some new pack generated. But, it's much simpler than that:
> > '-d' by itself just looks for packs that were already on disk with the
> > same SHA-1 as a new pack, and it removes the old one.
>
> If 'git repack' never calls remove_redundant_pack() unless we are doing
> a "full repack", then we could simplify this logic:
>
>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> +	strbuf_addf(&buf, "%s.pack", base_name);
> +	if (m && midx_contains_pack(m, buf.buf))
> +		clear_midx_file(the_repository);
> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
>  	unlink_pack_path(buf.buf, 1);
>  	strbuf_release(&buf);
>  }
>
> to
>
>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>  {
>  	struct strbuf buf = STRBUF_INIT;
> 	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> +	clear_midx_file(the_repository);
>  	unlink_pack_path(buf.buf, 1);
>  	strbuf_release(&buf);
>  }
>
> and get the same results as we are showing in these tests. This does
> move us incrementally to a better situation: don't delete the MIDX
> if we aren't deleting pack files. But, I think we can get around it.

Makes sense, but reading your whole email we are better off leaving this
as-is and changing the test to exercise it more often.

> > Note that 'git repack' uses 'git pack-objects' internally to find
> > objects and generate a packfile. When calling 'git pack-objects', 'git
> > repack -d' passes '--all' and '--unpacked', which means that there is no
> > way we'd generate a new pack with the same SHA-1 as an existing pack
> > ordinarily.
> >
> > So, I think this case is impossible, or at least astronomically
> > unlikely. What is more interesting (and untested) is that adding a _new_
> > pack doesn't cause us to invalidate the MIDX. Here's a patch that does
> > that:
> >
> >   diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
> >   index 16a1ad040e..620f2058d6 100755
> >   --- a/t/t5319-multi-pack-index.sh
> >   +++ b/t/t5319-multi-pack-index.sh
> >   @@ -391,18 +391,27 @@ test_expect_success 'repack removes multi-pack-index when deleting packs' '
> >           test_path_is_missing $objdir/pack/multi-pack-index
> >    '
> >
> >   -test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> >   -       git multi-pack-index write &&
> >   -       cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
> >   -       test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
> >   -
> >   -       # Write a new pack that is unknown to the multi-pack-index.
> >   -       git hash-object -w </dev/null >blob &&
> >   -       git pack-objects $objdir/pack/pack <blob &&
> >   -
> >   -       GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> >   -       test_cmp_bin $objdir/pack/multi-pack-index \
> >   -               $objdir/pack/multi-pack-index.bak
> >   +test_expect_success 'repack preserves multi-pack-index when creating packs' '
> >   +       git init preserve &&
> >   +       test_when_finished "rm -fr preserve" &&
> >   +       (
> >   +               cd preserve &&
> >   +               midx=.git/objects/pack/multi-pack-index &&
> >   +
> >   +               test_commit "initial" &&
> >   +               git repack -ad &&
> >   +               git multi-pack-index write &&
> >   +               ls .git/objects/pack | grep "\.pack$" >before &&
> >   +
> >   +               cp $midx $midx.bak &&
> >   +
> >   +               test_commit "another" &&
> >   +               GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
> >   +               ls .git/objects/pack | grep "\.pack$" >after &&
> >   +
> >   +               test_cmp_bin $midx.bak $midx &&
> >   +               ! test_cmp before after
> >   +       )
> >    '
>
> After looking at the callers to remote_redundant_pack() I noticed that it is only
> called after inspecting the "names" struct, which contains the names of the packs
> to group into a new pack-file. We can use a .keep file to preserve the pack-file(s) in
> the MIDX but also to ensure multiple pack-files outside of the MIDX are repacked and
> deleted. While this is very unlikely in the wild, it is definitely possible.

Great idea.

> test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> 	git init preserve &&
> 	test_when_finished "rm -fr preserve" &&
> 	(
> 		cd preserve &&
> 		midx=.git/objects/pack/multi-pack-index &&
>
> 		test_commit 1 &&
> 		HASH1=$(git pack-objects --all .git/objects/pack/pack) &&
> 		touch .git/objects/pack/pack-$HASH1.keep &&
>
> 		cat >pack-input <<-\EOF &&

Escaping the heredoc shouldn't be necessary, so this can be written
instead as '<<-EOF'.

> 		HEAD
> 		^HEAD~1
> 		EOF
>
> 		test_commit 2 &&
> 		HASH2=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&
> 		touch .git/objects/pack/pack-$HASH2.keep &&
>
> 		git multi-pack-index write &&
> 		cp $midx $midx.bak &&
>
> 		test_commit 3 &&
> 		HASH3=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&
>
> 		test_commit 4 &&
> 		HASH4=$(git pack-objects --revs .git/objects/pack/pack <pack-input) &&
>
> 		GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -ad &&
> 		test_path_is_file .git/objects/pack/pack-$HASH1.pack &&
> 		test_path_is_file .git/objects/pack/pack-$HASH2.pack &&
> 		test_path_is_missing .git/objects/pack/pack-$HASH3.pack &&
> 		test_path_is_missing .git/objects/pack/pack-$HASH4.pack

...and we should check that 'test_cmp $midx.bak $midx' is clean, i.e.,
that we didn't touch the MIDX.

>        )
> '
>
> I believe this checks your condition properly enough.

Otherwise, I think that this test looks great. Thanks for suggesting
it. I'll send a new patch now...

> Thanks,
> -Stolee

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25  2:26 ` Jeff King
  2020-08-25  2:37   ` Taylor Blau
@ 2020-08-25 15:58   ` Junio C Hamano
  2020-08-25 16:08     ` Taylor Blau
                       ` (2 more replies)
  1 sibling, 3 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-25 15:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee

Jeff King <peff@peff.net> writes:

> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> or "git repack -Ad" is going to pack everything and delete the old
> packs. So I think we'd want to remove a midx there.

AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
nobody is doing "there are three packfiles, but all the objects in
one of them are contained in the other two, so let's remove that
redundant one".

> And "git repack -d" I think of as deleting only loose objects that we
> just packed. But I guess it could also remove a pack that has now been
> made redundant? That seems like a rare case in practice, but I suppose
> is possible.

Meaning it can become reality?  Yes.  Or it already can happen?  I
doubt it.

> E.g., imagine we have a midx that points to packs A and B, and
> git-repack deletes B. By your logic above, we need to remove the midx
> because now it points to objects in B which aren't accessible. But by
> deleting it, could we be deleting the only thing that mentions the
> objects in A?
>
> I _think_ the answer is "no", because we never went all-in on midx and
> allowed deleting the matching .idx files for contained packs.

Yeah, it has been my assumption that that part of the design would
never change.

> I'm also a little curious how bad it is to have a midx whose pack has
> gone away. I guess we'd answer queries for "yes, we have this object"
> even if we don't, which is bad. Though in practice we'd only delete
> those packs if we have their objects elsewhere.

Hmph, object O used to be in pack A and pack B, midx points at the
copy in pack B but we deleted it because the pack is redundant.
Now, midx says O exists, but midx cannot be used to retrieve O.  We
need to ask A.idx about O to locate it.

That sounds brittle.  I am not sure our error fallback is all that
patient.


> And the pack code is
> pretty good about retrying other copies of objects that can't be
> accessed. Alternatively, I wonder if the midx-loading code ought to
> check that all of the constituent packs are available.
>
> In that line of thinking, do we even need to delete midx files if one of
> their packs goes away? The reading side probably ought to be able to
> handle that gracefully.

But at that point, is there even a point to have that midx file that
knows about objects (so it can answer has_object()? correctly and
quickly) but does not know the correct location of half of the objects?
Instead of going directly to A.idx to locate O, we need to go to midx
to learn the location of O in B (which no longer exists), and then
fall back to it, that is a pure overhead.

> And the more interesting case is when you repack everything with "-ad"
> or similar, at which point you shouldn't even need to look up what's in
> the midx to see if you deleted its packs. The point of your operation is
> to put it all-into-one, so you know the old midx should be discarded.

Old one, yes.  Do we need to have the new one in that case?

>> Teach 'git repack' to check for this by loading the MIDX, and checking
>> whether the to-be-removed pack is known to the MIDX. This requires a
>> slightly odd alternation to a test in t5319, which is explained with a
>> comment.
>
> My above musings aside, this seems like an obvious improvement.

Yes.

>> diff --git a/builtin/repack.c b/builtin/repack.c
>> index 04c5ceaf7e..98fac03946 100644
>> --- a/builtin/repack.c
>> +++ b/builtin/repack.c
>> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
>>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
>>  {
>>  	struct strbuf buf = STRBUF_INIT;
>> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
>> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
>> +	strbuf_addf(&buf, "%s.pack", base_name);
>> +	if (m && midx_contains_pack(m, buf.buf))
>> +		clear_midx_file(the_repository);
>> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
>
> Makes sense. midx_contains_pack() is a binary search, so we'll spend
> O(n log n) effort deleting the packs (I wondered if this might be
> accidentally quadratic over the number of packs).
>
> And after we clear, "m" will be NULL, so we'll do it at most once. Which
> is why you can get rid of the manual "midx_cleared" flag from the
> preimage.
>
> So the patch looks good to me.

Thanks.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v2] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 14:45   ` Taylor Blau
@ 2020-08-25 16:04     ` Taylor Blau
  2020-08-26 20:51       ` Derrick Stolee
  2020-08-25 16:47     ` [PATCH] " Jeff King
  1 sibling, 1 reply; 78+ messages in thread
From: Taylor Blau @ 2020-08-25 16:04 UTC (permalink / raw)
  To: git; +Cc: dstolee, peff, sluongng

In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
learned to remove a multi-pack-index file if it added or removed a pack
from the object store.

This mechanism is a little over-eager, since it is only necessary to
drop a MIDX if 'git repack' removes a pack that the MIDX references.
Adding a pack outside of the MIDX does not require invalidating the
MIDX, and likewise for removing a pack the MIDX does not know about.

Teach 'git repack' to check for this by loading the MIDX, and checking
whether the to-be-removed pack is known to the MIDX. This requires a
slightly odd alternation to a test in t5319, which is explained with a
comment. A new test is added to show that the MIDX is left alone when
both packs known to it are marked as .keep, but two packs unknown to it
are removed and combined into one new pack.

Helped-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
Range-diff against v1:
1:  ef9186a8df ! 1:  87a3b7a5a2 builtin/repack.c: invalidate MIDX only when necessary
    @@ Commit message
         Teach 'git repack' to check for this by loading the MIDX, and checking
         whether the to-be-removed pack is known to the MIDX. This requires a
         slightly odd alternation to a test in t5319, which is explained with a
    -    comment.
    +    comment. A new test is added to show that the MIDX is left alone when
    +    both packs known to it are marked as .keep, but two packs unknown to it
    +    are removed and combined into one new pack.
     
    +    Helped-by: Derrick Stolee <dstolee@microsoft.com>
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
      ## builtin/repack.c ##
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'repack with the --no-progress
      	test_path_is_missing $objdir/pack/multi-pack-index
      '
      
    -+test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
    -+	git multi-pack-index write &&
    -+	cp $objdir/pack/multi-pack-index $objdir/pack/multi-pack-index.bak &&
    -+	test_when_finished "rm -f $objdir/pack/multi-pack-index.bak" &&
    -+
    -+	# Write a new pack that is unknown to the multi-pack-index.
    -+	git hash-object -w </dev/null >blob &&
    -+	git pack-objects $objdir/pack/pack <blob &&
    -+
    -+	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -d &&
    -+	test_cmp_bin $objdir/pack/multi-pack-index \
    -+		$objdir/pack/multi-pack-index.bak
    ++test_expect_success 'repack preserves multi-pack-index when creating packs' '
    ++	git init preserve &&
    ++	test_when_finished "rm -fr preserve" &&
    ++	(
    ++		cd preserve &&
    ++		packdir=.git/objects/pack &&
    ++		midx=$packdir/multi-pack-index &&
    ++
    ++		test_commit 1 &&
    ++		pack1=$(git pack-objects --all $packdir/pack) &&
    ++		touch $packdir/pack-$pack1.keep &&
    ++		test_commit 2 &&
    ++		pack2=$(git pack-objects --revs $packdir/pack) &&
    ++		touch $packdir/pack-$pack2.keep &&
    ++
    ++		git multi-pack-index write &&
    ++		cp $midx $midx.bak &&
    ++
    ++		cat >pack-input <<-EOF &&
    ++		HEAD
    ++		^HEAD~1
    ++		EOF
    ++		test_commit 3 &&
    ++		pack3=$(git pack-objects --revs $packdir/pack <pack-input) &&
    ++		test_commit 4 &&
    ++		pack4=$(git pack-objects --revs $packdir/pack <pack-input) &&
    ++
    ++		GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -ad &&
    ++		ls -la $packdir &&
    ++		test_path_is_file $packdir/pack-$pack1.pack &&
    ++		test_path_is_file $packdir/pack-$pack2.pack &&
    ++		test_path_is_missing $packdir/pack-$pack3.pack &&
    ++		test_path_is_missing $packdir/pack-$pack4.pack &&
    ++		test_cmp_bin $midx.bak $midx
    ++	)
     +'
     +
      compare_results_with_midx "after repack"

 builtin/repack.c            | 12 +++++-----
 t/t5319-multi-pack-index.sh | 44 +++++++++++++++++++++++++++++++++++--
 2 files changed, 47 insertions(+), 9 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 04c5ceaf7e..98fac03946 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
 	struct strbuf buf = STRBUF_INIT;
-	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
+	struct multi_pack_index *m = get_multi_pack_index(the_repository);
+	strbuf_addf(&buf, "%s.pack", base_name);
+	if (m && midx_contains_pack(m, buf.buf))
+		clear_midx_file(the_repository);
+	strbuf_insertf(&buf, 0, "%s/", dir_name);
 	unlink_pack_path(buf.buf, 1);
 	strbuf_release(&buf);
 }
@@ -286,7 +290,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	int keep_unreachable = 0;
 	struct string_list keep_pack_list = STRING_LIST_INIT_NODUP;
 	int no_update_server_info = 0;
-	int midx_cleared = 0;
 	struct pack_objects_args po_args = {NULL};
 
 	struct option builtin_repack_options[] = {
@@ -439,11 +442,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		for (ext = 0; ext < ARRAY_SIZE(exts); ext++) {
 			char *fname, *fname_old;
 
-			if (!midx_cleared) {
-				clear_midx_file(the_repository);
-				midx_cleared = 1;
-			}
-
 			fname = mkpathdup("%s/pack-%s%s", packdir,
 						item->string, exts[ext].name);
 			if (!file_exists(fname)) {
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 43b1b5b2af..f340b376bc 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -382,12 +382,52 @@ test_expect_success 'repack with the --no-progress option' '
 	test_line_count = 0 err
 '
 
-test_expect_success 'repack removes multi-pack-index' '
+test_expect_success 'repack removes multi-pack-index when deleting packs' '
 	test_path_is_file $objdir/pack/multi-pack-index &&
-	GIT_TEST_MULTI_PACK_INDEX=0 git repack -adf &&
+	# Set GIT_TEST_MULTI_PACK_INDEX to 0 to avoid writing a new
+	# multi-pack-index after repacking, but set "core.multiPackIndex" to
+	# true so that "git repack" can read the existing MIDX.
+	GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -adf &&
 	test_path_is_missing $objdir/pack/multi-pack-index
 '
 
+test_expect_success 'repack preserves multi-pack-index when creating packs' '
+	git init preserve &&
+	test_when_finished "rm -fr preserve" &&
+	(
+		cd preserve &&
+		packdir=.git/objects/pack &&
+		midx=$packdir/multi-pack-index &&
+
+		test_commit 1 &&
+		pack1=$(git pack-objects --all $packdir/pack) &&
+		touch $packdir/pack-$pack1.keep &&
+		test_commit 2 &&
+		pack2=$(git pack-objects --revs $packdir/pack) &&
+		touch $packdir/pack-$pack2.keep &&
+
+		git multi-pack-index write &&
+		cp $midx $midx.bak &&
+
+		cat >pack-input <<-EOF &&
+		HEAD
+		^HEAD~1
+		EOF
+		test_commit 3 &&
+		pack3=$(git pack-objects --revs $packdir/pack <pack-input) &&
+		test_commit 4 &&
+		pack4=$(git pack-objects --revs $packdir/pack <pack-input) &&
+
+		GIT_TEST_MULTI_PACK_INDEX=0 git -c core.multiPackIndex repack -ad &&
+		ls -la $packdir &&
+		test_path_is_file $packdir/pack-$pack1.pack &&
+		test_path_is_file $packdir/pack-$pack2.pack &&
+		test_path_is_missing $packdir/pack-$pack3.pack &&
+		test_path_is_missing $packdir/pack-$pack4.pack &&
+		test_cmp_bin $midx.bak $midx
+	)
+'
+
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-- 
2.28.0.338.g87a3b7a5a2

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 15:58   ` Junio C Hamano
@ 2020-08-25 16:08     ` Taylor Blau
  2020-08-25 16:18     ` Derrick Stolee
  2020-08-25 17:22     ` Jeff King
  2 siblings, 0 replies; 78+ messages in thread
From: Taylor Blau @ 2020-08-25 16:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, dstolee

On Tue, Aug 25, 2020 at 08:58:46AM -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
> > Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> > or "git repack -Ad" is going to pack everything and delete the old
> > packs. So I think we'd want to remove a midx there.
>
> AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
> nobody is doing "there are three packfiles, but all the objects in
> one of them are contained in the other two, so let's remove that
> redundant one".

Not AFAICT either.

> > And "git repack -d" I think of as deleting only loose objects that we
> > just packed. But I guess it could also remove a pack that has now been
> > made redundant? That seems like a rare case in practice, but I suppose
> > is possible.
>
> Meaning it can become reality?  Yes.  Or it already can happen?  I
> doubt it.
>
> > E.g., imagine we have a midx that points to packs A and B, and
> > git-repack deletes B. By your logic above, we need to remove the midx
> > because now it points to objects in B which aren't accessible. But by
> > deleting it, could we be deleting the only thing that mentions the
> > objects in A?
> >
> > I _think_ the answer is "no", because we never went all-in on midx and
> > allowed deleting the matching .idx files for contained packs.
>
> Yeah, it has been my assumption that that part of the design would
> never change.
>
> > I'm also a little curious how bad it is to have a midx whose pack has
> > gone away. I guess we'd answer queries for "yes, we have this object"
> > even if we don't, which is bad. Though in practice we'd only delete
> > those packs if we have their objects elsewhere.
>
> Hmph, object O used to be in pack A and pack B, midx points at the
> copy in pack B but we deleted it because the pack is redundant.
> Now, midx says O exists, but midx cannot be used to retrieve O.  We
> need to ask A.idx about O to locate it.
>
> That sounds brittle.  I am not sure our error fallback is all that
> patient.

Me either. Like I said, I think that all of this is possible at least in
theory, but in practice it may be somewhat annoying in cases like these.

>
> > And the pack code is
> > pretty good about retrying other copies of objects that can't be
> > accessed. Alternatively, I wonder if the midx-loading code ought to
> > check that all of the constituent packs are available.
> >
> > In that line of thinking, do we even need to delete midx files if one of
> > their packs goes away? The reading side probably ought to be able to
> > handle that gracefully.
>
> But at that point, is there even a point to have that midx file that
> knows about objects (so it can answer has_object()? correctly and
> quickly) but does not know the correct location of half of the objects?
> Instead of going directly to A.idx to locate O, we need to go to midx
> to learn the location of O in B (which no longer exists), and then
> fall back to it, that is a pure overhead.

Well put.

> > And the more interesting case is when you repack everything with "-ad"
> > or similar, at which point you shouldn't even need to look up what's in
> > the midx to see if you deleted its packs. The point of your operation is
> > to put it all-into-one, so you know the old midx should be discarded.
>
> Old one, yes.  Do we need to have the new one in that case?
>
> >> Teach 'git repack' to check for this by loading the MIDX, and checking
> >> whether the to-be-removed pack is known to the MIDX. This requires a
> >> slightly odd alternation to a test in t5319, which is explained with a
> >> comment.
> >
> > My above musings aside, this seems like an obvious improvement.
>
> Yes.
>
> >> diff --git a/builtin/repack.c b/builtin/repack.c
> >> index 04c5ceaf7e..98fac03946 100644
> >> --- a/builtin/repack.c
> >> +++ b/builtin/repack.c
> >> @@ -133,7 +133,11 @@ static void get_non_kept_pack_filenames(struct string_list *fname_list,
> >>  static void remove_redundant_pack(const char *dir_name, const char *base_name)
> >>  {
> >>  	struct strbuf buf = STRBUF_INIT;
> >> -	strbuf_addf(&buf, "%s/%s.pack", dir_name, base_name);
> >> +	struct multi_pack_index *m = get_multi_pack_index(the_repository);
> >> +	strbuf_addf(&buf, "%s.pack", base_name);
> >> +	if (m && midx_contains_pack(m, buf.buf))
> >> +		clear_midx_file(the_repository);
> >> +	strbuf_insertf(&buf, 0, "%s/", dir_name);
> >
> > Makes sense. midx_contains_pack() is a binary search, so we'll spend
> > O(n log n) effort deleting the packs (I wondered if this might be
> > accidentally quadratic over the number of packs).
> >
> > And after we clear, "m" will be NULL, so we'll do it at most once. Which
> > is why you can get rid of the manual "midx_cleared" flag from the
> > preimage.
> >
> > So the patch looks good to me.
>
> Thanks.

Thanks, both. If you're ready, let's use the version in [1] for
queueing.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/87a3b7a5a2f091e2a23c163a7d86effbbbedfa3a.1598371475.git.me@ttaylorr.com/

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 15:58   ` Junio C Hamano
  2020-08-25 16:08     ` Taylor Blau
@ 2020-08-25 16:18     ` Derrick Stolee
  2020-08-25 17:34       ` Jeff King
  2020-08-25 17:22     ` Jeff King
  2 siblings, 1 reply; 78+ messages in thread
From: Derrick Stolee @ 2020-08-25 16:18 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: Taylor Blau, git, dstolee

On 8/25/2020 11:58 AM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
>> Does "git repack" ever remove just one pack? Obviously "git repack -ad"
>> or "git repack -Ad" is going to pack everything and delete the old
>> packs. So I think we'd want to remove a midx there.
> 
> AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
> nobody is doing "there are three packfiles, but all the objects in
> one of them are contained in the other two, so let's remove that
> redundant one".
> 
>> And "git repack -d" I think of as deleting only loose objects that we
>> just packed. But I guess it could also remove a pack that has now been
>> made redundant? That seems like a rare case in practice, but I suppose
>> is possible.
> 
> Meaning it can become reality?  Yes.  Or it already can happen?  I
> doubt it.
> 
>> E.g., imagine we have a midx that points to packs A and B, and
>> git-repack deletes B. By your logic above, we need to remove the midx
>> because now it points to objects in B which aren't accessible. But by
>> deleting it, could we be deleting the only thing that mentions the
>> objects in A?
>>
>> I _think_ the answer is "no", because we never went all-in on midx and
>> allowed deleting the matching .idx files for contained packs.
> 
> Yeah, it has been my assumption that that part of the design would
> never change.

Yes, we should intend to keep the .idx files around forever even when
using the multi-pack-index. The duplicated data overhead is not typically
a real problem.

The one caveat I would consider is if we wanted to let a multi-pack-index
cover thin packs, but that would be a substantial change to the object
database that I do not plan to tackle any time soon.

>> I'm also a little curious how bad it is to have a midx whose pack has
>> gone away. I guess we'd answer queries for "yes, we have this object"
>> even if we don't, which is bad. Though in practice we'd only delete
>> those packs if we have their objects elsewhere.
> 
> Hmph, object O used to be in pack A and pack B, midx points at the
> copy in pack B but we deleted it because the pack is redundant.
> Now, midx says O exists, but midx cannot be used to retrieve O.  We
> need to ask A.idx about O to locate it.
> 
> That sounds brittle.  I am not sure our error fallback is all that
> patient.

The best I can see is that prepare_midx_pack() will return 1 if the
pack no longer exists, and this would cause a die("error preparing
packfile from multi-pack-index") in nth_midxed_pack_entry(), killing
the following stack trace:

+ find_pack_entry():packfile.c
 + fill_midx_entry():midx.c
  + nth_midxed_pack_entry():midx.c

Perhaps that die() is a bit over-zealous since we could return 1
through this stack and find_pack_entry() could continue searching
for the object in the packed_git list. However, it could start
returning false _negatives_ if there were duplicates of the object
in the multi-pack-index but only the latest copy was deleted (and
the object does not appear in a pack-file outside of the multi-
pack-index).

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 14:45   ` Taylor Blau
  2020-08-25 16:04     ` [PATCH v2] " Taylor Blau
@ 2020-08-25 16:47     ` Jeff King
  2020-08-25 17:10       ` Derrick Stolee
  1 sibling, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-08-25 16:47 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Son Luong Ngoc, git, dstolee

On Tue, Aug 25, 2020 at 10:45:15AM -0400, Taylor Blau wrote:

> > I wonder if its worth to trigger write_midx_file() to update the midx instead of
> > just removing MIDX?
> 
> There's no reason that we couldn't do this, but I don't think that it's
> a very good idea, especially if the new 'git maintenance' command will
> be able to do something like this by itself.
> 
> I'm hesitant to add yet another option to 'git repack', which I have
> always thought as a plumbing tool. That's important because callers
> (like 'git maintenance' or user scripts) can 'git multi-pack-index write
> ...' after their 'git repack' to generate a new MIDX if they want one.

It may be worth thinking a bit about atomicity here, though. Rather than
separate delete and write steps, would somebody want a sequence like:

  - we have a midx with packs A, B, C

  - we find out that pack C is redundant and want to drop it

  - we create a new midx with A and B in a tempfile

  - we atomically rename the new midx over the old

  - we delete pack C

A simultaneous reader always sees a consistent midx. Whereas in a
delete-then-rewrite scenario there is a moment where there is no midx,
and they'd fall back to reading the individual idx files.

It may not matter all that much, though, for two reasons:

  - reading individual idx files should still give a correct answer.
    It just may be a bit slower.

  - even with an atomic replacement, I think caching on the reader side
    may cause interesting effects. For instance, if a reader process
    opens the old midx, it will generally cache that knowledge in memory
    (including mmap the content). So even after the replacement and
    deletion of C, it may still try to use the old midx that references
    C.

If we do care, though, that implies some cooperation between the
deletion process and the midx code. Perhaps it even argues that repack
should refuse to delete such a single pack at all, since it _isn't_
redundant. It's part of a midx, and the caller should rewrite the midx
first itself, and _then_ look for redundant packs.

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 15:42           ` Taylor Blau
@ 2020-08-25 16:56             ` Jeff King
  0 siblings, 0 replies; 78+ messages in thread
From: Jeff King @ 2020-08-25 16:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, git, dstolee

On Tue, Aug 25, 2020 at 11:42:24AM -0400, Taylor Blau wrote:

> > test_expect_success 'repack preserves multi-pack-index when deleting unknown packs' '
> > 	git init preserve &&
> > 	test_when_finished "rm -fr preserve" &&
> > 	(
> > 		cd preserve &&
> > 		midx=.git/objects/pack/multi-pack-index &&
> >
> > 		test_commit 1 &&
> > 		HASH1=$(git pack-objects --all .git/objects/pack/pack) &&
> > 		touch .git/objects/pack/pack-$HASH1.keep &&
> >
> > 		cat >pack-input <<-\EOF &&
> 
> Escaping the heredoc shouldn't be necessary, so this can be written
> instead as '<<-EOF'.

We usually go the opposite way: if a here-doc doesn't need
interpolation, then we prefer to mark it as such to avoid surprising
somebody who adds lines later (and might need quoting).

Certainly you can argue that least-surprise would be in the other
direction (i.e., people expect to interpolate by default, and you
surprise anybody adding a variable reference), but for Git's test suite,
the convention usually runs the other way.

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 16:47     ` [PATCH] " Jeff King
@ 2020-08-25 17:10       ` Derrick Stolee
  2020-08-25 17:29         ` Jeff King
  0 siblings, 1 reply; 78+ messages in thread
From: Derrick Stolee @ 2020-08-25 17:10 UTC (permalink / raw)
  To: Jeff King, Taylor Blau; +Cc: Son Luong Ngoc, git, dstolee

On 8/25/2020 12:47 PM, Jeff King wrote:
> It may be worth thinking a bit about atomicity here, though. Rather than
> separate delete and write steps, would somebody want a sequence like:

...
> If we do care, though, that implies some cooperation between the
> deletion process and the midx code. Perhaps it even argues that repack
> should refuse to delete such a single pack at all, since it _isn't_
> redundant. It's part of a midx, and the caller should rewrite the midx
> first itself, and _then_ look for redundant packs.

It is worth noting that we _do_ have a way to integrate the delete and
write code using 'git multi-pack-index expire'. One way to resolve this
atomicity would be to do the following inside the repack command:

 1. Create and index the new pack.
 2. git multi-pack-index write
 3. git multi-pack-index expire

While this _mostly_ works, we still have issues around a concurrent
process opening a multi-pack-index before step (2) and trying to
read from the pack-file deleted in step (3). For this reason, the
'incremental-repack' task in the maintenance builtin series [1] runs
'expire' then 'repack' (and not the opposite order). To be completely
safe, we'd want to make sure that no Git command that started before
the 'repack' command is still running when we run 'expire'. In practice,
it suffices to run the 'incremental-repack' step at most once a day.

[1] https://lore.kernel.org/git/a8d956dad6b3a81d0f1b63cbd48f36a5e1195ae8.1597760730.git.gitgitgadget@gmail.com/

This discussion points out a big reason why the multi-pack-index
wasn't fully integrated with 'git repack': it can be tricky. Using
a repacking strategy that is focused on the multi-pack-index is
less tricky.

I still think that this patch is moving us in a better direction.

I'm making note of these possible improvements:

1. Have 'git repack' integrate with 'git multi-pack-index expire'
   when deleting pack-files after creating a new one (if a
   multi-pack-index exists).

2. Handle "missing packs" referenced by the multi-pack-index more
   gracefully, likely by disabling the multi-pack-index and
   calling reprepare_packed_git().

Thanks,
-Stolee



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 15:58   ` Junio C Hamano
  2020-08-25 16:08     ` Taylor Blau
  2020-08-25 16:18     ` Derrick Stolee
@ 2020-08-25 17:22     ` Jeff King
  2020-08-25 18:05       ` Junio C Hamano
  2 siblings, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-08-25 17:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, dstolee

On Tue, Aug 25, 2020 at 08:58:46AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Does "git repack" ever remove just one pack? Obviously "git repack -ad"
> > or "git repack -Ad" is going to pack everything and delete the old
> > packs. So I think we'd want to remove a midx there.
> 
> AFAIK, the pack-redundant subcommand is used by nobody in-tree, so
> nobody is doing "there are three packfiles, but all the objects in
> one of them are contained in the other two, so let's remove that
> redundant one".

OK, that's the part I was missing. The discussion here and the statement
from git-repack(1):

  -d
      After packing, if the newly created packs make some existing packs
      redundant, remove the redundant packs. Also run git prune-packed
      to remove redundant loose object files.

made me think that it was running pack-redundant. But it doesn't seem
to. It looks like we stopped doing so in 6ed64058e1 (git-repack: do not
do complex redundancy check., 2005-11-19).

As an aside, we tried using pack-redundant at GitHub several years ago
for dropping packs that were replicated in alternates storage. It
performs very poorly (quadratically, perhaps?) to the point that we
found it unusable, and I implemented a "local-redundant" command to do
the same thing more quickly. We ended up dropping that as we now migrate
packs wholesale (rather than via fetch), so I never upstreamed it.

I mention that here to warn people away from pack-redundant (I wondered
at the time why nobody had noticed its terrible performance, but now I
think the answer is just that nobody ever runs it). I wonder if we
should even consider deprecating and removing it.

I'm also happy to share local-redundant if anybody is interested (though
as I've mentioned elsewhere, I think the larger scheme it was part of it
also performed poorly and isn't worth pursuing).

> > And "git repack -d" I think of as deleting only loose objects that we
> > just packed. But I guess it could also remove a pack that has now been
> > made redundant? That seems like a rare case in practice, but I suppose
> > is possible.
> 
> Meaning it can become reality?  Yes.  Or it already can happen?  I
> doubt it.

I thought the latter, but after this thread I agree that it can't.

> > I'm also a little curious how bad it is to have a midx whose pack has
> > gone away. I guess we'd answer queries for "yes, we have this object"
> > even if we don't, which is bad. Though in practice we'd only delete
> > those packs if we have their objects elsewhere.
> 
> Hmph, object O used to be in pack A and pack B, midx points at the
> copy in pack B but we deleted it because the pack is redundant.
> Now, midx says O exists, but midx cannot be used to retrieve O.  We
> need to ask A.idx about O to locate it.
> 
> That sounds brittle.  I am not sure our error fallback is all that
> patient.

It has to be that patient even without a midx, I think. We might open
A.idx and mmap its contents, without opening the matching A.pack (or we
might open it and later close it due to memory or descriptor
constraints). And we have two layers there:

  - when object_info_extended sees a bad return from
    packed_object_info(), it will mark the entry as bad and recurse,
    giving us an opportunity to find another one

  - when we can't find the object at all (e.g., because we marked that
    particular pack's version as inaccessible), we'll hit the
    reprepare_packed_git() path and look for a new idx

Those same things should be kicking in for midx lookups, too (I didn't
test them, but from a cursory look at the organization of the code, I
think they will).

> > In that line of thinking, do we even need to delete midx files if one of
> > their packs goes away? The reading side probably ought to be able to
> > handle that gracefully.
> 
> But at that point, is there even a point to have that midx file that
> knows about objects (so it can answer has_object()? correctly and
> quickly) but does not know the correct location of half of the objects?
> Instead of going directly to A.idx to locate O, we need to go to midx
> to learn the location of O in B (which no longer exists), and then
> fall back to it, that is a pure overhead.

It is overhead for sure. But my thinking was that you're trading one
efficiency for another:

  - the midx may incur an extra lookup for objects in the redundant
    pack, but that pack may be a small portion of what the midx indexes
    (and this is likely in the usual case that you have one big
    cumulative pack and many recently-created smaller ones; the big one
    will never be made redundant and dominates the object count of the
    midx)

  - by keeping the midx, you're improving lookup speed for all of the
    other objects, which don't have to hunt through separate pack idx
    files

So my guess is that it would usually be a win. Though really the correct
answer is: if you are mucking with packs you should just generate a new
midx (or you should pack all-into-one, which generates a single idx
accomplishing the same thing).

> > And the more interesting case is when you repack everything with "-ad"
> > or similar, at which point you shouldn't even need to look up what's in
> > the midx to see if you deleted its packs. The point of your operation is
> > to put it all-into-one, so you know the old midx should be discarded.
> 
> Old one, yes.  Do we need to have the new one in that case?

You shouldn't need one since you'd only have a single pack now.

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 17:10       ` Derrick Stolee
@ 2020-08-25 17:29         ` Jeff King
  2020-08-25 17:34           ` Taylor Blau
  0 siblings, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-08-25 17:29 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Son Luong Ngoc, git, dstolee

On Tue, Aug 25, 2020 at 01:10:13PM -0400, Derrick Stolee wrote:

> > If we do care, though, that implies some cooperation between the
> > deletion process and the midx code. Perhaps it even argues that repack
> > should refuse to delete such a single pack at all, since it _isn't_
> > redundant. It's part of a midx, and the caller should rewrite the midx
> > first itself, and _then_ look for redundant packs.
> 
> It is worth noting that we _do_ have a way to integrate the delete and
> write code using 'git multi-pack-index expire'. One way to resolve this
> atomicity would be to do the following inside the repack command:
> 
>  1. Create and index the new pack.
>  2. git multi-pack-index write
>  3. git multi-pack-index expire

Given that discussion elsewhere points to git-repack only really
deleting packs in all-in-one mode (and not ever a single pack), it seems
like we can really be much simpler here. If we're not deleting packs via
all-in-one, there's no need to touch the midx at all. If we are, then
it's reasonable to delete the midx immediately (after having written our
new pack but before deleting) since our new single pack idx is as good
or better.

I.e., drop step 2 above, and make step 3 just clear_midx_file(). Which
is roughly what the code does now, isn't it? Or is there some reason
that "expire" is more interesting than just clearing?

And if anybody does want to drop single packs, etc, they can do so by
generating a sensible midx separately from the repack operation (and
probably doing so before dropping the packs for atomicity).

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 17:29         ` Jeff King
@ 2020-08-25 17:34           ` Taylor Blau
  2020-08-25 17:42             ` Jeff King
  0 siblings, 1 reply; 78+ messages in thread
From: Taylor Blau @ 2020-08-25 17:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Taylor Blau, Son Luong Ngoc, git, dstolee

On Tue, Aug 25, 2020 at 01:29:01PM -0400, Jeff King wrote:
> On Tue, Aug 25, 2020 at 01:10:13PM -0400, Derrick Stolee wrote:
>
> > > If we do care, though, that implies some cooperation between the
> > > deletion process and the midx code. Perhaps it even argues that repack
> > > should refuse to delete such a single pack at all, since it _isn't_
> > > redundant. It's part of a midx, and the caller should rewrite the midx
> > > first itself, and _then_ look for redundant packs.
> >
> > It is worth noting that we _do_ have a way to integrate the delete and
> > write code using 'git multi-pack-index expire'. One way to resolve this
> > atomicity would be to do the following inside the repack command:
> >
> >  1. Create and index the new pack.
> >  2. git multi-pack-index write
> >  3. git multi-pack-index expire
>
> Given that discussion elsewhere points to git-repack only really
> deleting packs in all-in-one mode (and not ever a single pack), it seems
> like we can really be much simpler here. If we're not deleting packs via
> all-in-one, there's no need to touch the midx at all. If we are, then
> it's reasonable to delete the midx immediately (after having written our
> new pack but before deleting) since our new single pack idx is as good
> or better.
>
> I.e., drop step 2 above, and make step 3 just clear_midx_file(). Which
> is roughly what the code does now, isn't it? Or is there some reason
> that "expire" is more interesting than just clearing?

It's not clear to me whether you're talking about my patch, or what a
more full integration with 'git repack' looks like.

If you are talking about my patch, I disagree: checking that the MIDX
doesn't know about a pack we're dropping *is* useful even without
all-in-one, because of '.keep' packs (as demonstrated by the new test in
my patch).

To me, this patch seems like an incremental step in the direction that
we ultimately want to be going, but it's hard to untangle whether the
ensuing discussion is targeted at my patch, or the ultimate goal.

> And if anybody does want to drop single packs, etc, they can do so by
> generating a sensible midx separately from the repack operation (and
> probably doing so before dropping the packs for atomicity).
>
> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 16:18     ` Derrick Stolee
@ 2020-08-25 17:34       ` Jeff King
  0 siblings, 0 replies; 78+ messages in thread
From: Jeff King @ 2020-08-25 17:34 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Taylor Blau, git, dstolee

On Tue, Aug 25, 2020 at 12:18:42PM -0400, Derrick Stolee wrote:

> The best I can see is that prepare_midx_pack() will return 1 if the
> pack no longer exists, and this would cause a die("error preparing
> packfile from multi-pack-index") in nth_midxed_pack_entry(), killing
> the following stack trace:
> 
> + find_pack_entry():packfile.c
>  + fill_midx_entry():midx.c
>   + nth_midxed_pack_entry():midx.c
> 
> Perhaps that die() is a bit over-zealous since we could return 1
> through this stack and find_pack_entry() could continue searching
> for the object in the packed_git list. However, it could start
> returning false _negatives_ if there were duplicates of the object
> in the multi-pack-index but only the latest copy was deleted (and
> the object does not appear in a pack-file outside of the multi-
> pack-index).

Hmm, yeah.

I thought this code is already doing the right thing, because I'd expect
the is_pack_valid() call later in nth_midxed_pack_entry() to be where we
notice the problem. But add_packed_git() does stat the packfile and
return an error.

So that die() really ought to be just "return 0". The caller already has
to (and does) handle similar errors (including that the pack went away
after we added it to the packed_git list, or that it exists but has
bogus data, etc).

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 17:34           ` Taylor Blau
@ 2020-08-25 17:42             ` Jeff King
  0 siblings, 0 replies; 78+ messages in thread
From: Jeff King @ 2020-08-25 17:42 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, Son Luong Ngoc, git, dstolee

On Tue, Aug 25, 2020 at 01:34:25PM -0400, Taylor Blau wrote:

> > I.e., drop step 2 above, and make step 3 just clear_midx_file(). Which
> > is roughly what the code does now, isn't it? Or is there some reason
> > that "expire" is more interesting than just clearing?
> 
> It's not clear to me whether you're talking about my patch, or what a
> more full integration with 'git repack' looks like.
> 
> If you are talking about my patch, I disagree: checking that the MIDX
> doesn't know about a pack we're dropping *is* useful even without
> all-in-one, because of '.keep' packs (as demonstrated by the new test in
> my patch).

I hadn't considered .keep. So all-in-one may still involve selectively
deleting some packs. It makes sense, then, for repack to consider
whether the midx is actually redundant or not rather than just always
clearing it (i.e., what your patch does).

In general I consider .keep packs kind of an awful feature that
introduces a lot of confusion and works against other features like
bitmaps. But I guess that they're the only thing that allows you to have
a gigantic Windows-like repo where you never fully pack it, but just
keep acquiring a string of big packs. Which is the exact case that the
midx is most useful for. So they're definitely worth considering and
supporting here.

> To me, this patch seems like an incremental step in the direction that
> we ultimately want to be going, but it's hard to untangle whether the
> ensuing discussion is targeted at my patch, or the ultimate goal.

I wasn't sure of the answer to that until we untangled more. I.e., it
wasn't clear to me if your incremental step was even in the right
direction if we weren't in fact ever selectively deleting packs in
git-repack. (And now it sounds like we aren't via "git repack -d", which
was my original question, but we are via .keep).

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 17:22     ` Jeff King
@ 2020-08-25 18:05       ` Junio C Hamano
  2020-08-25 18:27         ` Jeff King
  0 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-08-25 18:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee

Jeff King <peff@peff.net> writes:

> OK, that's the part I was missing. The discussion here and the statement
> from git-repack(1):
>
>   -d
>       After packing, if the newly created packs make some existing packs
>       redundant, remove the redundant packs. Also run git prune-packed
>       to remove redundant loose object files.
>
> made me think that it was running pack-redundant. But it doesn't seem
> to. It looks like we stopped doing so in 6ed64058e1 (git-repack: do not
> do complex redundancy check., 2005-11-19).

Thanks for digging.  A good opportunity for a #leftoverbits
documentation update from new people is here.

> As an aside, we tried using pack-redundant at GitHub several years ago
> for dropping packs that were replicated in alternates storage. It
> performs very poorly (quadratically, perhaps?) to the point that we
> found it unusable,...

Yes, I originally wrote "the pack-redundant subcommand" in the
message you are responding to with a bit more colourful adjectives,
but rewrote it ;-)  My recollection from the last time I looked at
it is that it is quadratic or even worse---that was long time ago,
but on the other hand I think the subcommand had no significant
improvement over the course of its life.

Perhaps it is time to drop it.

-- >8 --
Subject: [RFC] pack-redundant: gauge the usage before proposing its removal

The subcommand is unusably slow and the reason why nobody reports it
as a performance bug is suspected to be the absense of users.  Let's
disable the normal use of command by making it error out with a big
message that asks the user to tell us that they still care about the
command, with an escape hatch to override it with a command line
option.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/pack-redundant.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index 178e3409b7..97cf3df79b 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -554,6 +554,7 @@ static void load_all(void)
 int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	int i_still_use_this = 0;
 	struct pack_list *min = NULL, *red, *pl;
 	struct llist *ignore;
 	struct object_id *oid;
@@ -580,12 +581,25 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 			alt_odb = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--i-still-use-this")) {
+			i_still_use_this = 1;
+			continue;
+		}
 		if (*arg == '-')
 			usage(pack_redundant_usage);
 		else
 			break;
 	}
 
+	if (!i_still_use_this) {
+		puts(_("'git pack-redundant' is nominated for removal.\n"
+		       "If you still use this command, please add an extra\n"
+		       "option, '--i-still-use-this', on the command line\n"
+		       "and let us know you still use it by sending an e-mail\n"
+		       "to <git@vger.kernel.org>.  Thanks\n"));
+		exit(1);
+	}
+
 	if (load_all_packs)
 		load_all();
 	else

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 18:05       ` Junio C Hamano
@ 2020-08-25 18:27         ` Jeff King
  2020-08-25 22:45           ` [PATCH] pack-redundant: gauge the usage before proposing its removal Junio C Hamano
  0 siblings, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-08-25 18:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, dstolee

On Tue, Aug 25, 2020 at 11:05:09AM -0700, Junio C Hamano wrote:

> Subject: [RFC] pack-redundant: gauge the usage before proposing its removal
> 
> The subcommand is unusably slow and the reason why nobody reports it
> as a performance bug is suspected to be the absense of users.  Let's
> disable the normal use of command by making it error out with a big
> message that asks the user to tell us that they still care about the
> command, with an escape hatch to override it with a command line
> option.

I like this direction. And I'm tempted to say it's OK to go straight
here given how generally useless the command is. But it feels like a big
jump for the initial change. While we give the user an easy escape
hatch, it might still break their scripts.

A more gentle transition would I guess be:

  1. Mention deprecation in release notes (hah, as if anybody reads
     them).

  2. Issue a warning but continue to behave as normal. That might break
     scripts that care a lot about stderr, but otherwise is harmless. No
     clue if anybody would actually see the message or not.

  3. Flip warning to error, as your patch does.

  4. Removal.

I'd guess we could do 1+2 in one release, then 3 a release or two later,
and then finally 4. I dunno. That's more tedious and I'll be surprised
if we get any bites, but maybe we ought to do it out of principle.

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH] pack-redundant: gauge the usage before proposing its removal
  2020-08-25 18:27         ` Jeff King
@ 2020-08-25 22:45           ` Junio C Hamano
  2020-08-25 23:09             ` Taylor Blau
                               ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-25 22:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee

The subcommand is unusably slow and the reason why nobody reports it
as a performance bug is suspected to be the absense of users.  Let's
show a big message that asks the user to tell us that they still
care about the command when an attempt is made to run the command,
with an escape hatch to override it with a command line option.

In a few releases, we may turn it into an error and keep it for a
few more releases before finally removing it (during the whole time,
the plan to remove it would be interrupted by end user raising hand).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

    Jeff King <peff@peff.net> writes:

    > A more gentle transition would I guess be:
    >
    >   1. Mention deprecation in release notes (hah, as if anybody reads
    >      them).
    >
    >   2. Issue a warning but continue to behave as normal. That might break
    >      scripts that care a lot about stderr, but otherwise is harmless. No
    >      clue if anybody would actually see the message or not.

    OK, so here is an update for the above.

 builtin/pack-redundant.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index 178e3409b7..b94c2f2423 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -554,6 +554,7 @@ static void load_all(void)
 int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 {
 	int i;
+	int i_still_use_this = 0;
 	struct pack_list *min = NULL, *red, *pl;
 	struct llist *ignore;
 	struct object_id *oid;
@@ -580,12 +581,24 @@ int cmd_pack_redundant(int argc, const char **argv, const char *prefix)
 			alt_odb = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--i-still-use-this")) {
+			i_still_use_this = 1;
+			continue;
+		}
 		if (*arg == '-')
 			usage(pack_redundant_usage);
 		else
 			break;
 	}
 
+	if (!i_still_use_this) {
+		fputs(_("'git pack-redundant' is nominated for removal.\n"
+			"If you still use this command, please add an extra\n"
+			"option, '--i-still-use-this', on the command line\n"
+			"and let us know you still use it by sending an e-mail\n"
+			"to <git@vger.kernel.org>.  Thanks.\n"), stderr);
+	}
+
 	if (load_all_packs)
 		load_all();
 	else
-- 
2.28.0-454-g5f859b1948


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH] pack-redundant: gauge the usage before proposing its removal
  2020-08-25 22:45           ` [PATCH] pack-redundant: gauge the usage before proposing its removal Junio C Hamano
@ 2020-08-25 23:09             ` Taylor Blau
  2020-08-25 23:22               ` Junio C Hamano
  2020-08-26  1:17             ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
  2020-08-28  9:14             ` [PATCH] pack-redundant: gauge the usage before proposing its removal Jeff King
  2 siblings, 1 reply; 78+ messages in thread
From: Taylor Blau @ 2020-08-25 23:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Taylor Blau, git, dstolee

Hi Junio,

On Tue, Aug 25, 2020 at 03:45:52PM -0700, Junio C Hamano wrote:
> The subcommand is unusably slow and the reason why nobody reports it
> as a performance bug is suspected to be the absense of users.  Let's
> show a big message that asks the user to tell us that they still
> care about the command when an attempt is made to run the command,
> with an escape hatch to override it with a command line option.
>
> In a few releases, we may turn it into an error and keep it for a
> few more releases before finally removing it (during the whole time,
> the plan to remove it would be interrupted by end user raising hand).
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---

Thanks. Peff's plan seems reasonable to me (and I'd like to add that I
am a frequent reader of the release notes ;-)), as does this patch.

  Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] pack-redundant: gauge the usage before proposing its removal
  2020-08-25 23:09             ` Taylor Blau
@ 2020-08-25 23:22               ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-25 23:22 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, dstolee

Taylor Blau <me@ttaylorr.com> writes:

> On Tue, Aug 25, 2020 at 03:45:52PM -0700, Junio C Hamano wrote:
>> The subcommand is unusably slow and the reason why nobody reports it
>> as a performance bug is suspected to be the absense of users.  Let's
>> show a big message that asks the user to tell us that they still
>> care about the command when an attempt is made to run the command,
>> with an escape hatch to override it with a command line option.
>>
>> In a few releases, we may turn it into an error and keep it for a
>> few more releases before finally removing it (during the whole time,
>> the plan to remove it would be interrupted by end user raising hand).
>>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>
> Thanks. Peff's plan seems reasonable to me (and I'd like to add that I
> am a frequent reader of the release notes ;-)), as does this patch.
>
>   Reviewed-by: Taylor Blau <me@ttaylorr.com>

Thanks.  It needs updates to a test script, though.

diff --git a/t/t5323-pack-redundant.sh b/t/t5323-pack-redundant.sh
index 6b4d1ca353..2dd2d67b9e 100755
--- a/t/t5323-pack-redundant.sh
+++ b/t/t5323-pack-redundant.sh
@@ -39,6 +39,8 @@ relationship between packs and objects is as follows:
 master_repo=master.git
 shared_repo=shared.git
 
+git_pack_redundant='git pack-redundant --i-still-use-this'
+
 # Create commits in <repo> and assign each commit's oid to shell variables
 # given in the arguments (A, B, and C). E.g.:
 #
@@ -154,7 +156,7 @@ test_expect_success 'master: no redundant for pack 1, 2, 3' '
 		EOF
 	(
 		cd "$master_repo" &&
-		git pack-redundant --all >out &&
+		$git_pack_redundant --all >out &&
 		test_must_be_empty out
 	)
 '
@@ -192,7 +194,7 @@ test_expect_success 'master: one of pack-2/pack-3 is redundant' '
 		cat >expect <<-EOF &&
 			P3:$P3
 			EOF
-		git pack-redundant --all >out &&
+		$git_pack_redundant --all >out &&
 		format_packfiles <out >actual &&
 		test_cmp expect actual
 	)
@@ -231,7 +233,7 @@ test_expect_success 'master: pack 2, 4, and 6 are redundant' '
 			P4:$P4
 			P6:$P6
 			EOF
-		git pack-redundant --all >out &&
+		$git_pack_redundant --all >out &&
 		format_packfiles <out >actual &&
 		test_cmp expect actual
 	)
@@ -266,7 +268,7 @@ test_expect_success 'master: pack-8 (subset of pack-1) is also redundant' '
 			P6:$P6
 			P8:$P8
 			EOF
-		git pack-redundant --all >out &&
+		$git_pack_redundant --all >out &&
 		format_packfiles <out >actual &&
 		test_cmp expect actual
 	)
@@ -284,9 +286,9 @@ test_expect_success 'master: clean loose objects' '
 test_expect_success 'master: remove redundant packs and pass fsck' '
 	(
 		cd "$master_repo" &&
-		git pack-redundant --all | xargs rm &&
+		$git_pack_redundant --all | xargs rm &&
 		git fsck &&
-		git pack-redundant --all >out &&
+		$git_pack_redundant --all >out &&
 		test_must_be_empty out
 	)
 '
@@ -304,7 +306,7 @@ test_expect_success 'setup shared.git' '
 test_expect_success 'shared: all packs are redundant, but no output without --alt-odb' '
 	(
 		cd "$shared_repo" &&
-		git pack-redundant --all >out &&
+		$git_pack_redundant --all >out &&
 		test_must_be_empty out
 	)
 '
@@ -343,7 +345,7 @@ test_expect_success 'shared: show redundant packs in stderr for verbose mode' '
 			P5:$P5
 			P7:$P7
 			EOF
-		git pack-redundant --all --verbose >out 2>out.err &&
+		$git_pack_redundant --all --verbose >out 2>out.err &&
 		test_must_be_empty out &&
 		grep "pack$" out.err | format_packfiles >actual &&
 		test_cmp expect actual
@@ -356,9 +358,9 @@ test_expect_success 'shared: remove redundant packs, no packs left' '
 		cat >expect <<-EOF &&
 			fatal: Zero packs found!
 			EOF
-		git pack-redundant --all --alt-odb | xargs rm &&
+		$git_pack_redundant --all --alt-odb | xargs rm &&
 		git fsck &&
-		test_must_fail git pack-redundant --all --alt-odb >actual 2>&1 &&
+		test_must_fail $git_pack_redundant --all --alt-odb >actual 2>&1 &&
 		test_cmp expect actual
 	)
 '
@@ -386,7 +388,7 @@ test_expect_success 'shared: create new objects and packs' '
 test_expect_success 'shared: no redundant without --alt-odb' '
 	(
 		cd "$shared_repo" &&
-		git pack-redundant --all >out &&
+		$git_pack_redundant --all >out &&
 		test_must_be_empty out
 	)
 '
@@ -417,7 +419,7 @@ test_expect_success 'shared: no redundant without --alt-odb' '
 test_expect_success 'shared: one pack is redundant with --alt-odb' '
 	(
 		cd "$shared_repo" &&
-		git pack-redundant --all --alt-odb >out &&
+		$git_pack_redundant --all --alt-odb >out &&
 		format_packfiles <out >actual &&
 		test_line_count = 1 actual
 	)
@@ -454,7 +456,7 @@ test_expect_success 'shared: ignore unique objects and all two packs are redunda
 			Px1:$Px1
 			Px2:$Px2
 			EOF
-		git pack-redundant --all --alt-odb >out <<-EOF &&
+		$git_pack_redundant --all --alt-odb >out <<-EOF &&
 			$X
 			$Y
 			$Z

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 0/3] War on dashed-git
  2020-08-25 22:45           ` [PATCH] pack-redundant: gauge the usage before proposing its removal Junio C Hamano
  2020-08-25 23:09             ` Taylor Blau
@ 2020-08-26  1:17             ` Junio C Hamano
  2020-08-26  1:17               ` [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form Junio C Hamano
                                 ` (3 more replies)
  2020-08-28  9:14             ` [PATCH] pack-redundant: gauge the usage before proposing its removal Jeff King
  2 siblings, 4 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26  1:17 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff King

If we were to propose breaking the 12-year old promise to our users
that we will keep "git-foo" binaries on disk where "git --exec-path"
tells them, we first need to gauge how big a population we would be
hurting with such a move.

So here is a miniseries towards that.  The third one hooks to the
codepath in git.c::cmd_main() where av[0] is of "git-foo" form and
we dispatch to "foo" as a builtin command.  In the original code, we
will die() if "foo" is not a built-in command in this codepath, so
it is exactly the place we want to catch remaining uses of "git-foo"
invoking built-in commands.

There are a few legitimate "git-foo" calls made even for built-ins
and those exceptions are marked in the command-list mechanism, which
is shared with the help subsystem.  We might want to see if we can
unify this exception list with what we have in the Makefile as
BIN_PROGRAMS and what Dscho introduces as ALL_COMMANDS_TO_INSTALL
in his series.  These have large overlaps in what they mean, but
they are not exactly identical.

Junio C Hamano (3):
  transport-helper: do not run git-remote-ext etc. in dashed form
  cvsexportcommit: do not run git programs in dashed form
  git: catch an attempt to run "git-foo"

 command-list.txt         | 11 +++++++----
 git-cvsexportcommit.perl | 16 ++++++++--------
 git.c                    |  2 ++
 help.c                   | 34 ++++++++++++++++++++++++++++++++++
 help.h                   |  3 +++
 transport-helper.c       |  3 ++-
 6 files changed, 56 insertions(+), 13 deletions(-)

-- 
2.28.0-454-g5f859b1948


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form
  2020-08-26  1:17             ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
@ 2020-08-26  1:17               ` Junio C Hamano
  2020-08-26  1:24                 ` Eric Sunshine
  2020-08-26  1:17               ` [PATCH v1 2/3] cvsexportcommit: do not run git programs " Junio C Hamano
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26  1:17 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff King

Runing them as "git remote-ext" and letting "git" dispatch to
"remote-ext" would just be fine and is more idiomatic.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 transport-helper.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/transport-helper.c b/transport-helper.c
index defafbf4c1..fae3d99500 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -128,7 +128,8 @@ static struct child_process *get_helper(struct transport *transport)
 	helper->in = -1;
 	helper->out = -1;
 	helper->err = 0;
-	strvec_pushf(&helper->args, "git-remote-%s", data->name);
+	strvec_push(&helper->args, "git");
+	strvec_pushf(&helper->args, "remote-%s", data->name);
 	strvec_push(&helper->args, transport->remote->name);
 	strvec_push(&helper->args, remove_ext_force(transport->url));
 	helper->git_cmd = 0;
-- 
2.28.0-454-g5f859b1948


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 2/3] cvsexportcommit: do not run git programs in dashed form
  2020-08-26  1:17             ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
  2020-08-26  1:17               ` [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form Junio C Hamano
@ 2020-08-26  1:17               ` Junio C Hamano
  2020-08-26  1:28                 ` Eric Sunshine
  2020-08-26  8:02                 ` Johannes Schindelin
  2020-08-26  1:17               ` [PATCH v1 3/3] git: catch an attempt to run "git-foo" Junio C Hamano
  2020-08-26  8:09               ` [PATCH v1 0/3] War on dashed-git Johannes Schindelin
  3 siblings, 2 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26  1:17 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff King

This ancient script runs "git-foo" all over the place.  A strange
thing is that it has t9200 tests successfully running, even though
it does not seem to futz with PATH to prepend $(git --exec-path)
output.  It is tempting to declare that the command must be unused,
but that is left to another topic.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-cvsexportcommit.perl | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/git-cvsexportcommit.perl b/git-cvsexportcommit.perl
index 0ae8bce3fb..289d4bc684 100755
--- a/git-cvsexportcommit.perl
+++ b/git-cvsexportcommit.perl
@@ -30,7 +30,7 @@
 	# Remember where GIT_DIR is before changing to CVS checkout
 	unless ($ENV{GIT_DIR}) {
 		# No GIT_DIR set. Figure it out for ourselves
-		my $gd =`git-rev-parse --git-dir`;
+		my $gd =`git rev-parse --git-dir`;
 		chomp($gd);
 		$ENV{GIT_DIR} = $gd;
 	}
@@ -66,7 +66,7 @@
 # resolve target commit
 my $commit;
 $commit = pop @ARGV;
-$commit = safe_pipe_capture('git-rev-parse', '--verify', "$commit^0");
+$commit = safe_pipe_capture('git', 'rev-parse', '--verify', "$commit^0");
 chomp $commit;
 if ($?) {
     die "The commit reference $commit did not resolve!";
@@ -76,7 +76,7 @@
 my $parent;
 if (@ARGV) {
     $parent = pop @ARGV;
-    $parent =  safe_pipe_capture('git-rev-parse', '--verify', "$parent^0");
+    $parent =  safe_pipe_capture('git', 'rev-parse', '--verify', "$parent^0");
     chomp $parent;
     if ($?) {
 	die "The parent reference did not resolve!";
@@ -84,7 +84,7 @@
 }
 
 # find parents from the commit itself
-my @commit  = safe_pipe_capture('git-cat-file', 'commit', $commit);
+my @commit  = safe_pipe_capture('git', 'cat-file', 'commit', $commit);
 my @parents;
 my $committer;
 my $author;
@@ -162,9 +162,9 @@
 close MSG;
 
 if ($parent eq $noparent) {
-    `git-diff-tree --binary -p --root $commit >.cvsexportcommit.diff`;# || die "Cannot diff";
+    `git diff-tree --binary -p --root $commit >.cvsexportcommit.diff`;# || die "Cannot diff";
 } else {
-    `git-diff-tree --binary -p $parent $commit >.cvsexportcommit.diff`;# || die "Cannot diff";
+    `git diff-tree --binary -p $parent $commit >.cvsexportcommit.diff`;# || die "Cannot diff";
 }
 
 ## apply non-binary changes
@@ -178,7 +178,7 @@
 print "Checking if patch will apply\n";
 
 my @stat;
-open APPLY, "GIT_INDEX_FILE=$tmpdir/index git-apply $context --summary --numstat<.cvsexportcommit.diff|" || die "cannot patch";
+open APPLY, "GIT_INDEX_FILE=$tmpdir/index git apply $context --summary --numstat<.cvsexportcommit.diff|" || die "cannot patch";
 @stat=<APPLY>;
 close APPLY || die "Cannot patch";
 my (@bfiles,@files,@afiles,@dfiles);
@@ -333,7 +333,7 @@
 if ($opt_W) {
     system("git checkout -q $commit^0") && die "cannot patch";
 } else {
-    `GIT_INDEX_FILE=$tmpdir/index git-apply $context --summary --numstat --apply <.cvsexportcommit.diff` || die "cannot patch";
+    `GIT_INDEX_FILE=$tmpdir/index git apply $context --summary --numstat --apply <.cvsexportcommit.diff` || die "cannot patch";
 }
 
 print "Patch applied successfully. Adding new files and directories to CVS\n";
-- 
2.28.0-454-g5f859b1948


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-26  1:17             ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
  2020-08-26  1:17               ` [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form Junio C Hamano
  2020-08-26  1:17               ` [PATCH v1 2/3] cvsexportcommit: do not run git programs " Junio C Hamano
@ 2020-08-26  1:17               ` Junio C Hamano
  2020-08-26  1:19                 ` Junio C Hamano
                                   ` (2 more replies)
  2020-08-26  8:09               ` [PATCH v1 0/3] War on dashed-git Johannes Schindelin
  3 siblings, 3 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26  1:17 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff King

If we were to propose removing "git-foo" binaries from the
filesystem for built-in commands, we should first see if there are
users who will be affected by such a move.  When cmd_main() detects
we were called not as "git", but as "git-foo", give an error message
to ask the user to let us know and stop our removal plan, unless we
are running a selected few programs that MUST be callable in the
dashed form (e.g. "git-upload-pack").

Those who are always using "git foo" form will not be affected, but
those who trusted the promise we made to them 12 years ago that by
prepending the output of $(git --exec-path) to the list of
directories on their $PATH, they can safely keep writing
"git-cat-file" will be negatively affected as all their scripts
assuming the promise will be kept are now broken.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 command-list.txt | 11 +++++++----
 git.c            |  2 ++
 help.c           | 34 ++++++++++++++++++++++++++++++++++
 help.h           |  3 +++
 4 files changed, 46 insertions(+), 4 deletions(-)

diff --git a/command-list.txt b/command-list.txt
index e5901f2213..1238f6ec8b 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -39,6 +39,9 @@
 # mainporcelain commands are completable so you don't need this
 # attribute.
 #
+# "onpath" attribute is used to mark that the command MUST appear
+# on $PATH (typically in /usr/bin) due to protocol requirement.
+#
 # As part of the Git man page list, the man(5/7) guides are also
 # specified here, which can only have "guide" attribute and nothing
 # else.
@@ -144,7 +147,7 @@ git-quiltimport                         foreignscminterface
 git-range-diff                          mainporcelain
 git-read-tree                           plumbingmanipulators
 git-rebase                              mainporcelain           history
-git-receive-pack                        synchelpers
+git-receive-pack                        synchelpers        onpath
 git-reflog                              ancillarymanipulators           complete
 git-remote                              ancillarymanipulators           complete
 git-repack                              ancillarymanipulators           complete
@@ -159,7 +162,7 @@ git-rev-parse                           plumbinginterrogators
 git-rm                                  mainporcelain           worktree
 git-send-email                          foreignscminterface             complete
 git-send-pack                           synchingrepositories
-git-shell                               synchelpers
+git-shell                               synchelpers        onpath
 git-shortlog                            mainporcelain
 git-show                                mainporcelain           info
 git-show-branch                         ancillaryinterrogators          complete
@@ -182,8 +185,8 @@ git-unpack-objects                      plumbingmanipulators
 git-update-index                        plumbingmanipulators
 git-update-ref                          plumbingmanipulators
 git-update-server-info                  synchingrepositories
-git-upload-archive                      synchelpers
-git-upload-pack                         synchelpers
+git-upload-archive                      synchelpers        onpath
+git-upload-pack                         synchelpers        onpath
 git-var                                 plumbinginterrogators
 git-verify-commit                       ancillaryinterrogators
 git-verify-pack                         plumbinginterrogators
diff --git a/git.c b/git.c
index 8bd1d7551d..927018bda7 100644
--- a/git.c
+++ b/git.c
@@ -839,6 +839,8 @@ int cmd_main(int argc, const char **argv)
 	 * that one cannot handle it.
 	 */
 	if (skip_prefix(cmd, "git-", &cmd)) {
+		warn_on_dashed_git(argv[0]);
+
 		argv[0] = cmd;
 		handle_builtin(argc, argv);
 		die(_("cannot handle %s as a builtin"), cmd);
diff --git a/help.c b/help.c
index d478afb2af..490d2bc3ae 100644
--- a/help.c
+++ b/help.c
@@ -720,3 +720,37 @@ NORETURN void help_unknown_ref(const char *ref, const char *cmd,
 	string_list_clear(&suggested_refs, 0);
 	exit(1);
 }
+
+static struct cmdname_help *find_cmdname_help(const char *name)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(command_list); i++) {
+		if (!strcmp(command_list[i].name, name))
+			return &command_list[i];
+	}
+	return NULL;
+}
+
+void warn_on_dashed_git(const char *cmd)
+{
+	struct cmdname_help *cmdname;
+	static const char *still_in_use_var = "GIT_I_STILL_USE_DASHED_GIT";
+	static const char *still_in_use_msg =
+		N_("Use of '%s' in the dashed-form is nominated for removal.\n"
+		   "If you still use it, export '%s=true'\n"
+		   "and send an e-mail to <git@vger.kernel.org>\n"
+		   "to let us know and stop our removal plan.  Thanks.\n");
+
+	if (!cmd)
+		return; /* git-help is OK */
+
+	cmdname = find_cmdname_help(cmd);
+	if (cmdname && (cmdname->category & CAT_onpath))
+		return; /* git-upload-pack and friends are OK */
+
+	if (!git_env_bool(still_in_use_var, 0)) {
+		fprintf(stderr, _(still_in_use_msg), cmd, still_in_use_var);
+		exit(1);
+	}
+}
diff --git a/help.h b/help.h
index dc02458855..d3de5e0d69 100644
--- a/help.h
+++ b/help.h
@@ -45,6 +45,9 @@ void get_version_info(struct strbuf *buf, int show_build_options);
  */
 NORETURN void help_unknown_ref(const char *ref, const char *cmd, const char *error);
 
+/* When the cmd_main() sees "git-foo", check if the user intended */
+void warn_on_dashed_git(const char *);
+
 static inline void list_config_item(struct string_list *list,
 				    const char *prefix,
 				    const char *str)
-- 
2.28.0-454-g5f859b1948


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-26  1:17               ` [PATCH v1 3/3] git: catch an attempt to run "git-foo" Junio C Hamano
@ 2020-08-26  1:19                 ` Junio C Hamano
  2020-08-26  8:06                 ` Johannes Schindelin
  2020-08-28  2:13                 ` Johannes Schindelin
  2 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26  1:19 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> If we were to propose removing "git-foo" binaries from the
> filesystem for built-in commands, we should first see if there are
> users who will be affected by such a move.  When cmd_main() detects
> we were called not as "git", but as "git-foo", give an error message
> to ask the user to let us know and stop our removal plan, unless we
> are running a selected few programs that MUST be callable in the
> dashed form (e.g. "git-upload-pack").
>
> Those who are always using "git foo" form will not be affected, but
> those who trusted the promise we made to them 12 years ago that by
> prepending the output of $(git --exec-path) to the list of
> directories on their $PATH, they can safely keep writing
> "git-cat-file" will be negatively affected as all their scripts
> assuming the promise will be kept are now broken.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---

The same idea as the one for pack-redundant.  I do not use the
technique to inspect $PATH and see if $GIT_EXEC_PATH is on it, as
that would mean we will *not* bug users with legitimate need to keep
the feature working, hence will not get "don't do that" objections.

We may want to ensure command_list[] is sorted by name and run
binary search on it if running find_cmdname_help() for each and
every dashed "git-foo" invocations turns out to be costly.  Our
conjecture behind this patch is that the form is rarely if ever
used, so it may not matter at all, though.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form
  2020-08-26  1:17               ` [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form Junio C Hamano
@ 2020-08-26  1:24                 ` Eric Sunshine
  2020-08-26  7:55                   ` Johannes Schindelin
  2020-08-26 16:27                   ` Junio C Hamano
  0 siblings, 2 replies; 78+ messages in thread
From: Eric Sunshine @ 2020-08-26  1:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Johannes Schindelin, Jeff King

On Tue, Aug 25, 2020 at 9:17 PM Junio C Hamano <gitster@pobox.com> wrote:
> Runing them as "git remote-ext" and letting "git" dispatch to

s/Runing/Running/
s/them/it/

> "remote-ext" would just be fine and is more idiomatic.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> diff --git a/transport-helper.c b/transport-helper.c
> @@ -128,7 +128,8 @@ static struct child_process *get_helper(struct transport *transport)
> -       strvec_pushf(&helper->args, "git-remote-%s", data->name);
> +       strvec_push(&helper->args, "git");
> +       strvec_pushf(&helper->args, "remote-%s", data->name);

Rather than pushing "git" as the first argument, would it instead be
more idiomatic to set `helper->git_cmd = 1` (or would that not work
correctly for some reason)?

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 2/3] cvsexportcommit: do not run git programs in dashed form
  2020-08-26  1:17               ` [PATCH v1 2/3] cvsexportcommit: do not run git programs " Junio C Hamano
@ 2020-08-26  1:28                 ` Eric Sunshine
  2020-08-26  1:42                   ` Junio C Hamano
  2020-08-26 16:08                   ` Junio C Hamano
  2020-08-26  8:02                 ` Johannes Schindelin
  1 sibling, 2 replies; 78+ messages in thread
From: Eric Sunshine @ 2020-08-26  1:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git List, Johannes Schindelin, Jeff King

On Tue, Aug 25, 2020 at 9:17 PM Junio C Hamano <gitster@pobox.com> wrote:
> This ancient script runs "git-foo" all over the place.  A strange
> thing is that it has t9200 tests successfully running, even though
> it does not seem to futz with PATH to prepend $(git --exec-path)
> output.

t/test-lib.sh takes care of that for us, doesn't it?

> It is tempting to declare that the command must be unused,
> but that is left to another topic.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 2/3] cvsexportcommit: do not run git programs in dashed form
  2020-08-26  1:28                 ` Eric Sunshine
@ 2020-08-26  1:42                   ` Junio C Hamano
  2020-08-26 16:08                   ` Junio C Hamano
  1 sibling, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26  1:42 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Johannes Schindelin, Jeff King

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Tue, Aug 25, 2020 at 9:17 PM Junio C Hamano <gitster@pobox.com> wrote:
>> This ancient script runs "git-foo" all over the place.  A strange
>> thing is that it has t9200 tests successfully running, even though
>> it does not seem to futz with PATH to prepend $(git --exec-path)
>> output.
>
> t/test-lib.sh takes care of that for us, doesn't it?

It generally shouldn't, I think.

We have bin-wrappers directory and we shouldn't be copying things we
would install in libexec/git-core/ in real life.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form
  2020-08-26  1:24                 ` Eric Sunshine
@ 2020-08-26  7:55                   ` Johannes Schindelin
  2020-08-26 16:27                   ` Junio C Hamano
  1 sibling, 0 replies; 78+ messages in thread
From: Johannes Schindelin @ 2020-08-26  7:55 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, Git List, Jeff King

Hi,

On Tue, 25 Aug 2020, Eric Sunshine wrote:

> On Tue, Aug 25, 2020 at 9:17 PM Junio C Hamano <gitster@pobox.com> wrote:
> > Runing them as "git remote-ext" and letting "git" dispatch to
>
> s/Runing/Running/
> s/them/it/
>
> > "remote-ext" would just be fine and is more idiomatic.
> >
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> > diff --git a/transport-helper.c b/transport-helper.c
> > @@ -128,7 +128,8 @@ static struct child_process *get_helper(struct transport *transport)
> > -       strvec_pushf(&helper->args, "git-remote-%s", data->name);
> > +       strvec_push(&helper->args, "git");
> > +       strvec_pushf(&helper->args, "remote-%s", data->name);
>
> Rather than pushing "git" as the first argument, would it instead be
> more idiomatic to set `helper->git_cmd = 1` (or would that not work
> correctly for some reason)?

That is precisely what I did in Git for Windows:

-- snipsnap --
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: [PATCH] transport-helper: prefer Git's builtins over dashed form

This helps with minimal installations such as MinGit that refuse to
waste .zip real estate by shipping identical copies of builtins (.zip
files do not support hard links).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 transport-helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index 35d6437b557..2c4b7a023db 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -130,10 +130,10 @@ static struct child_process *get_helper(struct transport *transport)
 	helper->in = -1;
 	helper->out = -1;
 	helper->err = 0;
-	argv_array_pushf(&helper->args, "git-remote-%s", data->name);
+	argv_array_pushf(&helper->args, "remote-%s", data->name);
 	argv_array_push(&helper->args, transport->remote->name);
 	argv_array_push(&helper->args, remove_ext_force(transport->url));
-	helper->git_cmd = 0;
+	helper->git_cmd = 1;
 	helper->silent_exec_failure = 1;

 	if (have_git_dir())
--
2.28.0.windows.1.18.g5300e52e185


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 2/3] cvsexportcommit: do not run git programs in dashed form
  2020-08-26  1:17               ` [PATCH v1 2/3] cvsexportcommit: do not run git programs " Junio C Hamano
  2020-08-26  1:28                 ` Eric Sunshine
@ 2020-08-26  8:02                 ` Johannes Schindelin
  1 sibling, 0 replies; 78+ messages in thread
From: Johannes Schindelin @ 2020-08-26  8:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Hi Junio,

On Tue, 25 Aug 2020, Junio C Hamano wrote:

> This ancient script runs "git-foo" all over the place.  A strange
> thing is that it has t9200 tests successfully running, even though
> it does not seem to futz with PATH to prepend $(git --exec-path)
> output.  It is tempting to declare that the command must be unused,
> but that is left to another topic.

Not surprising at all: when t9200 runs `git cvsexportcommit`, it actually
runs `bin-wrappers/git`, which sets `GIT_EXEC_PATH` to the top-level
directory of the Git source code, then calls the `git` executable which in
turn will set up the `PATH` to prepend `GIT_EXEC_PATH`, and then look for
`git-cvsexportcommit` (which does not exist in `bin-wrappers/`, but in
`GIT_EXEC_PATH`). And of course then `git-rev-parse` is found on the
`PATH`, too.

Slightly more surprising is that my PR build did not fail. I guess we do
test `git svn`, but we skip all CVS-related tests, eh?

Slightly related: it might be a good time to deprecate the CVS-related Git
commands...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-26  1:17               ` [PATCH v1 3/3] git: catch an attempt to run "git-foo" Junio C Hamano
  2020-08-26  1:19                 ` Junio C Hamano
@ 2020-08-26  8:06                 ` Johannes Schindelin
  2020-08-26 16:30                   ` Junio C Hamano
  2020-08-28  2:13                 ` Johannes Schindelin
  2 siblings, 1 reply; 78+ messages in thread
From: Johannes Schindelin @ 2020-08-26  8:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Hi Junio,

On Tue, 25 Aug 2020, Junio C Hamano wrote:

> If we were to propose removing "git-foo" binaries from the
> filesystem for built-in commands, we should first see if there are
> users who will be affected by such a move.  When cmd_main() detects
> we were called not as "git", but as "git-foo", give an error message
> to ask the user to let us know and stop our removal plan, unless we
> are running a selected few programs that MUST be callable in the
> dashed form (e.g. "git-upload-pack").
>
> Those who are always using "git foo" form will not be affected, but
> those who trusted the promise we made to them 12 years ago that by
> prepending the output of $(git --exec-path) to the list of
> directories on their $PATH, they can safely keep writing
> "git-cat-file" will be negatively affected as all their scripts
> assuming the promise will be kept are now broken.

It might be a good idea to also instrument the existing scripts, to show
the same warning unless they were called through the `git` binary.

_If_ we were to do this ;-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 0/3] War on dashed-git
  2020-08-26  1:17             ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
                                 ` (2 preceding siblings ...)
  2020-08-26  1:17               ` [PATCH v1 3/3] git: catch an attempt to run "git-foo" Junio C Hamano
@ 2020-08-26  8:09               ` Johannes Schindelin
  2020-08-26 16:45                 ` Junio C Hamano
  3 siblings, 1 reply; 78+ messages in thread
From: Johannes Schindelin @ 2020-08-26  8:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Hi Junio,

On Tue, 25 Aug 2020, Junio C Hamano wrote:

> If we were to propose breaking the 12-year old promise to our users
> that we will keep "git-foo" binaries on disk where "git --exec-path"
> tells them, we first need to gauge how big a population we would be
> hurting with such a move.
>
> So here is a miniseries towards that.  The third one hooks to the
> codepath in git.c::cmd_main() where av[0] is of "git-foo" form and
> we dispatch to "foo" as a builtin command.  In the original code, we
> will die() if "foo" is not a built-in command in this codepath, so
> it is exactly the place we want to catch remaining uses of "git-foo"
> invoking built-in commands.
>
> There are a few legitimate "git-foo" calls made even for built-ins
> and those exceptions are marked in the command-list mechanism, which
> is shared with the help subsystem.  We might want to see if we can
> unify this exception list with what we have in the Makefile as
> BIN_PROGRAMS and what Dscho introduces as ALL_COMMANDS_TO_INSTALL
> in his series.  These have large overlaps in what they mean, but
> they are not exactly identical.

As it happens, I discussed a scenario the other day where dashed
invocations might still happen, and the consensus was that nobody looks at
the output unless things are broken.

So maybe this patch series would be a good first step, but if we truly
wanted to break that 12-year old promise, we might need to have another
patch on top that _does_ install the dashed commands, but into a
subdirectory of `libexec/git-core/` that is only added to the `PATH` when
an "escape hatch"-style config setting is set.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 2/3] cvsexportcommit: do not run git programs in dashed form
  2020-08-26  1:28                 ` Eric Sunshine
  2020-08-26  1:42                   ` Junio C Hamano
@ 2020-08-26 16:08                   ` Junio C Hamano
  2020-08-26 16:28                     ` Junio C Hamano
  1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 16:08 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Johannes Schindelin, Jeff King

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Tue, Aug 25, 2020 at 9:17 PM Junio C Hamano <gitster@pobox.com> wrote:
>> This ancient script runs "git-foo" all over the place.  A strange
>> thing is that it has t9200 tests successfully running, even though
>> it does not seem to futz with PATH to prepend $(git --exec-path)
>> output.
>
> t/test-lib.sh takes care of that for us, doesn't it?

Actually, it is "git" itself.  

The tests spawn "git cvsexportcommit" via the "git" dispatcher.  The
dispatcher adds GIT_EXEC_PATH to PATH in exec-cmd.c::setup_path()
and that is used to (1) locate "git-foo" from /usr/libexec/git-core/
that are not builtin and (2) passed to any processes we invoke.  I
think the latter was originally done primarily for not breaking
hooks, but that is what allows this script going in this particular
case.

If this were only a fluke in the test that kept otherwise unrunnable
script passing, I'd say it is an evidence enough that lets us drop
the cvsexportcommit immediately, but now I rediscovered how it was
supposed to work and saw how it actually does work as designed, I
would not be surprised if it is still used in the wild.

Thanks.




^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form
  2020-08-26  1:24                 ` Eric Sunshine
  2020-08-26  7:55                   ` Johannes Schindelin
@ 2020-08-26 16:27                   ` Junio C Hamano
  1 sibling, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 16:27 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Johannes Schindelin, Jeff King

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Tue, Aug 25, 2020 at 9:17 PM Junio C Hamano <gitster@pobox.com> wrote:
>> Runing them as "git remote-ext" and letting "git" dispatch to
>
> s/Runing/Running/
> s/them/it/

I wrote 'them' because it is not only 'ext' but other helpers are
also run with the same mechanism, but the sentence uses remote-ext
as a single concrete example, so 'it' would be more appropriate.
Thanks.

>> "remote-ext" would just be fine and is more idiomatic.
>>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> ---
>> diff --git a/transport-helper.c b/transport-helper.c
>> @@ -128,7 +128,8 @@ static struct child_process *get_helper(struct transport *transport)
>> -       strvec_pushf(&helper->args, "git-remote-%s", data->name);
>> +       strvec_push(&helper->args, "git");
>> +       strvec_pushf(&helper->args, "remote-%s", data->name);
>
> Rather than pushing "git" as the first argument, would it instead be
> more idiomatic to set `helper->git_cmd = 1` (or would that not work
> correctly for some reason)?

I was aiming for minimum change and did not think too deeply.  

If .git_cmd=1 works here (and offhand I do not see a reason why
not), then that would be simpler.  Thanks.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 2/3] cvsexportcommit: do not run git programs in dashed form
  2020-08-26 16:08                   ` Junio C Hamano
@ 2020-08-26 16:28                     ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 16:28 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, Johannes Schindelin, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Eric Sunshine <sunshine@sunshineco.com> writes:
>
>> On Tue, Aug 25, 2020 at 9:17 PM Junio C Hamano <gitster@pobox.com> wrote:
>>> This ancient script runs "git-foo" all over the place.  A strange
>>> thing is that it has t9200 tests successfully running, even though
>>> it does not seem to futz with PATH to prepend $(git --exec-path)
>>> output.
>>
>> t/test-lib.sh takes care of that for us, doesn't it?
>
> Actually, it is "git" itself.  
>
> The tests spawn "git cvsexportcommit" via the "git" dispatcher.  The
> dispatcher adds GIT_EXEC_PATH to PATH in exec-cmd.c::setup_path()
> and that is used to (1) locate "git-foo" from /usr/libexec/git-core/
> that are not builtin and (2) passed to any processes we invoke.  I
> think the latter was originally done primarily for not breaking
> hooks, but that is what allows this script going in this particular
> case.
>
> If this were only a fluke in the test that kept otherwise unrunnable
> script passing, I'd say it is an evidence enough that lets us drop
> the cvsexportcommit immediately, but now I rediscovered how it was
> supposed to work and saw how it actually does work as designed, I
> would not be surprised if it is still used in the wild.

So, the conclusion: the proposed log message needs a large revamp.
Thanks.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-26  8:06                 ` Johannes Schindelin
@ 2020-08-26 16:30                   ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 16:30 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff King

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Junio,
>
> On Tue, 25 Aug 2020, Junio C Hamano wrote:
>
>> If we were to propose removing "git-foo" binaries from the
>> filesystem for built-in commands, we should first see if there are
>> users who will be affected by such a move.  When cmd_main() detects
>> we were called not as "git", but as "git-foo", give an error message
>> to ask the user to let us know and stop our removal plan, unless we
>> are running a selected few programs that MUST be callable in the
>> dashed form (e.g. "git-upload-pack").
>>
>> Those who are always using "git foo" form will not be affected, but
>> those who trusted the promise we made to them 12 years ago that by
>> prepending the output of $(git --exec-path) to the list of
>> directories on their $PATH, they can safely keep writing
>> "git-cat-file" will be negatively affected as all their scripts
>> assuming the promise will be kept are now broken.
>
> It might be a good idea to also instrument the existing scripts, to show
> the same warning unless they were called through the `git` binary.
>
> _If_ we were to do this ;-)

Sure.  

I am not the advocate for removing builtins from on-disk, though.
The burden of proof... ;-)



^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 0/3] War on dashed-git
  2020-08-26  8:09               ` [PATCH v1 0/3] War on dashed-git Johannes Schindelin
@ 2020-08-26 16:45                 ` Junio C Hamano
  2020-08-26 19:46                   ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Junio C Hamano
  0 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 16:45 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff King

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> So maybe this patch series would be a good first step, but if we truly
> wanted to break that 12-year old promise, we might need to have another
> patch on top that _does_ install the dashed commands, but into a
> subdirectory of `libexec/git-core/` that is only added to the `PATH` when
> an "escape hatch"-style config setting is set.

Yes, I think that is a much more reasonable approach.  Instead of
checking another environment variable that does not do anything but
bypass the "you used dashed form for builtin" check, we install non
builtins in libexec/git-core and leave the exec-cmd.c::setup_path()
as-is to add it to the PATH.  A new location will have the buitlin
binaries, say in libexec/git-core/builtins, and it is not added to
the $PATH by us.  The scripts that the users updated 12-years ago by
adding the former to the $PATH now needs to also add the latter,
too, and those users will loudly complain (which is what we want to
see happen [*1*]), but doing so is an easy way to unbreak them while
we reverse the course.

I think the cvsexportcommit and transport-helper changes are worth
salvaging even if we don't remove builtin binaries, so I'll split
them and whip them a bit more into a reasonable shape to be merged
to 'next'.  The "break those who say 'git-cat-file'" can be left for
future.

Thanks.


[Footnote]

*1* In the ancient thread, I was sick of hearing complaints that
beat the dead horse, but in this particular case, we do want to hear
from them---that is the primary reason why we are doing it.

https://public-inbox.org/git/7vr68b8q9p.fsf@gitster.siamese.dyndns.org/


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form
  2020-08-26 16:45                 ` Junio C Hamano
@ 2020-08-26 19:46                   ` Junio C Hamano
  2020-08-26 19:46                     ` [PATCH v2 1/2] transport-helper: do not run git-remote-ext etc. in " Junio C Hamano
                                       ` (2 more replies)
  0 siblings, 3 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 19:46 UTC (permalink / raw)
  To: git

Junio C Hamano <gitster@pobox.com> writes:

> I think the cvsexportcommit and transport-helper changes are worth
> salvaging even if we don't remove builtin binaries, so I'll split
> them and whip them a bit more into a reasonable shape to be merged
> to 'next'.  The "break those who say 'git-cat-file'" can be left for
> future.

And here it is.

These two patches are clean-ups that are worth doing whether we
decide to remove on-disk binaries for the builtin commands.  

I droped the third one, that actually stops users from running
built-in commands using the dashed form, at least for now.  It can
be resurrected later if we really wants to propose removal to the
users, but I am not inclined to make such a proposal right now.

Junio C Hamano (2):
  transport-helper: do not run git-remote-ext etc. in dashed form
  cvsexportcommit: do not run git programs in dashed form

 git-cvsexportcommit.perl | 16 ++++++++--------
 transport-helper.c       |  4 ++--
 2 files changed, 10 insertions(+), 10 deletions(-)

-- 
2.28.0-454-g5f859b1948


^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v2 1/2] transport-helper: do not run git-remote-ext etc. in dashed form
  2020-08-26 19:46                   ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Junio C Hamano
@ 2020-08-26 19:46                     ` Junio C Hamano
  2020-08-26 19:46                     ` [PATCH v2 2/2] cvsexportcommit: do not run git programs " Junio C Hamano
  2020-08-27  0:57                     ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Derrick Stolee
  2 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 19:46 UTC (permalink / raw)
  To: git

Running it as "git remote-ext" and letting "git" dispatch to
"remote-ext" would just be fine and is more idiomatic.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 transport-helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/transport-helper.c b/transport-helper.c
index defafbf4c1..c52c99d829 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -128,10 +128,10 @@ static struct child_process *get_helper(struct transport *transport)
 	helper->in = -1;
 	helper->out = -1;
 	helper->err = 0;
-	strvec_pushf(&helper->args, "git-remote-%s", data->name);
+	strvec_pushf(&helper->args, "remote-%s", data->name);
 	strvec_push(&helper->args, transport->remote->name);
 	strvec_push(&helper->args, remove_ext_force(transport->url));
-	helper->git_cmd = 0;
+	helper->git_cmd = 1;
 	helper->silent_exec_failure = 1;
 
 	if (have_git_dir())
-- 
2.28.0-454-g5f859b1948


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH v2 2/2] cvsexportcommit: do not run git programs in dashed form
  2020-08-26 19:46                   ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Junio C Hamano
  2020-08-26 19:46                     ` [PATCH v2 1/2] transport-helper: do not run git-remote-ext etc. in " Junio C Hamano
@ 2020-08-26 19:46                     ` Junio C Hamano
  2020-08-26 21:37                       ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
  2020-08-27  0:57                     ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Derrick Stolee
  2 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 19:46 UTC (permalink / raw)
  To: git

This ancient script runs "git-foo" all over the place, which is
OK for a scripted Porcelain in the Git suite, but asking "git" to
dispatch to subcommands is the usual way these days.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 git-cvsexportcommit.perl | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/git-cvsexportcommit.perl b/git-cvsexportcommit.perl
index 0ae8bce3fb..289d4bc684 100755
--- a/git-cvsexportcommit.perl
+++ b/git-cvsexportcommit.perl
@@ -30,7 +30,7 @@
 	# Remember where GIT_DIR is before changing to CVS checkout
 	unless ($ENV{GIT_DIR}) {
 		# No GIT_DIR set. Figure it out for ourselves
-		my $gd =`git-rev-parse --git-dir`;
+		my $gd =`git rev-parse --git-dir`;
 		chomp($gd);
 		$ENV{GIT_DIR} = $gd;
 	}
@@ -66,7 +66,7 @@
 # resolve target commit
 my $commit;
 $commit = pop @ARGV;
-$commit = safe_pipe_capture('git-rev-parse', '--verify', "$commit^0");
+$commit = safe_pipe_capture('git', 'rev-parse', '--verify', "$commit^0");
 chomp $commit;
 if ($?) {
     die "The commit reference $commit did not resolve!";
@@ -76,7 +76,7 @@
 my $parent;
 if (@ARGV) {
     $parent = pop @ARGV;
-    $parent =  safe_pipe_capture('git-rev-parse', '--verify', "$parent^0");
+    $parent =  safe_pipe_capture('git', 'rev-parse', '--verify', "$parent^0");
     chomp $parent;
     if ($?) {
 	die "The parent reference did not resolve!";
@@ -84,7 +84,7 @@
 }
 
 # find parents from the commit itself
-my @commit  = safe_pipe_capture('git-cat-file', 'commit', $commit);
+my @commit  = safe_pipe_capture('git', 'cat-file', 'commit', $commit);
 my @parents;
 my $committer;
 my $author;
@@ -162,9 +162,9 @@
 close MSG;
 
 if ($parent eq $noparent) {
-    `git-diff-tree --binary -p --root $commit >.cvsexportcommit.diff`;# || die "Cannot diff";
+    `git diff-tree --binary -p --root $commit >.cvsexportcommit.diff`;# || die "Cannot diff";
 } else {
-    `git-diff-tree --binary -p $parent $commit >.cvsexportcommit.diff`;# || die "Cannot diff";
+    `git diff-tree --binary -p $parent $commit >.cvsexportcommit.diff`;# || die "Cannot diff";
 }
 
 ## apply non-binary changes
@@ -178,7 +178,7 @@
 print "Checking if patch will apply\n";
 
 my @stat;
-open APPLY, "GIT_INDEX_FILE=$tmpdir/index git-apply $context --summary --numstat<.cvsexportcommit.diff|" || die "cannot patch";
+open APPLY, "GIT_INDEX_FILE=$tmpdir/index git apply $context --summary --numstat<.cvsexportcommit.diff|" || die "cannot patch";
 @stat=<APPLY>;
 close APPLY || die "Cannot patch";
 my (@bfiles,@files,@afiles,@dfiles);
@@ -333,7 +333,7 @@
 if ($opt_W) {
     system("git checkout -q $commit^0") && die "cannot patch";
 } else {
-    `GIT_INDEX_FILE=$tmpdir/index git-apply $context --summary --numstat --apply <.cvsexportcommit.diff` || die "cannot patch";
+    `GIT_INDEX_FILE=$tmpdir/index git apply $context --summary --numstat --apply <.cvsexportcommit.diff` || die "cannot patch";
 }
 
 print "Patch applied successfully. Adding new files and directories to CVS\n";
-- 
2.28.0-454-g5f859b1948


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v2] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-25 16:04     ` [PATCH v2] " Taylor Blau
@ 2020-08-26 20:51       ` Derrick Stolee
  2020-08-26 20:54         ` Junio C Hamano
  0 siblings, 1 reply; 78+ messages in thread
From: Derrick Stolee @ 2020-08-26 20:51 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: dstolee, peff, sluongng

On 8/25/2020 12:04 PM, Taylor Blau wrote:
> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
> learned to remove a multi-pack-index file if it added or removed a pack
> from the object store.
> 
> This mechanism is a little over-eager, since it is only necessary to
> drop a MIDX if 'git repack' removes a pack that the MIDX references.
> Adding a pack outside of the MIDX does not require invalidating the
> MIDX, and likewise for removing a pack the MIDX does not know about.
> 
> Teach 'git repack' to check for this by loading the MIDX, and checking
> whether the to-be-removed pack is known to the MIDX. This requires a
> slightly odd alternation to a test in t5319, which is explained with a
> comment. A new test is added to show that the MIDX is left alone when
> both packs known to it are marked as .keep, but two packs unknown to it
> are removed and combined into one new pack.
> 
> Helped-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>
> ---
> Range-diff against v1:

I know this thread went in a new direction about pack-redundant and
dashed-git, but this version looks good to me. I wanted to make sure
it wasn't lost in the shuffle.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2] builtin/repack.c: invalidate MIDX only when necessary
  2020-08-26 20:51       ` Derrick Stolee
@ 2020-08-26 20:54         ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 20:54 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, dstolee, peff, sluongng

Derrick Stolee <stolee@gmail.com> writes:

> On 8/25/2020 12:04 PM, Taylor Blau wrote:
>> In 525e18c04b (midx: clear midx on repack, 2018-07-12), 'git repack'
>> learned to remove a multi-pack-index file if it added or removed a pack
>> from the object store.
>> 
>> This mechanism is a little over-eager, since it is only necessary to
>> drop a MIDX if 'git repack' removes a pack that the MIDX references.
>> Adding a pack outside of the MIDX does not require invalidating the
>> MIDX, and likewise for removing a pack the MIDX does not know about.
>> 
>> Teach 'git repack' to check for this by loading the MIDX, and checking
>> whether the to-be-removed pack is known to the MIDX. This requires a
>> slightly odd alternation to a test in t5319, which is explained with a
>> comment. A new test is added to show that the MIDX is left alone when
>> both packs known to it are marked as .keep, but two packs unknown to it
>> are removed and combined into one new pack.
>> 
>> Helped-by: Derrick Stolee <dstolee@microsoft.com>
>> Signed-off-by: Taylor Blau <me@ttaylorr.com>
>> ---
>> Range-diff against v1:
>
> I know this thread went in a new direction about pack-redundant and
> dashed-git, but this version looks good to me. I wanted to make sure
> it wasn't lost in the shuffle.

Very much appreciated.  I was wondering if I'll see a final reroll
or v2 already had all we wanted.

Thanks.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH v2 3/2] credential-cache: use child_process.args
  2020-08-26 19:46                     ` [PATCH v2 2/2] cvsexportcommit: do not run git programs " Junio C Hamano
@ 2020-08-26 21:37                       ` Junio C Hamano
  2020-08-26 22:25                         ` [PATCH] run_command: teach API users to use embedded 'args' more Junio C Hamano
                                           ` (3 more replies)
  0 siblings, 4 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 21:37 UTC (permalink / raw)
  To: git; +Cc: Jeff King

As child_process structure has an embedded strvec args for
formulating the command line, let's use it instead of using
an out-of-line argv[] whose length needs to be maintained
correctly. 

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/credential-cache.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/credential-cache.c b/builtin/credential-cache.c
index d0fafdeb9e..195335a783 100644
--- a/builtin/credential-cache.c
+++ b/builtin/credential-cache.c
@@ -42,13 +42,13 @@ static int send_request(const char *socket, const struct strbuf *out)
 static void spawn_daemon(const char *socket)
 {
 	struct child_process daemon = CHILD_PROCESS_INIT;
-	const char *argv[] = { NULL, NULL, NULL };
 	char buf[128];
 	int r;
 
-	argv[0] = "git-credential-cache--daemon";
-	argv[1] = socket;
-	daemon.argv = argv;
+	strvec_pushl(&daemon.args, 
+		     "credential-cache--daemon", socket,
+		     NULL);
+	daemon.git_cmd = 1;
 	daemon.no_stdin = 1;
 	daemon.out = -1;
 

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* [PATCH] run_command: teach API users to use embedded 'args' more
  2020-08-26 21:37                       ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
@ 2020-08-26 22:25                         ` Junio C Hamano
  2020-08-27  4:21                           ` Jeff King
  2020-08-27  4:13                         ` [PATCH v2 3/2] credential-cache: use child_process.args Jeff King
                                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-08-26 22:25 UTC (permalink / raw)
  To: git; +Cc: Jeff King

The child_process structure has an embedded strvec for formulating
the command line argument list these days, but code that predates
the wide use of it prepared a separate char *argv[] array and
manually set the child_process.argv pointer point at it.

Teach these old-style code to lose the separate argv[] array.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 convert.c    |  5 +----
 credential.c |  4 +---
 submodule.c  | 13 ++++---------
 trailer.c    |  4 +---
 4 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/convert.c b/convert.c
index 572449825c..8e6c292421 100644
--- a/convert.c
+++ b/convert.c
@@ -638,7 +638,6 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	struct child_process child_process = CHILD_PROCESS_INIT;
 	struct filter_params *params = (struct filter_params *)data;
 	int write_err, status;
-	const char *argv[] = { NULL, NULL };
 
 	/* apply % substitution to cmd */
 	struct strbuf cmd = STRBUF_INIT;
@@ -656,9 +655,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	strbuf_expand(&cmd, params->cmd, strbuf_expand_dict_cb, &dict);
 	strbuf_release(&path);
 
-	argv[0] = cmd.buf;
-
-	child_process.argv = argv;
+	strvec_push(&child_process.args, cmd.buf);
 	child_process.use_shell = 1;
 	child_process.in = -1;
 	child_process.out = out;
diff --git a/credential.c b/credential.c
index d8d226b97e..efc29dc5e1 100644
--- a/credential.c
+++ b/credential.c
@@ -274,11 +274,9 @@ static int run_credential_helper(struct credential *c,
 				 int want_output)
 {
 	struct child_process helper = CHILD_PROCESS_INIT;
-	const char *argv[] = { NULL, NULL };
 	FILE *fp;
 
-	argv[0] = cmd;
-	helper.argv = argv;
+	strvec_push(&helper.args, cmd);
 	helper.use_shell = 1;
 	helper.in = -1;
 	if (want_output)
diff --git a/submodule.c b/submodule.c
index 01697848be..e6086faff7 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1727,14 +1727,6 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
 int submodule_uses_gitfile(const char *path)
 {
 	struct child_process cp = CHILD_PROCESS_INIT;
-	const char *argv[] = {
-		"submodule",
-		"foreach",
-		"--quiet",
-		"--recursive",
-		"test -f .git",
-		NULL,
-	};
 	struct strbuf buf = STRBUF_INIT;
 	const char *git_dir;
 
@@ -1747,7 +1739,10 @@ int submodule_uses_gitfile(const char *path)
 	strbuf_release(&buf);
 
 	/* Now test that all nested submodules use a gitfile too */
-	cp.argv = argv;
+	strvec_pushl(&cp.args,
+		     "submodule", "foreach", "--quiet",	"--recursive",
+		     "test -f .git", NULL);
+
 	prepare_submodule_repo_env(&cp.env_array);
 	cp.git_cmd = 1;
 	cp.no_stdin = 1;
diff --git a/trailer.c b/trailer.c
index 0c414f2fed..68dabc2556 100644
--- a/trailer.c
+++ b/trailer.c
@@ -221,15 +221,13 @@ static char *apply_command(const char *command, const char *arg)
 	struct strbuf cmd = STRBUF_INIT;
 	struct strbuf buf = STRBUF_INIT;
 	struct child_process cp = CHILD_PROCESS_INIT;
-	const char *argv[] = {NULL, NULL};
 	char *result;
 
 	strbuf_addstr(&cmd, command);
 	if (arg)
 		strbuf_replace(&cmd, TRAILER_ARG_STRING, arg);
 
-	argv[0] = cmd.buf;
-	cp.argv = argv;
+	strvec_push(&cp.args, cmd.buf);
 	cp.env = local_repo_env;
 	cp.no_stdin = 1;
 	cp.use_shell = 1;

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form
  2020-08-26 19:46                   ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Junio C Hamano
  2020-08-26 19:46                     ` [PATCH v2 1/2] transport-helper: do not run git-remote-ext etc. in " Junio C Hamano
  2020-08-26 19:46                     ` [PATCH v2 2/2] cvsexportcommit: do not run git programs " Junio C Hamano
@ 2020-08-27  0:57                     ` Derrick Stolee
  2020-08-27  1:22                       ` Junio C Hamano
  2 siblings, 1 reply; 78+ messages in thread
From: Derrick Stolee @ 2020-08-27  0:57 UTC (permalink / raw)
  To: Junio C Hamano, git

On 8/26/2020 3:46 PM, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> I think the cvsexportcommit and transport-helper changes are worth
>> salvaging even if we don't remove builtin binaries, so I'll split
>> them and whip them a bit more into a reasonable shape to be merged
>> to 'next'.  The "break those who say 'git-cat-file'" can be left for
>> future.
> 
> And here it is.
> 
> These two patches are clean-ups that are worth doing whether we
> decide to remove on-disk binaries for the builtin commands.  

LGTM.

> I droped the third one, that actually stops users from running
> built-in commands using the dashed form, at least for now.  It can
> be resurrected later if we really wants to propose removal to the
> users, but I am not inclined to make such a proposal right now.

I think that's fine for now. A plan to fully deprecate the dashed
forms can come later.

Would an interesting in-between step include removing the dashed
forms for builtins that didn't exist in Git 2.0?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form
  2020-08-27  0:57                     ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Derrick Stolee
@ 2020-08-27  1:22                       ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-27  1:22 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git

Derrick Stolee <stolee@gmail.com> writes:

> Would an interesting in-between step include removing the dashed
> forms for builtins that didn't exist in Git 2.0?

I am not sure if it makes much sense to treat newer and older
built-in commands differently.

Imagine that an old-timer wrote a script by somebody who trusted the
"futz with PATH and you can use git-foo" promise before Git 2.0 and
then the script was inherited by relatively new users.  Adhering to
the "when in doubt, mimic the surrounding code", which is usually a
good discipline to follow, these new users, who are now in charge of
maintaining the script, would add any new calls in "git-foo" form to
match the local convention in good faith.  And the resulting code
would have been working just fine.

Before such a "in-between step" is thrown at them, that is, at which
point it stops working if they were unlucky that they used a
relatively new built-ins.  Typically new end-users would not know
which ones are old built-ins and which ones are new, I suspect.

I do agree with Dscho that "we won't let you use builtin in dashed
form before you export an enviornment" I wrote was not a good way to
gauge the usage of on-disk builtins.  We should move the on-disk
builtins to a different directory and have them point at the
location with their $PATH as the escape hatch, as Dscho suggested,
if we were to do this for real, I'd think.

Thanks.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
  2020-08-26 21:37                       ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
  2020-08-26 22:25                         ` [PATCH] run_command: teach API users to use embedded 'args' more Junio C Hamano
@ 2020-08-27  4:13                         ` Jeff King
  2020-08-27  4:22                           ` Jeff King
  2020-08-27  4:31                           ` Junio C Hamano
  2020-08-27  4:14                         ` Jeff King
  2020-08-31 22:56                         ` Junio C Hamano
  3 siblings, 2 replies; 78+ messages in thread
From: Jeff King @ 2020-08-27  4:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Aug 26, 2020 at 02:37:39PM -0700, Junio C Hamano wrote:

> diff --git a/builtin/credential-cache.c b/builtin/credential-cache.c
> index d0fafdeb9e..195335a783 100644
> --- a/builtin/credential-cache.c
> +++ b/builtin/credential-cache.c
> @@ -42,13 +42,13 @@ static int send_request(const char *socket, const struct strbuf *out)
>  static void spawn_daemon(const char *socket)
>  {
>  	struct child_process daemon = CHILD_PROCESS_INIT;
> -	const char *argv[] = { NULL, NULL, NULL };
>  	char buf[128];
>  	int r;
>  
> -	argv[0] = "git-credential-cache--daemon";
> -	argv[1] = socket;
> -	daemon.argv = argv;
> +	strvec_pushl(&daemon.args, 
> +		     "credential-cache--daemon", socket,
> +		     NULL);
> +	daemon.git_cmd = 1;

Yep, this makes sense. I don't recall any reason to use the dashed form
back then, but probably it was just that I knew it was a separate
program. Doing it this way will mean an extra parent "git" process
hanging around, but I don't think it's that big a deal. We never try to
kill it by PID, etc (instead we connect to the socket and ask it to
exit). And anyway, it is becoming a builtin in a parallel topic, so that
extra process will go away. :)

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
  2020-08-26 21:37                       ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
  2020-08-26 22:25                         ` [PATCH] run_command: teach API users to use embedded 'args' more Junio C Hamano
  2020-08-27  4:13                         ` [PATCH v2 3/2] credential-cache: use child_process.args Jeff King
@ 2020-08-27  4:14                         ` Jeff King
  2020-08-27 15:34                           ` Junio C Hamano
  2020-08-31 22:56                         ` Junio C Hamano
  3 siblings, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-08-27  4:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Aug 26, 2020 at 02:37:39PM -0700, Junio C Hamano wrote:

> As child_process structure has an embedded strvec args for
> formulating the command line, let's use it instead of using
> an out-of-line argv[] whose length needs to be maintained
> correctly.

I forgot to mention in my other reply: I think this fails to mention the
actual functional change, which is switching from the dashed form to
using the "git" wrapper.

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] run_command: teach API users to use embedded 'args' more
  2020-08-26 22:25                         ` [PATCH] run_command: teach API users to use embedded 'args' more Junio C Hamano
@ 2020-08-27  4:21                           ` Jeff King
  2020-08-27  4:30                             ` Junio C Hamano
  2020-08-27  4:31                             ` Eric Sunshine
  0 siblings, 2 replies; 78+ messages in thread
From: Jeff King @ 2020-08-27  4:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Aug 26, 2020 at 03:25:03PM -0700, Junio C Hamano wrote:

> The child_process structure has an embedded strvec for formulating
> the command line argument list these days, but code that predates
> the wide use of it prepared a separate char *argv[] array and
> manually set the child_process.argv pointer point at it.
> 
> Teach these old-style code to lose the separate argv[] array.

I think the result is much nicer and less error-prone (especially the
ones that pre-size the array with NULLs and fill in the components
later). It incurs a few extra mallocs at run-time, but on top of an
execve(), that's a drop in the bucket.

I've actually considered dropping child_process.argv entirely. Having
two separate ways to do the same thing gives the potential for
confusion. But I never dug into whether any existing callers would be
made worse for it (I kind of doubt it, though; worst case they can use
strvec_pushv). There are still several left after this patch, it seems.

Likewise for child_process.env_array.

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
  2020-08-27  4:13                         ` [PATCH v2 3/2] credential-cache: use child_process.args Jeff King
@ 2020-08-27  4:22                           ` Jeff King
  2020-08-27  4:31                           ` Junio C Hamano
  1 sibling, 0 replies; 78+ messages in thread
From: Jeff King @ 2020-08-27  4:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Aug 27, 2020 at 12:13:28AM -0400, Jeff King wrote:

> On Wed, Aug 26, 2020 at 02:37:39PM -0700, Junio C Hamano wrote:
> 
> > diff --git a/builtin/credential-cache.c b/builtin/credential-cache.c
> > index d0fafdeb9e..195335a783 100644
> > --- a/builtin/credential-cache.c
> > +++ b/builtin/credential-cache.c
> > @@ -42,13 +42,13 @@ static int send_request(const char *socket, const struct strbuf *out)
> >  static void spawn_daemon(const char *socket)
> >  {
> >  	struct child_process daemon = CHILD_PROCESS_INIT;
> > -	const char *argv[] = { NULL, NULL, NULL };
> >  	char buf[128];
> >  	int r;
> >  
> > -	argv[0] = "git-credential-cache--daemon";
> > -	argv[1] = socket;
> > -	daemon.argv = argv;
> > +	strvec_pushl(&daemon.args, 
> > +		     "credential-cache--daemon", socket,
> > +		     NULL);
> > +	daemon.git_cmd = 1;
> 
> Yep, this makes sense. I don't recall any reason to use the dashed form
> back then, but probably it was just that I knew it was a separate
> program. Doing it this way will mean an extra parent "git" process
> hanging around, but I don't think it's that big a deal. We never try to
> kill it by PID, etc (instead we connect to the socket and ask it to
> exit). And anyway, it is becoming a builtin in a parallel topic, so that
> extra process will go away. :)

...and if I had read the diff header more carefully, I'd see that you
did this on top of that topic. :)

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] run_command: teach API users to use embedded 'args' more
  2020-08-27  4:21                           ` Jeff King
@ 2020-08-27  4:30                             ` Junio C Hamano
  2020-08-27  4:31                             ` Eric Sunshine
  1 sibling, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-27  4:30 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> I've actually considered dropping child_process.argv entirely. Having
> two separate ways to do the same thing gives the potential for
> confusion. But I never dug into whether any existing callers would be
> made worse for it (I kind of doubt it, though; worst case they can use
> strvec_pushv). There are still several left after this patch, it seems.
>
> Likewise for child_process.env_array.

Yup, conversion similar to what I did in this patch may be too
trivial for #microproject, but would nevertheless be a good
#leftoverbits task.  The removal of .argv/.env is not entirely
trivial but a good candidate for #microproject.

Thanks.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
  2020-08-27  4:13                         ` [PATCH v2 3/2] credential-cache: use child_process.args Jeff King
  2020-08-27  4:22                           ` Jeff King
@ 2020-08-27  4:31                           ` Junio C Hamano
  1 sibling, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-27  4:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Yep, this makes sense. I don't recall any reason to use the dashed form
> back then, but probably it was just that I knew it was a separate
> program. Doing it this way will mean an extra parent "git" process
> hanging around, but I don't think it's that big a deal. We never try to
> kill it by PID, etc (instead we connect to the socket and ask it to
> exit). And anyway, it is becoming a builtin in a parallel topic, so that
> extra process will go away. :)

Yeah, this was discovered when I tentatively merged that "don't run
built-in as git-foo" to the tip of seen.  Without your slimmed-down
topic, it wouldn't have been caught.  Likewise for remote-ext.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] run_command: teach API users to use embedded 'args' more
  2020-08-27  4:21                           ` Jeff King
  2020-08-27  4:30                             ` Junio C Hamano
@ 2020-08-27  4:31                             ` Eric Sunshine
  2020-08-27  4:44                               ` Jeff King
  1 sibling, 1 reply; 78+ messages in thread
From: Eric Sunshine @ 2020-08-27  4:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List

On Thu, Aug 27, 2020 at 12:22 AM Jeff King <peff@peff.net> wrote:
> I've actually considered dropping child_process.argv entirely. Having
> two separate ways to do the same thing gives the potential for
> confusion. [...]
>
> Likewise for child_process.env_array.

builtin/worktree.c:add_worktree() is a case in which an environment
strvec is built once and re-used for multiple run_command()
invocations by re-assigning it to child_process.env before each
run_command(). It uses child_process.env rather than
child_process.env_array because run_command() clears
child_process.env_array upon completion, which makes it difficult to
reuse a prepared environment strvec repeatedly.

Nevertheless, that isn't much of a reason to keep child_process.env.
Refactoring add_worktree() a bit to rebuild the environment strvec
on-demand wouldn't be very difficult.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] run_command: teach API users to use embedded 'args' more
  2020-08-27  4:31                             ` Eric Sunshine
@ 2020-08-27  4:44                               ` Jeff King
  2020-08-27  5:03                                 ` Eric Sunshine
  0 siblings, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-08-27  4:44 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, Git List

On Thu, Aug 27, 2020 at 12:31:52AM -0400, Eric Sunshine wrote:

> On Thu, Aug 27, 2020 at 12:22 AM Jeff King <peff@peff.net> wrote:
> > I've actually considered dropping child_process.argv entirely. Having
> > two separate ways to do the same thing gives the potential for
> > confusion. [...]
> >
> > Likewise for child_process.env_array.
> 
> builtin/worktree.c:add_worktree() is a case in which an environment
> strvec is built once and re-used for multiple run_command()
> invocations by re-assigning it to child_process.env before each
> run_command(). It uses child_process.env rather than
> child_process.env_array because run_command() clears
> child_process.env_array upon completion, which makes it difficult to
> reuse a prepared environment strvec repeatedly.
> 
> Nevertheless, that isn't much of a reason to keep child_process.env.
> Refactoring add_worktree() a bit to rebuild the environment strvec
> on-demand wouldn't be very difficult.

I think they'd still be one-liner changes, like:

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 378f332b5d..b40f0d4cea 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -422,7 +422,7 @@ static int add_worktree(const char *path, const char *refname,
 			strvec_push(&cp.args, "--quiet");
 	}
 
-	cp.env = child_env.v;
+	strvec_pushv(&cp.env_array, child_env.v);
 	ret = run_command(&cp);
 	if (ret)
 		goto done;
@@ -433,7 +433,7 @@ static int add_worktree(const char *path, const char *refname,
 		strvec_pushl(&cp.args, "reset", "--hard", "--no-recurse-submodules", NULL);
 		if (opts->quiet)
 			strvec_push(&cp.args, "--quiet");
-		cp.env = child_env.v;
+		strvec_pushv(&cp.env_array, child_env.v);
 		ret = run_command(&cp);
 		if (ret)
 			goto done;

I think there are other opportunities for cleanup, too:

  - the code right above the second hunk clears cp.args manually. That
    shouldn't be necessary because run_command() will leave it in a
    blank state (and we're already relying on that, since otherwise we'd
    be left with cruft in other fields from the previous run).

  - check_clean_worktree() only uses it once, and could drop the
    separate child_env (and in fact appears to leak it)

-Peff

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH] run_command: teach API users to use embedded 'args' more
  2020-08-27  4:44                               ` Jeff King
@ 2020-08-27  5:03                                 ` Eric Sunshine
  2020-08-27  5:25                                   ` [PATCH] worktree: fix leak in check_clean_worktree() Jeff King
  0 siblings, 1 reply; 78+ messages in thread
From: Eric Sunshine @ 2020-08-27  5:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List

On Thu, Aug 27, 2020 at 12:44 AM Jeff King <peff@peff.net> wrote:
> On Thu, Aug 27, 2020 at 12:31:52AM -0400, Eric Sunshine wrote:
> > Nevertheless, that isn't much of a reason to keep child_process.env.
> > Refactoring add_worktree() a bit to rebuild the environment strvec
> > on-demand wouldn't be very difficult.
>
> I think they'd still be one-liner changes, like:
>
> -       cp.env = child_env.v;
> +       strvec_pushv(&cp.env_array, child_env.v);

Nice and simple.

> I think there are other opportunities for cleanup, too:

True on both counts.

>   - the code right above the second hunk clears cp.args manually. That
>     shouldn't be necessary because run_command() will leave it in a
>     blank state (and we're already relying on that, since otherwise we'd
>     be left with cruft in other fields from the previous run).

Right. I wonder why the author of 7f44e3d1de (worktree: make setup of
new HEAD distinct from worktree population, 2015-07-17) chose to clear
cp.args manually like that.

>   - check_clean_worktree() only uses it once, and could drop the
>     separate child_env (and in fact appears to leak it)

Perhaps this unnecessary 'child_env' strvec was a copy/paste from
add_worktree()? But certainly cp.env_array would be simpler and avoid
the leak.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* [PATCH] worktree: fix leak in check_clean_worktree()
  2020-08-27  5:03                                 ` Eric Sunshine
@ 2020-08-27  5:25                                   ` Jeff King
  2020-08-27  5:56                                     ` Eric Sunshine
  0 siblings, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-08-27  5:25 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Junio C Hamano, Git List

On Thu, Aug 27, 2020 at 01:03:43AM -0400, Eric Sunshine wrote:

> >   - the code right above the second hunk clears cp.args manually. That
> >     shouldn't be necessary because run_command() will leave it in a
> >     blank state (and we're already relying on that, since otherwise we'd
> >     be left with cruft in other fields from the previous run).
> 
> Right. I wonder why the author of 7f44e3d1de (worktree: make setup of
> new HEAD distinct from worktree population, 2015-07-17) chose to clear
> cp.args manually like that.

I wondered if we might not have cleared the array automatically back
then, but it looks like we did.

I do think this kind of child_process struct reuse is slightly sketchy
in general. Looking at child_process_clear(), we only free the memory,
but leave other fields set. And in fact we rely on that here; git_cmd
needs to remain set for both commands to work. But if the first command
had used, say, cp.in, then we'd be left with a bogus descriptor.

> >   - check_clean_worktree() only uses it once, and could drop the
> >     separate child_env (and in fact appears to leak it)
> 
> Perhaps this unnecessary 'child_env' strvec was a copy/paste from
> add_worktree()? But certainly cp.env_array would be simpler and avoid
> the leak.

Yeah, that was my guess, too.

Most of these issues are more complex and/or should be part of a larger
cleanup effort. But let's fix the leak while we're thinking about it.

-- >8 --
Subject: [PATCH] worktree: fix leak in check_clean_worktree()

We allocate a child_env strvec but never free its memory. Instead, let's
just use the strvec that our child_process struct provides, which is
cleaned up automatically when we run the command.

And while we're moving the initialization of the child_process around,
let's switch it to use the official init function (zero-initializing it
works OK, since strvec is happy enough with that, but it sets a bad
example).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/worktree.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 378f332b5d..df214697d2 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -924,7 +924,6 @@ static int move_worktree(int ac, const char **av, const char *prefix)
 static void check_clean_worktree(struct worktree *wt,
 				 const char *original_path)
 {
-	struct strvec child_env = STRVEC_INIT;
 	struct child_process cp;
 	char buf[1];
 	int ret;
@@ -935,15 +934,14 @@ static void check_clean_worktree(struct worktree *wt,
 	 */
 	validate_no_submodules(wt);
 
-	strvec_pushf(&child_env, "%s=%s/.git",
+	child_process_init(&cp);
+	strvec_pushf(&cp.env_array, "%s=%s/.git",
 		     GIT_DIR_ENVIRONMENT, wt->path);
-	strvec_pushf(&child_env, "%s=%s",
+	strvec_pushf(&cp.env_array, "%s=%s",
 		     GIT_WORK_TREE_ENVIRONMENT, wt->path);
-	memset(&cp, 0, sizeof(cp));
 	strvec_pushl(&cp.args, "status",
 		     "--porcelain", "--ignore-submodules=none",
 		     NULL);
-	cp.env = child_env.v;
 	cp.git_cmd = 1;
 	cp.dir = wt->path;
 	cp.out = -1;
-- 
2.28.0.751.g0834cceced


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH] worktree: fix leak in check_clean_worktree()
  2020-08-27  5:25                                   ` [PATCH] worktree: fix leak in check_clean_worktree() Jeff King
@ 2020-08-27  5:56                                     ` Eric Sunshine
  2020-08-27 15:31                                       ` Junio C Hamano
  0 siblings, 1 reply; 78+ messages in thread
From: Eric Sunshine @ 2020-08-27  5:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List

On Thu, Aug 27, 2020 at 1:25 AM Jeff King <peff@peff.net> wrote:
> On Thu, Aug 27, 2020 at 01:03:43AM -0400, Eric Sunshine wrote:
> > Right. I wonder why the author of 7f44e3d1de (worktree: make setup of
> > new HEAD distinct from worktree population, 2015-07-17) chose to clear
> > cp.args manually like that.
>
> I wondered if we might not have cleared the array automatically back
> then, but it looks like we did.

I had the same thought and came to the same conclusion. It's possible
that the manual clearing of the array was leftover cruft as the
implementation matured before the patch was submitted. I have a vague
(perhaps false) recollection that a local argv_array was populated and
assigned to cp.argv originally, until cp.args was discovered as a
cleaner alternative and used instead. If that was the case, then the
local argv_array wouldn't have been cleared automatically by
run_command(), which would account for clearing it manually.

> I do think this kind of child_process struct reuse is slightly sketchy
> in general. Looking at child_process_clear(), we only free the memory,
> but leave other fields set. And in fact we rely on that here; git_cmd
> needs to remain set for both commands to work. But if the first command
> had used, say, cp.in, then we'd be left with a bogus descriptor.

Agreed. The current usage in worktree.c is a bit too familiar with the
current internal implementation of run_command(). Reinitializing the
child_process struct or using a separate one would be a good cleanup.

> -- >8 --
> Subject: [PATCH] worktree: fix leak in check_clean_worktree()
>
> We allocate a child_env strvec but never free its memory. Instead, let's
> just use the strvec that our child_process struct provides, which is
> cleaned up automatically when we run the command.
>
> And while we're moving the initialization of the child_process around,
> let's switch it to use the official init function (zero-initializing it
> works OK, since strvec is happy enough with that, but it sets a bad
> example).

The various memset()'s in worktree.c seem to have been inherited (and
multiplied) from Duy's original "git checkout --to" implementation
(which later became the basis for "git worktree add" after which it
mutated significantly), and "git checkout --to" predates introduction
of child_process_init().

> diff --git a/builtin/worktree.c b/builtin/worktree.c
> @@ -924,7 +924,6 @@ static int move_worktree(int ac, const char **av, const char *prefix)
> -       struct strvec child_env = STRVEC_INIT;
> @@ -935,15 +934,14 @@ static void check_clean_worktree(struct worktree *wt,
> -       strvec_pushf(&child_env, "%s=%s/.git",
> +       child_process_init(&cp);
> +       strvec_pushf(&cp.env_array, "%s=%s/.git",
>                      GIT_DIR_ENVIRONMENT, wt->path);
> -       strvec_pushf(&child_env, "%s=%s",
> +       strvec_pushf(&cp.env_array, "%s=%s",
>                      GIT_WORK_TREE_ENVIRONMENT, wt->path);
> -       memset(&cp, 0, sizeof(cp));
> -       cp.env = child_env.v;

Looks good to me. For what it's worth:

Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] worktree: fix leak in check_clean_worktree()
  2020-08-27  5:56                                     ` Eric Sunshine
@ 2020-08-27 15:31                                       ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-27 15:31 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Jeff King, Git List

Eric Sunshine <sunshine@sunshineco.com> writes:

> Agreed. The current usage in worktree.c is a bit too familiar with the
> current internal implementation of run_command(). Reinitializing the
> child_process struct or using a separate one would be a good cleanup.
>
>> -- >8 --
>> Subject: [PATCH] worktree: fix leak in check_clean_worktree()
>>
>> We allocate a child_env strvec but never free its memory. Instead, let's
>> just use the strvec that our child_process struct provides, which is
>> cleaned up automatically when we run the command.
>>
>> And while we're moving the initialization of the child_process around,
>> let's switch it to use the official init function (zero-initializing it
>> works OK, since strvec is happy enough with that, but it sets a bad
>> example).
>
> The various memset()'s in worktree.c seem to have been inherited (and
> multiplied) from Duy's original "git checkout --to" implementation
> (which later became the basis for "git worktree add" after which it
> mutated significantly), and "git checkout --to" predates introduction
> of child_process_init().
>
>> diff --git a/builtin/worktree.c b/builtin/worktree.c
>> @@ -924,7 +924,6 @@ static int move_worktree(int ac, const char **av, const char *prefix)
>> -       struct strvec child_env = STRVEC_INIT;
>> @@ -935,15 +934,14 @@ static void check_clean_worktree(struct worktree *wt,
>> -       strvec_pushf(&child_env, "%s=%s/.git",
>> +       child_process_init(&cp);
>> +       strvec_pushf(&cp.env_array, "%s=%s/.git",
>>                      GIT_DIR_ENVIRONMENT, wt->path);
>> -       strvec_pushf(&child_env, "%s=%s",
>> +       strvec_pushf(&cp.env_array, "%s=%s",
>>                      GIT_WORK_TREE_ENVIRONMENT, wt->path);
>> -       memset(&cp, 0, sizeof(cp));
>> -       cp.env = child_env.v;
>
> Looks good to me. For what it's worth:
>
> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>

Thanks, both.  This looks good.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
  2020-08-27  4:14                         ` Jeff King
@ 2020-08-27 15:34                           ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-27 15:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Wed, Aug 26, 2020 at 02:37:39PM -0700, Junio C Hamano wrote:
>
>> As child_process structure has an embedded strvec args for
>> formulating the command line, let's use it instead of using
>> an out-of-line argv[] whose length needs to be maintained
>> correctly.
>
> I forgot to mention in my other reply: I think this fails to mention the
> actual functional change, which is switching from the dashed form to
> using the "git" wrapper.

True.  With an extra paragraph.

-- >8 --
Subject: [PATCH] credential-cache: use child_process.args

As child_process structure has an embedded strvec args for
formulating the command line, let's use it instead of using
an out-of-line argv[] whose length needs to be maintained
correctly.

Also, when spawning a git subcommand, omit it from the command list
and instead use the .git_cmd bit in the child_process structure.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 credential-cache.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/credential-cache.c b/credential-cache.c
index 1cccc3a0b9..04df61cf02 100644
--- a/credential-cache.c
+++ b/credential-cache.c
@@ -39,13 +39,13 @@ static int send_request(const char *socket, const struct strbuf *out)
 static void spawn_daemon(const char *socket)
 {
 	struct child_process daemon = CHILD_PROCESS_INIT;
-	const char *argv[] = { NULL, NULL, NULL };
 	char buf[128];
 	int r;
 
-	argv[0] = "git-credential-cache--daemon";
-	argv[1] = socket;
-	daemon.argv = argv;
+	strvec_pushl(&daemon.args,
+		     "credential-cache--daemon", socket,
+		     NULL);
+	daemon.git_cmd = 1;
 	daemon.no_stdin = 1;
 	daemon.out = -1;
 
-- 
2.28.0-462-gf84ddd074d


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-26  1:17               ` [PATCH v1 3/3] git: catch an attempt to run "git-foo" Junio C Hamano
  2020-08-26  1:19                 ` Junio C Hamano
  2020-08-26  8:06                 ` Johannes Schindelin
@ 2020-08-28  2:13                 ` Johannes Schindelin
  2020-08-28 22:03                   ` Junio C Hamano
  2020-12-20 15:25                   ` Johannes Schindelin
  2 siblings, 2 replies; 78+ messages in thread
From: Johannes Schindelin @ 2020-08-28  2:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Hi Junio,

On Tue, 25 Aug 2020, Junio C Hamano wrote:

> diff --git a/git.c b/git.c
> index 8bd1d7551d..927018bda7 100644
> --- a/git.c
> +++ b/git.c
> @@ -839,6 +839,8 @@ int cmd_main(int argc, const char **argv)
>  	 * that one cannot handle it.
>  	 */
>  	if (skip_prefix(cmd, "git-", &cmd)) {
> +		warn_on_dashed_git(argv[0]);
> +
>  		argv[0] = cmd;
>  		handle_builtin(argc, argv);
>  		die(_("cannot handle %s as a builtin"), cmd);
> diff --git a/help.c b/help.c
> index d478afb2af..490d2bc3ae 100644
> --- a/help.c
> +++ b/help.c
> @@ -720,3 +720,37 @@ NORETURN void help_unknown_ref(const char *ref, const char *cmd,
>  	string_list_clear(&suggested_refs, 0);
>  	exit(1);
>  }
> +
> +static struct cmdname_help *find_cmdname_help(const char *name)
> +{
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(command_list); i++) {
> +		if (!strcmp(command_list[i].name, name))
> +			return &command_list[i];
> +	}
> +	return NULL;
> +}
> +
> +void warn_on_dashed_git(const char *cmd)
> +{
> +	struct cmdname_help *cmdname;
> +	static const char *still_in_use_var = "GIT_I_STILL_USE_DASHED_GIT";
> +	static const char *still_in_use_msg =
> +		N_("Use of '%s' in the dashed-form is nominated for removal.\n"
> +		   "If you still use it, export '%s=true'\n"
> +		   "and send an e-mail to <git@vger.kernel.org>\n"
> +		   "to let us know and stop our removal plan.  Thanks.\n");
> +
> +	if (!cmd)
> +		return; /* git-help is OK */
> +
> +	cmdname = find_cmdname_help(cmd);
> +	if (cmdname && (cmdname->category & CAT_onpath))
> +		return; /* git-upload-pack and friends are OK */
> +
> +	if (!git_env_bool(still_in_use_var, 0)) {
> +		fprintf(stderr, _(still_in_use_msg), cmd, still_in_use_var);
> +		exit(1);
> +	}
> +}

I need this on top, to make it work on Windows:

-- snipsnap --
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: [PATCH] fixup??? git: catch an attempt to run "git-foo"

This is needed to handle the case where `argv[0]` contains the full path
(which is the case on Windows) and the suffix `.exe` (which is also the
case on Windows).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 git.c  | 3 ++-
 help.c | 5 ++++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/git.c b/git.c
index 71ef4835b20e..863fd0c58a66 100644
--- a/git.c
+++ b/git.c
@@ -851,7 +851,8 @@ int cmd_main(int argc, const char **argv)
 	 * that one cannot handle it.
 	 */
 	if (skip_prefix(cmd, "git-", &cmd)) {
-		warn_on_dashed_git(argv[0]);
+		strip_extension(&cmd);
+		warn_on_dashed_git(cmd);

 		argv[0] = cmd;
 		handle_builtin(argc, argv);
diff --git a/help.c b/help.c
index c93a76944b00..27b1b26890be 100644
--- a/help.c
+++ b/help.c
@@ -724,9 +724,12 @@ NORETURN void help_unknown_ref(const char *ref, const char *cmd,
 static struct cmdname_help *find_cmdname_help(const char *name)
 {
 	int i;
+	const char *p;

+	skip_prefix(name, "git-", &name);
 	for (i = 0; i < ARRAY_SIZE(command_list); i++) {
-		if (!strcmp(command_list[i].name, name))
+		if (skip_prefix(command_list[i].name, "git-", &p) &&
+		    !strcmp(p, name))
 			return &command_list[i];
 	}
 	return NULL;
--
2.28.0.windows.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH] pack-redundant: gauge the usage before proposing its removal
  2020-08-25 22:45           ` [PATCH] pack-redundant: gauge the usage before proposing its removal Junio C Hamano
  2020-08-25 23:09             ` Taylor Blau
  2020-08-26  1:17             ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
@ 2020-08-28  9:14             ` Jeff King
  2020-08-28 22:45               ` Junio C Hamano
  2 siblings, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-08-28  9:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, dstolee

On Tue, Aug 25, 2020 at 03:45:52PM -0700, Junio C Hamano wrote:

> The subcommand is unusably slow and the reason why nobody reports it
> as a performance bug is suspected to be the absense of users.  Let's
> show a big message that asks the user to tell us that they still
> care about the command when an attempt is made to run the command,
> with an escape hatch to override it with a command line option.

I was looking at the history here and noticed this topic, which I
somehow missed when it happened:

  $ git show -s cf0879f7e98d2213503622f780d2ab0dd3f93477
  commit cf0879f7e98d2213503622f780d2ab0dd3f93477
  Merge: 3710f60a80 0e37abd2e8
  Author: Junio C Hamano <gitster@pobox.com>
  Date:   Thu Mar 7 09:59:54 2019 +0900
  
      Merge branch 'sc/pack-redundant'
      
      Update the implementation of pack-redundant for performance in a
      repository with many packfiles.
      
      * sc/pack-redundant:
        pack-redundant: consistent sort method
        pack-redundant: rename pack_list.all_objects
        pack-redundant: new algorithm to find min packs
        pack-redundant: delete redundant code
        pack-redundant: delay creation of unique_objects
        t5323: test cases for git-pack-redundant

So it sounds like:

  - somebody does care enough to use it

  - it may not be horrifically slow anymore

So it may not be worth trying to follow through on the deprecation
(though the fact that neither of us realized this makes me worried for
the general state of maintenance of this code).

-Peff

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-28  2:13                 ` Johannes Schindelin
@ 2020-08-28 22:03                   ` Junio C Hamano
  2020-08-31  9:59                     ` Johannes Schindelin
  2020-12-20 15:25                   ` Johannes Schindelin
  1 sibling, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-08-28 22:03 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff King

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> This is needed to handle the case where `argv[0]` contains the full path
> (which is the case on Windows) and the suffix `.exe` (which is also the
> case on Windows).

Oy.  Yeah, I totally forgot about ".exe" thing.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  git.c  | 3 ++-
>  help.c | 5 ++++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/git.c b/git.c
> index 71ef4835b20e..863fd0c58a66 100644
> --- a/git.c
> +++ b/git.c
> @@ -851,7 +851,8 @@ int cmd_main(int argc, const char **argv)
>  	 * that one cannot handle it.
>  	 */
>  	if (skip_prefix(cmd, "git-", &cmd)) {
> -		warn_on_dashed_git(argv[0]);
> +		strip_extension(&cmd);
> +		warn_on_dashed_git(cmd);

The argv[0] may have been NULL from the beginning of cmd_main(), and
cmd would be "git-help" in such a case.  We used to pass NULL to
warn_on_dashed_git() in such a case because "git-help" is not what
the user typed, but what we internally trigger, so we didn't want
warn_on_dashed_git() to do anything based on that internal string.

As there is no special casing of "help" in warn_on_dashed_git()
mechanism, it probably would start triggering a warning/die when
"git<ENTER>" is typed, no?

+	if (argv[0]) {
		strip_extension(&cmd);
		warn_on_dashed_git(cmd);

may be the minimum fix, but I would strongly prefer to keep the
interface into warn_on_dashed_git() (eh, the most important is the
interface into find_cmdname_help() helper function, which is
designed to be reused by other parts of help.c) to take the full
command name, not without "git-" prefix.  This is primarily because
the entirety of the help.c API is driven by full command names,
without removing "git-" prefix, and it has to, because the help.c
API needs to handle "gitk", from which you cannot remove any "git-"
prefix.

Perhaps

	if (starts_with(cmd, "git-")) {
               	strip_extension(&cmd);
		if (argv[0])
			warn_on_dashed_git(cmd);
		argv[0] = cmd + 4;
                handle_builtin(argc, argv);
		die(...);

How does your handle_builtin() work, by the way?  

The original code (even before we added warn_on_dashed_git() in this
codepath) did not do any strip_extension(), so handle_builtin() can
take commands with ".exe" suffix, but now we are feeding the result
of strip_extension() to it, so it can deal with both? 

Sounds convenient and sloppy (not the handle_builtin's
implementation, but its callers that sometimes feeds the full
executable name, and feeds only the basename some other times) at
the same time.

>  		argv[0] = cmd;
>  		handle_builtin(argc, argv);
> diff --git a/help.c b/help.c
> index c93a76944b00..27b1b26890be 100644
> --- a/help.c
> +++ b/help.c
> @@ -724,9 +724,12 @@ NORETURN void help_unknown_ref(const char *ref, const char *cmd,
>  static struct cmdname_help *find_cmdname_help(const char *name)
>  {
>  	int i;
> +	const char *p;
>
> +	skip_prefix(name, "git-", &name);
>  	for (i = 0; i < ARRAY_SIZE(command_list); i++) {
> -		if (!strcmp(command_list[i].name, name))
> +		if (skip_prefix(command_list[i].name, "git-", &p) &&
> +		    !strcmp(p, name))
>  			return &command_list[i];
>  	}
>  	return NULL;
> --
> 2.28.0.windows.1

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH] pack-redundant: gauge the usage before proposing its removal
  2020-08-28  9:14             ` [PATCH] pack-redundant: gauge the usage before proposing its removal Jeff King
@ 2020-08-28 22:45               ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-28 22:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, dstolee

Jeff King <peff@peff.net> writes:

> On Tue, Aug 25, 2020 at 03:45:52PM -0700, Junio C Hamano wrote:
>
>> The subcommand is unusably slow and the reason why nobody reports it
>> as a performance bug is suspected to be the absense of users.  Let's
>> show a big message that asks the user to tell us that they still
>> care about the command when an attempt is made to run the command,
>> with an escape hatch to override it with a command line option.
>
> I was looking at the history here and noticed this topic, which I
> somehow missed when it happened:
>
>   $ git show -s cf0879f7e98d2213503622f780d2ab0dd3f93477
>   commit cf0879f7e98d2213503622f780d2ab0dd3f93477
>   Merge: 3710f60a80 0e37abd2e8
>   Author: Junio C Hamano <gitster@pobox.com>
>   Date:   Thu Mar 7 09:59:54 2019 +0900
>   
>       Merge branch 'sc/pack-redundant'
>       
>       Update the implementation of pack-redundant for performance in a
>       repository with many packfiles.
>       
>       * sc/pack-redundant:
>         pack-redundant: consistent sort method
>         pack-redundant: rename pack_list.all_objects
>         pack-redundant: new algorithm to find min packs
>         pack-redundant: delete redundant code
>         pack-redundant: delay creation of unique_objects
>         t5323: test cases for git-pack-redundant
>
> So it sounds like:
>
>   - somebody does care enough to use it
>
>   - it may not be horrifically slow anymore
>
> So it may not be worth trying to follow through on the deprecation
> (though the fact that neither of us realized this makes me worried for
> the general state of maintenance of this code).

OK.  Just dropping the topic is the easiest ;-)  Thanks.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-28 22:03                   ` Junio C Hamano
@ 2020-08-31  9:59                     ` Johannes Schindelin
  2020-08-31 17:45                       ` Junio C Hamano
  0 siblings, 1 reply; 78+ messages in thread
From: Johannes Schindelin @ 2020-08-31  9:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Hi Junio,

On Fri, 28 Aug 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  git.c  | 3 ++-
> >  help.c | 5 ++++-
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/git.c b/git.c
> > index 71ef4835b20e..863fd0c58a66 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -851,7 +851,8 @@ int cmd_main(int argc, const char **argv)
> >  	 * that one cannot handle it.
> >  	 */
> >  	if (skip_prefix(cmd, "git-", &cmd)) {
> > -		warn_on_dashed_git(argv[0]);
> > +		strip_extension(&cmd);
> > +		warn_on_dashed_git(cmd);
>
> The argv[0] may have been NULL from the beginning of cmd_main(), and
> cmd would be "git-help" in such a case. We used to pass NULL to
> warn_on_dashed_git() in such a case because "git-help" is not what the
> user typed, but what we internally trigger, so we didn't want
> warn_on_dashed_git() to do anything based on that internal string.

True. The test suite passes with this, though, which means we haven't
covered that case.

> As there is no special casing of "help" in warn_on_dashed_git()
> mechanism, it probably would start triggering a warning/die when
> "git<ENTER>" is typed, no?
>
> +	if (argv[0]) {
> 		strip_extension(&cmd);
> 		warn_on_dashed_git(cmd);
>
> may be the minimum fix, but I would strongly prefer to keep the
> interface into warn_on_dashed_git() (eh, the most important is the
> interface into find_cmdname_help() helper function, which is
> designed to be reused by other parts of help.c) to take the full
> command name, not without "git-" prefix.  This is primarily because
> the entirety of the help.c API is driven by full command names,
> without removing "git-" prefix, and it has to, because the help.c
> API needs to handle "gitk", from which you cannot remove any "git-"
> prefix.
>
> Perhaps
>
> 	if (starts_with(cmd, "git-")) {
>                	strip_extension(&cmd);
> 		if (argv[0])
> 			warn_on_dashed_git(cmd);
> 		argv[0] = cmd + 4;
>                 handle_builtin(argc, argv);
> 		die(...);

Sure.

> How does your handle_builtin() work, by the way?
>
> The original code (even before we added warn_on_dashed_git() in this
> codepath) did not do any strip_extension(), so handle_builtin() can
> take commands with ".exe" suffix, but now we are feeding the result
> of strip_extension() to it, so it can deal with both?

Yes, it can deal with both. It calls `strip_extension()`, which on Windows
removes the `.exe` suffix, if found.

> Sounds convenient and sloppy (not the handle_builtin's
> implementation, but its callers that sometimes feeds the full
> executable name, and feeds only the basename some other times) at
> the same time.

Right, it does not _require_ the extension. I do not necessarily agree
that it is sloppy, but I do agree that it is convenient ;-)

Ciao,
Dscho

> >  		argv[0] = cmd;
> >  		handle_builtin(argc, argv);
> > diff --git a/help.c b/help.c
> > index c93a76944b00..27b1b26890be 100644
> > --- a/help.c
> > +++ b/help.c
> > @@ -724,9 +724,12 @@ NORETURN void help_unknown_ref(const char *ref, const char *cmd,
> >  static struct cmdname_help *find_cmdname_help(const char *name)
> >  {
> >  	int i;
> > +	const char *p;
> >
> > +	skip_prefix(name, "git-", &name);
> >  	for (i = 0; i < ARRAY_SIZE(command_list); i++) {
> > -		if (!strcmp(command_list[i].name, name))
> > +		if (skip_prefix(command_list[i].name, "git-", &p) &&
> > +		    !strcmp(p, name))
> >  			return &command_list[i];
> >  	}
> >  	return NULL;
> > --
> > 2.28.0.windows.1
>

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-31  9:59                     ` Johannes Schindelin
@ 2020-08-31 17:45                       ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-08-31 17:45 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff King

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Junio,
>
> On Fri, 28 Aug 2020, Junio C Hamano wrote:
>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> > ---
>> >  git.c  | 3 ++-
>> >  help.c | 5 ++++-
>> >  2 files changed, 6 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/git.c b/git.c
>> > index 71ef4835b20e..863fd0c58a66 100644
>> > --- a/git.c
>> > +++ b/git.c
>> > @@ -851,7 +851,8 @@ int cmd_main(int argc, const char **argv)
>> >  	 * that one cannot handle it.
>> >  	 */
>> >  	if (skip_prefix(cmd, "git-", &cmd)) {
>> > -		warn_on_dashed_git(argv[0]);
>> > +		strip_extension(&cmd);
>> > +		warn_on_dashed_git(cmd);
>>
>> The argv[0] may have been NULL from the beginning of cmd_main(), and
>> cmd would be "git-help" in such a case. We used to pass NULL to
>> warn_on_dashed_git() in such a case because "git-help" is not what the
>> user typed, but what we internally trigger, so we didn't want
>> warn_on_dashed_git() to do anything based on that internal string.
>
> True. The test suite passes with this, though, which means we haven't
> covered that case.

Yup, it would be a good thing to add to our tests, with or without
this patch.

>> How does your handle_builtin() work, by the way?
>>
>> The original code (even before we added warn_on_dashed_git() in this
>> codepath) did not do any strip_extension(), so handle_builtin() can
>> take commands with ".exe" suffix, but now we are feeding the result
>> of strip_extension() to it, so it can deal with both?
>
> Yes, it can deal with both. It calls `strip_extension()`, which on Windows
> removes the `.exe` suffix, if found.
>
>> Sounds convenient and sloppy (not the handle_builtin's
>> implementation, but its callers that sometimes feeds the full
>> executable name, and feeds only the basename some other times) at
>> the same time.
>
> Right, it does not _require_ the extension. I do not necessarily agree
> that it is sloppy, but I do agree that it is convenient ;-)

Being lenient to its input is not sloppy.  

It just is by being convenient, it allows its callers to be sloppy,
which may hurt them as not all callees they call may not be as
lenient as handle_builtin(), which is the only downside I would be
worried about.  Nothing big.

thanks.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
  2020-08-26 21:37                       ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
                                           ` (2 preceding siblings ...)
  2020-08-27  4:14                         ` Jeff King
@ 2020-08-31 22:56                         ` Junio C Hamano
  2020-09-01  4:49                           ` Jeff King
  3 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-08-31 22:56 UTC (permalink / raw)
  To: git; +Cc: Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> As child_process structure has an embedded strvec args for
> formulating the command line, let's use it instead of using
> an out-of-line argv[] whose length needs to be maintained
> correctly. 
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  builtin/credential-cache.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/credential-cache.c b/builtin/credential-cache.c
> index d0fafdeb9e..195335a783 100644
> --- a/builtin/credential-cache.c
> +++ b/builtin/credential-cache.c
> @@ -42,13 +42,13 @@ static int send_request(const char *socket, const struct strbuf *out)
>  static void spawn_daemon(const char *socket)
>  {
>  	struct child_process daemon = CHILD_PROCESS_INIT;
> -	const char *argv[] = { NULL, NULL, NULL };
>  	char buf[128];
>  	int r;
>  
> -	argv[0] = "git-credential-cache--daemon";
> -	argv[1] = socket;
> -	daemon.argv = argv;
> +	strvec_pushl(&daemon.args, 
> +		     "credential-cache--daemon", socket,
> +		     NULL);
> +	daemon.git_cmd = 1;
>  	daemon.no_stdin = 1;
>  	daemon.out = -1;
>  

By the way, an interesting fact is that this cannot graduate UNTIL
credential-cache becomes a built-in.  Having an intermediate level
process seems to break t0301.


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
  2020-08-31 22:56                         ` Junio C Hamano
@ 2020-09-01  4:49                           ` Jeff King
  2020-09-01 16:11                             ` Junio C Hamano
  0 siblings, 1 reply; 78+ messages in thread
From: Jeff King @ 2020-09-01  4:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, Aug 31, 2020 at 03:56:00PM -0700, Junio C Hamano wrote:

> > -	argv[0] = "git-credential-cache--daemon";
> > -	argv[1] = socket;
> > -	daemon.argv = argv;
> > +	strvec_pushl(&daemon.args, 
> > +		     "credential-cache--daemon", socket,
> > +		     NULL);
> > +	daemon.git_cmd = 1;
> >  	daemon.no_stdin = 1;
> >  	daemon.out = -1;
> >  
> 
> By the way, an interesting fact is that this cannot graduate UNTIL
> credential-cache becomes a built-in.  Having an intermediate level
> process seems to break t0301.

Hmm, that is interesting. I thought it would work OK because we don't
rely on any process-id magic for finding the daemon, etc, and instead
talk over pipe descriptors. But that proves to be our undoing.

What happens usually is this:

  - credential-cache spawns credential-cache--daemon with its
    stdout connected to a pipe

  - the client calls read_in_full() waiting for the daemon to tell us
    that it started up

  - the daemon opens the socket, then writes "ok\n" to stdout and closes
    the pipe

  - the client gets EOF on the pipe, then compares what it read to
    "ok\n", and continues (or relays an error)

But when we add the extra "git" wrapper process into the mix, we never
see that EOF. The wrapper's stdout also points to that pipe, so closing
it in the daemon process isn't enough for the client to get EOF. And the
wrapper doesn't exit, because it's waiting on the daemon.

So one hacky fix is:

diff --git a/credential-cache.c b/credential-cache.c
index 04df61cf02..9bfddaf050 100644
--- a/credential-cache.c
+++ b/credential-cache.c
@@ -51,7 +51,7 @@ static void spawn_daemon(const char *socket)
 
 	if (start_command(&daemon))
 		die_errno("unable to start cache daemon");
-	r = read_in_full(daemon.out, buf, sizeof(buf));
+	r = read_in_full(daemon.out, buf, 3);
 	if (r < 0)
 		die_errno("unable to read result code from cache daemon");
 	if (r != 3 || memcmp(buf, "ok\n", 3))

which makes t0301 pass. A less-hacky solution might be to loosen its
expectations not to require EOF at all (and just accept anything
starting with "ok\n". But I don't think it's worth doing either, if we
know we're going to switch it to a builtin anyway (and that also makes
me feel slightly better about the plan to do so).

I do wonder if it points to a problem in the git.c wrapper. It's
duplicating all of the descriptors that the child external commands will
see, so callers can't rely on EOF (or EPIPE for its stdin) to know when
the external program has closed them. For the most part that's OK,
because we expect them to close when the external program exits, at
which point the wrapper will exit, too. But things get weird around
half-duplex shutdowns, or programs that try to daemonize.

Perhaps git.c should be closing all descriptors after spawning the
child. Of course that gets weird if it wants to write an error to stderr
about spawning the child. I dunno. It seems as likely to introduce
problems as solve them, so if nothing is broken beyond this cache-daemon
thing, I'd just as soon leave it.

I wouldn't be surprised if older pre-builtin "upload-pack" could run
into problems. But we always called it as "git-upload-pack" back then
anyway. Possibly modern "git daemon" could suffer weirdness, as it's
still a separate program (I shied away from including it in my recent
"slimming" series exactly because I was afraid of these kinds of issues;
but it sounds like being a builtin probably has less-surprising
implications overall).

All of which is to say I'm happy to do nothing, but that turned out to
be a very interesting data point. Thanks for mentioning it. :)

-Peff

^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v2 3/2] credential-cache: use child_process.args
  2020-09-01  4:49                           ` Jeff King
@ 2020-09-01 16:11                             ` Junio C Hamano
  0 siblings, 0 replies; 78+ messages in thread
From: Junio C Hamano @ 2020-09-01 16:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Hmm, that is interesting. I thought it would work OK because we don't
> rely on any process-id magic for finding the daemon, etc, and instead
> talk over pipe descriptors. But that proves to be our undoing.

Yup.  I also do suspect that closing all excess fds in the git
process in the middle may not be a bad idea, but we may not know
what the upper bound is.

> Perhaps git.c should be closing all descriptors after spawning the
> child. Of course that gets weird if it wants to write an error to stderr
> about spawning the child. I dunno. It seems as likely to introduce
> problems as solve them, so if nothing is broken beyond this cache-daemon
> thing, I'd just as soon leave it.

We do employ the "open extra pipe that is set to close-on-exec so
that child that failed to exec can report back" trick, but in order
to report the failure back, the standard error of the process in the
middle may have to be kept open, so let's not disturb this sleeping
dragon ;-)


^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-08-28  2:13                 ` Johannes Schindelin
  2020-08-28 22:03                   ` Junio C Hamano
@ 2020-12-20 15:25                   ` Johannes Schindelin
  2020-12-21 22:24                     ` Junio C Hamano
  1 sibling, 1 reply; 78+ messages in thread
From: Johannes Schindelin @ 2020-12-20 15:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Hi Junio,

On Fri, 28 Aug 2020, Johannes Schindelin wrote:

> On Tue, 25 Aug 2020, Junio C Hamano wrote:
>
> > diff --git a/git.c b/git.c
> > index 8bd1d7551d..927018bda7 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -839,6 +839,8 @@ int cmd_main(int argc, const char **argv)
> >  	 * that one cannot handle it.
> >  	 */
> >  	if (skip_prefix(cmd, "git-", &cmd)) {
> > +		warn_on_dashed_git(argv[0]);
> > +
> >  		argv[0] = cmd;
> >  		handle_builtin(argc, argv);
> >  		die(_("cannot handle %s as a builtin"), cmd);
> > diff --git a/help.c b/help.c
> > index d478afb2af..490d2bc3ae 100644
> > --- a/help.c
> > +++ b/help.c
> > @@ -720,3 +720,37 @@ NORETURN void help_unknown_ref(const char *ref, const char *cmd,
> >  	string_list_clear(&suggested_refs, 0);
> >  	exit(1);
> >  }
> > +
> > +static struct cmdname_help *find_cmdname_help(const char *name)
> > +{
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(command_list); i++) {
> > +		if (!strcmp(command_list[i].name, name))
> > +			return &command_list[i];
> > +	}
> > +	return NULL;
> > +}
> > +
> > +void warn_on_dashed_git(const char *cmd)
> > +{
> > +	struct cmdname_help *cmdname;
> > +	static const char *still_in_use_var = "GIT_I_STILL_USE_DASHED_GIT";
> > +	static const char *still_in_use_msg =
> > +		N_("Use of '%s' in the dashed-form is nominated for removal.\n"
> > +		   "If you still use it, export '%s=true'\n"
> > +		   "and send an e-mail to <git@vger.kernel.org>\n"
> > +		   "to let us know and stop our removal plan.  Thanks.\n");
> > +
> > +	if (!cmd)
> > +		return; /* git-help is OK */
> > +
> > +	cmdname = find_cmdname_help(cmd);
> > +	if (cmdname && (cmdname->category & CAT_onpath))
> > +		return; /* git-upload-pack and friends are OK */
> > +
> > +	if (!git_env_bool(still_in_use_var, 0)) {
> > +		fprintf(stderr, _(still_in_use_msg), cmd, still_in_use_var);
> > +		exit(1);
> > +	}
> > +}
>
> I need this on top, to make it work on Windows:
>
> -- snipsnap --
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> Subject: [PATCH] fixup??? git: catch an attempt to run "git-foo"
>
> This is needed to handle the case where `argv[0]` contains the full path
> (which is the case on Windows) and the suffix `.exe` (which is also the
> case on Windows).
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  git.c  | 3 ++-
>  help.c | 5 ++++-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/git.c b/git.c
> index 71ef4835b20e..863fd0c58a66 100644
> --- a/git.c
> +++ b/git.c
> @@ -851,7 +851,8 @@ int cmd_main(int argc, const char **argv)
>  	 * that one cannot handle it.
>  	 */
>  	if (skip_prefix(cmd, "git-", &cmd)) {
> -		warn_on_dashed_git(argv[0]);
> +		strip_extension(&cmd);
> +		warn_on_dashed_git(cmd);
>
>  		argv[0] = cmd;
>  		handle_builtin(argc, argv);
> diff --git a/help.c b/help.c
> index c93a76944b00..27b1b26890be 100644
> --- a/help.c
> +++ b/help.c
> @@ -724,9 +724,12 @@ NORETURN void help_unknown_ref(const char *ref, const char *cmd,
>  static struct cmdname_help *find_cmdname_help(const char *name)
>  {
>  	int i;
> +	const char *p;
>
> +	skip_prefix(name, "git-", &name);
>  	for (i = 0; i < ARRAY_SIZE(command_list); i++) {
> -		if (!strcmp(command_list[i].name, name))
> +		if (skip_prefix(command_list[i].name, "git-", &p) &&
> +		    !strcmp(p, name))
>  			return &command_list[i];
>  	}
>  	return NULL;
> --
> 2.28.0.windows.1

How about this instead (to fix that part of the CI failures of `seen`)?

-- snipsnap --
From e8ce19db04657b6ef1c73989695c97a773a9c001 Mon Sep 17 00:00:00 2001
From: Johannes Schindelin <johannes.schindelin@gmx.de>
Date: Fri, 28 Aug 2020 14:50:25 +0200
Subject: [PATCH] fixup??? git: catch an attempt to run "git-foo"

This is needed to handle the case where `argv[0]` contains the full path
(which is the case on Windows) and the suffix `.exe` (which is also the
case on Windows).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 git.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/git.c b/git.c
index 7544d2187306..c924c53ea76f 100644
--- a/git.c
+++ b/git.c
@@ -854,6 +854,7 @@ int cmd_main(int argc, const char **argv)
 	const char *cmd;
 	int done_help = 0;

+	strip_extension(argv);
 	cmd = argv[0];
 	if (!cmd)
 		cmd = "git-help";
@@ -875,12 +876,11 @@ int cmd_main(int argc, const char **argv)
 	 * So we just directly call the builtin handler, and die if
 	 * that one cannot handle it.
 	 */
-	if (skip_prefix(cmd, "git-", &cmd)) {
-		warn_on_dashed_git(argv[0]);
+	if (skip_prefix(cmd, "git-", &argv[0])) {
+		warn_on_dashed_git(cmd);

-		argv[0] = cmd;
 		handle_builtin(argc, argv);
-		die(_("cannot handle %s as a builtin"), cmd);
+		die(_("cannot handle %s as a builtin"), argv[0]);
 	}

 	/* Look for flags.. */
--
2.30.0.rc0.windows.1


^ permalink raw reply related	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-12-20 15:25                   ` Johannes Schindelin
@ 2020-12-21 22:24                     ` Junio C Hamano
  2020-12-30  5:30                       ` Johannes Schindelin
  0 siblings, 1 reply; 78+ messages in thread
From: Junio C Hamano @ 2020-12-21 22:24 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff King

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> I need this on top, to make it work on Windows:
>> ...
> How about this instead (to fix that part of the CI failures of `seen`)?

Ah, I didn't knew the backburnered stuff was breaking 'seen'.
Thanks for helping to cleanse 'seen'; I do not actually mind
dropping the offending topic at this point in the cycle, though.

> -- snipsnap --
> From e8ce19db04657b6ef1c73989695c97a773a9c001 Mon Sep 17 00:00:00 2001
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
> Date: Fri, 28 Aug 2020 14:50:25 +0200
> Subject: [PATCH] fixup??? git: catch an attempt to run "git-foo"
>
> This is needed to handle the case where `argv[0]` contains the full path
> (which is the case on Windows) and the suffix `.exe` (which is also the
> case on Windows).
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  git.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/git.c b/git.c
> index 7544d2187306..c924c53ea76f 100644
> --- a/git.c
> +++ b/git.c
> @@ -854,6 +854,7 @@ int cmd_main(int argc, const char **argv)
>  	const char *cmd;
>  	int done_help = 0;
>
> +	strip_extension(argv);
>  	cmd = argv[0];

Hph, would this make strip_extension() at the beginning of
handle_builtin() redundant and unneeded, I wonder?

Yes, I know stripping .exe twice would be fine most of the time, so
I'll queue the patch on top just to make 'seen' pass the tests, but
it is just as easy to discard jc/war-on-dashed-git topic, so...

>  	if (!cmd)
>  		cmd = "git-help";
> @@ -875,12 +876,11 @@ int cmd_main(int argc, const char **argv)
>  	 * So we just directly call the builtin handler, and die if
>  	 * that one cannot handle it.
>  	 */
> -	if (skip_prefix(cmd, "git-", &cmd)) {
> -		warn_on_dashed_git(argv[0]);
> +	if (skip_prefix(cmd, "git-", &argv[0])) {
> +		warn_on_dashed_git(cmd);
>
> -		argv[0] = cmd;
>  		handle_builtin(argc, argv);
> -		die(_("cannot handle %s as a builtin"), cmd);
> +		die(_("cannot handle %s as a builtin"), argv[0]);
>  	}
>
>  	/* Look for flags.. */
> --
> 2.30.0.rc0.windows.1

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: [PATCH v1 3/3] git: catch an attempt to run "git-foo"
  2020-12-21 22:24                     ` Junio C Hamano
@ 2020-12-30  5:30                       ` Johannes Schindelin
  0 siblings, 0 replies; 78+ messages in thread
From: Johannes Schindelin @ 2020-12-30  5:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Hi Junio,

On Mon, 21 Dec 2020, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> > -- snipsnap --
> > From e8ce19db04657b6ef1c73989695c97a773a9c001 Mon Sep 17 00:00:00 2001
> > From: Johannes Schindelin <johannes.schindelin@gmx.de>
> > Date: Fri, 28 Aug 2020 14:50:25 +0200
> > Subject: [PATCH] fixup??? git: catch an attempt to run "git-foo"
> >
> > This is needed to handle the case where `argv[0]` contains the full path
> > (which is the case on Windows) and the suffix `.exe` (which is also the
> > case on Windows).
> >
> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> > ---
> >  git.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/git.c b/git.c
> > index 7544d2187306..c924c53ea76f 100644
> > --- a/git.c
> > +++ b/git.c
> > @@ -854,6 +854,7 @@ int cmd_main(int argc, const char **argv)
> >  	const char *cmd;
> >  	int done_help = 0;
> >
> > +	strip_extension(argv);
> >  	cmd = argv[0];
>
> Hph, would this make strip_extension() at the beginning of
> handle_builtin() redundant and unneeded, I wonder?

I poured over this for a couple minutes, and I think you're right.

> Yes, I know stripping .exe twice would be fine most of the time, so
> I'll queue the patch on top just to make 'seen' pass the tests, but
> it is just as easy to discard jc/war-on-dashed-git topic, so...

I dunno. There is probably value in starting the deprecation for realz. I
mean, we deprecated dashed commands, like, an age ago (2020 alone felt
like a century, didn't it?). Maybe it is time to warn about using dashed
commands now. We could even consider "brown-outs" at some stage, via a
pseudo-random condition that triggers rarely in spring 2021 but
incrementally frequently over time?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2020-12-30 22:00 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-25  2:01 [PATCH] builtin/repack.c: invalidate MIDX only when necessary Taylor Blau
2020-08-25  2:26 ` Jeff King
2020-08-25  2:37   ` Taylor Blau
2020-08-25 13:14     ` Derrick Stolee
2020-08-25 14:41       ` Taylor Blau
2020-08-25 15:14         ` Derrick Stolee
2020-08-25 15:42           ` Taylor Blau
2020-08-25 16:56             ` Jeff King
2020-08-25 15:58   ` Junio C Hamano
2020-08-25 16:08     ` Taylor Blau
2020-08-25 16:18     ` Derrick Stolee
2020-08-25 17:34       ` Jeff King
2020-08-25 17:22     ` Jeff King
2020-08-25 18:05       ` Junio C Hamano
2020-08-25 18:27         ` Jeff King
2020-08-25 22:45           ` [PATCH] pack-redundant: gauge the usage before proposing its removal Junio C Hamano
2020-08-25 23:09             ` Taylor Blau
2020-08-25 23:22               ` Junio C Hamano
2020-08-26  1:17             ` [PATCH v1 0/3] War on dashed-git Junio C Hamano
2020-08-26  1:17               ` [PATCH v1 1/3] transport-helper: do not run git-remote-ext etc. in dashed form Junio C Hamano
2020-08-26  1:24                 ` Eric Sunshine
2020-08-26  7:55                   ` Johannes Schindelin
2020-08-26 16:27                   ` Junio C Hamano
2020-08-26  1:17               ` [PATCH v1 2/3] cvsexportcommit: do not run git programs " Junio C Hamano
2020-08-26  1:28                 ` Eric Sunshine
2020-08-26  1:42                   ` Junio C Hamano
2020-08-26 16:08                   ` Junio C Hamano
2020-08-26 16:28                     ` Junio C Hamano
2020-08-26  8:02                 ` Johannes Schindelin
2020-08-26  1:17               ` [PATCH v1 3/3] git: catch an attempt to run "git-foo" Junio C Hamano
2020-08-26  1:19                 ` Junio C Hamano
2020-08-26  8:06                 ` Johannes Schindelin
2020-08-26 16:30                   ` Junio C Hamano
2020-08-28  2:13                 ` Johannes Schindelin
2020-08-28 22:03                   ` Junio C Hamano
2020-08-31  9:59                     ` Johannes Schindelin
2020-08-31 17:45                       ` Junio C Hamano
2020-12-20 15:25                   ` Johannes Schindelin
2020-12-21 22:24                     ` Junio C Hamano
2020-12-30  5:30                       ` Johannes Schindelin
2020-08-26  8:09               ` [PATCH v1 0/3] War on dashed-git Johannes Schindelin
2020-08-26 16:45                 ` Junio C Hamano
2020-08-26 19:46                   ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Junio C Hamano
2020-08-26 19:46                     ` [PATCH v2 1/2] transport-helper: do not run git-remote-ext etc. in " Junio C Hamano
2020-08-26 19:46                     ` [PATCH v2 2/2] cvsexportcommit: do not run git programs " Junio C Hamano
2020-08-26 21:37                       ` [PATCH v2 3/2] credential-cache: use child_process.args Junio C Hamano
2020-08-26 22:25                         ` [PATCH] run_command: teach API users to use embedded 'args' more Junio C Hamano
2020-08-27  4:21                           ` Jeff King
2020-08-27  4:30                             ` Junio C Hamano
2020-08-27  4:31                             ` Eric Sunshine
2020-08-27  4:44                               ` Jeff King
2020-08-27  5:03                                 ` Eric Sunshine
2020-08-27  5:25                                   ` [PATCH] worktree: fix leak in check_clean_worktree() Jeff King
2020-08-27  5:56                                     ` Eric Sunshine
2020-08-27 15:31                                       ` Junio C Hamano
2020-08-27  4:13                         ` [PATCH v2 3/2] credential-cache: use child_process.args Jeff King
2020-08-27  4:22                           ` Jeff King
2020-08-27  4:31                           ` Junio C Hamano
2020-08-27  4:14                         ` Jeff King
2020-08-27 15:34                           ` Junio C Hamano
2020-08-31 22:56                         ` Junio C Hamano
2020-09-01  4:49                           ` Jeff King
2020-09-01 16:11                             ` Junio C Hamano
2020-08-27  0:57                     ` [PATCH v2 0/2] avoid running "git-subcmd" in the dashed form Derrick Stolee
2020-08-27  1:22                       ` Junio C Hamano
2020-08-28  9:14             ` [PATCH] pack-redundant: gauge the usage before proposing its removal Jeff King
2020-08-28 22:45               ` Junio C Hamano
2020-08-25  7:55 ` [PATCH] builtin/repack.c: invalidate MIDX only when necessary Son Luong Ngoc
2020-08-25 12:45   ` Derrick Stolee
2020-08-25 14:45   ` Taylor Blau
2020-08-25 16:04     ` [PATCH v2] " Taylor Blau
2020-08-26 20:51       ` Derrick Stolee
2020-08-26 20:54         ` Junio C Hamano
2020-08-25 16:47     ` [PATCH] " Jeff King
2020-08-25 17:10       ` Derrick Stolee
2020-08-25 17:29         ` Jeff King
2020-08-25 17:34           ` Taylor Blau
2020-08-25 17:42             ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).