All of lore.kernel.org
 help / color / mirror / Atom feed
* Memory leak with sparse-checkout
@ 2021-09-20 12:15 Calbabreaker
  2021-09-20 15:52 ` Taylor Blau
  0 siblings, 1 reply; 15+ messages in thread
From: Calbabreaker @ 2021-09-20 12:15 UTC (permalink / raw)
  To: git

What did you do before the bug happened? (Steps to reproduce your issue)

This was ran:

git clone https://github.com/Calbabreaker/piano --sparse
cd piano
git sparse-checkout add any_text
git checkout deploy-frontend
git sparse-checkout init --cone
git sparse-checkout add any_text

Basically clone any repository with --sparse, add anything to
sparse-checkout list
then checkout to another branch, init the sparse-checkout with --cone and add
anything to it again.

What did you expect to happen? (Expected behavior)

Nothing at all except for the sparse-checkout list to be updated.

What happened instead? (Actual behavior)

A massive memory leak using all the RAM with one of the cpu cores
being at 100% usage.

[System Info]
git version:
git version 2.33.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.10.63-1-MANJARO #1 SMP PREEMPT Wed Sep 8 14:13:59 UTC 2021 x86_64
compiler info: gnuc: 11.1
libc info: glibc: 2.33
$SHELL (typically, interactive shell): /usr/bin/zsh

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 12:15 Memory leak with sparse-checkout Calbabreaker
@ 2021-09-20 15:52 ` Taylor Blau
  2021-09-20 16:29   ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2021-09-20 15:52 UTC (permalink / raw)
  To: Calbabreaker, Derrick Stolee; +Cc: git

On Mon, Sep 20, 2021 at 09:45:14PM +0930, Calbabreaker wrote:
> What did you do before the bug happened? (Steps to reproduce your issue)
>
> This was ran:
>
> git clone https://github.com/Calbabreaker/piano --sparse
> cd piano
> git sparse-checkout add any_text
> git checkout deploy-frontend
> git sparse-checkout init --cone
> git sparse-checkout add any_text

Thanks for the reproduction. An even simpler one may be (inside of any
repository):

    git sparse-checkout init
    git sparse-checkout add dir
    git sparse-checkout init --cone
    git sparse-checkout add dir

The problem occurs because we keep existing entries when adding to the
sparse-checkout list, and cone-mode patterns do not mix with
non cone-mode patterns.

So after the first init and "add dir", your sparse-checkout file looks
like:

  /*
  !/*/
  dir

but then when we convert to cone-mode and try and add "dir" (which in
cone-mode we'll convert to "/dir/"), we run into trouble when adding the
existing "dir" entry. That's because add_patterns_cone_mode() calls
insert_recursive_pattern() on every entry in the existing list,
including "dir".

So when we call insert_recursive_pattern() with any pattern list and
path containing "dir", we first insert "dir" into the list, and then:

  char *slash = strrchr(e->pattern, '/');
  char *oldpattern = e->pattern;

  if (slash == e->pattern)
    break;
  // trim off a slash, repeat

except slash is NULL because "dir" doesn't contain a slash. And that
explains the problem you're seeing, because (a) we'll stay in that while
loop forever, and (b) because each iteration allocates memory to
accommodate the new pattern, so we'll eventually run out of memory.

The wrong thing to do would be to handle this case by changing the
conditional to "if (!slash || slash == e->pattern)", because we can't
blindly carry forward some patterns which look like cone-mode patterns,
since together the list of sparse-checkout entries may not represent a
cone.

(An example here is if we added /foo/bar/baz/* without the corresponding
/foo/, !/foo/*, and so on).

So I think the problem really is that we need to drop existing patterns
when re-initializing the sparse-checkout in cone mode. We could try to
recognize that existing patterns may already constitute a cone (and/or
create a cone that covers the existing patterns).

But I think the easiest thing (if a little unfriendly) would be to just
drop them and start afresh when re-initializing the sparse-checkout in
cone mode.

I'm adding Stolee to the CC to see if he thinks that would be sensible
behavior or not.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 15:52 ` Taylor Blau
@ 2021-09-20 16:29   ` Derrick Stolee
  2021-09-20 16:42     ` Taylor Blau
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2021-09-20 16:29 UTC (permalink / raw)
  To: Taylor Blau, Calbabreaker; +Cc: git

On 9/20/2021 11:52 AM, Taylor Blau wrote:
> On Mon, Sep 20, 2021 at 09:45:14PM +0930, Calbabreaker wrote:
>> What did you do before the bug happened? (Steps to reproduce your issue)
>>
>> This was ran:
>>
>> git clone https://github.com/Calbabreaker/piano --sparse
>> cd piano
>> git sparse-checkout add any_text
>> git checkout deploy-frontend
>> git sparse-checkout init --cone
>> git sparse-checkout add any_text

Thank you for pointing this out.

I'll point out that this was likely found because "--sparse" does not
initialize cone mode patterns, but you might have expected it to. This
will increase the priority of adding something like "--sparse=cone"
to the 'git clone' options.

> Thanks for the reproduction. An even simpler one may be (inside of any
> repository):
> 
>     git sparse-checkout init
>     git sparse-checkout add dir
>     git sparse-checkout init --cone
>     git sparse-checkout add dir
> 
> The problem occurs because we keep existing entries when adding to the
> sparse-checkout list, and cone-mode patterns do not mix with
> non cone-mode patterns.
> 
> So after the first init and "add dir", your sparse-checkout file looks
> like:
> 
>   /*
>   !/*/
>   dir
> 
> but then when we convert to cone-mode and try and add "dir" (which in
> cone-mode we'll convert to "/dir/"), we run into trouble when adding the
> existing "dir" entry. That's because add_patterns_cone_mode() calls
> insert_recursive_pattern() on every entry in the existing list,
> including "dir".
> 
> So when we call insert_recursive_pattern() with any pattern list and
> path containing "dir", we first insert "dir" into the list, and then:
> 
>   char *slash = strrchr(e->pattern, '/');
>   char *oldpattern = e->pattern;
> 
>   if (slash == e->pattern)
>     break;
>   // trim off a slash, repeat
> 
> except slash is NULL because "dir" doesn't contain a slash. And that
> explains the problem you're seeing, because (a) we'll stay in that while
> loop forever, and (b) because each iteration allocates memory to
> accommodate the new pattern, so we'll eventually run out of memory.

Yikes! Thanks for digging into the details.

> The wrong thing to do would be to handle this case by changing the
> conditional to "if (!slash || slash == e->pattern)", because we can't
> blindly carry forward some patterns which look like cone-mode patterns,
> since together the list of sparse-checkout entries may not represent a
> cone.
> 
> (An example here is if we added /foo/bar/baz/* without the corresponding
> /foo/, !/foo/*, and so on).
> 
> So I think the problem really is that we need to drop existing patterns
> when re-initializing the sparse-checkout in cone mode. We could try to
> recognize that existing patterns may already constitute a cone (and/or
> create a cone that covers the existing patterns).
> 
> But I think the easiest thing (if a little unfriendly) would be to just
> drop them and start afresh when re-initializing the sparse-checkout in
> cone mode.

This isn't sufficient, as a user can modify their .git/info/sparse-checkout
file whenever they want, so we should fix this bug regardless. We could add
a "Your existing patterns are not in cone mode" error.

It might still be a good idea to let "git sparse-checkout init --cone"
overwrite the sparse-checkout file _if the file is not already in cone
mode_.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 16:29   ` Derrick Stolee
@ 2021-09-20 16:42     ` Taylor Blau
  2021-09-20 17:25       ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2021-09-20 16:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Calbabreaker, git

On Mon, Sep 20, 2021 at 12:29:36PM -0400, Derrick Stolee wrote:
> > So I think the problem really is that we need to drop existing patterns
> > when re-initializing the sparse-checkout in cone mode. We could try to
> > recognize that existing patterns may already constitute a cone (and/or
> > create a cone that covers the existing patterns).
> >
> > But I think the easiest thing (if a little unfriendly) would be to just
> > drop them and start afresh when re-initializing the sparse-checkout in
> > cone mode.
>
> This isn't sufficient, as a user can modify their .git/info/sparse-checkout
> file whenever they want, so we should fix this bug regardless. We could add
> a "Your existing patterns are not in cone mode" error.
>
> It might still be a good idea to let "git sparse-checkout init --cone"
> overwrite the sparse-checkout file _if the file is not already in cone
> mode_.

I'm not sure how helpful such an error message might be to a user in
this scenario without extra information. After seeing just "this isn't a
cone", it's not clear what they should do other than drop their
sparse-checkout configuration and start over.

It would be nice to have an intermediate step between seeing realizing
that the existing patterns don't form a cone and dropping them.

Perhaps we could include an error message and say something like:

    warning: your sparse-checkout patterns do not from a cone
       hint: to reinitialize your sparse-checkout configuration
       hint: try running:
       hint:
       hint:   git sparse-checkout init --cone --reinitialize

Where `--reinitialize` means to drop existing patterns. I suppose it
could be the default when transitioning from non-cone to cone mode, but
that would defeat the purpose of the warning.

We would probably want to perform this check both during initialization,
and when adding patterns in cone mode. It may also be worthwhile to
check the validity of the cone before running 'list' or 'reapply', too.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 16:42     ` Taylor Blau
@ 2021-09-20 17:25       ` Derrick Stolee
  2021-09-20 17:27         ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2021-09-20 17:25 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Calbabreaker, git

On 9/20/2021 12:42 PM, Taylor Blau wrote:
> On Mon, Sep 20, 2021 at 12:29:36PM -0400, Derrick Stolee wrote:
>>> So I think the problem really is that we need to drop existing patterns
>>> when re-initializing the sparse-checkout in cone mode. We could try to
>>> recognize that existing patterns may already constitute a cone (and/or
>>> create a cone that covers the existing patterns).
>>>
>>> But I think the easiest thing (if a little unfriendly) would be to just
>>> drop them and start afresh when re-initializing the sparse-checkout in
>>> cone mode.
>>
>> This isn't sufficient, as a user can modify their .git/info/sparse-checkout
>> file whenever they want, so we should fix this bug regardless. We could add
>> a "Your existing patterns are not in cone mode" error.
>>
>> It might still be a good idea to let "git sparse-checkout init --cone"
>> overwrite the sparse-checkout file _if the file is not already in cone
>> mode_.
> 
> I'm not sure how helpful such an error message might be to a user in
> this scenario without extra information. After seeing just "this isn't a
> cone", it's not clear what they should do other than drop their
> sparse-checkout configuration and start over.
> 
> It would be nice to have an intermediate step between seeing realizing
> that the existing patterns don't form a cone and dropping them.
> 
> Perhaps we could include an error message and say something like:
> 
>     warning: your sparse-checkout patterns do not from a cone
>        hint: to reinitialize your sparse-checkout configuration
>        hint: try running:
>        hint:
>        hint:   git sparse-checkout init --cone --reinitialize
> 
> Where `--reinitialize` means to drop existing patterns. I suppose it
> could be the default when transitioning from non-cone to cone mode, but
> that would defeat the purpose of the warning.

I've got an initial set of commits in a GGG pull request [1], and I'm
waiting build validation as a double-check before sending them to the
list.

[1] https://github.com/gitgitgadget/git/pull/1043

There, I think the best thing to do is have 'init --cone' overwrite the
old, non-cone-mode patterns. This essentially makes your --reinitialize
suggestion on permanently in that case. Note that it doesn't reinitialize
when the patterns are already in cone mode.

Note that 'git sparse-checkout set' (with no other arguments) initializes
the sparse-checkout file with the default set of cone mode patterns, so
we have a way to do that now.
 
> We would probably want to perform this check both during initialization,
> and when adding patterns in cone mode. It may also be worthwhile to
> check the validity of the cone before running 'list' or 'reapply', too.

'list' definitely seems like a good idea, since it is expecting different
output than the literal patterns when cone mode is enabled.

'reapply' should work in both cases, it will just use the old pattern
matching algorithm when in cone mode. It will present a warning when the
patterns don't match as expected.

One thing that is very interesting about the example here is that a
pattern "dir" is not recognized as a non-cone-mode pattern, but it
should be. The fix is to look for a '/' at the start of the pattern and
reject it otherwise. This is fixed in my series.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 17:25       ` Derrick Stolee
@ 2021-09-20 17:27         ` Derrick Stolee
  2021-09-20 19:08           ` Taylor Blau
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2021-09-20 17:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Calbabreaker, git

On 9/20/2021 1:25 PM, Derrick Stolee wrote:
> On 9/20/2021 12:42 PM, Taylor Blau wrote:
>> We would probably want to perform this check both during initialization,
>> and when adding patterns in cone mode. It may also be worthwhile to
>> check the validity of the cone before running 'list' or 'reapply', too.
> 
> 'list' definitely seems like a good idea, since it is expecting different
> output than the literal patterns when cone mode is enabled.

I double-checked this to see how to fix this, and the 'list' subcommand
already notices that the patterns are not in cone mode and reverts its
behavior to writing all of the sparse-checkout file to stdout. It also
writes warnings over stderr before that.

There might not be anything pressing to do here.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 17:27         ` Derrick Stolee
@ 2021-09-20 19:08           ` Taylor Blau
  2021-09-20 20:56             ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2021-09-20 19:08 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Calbabreaker, git

On Mon, Sep 20, 2021 at 01:27:25PM -0400, Derrick Stolee wrote:
> On 9/20/2021 1:25 PM, Derrick Stolee wrote:
> > On 9/20/2021 12:42 PM, Taylor Blau wrote:
> >> We would probably want to perform this check both during initialization,
> >> and when adding patterns in cone mode. It may also be worthwhile to
> >> check the validity of the cone before running 'list' or 'reapply', too.
> >
> > 'list' definitely seems like a good idea, since it is expecting different
> > output than the literal patterns when cone mode is enabled.
>
> I double-checked this to see how to fix this, and the 'list' subcommand
> already notices that the patterns are not in cone mode and reverts its
> behavior to writing all of the sparse-checkout file to stdout. It also
> writes warnings over stderr before that.
>
> There might not be anything pressing to do here.

Hmm. I think we'd probably want the same behavior for init and any other
commands which could potentially overwrite the contents of the
sparse-checkout file.

Those may already call list routines internally, in which case I agree
that this is already taken care of. But if not, then I think we should
match list's behavior in the new locations, too.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 19:08           ` Taylor Blau
@ 2021-09-20 20:56             ` Derrick Stolee
  2021-09-20 21:20               ` Taylor Blau
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2021-09-20 20:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Calbabreaker, git

On 9/20/2021 3:08 PM, Taylor Blau wrote:
> On Mon, Sep 20, 2021 at 01:27:25PM -0400, Derrick Stolee wrote:
>> On 9/20/2021 1:25 PM, Derrick Stolee wrote:
>>> On 9/20/2021 12:42 PM, Taylor Blau wrote:
>>>> We would probably want to perform this check both during initialization,
>>>> and when adding patterns in cone mode. It may also be worthwhile to
>>>> check the validity of the cone before running 'list' or 'reapply', too.
>>>
>>> 'list' definitely seems like a good idea, since it is expecting different
>>> output than the literal patterns when cone mode is enabled.
>>
>> I double-checked this to see how to fix this, and the 'list' subcommand
>> already notices that the patterns are not in cone mode and reverts its
>> behavior to writing all of the sparse-checkout file to stdout. It also
>> writes warnings over stderr before that.
>>
>> There might not be anything pressing to do here.
> 
> Hmm. I think we'd probably want the same behavior for init and any other
> commands which could potentially overwrite the contents of the
> sparse-checkout file.

Could you elaborate on what you mean by "the same behavior"?

Do you mean that "git sparse-checkout add X" should act as if cone mode
is not enabled if the existing patterns are not cone-mode patterns?

What exactly do you mean about "init" changing behavior here?
 
> Those may already call list routines internally, in which case I agree
> that this is already taken care of. But if not, then I think we should
> match list's behavior in the new locations, too.

"list" interprets the 'struct pattern_list' in two different ways,
depending on the use_cone_patterns member. They are static methods in
the builtin code, not used by anything else.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 20:56             ` Derrick Stolee
@ 2021-09-20 21:20               ` Taylor Blau
  2021-09-21 12:55                 ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2021-09-20 21:20 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Calbabreaker, git

On Mon, Sep 20, 2021 at 04:56:47PM -0400, Derrick Stolee wrote:
> >> I double-checked this to see how to fix this, and the 'list' subcommand
> >> already notices that the patterns are not in cone mode and reverts its
> >> behavior to writing all of the sparse-checkout file to stdout. It also
> >> writes warnings over stderr before that.
> >>
> >> There might not be anything pressing to do here.
> >
> > Hmm. I think we'd probably want the same behavior for init and any other
> > commands which could potentially overwrite the contents of the
> > sparse-checkout file.
>
> Could you elaborate on what you mean by "the same behavior"?
>
> Do you mean that "git sparse-checkout add X" should act as if cone mode
> is not enabled if the existing patterns are not cone-mode patterns?
>
> What exactly do you mean about "init" changing behavior here?

No, I was referring to your suggestion from [1] to add a warning from
"git sparse-checkout init --cone" when there are existing patterns which
are not in cone-mode.

> > Those may already call list routines internally, in which case I agree
> > that this is already taken care of. But if not, then I think we should
> > match list's behavior in the new locations, too.
>
> "list" interprets the 'struct pattern_list' in two different ways,
> depending on the use_cone_patterns member. They are static methods in
> the builtin code, not used by anything else.

Ah, bummer. I was hoping that they'd be used internally by init so that
it would automatically emit a warning in the case where a user's
existing patterns are not in cone mode.

Apologies for any confusion.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/8a0ddd8e-b585-8f40-c4b1-0a51f11e6b84@gmail.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-20 21:20               ` Taylor Blau
@ 2021-09-21 12:55                 ` Derrick Stolee
  2021-09-21 16:32                   ` Taylor Blau
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2021-09-21 12:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Calbabreaker, git

On 9/20/2021 5:20 PM, Taylor Blau wrote:
> On Mon, Sep 20, 2021 at 04:56:47PM -0400, Derrick Stolee wrote:
>>>> I double-checked this to see how to fix this, and the 'list' subcommand
>>>> already notices that the patterns are not in cone mode and reverts its
>>>> behavior to writing all of the sparse-checkout file to stdout. It also
>>>> writes warnings over stderr before that.
>>>>
>>>> There might not be anything pressing to do here.
>>>
>>> Hmm. I think we'd probably want the same behavior for init and any other
>>> commands which could potentially overwrite the contents of the
>>> sparse-checkout file.
>>
>> Could you elaborate on what you mean by "the same behavior"?
>>
>> Do you mean that "git sparse-checkout add X" should act as if cone mode
>> is not enabled if the existing patterns are not cone-mode patterns?
>>
>> What exactly do you mean about "init" changing behavior here?
> 
> No, I was referring to your suggestion from [1] to add a warning from
> "git sparse-checkout init --cone" when there are existing patterns which
> are not in cone-mode.

This warning is part of the sparse-checkout pattern parsing logic, so
it happens whenever the patterns are loaded, including the "list"
subcommand (among other commands, not just the sparse-checkout builtin).

>>> Those may already call list routines internally, in which case I agree
>>> that this is already taken care of. But if not, then I think we should
>>> match list's behavior in the new locations, too.
>>
>> "list" interprets the 'struct pattern_list' in two different ways,
>> depending on the use_cone_patterns member. They are static methods in
>> the builtin code, not used by anything else.
> 
> Ah, bummer. I was hoping that they'd be used internally by init so that
> it would automatically emit a warning in the case where a user's
> existing patterns are not in cone mode.
> 
> Apologies for any confusion.

Thanks for clearing it up!

-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-21 12:55                 ` Derrick Stolee
@ 2021-09-21 16:32                   ` Taylor Blau
  2021-09-21 18:56                     ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2021-09-21 16:32 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Calbabreaker, git

On Tue, Sep 21, 2021 at 08:55:01AM -0400, Derrick Stolee wrote:
> On 9/20/2021 5:20 PM, Taylor Blau wrote:
> > On Mon, Sep 20, 2021 at 04:56:47PM -0400, Derrick Stolee wrote:
> >>>> I double-checked this to see how to fix this, and the 'list' subcommand
> >>>> already notices that the patterns are not in cone mode and reverts its
> >>>> behavior to writing all of the sparse-checkout file to stdout. It also
> >>>> writes warnings over stderr before that.
> >>>>
> >>>> There might not be anything pressing to do here.
> >>>
> >>> Hmm. I think we'd probably want the same behavior for init and any other
> >>> commands which could potentially overwrite the contents of the
> >>> sparse-checkout file.
> >>
> >> Could you elaborate on what you mean by "the same behavior"?
> >>
> >> Do you mean that "git sparse-checkout add X" should act as if cone mode
> >> is not enabled if the existing patterns are not cone-mode patterns?
> >>
> >> What exactly do you mean about "init" changing behavior here?
> >
> > No, I was referring to your suggestion from [1] to add a warning from
> > "git sparse-checkout init --cone" when there are existing patterns which
> > are not in cone-mode.
>
> This warning is part of the sparse-checkout pattern parsing logic, so
> it happens whenever the patterns are loaded, including the "list"
> subcommand (among other commands, not just the sparse-checkout builtin).

I might be misunderstanding what you're saying. But what I'm wondering
is: if we detect when existing patterns aren't in cone-mode, why didn't
we catch that case originally when the memory leak was discovered?

I thought that it might have been related to your third patch to change
how bad patterns are detected. But I ran the following script after
applying each of your three patches individually:

    rm -fr repo
    git init repo
    cd repo

    git sparse-checkout init
    git sparse-checkout add foo
    git sparse-checkout init --cone
    git sparse-checkout add foo

and the only difference is that we started silently dropping the bad
"foo" pattern after re-adding foo in cone-mode starting with the second
patch.

I guess my question is: it seems like there may be a friendlier way to
tell the user that we're about to drop their sparse-checkout definition
instead of just doing it silently. And it seems like you're saying that
we already have something that detects that and is used everywhere. But
I wonder why it wasn't kicking in in the original report.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-21 16:32                   ` Taylor Blau
@ 2021-09-21 18:56                     ` Derrick Stolee
  2021-09-21 20:45                       ` Taylor Blau
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2021-09-21 18:56 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Calbabreaker, git

On 9/21/2021 12:32 PM, Taylor Blau wrote:
> On Tue, Sep 21, 2021 at 08:55:01AM -0400, Derrick Stolee wrote:
>> On 9/20/2021 5:20 PM, Taylor Blau wrote:
>>> On Mon, Sep 20, 2021 at 04:56:47PM -0400, Derrick Stolee wrote:
>>>>>> I double-checked this to see how to fix this, and the 'list' subcommand
>>>>>> already notices that the patterns are not in cone mode and reverts its
>>>>>> behavior to writing all of the sparse-checkout file to stdout. It also
>>>>>> writes warnings over stderr before that.
>>>>>>
>>>>>> There might not be anything pressing to do here.
>>>>>
>>>>> Hmm. I think we'd probably want the same behavior for init and any other
>>>>> commands which could potentially overwrite the contents of the
>>>>> sparse-checkout file.
>>>>
>>>> Could you elaborate on what you mean by "the same behavior"?
>>>>
>>>> Do you mean that "git sparse-checkout add X" should act as if cone mode
>>>> is not enabled if the existing patterns are not cone-mode patterns?
>>>>
>>>> What exactly do you mean about "init" changing behavior here?
>>>
>>> No, I was referring to your suggestion from [1] to add a warning from
>>> "git sparse-checkout init --cone" when there are existing patterns which
>>> are not in cone-mode.
>>
>> This warning is part of the sparse-checkout pattern parsing logic, so
>> it happens whenever the patterns are loaded, including the "list"
>> subcommand (among other commands, not just the sparse-checkout builtin).
> 
> I might be misunderstanding what you're saying. But what I'm wondering
> is: if we detect when existing patterns aren't in cone-mode, why didn't
> we catch that case originally when the memory leak was discovered?

Easy answer: there was a bug. The pattern in the original report evaded
the checks that were implemented to identify non-cone-mode patterns.

My patch 3 [1] fixes that problem, with this hunk:

diff --git a/dir.c b/dir.c
index 03c4d212672..93136442103 100644
--- a/dir.c
+++ b/dir.c
@@ -727,7 +727,7 @@ static void add_pattern_to_hashsets(struct pattern_list *pl, struct path_pattern
 	}
 
 	if (given->patternlen < 2 ||
-	    *given->pattern == '*' ||
+	    *given->pattern != '/' ||
 	    strstr(given->pattern, "**")) {
 		/* Not a cone pattern. */
 		warning(_("unrecognized pattern: '%s'"), given->pattern);

and a test case change is required to avoid having the warning message
appear in the wrong places:

@@ -182,9 +185,9 @@ test_expect_success 'set sparse-checkout using --stdin' '
 test_expect_success 'add to sparse-checkout' '
 	cat repo/.git/info/sparse-checkout >expect &&
 	cat >add <<-\EOF &&
-	pattern1
+	/pattern1
 	/folder1/
-	pattern2
+	/pattern2
 	EOF
 	cat add >>expect &&
 	git -C repo sparse-checkout add --stdin <add &&

[1] https://lore.kernel.org/git/d513b28b75189d066f9c66de44a1a578cbc38139.1632160658.git.gitgitgadget@gmail.com/

> I thought that it might have been related to your third patch to change
> how bad patterns are detected. But I ran the following script after
> applying each of your three patches individually:
> 
>     rm -fr repo
>     git init repo
>     cd repo
> 
>     git sparse-checkout init
>     git sparse-checkout add foo
>     git sparse-checkout init --cone
>     git sparse-checkout add foo
> 
> and the only difference is that we started silently dropping the bad
> "foo" pattern after re-adding foo in cone-mode starting with the second
> patch.

In patch 2, we "detect" that the old patterns were not in cone mode
because the core.sparseCheckoutCone config is false when parsing the
patterns, so use_cone_patterns is 0.

> I guess my question is: it seems like there may be a friendlier way to
> tell the user that we're about to drop their sparse-checkout definition
> instead of just doing it silently. And it seems like you're saying that
> we already have something that detects that and is used everywhere. But
> I wonder why it wasn't kicking in in the original report.

You are correct that we should make better documentation around the
re-initialization of patterns. And this _is_ a change of behavior and
I will cave to concerns around that. I think this is more a case of
matching what users expect from the interface, or at minimum helping
them fall into the "pit of success".

Let's move the discussion to that thread so we can interleave the
patches themselves.

Thanks,
-Stolee

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-21 18:56                     ` Derrick Stolee
@ 2021-09-21 20:45                       ` Taylor Blau
  2021-09-22 19:16                         ` Derrick Stolee
  0 siblings, 1 reply; 15+ messages in thread
From: Taylor Blau @ 2021-09-21 20:45 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Calbabreaker, git

On Tue, Sep 21, 2021 at 02:56:01PM -0400, Derrick Stolee wrote:
> > I thought that it might have been related to your third patch to change
> > how bad patterns are detected. But I ran the following script after
> > applying each of your three patches individually:
> >
> >     rm -fr repo
> >     git init repo
> >     cd repo
> >
> >     git sparse-checkout init
> >     git sparse-checkout add foo
> >     git sparse-checkout init --cone
> >     git sparse-checkout add foo
> >
> > and the only difference is that we started silently dropping the bad
> > "foo" pattern after re-adding foo in cone-mode starting with the second
> > patch.
>
> In patch 2, we "detect" that the old patterns were not in cone mode
> because the core.sparseCheckoutCone config is false when parsing the
> patterns, so use_cone_patterns is 0.

I fear that we're talking about different things. With your patches, if
I munge my .git/info/sparse-checkout file, I can easily get something
like:

  $ git.compile sparse-checkout list
  warning: unrecognized pattern: 'foo'
  warning: disabling cone pattern matching

to appear. But I'm wondering why the same doesn't happen when running
`git sparse-checkout init --cone` while the existing sparse-checkout
definition contains non-cone mode entries.

I could be holding it wrong, but I was unsuccessful in getting a warning
out with the quoted script.

> Let's move the discussion to that thread so we can interleave the
> patches themselves.

Sure, if you want. I just found it easier to reply here since I can more
readily quote the things I want to respond to.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-21 20:45                       ` Taylor Blau
@ 2021-09-22 19:16                         ` Derrick Stolee
  2021-09-22 19:37                           ` Taylor Blau
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2021-09-22 19:16 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Calbabreaker, git

On 9/21/2021 4:45 PM, Taylor Blau wrote:
> On Tue, Sep 21, 2021 at 02:56:01PM -0400, Derrick Stolee wrote:
>>> I thought that it might have been related to your third patch to change
>>> how bad patterns are detected. But I ran the following script after
>>> applying each of your three patches individually:
>>>
>>>     rm -fr repo
>>>     git init repo
>>>     cd repo
>>>
>>>     git sparse-checkout init
>>>     git sparse-checkout add foo
>>>     git sparse-checkout init --cone
>>>     git sparse-checkout add foo
>>>
>>> and the only difference is that we started silently dropping the bad
>>> "foo" pattern after re-adding foo in cone-mode starting with the second
>>> patch.
>>
>> In patch 2, we "detect" that the old patterns were not in cone mode
>> because the core.sparseCheckoutCone config is false when parsing the
>> patterns, so use_cone_patterns is 0.
> 
> I fear that we're talking about different things. With your patches, if
> I munge my .git/info/sparse-checkout file, I can easily get something
> like:
> 
>   $ git.compile sparse-checkout list
>   warning: unrecognized pattern: 'foo'
>   warning: disabling cone pattern matching
> 
> to appear. But I'm wondering why the same doesn't happen when running
> `git sparse-checkout init --cone` while the existing sparse-checkout
> definition contains non-cone mode entries.

You don't get that warning because it's not trying to parse the
previous patterns using cone mode. Before my series, you would get
the warning in a _second_ run of "git sparse-checkout init --cone".

We should add a different warning for overwriting the existing
patterns.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Memory leak with sparse-checkout
  2021-09-22 19:16                         ` Derrick Stolee
@ 2021-09-22 19:37                           ` Taylor Blau
  0 siblings, 0 replies; 15+ messages in thread
From: Taylor Blau @ 2021-09-22 19:37 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Calbabreaker, git

On Wed, Sep 22, 2021 at 03:16:22PM -0400, Derrick Stolee wrote:
> On 9/21/2021 4:45 PM, Taylor Blau wrote:
> [...]
> > I fear that we're talking about different things. With your patches, if
> > I munge my .git/info/sparse-checkout file, I can easily get something
> > like:
> >
> >   $ git.compile sparse-checkout list
> >   warning: unrecognized pattern: 'foo'
> >   warning: disabling cone pattern matching
> >
> > to appear. But I'm wondering why the same doesn't happen when running
> > `git sparse-checkout init --cone` while the existing sparse-checkout
> > definition contains non-cone mode entries.
>
> You don't get that warning because it's not trying to parse the
> previous patterns using cone mode. Before my series, you would get
> the warning in a _second_ run of "git sparse-checkout init --cone".

OK. I was confused because in [1] you said that we produce this error
any time we try to load the existing definitions and parse them in
cone-mode.

I would have assumed that we would try to do that during `init --cone`
to read the existing sparse-checkout definition.

Thanks,
Taylor

[1]: https://lore.kernel.org/git/734ecf93-e563-20d5-7cf1-74048aa74d56@gmail.com/

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-09-22 19:37 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-20 12:15 Memory leak with sparse-checkout Calbabreaker
2021-09-20 15:52 ` Taylor Blau
2021-09-20 16:29   ` Derrick Stolee
2021-09-20 16:42     ` Taylor Blau
2021-09-20 17:25       ` Derrick Stolee
2021-09-20 17:27         ` Derrick Stolee
2021-09-20 19:08           ` Taylor Blau
2021-09-20 20:56             ` Derrick Stolee
2021-09-20 21:20               ` Taylor Blau
2021-09-21 12:55                 ` Derrick Stolee
2021-09-21 16:32                   ` Taylor Blau
2021-09-21 18:56                     ` Derrick Stolee
2021-09-21 20:45                       ` Taylor Blau
2021-09-22 19:16                         ` Derrick Stolee
2021-09-22 19:37                           ` Taylor Blau

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.