All of lore.kernel.org
 help / color / mirror / Atom feed
* 2.34 regression (and workaround): deleting untracked files both outside *and inside* desired sparsity cone
@ 2021-12-01 17:16 Elijah Newren
  2021-12-01 19:19 ` Derrick Stolee
  0 siblings, 1 reply; 4+ messages in thread
From: Elijah Newren @ 2021-12-01 17:16 UTC (permalink / raw)
  To: Derrick Stolee, Victoria Dye, Lessley Dennington; +Cc: Git Mailing List

Hi,

I've got a proposal for changing the sparse-checkout command slightly;
but it probably doesn't make sense without the context of the bugs
(old and new) we are facing.  Consider this an RFC, with the final
bullet point particularly in need of comment and ideas.

== Background ==

sparse-checkouts in cone mode are documented as being obtained either
by using the `--sparse` flag to `git clone`, or by running the
sequence:

    git sparse-checkout init --cone [--sparse-index]
    git sparse-checkout set ...

The first step has traditionally deleted all the tracked files from
the working tree, except in the toplevel directory, and the second
restores all the tracked files that are wanted.

(Usage context:)
My understanding is Microsoft never uses this sequence, instead using
the --sparse flag to `git clone`.  In contrast, at Palantir the
--sparse flag to clone is rarely used.

(An aside on pre-existing bugs/warts of the above sequence:)
This has always been bad from a performance point of view (especially
in more extreme not-so-sparse cases when the desired sparsity paths
represent roughly half of all paths), and has been suboptimal from a
UI point of view due to the dual progress meters (other wrappers, such
as an in-house `sparsify`, can have a single user-facing command for
switching to a sparse-checkout that has to call both of these git
commands under the hood; that wrapper would prefer one progress meter
to two).  But it never quite arose to the level of needing to be
fixed.

== The (New) Bug ==

Starting with Git 2.34, each step will delete all ignored files
outside the sparsity paths specified to the individual command in
question.  We are totally onboard with deleting ignored files outside
the sparsity paths the user wants, but the first command is required
according to the documentation and does not allow specifying any
sparsity paths.  Since it does not allow specifying any sparsity
paths, it treats *everything* as outside and essentially deletes all
ignored files everywhere.  That's not workable for us.  We want a
single command for changing to a sparse-checkout.

== The Current Workaround ==

Luckily, having these two commands separate isn't enforced, and the
first command is basically roughly equivalent to setting a few config
variables and then running `sparse-checkout set` with an empty set of
paths.  So, currently, we can just do the config setting part of init
manually, and then skip the rest of init, and then call our desired
`set` command:
    git config extensions.worktreeConfig true
    git config --worktree core.sparseCheckout true
    git config --worktree core.sparseCheckoutCone true
    git sparse-checkout set ...

Since we're using a wrapper anyway (for computing dependencies and
determining the list of directories to include), it was relatively
easy for us to add this workaround.

However, it is not clear that our current workaround will continue
functioning with future versions of git, particularly if
`sparse-checkout init` gains more options.  In fact, it already
doesn't handle --sparse-index.

== Long term proposal ==

Make `set` do both the work of `init` and `set`.

This means:
  * `set` gains the ability to parse both --cone and --sparse-index
(in addition to --stdin, etc.)
  * If the sparse-index is not initialized, `set` does the
initialization work of `init`.
  * Modify the `init` documentation to mark it as deprecated,
mentioning the 2-3 bugs above as reasons why.
  * We could effectively just turn `git sparse-checkout init ...` into
an alias for `git sparse-checkout set ...`, since init's parameters
would be a subset of those that `set` accepts.  However, the latter
might interact badly with allowing a user to toggle sparse-index on
and off in the middle of a sparse-checkout...so maybe we need
something more?  Alternatively, we could leave `init` as-is and just
consider it set in concrete, possibly risking it becoming
non-functional in a future upgrade.  Hmm...


Thoughts?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.34 regression (and workaround): deleting untracked files both outside *and inside* desired sparsity cone
  2021-12-01 17:16 2.34 regression (and workaround): deleting untracked files both outside *and inside* desired sparsity cone Elijah Newren
@ 2021-12-01 19:19 ` Derrick Stolee
  2021-12-01 23:40   ` Elijah Newren
  0 siblings, 1 reply; 4+ messages in thread
From: Derrick Stolee @ 2021-12-01 19:19 UTC (permalink / raw)
  To: Elijah Newren, Victoria Dye, Lessley Dennington; +Cc: Git Mailing List

On 12/1/2021 12:16 PM, Elijah Newren wrote:
> Hi,
> 
> I've got a proposal for changing the sparse-checkout command slightly;
> but it probably doesn't make sense without the context of the bugs
> (old and new) we are facing.  Consider this an RFC, with the final
> bullet point particularly in need of comment and ideas.
> 
> == Background ==
> 
> sparse-checkouts in cone mode are documented as being obtained either
> by using the `--sparse` flag to `git clone`, ...

The `--sparse` flag doesn't initialize cone mode, unfortunately.

>     git sparse-checkout init --cone [--sparse-index]
>     git sparse-checkout set ...
> 
> The first step has traditionally deleted all the tracked files from
> the working tree, except in the toplevel directory, and the second
> restores all the tracked files that are wanted.
> 
> (Usage context:)
> My understanding is Microsoft never uses this sequence, instead using
> the --sparse flag to `git clone`.  In contrast, at Palantir the
> --sparse flag to clone is rarely used.

We use the sparse-checkout builtin. From the Scalar patch series,
you can see that we don't call "git clone" at all, but instead
"scalar clone" does a lot with "git init" and works from there by
setting config and fetching at the correct time.
 
> == The (New) Bug ==
> 
> Starting with Git 2.34, each step will delete all ignored files
> outside the sparsity paths specified to the individual command in
> question.  We are totally onboard with deleting ignored files outside
> the sparsity paths the user wants, but the first command is required
> according to the documentation and does not allow specifying any
> sparsity paths.  Since it does not allow specifying any sparsity
> paths, it treats *everything* as outside and essentially deletes all
> ignored files everywhere.  That's not workable for us.  We want a
> single command for changing to a sparse-checkout.

Ah, since 'git sparse-checkout set' would work differently if not
in cone mode, I see the problem. It's a little too much to manually
set core.sparseCheckoutCone=true before running the 'set' command,
probably.

> == The Current Workaround ==
> 
> Luckily, having these two commands separate isn't enforced, and the
> first command is basically roughly equivalent to setting a few config
> variables and then running `sparse-checkout set` with an empty set of
> paths.  So, currently, we can just do the config setting part of init
> manually, and then skip the rest of init, and then call our desired
> `set` command:
>     git config extensions.worktreeConfig true
>     git config --worktree core.sparseCheckout true
>     git config --worktree core.sparseCheckoutCone true
>     git sparse-checkout set ...

Which you have already worked out.

> Since we're using a wrapper anyway (for computing dependencies and
> determining the list of directories to include), it was relatively
> easy for us to add this workaround.
> 
> However, it is not clear that our current workaround will continue
> functioning with future versions of git, particularly if
> `sparse-checkout init` gains more options.  In fact, it already
> doesn't handle --sparse-index.

Right. It's _yet another_ config option to tweak.

> == Long term proposal ==
> 
> Make `set` do both the work of `init` and `set`.
> 
> This means:
>   * `set` gains the ability to parse both --cone and --sparse-index
> (in addition to --stdin, etc.)
>   * If the sparse-index is not initialized, `set` does the
> initialization work of `init`.
>   * Modify the `init` documentation to mark it as deprecated,
> mentioning the 2-3 bugs above as reasons why.
>   * We could effectively just turn `git sparse-checkout init ...` into
> an alias for `git sparse-checkout set ...`, since init's parameters
> would be a subset of those that `set` accepts.  However, the latter
> might interact badly with allowing a user to toggle sparse-index on
> and off in the middle of a sparse-checkout...so maybe we need
> something more?  Alternatively, we could leave `init` as-is and just
> consider it set in concrete, possibly risking it becoming
> non-functional in a future upgrade.  Hmm...

I think this is a good plan. Making 'init' the same as 'set' with
no paths makes sense to me. We would want to be careful now that
"--option" could be interpreted as a path to recommend using

  git sparse-checkout set <options> -- <path1> ... <pathN>

While you are here, I would be interested in making 'git clone
--sparse' default to cone mode. Or, should it be 'git clone
--sparse=cone' or something? Not making it default to cone mode
is a big regret of mine.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.34 regression (and workaround): deleting untracked files both outside *and inside* desired sparsity cone
  2021-12-01 19:19 ` Derrick Stolee
@ 2021-12-01 23:40   ` Elijah Newren
  2021-12-02  0:41     ` Derrick Stolee
  0 siblings, 1 reply; 4+ messages in thread
From: Elijah Newren @ 2021-12-01 23:40 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Victoria Dye, Lessley Dennington, Git Mailing List

On Wed, Dec 1, 2021 at 11:19 AM Derrick Stolee <stolee@gmail.com> wrote:
>
...
> We use the sparse-checkout builtin. From the Scalar patch series,
> you can see that we don't call "git clone" at all, but instead
> "scalar clone" does a lot with "git init" and works from there by
> setting config and fetching at the correct time.

Ah, thanks for the correction.  I had seen that, but forgotten.

...
> > == Long term proposal ==
> >
> > Make `set` do both the work of `init` and `set`.
> >
> > This means:
> >   * `set` gains the ability to parse both --cone and --sparse-index
> > (in addition to --stdin, etc.)
> >   * If the sparse-index is not initialized, `set` does the
> > initialization work of `init`.
> >   * Modify the `init` documentation to mark it as deprecated,
> > mentioning the 2-3 bugs above as reasons why.
> >   * We could effectively just turn `git sparse-checkout init ...` into
> > an alias for `git sparse-checkout set ...`, since init's parameters
> > would be a subset of those that `set` accepts.  However, the latter
> > might interact badly with allowing a user to toggle sparse-index on
> > and off in the middle of a sparse-checkout...so maybe we need
> > something more?  Alternatively, we could leave `init` as-is and just
> > consider it set in concrete, possibly risking it becoming
> > non-functional in a future upgrade.  Hmm...
>
> I think this is a good plan. Making 'init' the same as 'set' with
> no paths makes sense to me.

Cool, I'll get to work on it.

> We would want to be careful now that
> "--option" could be interpreted as a path to recommend using
>
>   git sparse-checkout set <options> -- <path1> ... <pathN>

Makes sense.  However, wasn't this already an issue when you added
`--stdin` as an option for the `set` command?

> While you are here, I would be interested in making 'git clone
> --sparse' default to cone mode. Or, should it be 'git clone
> --sparse=cone' or something? Not making it default to cone mode
> is a big regret of mine.

I agree it'd be much nicer to have it default to cone mode, and the
big warning in git-sparse-checkout.txt might permit us to do so.  A
few related questions:

* Should we document how to change from cone mode to non-cone mode?
We have --sparse-index, --no-sparse-index, and --cone flags, but no
--no-cone one.  Should we?  (Do these flags belong somewhere other
than `init` since it's toggling some other flag while already using a
sparse-checkout?)

* Should we clean up the wording in clone's --sparse option?  In particular:

--sparse::
Initialize the sparse-checkout file so the working
directory starts with only the files in the root
of the repository. The sparse-checkout file can be
modified to grow the working directory as needed.

This wording seems to suggest direct editing of
.git/info/sparse-checkout, and might confuse users.  Perhaps the last
sentence could change "sparse-checkout file can be modified" ->
"sparse-checkout command can be used" or something like that?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: 2.34 regression (and workaround): deleting untracked files both outside *and inside* desired sparsity cone
  2021-12-01 23:40   ` Elijah Newren
@ 2021-12-02  0:41     ` Derrick Stolee
  0 siblings, 0 replies; 4+ messages in thread
From: Derrick Stolee @ 2021-12-02  0:41 UTC (permalink / raw)
  To: Elijah Newren; +Cc: Victoria Dye, Lessley Dennington, Git Mailing List

On 12/1/2021 6:40 PM, Elijah Newren wrote:
> On Wed, Dec 1, 2021 at 11:19 AM Derrick Stolee <stolee@gmail.com> wrote:
>>
>> We would want to be careful now that
>> "--option" could be interpreted as a path to recommend using
>>
>>   git sparse-checkout set <options> -- <path1> ... <pathN>
> 
> Makes sense.  However, wasn't this already an issue when you added
> `--stdin` as an option for the `set` command?

You are right. This should already be handled in a sane way.
 
>> While you are here, I would be interested in making 'git clone
>> --sparse' default to cone mode. Or, should it be 'git clone
>> --sparse=cone' or something? Not making it default to cone mode
>> is a big regret of mine.
> 
> I agree it'd be much nicer to have it default to cone mode, and the
> big warning in git-sparse-checkout.txt might permit us to do so.  A
> few related questions:
> 
> * Should we document how to change from cone mode to non-cone mode?
> We have --sparse-index, --no-sparse-index, and --cone flags, but no
> --no-cone one.  Should we?  (Do these flags belong somewhere other
> than `init` since it's toggling some other flag while already using a
> sparse-checkout?)

--no-cone exists, it probably just isn't in the docs. Our 'init'
options are defined as follows:

	static struct option builtin_sparse_checkout_init_options[] = {
		OPT_BOOL(0, "cone", &init_opts.cone_mode,
			 N_("initialize the sparse-checkout in cone mode")),
		OPT_BOOL(0, "sparse-index", &init_opts.sparse_index,
			 N_("toggle the use of a sparse index")),
		OPT_END(),
	};


so --no-cone exists the same way --no-sparse-index does.

> * Should we clean up the wording in clone's --sparse option?  In particular:
> 
> --sparse::
> Initialize the sparse-checkout file so the working
> directory starts with only the files in the root
> of the repository. The sparse-checkout file can be
> modified to grow the working directory as needed.
> 
> This wording seems to suggest direct editing of
> .git/info/sparse-checkout, and might confuse users.  Perhaps the last
> sentence could change "sparse-checkout file can be modified" ->
> "sparse-checkout command can be used" or something like that?

This is probably just too old. It could use updates to link to
git-sparse-checkout.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-12-02  0:43 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-01 17:16 2.34 regression (and workaround): deleting untracked files both outside *and inside* desired sparsity cone Elijah Newren
2021-12-01 19:19 ` Derrick Stolee
2021-12-01 23:40   ` Elijah Newren
2021-12-02  0:41     ` Derrick Stolee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.