From: Shaoxuan Yuan <firstname.lastname@example.org>
To: Elijah Newren <email@example.com>
Cc: Derrick Stolee <firstname.lastname@example.org>,
Victoria Dye <email@example.com>,
Git Mailing List <firstname.lastname@example.org>,
Junio C Hamano <email@example.com>
Subject: Re: [PATCH v5 1/3] builtin/grep.c: add --sparse option
Date: Sun, 18 Sep 2022 21:13:05 -0700 [thread overview]
Message-ID: <firstname.lastname@example.org> (raw)
On 9/17/2022 9:24 PM, Elijah Newren wrote:
> On Fri, Sep 16, 2022 at 8:34 PM Shaoxuan Yuan <email@example.com> wrote:
>>> I'm also curious whether there shouldn't be a config option for
>>> something like this, so folks don't have to specify it with every
>>> invocation. In particular, while I certainly have users that want to
>>> just query git for information about the part of the history they are
>>> interested in, there are other users who are fully aware they are
>>> working in a bigger repository and want to search for additional
>>> things to add to their sparse-checkout and predominantly use grep for
>>> things like that. They have even documented that `git grep --cached
>>> <TERM>` can be used in sparse-checkouts for this purpose...and have
>>> been using that for a few years. (I did warn them at the time that
>>> there was a risk they'd have to change their command, but it's still
>>> going to be a behavioral change they might not expect.) Further, when
>>> I brought up changing the behavior of commands during sparse-checkouts
>>> to limit to files matching the sparsity paths in that old thread at
>>> , Stolee was a bit skeptical of making that the default. That
>>> suggests, at least, that two independent groups of users would want to
>>> use the non-sparse searching frequently, and frequently enough that
>>> they'd appreciate a config option.
>> A config option sounds good. Though I think
>> 1. If this option is for global behavior: users may better off turning
>> off sparse-checkout if they want a config to do things densely everywhere.
> Sorry, it sounds like I haven't explained the usecases to you very
> well. Let me try again.
> There are people who want to do everything densely, as you say, and
> those folks can just turn off sparse-checkout or not use it in the
> first place. Git has traditionally catered to these folks just fine.
> However, it's not a subset of interest for this discussion and wasn't
> what I was talking about.
> There are (at least) two different usecases for people wanting to use
> sparse-checkouts; I have users that fall under each category:
> 1) Working on a repository subset; users are _only_ interested in that subset.
> This usecase is very poorly supported in Git right now, but I think
> you understand it so I'll only briefly describe it.
> These folks might know there are other things in the repository, but
> don't care. Not only should the working tree be sparse, but grep,
> log, diff, etc. should be restricted to the subset of the tree they
> are interested in.
Right, this is the usecase I am familiar with.
> Restricting operations to the sparsity specification is also important
> for marrying partial clones with sparse checkouts while allowing
> disconnected development. Without such a restrict-to-sparsity-paths
> feature, the partial clones will attempt to download objects the first
> time they try to grep an old revision, or do log with a glob path.
> The download will fail, causing the operation to fail, and break the
> ability of the user to work in a disconnected manner.
OK, I'm still learning about partial clone, didn't get a chance to look
at it. Will try to figure out what this means :)
> 2) The working directory is sparse, but users are working in a larger whole.
> Stolee described this usecase this way:
> "I'm also focused on users that know that they are a part of a larger
> whole. They know they are operating on a large repository but focus on
> what they need to contribute their part. I expect multiple "roles" to
> use very different, almost disjoint parts of the codebase. Some other
> "architect" users operate across the entire tree or hop between different
> sections of the codebase as necessary. In this situation, I'm wary of
> scoping too many features to the sparse-checkout definition, especially
> "git log," as it can be too confusing to have their view of the codebase
> depend on your "point of view."
>  https://firstname.lastname@example.org/
> I describe it very similarly, but I'd like to point out something
> additional around this usecase and how it can be influenced by
> dependencies. The first cut for sparse-checkouts is usually the
> directories you are interested in plus what those directories depend
> upon within your repository. But there's a monkey wrench here: if you
> have integration tests, they invert the hierarchy: to run integration
> tests, you need not only what you are interested in and its
> dependencies, you also need everything that depends upon what you are
> interested in or that depends upon one of your dependencies...AND you
> need all the dependencies of that expanded group. That can easily
> change your sparse-checkout into a nearly dense one. Naturally, that
> tends to kill the benefits of sparse-checkouts. There are a couple
> solutions to this conundrum: either avoid grabbing dependencies (maybe
> have built versions of your dependencies pulled from a CI cache
> somewhere), or say that users shouldn't run integration tests directly
> and instead do it on the CI server when they submit a code review. Or
> do both. Regardless of whether you stub out your dependencies or stub
> out the things that depend upon you, there is certainly a reason to
> want to query and be aware of those other parts of the repository.
> Thus, sparse-checkouts can be used to limit what you directly build
> and modify, but these users do not want it to limit their queries of
> Once users pick either the first or the second usecase, they often
> stick within it. For either group, regardless of what Git's default
> is, needing to specify an additional flag for *every*
> grep/log/diff/etc. they run would just be a total annoyance. Neither
> wants a dense worktree, but one side wants a dense history query while
> the other wants a sparse one. Different groups should be able to
> configure the default that works well for them, much like we allow
> users to configure whether they want "git pull" to rebase or merge.
OK, now I get it:
Case A: users only interested in a subset, so they need only sparse
history and a sparse worktree.
Case B: users works within a subset but needs a larger context, so they
need a dense history/query (that's why we should let grep default to
--no-restrict, as you suggested?), though still a sparse worktree.
>> 2. If this option is for a single subcommand (e.g. 'grep'): I don't have
>> much thoughts here. It certainly can be nice for users who need to do
>> non-sparse searching frequently. This design, if necessary, should
>> belong to a patch where this config is added for every single subcommand?
>>> I also brought up in that old thread that perhaps we want to avoid
>>> adding a flag to every subcommand, and instead just having a
>>> git-global flag for triggering this type of behavior. (e.g. `git
>>> --no-restrict grep --cached ...` or `git --dense grep --cached ...`).
>> This looks more like the answer to me. It's a peace of mind for users if
>> they don't have to worry about whether a subcommand is sparse-aware, and
>> how may their behaviors differ. Though we still may need to update the
>> actual behavior in each subcommand over an extended period of time
>> (though may not be difficult?), which you mentioned above "seems like a
>> poor strategy".
>>>  https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
>>> and the responses to that email>
>>>> Change the default behavior of 'git grep' to focus on the files within
>>>> the sparse-checkout definition. To enable the previous behavior, add a
>>>> '--sparse' option to 'git grep' that triggers the old behavior that
>>>> inspects paths outside of the sparse-checkout definition when paired
>>>> with the '--cached' option.
>>> I still think the flag name of `--sparse` is totally backwards and
>>> highly confusing for the described behavior. I missed Stolee's email
>>> at the time (wasn't cc'ed) where he brought up that "--sparse" had
>>> already been added to "git-add" and "git-rm", but in those cases the
>>> commands aren't querying and I just don't see how they lead to the
>>> same level of user confusion. This one seems glaringly wrong to me
>>> and both Junio and I flagged it on v1 when we first saw it. (Perhaps
>>> it also helps that for the add/rm cases, that a user is often given an
>>> error message with the suggested flag to use, which just doesn't make
>>> sense here either.) If there is concern that this flag should be the
>>> same as add and rm, then I think we need to do the backward
>>> compatibility dance and fix add and rm by adding an alias over there
>>> so that grep's flag won't be so confusing.
>> I guess I'm using "--sparse" here because "add", "rm" and "mv" all imply
>> that "when operating on a sparse path, ignores/warns unless '--sparse'
>> is used". I take it as an analogy so "when searching a sparse path,
>> ignores/warns unless '--sparse' is used". As the idea that "Git does
>> *not* care sparse contents unless '--[no-]sparse' is specified" is sort
>> of established through the implementations in "add", "rm", or "mv", I
>> don't see a big problem using "--sparse" here.
> Well, I do.
> In addition to just being utterly backwards and confusing in the
> context of grep:
> * Both `clone` and `ls-files` use `--sparse` to mean to limit things
> to the sparsity cone, so we're already kinda split-brained.
> * grep is more like ls-files (both being querying functions) than
> add/rm/mv, so should really follow its lead instead of the one from
> * There's another way to interpret `--sparse` for `add` and `rm`
> such that it makes sense (at least to me); see my other email to Junio
> in this thread.
According to the spirit of your points, I think they should be
defaulting to --restrict (a rename perhaps) right now.
> * `mv` is indeed using it backward, but the `mv` change is new to
> this cycle (and undocumented) so I'm not sure it counts as much of a
> precedent yet.
Oops, I was making the modifications to `mv` and forgot to add
documentation to it. Though the --sparse of `mv` was not documented
before I touching it. Perhaps it can be added later if we are going to
rename --sparse/--dense to --restrict/--no-restrict.
>> I *think*, as long as the users are informed that the default is to
>> ignore things outside of the sparse-checkout definition, and they have
>> to do something (using "--sparse" or a potential better name) to
>> override the default, we are safe to use a name that is famous (i.e.
>> "--sparse") even though its literal meaning is not perfectly descriptive.
>> One outlier I do find confusing though, is the "--sparse" option from
>> "git-ls-files". Without it, Git expands the index and show everything
>> outside of sparse-checkout definition, which seems a bit controversial.
> Nah, that perfectly matches the expectation of users in the second
> usecase above -- querying (ls-files/grep/log/diff) defaults to
> non-restricted history, modifying (add/rm/mv) defaults to restricted
> paths but warns if the arguments could have matched something else,
> and the working tree is restricted to sparse paths. It doesn't seem
> too controversial to me, even if it's not what we want for the
> long-term default.
OK. After the reasoning you gave above, now the --sparse of ls-files
> The defaults for the first usecase would be defaulting to restricted
> paths for everything, and perhaps not warn if arguments to a modifying
> command could have matched something else.
> Anyway, hope that helps you understand my perspective and framing.
Thanks for the explanations, now I get it and agree with your points :)
next prev parent reply other threads:[~2022-09-19 4:13 UTC|newest]
Thread overview: 69+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-08-17 7:56 [PATCH v1 0/2] grep: integrate with sparse index Shaoxuan Yuan
2022-08-17 7:56 ` [PATCH v1 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-17 14:12 ` Derrick Stolee
2022-08-17 17:13 ` Junio C Hamano
2022-08-17 17:34 ` Victoria Dye
2022-08-17 17:43 ` Derrick Stolee
2022-08-17 18:47 ` Junio C Hamano
2022-08-17 17:37 ` Elijah Newren
2022-08-24 18:20 ` Shaoxuan Yuan
2022-08-24 19:08 ` Derrick Stolee
2022-08-17 7:56 ` [PATCH v1 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-17 14:23 ` Derrick Stolee
2022-08-24 21:06 ` Shaoxuan Yuan
2022-08-25 0:39 ` Derrick Stolee
2022-08-17 13:46 ` [PATCH v1 0/2] grep: " Derrick Stolee
2022-08-29 23:28 ` [PATCH v2 " Shaoxuan Yuan
2022-08-29 23:28 ` [PATCH v2 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-29 23:28 ` [PATCH v2 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-30 13:45 ` Derrick Stolee
2022-09-01 4:57 ` [PATCH v3 0/3] grep: " Shaoxuan Yuan
2022-09-01 4:57 ` [PATCH v3 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-01 4:57 ` [PATCH v3 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-01 4:57 ` [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-01 17:03 ` Derrick Stolee
2022-09-01 18:31 ` Shaoxuan Yuan
2022-09-01 17:17 ` Junio C Hamano
2022-09-01 17:27 ` Junio C Hamano
2022-09-01 22:49 ` Shaoxuan Yuan
2022-09-01 22:36 ` Shaoxuan Yuan
2022-09-02 3:28 ` Victoria Dye
2022-09-02 18:47 ` Shaoxuan Yuan
2022-09-03 0:36 ` [PATCH v4 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-03 0:36 ` [PATCH v4 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-03 0:36 ` [PATCH v4 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-03 0:36 ` [PATCH v4 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-03 4:39 ` Junio C Hamano
2022-09-08 0:24 ` Shaoxuan Yuan
2022-09-08 0:18 ` [PATCH v5 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-08 0:18 ` [PATCH v5 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-10 1:07 ` Victoria Dye
2022-09-14 6:08 ` Elijah Newren
2022-09-15 2:57 ` Junio C Hamano
2022-09-18 2:14 ` Elijah Newren
2022-09-18 19:52 ` Victoria Dye
2022-09-19 1:23 ` Junio C Hamano
2022-09-19 4:27 ` Shaoxuan Yuan
2022-09-19 11:03 ` Ævar Arnfjörð Bjarmason
2022-09-20 7:13 ` Elijah Newren
2022-09-17 3:34 ` Shaoxuan Yuan
2022-09-18 4:24 ` Elijah Newren
2022-09-19 4:13 ` Shaoxuan Yuan [this message]
2022-09-17 3:45 ` Shaoxuan Yuan
2022-09-08 0:18 ` [PATCH v5 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-08 0:18 ` [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-08 17:59 ` Junio C Hamano
2022-09-08 20:46 ` Derrick Stolee
2022-09-08 20:56 ` Junio C Hamano
2022-09-08 21:06 ` Shaoxuan Yuan
2022-09-09 12:49 ` Derrick Stolee
2022-09-13 17:23 ` Junio C Hamano
2022-09-10 2:04 ` Victoria Dye
2022-09-23 4:18 ` [PATCH v6 0/1] grep: integrate with sparse index Shaoxuan Yuan
2022-09-23 4:18 ` [PATCH v6 1/1] builtin/grep.c: " Shaoxuan Yuan
2022-09-23 16:40 ` Junio C Hamano
2022-09-23 16:58 ` Junio C Hamano
2022-09-26 17:28 ` Junio C Hamano
2022-09-23 14:13 ` [PATCH v6 0/1] grep: " Derrick Stolee
2022-09-23 16:01 ` Victoria Dye
2022-09-23 17:08 ` Junio C Hamano
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).