git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Shaoxuan Yuan <shaoxuan.yuan02@gmail.com>
Cc: Derrick Stolee <derrickstolee@github.com>,
	Victoria Dye <vdye@github.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v5 1/3] builtin/grep.c: add --sparse option
Date: Sat, 17 Sep 2022 21:24:23 -0700	[thread overview]
Message-ID: <CABPp-BG4_JepP089uOwcRZVcnEM_C_-OvsUzAtPkZdAEyuJTHw@mail.gmail.com> (raw)
In-Reply-To: <fed3c401-6cba-c729-74a8-d0bf53e12699@gmail.com>

On Fri, Sep 16, 2022 at 8:34 PM Shaoxuan Yuan <shaoxuan.yuan02@gmail.com> wrote:
>
> > I'm also curious whether there shouldn't be a config option for
> > something like this, so folks don't have to specify it with every
> > invocation.  In particular, while I certainly have users that want to
> > just query git for information about the part of the history they are
> > interested in, there are other users who are fully aware they are
> > working in a bigger repository and want to search for additional
> > things to add to their sparse-checkout and predominantly use grep for
> > things like that.  They have even documented that `git grep --cached
> > <TERM>` can be used in sparse-checkouts for this purpose...and have
> > been using that for a few years.  (I did warn them at the time that
> > there was a risk they'd have to change their command, but it's still
> > going to be a behavioral change they might not expect.)  Further, when
> > I brought up changing the behavior of commands during sparse-checkouts
> > to limit to files matching the sparsity paths in that old thread at
> > [1], Stolee was a bit skeptical of making that the default.  That
> > suggests, at least, that two independent groups of users would want to
> > use the non-sparse searching frequently, and frequently enough that
> > they'd appreciate a config option.
>
> A config option sounds good. Though I think
>
> 1. If this option is for global behavior: users may better off turning
> off sparse-checkout if they want a config to do things densely everywhere.

Sorry, it sounds like I haven't explained the usecases to you very
well.  Let me try again.

There are people who want to do everything densely, as you say, and
those folks can just turn off sparse-checkout or not use it in the
first place.  Git has traditionally catered to these folks just fine.
However, it's not a subset of interest for this discussion and wasn't
what I was talking about.

There are (at least) two different usecases for people wanting to use
sparse-checkouts; I have users that fall under each category:


1) Working on a repository subset; users are _only_ interested in that subset.

This usecase is very poorly supported in Git right now, but I think
you understand it so I'll only briefly describe it.

These folks might know there are other things in the repository, but
don't care.  Not only should the working tree be sparse, but grep,
log, diff, etc. should be restricted to the subset of the tree they
are interested in.

Restricting operations to the sparsity specification is also important
for marrying partial clones with sparse checkouts while allowing
disconnected development.  Without such a restrict-to-sparsity-paths
feature, the partial clones will attempt to download objects the first
time they try to grep an old revision, or do log with a glob path.
The download will fail, causing the operation to fail, and break the
ability of the user to work in a disconnected manner.


2) The working directory is sparse, but users are working in a larger whole.

Stolee described this usecase this way[2]:

"I'm also focused on users that know that they are a part of a larger
whole. They know they are operating on a large repository but focus on
what they need to contribute their part. I expect multiple "roles" to
use very different, almost disjoint parts of the codebase. Some other
"architect" users operate across the entire tree or hop between different
sections of the codebase as necessary. In this situation, I'm wary of
scoping too many features to the sparse-checkout definition, especially
"git log," as it can be too confusing to have their view of the codebase
depend on your "point of view."

[2] https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/

I describe it very similarly, but I'd like to point out something
additional around this usecase and how it can be influenced by
dependencies.  The first cut for sparse-checkouts is usually the
directories you are interested in plus what those directories depend
upon within your repository.  But there's a monkey wrench here: if you
have integration tests, they invert the hierarchy: to run integration
tests, you need not only what you are interested in and its
dependencies, you also need everything that depends upon what you are
interested in or that depends upon one of your dependencies...AND you
need all the dependencies of that expanded group.  That can easily
change your sparse-checkout into a nearly dense one.  Naturally, that
tends to kill the benefits of sparse-checkouts.  There are a couple
solutions to this conundrum: either avoid grabbing dependencies (maybe
have built versions of your dependencies pulled from a CI cache
somewhere), or say that users shouldn't run integration tests directly
and instead do it on the CI server when they submit a code review.  Or
do both.  Regardless of whether you stub out your dependencies or stub
out the things that depend upon you, there is certainly a reason to
want to query and be aware of those other parts of the repository.
Thus, sparse-checkouts can be used to limit what you directly build
and modify, but these users do not want it to limit their queries of
history.


Once users pick either the first or the second usecase, they often
stick within it.  For either group, regardless of what Git's default
is, needing to specify an additional flag for *every*
grep/log/diff/etc. they run would just be a total annoyance.  Neither
wants a dense worktree, but one side wants a dense history query while
the other wants a sparse one.  Different groups should be able to
configure the default that works well for them, much like we allow
users to configure whether they want "git pull" to rebase or merge.

> 2. If this option is for a single subcommand (e.g. 'grep'): I don't have
> much thoughts here. It certainly can be nice for users who need to do
> non-sparse searching frequently. This design, if necessary, should
> belong to a patch where this config is added for every single subcommand?
>
> > I also brought up in that old thread that perhaps we want to avoid
> > adding a flag to every subcommand, and instead just having a
> > git-global flag for triggering this type of behavior.  (e.g. `git
> > --no-restrict grep --cached ...` or `git --dense grep --cached ...`).
>
> This looks more like the answer to me. It's a peace of mind for users if
> they don't have to worry about whether a subcommand is sparse-aware, and
> how may their behaviors differ. Though we still may need to update the
> actual behavior in each subcommand over an extended period of time
> (though may not be difficult?), which you mentioned above "seems like a
> poor strategy".
>
> > [1] https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
> > and the responses to that email>
> >> Change the default behavior of 'git grep' to focus on the files within
> >> the sparse-checkout definition. To enable the previous behavior, add a
> >> '--sparse' option to 'git grep' that triggers the old behavior that
> >> inspects paths outside of the sparse-checkout definition when paired
> >> with the '--cached' option.
> >
> > I still think the flag name of `--sparse` is totally backwards and
> > highly confusing for the described behavior.  I missed Stolee's email
> > at the time (wasn't cc'ed) where he brought up that "--sparse" had
> > already been added to "git-add" and "git-rm", but in those cases the
> > commands aren't querying and I just don't see how they lead to the
> > same level of user confusion.  This one seems glaringly wrong to me
> > and both Junio and I flagged it on v1 when we first saw it.  (Perhaps
> > it also helps that for the add/rm cases, that a user is often given an
> > error message with the suggested flag to use, which just doesn't make
> > sense here either.)  If there is concern that this flag should be the
> > same as add and rm, then I think we need to do the backward
> > compatibility dance and fix add and rm by adding an alias over there
> > so that grep's flag won't be so confusing.
>
> I guess I'm using "--sparse" here because "add", "rm" and "mv" all imply
> that "when operating on a sparse path, ignores/warns unless '--sparse'
> is used". I take it as an analogy so "when searching a sparse path,
> ignores/warns unless '--sparse' is used". As the idea that "Git does
> *not* care sparse contents unless '--[no-]sparse' is specified" is sort
> of established through the implementations in "add", "rm", or "mv", I
> don't see a big problem using "--sparse" here.

Well, I do.

In addition to just being utterly backwards and confusing in the
context of grep:
  * Both `clone` and `ls-files` use `--sparse` to mean to limit things
to the sparsity cone, so we're already kinda split-brained.
  * grep is more like ls-files (both being querying functions) than
add/rm/mv, so should really follow its lead instead of the one from
add/rm/mv.
  * There's another way to interpret `--sparse` for `add` and `rm`
such that it makes sense (at least to me); see my other email to Junio
in this thread.
  * `mv` is indeed using it backward, but the `mv` change is new to
this cycle (and undocumented) so I'm not sure it counts as much of a
precedent yet.

> I *think*, as long as the users are informed that the default is to
> ignore things outside of the sparse-checkout definition, and they have
> to do something (using "--sparse" or a potential better name) to
> override the default, we are safe to use a name that is famous (i.e.
> "--sparse") even though its literal meaning is not perfectly descriptive.
>
> One outlier I do find confusing though, is the "--sparse" option from
> "git-ls-files". Without it, Git expands the index and show everything
> outside of sparse-checkout definition, which seems a bit controversial.

Nah, that perfectly matches the expectation of users in the second
usecase above -- querying (ls-files/grep/log/diff) defaults to
non-restricted history, modifying (add/rm/mv) defaults to restricted
paths but warns if the arguments could have matched something else,
and the working tree is restricted to sparse paths.  It doesn't seem
too controversial to me, even if it's not what we want for the
long-term default.

The defaults for the first usecase would be defaulting to restricted
paths for everything, and perhaps not warn if arguments to a modifying
command could have matched something else.


Anyway, hope that helps you understand my perspective and framing.

  reply	other threads:[~2022-09-18  4:32 UTC|newest]

Thread overview: 69+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-17  7:56 [PATCH v1 0/2] grep: integrate with sparse index Shaoxuan Yuan
2022-08-17  7:56 ` [PATCH v1 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-17 14:12   ` Derrick Stolee
2022-08-17 17:13     ` Junio C Hamano
2022-08-17 17:34       ` Victoria Dye
2022-08-17 17:43         ` Derrick Stolee
2022-08-17 18:47           ` Junio C Hamano
2022-08-17 17:37     ` Elijah Newren
2022-08-24 18:20     ` Shaoxuan Yuan
2022-08-24 19:08       ` Derrick Stolee
2022-08-17  7:56 ` [PATCH v1 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-17 14:23   ` Derrick Stolee
2022-08-24 21:06     ` Shaoxuan Yuan
2022-08-25  0:39       ` Derrick Stolee
2022-08-17 13:46 ` [PATCH v1 0/2] grep: " Derrick Stolee
2022-08-29 23:28 ` [PATCH v2 " Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 1/2] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-08-29 23:28   ` [PATCH v2 2/2] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-08-30 13:45     ` Derrick Stolee
2022-09-01  4:57 ` [PATCH v3 0/3] grep: " Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-01  4:57   ` [PATCH v3 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-01 17:03     ` Derrick Stolee
2022-09-01 18:31       ` Shaoxuan Yuan
2022-09-01 17:17     ` Junio C Hamano
2022-09-01 17:27       ` Junio C Hamano
2022-09-01 22:49         ` Shaoxuan Yuan
2022-09-01 22:36       ` Shaoxuan Yuan
2022-09-02  3:28     ` Victoria Dye
2022-09-02 18:47       ` Shaoxuan Yuan
2022-09-03  0:36 ` [PATCH v4 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-03  0:36   ` [PATCH v4 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-03  4:39     ` Junio C Hamano
2022-09-08  0:24       ` Shaoxuan Yuan
2022-09-08  0:18 ` [PATCH v5 0/3] grep: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 1/3] builtin/grep.c: add --sparse option Shaoxuan Yuan
2022-09-10  1:07     ` Victoria Dye
2022-09-14  6:08     ` Elijah Newren
2022-09-15  2:57       ` Junio C Hamano
2022-09-18  2:14         ` Elijah Newren
2022-09-18 19:52           ` Victoria Dye
2022-09-19  1:23             ` Junio C Hamano
2022-09-19  4:27             ` Shaoxuan Yuan
2022-09-19 11:03             ` Ævar Arnfjörð Bjarmason
2022-09-20  7:13             ` Elijah Newren
2022-09-17  3:34       ` Shaoxuan Yuan
2022-09-18  4:24         ` Elijah Newren [this message]
2022-09-19  4:13           ` Shaoxuan Yuan
2022-09-17  3:45       ` Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 2/3] builtin/grep.c: integrate with sparse index Shaoxuan Yuan
2022-09-08  0:18   ` [PATCH v5 3/3] builtin/grep.c: walking tree instead of expanding index with --sparse Shaoxuan Yuan
2022-09-08 17:59     ` Junio C Hamano
2022-09-08 20:46       ` Derrick Stolee
2022-09-08 20:56         ` Junio C Hamano
2022-09-08 21:06           ` Shaoxuan Yuan
2022-09-09 12:49           ` Derrick Stolee
2022-09-13 17:23         ` Junio C Hamano
2022-09-10  2:04     ` Victoria Dye
2022-09-23  4:18 ` [PATCH v6 0/1] grep: integrate with sparse index Shaoxuan Yuan
2022-09-23  4:18   ` [PATCH v6 1/1] builtin/grep.c: " Shaoxuan Yuan
2022-09-23 16:40     ` Junio C Hamano
2022-09-23 16:58     ` Junio C Hamano
2022-09-26 17:28       ` Junio C Hamano
2022-09-23 14:13   ` [PATCH v6 0/1] grep: " Derrick Stolee
2022-09-23 16:01   ` Victoria Dye
2022-09-23 17:08     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BG4_JepP089uOwcRZVcnEM_C_-OvsUzAtPkZdAEyuJTHw@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).