git.vger.kernel.org archive mirror
From: Elijah Newren <newren@gmail.com>
To: Bert Wesarg <bert.wesarg@googlemail.com>
Cc: Matheus Tavares Bernardino <matheus.bernardino@usp.br>,
	git <git@vger.kernel.org>, Jeff King <peff@peff.net>,
	Derrick Stolee <dstolee@microsoft.com>,
	Taylor Blau <me@ttaylorr.com>
Subject: Re: git-grep in sparse checkout
Date: Tue, 1 Oct 2019 09:12:26 -0700
Message-ID: <CABPp-BGuFhDwWZBRaD3nA8ui46wor-4=Ha1G1oApsfF8KNpfGQ@mail.gmail.com>
In-Reply-To: <CAKPyHN3G-j6p8YZmk+07Sb3tL9vua_R-hJ=W81O7vCYr07AkxA@mail.gmail.com>

On Tue, Oct 1, 2019 at 6:30 AM Bert Wesarg <bert.wesarg@googlemail.com> wrote:
>
> Hi,
>
> On Tue, Oct 1, 2019 at 3:06 PM Matheus Tavares Bernardino
> <matheus.bernardino@usp.br> wrote:
> >
> > Hi,
> >
> > During Git Summit it was mentioned that git-grep searches outside
> > sparsity pattern which is not aligned with user expectation. I took a
> > quick look at it and it seems the reason is
> > builtin/grep.c:grep_cache() (which also greps worktree) will grep the
> > object store when a given index entry has the CE_SKIP_WORKTREE bit
> > turned on.
> >
>
> I also had once this problem and found that out and wrote a patch. I
> was just about to send this patch out.
>
> Btw, ls-files should also learn to skip worktree files.
>
> Stay tuned.

I too have a small patch for just grep without --cached or revisions
(it's only a few lines), but it's very incomplete as that is the only
use case it handles.  For the use cases I'm closest to, what users have
reported they want is essentially a miniature repo where they work on
stuff they care about and ignore the rest.  As such, the desired
functionality for these users is:

* git grep, by default, should only search within the sparsity pattern
* git grep --cached and git grep $REVISION should also only search
within the sparsity pattern
* git diff $REV1 $REV2, git diff $REV1, etc., should by default only
show changes within the sparsity patterns
* git log should by default only show commits modifying files matching
the sparsity patterns
* for all of these, there should be some kind of
--ignore-sparsity-patterns flag to allow searching outside the
sparsity pattern
* other commands (archive, bisect, clean?, gitk, shortlog, blame,
fsck?, etc.) likely need to pay attention to sparsity patterns as
well, but there are some special cases:

  * merge, cherry-pick, and rebase (anything touching the merge
    machinery) will need to expand the size of the non-sparse worktree
    if there are files outside the sparsity patterns with conflicts.
    (Though merge should do a better job of not expanding the
    non-sparse worktree when files can cleanly be resolved.)
  * ls-files has a -t option which can be used to differentiate which
    entries in the index are skip-worktree (S) and which are not.  As
    such, use of that flag should probably imply
    --ignore-sparsity-patterns
  * fast-export and format-patch are not about viewing history but
    about exporting it, and limiting to sparsity patterns would result
    in the creation of an incompatible history.  As such, they should
    error out unless --ignore-sparsity-patterns is given when invoked
    from a repository that has a sparse checkout.
* We may want to augment status with additional information to remind
users they are in a sparse checkout
* New worktrees, by default, should copy the sparsity patterns of the
worktree they were created from (much like a new shell inherits the
current working directory of its parent process)
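
To make the first few items concrete, here is a toy Python model of the
proposed default for grep.  It is not Git's code: IndexEntry, grep_index,
and the ignore_sparsity parameter are names made up for illustration, with
ignore_sparsity standing in for the hypothetical
--ignore-sparsity-patterns flag.

```python
import re
from dataclasses import dataclass


@dataclass
class IndexEntry:
    path: str
    content: str
    skip_worktree: bool  # models the CE_SKIP_WORKTREE bit on an index entry


def grep_index(entries, pattern, ignore_sparsity=False):
    """Return (path, line_number, line) for each matching line."""
    regex = re.compile(pattern)
    hits = []
    for entry in entries:
        # Proposed default: do not search entries outside the sparsity
        # pattern unless the caller explicitly opts out.
        if entry.skip_worktree and not ignore_sparsity:
            continue
        for lineno, line in enumerate(entry.content.splitlines(), start=1):
            if regex.search(line):
                hits.append((entry.path, lineno, line))
    return hits


entries = [
    IndexEntry("src/main.c", "int main(void)\n{\n\treturn 0;\n}\n", False),
    IndexEntry("vendor/lib.c", "int helper(void);\n", True),
]
print(grep_index(entries, r"int"))
print(grep_index(entries, r"int", ignore_sparsity=True))
```

The first call only reports the match in src/main.c; the second also
reports the one in vendor/lib.c, which is what the opt-out flag would be
for.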

I should note here that Stolee wasn't so sure about having 'log' only
show commits that touched files within the sparse patterns, so we may
need some kind of config setting, along with a good usability story for
what each setting means and which use cases it serves, to guide how we
handle the weird cases.

Also, as if that weren't enough, there are two more challenges:
1) As pointed out by Dscho in the contributor summit, we want
intersection of pathspecs specified by the user and those in the
sparsity patterns; e.g. if the user says `git diff $REV -- '*.c'`, we
want to show them a diff against $REV of all .c files that are within
their sparsity patterns.
2) We have two different kinds of path patterns, one for .gitignore
and sparse-checkout, and the other for command-line pathspecs.  See
https://public-inbox.org/git/xmqq4l1qpiaw.fsf@gitster-ct.c.googlers.com/.
These differences might make the implementation more difficult, and
making the two types of path patterns have more overlap might be a
necessary first step.
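
As a rough illustration of the intersection in (1), here is a small
Python sketch.  The function names are hypothetical, and it deliberately
matches both the pathspecs and the sparsity patterns with fnmatch, which
glosses over exactly the difference described in (2); real pathspec and
sparse-checkout matching follow different rules.

```python
from fnmatch import fnmatchcase


def matches_any(path, patterns):
    """True if the path matches at least one glob pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def paths_to_show(paths, pathspecs, sparsity_patterns):
    """Keep paths matching a user pathspec AND a sparsity pattern."""
    return [
        path
        for path in paths
        if matches_any(path, pathspecs)
        and matches_any(path, sparsity_patterns)
    ]


paths = ["src/main.c", "src/util.h", "docs/guide.txt", "vendor/lib.c"]
print(paths_to_show(paths, ["*.c"], ["src/*", "docs/*"]))
```

Here vendor/lib.c matches '*.c' but falls outside the sparsity patterns,
so only src/main.c survives the intersection.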

However, the "work with a miniature repo" model probably makes the VFS
for Git and partial clone stuff easier -- we don't have to worry about the
incessant need to download more blobs after the initial partial clone
because git commands by default would avoid requesting them.  It also
would work quite nicely with a partial index -- we could have a
directory entry in the index and marked as skip-worktree and avoid
having all the paths under it show up in the index.  That would
accelerate many operations within git.
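
A sketch of what I mean by a partial index, in Python and under loud
assumptions: this is not an existing Git data structure, the function name
is made up, and it only collapses top-level directories.  The idea is that
when every entry under a directory is skip-worktree, a single directory
entry can stand in for all of them.

```python
def collapse_sparse_dirs(entries):
    """entries: list of (path, skip_worktree) in index order.

    Replace all entries under a top-level directory with one
    (directory + "/", True) entry when every one of them is skip-worktree.
    """
    # Record, per top-level directory, whether all entries are skipped.
    all_skipped = {}
    for path, skip in entries:
        top, sep, _ = path.partition("/")
        if sep:
            all_skipped[top] = all_skipped.get(top, True) and skip

    collapsed = []
    emitted = set()
    for path, skip in entries:
        top, sep, _ = path.partition("/")
        if sep and all_skipped[top]:
            # Emit the directory entry once, in place of its first path.
            if top not in emitted:
                emitted.add(top)
                collapsed.append((top + "/", True))
        else:
            collapsed.append((path, skip))
    return collapsed


index = [
    ("README", False),
    ("src/main.c", False),
    ("src/util.c", False),
    ("vendor/a/x.c", True),
    ("vendor/a/y.c", True),
    ("vendor/b/z.c", True),
]
print(collapse_sparse_dirs(index))
```

The three vendor/ paths shrink to one entry while everything inside the
sparsity pattern stays listed path by path.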



I'd love to work on this, but I've got plenty of other things on my
plate at the moment, so I probably won't get time for it at least
until the middle of next year.  But I thought I'd send out what I view
as the bigger picture.  Also, this is very much still idea stage; the
contributor summit refined some of the ideas and there may be more
refinement as more people on the list chime in.


Hope that helps,
Elijah
