All of lore.kernel.org
 help / color / mirror / Atom feed
From: ZheNing Hu <adlternative@gmail.com>
To: Elijah Newren <newren@gmail.com>
Cc: "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Victoria Dye" <vdye@github.com>,
	"Emily Shaffer" <emilyshaffer@google.com>,
	"Matheus Tavares Bernardino" <matheus.bernardino@usp.br>,
	"Shaoxuan Yuan" <shaoxuan.yuan02@gmail.com>
Subject: Re: [PATCH] [RFC] diff: introduce scope option
Date: Mon, 14 Nov 2022 17:08:21 +0800	[thread overview]
Message-ID: <CAOLTT8R1K7L5p3+FkUAKMOA5+KcQU3YM8Pq2Oe4u24zi+t9Nrg@mail.gmail.com> (raw)
In-Reply-To: <CABPp-BErUhzm5=c6Tn4nPFKx-Kx0tYwf-J=G+nOFJ2bu-r+Dhg@mail.gmail.com>

Elijah Newren <newren@gmail.com> 于2022年11月6日周日 14:58写道:
>
> Hi,
>
> On Sat, Nov 5, 2022 at 7:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> > Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
> > > On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> > > <gitgitgadget@gmail.com> wrote:
> > > >
> > > > From: ZheNing Hu <adlternative@gmail.com>
> > > >
> > > > When we use sparse-checkout, we often want the set of files
> > > > that some commands operate on to be restricted to the
> > > > sparse-checkout specification.
> > >
> > > It seems a bit premature to send this, when the guideline document[*]
> > > detailing how these options should work is still in the "Needs Review"
> > > state.  I know, it's been waiting for a while, but it's a _long_
> > > document.
> > >
> > > [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
> > >
> >
> > Fine, I just want to start trying to experiment with this feature in
> > git-diff earlier,
> > and I can wait for the sparse-checkout.txt documentation to finish
> > first if needed :)
>
> Note that you may be able to help reduce the wait by reviewing the
> document.  (You commented on v1 -- thanks! -- but no one has commented
> on any of the newer versions, despite being out for over a month and
> still being marked as "Needs Review" in the "What's cooking" reports.)
>

I've skimmed v3 in general, but haven't looked at v4 yet, I'll take a closer
look at it.

> > > > `--no-scope` is the default
> > > > option for now.
> > >
> > > What does --no-scope even mean?  You didn't explain it, and I don't
> > > see how it could make sense.  We explicitly avoided a `--no-` prefix
> > > by allowing the --scope option to take a value.  I don't think there
> > > should be a --no-scope option.
> >
> > I think the “--no-scope” here does nothing, as if it were unaffected by scope
> > (just like correctly "--scope=all", right?)
>
> Reading your patch, I was unable to determine where you made
> --no-scope behave differently than how you implemented --scope=all.
> Maybe there was a difference and I missed it.
>
> Anyway, I don't think there should be a --no-scope option.
>

Fine, I'll get rid of it.

> > > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > > > index 3674ac48e92..04bf83e8be1 100644
> > > > --- a/Documentation/diff-options.txt
> > > > +++ b/Documentation/diff-options.txt
> > > > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> > > >  non-default value and want to use the default one, then you
> > > >  have to use `--diff-algorithm=default` option.
> > > >
> > > > +ifndef::git-format-patch[]
> > > > +ifndef::git-log[]
> > > > +
> > > > +--scope={sparse|all}::
> > > > +       Choose diff scope. The variants are as follows:
> > > > ++
> > > > +--
> > > > +`--sparse`;;
> > > > +       Restrict diff paths to those matching sparse-checkout specification.
> > > > +`--all`;;
> > > > +       Without restriction, diff is performed regardless of whether the path
> > > > +       meets the sparse-checkout specification.
> > > > +--
> > > > ++
> > > > +
> > > > +endif::git-log[]
> > > > +endif::git-format-patch[]
> > >
> > > What about diff-files, diff-index, diff-tree, and show?
> > >
> >
> > diff-options.txt included in their documents, and git-format-patch.txt,
> > git-log.txt, but should not in git-format-patch.txt. I don't know if it
> > should be included in git-diff-files.txt, because git diff-files compare
> > the files in the working tree and the index (so it's the same as
> > "simple" git-diff, which should not be affected by scope?)
>
> Oh, good point.  Yeah, git-diff-files.txt should not get this option
> as it won't have any affect.  git-diff-index and git-diff-tree and
> git-show and git-log should get it, though.
>
> > > > +int diff_path_in_sparse_checkout(const char *path) {
> > > > +       if (core_sparse_checkout_cone)
> > > > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > > > +       else
> > > > +               return path_in_sparse_checkout(path, the_repository->index);
> > > > +}
> > >
> > > This says we are including the path if it matches the sparsity
> > > patterns.  Thus, we have to be careful when we use this function,
> > > because the relevant paths are ones that match the sparsity
> > > specification.  The sparsity specification will always match the
> > > sparsity patterns when diffing two commits, but when either the index
> > > or the working tree is part of the diff, the sparsity specification
> > > *might* be temporarily expanded.
> >
> > Yes, I may have to look at more code to better understand how and when the
> > "sparsity specification" is extended. Any recommendations for places to read?
>
> Yes, please review the newer version of my sparse-checkout.txt
> directions file; it covers this in more detail.
>

Ok, thanks.

> > > > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
> > >
> > > do_oneway_diff is for cases where we are diffing against the index...
> > >
> > > >                         return; /* nothing to diff.. */
> > > >         }
> > > >
> > > > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > > > +               if (idx && tree) {
> > > > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > > > +                               return;
> > > > +               } else if (idx) {
> > > > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > > > +                               return;
> > > > +               } else if (tree) {
> > > > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > > > +                               return;
> > > > +               } else
> > > > +                       return;
> > > > +       }
> > >
> > > ...and you again mistakenly only compare to the sparsity patterns
> > > instead of the sparse specification.
> > >
> >
> > So here we should use ce_skip_worktree(idx) to check if this index entry matches
> > sparse specification.
>
> ce_skip_worktree(idx) only checks for expansions to the sparse
> specification in the working tree.  If dealing with index files, the
> sparse specification is also expanded to handle all files in the index
> that differ from HEAD.
>

I will read more code to understand this detail.

> > > > +               } else
> > > > +                       return error(_("invalid --scope value: %s"), optarg);
> > > > +       }
> > >
> > > As written with no follow-on else clause here, wouldn't this silently
> > > accept "--scope" without an "=<something>" argument without an error
> > > and without properly initializing opt->scope?
> > >
> >
> > Because opt will be initializing with default_diff_options in repo_diff_setup(),
> > and opt->scope should respect config value first. So I don't think there's a
> > mistake here.
>
> Okay, it's good that you've got the variables initialized somehow, but
> that's only half the point here.  The main point is that the user can
> specify something that makes no sense and if they do, we should throw
> an error informing them of their mistake.  --scope should not be
> passed without an argument (either "all" or "sparse", currently), but
> this code allowed it.
>

Ah, it makes sense, I will fix it.

> > > > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > > > new file mode 100644
> > >
> > > This needs to be fixed.
>
> Since you didn't comment on this but usually do comment on thing, I'll
> just re-ping it to make sure you don't miss the comment about the
> incorrect file mode here.
>

Yes, I've noticed the file permissions issue here, but there may be some other
issues with this test file: it fails when running CI tests, presumably
because of
some shell syntax incompatibility.

> > > > +reset_repo () {
> > > > +       rm -rf repo &&
> > > > +       git clone --no-checkout temp repo
> > >
> > > Why --no-checkout rather than say --sparse?
> > >
> >
> > This is because I accidentally associated it with a
> > partial clone. I often use "git clone -filter=blob:none -no-checkout"
> > to do a partial clone, then "git sparse- checkout set <pattern>"
> > to set sparse-checkout patterns, and "git checkout" to download
> > the missing blobs and checkout to a branch. But in this
> > test file, we only need sparse-checkout, so it's true that I should
> > not do such strange no-checkout thing.
>
> Yeah, no need to involve partial clones here.  (As an aside, though, I
> think even with a partial clone that --sparse makes more sense than
> --no-checkout.)
>

Using --sparse may indeed be better than -no-checkout, as there is no
need to perform an additional checkout after using clone --no-checkout
and sparse-checkout. But it uses sparse-checkout cone mode by default.

A little curious, why can't we specify --no-cone mode when doing git clone
and why can't we specify sparse-checkout patterns here? If such a feature
is available, git clone and git sparse-checkout will be combined in one step.

> > > > +change_worktree_and_index() {
> > > > +       (
> > > > +               cd repo &&
> > > > +               mkdir sub2 sub3 &&
> > > > +               echo sub1/file3 >sub1/file3 &&
> > > > +               echo sub2/file3 >sub2/file3 &&
> > > > +               echo sub3/file3 >sub3/file3 &&
> > > > +               echo file3 >file3 &&
> > > > +               git add --all --sparse &&
> > > > +               echo sub1/file3 >>sub1/file3 &&
> > > > +               echo sub2/file3 >>sub2/file3 &&
> > > > +               echo sub3/file3 >>sub3/file3 &&
> > > > +               echo file3 >>file3
> > > > +       )
> > > > +}
> > >
> > > It would be nice to modify different paths in the working tree and
> > > index, to see if we can handle cases where the sparse specification
> > > does not match the sparsity patterns better.  (So, modify files inside
> > > and outside the sparsity patterns, stage those changes, and then do a
> > > `git sparse-checkout reapply` to make the files outside the sparsity
> > > patterns disappear from the working tree...then modify different files
> > > in the working tree both inside and outside the sparsity patterns.
> > > And also remove some file that matches the sparsity patterns and
> > > manually mark it as SKIP_WORKTREE.)  That'd give us much better
> > > coverage.
> > >
> >
> > Nice addition. So I should use git update-index --skip-worktree to set
> > skip_worktree bit for index entries.
>
> Well, I'd say use normal "sparse-checkout" commands to set most of
> them.  However, adding one or two extra SKIP_WORKTREE paths via
> running `git update-index --skip-worktree $PATH` (and removing the
> corresponding local file) would make the testcases more interesting.
>

Get it.

> > > I think I'm going to stop reviewing here.  I'm probably just going to
> > > keep repeating the same issues I identified earlier if I continue.
> >
> > Thank you very much for your review, you have pointed out very many
> > problems with this patch :)
>
> I'm glad it was helpful.  :)

Thanks.

ZheNing Hu

  reply	other threads:[~2022-11-14  9:09 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31  4:11 [PATCH] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
2022-11-01  1:34 ` Taylor Blau
2022-11-01  2:13   ` ZheNing Hu
2022-11-01  5:18 ` Elijah Newren
2022-11-06  2:11   ` ZheNing Hu
2022-11-06  6:58     ` Elijah Newren
2022-11-14  9:08       ` ZheNing Hu [this message]
2022-11-25  2:45 ` [PATCH v2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00   ` [PATCH v3 0/2] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00     ` [PATCH v3 1/2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00     ` [PATCH v3 2/2] [RPC] grep: " ZheNing Hu via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOLTT8R1K7L5p3+FkUAKMOA5+KcQU3YM8Pq2Oe4u24zi+t9Nrg@mail.gmail.com \
    --to=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=matheus.bernardino@usp.br \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.