git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Elijah Newren <newren@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: "ZheNing Hu via GitGitGadget" <gitgitgadget@gmail.com>,
	git@vger.kernel.org,
	"Christian Couder" <christian.couder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jeff King" <peff@peff.net>, "Junio C Hamano" <gitster@pobox.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Victoria Dye" <vdye@github.com>,
	"Emily Shaffer" <emilyshaffer@google.com>,
	"Matheus Tavares Bernardino" <matheus.bernardino@usp.br>,
	"Shaoxuan Yuan" <shaoxuan.yuan02@gmail.com>
Subject: Re: [PATCH] [RFC] diff: introduce scope option
Date: Sat, 5 Nov 2022 23:58:26 -0700	[thread overview]
Message-ID: <CABPp-BErUhzm5=c6Tn4nPFKx-Kx0tYwf-J=G+nOFJ2bu-r+Dhg@mail.gmail.com> (raw)
In-Reply-To: <CAOLTT8TceM-NpV2_hUCZj2Dx=W30f_9SHW8CcRH-pw32BRd-oA@mail.gmail.com>

Hi,

On Sat, Nov 5, 2022 at 7:11 PM ZheNing Hu <adlternative@gmail.com> wrote:
> Elijah Newren <newren@gmail.com> 于2022年11月1日周二 13:18写道:
> > On Sun, Oct 30, 2022 at 9:11 PM ZheNing Hu via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > >
> > > From: ZheNing Hu <adlternative@gmail.com>
> > >
> > > When we use sparse-checkout, we often want the set of files
> > > that some commands operate on to be restricted to the
> > > sparse-checkout specification.
> >
> > It seems a bit premature to send this, when the guideline document[*]
> > detailing how these options should work is still in the "Needs Review"
> > state.  I know, it's been waiting for a while, but it's a _long_
> > document.
> >
> > [*] https://lore.kernel.org/git/pull.1367.v3.git.1665269538608.gitgitgadget@gmail.com/
> >
>
> Fine, I just want to start trying to experiment with this feature in
> git-diff earlier,
> and I can wait for the sparse-checkout.txt documentation to finish
> first if needed :)

Note that you may be able to help reduce the wait by reviewing the
document.  (You commented on v1 -- thanks! -- but no one has commented
on any of the newer versions, despite being out for over a month and
still being marked as "Needs Review" in the "What's cooking" reports.)

> > > `--no-scope` is the default
> > > option for now.
> >
> > What does --no-scope even mean?  You didn't explain it, and I don't
> > see how it could make sense.  We explicitly avoided a `--no-` prefix
> > by allowing the --scope option to take a value.  I don't think there
> > should be a --no-scope option.
>
> I think the “--no-scope” here does nothing, as if it were unaffected by scope
> (just like correctly "--scope=all", right?)

Reading your patch, I was unable to determine where you made
--no-scope behave differently than how you implemented --scope=all.
Maybe there was a difference and I missed it.

Anyway, I don't think there should be a --no-scope option.

> > > diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> > > index 3674ac48e92..04bf83e8be1 100644
> > > --- a/Documentation/diff-options.txt
> > > +++ b/Documentation/diff-options.txt
> > > @@ -195,6 +195,24 @@ For instance, if you configured the `diff.algorithm` variable to a
> > >  non-default value and want to use the default one, then you
> > >  have to use `--diff-algorithm=default` option.
> > >
> > > +ifndef::git-format-patch[]
> > > +ifndef::git-log[]
> > > +
> > > +--scope={sparse|all}::
> > > +       Choose diff scope. The variants are as follows:
> > > ++
> > > +--
> > > +`--sparse`;;
> > > +       Restrict diff paths to those matching sparse-checkout specification.
> > > +`--all`;;
> > > +       Without restriction, diff is performed regardless of whether the path
> > > +       meets the sparse-checkout specification.
> > > +--
> > > ++
> > > +
> > > +endif::git-log[]
> > > +endif::git-format-patch[]
> >
> > What about diff-files, diff-index, diff-tree, and show?
> >
>
> diff-options.txt included in their documents, and git-format-patch.txt,
> git-log.txt, but should not in git-format-patch.txt. I don't know if it
> should be included in git-diff-files.txt, because git diff-files compare
> the files in the working tree and the index (so it's the same as
> "simple" git-diff, which should not be affected by scope?)

Oh, good point.  Yeah, git-diff-files.txt should not get this option
as it won't have any affect.  git-diff-index and git-diff-tree and
git-show and git-log should get it, though.

> > > +int diff_path_in_sparse_checkout(const char *path) {
> > > +       if (core_sparse_checkout_cone)
> > > +               return path_in_cone_mode_sparse_checkout(path, the_repository->index);
> > > +       else
> > > +               return path_in_sparse_checkout(path, the_repository->index);
> > > +}
> >
> > This says we are including the path if it matches the sparsity
> > patterns.  Thus, we have to be careful when we use this function,
> > because the relevant paths are ones that match the sparsity
> > specification.  The sparsity specification will always match the
> > sparsity patterns when diffing two commits, but when either the index
> > or the working tree is part of the diff, the sparsity specification
> > *might* be temporarily expanded.
>
> Yes, I may have to look at more code to better understand how and when the
> "sparsity specification" is extended. Any recommendations for places to read?

Yes, please review the newer version of my sparse-checkout.txt
directions file; it covers this in more detail.

> > > @@ -439,6 +459,20 @@ static void do_oneway_diff(struct unpack_trees_options *o,
> >
> > do_oneway_diff is for cases where we are diffing against the index...
> >
> > >                         return; /* nothing to diff.. */
> > >         }
> > >
> > > +       if (revs->diffopt.scope == DIFF_SCOPE_SPARSE) {
> > > +               if (idx && tree) {
> > > +                       if (!diff_paths_in_sparse_checkout(idx->name, tree->name))
> > > +                               return;
> > > +               } else if (idx) {
> > > +                       if (!diff_path_in_sparse_checkout(idx->name))
> > > +                               return;
> > > +               } else if (tree) {
> > > +                       if (!diff_path_in_sparse_checkout(tree->name))
> > > +                               return;
> > > +               } else
> > > +                       return;
> > > +       }
> >
> > ...and you again mistakenly only compare to the sparsity patterns
> > instead of the sparse specification.
> >
>
> So here we should use ce_skip_worktree(idx) to check if this index entry matches
> sparse specification.

ce_skip_worktree(idx) only checks for expansions to the sparse
specification in the working tree.  If dealing with index files, the
sparse specification is also expanded to handle all files in the index
that differ from HEAD.

> > > +               } else
> > > +                       return error(_("invalid --scope value: %s"), optarg);
> > > +       }
> >
> > As written with no follow-on else clause here, wouldn't this silently
> > accept "--scope" without an "=<something>" argument without an error
> > and without properly initializing opt->scope?
> >
>
> Because opt will be initializing with default_diff_options in repo_diff_setup(),
> and opt->scope should respect config value first. So I don't think there's a
> mistake here.

Okay, it's good that you've got the variables initialized somehow, but
that's only half the point here.  The main point is that the user can
specify something that makes no sense and if they do, we should throw
an error informing them of their mistake.  --scope should not be
passed without an argument (either "all" or "sparse", currently), but
this code allowed it.

> > > diff --git a/t/t4070-diff-sparse-checkout-scope.sh b/t/t4070-diff-sparse-checkout-scope.sh
> > > new file mode 100644
> >
> > This needs to be fixed.

Since you didn't comment on this but usually do comment on thing, I'll
just re-ping it to make sure you don't miss the comment about the
incorrect file mode here.

> > > +reset_repo () {
> > > +       rm -rf repo &&
> > > +       git clone --no-checkout temp repo
> >
> > Why --no-checkout rather than say --sparse?
> >
>
> This is because I accidentally associated it with a
> partial clone. I often use "git clone -filter=blob:none -no-checkout"
> to do a partial clone, then "git sparse- checkout set <pattern>"
> to set sparse-checkout patterns, and "git checkout" to download
> the missing blobs and checkout to a branch. But in this
> test file, we only need sparse-checkout, so it's true that I should
> not do such strange no-checkout thing.

Yeah, no need to involve partial clones here.  (As an aside, though, I
think even with a partial clone that --sparse makes more sense than
--no-checkout.)

> > > +change_worktree_and_index() {
> > > +       (
> > > +               cd repo &&
> > > +               mkdir sub2 sub3 &&
> > > +               echo sub1/file3 >sub1/file3 &&
> > > +               echo sub2/file3 >sub2/file3 &&
> > > +               echo sub3/file3 >sub3/file3 &&
> > > +               echo file3 >file3 &&
> > > +               git add --all --sparse &&
> > > +               echo sub1/file3 >>sub1/file3 &&
> > > +               echo sub2/file3 >>sub2/file3 &&
> > > +               echo sub3/file3 >>sub3/file3 &&
> > > +               echo file3 >>file3
> > > +       )
> > > +}
> >
> > It would be nice to modify different paths in the working tree and
> > index, to see if we can handle cases where the sparse specification
> > does not match the sparsity patterns better.  (So, modify files inside
> > and outside the sparsity patterns, stage those changes, and then do a
> > `git sparse-checkout reapply` to make the files outside the sparsity
> > patterns disappear from the working tree...then modify different files
> > in the working tree both inside and outside the sparsity patterns.
> > And also remove some file that matches the sparsity patterns and
> > manually mark it as SKIP_WORKTREE.)  That'd give us much better
> > coverage.
> >
>
> Nice addition. So I should use git update-index --skip-worktree to set
> skip_worktree bit for index entries.

Well, I'd say use normal "sparse-checkout" commands to set most of
them.  However, adding one or two extra SKIP_WORKTREE paths via
running `git update-index --skip-worktree $PATH` (and removing the
corresponding local file) would make the testcases more interesting.

> > I think I'm going to stop reviewing here.  I'm probably just going to
> > keep repeating the same issues I identified earlier if I continue.
>
> Thank you very much for your review, you have pointed out very many
> problems with this patch :)

I'm glad it was helpful.  :)

  reply	other threads:[~2022-11-06  6:58 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-31  4:11 [PATCH] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
2022-11-01  1:34 ` Taylor Blau
2022-11-01  2:13   ` ZheNing Hu
2022-11-01  5:18 ` Elijah Newren
2022-11-06  2:11   ` ZheNing Hu
2022-11-06  6:58     ` Elijah Newren [this message]
2022-11-14  9:08       ` ZheNing Hu
2022-11-25  2:45 ` [PATCH v2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00   ` [PATCH v3 0/2] [RFC] diff: introduce scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00     ` [PATCH v3 1/2] [RFC] diff: introduce --scope option ZheNing Hu via GitGitGadget
2022-11-29 12:00     ` [PATCH v3 2/2] [RPC] grep: " ZheNing Hu via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BErUhzm5=c6Tn4nPFKx-Kx0tYwf-J=G+nOFJ2bu-r+Dhg@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=johannes.schindelin@gmx.de \
    --cc=matheus.bernardino@usp.br \
    --cc=peff@peff.net \
    --cc=shaoxuan.yuan02@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).