git.vger.kernel.org archive mirror
From: Elijah Newren <newren@gmail.com>
To: Derrick Stolee <stolee@gmail.com>
Cc: Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 06/27] checkout-index: ensure full index
Date: Wed, 17 Mar 2021 14:10:37 -0700
Message-ID: <CABPp-BH-c8gzrkOFFNb=8b8R+X+VRXsziKoE_RtcR4mh6zjR4g@mail.gmail.com>
In-Reply-To: <08ffff48-7b9c-7113-1a5a-557f3efff26f@gmail.com>

On Wed, Mar 17, 2021 at 1:05 PM Derrick Stolee <stolee@gmail.com> wrote:
>
> On 3/17/2021 1:50 PM, Elijah Newren wrote:
> > On Tue, Mar 16, 2021 at 2:17 PM Derrick Stolee via GitGitGadget
> > <gitgitgadget@gmail.com> wrote:
> > With the caveat in the commit message, this change looks okay, but
> > checkout-index may be buggy regardless of the presence of
> > ensure_full_index().  If ensure_full_index() really is needed here
> > because it needs to operate on all SKIP_WORKTREE paths and not just
> > leading directories, that's because it's writing all those
> > SKIP_WORKTREE entries to the working tree.  When it writes them to the
> > working tree, is it clearing the SKIP_WORKTREE bit?  If not, we're in
> > a bit of a pickle...
>
> Perhaps I'm unclear in my intentions with this series: _every_
> insertion of ensure_full_index() is intended to be audited with
> tests in the future. Some might need behavior change, and others
> will not. In this series, I'm just putting in the protections so
> we don't accidentally trigger unexpected behavior.

I think this may be part of my qualms -- what do you mean by not
accidentally triggering unexpected behavior?  In particular, does your
statement imply that whatever behavior you get after putting in
ensure_full_index() is "expected"?  I think I'm reading that
implication into it, and objecting that the behavior with the
ensure_full_index() still isn't expected.  You've only removed a
certain class of unexpected behavior, namely code that wasn't written
to expect tree entries suddenly encountering them.  You haven't handled
the "user wants to work with a subset of files, why are all these
unrelated files being munged/updated/computed/shown/etc." class of
unexpected behavior.
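
To make the two classes concrete, here is a toy C model.  It is a
sketch only, not Git's actual code: every toy_* name, the structures,
and the sample paths are invented for illustration, and
toy_ensure_full_index() merely imitates what I understand
ensure_full_index() to do (replace each collapsed directory entry with
the file entries behind it).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration; these are NOT Git's real structures. */
struct toy_entry {
	const char *path;
	int is_sparse_dir;           /* collapsed directory outside the sparse paths */
	int skip_worktree;           /* entry is not meant to be in the working tree */
	const char *const *children; /* files behind a sparse dir, NULL-terminated */
};

#define TOY_MAX 32
struct toy_index {
	struct toy_entry e[TOY_MAX];
	size_t nr;
};

static const char *const toy_docs_children[] = {
	"docs/a.txt", "docs/b.txt", NULL
};

/* Hypothetical sample: two in-cone files and one collapsed directory. */
static struct toy_index toy_sample_index(void)
{
	struct toy_index idx = { .nr = 3 };
	idx.e[0] = (struct toy_entry){ .path = "README" };
	idx.e[1] = (struct toy_entry){ .path = "docs/", .is_sparse_dir = 1,
				       .skip_worktree = 1,
				       .children = toy_docs_children };
	idx.e[2] = (struct toy_entry){ .path = "src/main.c" };
	return idx;
}

/* Toy analogue of the guard: expand every sparse dir into file entries. */
static void toy_ensure_full_index(struct toy_index *idx)
{
	struct toy_index out = { .nr = 0 };
	for (size_t i = 0; i < idx->nr; i++) {
		const struct toy_entry *ce = &idx->e[i];
		if (!ce->is_sparse_dir) {
			assert(out.nr < TOY_MAX);
			out.e[out.nr++] = *ce;
			continue;
		}
		for (const char *const *c = ce->children; *c; c++) {
			assert(out.nr < TOY_MAX);
			out.e[out.nr++] = (struct toy_entry){
				.path = *c, .skip_worktree = 1 };
		}
	}
	*idx = out;
}

/*
 * A loop that assumes every entry is a file.  The guard keeps it from
 * ever seeing a directory entry, but it now visits every file in the
 * repository, including the ones outside the sparse paths.
 */
static size_t toy_checkout_all(struct toy_index *idx)
{
	/* TODO: test if ensure_full_index() is necessary */
	toy_ensure_full_index(idx);
	size_t written = 0;
	for (size_t i = 0; i < idx->nr; i++)
		written++; /* stands in for writing e[i].path to disk */
	return written;
}

/* The alternative: operate only on the entries the user asked for. */
static size_t toy_checkout_sparse(const struct toy_index *idx)
{
	size_t written = 0;
	for (size_t i = 0; i < idx->nr; i++)
		if (!idx->e[i].is_sparse_dir && !idx->e[i].skip_worktree)
			written++;
	return written;
}
```

In this model the guarded loop touches all four files, while the
sparse-aware loop touches only the two inside the sparse paths and
leaves the index collapsed.  The first class of surprise is gone in
both; only the second loop avoids the second class.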

I'm worrying that expectations are being set up such that working with
just a small section of the code will be unusably hard.  There may be
several commands/flags where it could make sense to operate on either
(a) all files in the repo or (b) just on files within your sparse
paths.  If, though, folks interpret operate-on-all-files as the
"normal" mode (and history suggests they will), then people start
adding all kinds of --no-do-this-sparsely flags to each command, and
then users who want sparse operation have to remember to type such a
flag with each and every command they ever run -- despite having taken
at least three steps already to get a sparse-index.

The extended discussions (for _months_!) on just grep & rm, plus a
--sparse patch for ls-files being floated just in the last day,
suggest to me that this is a _very_ likely outcome, and I'm worried
about it.

> Since tests take time to write and review, I was hoping that these
> insertions were minimal enough to get us to a safe place where we
> can remove the guards carefully.
>
> So with that in mind...
>
> > Might be nice to add a
> > /* TODO: audit if this is needed; if it is, we may have other bugs... */
> > or something like that.  But then again, perhaps you're considering
> > all uses of ensure_full_index() to be need-to-be-reaudited codepaths?
> > If so, and we determine we really do need one and want to keep it
> > indefinitely, will we mark those with a comment about why it's
> > considered correct?
> >
> > I just want a way to know what still needs to be audited and what
> > doesn't without doing a lot of history spelunking...
>
> ...every insertion "needs to be audited" in the future. That's a
> big part of the next "phases" in the implementation plan.
>
> As you suggest, it might be a good idea to add a comment to every
> insertion, to mark it as un-audited, such as:
>
>         /* TODO: test if ensure_full_index() is necessary */
>
> We can come back later to delete the comment if it truly is
> necessary (and add tests to guarantee correct behavior). We can
> also remove the comment _and_ the call by modifying the loop
> behavior to do the right thing in some cases.

If "needs to be audited" covers both performance (can we operate on
fewer entries as an invisible, doesn't-change-results optimization?)
and correctness (should we operate on fewer entries and give a
modified result within a sparse-index because users would expect that,
perhaps with a special flag for users who want to operate on all files
in the repo?), and there's also agreement that either the audited or
the unaudited calls (or both) will be marked, then great, I'm happy.
If not, can we discuss which part of performance/correctness/marking
we disagree on?


Thanks,
Elijah

