All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Elijah Newren <newren@gmail.com>,
	Derrick Stolee via GitGitGadget <gitgitgadget@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Victoria Dye <vdye@github.com>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: Re: [PATCH 2/2] ls-files: add --sparse option
Date: Mon, 22 Nov 2021 14:44:20 -0500	[thread overview]
Message-ID: <191b2db8-1be4-d6a4-1cf6-eaefaf373dd2@gmail.com> (raw)
In-Reply-To: <CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com>


On 11/22/2021 1:36 PM, Elijah Newren wrote:
> On Tue, Nov 16, 2021 at 7:38 AM Derrick Stolee via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
>>
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> Existing callers to 'git ls-files' are expecting file names, not
>> directories. It is best to expand a sparse index to show all of the
>> contained files in this case.
>>
>> However, expert users may want to inspect the contents of the index
>> itself including which directories are sparse. Add a --sparse option to
>> allow users to request this information.
>>
>> During testing, I noticed that options such as --modified did not affect
>> the output when the files in question were outside the sparse-checkout
>> definition. Tests are added to document this preexisting behavior and
>> how it remains unchanged with the sparse index and the --sparse option.
> 
> Sure, 'git diff' and 'git status' don't show anything for such files
> either; we're being consistent.  However, this feels like it's
> normalizing an erroneous condition, possibly starting to cement one of
> the bigger UI problems we have the sparse-checkout (and sparse-index):
> 
> I assert that: Having tracked files be SKIP_WORKTREE despite having a
> file present in the working directory is typically an *error*, and one
> that we need to help users prevent/discover/recover-from.

I do think we need to tread carefully here, since some users use
'git update-index --skip-worktree' to mark a file in their
worktree as "ignored" so they can change it from HEAD without it
ever being staged. This is usually independent of the sparse-
checkout feature, so you might want to focus your efforts to paths
that exist in the worktree but are outside of the sparse-checkout
patterns.

> * I've seen users get into this condition by doing the equivalent of
> either 'git sparse-checkout disable' or 'git sparse-checkout set
> $NEW_PATHS' and then hit Ctrl-C in the middle of its operation.
> * Users also just occasionally copy files from other places (maybe
> even `git show REVISION:FILE >FILE`), and potentially edit.
> 
> In either of the above cases:
> * There is no command provided for discovering these files; not diff,
> status, ls-files, or anything
> * There is no command provided for correcting the problem; not clean,
> not reset, or anything.  They have to do it by hand
> * There are ample opportunities for the error to propagate and grow,
> due to the fact that these files will not be reported or updated.[1]
> 
> [1] For example, the user may first either switch to another branch or
> resets to some other commit.  Neither of those operations update these
> erroneous present-despite-SKIP_WORKTREE files.  And there will still
> be nothing that reports them.  But now the files are out-of-date with
> respect to the new commit.  Now if the user disables or changes the
> sparse checkout, they suddenly have several "changed" files in their
> working tree -- which they didn't touch.  In the best case, they run
> diff or status or something and notice the files and then correct it,
> and just get perplexed at where the changes came from.  But with
> enough users, some percentage of them are just going to commit (some
> of) those changes.  When someone else asks why they modified those
> files, they'll (correctly, as it turns out) claim they never did and
> that no program on their system did.  They might even think those
> files weren't part of the commit and claim that git modified the
> commit behind their back, which would be wrong, but there won't be any
> reasonable logs to check to prove what happened.
> 
> 
> I know this issue is out of scope for your series here, but the
> addition of testcases that purposely set up an erroneous condition
> makes it feel like we're trying to normalize that situation and not
> treat it like an error, and are perhap undercutting future attempts to
> try to fix it.  I'd rather have it explicitly stated that we're
> setting up an erroneous condition in our tests, in order to make sure
> we handle it as best as we can so far -- in a manner in line with diff
> and grep -- but also note that later solutions to this deeper issue
> may affect one or more of these commands.

I appreciate the attention to this scenario, but I also think
that a patch series whose goal is to change how we react when
in this case already needs to make a case for changing the
behavior. Having tests that exist documenting the previous
behavior form a good basis for "it did this before, but now it
will do this," which hopefully further justifies the change.

Personally, I think of _every_ test as "this is what the code
does right now" and the purpose of a test is to show that we
don't change things _unintentionally_. Every test can be changed
if there is sufficient evidence that it should.

>> @@ -838,6 +844,7 @@ test_expect_success 'sparse-index is not expanded' '
>>         init_repos &&
>>
>>         ensure_not_expanded status &&
>> +       ensure_not_expanded ls-files --sparse &&
> 
> Do we also want a
>     ensure_not_expanded ls-files &&
> ?  We don't expect ls-files to write a new index file in any scenario, right?

When the index is sparse, 'git ls-files' will expand before
listing all of the contents, to avoid listing a sparse
directory entry. One way to avoid ensure_full_index() would
be to do trickery with read_tree_at() whenever we find a
sparse directory entry and use the callback to output what
ls-files would have written. However, that's pretty much the
same amount of work as what ensure_full_index() is doing, so
there is likely little benefit to complicating the code for
this reason.

>>         ensure_not_expanded commit --allow-empty -m empty &&
>>         echo >>sparse-index/a &&
>>         ensure_not_expanded commit -a -m a &&
>> @@ -942,6 +949,46 @@ test_expect_success 'sparse index is not expanded: fetch/pull' '
>>         ensure_not_expanded pull full base
>>  '
>>
>> +test_expect_success 'ls-files' '
>> +       init_repos &&
>> +
>> +       # Behavior agrees by default. Sparse index is expanded.
>> +       test_all_match git ls-files &&
>> +
>> +       # With --sparse, the sparse index data changes behavior.
>> +       git -C sparse-index ls-files --sparse >sparse-index-out &&
>> +       grep "^folder1/\$" sparse-index-out &&
>> +       grep "^folder2/\$" sparse-index-out &&
>> +
>> +       # With --sparse and no sparse index, nothing changes.
>> +       git -C sparse-checkout ls-files --sparse >sparse-checkout-out &&
>> +       grep "^folder1/0/0/0\$" sparse-checkout-out &&
>> +       ! grep "/\$" sparse-checkout-out &&
> 
> I'd be tempted to split the test up to this point from the rest of the test.

Every instance of "init_repos" adds to the cost of this test script,
so when possible I'd err on the side of grouping them. It also keeps
test able to isolate with "--run=1,<N>".

>> +
>> +       write_script edit-content <<-\EOF &&
>> +       mkdir folder1 &&
>> +       echo content >>folder1/a
>> +       EOF
>> +       run_on_sparse ../edit-content &&
> 
> As above, since folder1/a is a tracked file, I'd rather we explicitly
> called out that we're intentionally setting up an erroneous condition.
> 
>> +
>> +       # ls-files does not notice modified files whose
>> +       # cache entries are marked SKIP_WORKTREE.
> 
> ...nor does diff, status, etc., as per my lengthy comment from the
> beginning of the email.

I think the existence of this comment points out that "this is
such a strange situation that it requires explanation." A slight
comment change such as

	# In this strange situation where a tracked file
	# exists in the worktree but is marked with the
	# SKIP_WORKTREE bit in the index, Git ignores the
	# worktree contents.

That both points out that this case is unusual, but also that
ls-files isn't the only command that does this.
> Other than my concerns about careful messages with erroneous
> conditions (and the minor question about also having a
> ensure_not_expanded without the --sparse flag), the patch looks good
> to me.
 
Thanks for your careful read!
-Stolee

  reply	other threads:[~2021-11-22 19:44 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16 15:38 [PATCH 0/2] Sparse index: fetch, pull, ls-files Derrick Stolee via GitGitGadget
2021-11-16 15:38 ` [PATCH 1/2] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-11-16 15:38 ` [PATCH 2/2] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-11-22 18:36   ` Elijah Newren
2021-11-22 19:44     ` Derrick Stolee [this message]
2021-11-23  2:07   ` Ævar Arnfjörð Bjarmason
2021-12-08 15:14     ` Derrick Stolee
2021-12-08 15:20       ` Derrick Stolee
2021-12-08 17:04       ` Elijah Newren
2021-12-08 18:23         ` Derrick Stolee
2021-12-08 18:36           ` Elijah Newren
2021-12-08 19:06             ` Derrick Stolee
2021-12-09 12:50               ` Ævar Arnfjörð Bjarmason
2021-12-10 13:57                 ` Derrick Stolee
2021-12-10 15:13                   ` Ævar Arnfjörð Bjarmason
2021-12-13 19:16                   ` Junio C Hamano
2021-12-16 14:11                     ` Derrick Stolee
2021-11-17  9:29 ` [PATCH 0/2] Sparse index: fetch, pull, ls-files Junio C Hamano
2021-11-17 15:28   ` Derrick Stolee
2021-11-18 22:13     ` Junio C Hamano
2021-11-23  1:57 ` Ævar Arnfjörð Bjarmason
2021-12-08 19:39 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2021-12-08 19:39   ` [PATCH v2 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-08 19:39   ` [PATCH v2 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-09  5:08     ` Elijah Newren
2021-12-10 13:51       ` Derrick Stolee
2021-12-08 19:39   ` [PATCH v2 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-09  5:19     ` Elijah Newren
2021-12-08 19:39   ` [PATCH v2 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-09  5:20     ` Elijah Newren
2021-12-08 19:39   ` [PATCH v2 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-09  5:23   ` [PATCH v2 0/5] Sparse index: fetch, pull, ls-files Elijah Newren
2021-12-10 15:13   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-10 15:13     ` [PATCH v3 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-10 16:16     ` [PATCH v3 0/5] Sparse index: fetch, pull, ls-files Ævar Arnfjörð Bjarmason
2021-12-10 18:45       ` Elijah Newren
2021-12-11  2:24         ` Ævar Arnfjörð Bjarmason
2021-12-11  4:45           ` Elijah Newren
2021-12-10 18:53     ` Elijah Newren
2021-12-22 14:20     ` [PATCH v4 " Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 1/5] fetch/pull: use the sparse index Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 2/5] ls-files: add --sparse option Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 3/5] t1092: replace 'read-cache --table' with 'ls-files --sparse' Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 4/5] t1091/t3705: remove 'test-tool read-cache --table' Derrick Stolee via GitGitGadget
2021-12-22 14:20       ` [PATCH v4 5/5] test-read-cache: remove --table, --expand options Derrick Stolee via GitGitGadget
2021-12-22 19:17       ` [PATCH v4 0/5] Sparse index: fetch, pull, ls-files Elijah Newren
2021-12-22 23:56         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=191b2db8-1be4-d6a4-1cf6-eaefaf373dd2@gmail.com \
    --to=stolee@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.