All of lore.kernel.org
 help / color / mirror / Atom feed
From: Derrick Stolee <derrickstolee@github.com>
To: Jonathan Nieder <jrnieder@gmail.com>,
	Elijah Newren via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, Victoria Dye <vdye@github.com>,
	Derrick Stolee <stolee@gmail.com>,
	Lessley Dennington <lessleydennington@gmail.com>,
	Elijah Newren <newren@gmail.com>,
	Jonathan Tan <jonathantanmy@google.com>,
	jabolopes@google.com, Jeff Hostetler <jeffhostetler@github.com>
Subject: Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
Date: Sun, 20 Feb 2022 11:56:15 -0500	[thread overview]
Message-ID: <0979ce9b-d7be-9f84-0942-201626b488a4@github.com> (raw)
In-Reply-To: <YhBCsg2DCEd9FXjE@google.com>

On 2/18/2022 8:06 PM, Jonathan Nieder wrote:
> (cc-ing Jonathan Tan, Jose Lopes, and Jeff Hostetler, vfs experts)
> Hi Elijah,
> 
> Elijah Newren wrote[1]:
> 
>> The fix is short (~30 lines), but the description is not.  Sorry.
>>
>> There is a set of problems caused by files in what I'll refer to as the
>> "present-despite-SKIP_WORKTREE" state.  This commit aims to not just fix
>> these problems, but remove the entire class as a possibility -- for
>> those using sparse checkouts.  But first, we need to understand the
>> problems this class presents.  A quick outline:
>>
>>    * Problems
>>      * User facing issues
>>      * Problem space complexity
>>      * Maintenance and code correctness challenges
>>    * SKIP_WORKTREE expectations in Git
>>    * Suggested solution
>>    * Pros/Cons of suggested solution
>>    * Notes on testcase modifications
> 
> Thanks for a clear explanation!  This is very helpful.
> 
>> === User facing issues ===
>>
>> There are various ways for users to get files to be present in the
>> working copy despite having the SKIP_WORKTREE bit set for that file in
>> the index.  This may come from:
>>   * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
>>   * users grabbing files from elsewhere and writing them to the worktree
>>     (perhaps even cached in their editor)
>>   * users attempting to "abort" a sparse-checkout operation with a
>>     not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
>>     working tree is not atomic)[3].
>>
>> Once users have present-despite-SKIP_WORKTREE files, any modifications
>> users make to these files will be ignored, possibly to users' confusion.
> [...]
>> The suggests a simple solution: present-despite-SKIP_WORKTREE files
>> should not exist, for those using sparse-checkouts.
> 
> This patch just reached "next", so at $DAYJOB a test for our vfsd[2]
> noticed this change.  The trick behind a Git-based virtual filesystem
> is typically:
> 
> - since we control the filesystem, we can pretend all files are
>   already present.  On access, we obtain the file content from the git
>   object store.  On write, we update the sparse-checkout pattern so
>   that Git knows to start tracking the file.
> 
> - by keeping the sparse-checkout pattern narrow, we minimize the time
>   commands like "git status" need to spend looking for changes in
>   unmodified files.  Controlling the filesystem means we don't need to
>   worry about changes to files that don't match that pattern (since
>   any modification would also trigger a sparse-checkout pattern
>   update).
> 
> If I understand the intent behind this change correctly, it's
> incompatible with that trick.  How would you recommend handling that?
> In the not too far away future, I'd expect something like the "VFS
> projection hook" to handle this use case, but in the meantime, I would
> expect this change to break VFS for Git performance.  A few options:
> 
>  a. We could guard it with a config option.  It would still be a
>     regression for VFS for Git users, but they'd be able to use the
>     config option to restore the expected behavior.  (Or
>     alternatively, such a config option could be disabled by default,
>     but I suspect that would defeat the purpose described for the
>     patch.)
> 
>  b. We could distinguish between the vfsd and the "interrupted and
>     forgot to update SKIP_WORKTREE bits in the index" cases somehow.
>     This sounds complex.
> 
>  c. Something else?
> 
> (b) and (c) aren't sounding obviously good, so (a) seems tempting.
> What do you think?

Just chiming in here to say that we've dealt with these issues in
microsoft/git by special-casing them behind our core_virtualfilesystem
global. A recent example is the changes to 'git add' to prevent
adding a file that is marked as sparse (unless --sparse is specified).
We always allow this when in the virtualized scenario [1].

[1] https://github.com/microsoft/git/blob/2f6531aced2e77a6c1000a923967ae0105383930/builtin/add.c#L50-L54

This seems like something that should be on vfsd to handle, and should
not prevent upstream Git from making changes that benefit its users.

Thanks,
-Stolee

  parent reply	other threads:[~2022-02-20 16:56 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
2022-01-13 23:35   ` Elijah Newren
2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
2022-01-14 15:59   ` [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
2022-02-16  8:51     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:02       ` Elijah Newren
2022-01-14 15:59   ` [PATCH v2 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
2022-01-14 15:59   ` [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
2022-02-16  8:57     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:08       ` Elijah Newren
2022-02-19  1:06     ` Jonathan Nieder
2022-02-19 16:42       ` Elijah Newren
2022-02-19 18:14         ` Jonathan Nieder
2022-02-20  5:28           ` Elijah Newren
2022-02-20 16:56       ` Derrick Stolee [this message]
2022-02-22 23:17         ` Jonathan Nieder
2022-01-14 15:59   ` [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
2022-02-16  9:15     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:21       ` Elijah Newren
2022-01-14 15:59   ` [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
2022-01-15  1:39     ` Victoria Dye
2022-02-16  9:32     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:30       ` Elijah Newren
2022-02-17  4:40         ` Elijah Newren
2022-01-15  1:51   ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Victoria Dye

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0979ce9b-d7be-9f84-0942-201626b488a4@github.com \
    --to=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=jabolopes@google.com \
    --cc=jeffhostetler@github.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=lessleydennington@gmail.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.