All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts)
@ 2022-01-13 16:43 Elijah Newren via GitGitGadget
  2022-01-13 16:43 ` [PATCH 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
                   ` (5 more replies)
  0 siblings, 6 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-13 16:43 UTC (permalink / raw)
  To: git; +Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren

(Maintainer note: This series builds on (v2 of) vd/sparse-clean-etc, because
it tweaks one of the testcases added there.)

(Note 2: There was a previous RFC round of this series at
https://lore.kernel.org/git/20220109045732.2497526-1-newren@gmail.com/.)

Files in the present-despite-SKIP_WORKTREE state have caused no ends of
discussions and bugs[1,2,3,4,5,6,...and lots of others]. Trying to address
the big issue of discovering & recovering from this state has befuddled me
for over a year because I was worried we'd need additional code at every
skip_worktree-checking path in the code (and they are all over the place),
and that we'd make the code significantly slower unless we plumbed a bunch
of additional information all over the place to allow some reasonable
optimizations.

This series tries to solve the problem a bit differently by automatic early
discovery and recovery; as it result, it greatly simplifies the landscape,
reduces our testing matrix burden, and fixes a large swath of bugs. And I
figured out how to get the perf cost down to essentially negligible.

Changes since RFC version:

 * updated the commit messages as per suggestions from Victoria, including
   adding performance measurements
 * renamed the new function to use a clearer name
 * replaced the final patch with a different optimization, which is both
   simpler and performs quite a bit better (the cost for my previous patch 5
   was already decent in many cases, but had a few cases where the cost was
   significant).

Quick overview:

 * Patches 1 & 2 add a test to demonstrate accidental deletion of
   possibly-modified files, and then fix the bug.
 * Patch 3 is the crux of this series; a small amount of code with a huge
   commit message
 * Patch 4 updates the documentation
 * Patch 5 adds an optimization to reduce the performance impact of patch 3

[1]
https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
[2]
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[3]
https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
[4] commit 66b209b ("merge-ort: implement CE_SKIP_WORKTREE handling with
conflicted entries", 2021-03-20) [5] commit ba359fd ("stash: fix stash
application in sparse-checkouts", 2020-12-01) [6]
https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/

Elijah Newren (5):
  t1011: add testcase demonstrating accidental loss of user
    modifications
  unpack-trees: fix accidental loss of user changes
  repo_read_index: clear SKIP_WORKTREE bit from files present in
    worktree
  Update documentation related to sparsity and the skip-worktree bit
  Accelerate clear_skip_worktree_from_present_files() by caching

 Documentation/git-read-tree.txt          | 12 +++-
 Documentation/git-sparse-checkout.txt    | 76 ++++++++++++++----------
 Documentation/git-update-index.txt       | 57 +++++++++++++-----
 repository.c                             |  7 +++
 sparse-index.c                           | 73 +++++++++++++++++++++++
 sparse-index.h                           |  1 +
 t/t1011-read-tree-sparse-checkout.sh     | 23 ++++++-
 t/t1092-sparse-checkout-compatibility.sh | 16 ++---
 t/t3705-add-sparse-checkout.sh           |  2 +
 t/t6428-merge-conflicts-sparse.sh        | 23 ++-----
 t/t7012-skip-worktree-writing.sh         | 44 +++-----------
 t/t7817-grep-sparse-checkout.sh          | 11 +++-
 unpack-trees.c                           |  4 +-
 13 files changed, 235 insertions(+), 114 deletions(-)


base-commit: 80697e9259e4a2c12fd0546f5f1895fb3068a66c
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1114%2Fnewren%2Ffix-present-despite-skip-worktree-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1114/newren/fix-present-despite-skip-worktree-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1114
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/5] t1011: add testcase demonstrating accidental loss of user modifications
  2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
@ 2022-01-13 16:43 ` Elijah Newren via GitGitGadget
  2022-01-13 16:43 ` [PATCH 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-13 16:43 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

If a user has a file with local modifications that is not marked as
SKIP_WORKTREE, but the sparsity patterns are such that it should be
marked that way, and the user then invokes a command like

   * git checkout -q HEAD^

or

   * git read-tree -mu HEAD^

Then the file will be deleted along with all the users' modifications.
Add a testcase demonstrating this problem.

Note: This bug only triggers if something other than 'HEAD' is given;
if the commands above had specified 'HEAD', then the users' file would
be left alone.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t1011-read-tree-sparse-checkout.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 24092c09a95..1b2395b8a82 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -187,6 +187,27 @@ test_expect_success 'read-tree updates worktree, absent case' '
 	test ! -f init.t
 '
 
+test_expect_success 'read-tree will not throw away dirty changes, non-sparse' '
+	echo "/*" >.git/info/sparse-checkout &&
+	read_tree_u_must_succeed -m -u HEAD &&
+
+	echo dirty >init.t &&
+	read_tree_u_must_fail -m -u HEAD^ &&
+	test_path_is_file init.t &&
+	grep -q dirty init.t
+'
+
+test_expect_failure 'read-tree will not throw away dirty changes, sparse' '
+	echo "/*" >.git/info/sparse-checkout &&
+	read_tree_u_must_succeed -m -u HEAD &&
+
+	echo dirty >init.t &&
+	echo sub/added >.git/info/sparse-checkout &&
+	read_tree_u_must_fail -m -u HEAD^ &&
+	test_path_is_file init.t &&
+	grep -q dirty init.t
+'
+
 test_expect_success 'read-tree updates worktree, dirty case' '
 	echo sub/added >.git/info/sparse-checkout &&
 	git checkout -f top &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/5] unpack-trees: fix accidental loss of user changes
  2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
  2022-01-13 16:43 ` [PATCH 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
@ 2022-01-13 16:43 ` Elijah Newren via GitGitGadget
  2022-01-13 16:43 ` [PATCH 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-13 16:43 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

For sparse-checkouts, we don't want unpack-trees to error out on files
that are missing from the worktree, so there has traditionally been
logic to make it skip the verify_uptodate() check for these.
Unfortunately, it was skipping the verify_uptodate() check for files
that were expected to *become* SKIP_WORKTREE.  For files that were not
already SKIP_WORKTREE, that can cause us to later delete the file in
apply_sparse_checkout().  Only skip the check for files that were
already SKIP_WORKTREE as well to avoid lightly discarding important
changes users may have made to files.

Note 1: unpack-trees.c is already a bit complex, and the logic around
CE_SKIP_WORKTREE and CE_NEW_SKIP_WORKTREE in that file are no exception.
I also tried just replacing CE_NEW_SKIP_WORKTREE with CE_SKIP_WORKTREE
in the verify_uptodate() check instead of checking for both flags, and
found that it also fixed this bug and passed all the tests.  I also
attempted to devise a few testcases that might trip either variant of my
fix and was unable to find any problems.  It may be that just checking
CE_SKIP_WORKTREE is a better fix, but I'm not sure.  I thought it
was a bit safer to strictly reduce the number of cases where we skip the
up-to-date check rather than just toggling which kind of cases skip it,
and thus went with the current variant of the fix.

Note 2: I also wondered if verify_absent() might have a similar bug, but
despite my attempts to try to devise a testcase that would trigger such
a thing, I couldn't find any problematic testcases.  Thus, this patch
makes no attempt to apply similar changes to verify_absent() and
verify_absent_if_directory().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t1011-read-tree-sparse-checkout.sh | 2 +-
 unpack-trees.c                       | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 1b2395b8a82..4ed0885bf2f 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -197,7 +197,7 @@ test_expect_success 'read-tree will not throw away dirty changes, non-sparse' '
 	grep -q dirty init.t
 '
 
-test_expect_failure 'read-tree will not throw away dirty changes, sparse' '
+test_expect_success 'read-tree will not throw away dirty changes, sparse' '
 	echo "/*" >.git/info/sparse-checkout &&
 	read_tree_u_must_succeed -m -u HEAD &&
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 98e2f2e0e6f..6d9d89c662a 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2059,7 +2059,9 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 int verify_uptodate(const struct cache_entry *ce,
 		    struct unpack_trees_options *o)
 {
-	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
+	if (!o->skip_sparse_checkout &&
+	    (ce->ce_flags & CE_SKIP_WORKTREE) &&
+	    (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
 		return 0;
 	return verify_uptodate_1(ce, o, ERROR_NOT_UPTODATE_FILE);
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
  2022-01-13 16:43 ` [PATCH 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
  2022-01-13 16:43 ` [PATCH 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
@ 2022-01-13 16:43 ` Elijah Newren via GitGitGadget
  2022-01-13 16:43 ` [PATCH 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-13 16:43 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

The fix is short (~30 lines), but the description is not.  Sorry.

There is a set of problems caused by files in what I'll refer to as the
"present-despite-SKIP_WORKTREE" state.  This commit aims to not just fix
these problems, but remove the entire class as a possibility -- for
those using sparse checkouts.  But first, we need to understand the
problems this class presents.  A quick outline:

   * Problems
     * User facing issues
     * Problem space complexity
     * Maintenance and code correctness challenges
   * SKIP_WORKTREE expectations in Git
   * Suggested solution
   * Pros/Cons of suggested solution
   * Notes on testcase modifications

=== User facing issues ===

There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index.  This may come from:
  * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
  * users grabbing files from elsewhere and writing them to the worktree
    (perhaps even cached in their editor)
  * users attempting to "abort" a sparse-checkout operation with a
    not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
    working tree is not atomic)[3].

Once users have present-despite-SKIP_WORKTREE files, any modifications
users make to these files will be ignored, possibly to users' confusion.

Further:
  * these files will degrade performance for the sparse-index case due
    to requiring the index to be expanded (see commit 55dfcf9591
    ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why
    we try to delete entire directories outside the sparse cone).
  * these files will not be updated by by standard commands
    (switch/checkout/pull/merge/rebase will leave them alone unless
    conflicts happen -- and even then, the conflicted file may be
    written somewhere else to avoid overwriting the SKIP_WORKTREE file
    that is present and in the way)
  * there is nothing in Git that users can use to discover such
    files (status, diff, grep, etc. all ignore it)
  * there is no reasonable mechanism to "recover" from such a condition
    (neither `git sparse-checkout reapply` nor `git reset --hard` will
    correct it).

So, not only are users modifications ignored, but the files get
progressively more stale over time.  At some point in the future, they
may change their sparseness specification or disable sparse-checkouts.
At that time, all present-despite-SKIP_WORKTREE files will show up as
having lots of modifications because they represent a version from a
different branch or commit.  These might include user-made local changes
from days before, but the only way to tell is to have users look through
them all closely.

If these users come to others for help, there will be no logs that
explain the issue; it's just a mysterious list of changes.  Users might
adamantly claim (correctly, as it turns out) that they didn't modify
these files, while others presume they did.

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/

=== Problem space complexity ===

SKIP_WORKTREE has been part of Git for over a decade.  Duy did lots of
work on it initially, and several others have since come along and put
lots of work into it.  Stolee spent most of 2021 on the sparse-index,
with lots of bugfixes along the way including to non-sparse-index cases
as we are still trying to get sparse checkouts to behave reasonably.
Basically every codepath throughout the treat needs to be aware of an
additional type of file: tracked-but-not-present.  The extra type
results in lots of extra testcases and lots of extra code everywhere.

But, the sad thing is that we actually have more than one extra type.
We have tracked, tracked-but-not-present (SKIP_WORKTREE), and
tracked-but-promised-to-not-be-present-but-is-present-anyway
(present-despite-SKIP_WORKTREE).  Two types is a monumental amount of
effort to support, and adding a third feels a bit like insanity[4].

[4] Some examples of which can be seen at
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

=== Maintenance and code correctness challenges ===

Matheus' patches to grep stalled for nearly a year, in part because of
complications of how to handle sparse-checkouts appropriately in all
cases[5][6] (with trying to sanely figure out how to sanely handle
present-despite-SKIP_WORKTREE files being one of the complications).
His rm/add follow-ups also took months because of those kinds of
issues[7].  The corner cases with things like submodules and
SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start
becoming really complex[8].

We've had to add ugly logic to merge-ort to attempt to handle
present-despite-SKIP_WORKTREE files[9], and basically just been forced
to give up in merge-recursive knowing full well that we'll sometimes
silently discard user modifications.  Despite stash essentially being a
merge, it needed extra code (beyond what was in merge-ort and
merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid
a few different bugs that'd result in an early abort with a partial
stash application[10].

[5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t
    and the dates on the thread; also Matheus and I had several
    conversations off-list trying to resolve the issues over that time
[6] ...it finally kind of got unstuck after
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[7] See for example
    https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t
    and quotes like "The core functionality of sparse-checkout has always
    been only partially implemented", a statement I still believe is true
    today.
[8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
[9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE
    handling with conflicted entries", 2021-03-20)
[10] See commit ba359fd507 ("stash: fix stash application in
     sparse-checkouts", 2020-12-01)

=== SKIP_WORKTREE expectations in Git ===

A couple quotes:

From [11] (before the "sparse-checkout" command existed):
   If it needs too many special cases, hacks, and conditionals, then it
   is not worth the complexity---if it is easier to write a correct code
   by allowing Git to populate working tree files, it is perfectly fine
   to do so.

   In a sense, the sparse checkout "feature" itself is a hack by itself,
   and that is why I think this part should be "best effort" as well.

From the git-sparse-checkout manual (still present today):

   THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
   COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
   THE FUTURE.

[11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/

=== Suggested solution ===

SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as
the name of the option implies, to allow the file to NOT be in the
worktree but consider it to be unchanged rather than deleted.

The suggests a simple solution: present-despite-SKIP_WORKTREE files
should not exist, for those using sparse-checkouts.

Enforce this at index loading time by checking if core.sparseCheckout is
true; if so, check files in the index with the SKIP_WORKTREE bit set to
verify that they are absent from the working tree.  If they are present,
unset the bit (in memory, though any commands that write to the index
will record the update).

Users can, of course, can get the SKIP_WORKTREE bit back such as by
running `git sparse-checkout reapply` (if they have ensured the file is
unmodified and doesn't match the specified sparsity patterns).

=== Pros/Cons of suggested solution ===

Pros:

  * Solves the user visible problems reported above, which I've been
    complaining about for nearly a year but couldn't find a solution to.
  * Helps prevent slow performance degradation with a sparse-index.
  * Much easier behavior in sparse-checkouts for users to reason about
  * Very simple, ~30 lines of code.
  * Significantly simplifies some ugly testcases, and obviates the need
    to test an entire class of potential issues.
  * Reduces code complexity, reasoning, and maintenance.  Avoids
    disagreements about weird corner cases[12].
  * It has been reported that some users might be (ab)using
    SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree
    mechanism[13, and a few other similar references].  These users know
    of multiple caveats and shortcomings in doing so; perhaps not
    surprising given the "SKIP_WORKTREE expecations" section above.
    However, these users use `git update-index --skip-worktree`, and not
    `git sparse-checkout` or core.sparseCheckout=true.  As such, these
    users would be unaffected by this change and can continue abusing
    the system as before.

[12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree

Cons:

  * When core.sparseCheckout is enabled, this adds a performance cost to
    reading the index.  I'll defer discussion of this cost to a subsequent
    patch, since I have some optimizations to add.

=== Notes on testcase modifications ===

The good:
  * t1011: Compare to two cases above it ('read-tree will not throw away
    dirty changes, non-sparse'); since the file is present, it should
    match the non-sparse case now
  * t1092: sparse-index & sparse-checkout now match full-worktree
    behavior in more cases!  Yaay for consistency!
  * t6428, t7012: look at how much simpler the tests become!  Merge and
    stash can just fail early telling the user there's a file in the
    way, instead of not noticing until it's about to write a file and
    then have to implement sudden crash avoidance.  Hurray for sanity!
  * t7817: sparse behavior better matches full tree behavior.  Hurray
    for sanity!

The confusing:
  * t3705: These changes were ONLY needed on Windows, but they don't
    hurt other platforms.  Let's discuss each individually:

    * core.sparseCheckout should be false by default.  Nothing in this
      testcase toggles that until many, many tests later.  However,
      early tests (#5 in particular) were testing `update-index
      --skip-worktree` behavior in a non-sparse-checkout, but the
      Windows tests in CI were behaving as if core.sparseCheckout=true
      had been specified somewhere.  I do not have access to a Windows
      machine.  But I just manually did what should have been a no-op
      and turned the config off.  And it fixed the test.
    * I have no idea why the leftover .gitattributes file from this
      test was causing failures for test #18 on Windows, but only with
      these changes of mine.  Test #18 was checking for empty stderr,
      and specifically wanted to know that some error completely
      unrelated to file endings did not appear.  The leftover
      .gitattributes file thus caused some spurious stderr unrelated to
      the thing being checked.  Since other tests did not intend to
      test normalization, just proactively remove the .gitattributes
      file.  I'm certain this is cleaner and better, I'm just unsure
      why/how this didn't trigger problems before.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 repository.c                             |  7 ++++
 sparse-index.c                           | 21 +++++++++++
 sparse-index.h                           |  1 +
 t/t1011-read-tree-sparse-checkout.sh     |  2 +-
 t/t1092-sparse-checkout-compatibility.sh | 16 ++++-----
 t/t3705-add-sparse-checkout.sh           |  2 ++
 t/t6428-merge-conflicts-sparse.sh        | 23 +++----------
 t/t7012-skip-worktree-writing.sh         | 44 ++++--------------------
 t/t7817-grep-sparse-checkout.sh          | 11 ++++--
 9 files changed, 61 insertions(+), 66 deletions(-)

diff --git a/repository.c b/repository.c
index 34610c5a33e..dddee32258f 100644
--- a/repository.c
+++ b/repository.c
@@ -301,6 +301,13 @@ int repo_read_index(struct repository *repo)
 	if (repo->settings.command_requires_full_index)
 		ensure_full_index(repo->index);
 
+	/*
+	 * If sparse checkouts are in use, check whether paths with the
+	 * SKIP_WORKTREE attribute are missing from the worktree; if not,
+	 * clear that attribute for that path.
+	 */
+	clear_skip_worktree_from_present_files(repo->index);
+
 	return res;
 }
 
diff --git a/sparse-index.c b/sparse-index.c
index a1d505d50e9..b82648b10ee 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -341,6 +341,27 @@ void ensure_correct_sparsity(struct index_state *istate)
 		ensure_full_index(istate);
 }
 
+void clear_skip_worktree_from_present_files(struct index_state *istate)
+{
+	int i;
+	if (!core_apply_sparse_checkout)
+		return;
+
+restart:
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		struct stat st;
+
+		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
+			if (S_ISSPARSEDIR(ce->ce_mode)) {
+				ensure_full_index(istate);
+				goto restart;
+			}
+			ce->ce_flags &= ~CE_SKIP_WORKTREE;
+		}
+	}
+}
+
 /*
  * This static global helps avoid infinite recursion between
  * expand_to_path() and index_file_exists().
diff --git a/sparse-index.h b/sparse-index.h
index 656bd835b25..633d4fb7e31 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -5,6 +5,7 @@ struct index_state;
 #define SPARSE_INDEX_MEMORY_ONLY (1 << 0)
 int convert_to_sparse(struct index_state *istate, int flags);
 void ensure_correct_sparsity(struct index_state *istate);
+void clear_skip_worktree_from_present_files(struct index_state *istate);
 
 /*
  * Some places in the codebase expect to search for a specific path.
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 4ed0885bf2f..dd957be1b78 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -212,7 +212,7 @@ test_expect_success 'read-tree updates worktree, dirty case' '
 	echo sub/added >.git/info/sparse-checkout &&
 	git checkout -f top &&
 	echo dirty >init.t &&
-	read_tree_u_must_succeed -m -u HEAD^ &&
+	read_tree_u_must_fail -m -u HEAD^ &&
 	grep -q dirty init.t &&
 	rm init.t
 '
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 53f84881de7..df2839e8e2c 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -370,7 +370,7 @@ test_expect_success 'status/add: outside sparse cone' '
 	write_script edit-contents <<-\EOF &&
 	echo text >>$1
 	EOF
-	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/a &&
 	run_on_all ../edit-contents folder1/new &&
 
 	test_sparse_match git status --porcelain=v2 &&
@@ -379,8 +379,8 @@ test_expect_success 'status/add: outside sparse cone' '
 	test_sparse_match test_must_fail git add folder1/a &&
 	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
 	test_sparse_unstaged folder1/a &&
-	test_sparse_match test_must_fail git add --refresh folder1/a &&
-	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
+	test_all_match git add --refresh folder1/a &&
+	test_must_be_empty sparse-checkout-err &&
 	test_sparse_unstaged folder1/a &&
 	test_sparse_match test_must_fail git add folder1/new &&
 	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
@@ -646,11 +646,11 @@ test_expect_success 'update-index modify outside sparse definition' '
 	run_on_sparse cp ../initial-repo/folder1/a folder1/a &&
 	run_on_all ../edit-contents folder1/a &&
 
-	# If file has skip-worktree enabled, update-index does not modify the
-	# index entry
-	test_sparse_match git update-index folder1/a &&
-	test_sparse_match git status --porcelain=v2 &&
-	test_must_be_empty sparse-checkout-out &&
+	# If file has skip-worktree enabled, but the file is present, it is
+	# treated the same as if skip-worktree is disabled
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git update-index folder1/a &&
+	test_all_match git status --porcelain=v2 &&
 
 	# When skip-worktree is disabled (even on files outside sparse cone), file
 	# is updated in the index
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index f3143c92908..61506c1d7ce 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -19,6 +19,7 @@ setup_sparse_entry () {
 	fi &&
 	git add sparse_entry &&
 	git update-index --skip-worktree sparse_entry &&
+	git config core.sparseCheckout false &&
 	git commit --allow-empty -m "ensure sparse_entry exists at HEAD" &&
 	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
 }
@@ -126,6 +127,7 @@ test_expect_success 'git add --chmod does not update sparse entries' '
 '
 
 test_expect_success 'git add --renormalize does not update sparse entries' '
+	test_when_finished rm .gitattributes &&
 	test_config core.autocrlf false &&
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&
diff --git a/t/t6428-merge-conflicts-sparse.sh b/t/t6428-merge-conflicts-sparse.sh
index 7e8bf497f82..142c9aaabc5 100755
--- a/t/t6428-merge-conflicts-sparse.sh
+++ b/t/t6428-merge-conflicts-sparse.sh
@@ -112,7 +112,7 @@ test_expect_success 'conflicting entries written to worktree even if sparse' '
 	)
 '
 
-test_expect_merge_algorithm failure success 'present-despite-SKIP_WORKTREE handled reasonably' '
+test_expect_success 'present-despite-SKIP_WORKTREE handled reasonably' '
 	test_setup_numerals in_the_way &&
 	(
 		cd numerals_in_the_way &&
@@ -132,26 +132,13 @@ test_expect_merge_algorithm failure success 'present-despite-SKIP_WORKTREE handl
 
 		test_must_fail git merge -s recursive B^0 &&
 
-		git ls-files -t >index_files &&
-		test_cmp expected-index index_files &&
+		test_path_is_missing .git/MERGE_HEAD &&
 
-		test_path_is_file README &&
 		test_path_is_file numerals &&
 
-		test_cmp expected-merge numerals &&
-
-		# There should still be a file with "foobar" in it
-		grep foobar * &&
-
-		# 5 other files:
-		#   * expected-merge
-		#   * expected-index
-		#   * index_files
-		#   * others
-		#   * whatever name was given to the numerals file that had
-		#     "foobar" in it
-		git ls-files -o >others &&
-		test_line_count = 5 others
+		# numerals should still have "foobar" in it
+		echo foobar >expect &&
+		test_cmp expect numerals
 	)
 '
 
diff --git a/t/t7012-skip-worktree-writing.sh b/t/t7012-skip-worktree-writing.sh
index a1080b94e38..cb9f1a6981e 100755
--- a/t/t7012-skip-worktree-writing.sh
+++ b/t/t7012-skip-worktree-writing.sh
@@ -171,50 +171,20 @@ test_expect_success 'stash restore in sparse checkout' '
 
 		# Put a file in the working directory in the way
 		echo in the way >modified &&
-		git stash apply &&
+		test_must_fail git stash apply 2>error&&
 
-		# Ensure stash vivifies modifies paths...
-		cat >expect <<-EOF &&
-		H addme
-		H modified
-		H removeme
-		H subdir/A
-		S untouched
-		EOF
-		git ls-files -t >actual &&
-		test_cmp expect actual &&
+		grep "changes.*would be overwritten by merge" error &&
 
-		# ...and that the paths show up in status as changed...
-		cat >expect <<-EOF &&
-		A  addme
-		 M modified
-		 D removeme
-		 M subdir/A
-		?? actual
-		?? expect
-		?? modified.stash.XXXXXX
-		EOF
-		git status --porcelain | \
-			sed -e s/stash......./stash.XXXXXX/ >actual &&
-		test_cmp expect actual &&
+		echo in the way >expect &&
+		test_cmp expect modified &&
+		git diff --quiet HEAD ":!modified" &&
 
 		# ...and that working directory reflects the files correctly
-		test_path_is_file    addme &&
+		test_path_is_missing addme &&
 		test_path_is_file    modified &&
 		test_path_is_missing removeme &&
 		test_path_is_file    subdir/A &&
-		test_path_is_missing untouched &&
-
-		# ...including that we have the expected "modified" file...
-		cat >expect <<-EOF &&
-		modified
-		tweaked
-		EOF
-		test_cmp expect modified &&
-
-		# ...and that the other "modified" file is still present...
-		echo in the way >expect &&
-		test_cmp expect modified.stash.*
+		test_path_is_missing untouched
 	)
 '
 
diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
index 590b99bbb6f..eb595645657 100755
--- a/t/t7817-grep-sparse-checkout.sh
+++ b/t/t7817-grep-sparse-checkout.sh
@@ -83,10 +83,13 @@ test_expect_success 'setup' '
 
 # The test below covers a special case: the sparsity patterns exclude '/b' and
 # sparse checkout is enabled, but the path exists in the working tree (e.g.
-# manually created after `git sparse-checkout init`). git grep should skip it.
+# manually created after `git sparse-checkout init`).  Although b is marked
+# as SKIP_WORKTREE, git grep should notice it IS present in the worktree and
+# report it.
 test_expect_success 'working tree grep honors sparse checkout' '
 	cat >expect <<-EOF &&
 	a:text
+	b:new-text
 	EOF
 	test_when_finished "rm -f b" &&
 	echo "new-text" >b &&
@@ -126,12 +129,16 @@ test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit'
 '
 
 # Note that sub2/ is present in the worktree but it is excluded by the sparsity
-# patterns, so grep should not recurse into it.
+# patterns.  We also explicitly mark it as SKIP_WORKTREE in case it got cleared
+# by previous git commands.  Thus sub2 starts as SKIP_WORKTREE but since it is
+# present in the working tree, grep should recurse into it.
 test_expect_success 'grep --recurse-submodules honors sparse checkout in submodule' '
 	cat >expect <<-EOF &&
 	a:text
 	sub/B/b:text
+	sub2/a:text
 	EOF
+	git update-index --skip-worktree sub2 &&
 	git grep --recurse-submodules "text" >actual &&
 	test_cmp expect actual
 '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/5] Update documentation related to sparsity and the skip-worktree bit
  2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
                   ` (2 preceding siblings ...)
  2022-01-13 16:43 ` [PATCH 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
@ 2022-01-13 16:43 ` Elijah Newren via GitGitGadget
  2022-01-13 16:43 ` [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
  2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
  5 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-13 16:43 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Make several small updates, to address a few documentation issues
I spotted:
  * sparse-checkout focused on "patterns" even though the inputs (and
    outputs in the case of `list`) are directories in cone-mode
  * The description section of the sparse-checkout documentation
    was a bit sparse (no pun intended), and focused more on internal
    mechanics rather than end user usage.  This made sense in the
    early days when the command was even more experimental, but let's
    adjust a bit to try to make it more approachable to end users who
    may want to consider using it.  Keep the scary backward
    compatibility warning, though; we're still hard at work trying to
    fix up commands to behave reasonably in sparse checkouts.
  * both read-tree and update-index tried to describe how to use the
    skip-worktree bit, but both predated the sparse-checkout command.
    The sparse-checkout command is a far easier mechanism to use and
    for users trying to reduce the size of their working tree, we
    should recommend users to look at it instead.
  * The update-index documentation pointed out that assume-unchanged
    and skip-worktree sounded similar but had different purposes.
    However, it made no attempt to explain the differences, only to
    point out that they were different.  Explain the differences.
  * The update-index documentation focused much more on (internal?)
    implementation details than on end-user usage.  Try to explain
    its purpose better for users of update-index, rather than
    fellow developers trying to work with the SKIP_WORKTREE bit.
  * Clarify that when core.sparseCheckout=true, we treat a file's
    presence in the working tree as being an override to the
    SKIP_WORKTREE bit (i.e. in sparse checkouts when the file is
    present we ignore the SKIP_WORKTREE bit).

Note that this commit, like many touching documentation, is best viewed
with the `--color-words` option to diff/log.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-read-tree.txt       | 12 +++--
 Documentation/git-sparse-checkout.txt | 76 ++++++++++++++++-----------
 Documentation/git-update-index.txt    | 57 +++++++++++++++-----
 3 files changed, 98 insertions(+), 47 deletions(-)

diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 8c3aceb8324..99bb387134d 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -375,9 +375,14 @@ have finished your work-in-progress), attempt the merge again.
 SPARSE CHECKOUT
 ---------------
 
+Note: The `update-index` and `read-tree` primitives for supporting the
+skip-worktree bit predated the introduction of
+linkgit:git-sparse-checkout[1].  Users are encouraged to use
+`sparse-checkout` in preference to these low-level primitives.
+
 "Sparse checkout" allows populating the working directory sparsely.
-It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
-Git whether a file in the working directory is worth looking at.
+It uses the skip-worktree bit (see linkgit:git-update-index[1]) to
+tell Git whether a file in the working directory is worth looking at.
 
 'git read-tree' and other merge-based commands ('git merge', 'git
 checkout'...) can help maintaining the skip-worktree bitmap and working
@@ -385,7 +390,8 @@ directory update. `$GIT_DIR/info/sparse-checkout` is used to
 define the skip-worktree reference bitmap. When 'git read-tree' needs
 to update the working directory, it resets the skip-worktree bit in the index
 based on this file, which uses the same syntax as .gitignore files.
-If an entry matches a pattern in this file, skip-worktree will not be
+If an entry matches a pattern in this file, or the entry corresponds to
+a file present in the working tree, then skip-worktree will not be
 set on that entry. Otherwise, skip-worktree will be set.
 
 Then it compares the new skip-worktree value with the previous one. If
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index b81dbe06543..3da3d5a1007 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -3,9 +3,7 @@ git-sparse-checkout(1)
 
 NAME
 ----
-git-sparse-checkout - Initialize and modify the sparse-checkout
-configuration, which reduces the checkout to a set of paths
-given by a list of patterns.
+git-sparse-checkout - Reduce your working tree to a subset of tracked files
 
 
 SYNOPSIS
@@ -17,8 +15,20 @@ SYNOPSIS
 DESCRIPTION
 -----------
 
-Initialize and modify the sparse-checkout configuration, which reduces
-the checkout to a set of paths given by a list of patterns.
+This command is used to create sparse checkouts, which means that it
+changes the working tree from having all tracked files present, to only
+have a subset of them.  It can also switch which subset of files are
+present, or undo and go back to having all tracked files present in the
+working copy.
+
+The subset of files is chosen by providing a list of directories in
+cone mode (which is recommended), or by providing a list of patterns
+in non-cone mode.
+
+When in a sparse-checkout, other Git commands behave a bit differently.
+For example, switching branches will not update paths outside the
+sparse-checkout directories/patterns, and `git commit -a` will not record
+paths outside the sparse-checkout directories/patterns as deleted.
 
 THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
 COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
@@ -28,7 +38,7 @@ THE FUTURE.
 COMMANDS
 --------
 'list'::
-	Describe the patterns in the sparse-checkout file.
+	Describe the directories or patterns in the sparse-checkout file.
 
 'set'::
 	Enable the necessary config settings
@@ -38,20 +48,26 @@ COMMANDS
 	list of arguments following the 'set' subcommand. Update the
 	working directory to match the new patterns.
 +
-When the `--stdin` option is provided, the patterns are read from
-standard in as a newline-delimited list instead of from the arguments.
+When the `--stdin` option is provided, the directories or patterns are
+read from standard in as a newline-delimited list instead of from the
+arguments.
 +
 When `--cone` is passed or `core.sparseCheckoutCone` is enabled, the
-input list is considered a list of directories instead of
-sparse-checkout patterns.  This allows for better performance with a
-limited set of patterns (see 'CONE PATTERN SET' below).  Note that the
-set command will write patterns to the sparse-checkout file to include
-all files contained in those directories (recursively) as well as
-files that are siblings of ancestor directories. The input format
-matches the output of `git ls-tree --name-only`.  This includes
-interpreting pathnames that begin with a double quote (") as C-style
-quoted strings.  This may become the default in the future; --no-cone
-can be passed to request non-cone mode.
+input list is considered a list of directories.  This allows for
+better performance with a limited set of patterns (see 'CONE PATTERN
+SET' below).  The input format matches the output of `git ls-tree
+--name-only`.  This includes interpreting pathnames that begin with a
+double quote (") as C-style quoted strings.  Note that the set command
+will write patterns to the sparse-checkout file to include all files
+contained in those directories (recursively) as well as files that are
+siblings of ancestor directories. This may become the default in the
+future; --no-cone can be passed to request non-cone mode.
++
+When `--no-cone` is passed or `core.sparseCheckoutCone` is not enabled,
+the input list is considered a list of patterns.  This mode is harder
+to use and less performant, and is thus not recommended.  See the
+"Sparse Checkout" section of linkgit:git-read-tree[1] and the "Pattern
+Set" sections below for more details.
 +
 Use the `--[no-]sparse-index` option to use a sparse index (the
 default is to not use it).  A sparse index reduces the size of the
@@ -69,11 +85,10 @@ understand the sparse directory entries index extension and may fail to
 interact with your repository until it is disabled.
 
 'add'::
-	Update the sparse-checkout file to include additional patterns.
-	By default, these patterns are read from the command-line arguments,
-	but they can be read from stdin using the `--stdin` option. When
-	`core.sparseCheckoutCone` is enabled, the given patterns are interpreted
-	as directory names as in the 'set' subcommand.
+	Update the sparse-checkout file to include additional directories
+	(in cone mode) or patterns (in non-cone mode).  By default, these
+	directories or patterns are read from the command-line arguments,
+	but they can be read from stdin using the `--stdin` option.
 
 'reapply'::
 	Reapply the sparsity pattern rules to paths in the working tree.
@@ -117,13 +132,14 @@ decreased in utility.
 SPARSE CHECKOUT
 ---------------
 
-"Sparse checkout" allows populating the working directory sparsely.
-It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
-Git whether a file in the working directory is worth looking at. If
-the skip-worktree bit is set, then the file is ignored in the working
-directory. Git will avoid populating the contents of those files, which
-makes a sparse checkout helpful when working in a repository with many
-files, but only a few are important to the current user.
+"Sparse checkout" allows populating the working directory sparsely.  It
+uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell Git
+whether a file in the working directory is worth looking at. If the
+skip-worktree bit is set, and the file is not present in the working tree,
+then its absence is ignored. Git will avoid populating the contents of
+those files, which makes a sparse checkout helpful when working in a
+repository with many files, but only a few are important to the current
+user.
 
 The `$GIT_DIR/info/sparse-checkout` file is used to define the
 skip-worktree reference bitmap. When Git updates the working
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 2853f168d97..568dbfe76b8 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -351,6 +351,10 @@ unchanged".  Note that "assume unchanged" bit is *not* set if
 the index (use `git update-index --really-refresh` if you want
 to mark them as "assume unchanged").
 
+Sometimes users confuse the assume-unchanged bit with the
+skip-worktree bit.  See the final paragraph in the "Skip-worktree bit"
+section below for an explanation of the differences.
+
 
 EXAMPLES
 --------
@@ -392,22 +396,47 @@ M foo.c
 SKIP-WORKTREE BIT
 -----------------
 
-Skip-worktree bit can be defined in one (long) sentence: When reading
-an entry, if it is marked as skip-worktree, then Git pretends its
-working directory version is up to date and read the index version
-instead.
-
-To elaborate, "reading" means checking for file existence, reading
-file attributes or file content. The working directory version may be
-present or absent. If present, its content may match against the index
-version or not. Writing is not affected by this bit, content safety
-is still first priority. Note that Git _can_ update working directory
-file, that is marked skip-worktree, if it is safe to do so (i.e.
-working directory version matches index version)
+Skip-worktree bit can be defined in one (long) sentence: Tell git to
+avoid writing the file to the working directory when reasonably
+possible, and treat the file as unchanged when it is not
+present in the working directory.
+
+Note that not all git commands will pay attention to this bit, and
+some only partially support it.
+
+The update-index flags and the read-tree capabilities relating to the
+skip-worktree bit predated the introduction of the
+linkgit:git-sparse-checkout[1] command, which provides a much easier
+way to configure and handle the skip-worktree bits.  If you want to
+reduce your working tree to only deal with a subset of the files in
+the repository, we strongly encourage the use of
+linkgit:git-sparse-checkout[1] in preference to the low-level
+update-index and read-tree primitives.
+
+The primary purpose of the skip-worktree bit is to enable sparse
+checkouts, i.e. to have working directories with only a subset of
+paths present.  When the skip-worktree bit is set, Git commands (such
+as `switch`, `pull`, `merge`) will avoid writing these files.
+However, these commands will sometimes write these files anyway in
+important cases such as conflicts during a merge or rebase.  Git
+commands will also avoid treating the lack of such files as an
+intentional deletion; for example `git add -u` will not not stage a
+deletion for these files and `git commit -a` will not make a commit
+deleting them either.
 
 Although this bit looks similar to assume-unchanged bit, its goal is
-different from assume-unchanged bit's. Skip-worktree also takes
-precedence over assume-unchanged bit when both are set.
+different.  The assume-unchanged bit is for leaving the file in the
+working tree but having Git omit checking it for changes and presuming
+that the file has not been changed (though if it can determine without
+stat'ing the file that it has changed, it is free to record the
+changes).  skip-worktree tells Git to ignore the absence of the file,
+avoid updating it when possible with commands that normally update
+much of the working directory (e.g. `checkout`, `switch`, `pull`,
+etc.), and not have its absence be recorded in commits.  Note that in
+sparse checkouts (setup by `git sparse-checkout` or by configuring
+core.sparseCheckout to true), if a file is marked as skip-worktree in
+the index but is found in the working tree, Git will clear the
+skip-worktree bit for that file.
 
 SPLIT INDEX
 -----------
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
  2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
                   ` (3 preceding siblings ...)
  2022-01-13 16:43 ` [PATCH 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
@ 2022-01-13 16:43 ` Elijah Newren via GitGitGadget
  2022-01-13 23:35   ` Elijah Newren
  2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
  5 siblings, 1 reply; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-13 16:43 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren

From: Elijah Newren <newren@gmail.com>

Trying to clear the skip-worktree bit from files that are present does
present some computational overhead, for sparse-checkouts.  (We do not
do the bit clearing in non-sparse-checkouts.)  Optimize it as follows:

Rather than lstat()'ing every SKIP_WORKTREE path, take advantage of the
fact that entire directories will often be missing, especially for cone
mode and even more so ever since commit 55dfcf9591 ("sparse-checkout:
clear tracked sparse dirs", 2021-09-08).  If we have already determined
that the parent directory of a file (or other previous ancestor) does
not exist, then the file cannot exist either so we do not need to
lstat() it separately.

Timings for p2000 included below, reformatted to fit in normal commit
message line lengths, which compare three things:
  * Timings before this series
  * Timings of the unoptimized version of
    clear_skip_worktree_from_present_files() from a few commits ago
  * Timings after the optimization in this commit

(NOTE: t/perf/ appears to have timing resolution only down to 0.01 s,
which presents significant measurement error when timings only differ by
0.01s.  I don't trust any such timings below, and yet all the optimized
results differ by at most 0.01s.)

Test        Before Series    Unoptimized              Optimized
-----------------------------------------------------------------------------
*git status*
full-v3     0.15(0.10+0.06)  0.32(0.16+0.17) +113.3%  0.16(0.10+0.07) +6.7%
full-v4     0.15(0.11+0.05)  0.32(0.17+0.16) +113.3%  0.16(0.11+0.05) +6.7%
sparse-v3   0.04(0.03+0.04)  0.04(0.02+0.05) +0.0%    0.04(0.02+0.05) +0.0%
sparse-v4   0.04(0.03+0.04)  0.04(0.02+0.05) +0.0%    0.04(0.03+0.05) +0.0%

*git add -A*
full-v3     0.40(0.30+0.07)  0.56(0.36+0.17) +40.0%   0.39(0.30+0.07) -2.5%
full-v4     0.37(0.28+0.07)  0.54(0.37+0.16) +45.9%   0.38(0.29+0.07) +2.7%
sparse-v3   0.06(0.04+0.05)  0.08(0.05+0.05) +33.3%   0.06(0.05+0.04) +0.0%
sparse-v4   0.05(0.03+0.05)  0.05(0.04+0.04) +0.0%    0.06(0.04+0.05) +20.0%

*git add .*
full-v3     0.40(0.31+0.07)  0.57(0.37+0.17) +42.5%   0.41(0.30+0.08) +2.5%
full-v4     0.38(0.30+0.06)  0.55(0.37+0.16) +44.7%   0.38(0.30+0.06) +0.0%
sparse-v3   0.06(0.04+0.05)  0.06(0.05+0.04) +0.0%    0.06(0.03+0.05) +0.0%
sparse-v4   0.06(0.05+0.05)  0.06(0.04+0.05) +0.0%    0.06(0.04+0.06) +0.0%

*git commit -a -m A*
full-v3     0.41(0.32+0.06)  0.58(0.39+0.17) +41.5%   0.42(0.32+0.07) +2.4%
full-v4     0.39(0.30+0.07)  0.56(0.38+0.17) +43.6%   0.40(0.31+0.07) +2.6%
sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
sparse-v4   0.04(0.03+0.05)  0.04(0.03+0.05) +0.0%    0.04(0.03+0.04) +0.0%

*git checkout -f -*
full-v3     0.56(0.46+0.07)  0.73(0.55+0.16) +30.4%   0.57(0.47+0.08) +1.8%
full-v4     0.54(0.45+0.07)  0.71(0.53+0.17) +31.5%   0.55(0.45+0.07) +1.9%
sparse-v3   0.06(0.04+0.04)  0.06(0.04+0.05) +0.0%    0.06(0.04+0.05) +0.0%
sparse-v4   0.05(0.05+0.04)  0.05(0.04+0.05) +0.0%    0.06(0.04+0.05) +20.0%

*git reset*
full-v3     0.34(0.26+0.05)  0.51(0.34+0.15) +50.0%   0.34(0.26+0.06) +0.0%
full-v4     0.32(0.24+0.06)  0.49(0.32+0.15) +53.1%   0.33(0.25+0.06) +3.1%
sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
sparse-v4   0.03(0.03+0.04)  0.03(0.02+0.04) +0.0%    0.03(0.03+0.04) +0.0%

*git reset --hard*
full-v3     0.57(0.46+0.07)  0.90(0.61+0.25) +57.9%   0.57(0.45+0.08) +0.0%
full-v4     0.54(0.46+0.05)  0.88(0.59+0.26) +63.0%   0.55(0.45+0.07) +1.9%
sparse-v3   0.07(0.03+0.03)  0.07(0.04+0.03) +0.0%    0.07(0.03+0.03) +0.0%
sparse-v4   0.06(0.03+0.03)  0.06(0.04+0.02) +0.0%    0.06(0.03+0.03) +0.0%

*git reset -- does-not-exist*
full-v3     0.35(0.27+0.06)  0.52(0.32+0.17) +48.6%   0.35(0.27+0.06) +0.0%
full-v4     0.33(0.26+0.05)  0.50(0.33+0.15) +51.5%   0.33(0.26+0.06) +0.0%
sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
sparse-v4   0.04(0.02+0.04)  0.03(0.02+0.04) -25.0%   0.03(0.02+0.04) -25.0%

*git diff*
full-v3     0.07(0.04+0.04)  0.24(0.11+0.14) +242.9%  0.07(0.04+0.04) +0.0%
full-v4     0.07(0.03+0.05)  0.24(0.13+0.12) +242.9%  0.08(0.04+0.05) +14.3%
sparse-v3   0.02(0.01+0.04)  0.02(0.01+0.04) +0.0%    0.02(0.01+0.05) +0.0%
sparse-v4   0.02(0.02+0.03)  0.02(0.01+0.04) +0.0%    0.02(0.01+0.04) +0.0%

*git diff --cached*
full-v3     0.05(0.03+0.02)  0.22(0.12+0.09) +340.0%  0.05(0.03+0.01) +0.0%
full-v4     0.05(0.03+0.01)  0.23(0.12+0.11) +360.0%  0.05(0.03+0.02) +0.0%
sparse-v3   0.01(0.00+0.00)  0.01(0.00+0.00) +0.0%    0.01(0.00+0.00) +0.0%
sparse-v4   0.01(0.00+0.00)  0.01(0.00+0.00) +0.0%    0.01(0.00+0.00) +0.0%

*git blame f2/f4/a*
full-v3     0.18(0.13+0.05)  0.52(0.29+0.23) +188.9%  0.19(0.15+0.04) +5.6%
full-v4     0.19(0.15+0.04)  0.52(0.28+0.23) +173.7%  0.19(0.14+0.04) +0.0%
sparse-v3   0.10(0.08+0.02)  0.10(0.09+0.01) +0.0%    0.10(0.09+0.01) +0.0%
sparse-v4   0.10(0.08+0.02)  0.10(0.08+0.02) +0.0%    0.10(0.08+0.02) +0.0%

*git blame f2/f4/f3/a*
full-v3     0.45(0.36+0.08)  0.78(0.51+0.27) +73.3%   0.45(0.37+0.08) +0.0%
full-v4     0.45(0.37+0.08)  0.78(0.51+0.26) +73.3%   0.45(0.37+0.08) +0.0%
sparse-v3   0.36(0.32+0.04)  0.36(0.31+0.05) +0.0%    0.36(0.31+0.04) +0.0%
sparse-v4   0.36(0.31+0.05)  0.36(0.31+0.05) +0.0%    0.36(0.31+0.04) +0.0%

*git checkout-index -f --all*
full-v3     0.07(0.02+0.05)  0.24(0.12+0.12) +242.9%  0.08(0.04+0.04) +14.3%
full-v4     0.07(0.03+0.04)  0.24(0.11+0.13) +242.9%  0.08(0.03+0.04) +14.3%
sparse-v3   0.04(0.01+0.03)  0.04(0.00+0.03) +0.0%    0.04(0.01+0.03) +0.0%
sparse-v4   0.04(0.01+0.02)  0.04(0.01+0.03) +0.0%    0.04(0.01+0.02) +0.0%

*git update-index --add --remove f2/f4/a*
full-v3     0.29(0.23+0.02)  0.46(0.30+0.12) +58.6%   0.30(0.24+0.02) +3.4%
full-v4     0.27(0.22+0.02)  0.45(0.29+0.12) +66.7%   0.28(0.22+0.03) +3.7%
sparse-v3   0.02(0.02+0.00)  0.02(0.01+0.00) +0.0%    0.02(0.01+0.00) +0.0%
sparse-v4   0.02(0.02+0.00)  0.02(0.02+0.00) +0.0%    0.02(0.02+0.00) +0.0%

So, with the optimization, the extra work appears to be essentially 0
for sparse-checkouts that are also using sparse-indexes (even before my
optimization), and the extra work appears to be just marginally more
than 0 for sparse-checkouts that are using full indexes.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 sparse-index.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index b82648b10ee..eed170cd8f7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -341,18 +341,70 @@ void ensure_correct_sparsity(struct index_state *istate)
 		ensure_full_index(istate);
 }
 
+static int path_found(const char *path, const char **dirname, size_t *dir_len,
+		      int *dir_found)
+{
+	struct stat st;
+	char *newdir;
+	char *tmp;
+
+	/*
+	 * If dirname corresponds to a directory that doesn't exist, and this
+	 * path starts with dirname, then path can't exist.
+	 */
+	if (!*dir_found && !memcmp(path, *dirname, *dir_len))
+		return 0;
+
+	/*
+	 * If path itself exists, return 1.
+	 */
+	if (!lstat(path, &st))
+		return 1;
+
+	/*
+	 * Otherwise, path does not exist so we'll return 0...but we'll first
+	 * determine some info about its parent directory so we can avoid
+	 * lstat calls for future cache entries.
+	 */
+	newdir = strrchr(path, '/');
+	if (!newdir)
+		return 0; /* Didn't find a parent dir; just return 0 now. */
+
+	/*
+	 * If path starts with directory (which we already lstat'ed and found),
+	 * then no need to lstat parent directory again.
+	 */
+	if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
+		return 0;
+
+	/* Free previous dirname, and cache path's dirname */
+	*dirname = path;
+	*dir_len = newdir - path + 1;
+
+	tmp = xstrndup(path, *dir_len);
+	*dir_found = !lstat(tmp, &st);
+	free(tmp);
+
+	return 0;
+}
+
 void clear_skip_worktree_from_present_files(struct index_state *istate)
 {
+	const char *last_dirname = NULL;
+	size_t dir_len = 0;
+	int dir_found = 1;
+
 	int i;
+
 	if (!core_apply_sparse_checkout)
 		return;
 
 restart:
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
-		struct stat st;
 
-		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
+		if (ce_skip_worktree(ce) &&
+		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
 			if (S_ISSPARSEDIR(ce->ce_mode)) {
 				ensure_full_index(istate);
 				goto restart;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
  2022-01-13 16:43 ` [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
@ 2022-01-13 23:35   ` Elijah Newren
  0 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren @ 2022-01-13 23:35 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: Git Mailing List, Victoria Dye, Derrick Stolee, Lessley Dennington

On Thu, Jan 13, 2022 at 8:43 AM Elijah Newren via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Elijah Newren <newren@gmail.com>
>
> Trying to clear the skip-worktree bit from files that are present does
> present some computational overhead, for sparse-checkouts.  (We do not
> do the bit clearing in non-sparse-checkouts.)  Optimize it as follows:
>
...
> (NOTE: t/perf/ appears to have timing resolution only down to 0.01 s,
> which presents significant measurement error when timings only differ by
> 0.01s.  I don't trust any such timings below, and yet all the optimized
> results differ by at most 0.01s.)

To follow up on this using using a tool with higher precision for a
few selected cases:

> Test        Before Series    Unoptimized              Optimized
> -----------------------------------------------------------------------------
...
> *git add -A*
> full-v3     0.40(0.30+0.07)  0.56(0.36+0.17) +40.0%   0.39(0.30+0.07) -2.5%

In hyperfine, the results were:

before-series,full-v3$ hyperfine '~/floss/git/git add -A'
Benchmark #1: ~/floss/git/git add -A
  Time (mean ± σ):     129.4 ms ±   3.0 ms    [User: 84.0 ms, System: 59.9 ms]
  Range (min … max):   122.5 ms … 138.3 ms    24 runs

post-optim,full-v3$ hyperfine '~/floss/git/git add -A'
Benchmark #1: ~/floss/git/git add -A
  Time (mean ± σ):     133.4 ms ±   3.0 ms    [User: 85.6 ms, System: 61.4 ms]
  Range (min … max):   129.7 ms … 142.7 ms    22 runs

Since 133.4/129.4 ~ 1.0309, we see that we have +3.1% on the execution
timing due to changes in this series.  Notably, the number is actually
positive as we'd expect since we are clearly doing more work, whereas
the t/perf system reports -2.5% due to measurement inaccuracy.

> full-v4     0.37(0.28+0.07)  0.54(0.37+0.16) +45.9%   0.38(0.29+0.07) +2.7%
> sparse-v3   0.06(0.04+0.05)  0.08(0.05+0.05) +33.3%   0.06(0.05+0.04) +0.0%
> sparse-v4   0.05(0.03+0.05)  0.05(0.04+0.04) +0.0%    0.06(0.04+0.05) +20.0%

before-series,sparse-v4$ hyperfine '~/floss/git/git add -A'
Benchmark #1: ~/floss/git/git add -A
  Time (mean ± σ):      37.0 ms ±   1.1 ms    [User: 23.1 ms, System: 47.8 ms]
  Range (min … max):    35.6 ms …  42.1 ms    72 runs

post-optim,sparse-v4$ hyperfine '~/floss/git/git add -A'
Benchmark #1: ~/floss/git/git add -A
  Time (mean ± σ):      37.3 ms ±   0.9 ms    [User: 22.8 ms, System: 48.7 ms]
  Range (min … max):    36.2 ms …  40.7 ms    72 runs

37.3/37.0 ~ 1.008, so a +0.8% on execution timing.  Hard to measure,
and nice that hyperfine automatically chooses to run this 72 times
just to ensure the data is more reliable.  (Also, the +0.8% is much
more believable than the +20.0% from lack of measurement accuracy,
especially combined with data below)

> *git diff*
> full-v3     0.07(0.04+0.04)  0.24(0.11+0.14) +242.9%  0.07(0.04+0.04) +0.0%
> full-v4     0.07(0.03+0.05)  0.24(0.13+0.12) +242.9%  0.08(0.04+0.05) +14.3%

before-series,full-v4$ hyperfine '~/floss/git/git diff'
Benchmark #1: ~/floss/git/git diff
  Time (mean ± σ):      69.9 ms ±   1.9 ms    [User: 37.5 ms, System: 47.6 ms]
  Range (min … max):    66.5 ms …  74.0 ms    44 runs

post-optim,full-v4$ hyperfine '~/floss/git/git diff'
Benchmark #1: ~/floss/git/git diff
  Time (mean ± σ):      73.0 ms ±   1.7 ms    [User: 40.7 ms, System: 47.1 ms]
  Range (min … max):    70.3 ms …  76.2 ms    40 runs

So 73.0/69.9 ~ 1.044, so a +4.4% to the execution timing -- much less
than the +14.3% reported due to lack of measurement accuracy.

(If I had used ~/floss/git/bin-wrappers/git instead of
~/floss/git/git, the timings would have been going from 72.3ms to
75.1ms, representing a +3.9% increase instead.  Probably doesn't
matter, but just in case folks were curious about what if I used the
bin-wrappers.)

> sparse-v3   0.02(0.01+0.04)  0.02(0.01+0.04) +0.0%    0.02(0.01+0.05) +0.0%
> sparse-v4   0.02(0.02+0.03)  0.02(0.01+0.04) +0.0%    0.02(0.01+0.04) +0.0%

Here, the timings went from 17.5ms to 17.6ms, or +0.6%.  It's really
hard to measure the overhead of the changes of my patch series when
sparse-index is in use, which makes sense since the extra loop only
has to check the toplevel directory, not any of the entries under it.
But it does show just a minor cost in runtime due to the new added
check.



I think the original commit message makes it clear that with the (new)
optimization, the timings are quite reasonable, but maybe this little
extra flavor helps for anyone who was just looking at the percentage
changes.  The timings and timing differences really need to be bigger
than the precision in order to trust the percentage differences, which
just wasn't the case for t/perf here.  But t/perf was good enough to
show that the timings after my series are roughly the same as before
the series (never more than 0.01s slower).

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts)
  2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
                   ` (4 preceding siblings ...)
  2022-01-13 16:43 ` [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
@ 2022-01-14 15:59 ` Elijah Newren via GitGitGadget
  2022-01-14 15:59   ` [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
                     ` (5 more replies)
  5 siblings, 6 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-14 15:59 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren

(Maintainer note: This series builds on (v2 of) vd/sparse-clean-etc, because
it tweaks one of the testcases added there.)

(Note 2: There was a previous RFC round of this series at
https://lore.kernel.org/git/20220109045732.2497526-1-newren@gmail.com/.)

Files in the present-despite-SKIP_WORKTREE state have caused no ends of
discussions and bugs[1,2,3,4,5,6,...and lots of others]. Trying to address
the big issue of discovering & recovering from this state has befuddled me
for over a year because I was worried we'd need additional code at every
skip_worktree-checking path in the code (and they are all over the place),
and that we'd make the code significantly slower unless we plumbed a bunch
of additional information all over the place to allow some reasonable
optimizations.

This series tries to solve the problem a bit differently by automatic early
discovery and recovery; as a result, it greatly simplifies the landscape,
reduces our testing matrix burden, and fixes a large swath of bugs. And I
figured out how to get the perf cost down to essentially negligible.

Changes since v1 (or v2 if you count RFC as v1):

 * now includes some fixes for testcases from
   ds/fetch-pull-with-sparse-index (which topic marked its own new tests as
   potentially suboptimal; with my series, the sparse behavior now matches
   the full tree behavior on that test. Wahoo!). Note that Junio's version
   of vd/sparse-clean-etc already includes ds/fetch-pull-with-sparse-index,
   so no need to merge in anything extra.

Changes since RFC version:

 * updated the commit messages as per suggestions from Victoria, including
   adding performance measurements
 * renamed the new function to use a clearer name
 * replaced the final patch with a different optimization, which is both
   simpler and performs quite a bit better (the cost for my previous patch 5
   was already decent in many cases, but had a few cases where the cost was
   significant).

Quick overview:

 * Patches 1 & 2 add a test to demonstrate accidental deletion of
   possibly-modified files, and then fix the bug.
 * Patch 3 is the crux of this series; a small amount of code with a huge
   commit message
 * Patch 4 updates the documentation
 * Patch 5 adds an optimization to reduce the performance impact of patch 3

[1]
https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
[2]
https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[3]
https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
[4] commit 66b209b ("merge-ort: implement CE_SKIP_WORKTREE handling with
conflicted entries", 2021-03-20) [5] commit ba359fd ("stash: fix stash
application in sparse-checkouts", 2020-12-01) [6]
https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/

Elijah Newren (5):
  t1011: add testcase demonstrating accidental loss of user
    modifications
  unpack-trees: fix accidental loss of user changes
  repo_read_index: clear SKIP_WORKTREE bit from files present in
    worktree
  Update documentation related to sparsity and the skip-worktree bit
  Accelerate clear_skip_worktree_from_present_files() by caching

 Documentation/git-read-tree.txt          | 12 +++-
 Documentation/git-sparse-checkout.txt    | 76 ++++++++++++++----------
 Documentation/git-update-index.txt       | 57 +++++++++++++-----
 repository.c                             |  7 +++
 sparse-index.c                           | 73 +++++++++++++++++++++++
 sparse-index.h                           |  1 +
 t/t1011-read-tree-sparse-checkout.sh     | 23 ++++++-
 t/t1092-sparse-checkout-compatibility.sh | 41 ++++++-------
 t/t3705-add-sparse-checkout.sh           |  2 +
 t/t6428-merge-conflicts-sparse.sh        | 23 ++-----
 t/t7012-skip-worktree-writing.sh         | 44 +++-----------
 t/t7817-grep-sparse-checkout.sh          | 11 +++-
 unpack-trees.c                           |  4 +-
 13 files changed, 246 insertions(+), 128 deletions(-)


base-commit: 48609de3bf32befb69c40c1a2595a98dac0448b4
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1114%2Fnewren%2Ffix-present-despite-skip-worktree-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1114/newren/fix-present-despite-skip-worktree-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1114

Range-diff vs v1:

 1:  c553d558c2f = 1:  d50d804af4e t1011: add testcase demonstrating accidental loss of user modifications
 2:  1e3958576e2 = 2:  206c638fa90 unpack-trees: fix accidental loss of user changes
 3:  b263cc75b7d ! 3:  11d46a399d2 repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
     @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'update-index modi
       
       	# When skip-worktree is disabled (even on files outside sparse cone), file
       	# is updated in the index
     +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'ls-files' '
     + 	test_cmp dense sparse &&
     + 
     + 	# Set up a strange condition of having a file edit
     +-	# outside of the sparse-checkout cone. This is just
     +-	# to verify that sparse-checkout and sparse-index
     +-	# behave the same in this case.
     ++	# outside of the sparse-checkout cone. We want to verify
     ++	# that all modes handle this the same, and detect the
     ++	# modification.
     + 	write_script edit-content <<-\EOF &&
     +-	mkdir folder1 &&
     ++	mkdir -p folder1 &&
     + 	echo content >>folder1/a
     + 	EOF
     +-	run_on_sparse ../edit-content &&
     ++	run_on_all ../edit-content &&
     + 
     +-	# ls-files does not currently notice modified files whose
     +-	# cache entries are marked SKIP_WORKTREE. This may change
     +-	# in the future, but here we test that sparse index does
     +-	# not accidentally create a change of behavior.
     +-	test_sparse_match git ls-files --modified &&
     +-	test_must_be_empty sparse-checkout-out &&
     +-	test_must_be_empty sparse-index-out &&
     ++	test_all_match git ls-files --modified &&
     + 
     + 	git -C sparse-index ls-files --sparse --modified >sparse-index-out &&
     +-	test_must_be_empty sparse-index-out &&
     ++	cat >expect <<-\EOF &&
     ++	folder1/a
     ++	EOF
     ++	test_cmp expect sparse-index-out &&
     + 
     + 	# Add folder1 to the sparse-checkout cone and
     + 	# check that ls-files shows the expanded files.
     + 	test_sparse_match git sparse-checkout add folder1 &&
     +-	test_sparse_match git ls-files --modified &&
     ++	test_all_match git ls-files --modified &&
     + 
     + 	test_all_match git ls-files &&
     + 	git -C sparse-index ls-files --sparse >actual &&
      
       ## t/t3705-add-sparse-checkout.sh ##
      @@ t/t3705-add-sparse-checkout.sh: setup_sparse_entry () {
 4:  c74ad19616e = 4:  0af00779128 Update documentation related to sparsity and the skip-worktree bit
 5:  e68028ebe0a = 5:  05ac964e630 Accelerate clear_skip_worktree_from_present_files() by caching

-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications
  2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
@ 2022-01-14 15:59   ` Elijah Newren via GitGitGadget
  2022-02-16  8:51     ` Ævar Arnfjörð Bjarmason
  2022-01-14 15:59   ` [PATCH v2 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-14 15:59 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

If a user has a file with local modifications that is not marked as
SKIP_WORKTREE, but the sparsity patterns are such that it should be
marked that way, and the user then invokes a command like

   * git checkout -q HEAD^

or

   * git read-tree -mu HEAD^

Then the file will be deleted along with all the users' modifications.
Add a testcase demonstrating this problem.

Note: This bug only triggers if something other than 'HEAD' is given;
if the commands above had specified 'HEAD', then the users' file would
be left alone.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t1011-read-tree-sparse-checkout.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 24092c09a95..1b2395b8a82 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -187,6 +187,27 @@ test_expect_success 'read-tree updates worktree, absent case' '
 	test ! -f init.t
 '
 
+test_expect_success 'read-tree will not throw away dirty changes, non-sparse' '
+	echo "/*" >.git/info/sparse-checkout &&
+	read_tree_u_must_succeed -m -u HEAD &&
+
+	echo dirty >init.t &&
+	read_tree_u_must_fail -m -u HEAD^ &&
+	test_path_is_file init.t &&
+	grep -q dirty init.t
+'
+
+test_expect_failure 'read-tree will not throw away dirty changes, sparse' '
+	echo "/*" >.git/info/sparse-checkout &&
+	read_tree_u_must_succeed -m -u HEAD &&
+
+	echo dirty >init.t &&
+	echo sub/added >.git/info/sparse-checkout &&
+	read_tree_u_must_fail -m -u HEAD^ &&
+	test_path_is_file init.t &&
+	grep -q dirty init.t
+'
+
 test_expect_success 'read-tree updates worktree, dirty case' '
 	echo sub/added >.git/info/sparse-checkout &&
 	git checkout -f top &&
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 2/5] unpack-trees: fix accidental loss of user changes
  2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
  2022-01-14 15:59   ` [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
@ 2022-01-14 15:59   ` Elijah Newren via GitGitGadget
  2022-01-14 15:59   ` [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-14 15:59 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

For sparse-checkouts, we don't want unpack-trees to error out on files
that are missing from the worktree, so there has traditionally been
logic to make it skip the verify_uptodate() check for these.
Unfortunately, it was skipping the verify_uptodate() check for files
that were expected to *become* SKIP_WORKTREE.  For files that were not
already SKIP_WORKTREE, that can cause us to later delete the file in
apply_sparse_checkout().  Only skip the check for files that were
already SKIP_WORKTREE as well to avoid lightly discarding important
changes users may have made to files.

Note 1: unpack-trees.c is already a bit complex, and the logic around
CE_SKIP_WORKTREE and CE_NEW_SKIP_WORKTREE in that file are no exception.
I also tried just replacing CE_NEW_SKIP_WORKTREE with CE_SKIP_WORKTREE
in the verify_uptodate() check instead of checking for both flags, and
found that it also fixed this bug and passed all the tests.  I also
attempted to devise a few testcases that might trip either variant of my
fix and was unable to find any problems.  It may be that just checking
CE_SKIP_WORKTREE is a better fix, but I'm not sure.  I thought it
was a bit safer to strictly reduce the number of cases where we skip the
up-to-date check rather than just toggling which kind of cases skip it,
and thus went with the current variant of the fix.

Note 2: I also wondered if verify_absent() might have a similar bug, but
despite my attempts to try to devise a testcase that would trigger such
a thing, I couldn't find any problematic testcases.  Thus, this patch
makes no attempt to apply similar changes to verify_absent() and
verify_absent_if_directory().

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 t/t1011-read-tree-sparse-checkout.sh | 2 +-
 unpack-trees.c                       | 4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 1b2395b8a82..4ed0885bf2f 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -197,7 +197,7 @@ test_expect_success 'read-tree will not throw away dirty changes, non-sparse' '
 	grep -q dirty init.t
 '
 
-test_expect_failure 'read-tree will not throw away dirty changes, sparse' '
+test_expect_success 'read-tree will not throw away dirty changes, sparse' '
 	echo "/*" >.git/info/sparse-checkout &&
 	read_tree_u_must_succeed -m -u HEAD &&
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 360844bda3a..96525d2ec26 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -2065,7 +2065,9 @@ static int verify_uptodate_1(const struct cache_entry *ce,
 int verify_uptodate(const struct cache_entry *ce,
 		    struct unpack_trees_options *o)
 {
-	if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
+	if (!o->skip_sparse_checkout &&
+	    (ce->ce_flags & CE_SKIP_WORKTREE) &&
+	    (ce->ce_flags & CE_NEW_SKIP_WORKTREE))
 		return 0;
 	return verify_uptodate_1(ce, o, ERROR_NOT_UPTODATE_FILE);
 }
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
  2022-01-14 15:59   ` [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
  2022-01-14 15:59   ` [PATCH v2 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
@ 2022-01-14 15:59   ` Elijah Newren via GitGitGadget
  2022-02-16  8:57     ` Ævar Arnfjörð Bjarmason
  2022-02-19  1:06     ` Jonathan Nieder
  2022-01-14 15:59   ` [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-14 15:59 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

The fix is short (~30 lines), but the description is not.  Sorry.

There is a set of problems caused by files in what I'll refer to as the
"present-despite-SKIP_WORKTREE" state.  This commit aims to not just fix
these problems, but remove the entire class as a possibility -- for
those using sparse checkouts.  But first, we need to understand the
problems this class presents.  A quick outline:

   * Problems
     * User facing issues
     * Problem space complexity
     * Maintenance and code correctness challenges
   * SKIP_WORKTREE expectations in Git
   * Suggested solution
   * Pros/Cons of suggested solution
   * Notes on testcase modifications

=== User facing issues ===

There are various ways for users to get files to be present in the
working copy despite having the SKIP_WORKTREE bit set for that file in
the index.  This may come from:
  * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
  * users grabbing files from elsewhere and writing them to the worktree
    (perhaps even cached in their editor)
  * users attempting to "abort" a sparse-checkout operation with a
    not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
    working tree is not atomic)[3].

Once users have present-despite-SKIP_WORKTREE files, any modifications
users make to these files will be ignored, possibly to users' confusion.

Further:
  * these files will degrade performance for the sparse-index case due
    to requiring the index to be expanded (see commit 55dfcf9591
    ("sparse-checkout: clear tracked sparse dirs", 2021-09-08) for why
    we try to delete entire directories outside the sparse cone).
  * these files will not be updated by by standard commands
    (switch/checkout/pull/merge/rebase will leave them alone unless
    conflicts happen -- and even then, the conflicted file may be
    written somewhere else to avoid overwriting the SKIP_WORKTREE file
    that is present and in the way)
  * there is nothing in Git that users can use to discover such
    files (status, diff, grep, etc. all ignore it)
  * there is no reasonable mechanism to "recover" from such a condition
    (neither `git sparse-checkout reapply` nor `git reset --hard` will
    correct it).

So, not only are users modifications ignored, but the files get
progressively more stale over time.  At some point in the future, they
may change their sparseness specification or disable sparse-checkouts.
At that time, all present-despite-SKIP_WORKTREE files will show up as
having lots of modifications because they represent a version from a
different branch or commit.  These might include user-made local changes
from days before, but the only way to tell is to have users look through
them all closely.

If these users come to others for help, there will be no logs that
explain the issue; it's just a mysterious list of changes.  Users might
adamantly claim (correctly, as it turns out) that they didn't modify
these files, while others presume they did.

[1] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/
[2] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[3] https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/

=== Problem space complexity ===

SKIP_WORKTREE has been part of Git for over a decade.  Duy did lots of
work on it initially, and several others have since come along and put
lots of work into it.  Stolee spent most of 2021 on the sparse-index,
with lots of bugfixes along the way including to non-sparse-index cases
as we are still trying to get sparse checkouts to behave reasonably.
Basically every codepath throughout the treat needs to be aware of an
additional type of file: tracked-but-not-present.  The extra type
results in lots of extra testcases and lots of extra code everywhere.

But, the sad thing is that we actually have more than one extra type.
We have tracked, tracked-but-not-present (SKIP_WORKTREE), and
tracked-but-promised-to-not-be-present-but-is-present-anyway
(present-despite-SKIP_WORKTREE).  Two types is a monumental amount of
effort to support, and adding a third feels a bit like insanity[4].

[4] Some examples of which can be seen at
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/

=== Maintenance and code correctness challenges ===

Matheus' patches to grep stalled for nearly a year, in part because of
complications of how to handle sparse-checkouts appropriately in all
cases[5][6] (with trying to sanely figure out how to sanely handle
present-despite-SKIP_WORKTREE files being one of the complications).
His rm/add follow-ups also took months because of those kinds of
issues[7].  The corner cases with things like submodules and
SKIP_WORKTREE with the addition of present-despite-SKIP_WORKTREE start
becoming really complex[8].

We've had to add ugly logic to merge-ort to attempt to handle
present-despite-SKIP_WORKTREE files[9], and basically just been forced
to give up in merge-recursive knowing full well that we'll sometimes
silently discard user modifications.  Despite stash essentially being a
merge, it needed extra code (beyond what was in merge-ort and
merge-recursive) to manually tweak SKIP_WORKTREE bits in order to avoid
a few different bugs that'd result in an early abort with a partial
stash application[10].

[5] See https://lore.kernel.org/git/5f3f7ac77039d41d1692ceae4b0c5df3bb45b74a.1612901326.git.matheus.bernardino@usp.br/#t
    and the dates on the thread; also Matheus and I had several
    conversations off-list trying to resolve the issues over that time
[6] ...it finally kind of got unstuck after
    https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
[7] See for example
    https://lore.kernel.org/git/CABPp-BHwNoVnooqDFPAsZxBT9aR5Dwk5D9sDRCvYSb8akxAJgA@mail.gmail.com/#t
    and quotes like "The core functionality of sparse-checkout has always
    been only partially implemented", a statement I still believe is true
    today.
[8] https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
[9] See commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE
    handling with conflicted entries", 2021-03-20)
[10] See commit ba359fd507 ("stash: fix stash application in
     sparse-checkouts", 2020-12-01)

=== SKIP_WORKTREE expectations in Git ===

A couple quotes:

From [11] (before the "sparse-checkout" command existed):
   If it needs too many special cases, hacks, and conditionals, then it
   is not worth the complexity---if it is easier to write a correct code
   by allowing Git to populate working tree files, it is perfectly fine
   to do so.

   In a sense, the sparse checkout "feature" itself is a hack by itself,
   and that is why I think this part should be "best effort" as well.

From the git-sparse-checkout manual (still present today):

   THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
   COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
   THE FUTURE.

[11] https://lore.kernel.org/git/xmqqbmb1a7ga.fsf@gitster-ct.c.googlers.com/

=== Suggested solution ===

SKIP_WORKTREE was written to allow sparse-checkouts, in particular, as
the name of the option implies, to allow the file to NOT be in the
worktree but consider it to be unchanged rather than deleted.

The suggests a simple solution: present-despite-SKIP_WORKTREE files
should not exist, for those using sparse-checkouts.

Enforce this at index loading time by checking if core.sparseCheckout is
true; if so, check files in the index with the SKIP_WORKTREE bit set to
verify that they are absent from the working tree.  If they are present,
unset the bit (in memory, though any commands that write to the index
will record the update).

Users can, of course, can get the SKIP_WORKTREE bit back such as by
running `git sparse-checkout reapply` (if they have ensured the file is
unmodified and doesn't match the specified sparsity patterns).

=== Pros/Cons of suggested solution ===

Pros:

  * Solves the user visible problems reported above, which I've been
    complaining about for nearly a year but couldn't find a solution to.
  * Helps prevent slow performance degradation with a sparse-index.
  * Much easier behavior in sparse-checkouts for users to reason about
  * Very simple, ~30 lines of code.
  * Significantly simplifies some ugly testcases, and obviates the need
    to test an entire class of potential issues.
  * Reduces code complexity, reasoning, and maintenance.  Avoids
    disagreements about weird corner cases[12].
  * It has been reported that some users might be (ab)using
    SKIP_WORKTREE as a let-me-modify-but-keep-the-file-in-the-worktree
    mechanism[13, and a few other similar references].  These users know
    of multiple caveats and shortcomings in doing so; perhaps not
    surprising given the "SKIP_WORKTREE expecations" section above.
    However, these users use `git update-index --skip-worktree`, and not
    `git sparse-checkout` or core.sparseCheckout=true.  As such, these
    users would be unaffected by this change and can continue abusing
    the system as before.

[12] https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
[13] https://stackoverflow.com/questions/13630849/git-difference-between-assume-unchanged-and-skip-worktree

Cons:

  * When core.sparseCheckout is enabled, this adds a performance cost to
    reading the index.  I'll defer discussion of this cost to a subsequent
    patch, since I have some optimizations to add.

=== Notes on testcase modifications ===

The good:
  * t1011: Compare to two cases above it ('read-tree will not throw away
    dirty changes, non-sparse'); since the file is present, it should
    match the non-sparse case now
  * t1092: sparse-index & sparse-checkout now match full-worktree
    behavior in more cases!  Yaay for consistency!
  * t6428, t7012: look at how much simpler the tests become!  Merge and
    stash can just fail early telling the user there's a file in the
    way, instead of not noticing until it's about to write a file and
    then have to implement sudden crash avoidance.  Hurray for sanity!
  * t7817: sparse behavior better matches full tree behavior.  Hurray
    for sanity!

The confusing:
  * t3705: These changes were ONLY needed on Windows, but they don't
    hurt other platforms.  Let's discuss each individually:

    * core.sparseCheckout should be false by default.  Nothing in this
      testcase toggles that until many, many tests later.  However,
      early tests (#5 in particular) were testing `update-index
      --skip-worktree` behavior in a non-sparse-checkout, but the
      Windows tests in CI were behaving as if core.sparseCheckout=true
      had been specified somewhere.  I do not have access to a Windows
      machine.  But I just manually did what should have been a no-op
      and turned the config off.  And it fixed the test.
    * I have no idea why the leftover .gitattributes file from this
      test was causing failures for test #18 on Windows, but only with
      these changes of mine.  Test #18 was checking for empty stderr,
      and specifically wanted to know that some error completely
      unrelated to file endings did not appear.  The leftover
      .gitattributes file thus caused some spurious stderr unrelated to
      the thing being checked.  Since other tests did not intend to
      test normalization, just proactively remove the .gitattributes
      file.  I'm certain this is cleaner and better, I'm just unsure
      why/how this didn't trigger problems before.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 repository.c                             |  7 ++++
 sparse-index.c                           | 21 +++++++++++
 sparse-index.h                           |  1 +
 t/t1011-read-tree-sparse-checkout.sh     |  2 +-
 t/t1092-sparse-checkout-compatibility.sh | 41 ++++++++++------------
 t/t3705-add-sparse-checkout.sh           |  2 ++
 t/t6428-merge-conflicts-sparse.sh        | 23 +++----------
 t/t7012-skip-worktree-writing.sh         | 44 ++++--------------------
 t/t7817-grep-sparse-checkout.sh          | 11 ++++--
 9 files changed, 72 insertions(+), 80 deletions(-)

diff --git a/repository.c b/repository.c
index 34610c5a33e..dddee32258f 100644
--- a/repository.c
+++ b/repository.c
@@ -301,6 +301,13 @@ int repo_read_index(struct repository *repo)
 	if (repo->settings.command_requires_full_index)
 		ensure_full_index(repo->index);
 
+	/*
+	 * If sparse checkouts are in use, check whether paths with the
+	 * SKIP_WORKTREE attribute are missing from the worktree; if not,
+	 * clear that attribute for that path.
+	 */
+	clear_skip_worktree_from_present_files(repo->index);
+
 	return res;
 }
 
diff --git a/sparse-index.c b/sparse-index.c
index a1d505d50e9..b82648b10ee 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -341,6 +341,27 @@ void ensure_correct_sparsity(struct index_state *istate)
 		ensure_full_index(istate);
 }
 
+void clear_skip_worktree_from_present_files(struct index_state *istate)
+{
+	int i;
+	if (!core_apply_sparse_checkout)
+		return;
+
+restart:
+	for (i = 0; i < istate->cache_nr; i++) {
+		struct cache_entry *ce = istate->cache[i];
+		struct stat st;
+
+		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
+			if (S_ISSPARSEDIR(ce->ce_mode)) {
+				ensure_full_index(istate);
+				goto restart;
+			}
+			ce->ce_flags &= ~CE_SKIP_WORKTREE;
+		}
+	}
+}
+
 /*
  * This static global helps avoid infinite recursion between
  * expand_to_path() and index_file_exists().
diff --git a/sparse-index.h b/sparse-index.h
index 656bd835b25..633d4fb7e31 100644
--- a/sparse-index.h
+++ b/sparse-index.h
@@ -5,6 +5,7 @@ struct index_state;
 #define SPARSE_INDEX_MEMORY_ONLY (1 << 0)
 int convert_to_sparse(struct index_state *istate, int flags);
 void ensure_correct_sparsity(struct index_state *istate);
+void clear_skip_worktree_from_present_files(struct index_state *istate);
 
 /*
  * Some places in the codebase expect to search for a specific path.
diff --git a/t/t1011-read-tree-sparse-checkout.sh b/t/t1011-read-tree-sparse-checkout.sh
index 4ed0885bf2f..dd957be1b78 100755
--- a/t/t1011-read-tree-sparse-checkout.sh
+++ b/t/t1011-read-tree-sparse-checkout.sh
@@ -212,7 +212,7 @@ test_expect_success 'read-tree updates worktree, dirty case' '
 	echo sub/added >.git/info/sparse-checkout &&
 	git checkout -f top &&
 	echo dirty >init.t &&
-	read_tree_u_must_succeed -m -u HEAD^ &&
+	read_tree_u_must_fail -m -u HEAD^ &&
 	grep -q dirty init.t &&
 	rm init.t
 '
diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index f3a059e5af5..2a04b532f91 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -367,7 +367,7 @@ test_expect_success 'status/add: outside sparse cone' '
 	write_script edit-contents <<-\EOF &&
 	echo text >>$1
 	EOF
-	run_on_sparse ../edit-contents folder1/a &&
+	run_on_all ../edit-contents folder1/a &&
 	run_on_all ../edit-contents folder1/new &&
 
 	test_sparse_match git status --porcelain=v2 &&
@@ -376,8 +376,8 @@ test_expect_success 'status/add: outside sparse cone' '
 	test_sparse_match test_must_fail git add folder1/a &&
 	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
 	test_sparse_unstaged folder1/a &&
-	test_sparse_match test_must_fail git add --refresh folder1/a &&
-	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
+	test_all_match git add --refresh folder1/a &&
+	test_must_be_empty sparse-checkout-err &&
 	test_sparse_unstaged folder1/a &&
 	test_sparse_match test_must_fail git add folder1/new &&
 	grep "Disable or modify the sparsity rules" sparse-checkout-err &&
@@ -643,11 +643,11 @@ test_expect_success 'update-index modify outside sparse definition' '
 	run_on_sparse cp ../initial-repo/folder1/a folder1/a &&
 	run_on_all ../edit-contents folder1/a &&
 
-	# If file has skip-worktree enabled, update-index does not modify the
-	# index entry
-	test_sparse_match git update-index folder1/a &&
-	test_sparse_match git status --porcelain=v2 &&
-	test_must_be_empty sparse-checkout-out &&
+	# If file has skip-worktree enabled, but the file is present, it is
+	# treated the same as if skip-worktree is disabled
+	test_all_match git status --porcelain=v2 &&
+	test_all_match git update-index folder1/a &&
+	test_all_match git status --porcelain=v2 &&
 
 	# When skip-worktree is disabled (even on files outside sparse cone), file
 	# is updated in the index
@@ -1331,30 +1331,27 @@ test_expect_success 'ls-files' '
 	test_cmp dense sparse &&
 
 	# Set up a strange condition of having a file edit
-	# outside of the sparse-checkout cone. This is just
-	# to verify that sparse-checkout and sparse-index
-	# behave the same in this case.
+	# outside of the sparse-checkout cone. We want to verify
+	# that all modes handle this the same, and detect the
+	# modification.
 	write_script edit-content <<-\EOF &&
-	mkdir folder1 &&
+	mkdir -p folder1 &&
 	echo content >>folder1/a
 	EOF
-	run_on_sparse ../edit-content &&
+	run_on_all ../edit-content &&
 
-	# ls-files does not currently notice modified files whose
-	# cache entries are marked SKIP_WORKTREE. This may change
-	# in the future, but here we test that sparse index does
-	# not accidentally create a change of behavior.
-	test_sparse_match git ls-files --modified &&
-	test_must_be_empty sparse-checkout-out &&
-	test_must_be_empty sparse-index-out &&
+	test_all_match git ls-files --modified &&
 
 	git -C sparse-index ls-files --sparse --modified >sparse-index-out &&
-	test_must_be_empty sparse-index-out &&
+	cat >expect <<-\EOF &&
+	folder1/a
+	EOF
+	test_cmp expect sparse-index-out &&
 
 	# Add folder1 to the sparse-checkout cone and
 	# check that ls-files shows the expanded files.
 	test_sparse_match git sparse-checkout add folder1 &&
-	test_sparse_match git ls-files --modified &&
+	test_all_match git ls-files --modified &&
 
 	test_all_match git ls-files &&
 	git -C sparse-index ls-files --sparse >actual &&
diff --git a/t/t3705-add-sparse-checkout.sh b/t/t3705-add-sparse-checkout.sh
index 81f3384eeed..95609046c61 100755
--- a/t/t3705-add-sparse-checkout.sh
+++ b/t/t3705-add-sparse-checkout.sh
@@ -19,6 +19,7 @@ setup_sparse_entry () {
 	fi &&
 	git add sparse_entry &&
 	git update-index --skip-worktree sparse_entry &&
+	git config core.sparseCheckout false &&
 	git commit --allow-empty -m "ensure sparse_entry exists at HEAD" &&
 	SPARSE_ENTRY_BLOB=$(git rev-parse :sparse_entry)
 }
@@ -126,6 +127,7 @@ test_expect_success 'git add --chmod does not update sparse entries' '
 '
 
 test_expect_success 'git add --renormalize does not update sparse entries' '
+	test_when_finished rm .gitattributes &&
 	test_config core.autocrlf false &&
 	setup_sparse_entry "LINEONE\r\nLINETWO\r\n" &&
 	echo "sparse_entry text=auto" >.gitattributes &&
diff --git a/t/t6428-merge-conflicts-sparse.sh b/t/t6428-merge-conflicts-sparse.sh
index 7e8bf497f82..142c9aaabc5 100755
--- a/t/t6428-merge-conflicts-sparse.sh
+++ b/t/t6428-merge-conflicts-sparse.sh
@@ -112,7 +112,7 @@ test_expect_success 'conflicting entries written to worktree even if sparse' '
 	)
 '
 
-test_expect_merge_algorithm failure success 'present-despite-SKIP_WORKTREE handled reasonably' '
+test_expect_success 'present-despite-SKIP_WORKTREE handled reasonably' '
 	test_setup_numerals in_the_way &&
 	(
 		cd numerals_in_the_way &&
@@ -132,26 +132,13 @@ test_expect_merge_algorithm failure success 'present-despite-SKIP_WORKTREE handl
 
 		test_must_fail git merge -s recursive B^0 &&
 
-		git ls-files -t >index_files &&
-		test_cmp expected-index index_files &&
+		test_path_is_missing .git/MERGE_HEAD &&
 
-		test_path_is_file README &&
 		test_path_is_file numerals &&
 
-		test_cmp expected-merge numerals &&
-
-		# There should still be a file with "foobar" in it
-		grep foobar * &&
-
-		# 5 other files:
-		#   * expected-merge
-		#   * expected-index
-		#   * index_files
-		#   * others
-		#   * whatever name was given to the numerals file that had
-		#     "foobar" in it
-		git ls-files -o >others &&
-		test_line_count = 5 others
+		# numerals should still have "foobar" in it
+		echo foobar >expect &&
+		test_cmp expect numerals
 	)
 '
 
diff --git a/t/t7012-skip-worktree-writing.sh b/t/t7012-skip-worktree-writing.sh
index a1080b94e38..cb9f1a6981e 100755
--- a/t/t7012-skip-worktree-writing.sh
+++ b/t/t7012-skip-worktree-writing.sh
@@ -171,50 +171,20 @@ test_expect_success 'stash restore in sparse checkout' '
 
 		# Put a file in the working directory in the way
 		echo in the way >modified &&
-		git stash apply &&
+		test_must_fail git stash apply 2>error&&
 
-		# Ensure stash vivifies modifies paths...
-		cat >expect <<-EOF &&
-		H addme
-		H modified
-		H removeme
-		H subdir/A
-		S untouched
-		EOF
-		git ls-files -t >actual &&
-		test_cmp expect actual &&
+		grep "changes.*would be overwritten by merge" error &&
 
-		# ...and that the paths show up in status as changed...
-		cat >expect <<-EOF &&
-		A  addme
-		 M modified
-		 D removeme
-		 M subdir/A
-		?? actual
-		?? expect
-		?? modified.stash.XXXXXX
-		EOF
-		git status --porcelain | \
-			sed -e s/stash......./stash.XXXXXX/ >actual &&
-		test_cmp expect actual &&
+		echo in the way >expect &&
+		test_cmp expect modified &&
+		git diff --quiet HEAD ":!modified" &&
 
 		# ...and that working directory reflects the files correctly
-		test_path_is_file    addme &&
+		test_path_is_missing addme &&
 		test_path_is_file    modified &&
 		test_path_is_missing removeme &&
 		test_path_is_file    subdir/A &&
-		test_path_is_missing untouched &&
-
-		# ...including that we have the expected "modified" file...
-		cat >expect <<-EOF &&
-		modified
-		tweaked
-		EOF
-		test_cmp expect modified &&
-
-		# ...and that the other "modified" file is still present...
-		echo in the way >expect &&
-		test_cmp expect modified.stash.*
+		test_path_is_missing untouched
 	)
 '
 
diff --git a/t/t7817-grep-sparse-checkout.sh b/t/t7817-grep-sparse-checkout.sh
index 590b99bbb6f..eb595645657 100755
--- a/t/t7817-grep-sparse-checkout.sh
+++ b/t/t7817-grep-sparse-checkout.sh
@@ -83,10 +83,13 @@ test_expect_success 'setup' '
 
 # The test below covers a special case: the sparsity patterns exclude '/b' and
 # sparse checkout is enabled, but the path exists in the working tree (e.g.
-# manually created after `git sparse-checkout init`). git grep should skip it.
+# manually created after `git sparse-checkout init`).  Although b is marked
+# as SKIP_WORKTREE, git grep should notice it IS present in the worktree and
+# report it.
 test_expect_success 'working tree grep honors sparse checkout' '
 	cat >expect <<-EOF &&
 	a:text
+	b:new-text
 	EOF
 	test_when_finished "rm -f b" &&
 	echo "new-text" >b &&
@@ -126,12 +129,16 @@ test_expect_success 'grep --cached searches entries with the SKIP_WORKTREE bit'
 '
 
 # Note that sub2/ is present in the worktree but it is excluded by the sparsity
-# patterns, so grep should not recurse into it.
+# patterns.  We also explicitly mark it as SKIP_WORKTREE in case it got cleared
+# by previous git commands.  Thus sub2 starts as SKIP_WORKTREE but since it is
+# present in the working tree, grep should recurse into it.
 test_expect_success 'grep --recurse-submodules honors sparse checkout in submodule' '
 	cat >expect <<-EOF &&
 	a:text
 	sub/B/b:text
+	sub2/a:text
 	EOF
+	git update-index --skip-worktree sub2 &&
 	git grep --recurse-submodules "text" >actual &&
 	test_cmp expect actual
 '
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit
  2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
                     ` (2 preceding siblings ...)
  2022-01-14 15:59   ` [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
@ 2022-01-14 15:59   ` Elijah Newren via GitGitGadget
  2022-02-16  9:15     ` Ævar Arnfjörð Bjarmason
  2022-01-14 15:59   ` [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
  2022-01-15  1:51   ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Victoria Dye
  5 siblings, 1 reply; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-14 15:59 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Make several small updates, to address a few documentation issues
I spotted:
  * sparse-checkout focused on "patterns" even though the inputs (and
    outputs in the case of `list`) are directories in cone-mode
  * The description section of the sparse-checkout documentation
    was a bit sparse (no pun intended), and focused more on internal
    mechanics rather than end user usage.  This made sense in the
    early days when the command was even more experimental, but let's
    adjust a bit to try to make it more approachable to end users who
    may want to consider using it.  Keep the scary backward
    compatibility warning, though; we're still hard at work trying to
    fix up commands to behave reasonably in sparse checkouts.
  * both read-tree and update-index tried to describe how to use the
    skip-worktree bit, but both predated the sparse-checkout command.
    The sparse-checkout command is a far easier mechanism to use and
    for users trying to reduce the size of their working tree, we
    should recommend users to look at it instead.
  * The update-index documentation pointed out that assume-unchanged
    and skip-worktree sounded similar but had different purposes.
    However, it made no attempt to explain the differences, only to
    point out that they were different.  Explain the differences.
  * The update-index documentation focused much more on (internal?)
    implementation details than on end-user usage.  Try to explain
    its purpose better for users of update-index, rather than
    fellow developers trying to work with the SKIP_WORKTREE bit.
  * Clarify that when core.sparseCheckout=true, we treat a file's
    presence in the working tree as being an override to the
    SKIP_WORKTREE bit (i.e. in sparse checkouts when the file is
    present we ignore the SKIP_WORKTREE bit).

Note that this commit, like many touching documentation, is best viewed
with the `--color-words` option to diff/log.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 Documentation/git-read-tree.txt       | 12 +++--
 Documentation/git-sparse-checkout.txt | 76 ++++++++++++++++-----------
 Documentation/git-update-index.txt    | 57 +++++++++++++++-----
 3 files changed, 98 insertions(+), 47 deletions(-)

diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index 8c3aceb8324..99bb387134d 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -375,9 +375,14 @@ have finished your work-in-progress), attempt the merge again.
 SPARSE CHECKOUT
 ---------------
 
+Note: The `update-index` and `read-tree` primitives for supporting the
+skip-worktree bit predated the introduction of
+linkgit:git-sparse-checkout[1].  Users are encouraged to use
+`sparse-checkout` in preference to these low-level primitives.
+
 "Sparse checkout" allows populating the working directory sparsely.
-It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
-Git whether a file in the working directory is worth looking at.
+It uses the skip-worktree bit (see linkgit:git-update-index[1]) to
+tell Git whether a file in the working directory is worth looking at.
 
 'git read-tree' and other merge-based commands ('git merge', 'git
 checkout'...) can help maintaining the skip-worktree bitmap and working
@@ -385,7 +390,8 @@ directory update. `$GIT_DIR/info/sparse-checkout` is used to
 define the skip-worktree reference bitmap. When 'git read-tree' needs
 to update the working directory, it resets the skip-worktree bit in the index
 based on this file, which uses the same syntax as .gitignore files.
-If an entry matches a pattern in this file, skip-worktree will not be
+If an entry matches a pattern in this file, or the entry corresponds to
+a file present in the working tree, then skip-worktree will not be
 set on that entry. Otherwise, skip-worktree will be set.
 
 Then it compares the new skip-worktree value with the previous one. If
diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
index b81dbe06543..3da3d5a1007 100644
--- a/Documentation/git-sparse-checkout.txt
+++ b/Documentation/git-sparse-checkout.txt
@@ -3,9 +3,7 @@ git-sparse-checkout(1)
 
 NAME
 ----
-git-sparse-checkout - Initialize and modify the sparse-checkout
-configuration, which reduces the checkout to a set of paths
-given by a list of patterns.
+git-sparse-checkout - Reduce your working tree to a subset of tracked files
 
 
 SYNOPSIS
@@ -17,8 +15,20 @@ SYNOPSIS
 DESCRIPTION
 -----------
 
-Initialize and modify the sparse-checkout configuration, which reduces
-the checkout to a set of paths given by a list of patterns.
+This command is used to create sparse checkouts, which means that it
+changes the working tree from having all tracked files present, to only
+have a subset of them.  It can also switch which subset of files are
+present, or undo and go back to having all tracked files present in the
+working copy.
+
+The subset of files is chosen by providing a list of directories in
+cone mode (which is recommended), or by providing a list of patterns
+in non-cone mode.
+
+When in a sparse-checkout, other Git commands behave a bit differently.
+For example, switching branches will not update paths outside the
+sparse-checkout directories/patterns, and `git commit -a` will not record
+paths outside the sparse-checkout directories/patterns as deleted.
 
 THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
 COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
@@ -28,7 +38,7 @@ THE FUTURE.
 COMMANDS
 --------
 'list'::
-	Describe the patterns in the sparse-checkout file.
+	Describe the directories or patterns in the sparse-checkout file.
 
 'set'::
 	Enable the necessary config settings
@@ -38,20 +48,26 @@ COMMANDS
 	list of arguments following the 'set' subcommand. Update the
 	working directory to match the new patterns.
 +
-When the `--stdin` option is provided, the patterns are read from
-standard in as a newline-delimited list instead of from the arguments.
+When the `--stdin` option is provided, the directories or patterns are
+read from standard in as a newline-delimited list instead of from the
+arguments.
 +
 When `--cone` is passed or `core.sparseCheckoutCone` is enabled, the
-input list is considered a list of directories instead of
-sparse-checkout patterns.  This allows for better performance with a
-limited set of patterns (see 'CONE PATTERN SET' below).  Note that the
-set command will write patterns to the sparse-checkout file to include
-all files contained in those directories (recursively) as well as
-files that are siblings of ancestor directories. The input format
-matches the output of `git ls-tree --name-only`.  This includes
-interpreting pathnames that begin with a double quote (") as C-style
-quoted strings.  This may become the default in the future; --no-cone
-can be passed to request non-cone mode.
+input list is considered a list of directories.  This allows for
+better performance with a limited set of patterns (see 'CONE PATTERN
+SET' below).  The input format matches the output of `git ls-tree
+--name-only`.  This includes interpreting pathnames that begin with a
+double quote (") as C-style quoted strings.  Note that the set command
+will write patterns to the sparse-checkout file to include all files
+contained in those directories (recursively) as well as files that are
+siblings of ancestor directories. This may become the default in the
+future; --no-cone can be passed to request non-cone mode.
++
+When `--no-cone` is passed or `core.sparseCheckoutCone` is not enabled,
+the input list is considered a list of patterns.  This mode is harder
+to use and less performant, and is thus not recommended.  See the
+"Sparse Checkout" section of linkgit:git-read-tree[1] and the "Pattern
+Set" sections below for more details.
 +
 Use the `--[no-]sparse-index` option to use a sparse index (the
 default is to not use it).  A sparse index reduces the size of the
@@ -69,11 +85,10 @@ understand the sparse directory entries index extension and may fail to
 interact with your repository until it is disabled.
 
 'add'::
-	Update the sparse-checkout file to include additional patterns.
-	By default, these patterns are read from the command-line arguments,
-	but they can be read from stdin using the `--stdin` option. When
-	`core.sparseCheckoutCone` is enabled, the given patterns are interpreted
-	as directory names as in the 'set' subcommand.
+	Update the sparse-checkout file to include additional directories
+	(in cone mode) or patterns (in non-cone mode).  By default, these
+	directories or patterns are read from the command-line arguments,
+	but they can be read from stdin using the `--stdin` option.
 
 'reapply'::
 	Reapply the sparsity pattern rules to paths in the working tree.
@@ -117,13 +132,14 @@ decreased in utility.
 SPARSE CHECKOUT
 ---------------
 
-"Sparse checkout" allows populating the working directory sparsely.
-It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
-Git whether a file in the working directory is worth looking at. If
-the skip-worktree bit is set, then the file is ignored in the working
-directory. Git will avoid populating the contents of those files, which
-makes a sparse checkout helpful when working in a repository with many
-files, but only a few are important to the current user.
+"Sparse checkout" allows populating the working directory sparsely.  It
+uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell Git
+whether a file in the working directory is worth looking at. If the
+skip-worktree bit is set, and the file is not present in the working tree,
+then its absence is ignored. Git will avoid populating the contents of
+those files, which makes a sparse checkout helpful when working in a
+repository with many files, but only a few are important to the current
+user.
 
 The `$GIT_DIR/info/sparse-checkout` file is used to define the
 skip-worktree reference bitmap. When Git updates the working
diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt
index 2853f168d97..568dbfe76b8 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -351,6 +351,10 @@ unchanged".  Note that "assume unchanged" bit is *not* set if
 the index (use `git update-index --really-refresh` if you want
 to mark them as "assume unchanged").
 
+Sometimes users confuse the assume-unchanged bit with the
+skip-worktree bit.  See the final paragraph in the "Skip-worktree bit"
+section below for an explanation of the differences.
+
 
 EXAMPLES
 --------
@@ -392,22 +396,47 @@ M foo.c
 SKIP-WORKTREE BIT
 -----------------
 
-Skip-worktree bit can be defined in one (long) sentence: When reading
-an entry, if it is marked as skip-worktree, then Git pretends its
-working directory version is up to date and read the index version
-instead.
-
-To elaborate, "reading" means checking for file existence, reading
-file attributes or file content. The working directory version may be
-present or absent. If present, its content may match against the index
-version or not. Writing is not affected by this bit, content safety
-is still first priority. Note that Git _can_ update working directory
-file, that is marked skip-worktree, if it is safe to do so (i.e.
-working directory version matches index version)
+Skip-worktree bit can be defined in one (long) sentence: Tell git to
+avoid writing the file to the working directory when reasonably
+possible, and treat the file as unchanged when it is not
+present in the working directory.
+
+Note that not all git commands will pay attention to this bit, and
+some only partially support it.
+
+The update-index flags and the read-tree capabilities relating to the
+skip-worktree bit predated the introduction of the
+linkgit:git-sparse-checkout[1] command, which provides a much easier
+way to configure and handle the skip-worktree bits.  If you want to
+reduce your working tree to only deal with a subset of the files in
+the repository, we strongly encourage the use of
+linkgit:git-sparse-checkout[1] in preference to the low-level
+update-index and read-tree primitives.
+
+The primary purpose of the skip-worktree bit is to enable sparse
+checkouts, i.e. to have working directories with only a subset of
+paths present.  When the skip-worktree bit is set, Git commands (such
+as `switch`, `pull`, `merge`) will avoid writing these files.
+However, these commands will sometimes write these files anyway in
+important cases such as conflicts during a merge or rebase.  Git
+commands will also avoid treating the lack of such files as an
+intentional deletion; for example `git add -u` will not not stage a
+deletion for these files and `git commit -a` will not make a commit
+deleting them either.
 
 Although this bit looks similar to assume-unchanged bit, its goal is
-different from assume-unchanged bit's. Skip-worktree also takes
-precedence over assume-unchanged bit when both are set.
+different.  The assume-unchanged bit is for leaving the file in the
+working tree but having Git omit checking it for changes and presuming
+that the file has not been changed (though if it can determine without
+stat'ing the file that it has changed, it is free to record the
+changes).  skip-worktree tells Git to ignore the absence of the file,
+avoid updating it when possible with commands that normally update
+much of the working directory (e.g. `checkout`, `switch`, `pull`,
+etc.), and not have its absence be recorded in commits.  Note that in
+sparse checkouts (setup by `git sparse-checkout` or by configuring
+core.sparseCheckout to true), if a file is marked as skip-worktree in
+the index but is found in the working tree, Git will clear the
+skip-worktree bit for that file.
 
 SPLIT INDEX
 -----------
-- 
gitgitgadget


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
  2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
                     ` (3 preceding siblings ...)
  2022-01-14 15:59   ` [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
@ 2022-01-14 15:59   ` Elijah Newren via GitGitGadget
  2022-01-15  1:39     ` Victoria Dye
  2022-02-16  9:32     ` Ævar Arnfjörð Bjarmason
  2022-01-15  1:51   ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Victoria Dye
  5 siblings, 2 replies; 30+ messages in thread
From: Elijah Newren via GitGitGadget @ 2022-01-14 15:59 UTC (permalink / raw)
  To: git
  Cc: Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren,
	Elijah Newren, Elijah Newren

From: Elijah Newren <newren@gmail.com>

Trying to clear the skip-worktree bit from files that are present does
present some computational overhead, for sparse-checkouts.  (We do not
do the bit clearing in non-sparse-checkouts.)  Optimize it as follows:

Rather than lstat()'ing every SKIP_WORKTREE path, take advantage of the
fact that entire directories will often be missing, especially for cone
mode and even more so ever since commit 55dfcf9591 ("sparse-checkout:
clear tracked sparse dirs", 2021-09-08).  If we have already determined
that the parent directory of a file (or other previous ancestor) does
not exist, then the file cannot exist either so we do not need to
lstat() it separately.

Timings for p2000 included below, reformatted to fit in normal commit
message line lengths, which compare three things:
  * Timings before this series
  * Timings of the unoptimized version of
    clear_skip_worktree_from_present_files() from a few commits ago
  * Timings after the optimization in this commit

(NOTE: t/perf/ appears to have timing resolution only down to 0.01 s,
which presents significant measurement error when timings only differ by
0.01s.  I don't trust any such timings below, and yet all the optimized
results differ by at most 0.01s.)

Test        Before Series    Unoptimized              Optimized
-----------------------------------------------------------------------------
*git status*
full-v3     0.15(0.10+0.06)  0.32(0.16+0.17) +113.3%  0.16(0.10+0.07) +6.7%
full-v4     0.15(0.11+0.05)  0.32(0.17+0.16) +113.3%  0.16(0.11+0.05) +6.7%
sparse-v3   0.04(0.03+0.04)  0.04(0.02+0.05) +0.0%    0.04(0.02+0.05) +0.0%
sparse-v4   0.04(0.03+0.04)  0.04(0.02+0.05) +0.0%    0.04(0.03+0.05) +0.0%

*git add -A*
full-v3     0.40(0.30+0.07)  0.56(0.36+0.17) +40.0%   0.39(0.30+0.07) -2.5%
full-v4     0.37(0.28+0.07)  0.54(0.37+0.16) +45.9%   0.38(0.29+0.07) +2.7%
sparse-v3   0.06(0.04+0.05)  0.08(0.05+0.05) +33.3%   0.06(0.05+0.04) +0.0%
sparse-v4   0.05(0.03+0.05)  0.05(0.04+0.04) +0.0%    0.06(0.04+0.05) +20.0%

*git add .*
full-v3     0.40(0.31+0.07)  0.57(0.37+0.17) +42.5%   0.41(0.30+0.08) +2.5%
full-v4     0.38(0.30+0.06)  0.55(0.37+0.16) +44.7%   0.38(0.30+0.06) +0.0%
sparse-v3   0.06(0.04+0.05)  0.06(0.05+0.04) +0.0%    0.06(0.03+0.05) +0.0%
sparse-v4   0.06(0.05+0.05)  0.06(0.04+0.05) +0.0%    0.06(0.04+0.06) +0.0%

*git commit -a -m A*
full-v3     0.41(0.32+0.06)  0.58(0.39+0.17) +41.5%   0.42(0.32+0.07) +2.4%
full-v4     0.39(0.30+0.07)  0.56(0.38+0.17) +43.6%   0.40(0.31+0.07) +2.6%
sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
sparse-v4   0.04(0.03+0.05)  0.04(0.03+0.05) +0.0%    0.04(0.03+0.04) +0.0%

*git checkout -f -*
full-v3     0.56(0.46+0.07)  0.73(0.55+0.16) +30.4%   0.57(0.47+0.08) +1.8%
full-v4     0.54(0.45+0.07)  0.71(0.53+0.17) +31.5%   0.55(0.45+0.07) +1.9%
sparse-v3   0.06(0.04+0.04)  0.06(0.04+0.05) +0.0%    0.06(0.04+0.05) +0.0%
sparse-v4   0.05(0.05+0.04)  0.05(0.04+0.05) +0.0%    0.06(0.04+0.05) +20.0%

*git reset*
full-v3     0.34(0.26+0.05)  0.51(0.34+0.15) +50.0%   0.34(0.26+0.06) +0.0%
full-v4     0.32(0.24+0.06)  0.49(0.32+0.15) +53.1%   0.33(0.25+0.06) +3.1%
sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
sparse-v4   0.03(0.03+0.04)  0.03(0.02+0.04) +0.0%    0.03(0.03+0.04) +0.0%

*git reset --hard*
full-v3     0.57(0.46+0.07)  0.90(0.61+0.25) +57.9%   0.57(0.45+0.08) +0.0%
full-v4     0.54(0.46+0.05)  0.88(0.59+0.26) +63.0%   0.55(0.45+0.07) +1.9%
sparse-v3   0.07(0.03+0.03)  0.07(0.04+0.03) +0.0%    0.07(0.03+0.03) +0.0%
sparse-v4   0.06(0.03+0.03)  0.06(0.04+0.02) +0.0%    0.06(0.03+0.03) +0.0%

*git reset -- does-not-exist*
full-v3     0.35(0.27+0.06)  0.52(0.32+0.17) +48.6%   0.35(0.27+0.06) +0.0%
full-v4     0.33(0.26+0.05)  0.50(0.33+0.15) +51.5%   0.33(0.26+0.06) +0.0%
sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
sparse-v4   0.04(0.02+0.04)  0.03(0.02+0.04) -25.0%   0.03(0.02+0.04) -25.0%

*git diff*
full-v3     0.07(0.04+0.04)  0.24(0.11+0.14) +242.9%  0.07(0.04+0.04) +0.0%
full-v4     0.07(0.03+0.05)  0.24(0.13+0.12) +242.9%  0.08(0.04+0.05) +14.3%
sparse-v3   0.02(0.01+0.04)  0.02(0.01+0.04) +0.0%    0.02(0.01+0.05) +0.0%
sparse-v4   0.02(0.02+0.03)  0.02(0.01+0.04) +0.0%    0.02(0.01+0.04) +0.0%

*git diff --cached*
full-v3     0.05(0.03+0.02)  0.22(0.12+0.09) +340.0%  0.05(0.03+0.01) +0.0%
full-v4     0.05(0.03+0.01)  0.23(0.12+0.11) +360.0%  0.05(0.03+0.02) +0.0%
sparse-v3   0.01(0.00+0.00)  0.01(0.00+0.00) +0.0%    0.01(0.00+0.00) +0.0%
sparse-v4   0.01(0.00+0.00)  0.01(0.00+0.00) +0.0%    0.01(0.00+0.00) +0.0%

*git blame f2/f4/a*
full-v3     0.18(0.13+0.05)  0.52(0.29+0.23) +188.9%  0.19(0.15+0.04) +5.6%
full-v4     0.19(0.15+0.04)  0.52(0.28+0.23) +173.7%  0.19(0.14+0.04) +0.0%
sparse-v3   0.10(0.08+0.02)  0.10(0.09+0.01) +0.0%    0.10(0.09+0.01) +0.0%
sparse-v4   0.10(0.08+0.02)  0.10(0.08+0.02) +0.0%    0.10(0.08+0.02) +0.0%

*git blame f2/f4/f3/a*
full-v3     0.45(0.36+0.08)  0.78(0.51+0.27) +73.3%   0.45(0.37+0.08) +0.0%
full-v4     0.45(0.37+0.08)  0.78(0.51+0.26) +73.3%   0.45(0.37+0.08) +0.0%
sparse-v3   0.36(0.32+0.04)  0.36(0.31+0.05) +0.0%    0.36(0.31+0.04) +0.0%
sparse-v4   0.36(0.31+0.05)  0.36(0.31+0.05) +0.0%    0.36(0.31+0.04) +0.0%

*git checkout-index -f --all*
full-v3     0.07(0.02+0.05)  0.24(0.12+0.12) +242.9%  0.08(0.04+0.04) +14.3%
full-v4     0.07(0.03+0.04)  0.24(0.11+0.13) +242.9%  0.08(0.03+0.04) +14.3%
sparse-v3   0.04(0.01+0.03)  0.04(0.00+0.03) +0.0%    0.04(0.01+0.03) +0.0%
sparse-v4   0.04(0.01+0.02)  0.04(0.01+0.03) +0.0%    0.04(0.01+0.02) +0.0%

*git update-index --add --remove f2/f4/a*
full-v3     0.29(0.23+0.02)  0.46(0.30+0.12) +58.6%   0.30(0.24+0.02) +3.4%
full-v4     0.27(0.22+0.02)  0.45(0.29+0.12) +66.7%   0.28(0.22+0.03) +3.7%
sparse-v3   0.02(0.02+0.00)  0.02(0.01+0.00) +0.0%    0.02(0.01+0.00) +0.0%
sparse-v4   0.02(0.02+0.00)  0.02(0.02+0.00) +0.0%    0.02(0.02+0.00) +0.0%

So, with the optimization, the extra work appears to be essentially 0
for sparse-checkouts that are also using sparse-indexes (even before my
optimization), and the extra work appears to be just marginally more
than 0 for sparse-checkouts that are using full indexes.

Signed-off-by: Elijah Newren <newren@gmail.com>
---
 sparse-index.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/sparse-index.c b/sparse-index.c
index b82648b10ee..eed170cd8f7 100644
--- a/sparse-index.c
+++ b/sparse-index.c
@@ -341,18 +341,70 @@ void ensure_correct_sparsity(struct index_state *istate)
 		ensure_full_index(istate);
 }
 
+static int path_found(const char *path, const char **dirname, size_t *dir_len,
+		      int *dir_found)
+{
+	struct stat st;
+	char *newdir;
+	char *tmp;
+
+	/*
+	 * If dirname corresponds to a directory that doesn't exist, and this
+	 * path starts with dirname, then path can't exist.
+	 */
+	if (!*dir_found && !memcmp(path, *dirname, *dir_len))
+		return 0;
+
+	/*
+	 * If path itself exists, return 1.
+	 */
+	if (!lstat(path, &st))
+		return 1;
+
+	/*
+	 * Otherwise, path does not exist so we'll return 0...but we'll first
+	 * determine some info about its parent directory so we can avoid
+	 * lstat calls for future cache entries.
+	 */
+	newdir = strrchr(path, '/');
+	if (!newdir)
+		return 0; /* Didn't find a parent dir; just return 0 now. */
+
+	/*
+	 * If path starts with directory (which we already lstat'ed and found),
+	 * then no need to lstat parent directory again.
+	 */
+	if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
+		return 0;
+
+	/* Free previous dirname, and cache path's dirname */
+	*dirname = path;
+	*dir_len = newdir - path + 1;
+
+	tmp = xstrndup(path, *dir_len);
+	*dir_found = !lstat(tmp, &st);
+	free(tmp);
+
+	return 0;
+}
+
 void clear_skip_worktree_from_present_files(struct index_state *istate)
 {
+	const char *last_dirname = NULL;
+	size_t dir_len = 0;
+	int dir_found = 1;
+
 	int i;
+
 	if (!core_apply_sparse_checkout)
 		return;
 
 restart:
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce = istate->cache[i];
-		struct stat st;
 
-		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
+		if (ce_skip_worktree(ce) &&
+		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
 			if (S_ISSPARSEDIR(ce->ce_mode)) {
 				ensure_full_index(istate);
 				goto restart;
-- 
gitgitgadget

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
  2022-01-14 15:59   ` [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
@ 2022-01-15  1:39     ` Victoria Dye
  2022-02-16  9:32     ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 30+ messages in thread
From: Victoria Dye @ 2022-01-15  1:39 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Derrick Stolee, Lessley Dennington, Elijah Newren

Elijah Newren via GitGitGadget wrote:
> From: Elijah Newren <newren@gmail.com>
> 
> Trying to clear the skip-worktree bit from files that are present does
> present some computational overhead, for sparse-checkouts.  (We do not
> do the bit clearing in non-sparse-checkouts.)  Optimize it as follows:
> 
> Rather than lstat()'ing every SKIP_WORKTREE path, take advantage of the
> fact that entire directories will often be missing, especially for cone
> mode and even more so ever since commit 55dfcf9591 ("sparse-checkout:
> clear tracked sparse dirs", 2021-09-08).  If we have already determined
> that the parent directory of a file (or other previous ancestor) does
> not exist, then the file cannot exist either so we do not need to
> lstat() it separately.
> 
> Timings for p2000 included below, reformatted to fit in normal commit
> message line lengths, which compare three things:
>   * Timings before this series
>   * Timings of the unoptimized version of
>     clear_skip_worktree_from_present_files() from a few commits ago
>   * Timings after the optimization in this commit
> 
> (NOTE: t/perf/ appears to have timing resolution only down to 0.01 s,
> which presents significant measurement error when timings only differ by
> 0.01s.  I don't trust any such timings below, and yet all the optimized
> results differ by at most 0.01s.)
> 
> Test        Before Series    Unoptimized              Optimized
> -----------------------------------------------------------------------------
> *git status*
> full-v3     0.15(0.10+0.06)  0.32(0.16+0.17) +113.3%  0.16(0.10+0.07) +6.7%
> full-v4     0.15(0.11+0.05)  0.32(0.17+0.16) +113.3%  0.16(0.11+0.05) +6.7%
> sparse-v3   0.04(0.03+0.04)  0.04(0.02+0.05) +0.0%    0.04(0.02+0.05) +0.0%
> sparse-v4   0.04(0.03+0.04)  0.04(0.02+0.05) +0.0%    0.04(0.03+0.05) +0.0%
> 
> *git add -A*
> full-v3     0.40(0.30+0.07)  0.56(0.36+0.17) +40.0%   0.39(0.30+0.07) -2.5%
> full-v4     0.37(0.28+0.07)  0.54(0.37+0.16) +45.9%   0.38(0.29+0.07) +2.7%
> sparse-v3   0.06(0.04+0.05)  0.08(0.05+0.05) +33.3%   0.06(0.05+0.04) +0.0%
> sparse-v4   0.05(0.03+0.05)  0.05(0.04+0.04) +0.0%    0.06(0.04+0.05) +20.0%
> 
> *git add .*
> full-v3     0.40(0.31+0.07)  0.57(0.37+0.17) +42.5%   0.41(0.30+0.08) +2.5%
> full-v4     0.38(0.30+0.06)  0.55(0.37+0.16) +44.7%   0.38(0.30+0.06) +0.0%
> sparse-v3   0.06(0.04+0.05)  0.06(0.05+0.04) +0.0%    0.06(0.03+0.05) +0.0%
> sparse-v4   0.06(0.05+0.05)  0.06(0.04+0.05) +0.0%    0.06(0.04+0.06) +0.0%
> 
> *git commit -a -m A*
> full-v3     0.41(0.32+0.06)  0.58(0.39+0.17) +41.5%   0.42(0.32+0.07) +2.4%
> full-v4     0.39(0.30+0.07)  0.56(0.38+0.17) +43.6%   0.40(0.31+0.07) +2.6%
> sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
> sparse-v4   0.04(0.03+0.05)  0.04(0.03+0.05) +0.0%    0.04(0.03+0.04) +0.0%
> 
> *git checkout -f -*
> full-v3     0.56(0.46+0.07)  0.73(0.55+0.16) +30.4%   0.57(0.47+0.08) +1.8%
> full-v4     0.54(0.45+0.07)  0.71(0.53+0.17) +31.5%   0.55(0.45+0.07) +1.9%
> sparse-v3   0.06(0.04+0.04)  0.06(0.04+0.05) +0.0%    0.06(0.04+0.05) +0.0%
> sparse-v4   0.05(0.05+0.04)  0.05(0.04+0.05) +0.0%    0.06(0.04+0.05) +20.0%
> 
> *git reset*
> full-v3     0.34(0.26+0.05)  0.51(0.34+0.15) +50.0%   0.34(0.26+0.06) +0.0%
> full-v4     0.32(0.24+0.06)  0.49(0.32+0.15) +53.1%   0.33(0.25+0.06) +3.1%
> sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
> sparse-v4   0.03(0.03+0.04)  0.03(0.02+0.04) +0.0%    0.03(0.03+0.04) +0.0%
> 
> *git reset --hard*
> full-v3     0.57(0.46+0.07)  0.90(0.61+0.25) +57.9%   0.57(0.45+0.08) +0.0%
> full-v4     0.54(0.46+0.05)  0.88(0.59+0.26) +63.0%   0.55(0.45+0.07) +1.9%
> sparse-v3   0.07(0.03+0.03)  0.07(0.04+0.03) +0.0%    0.07(0.03+0.03) +0.0%
> sparse-v4   0.06(0.03+0.03)  0.06(0.04+0.02) +0.0%    0.06(0.03+0.03) +0.0%
> 
> *git reset -- does-not-exist*
> full-v3     0.35(0.27+0.06)  0.52(0.32+0.17) +48.6%   0.35(0.27+0.06) +0.0%
> full-v4     0.33(0.26+0.05)  0.50(0.33+0.15) +51.5%   0.33(0.26+0.06) +0.0%
> sparse-v3   0.04(0.03+0.04)  0.04(0.03+0.04) +0.0%    0.04(0.03+0.04) +0.0%
> sparse-v4   0.04(0.02+0.04)  0.03(0.02+0.04) -25.0%   0.03(0.02+0.04) -25.0%
> 
> *git diff*
> full-v3     0.07(0.04+0.04)  0.24(0.11+0.14) +242.9%  0.07(0.04+0.04) +0.0%
> full-v4     0.07(0.03+0.05)  0.24(0.13+0.12) +242.9%  0.08(0.04+0.05) +14.3%
> sparse-v3   0.02(0.01+0.04)  0.02(0.01+0.04) +0.0%    0.02(0.01+0.05) +0.0%
> sparse-v4   0.02(0.02+0.03)  0.02(0.01+0.04) +0.0%    0.02(0.01+0.04) +0.0%
> 
> *git diff --cached*
> full-v3     0.05(0.03+0.02)  0.22(0.12+0.09) +340.0%  0.05(0.03+0.01) +0.0%
> full-v4     0.05(0.03+0.01)  0.23(0.12+0.11) +360.0%  0.05(0.03+0.02) +0.0%
> sparse-v3   0.01(0.00+0.00)  0.01(0.00+0.00) +0.0%    0.01(0.00+0.00) +0.0%
> sparse-v4   0.01(0.00+0.00)  0.01(0.00+0.00) +0.0%    0.01(0.00+0.00) +0.0%
> 
> *git blame f2/f4/a*
> full-v3     0.18(0.13+0.05)  0.52(0.29+0.23) +188.9%  0.19(0.15+0.04) +5.6%
> full-v4     0.19(0.15+0.04)  0.52(0.28+0.23) +173.7%  0.19(0.14+0.04) +0.0%
> sparse-v3   0.10(0.08+0.02)  0.10(0.09+0.01) +0.0%    0.10(0.09+0.01) +0.0%
> sparse-v4   0.10(0.08+0.02)  0.10(0.08+0.02) +0.0%    0.10(0.08+0.02) +0.0%
> 
> *git blame f2/f4/f3/a*
> full-v3     0.45(0.36+0.08)  0.78(0.51+0.27) +73.3%   0.45(0.37+0.08) +0.0%
> full-v4     0.45(0.37+0.08)  0.78(0.51+0.26) +73.3%   0.45(0.37+0.08) +0.0%
> sparse-v3   0.36(0.32+0.04)  0.36(0.31+0.05) +0.0%    0.36(0.31+0.04) +0.0%
> sparse-v4   0.36(0.31+0.05)  0.36(0.31+0.05) +0.0%    0.36(0.31+0.04) +0.0%
> 
> *git checkout-index -f --all*
> full-v3     0.07(0.02+0.05)  0.24(0.12+0.12) +242.9%  0.08(0.04+0.04) +14.3%
> full-v4     0.07(0.03+0.04)  0.24(0.11+0.13) +242.9%  0.08(0.03+0.04) +14.3%
> sparse-v3   0.04(0.01+0.03)  0.04(0.00+0.03) +0.0%    0.04(0.01+0.03) +0.0%
> sparse-v4   0.04(0.01+0.02)  0.04(0.01+0.03) +0.0%    0.04(0.01+0.02) +0.0%
> 
> *git update-index --add --remove f2/f4/a*
> full-v3     0.29(0.23+0.02)  0.46(0.30+0.12) +58.6%   0.30(0.24+0.02) +3.4%
> full-v4     0.27(0.22+0.02)  0.45(0.29+0.12) +66.7%   0.28(0.22+0.03) +3.7%
> sparse-v3   0.02(0.02+0.00)  0.02(0.01+0.00) +0.0%    0.02(0.01+0.00) +0.0%
> sparse-v4   0.02(0.02+0.00)  0.02(0.02+0.00) +0.0%    0.02(0.02+0.00) +0.0%
> 
> So, with the optimization, the extra work appears to be essentially 0
> for sparse-checkouts that are also using sparse-indexes (even before my
> optimization), and the extra work appears to be just marginally more
> than 0 for sparse-checkouts that are using full indexes.
> 

Thank you for including these results! In the cone-mode case, I'm happy to
see that the optimization effectively brings the cost of correcting
SKIP_WORKTREE down to 0. As for the unoptimized version (which, if I'm
reading this correctly, still approximates some worst-case non-cone cases),
I think correcting SKIP_WORKTREE is probably still worth the performance
cost - in the relatively massive test repo used in p2000, it's still only
~0.2-0.3s across the board.

> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  sparse-index.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 54 insertions(+), 2 deletions(-)
> 
> diff --git a/sparse-index.c b/sparse-index.c
> index b82648b10ee..eed170cd8f7 100644
> --- a/sparse-index.c
> +++ b/sparse-index.c
> @@ -341,18 +341,70 @@ void ensure_correct_sparsity(struct index_state *istate)
>  		ensure_full_index(istate);
>  }
>  
> +static int path_found(const char *path, const char **dirname, size_t *dir_len,
> +		      int *dir_found)
> +{
> +	struct stat st;
> +	char *newdir;
> +	char *tmp;
> +
> +	/*
> +	 * If dirname corresponds to a directory that doesn't exist, and this
> +	 * path starts with dirname, then path can't exist.
> +	 */
> +	if (!*dir_found && !memcmp(path, *dirname, *dir_len))
> +		return 0;
> +

I really like how this version of the optimization works, and ultimately
find it simpler than the hashmap approach from the RFC. Looks good!

> +	/*
> +	 * If path itself exists, return 1.
> +	 */
> +	if (!lstat(path, &st))
> +		return 1;
> +
> +	/*
> +	 * Otherwise, path does not exist so we'll return 0...but we'll first
> +	 * determine some info about its parent directory so we can avoid
> +	 * lstat calls for future cache entries.
> +	 */
> +	newdir = strrchr(path, '/');
> +	if (!newdir)
> +		return 0; /* Didn't find a parent dir; just return 0 now. */
> +
> +	/*
> +	 * If path starts with directory (which we already lstat'ed and found),
> +	 * then no need to lstat parent directory again.
> +	 */
> +	if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
> +		return 0;
> +
> +	/* Free previous dirname, and cache path's dirname */
> +	*dirname = path;
> +	*dir_len = newdir - path + 1;
> +
> +	tmp = xstrndup(path, *dir_len);
> +	*dir_found = !lstat(tmp, &st);
> +	free(tmp);
> +
> +	return 0;
> +}
> +
>  void clear_skip_worktree_from_present_files(struct index_state *istate)
>  {
> +	const char *last_dirname = NULL;
> +	size_t dir_len = 0;
> +	int dir_found = 1;
> +
>  	int i;
> +
>  	if (!core_apply_sparse_checkout)
>  		return;
>  
>  restart:
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
> -		struct stat st;
>  
> -		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
> +		if (ce_skip_worktree(ce) &&
> +		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>  			if (S_ISSPARSEDIR(ce->ce_mode)) {
>  				ensure_full_index(istate);
>  				goto restart;


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts)
  2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
                     ` (4 preceding siblings ...)
  2022-01-14 15:59   ` [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
@ 2022-01-15  1:51   ` Victoria Dye
  5 siblings, 0 replies; 30+ messages in thread
From: Victoria Dye @ 2022-01-15  1:51 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget, git
  Cc: Derrick Stolee, Lessley Dennington, Elijah Newren

Elijah Newren via GitGitGadget wrote:
> (Maintainer note: This series builds on (v2 of) vd/sparse-clean-etc, because
> it tweaks one of the testcases added there.)
> 
> (Note 2: There was a previous RFC round of this series at
> https://lore.kernel.org/git/20220109045732.2497526-1-newren@gmail.com/.)
> 
> Files in the present-despite-SKIP_WORKTREE state have caused no ends of
> discussions and bugs[1,2,3,4,5,6,...and lots of others]. Trying to address
> the big issue of discovering & recovering from this state has befuddled me
> for over a year because I was worried we'd need additional code at every
> skip_worktree-checking path in the code (and they are all over the place),
> and that we'd make the code significantly slower unless we plumbed a bunch
> of additional information all over the place to allow some reasonable
> optimizations.
> 
> This series tries to solve the problem a bit differently by automatic early
> discovery and recovery; as a result, it greatly simplifies the landscape,
> reduces our testing matrix burden, and fixes a large swath of bugs. And I
> figured out how to get the perf cost down to essentially negligible.
> 
> Changes since v1 (or v2 if you count RFC as v1):
> 
>  * now includes some fixes for testcases from
>    ds/fetch-pull-with-sparse-index (which topic marked its own new tests as
>    potentially suboptimal; with my series, the sparse behavior now matches
>    the full tree behavior on that test. Wahoo!). Note that Junio's version
>    of vd/sparse-clean-etc already includes ds/fetch-pull-with-sparse-index,
>    so no need to merge in anything extra.
> 
> Changes since RFC version:
> 
>  * updated the commit messages as per suggestions from Victoria, including
>    adding performance measurements
>  * renamed the new function to use a clearer name
>  * replaced the final patch with a different optimization, which is both
>    simpler and performs quite a bit better (the cost for my previous patch 5
>    was already decent in many cases, but had a few cases where the cost was
>    significant).
> 
> Quick overview:
> 
>  * Patches 1 & 2 add a test to demonstrate accidental deletion of
>    possibly-modified files, and then fix the bug.
>  * Patch 3 is the crux of this series; a small amount of code with a huge
>    commit message
>  * Patch 4 updates the documentation
>  * Patch 5 adds an optimization to reduce the performance impact of patch 3
> 
Betwee the RFC and this version, you've made the changes I was looking for
and answered any remaining questions I had. This series (including patches
1, 2, and 4 - I didn't have anything substantial to say/ask/add to those)
looks good to me.

Thank you for working on this!

> [1]
> https://lore.kernel.org/git/CABPp-BFnFpzwGC11TLoLs8YK5yiisA5D5-fFjXnJsbESVDwZsA@mail.gmail.com/
> [2]
> https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
> [3]
> https://lore.kernel.org/git/pull.809.git.git.1592356884310.gitgitgadget@gmail.com/
> [4] commit 66b209b ("merge-ort: implement CE_SKIP_WORKTREE handling with
> conflicted entries", 2021-03-20) [5] commit ba359fd ("stash: fix stash
> application in sparse-checkouts", 2020-12-01) [6]
> https://lore.kernel.org/git/CABPp-BH9tju7WVm=QZDOvaMDdZbpNXrVWQdN-jmfN8wC6YVhmw@mail.gmail.com/
> 
> Elijah Newren (5):
>   t1011: add testcase demonstrating accidental loss of user
>     modifications
>   unpack-trees: fix accidental loss of user changes
>   repo_read_index: clear SKIP_WORKTREE bit from files present in
>     worktree
>   Update documentation related to sparsity and the skip-worktree bit
>   Accelerate clear_skip_worktree_from_present_files() by caching
> 
>  Documentation/git-read-tree.txt          | 12 +++-
>  Documentation/git-sparse-checkout.txt    | 76 ++++++++++++++----------
>  Documentation/git-update-index.txt       | 57 +++++++++++++-----
>  repository.c                             |  7 +++
>  sparse-index.c                           | 73 +++++++++++++++++++++++
>  sparse-index.h                           |  1 +
>  t/t1011-read-tree-sparse-checkout.sh     | 23 ++++++-
>  t/t1092-sparse-checkout-compatibility.sh | 41 ++++++-------
>  t/t3705-add-sparse-checkout.sh           |  2 +
>  t/t6428-merge-conflicts-sparse.sh        | 23 ++-----
>  t/t7012-skip-worktree-writing.sh         | 44 +++-----------
>  t/t7817-grep-sparse-checkout.sh          | 11 +++-
>  unpack-trees.c                           |  4 +-
>  13 files changed, 246 insertions(+), 128 deletions(-)
> 
> 
> base-commit: 48609de3bf32befb69c40c1a2595a98dac0448b4
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1114%2Fnewren%2Ffix-present-despite-skip-worktree-v2
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1114/newren/fix-present-despite-skip-worktree-v2
> Pull-Request: https://github.com/gitgitgadget/git/pull/1114
> 
> Range-diff vs v1:
> 
>  1:  c553d558c2f = 1:  d50d804af4e t1011: add testcase demonstrating accidental loss of user modifications
>  2:  1e3958576e2 = 2:  206c638fa90 unpack-trees: fix accidental loss of user changes
>  3:  b263cc75b7d ! 3:  11d46a399d2 repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
>      @@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'update-index modi
>        
>        	# When skip-worktree is disabled (even on files outside sparse cone), file
>        	# is updated in the index
>      +@@ t/t1092-sparse-checkout-compatibility.sh: test_expect_success 'ls-files' '
>      + 	test_cmp dense sparse &&
>      + 
>      + 	# Set up a strange condition of having a file edit
>      +-	# outside of the sparse-checkout cone. This is just
>      +-	# to verify that sparse-checkout and sparse-index
>      +-	# behave the same in this case.
>      ++	# outside of the sparse-checkout cone. We want to verify
>      ++	# that all modes handle this the same, and detect the
>      ++	# modification.
>      + 	write_script edit-content <<-\EOF &&
>      +-	mkdir folder1 &&
>      ++	mkdir -p folder1 &&
>      + 	echo content >>folder1/a
>      + 	EOF
>      +-	run_on_sparse ../edit-content &&
>      ++	run_on_all ../edit-content &&
>      + 
>      +-	# ls-files does not currently notice modified files whose
>      +-	# cache entries are marked SKIP_WORKTREE. This may change
>      +-	# in the future, but here we test that sparse index does
>      +-	# not accidentally create a change of behavior.
>      +-	test_sparse_match git ls-files --modified &&
>      +-	test_must_be_empty sparse-checkout-out &&
>      +-	test_must_be_empty sparse-index-out &&
>      ++	test_all_match git ls-files --modified &&
>      + 
>      + 	git -C sparse-index ls-files --sparse --modified >sparse-index-out &&
>      +-	test_must_be_empty sparse-index-out &&
>      ++	cat >expect <<-\EOF &&
>      ++	folder1/a
>      ++	EOF
>      ++	test_cmp expect sparse-index-out &&
>      + 
>      + 	# Add folder1 to the sparse-checkout cone and
>      + 	# check that ls-files shows the expanded files.
>      + 	test_sparse_match git sparse-checkout add folder1 &&
>      +-	test_sparse_match git ls-files --modified &&
>      ++	test_all_match git ls-files --modified &&
>      + 
>      + 	test_all_match git ls-files &&
>      + 	git -C sparse-index ls-files --sparse >actual &&
>       
>        ## t/t3705-add-sparse-checkout.sh ##
>       @@ t/t3705-add-sparse-checkout.sh: setup_sparse_entry () {
>  4:  c74ad19616e = 4:  0af00779128 Update documentation related to sparsity and the skip-worktree bit
>  5:  e68028ebe0a = 5:  05ac964e630 Accelerate clear_skip_worktree_from_present_files() by caching
> 


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications
  2022-01-14 15:59   ` [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
@ 2022-02-16  8:51     ` Ævar Arnfjörð Bjarmason
  2022-02-16 16:02       ` Elijah Newren
  0 siblings, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  8:51 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren


On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> [...]
> +	test_path_is_file init.t &&
> +	grep -q dirty init.t
> [...]
> +	test_path_is_file init.t &&
> +	grep -q dirty init.t

Maybe I'm missing something, but can these two just be:

    grep dirty init.t

I.e. won't grep report errors appropriately here, e.g.:
    
    $ grep foo t
    grep: t: Is a directory
    $ grep foo x
    grep: x: No such file or directory

The only prior art I could find was the same pattern in your c449947a79d
(directory rename detection: files/directories in the way of some
renames, 2018-04-19).

It's probably good to lose the "-q" too, unless this output is way too
verbose without it. In any case the errors wouldn't be affected.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-01-14 15:59   ` [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
@ 2022-02-16  8:57     ` Ævar Arnfjörð Bjarmason
  2022-02-16 16:08       ` Elijah Newren
  2022-02-19  1:06     ` Jonathan Nieder
  1 sibling, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  8:57 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren


On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> [...]
>  	# Set up a strange condition of having a file edit
> -	# outside of the sparse-checkout cone. This is just
> -	# to verify that sparse-checkout and sparse-index
> -	# behave the same in this case.
> +	# outside of the sparse-checkout cone. We want to verify
> +	# that all modes handle this the same, and detect the
> +	# modification.
>  	write_script edit-content <<-\EOF &&
> -	mkdir folder1 &&
> +	mkdir -p folder1 &&
>  	echo content >>folder1/a
>  	EOF
> -	run_on_sparse ../edit-content &&
> +	run_on_all ../edit-content &&

The end-state of this series will pass its tests with this on top, only
the last "mkdir -p" you added for the ls-files test seems to do
anything:

diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
index 2a04b532f91..160c119e17d 100755
--- a/t/t1092-sparse-checkout-compatibility.sh
+++ b/t/t1092-sparse-checkout-compatibility.sh
@@ -639,7 +639,7 @@ test_expect_success 'update-index modify outside sparse definition' '
 	# condition in which a `skip-worktree` enabled, outside-of-cone file
 	# exists on disk. It is used here to ensure `update-index` is stable
 	# and behaves predictably if such a condition occurs.
-	run_on_sparse mkdir -p folder1 &&
+	run_on_sparse mkdir folder1 &&
 	run_on_sparse cp ../initial-repo/folder1/a folder1/a &&
 	run_on_all ../edit-contents folder1/a &&
 
@@ -665,7 +665,7 @@ test_expect_success 'update-index --add outside sparse definition' '
 	EOF
 
 	# Create folder1, add new file
-	run_on_sparse mkdir -p folder1 &&
+	run_on_sparse mkdir folder1 &&
 	run_on_all ../edit-contents folder1/b &&
 
 	# The *untracked* out-of-cone file is added to the index because it does
@@ -949,7 +949,7 @@ test_expect_success 'checkout-index outside sparse definition' '
 
 	run_on_sparse rm -rf folder1 &&
 	echo test >new-a &&
-	run_on_sparse mkdir -p folder1 &&
+	run_on_sparse mkdir folder1 &&
 	run_on_all cp ../new-a folder1/a &&
 
 	test_all_match test_must_fail git checkout-index --ignore-skip-worktree-bits -- folder1/a &&
@@ -996,7 +996,7 @@ test_expect_success 'clean' '
 	test_all_match git commit -m "ignore bogus files" &&
 
 	run_on_sparse mkdir folder1 &&
-	run_on_all mkdir -p deep/untracked-deep &&
+	run_on_all mkdir deep/untracked-deep &&
 	run_on_all touch folder1/bogus &&
 	run_on_all touch folder1/untracked &&
 	run_on_all touch deep/untracked-deep/bogus &&

A bit nit-y I guess, but I do think tests are much easier to follow when
it's clear when we're doing initial setup v.s. using already set-up
data. In this case 

More importantnly I think between this and 19a0acc83e4 (t1092: test
interesting sparse-checkout scenarios, 2021-01-23) that introduced this
pattern there's a large foot-gun being left in place here by using these
"run_on_all" and "run_on_sparse" helpers to run POSIX tooling, as
opposed to git itself.

Utilities like "mv", "rm", "mkdir" etc. are differently chatty between
platform, and this helper captures their stdout/stderr for a later
test_cmp.

Now, I think actually there isn't a bug *now* because we clobber the
output, and seem to only call test_all_match() and other test_cmp
helpers right after we've run "git", not these POSIX utilities.

But since all we want in those cases is just a "run these commands in
these N dirs" it would be good to split up the helper.



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit
  2022-01-14 15:59   ` [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
@ 2022-02-16  9:15     ` Ævar Arnfjörð Bjarmason
  2022-02-16 16:21       ` Elijah Newren
  0 siblings, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  9:15 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren


On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> Make several small updates, to address a few documentation issues
> I spotted:
>   * sparse-checkout focused on "patterns" even though the inputs (and
>     outputs in the case of `list`) are directories in cone-mode
>   * The description section of the sparse-checkout documentation
>     was a bit sparse (no pun intended), and focused more on internal
>     mechanics rather than end user usage.  This made sense in the
>     early days when the command was even more experimental, but let's
>     adjust a bit to try to make it more approachable to end users who
>     may want to consider using it.  Keep the scary backward
>     compatibility warning, though; we're still hard at work trying to
>     fix up commands to behave reasonably in sparse checkouts.
>   * both read-tree and update-index tried to describe how to use the
>     skip-worktree bit, but both predated the sparse-checkout command.
>     The sparse-checkout command is a far easier mechanism to use and
>     for users trying to reduce the size of their working tree, we
>     should recommend users to look at it instead.
>   * The update-index documentation pointed out that assume-unchanged
>     and skip-worktree sounded similar but had different purposes.
>     However, it made no attempt to explain the differences, only to
>     point out that they were different.  Explain the differences.
>   * The update-index documentation focused much more on (internal?)
>     implementation details than on end-user usage.  Try to explain
>     its purpose better for users of update-index, rather than
>     fellow developers trying to work with the SKIP_WORKTREE bit.
>   * Clarify that when core.sparseCheckout=true, we treat a file's
>     presence in the working tree as being an override to the
>     SKIP_WORKTREE bit (i.e. in sparse checkouts when the file is
>     present we ignore the SKIP_WORKTREE bit).
>
> Note that this commit, like many touching documentation, is best viewed
> with the `--color-words` option to diff/log.
>
> Signed-off-by: Elijah Newren <newren@gmail.com>
> ---
>  Documentation/git-read-tree.txt       | 12 +++--
>  Documentation/git-sparse-checkout.txt | 76 ++++++++++++++++-----------
>  Documentation/git-update-index.txt    | 57 +++++++++++++++-----
>  3 files changed, 98 insertions(+), 47 deletions(-)
>
> diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
> index 8c3aceb8324..99bb387134d 100644
> --- a/Documentation/git-read-tree.txt
> +++ b/Documentation/git-read-tree.txt
> @@ -375,9 +375,14 @@ have finished your work-in-progress), attempt the merge again.
>  SPARSE CHECKOUT
>  ---------------
>  
> +Note: The `update-index` and `read-tree` primitives for supporting the
> +skip-worktree bit predated the introduction of
> +linkgit:git-sparse-checkout[1].  Users are encouraged to use
> +`sparse-checkout` in preference to these low-level primitives.

I was honestly a bit confused about whether we were really referring to
the git-update-index and git-read-tree commands here, or some
sparse-checkout (re-)usage of the same as "primitives", but reading it
again & the commit message we're just talking about the commands here.

So this really just wants to assure readers that they're advised to use
the shiny porcelain command instead of the plumbing.

I think we should refer to these as e.g. "linkgit:git-update-index[1]"
not "`update-index`" here, and call them e.g. "plumbing commands"
instead of "primitives" here, which will address that (the reader
wonders "what's a primitive?").

> -Initialize and modify the sparse-checkout configuration, which reduces
> -the checkout to a set of paths given by a list of patterns.
> +This command is used to create sparse checkouts, which means that it
> +changes the working tree from having all tracked files present, to only
> +have a subset of them.  It can also switch which subset of files are
> +present, or undo and go back to having all tracked files present in the
> +working copy.

In terms of prose I think it's preferred to keep matter-of-fact "Slices
and dices apples, making them easier to eat" instead of "This command
slices and dices apples, which means that it's easier to eat them".

I've forgotten what the linguisting term for that is, but it's more
consistent with our docs, and makes for easier reading.

> +The subset of files is chosen by providing a list of directories in
> +cone mode (which is recommended), or by providing a list of patterns
> +in non-cone mode.

As someone with light familiarity with sparse-checkout:

I'm still not sure if this is telling me that it's preferred to list
directories v.s. patterns, or if it's telling me it's better to operate
in "cone mode" v.s. "non-cone mode", or some combination of the two.

IOW I think peeling out that "(which is recommended)" and making it
clearly refer to which (or both?) of the two we're talking about would
be much better.

> +When in a sparse-checkout, other Git commands behave a bit differently.
> +For example, switching branches will not update paths outside the
> +sparse-checkout directories/patterns, and `git commit -a` will not record
> +paths outside the sparse-checkout directories/patterns as deleted.

(I didn't read through the rest in any detail)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
  2022-01-14 15:59   ` [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
  2022-01-15  1:39     ` Victoria Dye
@ 2022-02-16  9:32     ` Ævar Arnfjörð Bjarmason
  2022-02-16 16:30       ` Elijah Newren
  1 sibling, 1 reply; 30+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-02-16  9:32 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Victoria Dye, Derrick Stolee, Lessley Dennington, Elijah Newren


On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
> [...]
> +static int path_found(const char *path, const char **dirname, size_t *dir_len,
> +		      int *dir_found)
> +{
> +	struct stat st;
> +	char *newdir;
> +	char *tmp;
> +
> +	/*
> +	 * If dirname corresponds to a directory that doesn't exist, and this
> +	 * path starts with dirname, then path can't exist.
> +	 */
> +	if (!*dir_found && !memcmp(path, *dirname, *dir_len))
> +		return 0;
> +
> +	/*
> +	 * If path itself exists, return 1.
> +	 */
> +	if (!lstat(path, &st))
> +		return 1;
> +
> +	/*
> +	 * Otherwise, path does not exist so we'll return 0...but we'll first
> +	 * determine some info about its parent directory so we can avoid
> +	 * lstat calls for future cache entries.
> +	 */
> +	newdir = strrchr(path, '/');
> +	if (!newdir)
> +		return 0; /* Didn't find a parent dir; just return 0 now. */
> +
> +	/*
> +	 * If path starts with directory (which we already lstat'ed and found),
> +	 * then no need to lstat parent directory again.
> +	 */
> +	if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
> +		return 0;

I really don't care/just asking, but there was a discussion on another
topic about guarding calls to the mem*() family when n=0:
https://lore.kernel.org/git/xmqq1r24gsph.fsf@gitster.g/

Is this the same sort of redundancy where we could lose the "&&
*dirname" part, or is it still important because a "\0" dirname would
have corresponding non-0 *dir_len?

More generally ... (see below)...

> +
> +	/* Free previous dirname, and cache path's dirname */
> +	*dirname = path;
> +	*dir_len = newdir - path + 1;
> +
> +	tmp = xstrndup(path, *dir_len);
> +	*dir_found = !lstat(tmp, &st);

In most other places we're a bit more careful about lstat() error handling, e.g.:
    
    builtin/init-db.c:              if (lstat(path->buf, &st_git)) {
    builtin/init-db.c-                      if (errno != ENOENT)
    builtin/init-db.c-                              die_errno(_("cannot stat '%s'"), path->buf);
    builtin/init-db.c-              }
    
Shouldn't we do the same here and at least error() on return values of
-1 with an accompanying errno that isn't ENOENT?


> +	free(tmp);
> +
> +	return 0;
> +}
> +
>  void clear_skip_worktree_from_present_files(struct index_state *istate)
>  {
> +	const char *last_dirname = NULL;
> +	size_t dir_len = 0;
> +	int dir_found = 1;
> +
>  	int i;
> +
>  	if (!core_apply_sparse_checkout)
>  		return;
>  
>  restart:
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce = istate->cache[i];
> -		struct stat st;
>  
> -		if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
> +		if (ce_skip_worktree(ce) &&
> +		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {

...(continued from above) is the "path is zero" part of this even
reachable? I tried with this on top and ran your tests (and the rest of
t*sparse*.sh) successfully:
	
	diff --git a/sparse-index.c b/sparse-index.c
	index eed170cd8f7..f89c944d8cd 100644
	--- a/sparse-index.c
	+++ b/sparse-index.c
	@@ -403,6 +403,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
	 	for (i = 0; i < istate->cache_nr; i++) {
	 		struct cache_entry *ce = istate->cache[i];
	 
	+		assert(*ce->name);
	 		if (ce_skip_worktree(ce) &&
	 		    path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
	 			if (S_ISSPARSEDIR(ce->ce_mode)) {

I.e. isn't this undue paranoia about the cache API giving us zero-length
paths?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications
  2022-02-16  8:51     ` Ævar Arnfjörð Bjarmason
@ 2022-02-16 16:02       ` Elijah Newren
  0 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren @ 2022-02-16 16:02 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Victoria Dye,
	Derrick Stolee, Lessley Dennington

On Wed, Feb 16, 2022 at 12:53 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> > [...]
> > +     test_path_is_file init.t &&
> > +     grep -q dirty init.t
> > [...]
> > +     test_path_is_file init.t &&
> > +     grep -q dirty init.t
>
> Maybe I'm missing something, but can these two just be:
>
>     grep dirty init.t
>
> I.e. won't grep report errors appropriately here, e.g.:
>
>     $ grep foo t
>     grep: t: Is a directory
>     $ grep foo x
>     grep: x: No such file or directory
>
> The only prior art I could find was the same pattern in your c449947a79d
> (directory rename detection: files/directories in the way of some
> renames, 2018-04-19).
>
> It's probably good to lose the "-q" too, unless this output is way too
> verbose without it. In any case the errors wouldn't be affected.

Fair point.  I believe my original test during development was just
test_path_is_file, then I came back later and decided I should check
the contents weren't lost.  A test_cmp might be a better choice to
replace both of these, because we don't really want to check that it
just has "dirty" somewhere in the file, but that it's been left alone
and only has those contents.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-02-16  8:57     ` Ævar Arnfjörð Bjarmason
@ 2022-02-16 16:08       ` Elijah Newren
  0 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren @ 2022-02-16 16:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Victoria Dye,
	Derrick Stolee, Lessley Dennington

On Wed, Feb 16, 2022 at 1:14 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> > [...]
> >       # Set up a strange condition of having a file edit
> > -     # outside of the sparse-checkout cone. This is just
> > -     # to verify that sparse-checkout and sparse-index
> > -     # behave the same in this case.
> > +     # outside of the sparse-checkout cone. We want to verify
> > +     # that all modes handle this the same, and detect the
> > +     # modification.
> >       write_script edit-content <<-\EOF &&
> > -     mkdir folder1 &&
> > +     mkdir -p folder1 &&
> >       echo content >>folder1/a
> >       EOF
> > -     run_on_sparse ../edit-content &&
> > +     run_on_all ../edit-content &&
>
> The end-state of this series will pass its tests with this on top, only
> the last "mkdir -p" you added for the ls-files test seems to do
> anything:
>
> diff --git a/t/t1092-sparse-checkout-compatibility.sh b/t/t1092-sparse-checkout-compatibility.sh
> index 2a04b532f91..160c119e17d 100755
> --- a/t/t1092-sparse-checkout-compatibility.sh
> +++ b/t/t1092-sparse-checkout-compatibility.sh
> @@ -639,7 +639,7 @@ test_expect_success 'update-index modify outside sparse definition' '
>         # condition in which a `skip-worktree` enabled, outside-of-cone file
>         # exists on disk. It is used here to ensure `update-index` is stable
>         # and behaves predictably if such a condition occurs.
> -       run_on_sparse mkdir -p folder1 &&
> +       run_on_sparse mkdir folder1 &&
>         run_on_sparse cp ../initial-repo/folder1/a folder1/a &&
>         run_on_all ../edit-contents folder1/a &&
>
> @@ -665,7 +665,7 @@ test_expect_success 'update-index --add outside sparse definition' '
>         EOF
>
>         # Create folder1, add new file
> -       run_on_sparse mkdir -p folder1 &&
> +       run_on_sparse mkdir folder1 &&
>         run_on_all ../edit-contents folder1/b &&
>
>         # The *untracked* out-of-cone file is added to the index because it does
> @@ -949,7 +949,7 @@ test_expect_success 'checkout-index outside sparse definition' '
>
>         run_on_sparse rm -rf folder1 &&
>         echo test >new-a &&
> -       run_on_sparse mkdir -p folder1 &&
> +       run_on_sparse mkdir folder1 &&
>         run_on_all cp ../new-a folder1/a &&
>
>         test_all_match test_must_fail git checkout-index --ignore-skip-worktree-bits -- folder1/a &&
> @@ -996,7 +996,7 @@ test_expect_success 'clean' '
>         test_all_match git commit -m "ignore bogus files" &&
>
>         run_on_sparse mkdir folder1 &&
> -       run_on_all mkdir -p deep/untracked-deep &&
> +       run_on_all mkdir deep/untracked-deep &&
>         run_on_all touch folder1/bogus &&
>         run_on_all touch folder1/untracked &&
>         run_on_all touch deep/untracked-deep/bogus &&
>
> A bit nit-y I guess, but I do think tests are much easier to follow when
> it's clear when we're doing initial setup v.s. using already set-up
> data. In this case
>
> More importantnly I think between this and 19a0acc83e4 (t1092: test
> interesting sparse-checkout scenarios, 2021-01-23) that introduced this
> pattern there's a large foot-gun being left in place here by using these
> "run_on_all" and "run_on_sparse" helpers to run POSIX tooling, as
> opposed to git itself.
>
> Utilities like "mv", "rm", "mkdir" etc. are differently chatty between
> platform, and this helper captures their stdout/stderr for a later
> test_cmp.
>
> Now, I think actually there isn't a bug *now* because we clobber the
> output, and seem to only call test_all_match() and other test_cmp
> helpers right after we've run "git", not these POSIX utilities.
>
> But since all we want in those cases is just a "run these commands in
> these N dirs" it would be good to split up the helper.

Maybe, but that sounds like a comment on the series from 13 months ago
rather than this series.  It's somewhat orthogonal to what I'm doing
here...  ;-)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit
  2022-02-16  9:15     ` Ævar Arnfjörð Bjarmason
@ 2022-02-16 16:21       ` Elijah Newren
  0 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren @ 2022-02-16 16:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Victoria Dye,
	Derrick Stolee, Lessley Dennington

On Wed, Feb 16, 2022 at 1:23 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > Make several small updates, to address a few documentation issues
> > I spotted:
> >   * sparse-checkout focused on "patterns" even though the inputs (and
> >     outputs in the case of `list`) are directories in cone-mode
> >   * The description section of the sparse-checkout documentation
> >     was a bit sparse (no pun intended), and focused more on internal
> >     mechanics rather than end user usage.  This made sense in the
> >     early days when the command was even more experimental, but let's
> >     adjust a bit to try to make it more approachable to end users who
> >     may want to consider using it.  Keep the scary backward
> >     compatibility warning, though; we're still hard at work trying to
> >     fix up commands to behave reasonably in sparse checkouts.
> >   * both read-tree and update-index tried to describe how to use the
> >     skip-worktree bit, but both predated the sparse-checkout command.
> >     The sparse-checkout command is a far easier mechanism to use and
> >     for users trying to reduce the size of their working tree, we
> >     should recommend users to look at it instead.
> >   * The update-index documentation pointed out that assume-unchanged
> >     and skip-worktree sounded similar but had different purposes.
> >     However, it made no attempt to explain the differences, only to
> >     point out that they were different.  Explain the differences.
> >   * The update-index documentation focused much more on (internal?)
> >     implementation details than on end-user usage.  Try to explain
> >     its purpose better for users of update-index, rather than
> >     fellow developers trying to work with the SKIP_WORKTREE bit.
> >   * Clarify that when core.sparseCheckout=true, we treat a file's
> >     presence in the working tree as being an override to the
> >     SKIP_WORKTREE bit (i.e. in sparse checkouts when the file is
> >     present we ignore the SKIP_WORKTREE bit).
> >
> > Note that this commit, like many touching documentation, is best viewed
> > with the `--color-words` option to diff/log.
> >
> > Signed-off-by: Elijah Newren <newren@gmail.com>
> > ---
> >  Documentation/git-read-tree.txt       | 12 +++--
> >  Documentation/git-sparse-checkout.txt | 76 ++++++++++++++++-----------
> >  Documentation/git-update-index.txt    | 57 +++++++++++++++-----
> >  3 files changed, 98 insertions(+), 47 deletions(-)
> >
> > diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
> > index 8c3aceb8324..99bb387134d 100644
> > --- a/Documentation/git-read-tree.txt
> > +++ b/Documentation/git-read-tree.txt
> > @@ -375,9 +375,14 @@ have finished your work-in-progress), attempt the merge again.
> >  SPARSE CHECKOUT
> >  ---------------
> >
> > +Note: The `update-index` and `read-tree` primitives for supporting the
> > +skip-worktree bit predated the introduction of
> > +linkgit:git-sparse-checkout[1].  Users are encouraged to use
> > +`sparse-checkout` in preference to these low-level primitives.
>
> I was honestly a bit confused about whether we were really referring to
> the git-update-index and git-read-tree commands here, or some
> sparse-checkout (re-)usage of the same as "primitives"

Neither, really.

>, but reading it
> again & the commit message we're just talking about the commands here.

I was trying to talk about just the small building blocks found in
git-update-index and git-read-tree commands that relate to sparse
checkouts (i.e. the --[no-]skip-worktree and
--[no-]ignore-skip-worktree-entries options to git-update-index, and
the ability to update the working tree according to
$GITDIR/info/sprse-checkout buried in `git read-tree -mu HEAD`).

> So this really just wants to assure readers that they're advised to use
> the shiny porcelain command instead of the plumbing.
>
> I think we should refer to these as e.g. "linkgit:git-update-index[1]"
> not "`update-index`" here, and call them e.g. "plumbing commands"
> instead of "primitives" here, which will address that (the reader
> wonders "what's a primitive?").

Perhaps:

Note: The skip-worktree capabilities in linkgit:git-update-index[1]
and `read-tree` predated the introduction of
linkgit:git-sparse-checkout[1].  Users are encouraged to use the
`sparse-checkout` command in preference to these plumbing commands for
sparse-checkout/skip-worktree related needs.

?

>
> > -Initialize and modify the sparse-checkout configuration, which reduces
> > -the checkout to a set of paths given by a list of patterns.
> > +This command is used to create sparse checkouts, which means that it
> > +changes the working tree from having all tracked files present, to only
> > +have a subset of them.  It can also switch which subset of files are
> > +present, or undo and go back to having all tracked files present in the
> > +working copy.
>
> In terms of prose I think it's preferred to keep matter-of-fact "Slices
> and dices apples, making them easier to eat" instead of "This command
> slices and dices apples, which means that it's easier to eat them".
>
> I've forgotten what the linguisting term for that is, but it's more
> consistent with our docs, and makes for easier reading.

So, s/means that it changes/change/ ?

> > +The subset of files is chosen by providing a list of directories in
> > +cone mode (which is recommended), or by providing a list of patterns
> > +in non-cone mode.
>
> As someone with light familiarity with sparse-checkout:
>
> I'm still not sure if this is telling me that it's preferred to list
> directories v.s. patterns, or if it's telling me it's better to operate
> in "cone mode" v.s. "non-cone mode", or some combination of the two.
>
> IOW I think peeling out that "(which is recommended)" and making it
> clearly refer to which (or both?) of the two we're talking about would
> be much better.

Much easier to fix when I resubmit the patch to change the default for
sparse-checkout to cone mode (i.e. when I resubmit
https://lore.kernel.org/git/e30119b96dfaf9fdb82039cab84f8b81caacc1de.1644712798.git.gitgitgadget@gmail.com/).
I can include the above changes too.

> > +When in a sparse-checkout, other Git commands behave a bit differently.
> > +For example, switching branches will not update paths outside the
> > +sparse-checkout directories/patterns, and `git commit -a` will not record
> > +paths outside the sparse-checkout directories/patterns as deleted.
>
> (I didn't read through the rest in any detail)

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
  2022-02-16  9:32     ` Ævar Arnfjörð Bjarmason
@ 2022-02-16 16:30       ` Elijah Newren
  2022-02-17  4:40         ` Elijah Newren
  0 siblings, 1 reply; 30+ messages in thread
From: Elijah Newren @ 2022-02-16 16:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Victoria Dye,
	Derrick Stolee, Lessley Dennington

On Wed, Feb 16, 2022 at 1:37 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> > [...]
> > +static int path_found(const char *path, const char **dirname, size_t *dir_len,
> > +                   int *dir_found)
> > +{
> > +     struct stat st;
> > +     char *newdir;
> > +     char *tmp;
> > +
> > +     /*
> > +      * If dirname corresponds to a directory that doesn't exist, and this
> > +      * path starts with dirname, then path can't exist.
> > +      */
> > +     if (!*dir_found && !memcmp(path, *dirname, *dir_len))
> > +             return 0;
> > +
> > +     /*
> > +      * If path itself exists, return 1.
> > +      */
> > +     if (!lstat(path, &st))
> > +             return 1;
> > +
> > +     /*
> > +      * Otherwise, path does not exist so we'll return 0...but we'll first
> > +      * determine some info about its parent directory so we can avoid
> > +      * lstat calls for future cache entries.
> > +      */
> > +     newdir = strrchr(path, '/');
> > +     if (!newdir)
> > +             return 0; /* Didn't find a parent dir; just return 0 now. */
> > +
> > +     /*
> > +      * If path starts with directory (which we already lstat'ed and found),
> > +      * then no need to lstat parent directory again.
> > +      */
> > +     if (*dir_found && *dirname && memcmp(path, *dirname, *dir_len))
> > +             return 0;
>
> I really don't care/just asking, but there was a discussion on another
> topic about guarding calls to the mem*() family when n=0:
> https://lore.kernel.org/git/xmqq1r24gsph.fsf@gitster.g/
>
> Is this the same sort of redundancy where we could lose the "&&
> *dirname" part, or is it still important because a "\0" dirname would
> have corresponding non-0 *dir_len?

No, dirname is a char**, not a char*.  I need to make sure *dirname is
non-NULL before passing to memcmp or we get segfaults (and *dirname
will be NULL the first time it gets to this line, so the check is
critical).

> More generally ... (see below)...
>
> > +
> > +     /* Free previous dirname, and cache path's dirname */
> > +     *dirname = path;
> > +     *dir_len = newdir - path + 1;
> > +
> > +     tmp = xstrndup(path, *dir_len);
> > +     *dir_found = !lstat(tmp, &st);
>
> In most other places we're a bit more careful about lstat() error handling, e.g.:
>
>     builtin/init-db.c:              if (lstat(path->buf, &st_git)) {
>     builtin/init-db.c-                      if (errno != ENOENT)
>     builtin/init-db.c-                              die_errno(_("cannot stat '%s'"), path->buf);
>     builtin/init-db.c-              }
>
> Shouldn't we do the same here and at least error() on return values of
> -1 with an accompanying errno that isn't ENOENT?

If we should do that everywhere, should we have an xlstat in wrapper.[ch]?

> > +     free(tmp);
> > +
> > +     return 0;
> > +}
> > +
> >  void clear_skip_worktree_from_present_files(struct index_state *istate)
> >  {
> > +     const char *last_dirname = NULL;
> > +     size_t dir_len = 0;
> > +     int dir_found = 1;
> > +
> >       int i;
> > +
> >       if (!core_apply_sparse_checkout)
> >               return;
> >
> >  restart:
> >       for (i = 0; i < istate->cache_nr; i++) {
> >               struct cache_entry *ce = istate->cache[i];
> > -             struct stat st;
> >
> > -             if (ce_skip_worktree(ce) && !lstat(ce->name, &st)) {
> > +             if (ce_skip_worktree(ce) &&
> > +                 path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>
> ...(continued from above) is the "path is zero" part of this even
> reachable? I tried with this on top and ran your tests (and the rest of
> t*sparse*.sh) successfully:
>
>         diff --git a/sparse-index.c b/sparse-index.c
>         index eed170cd8f7..f89c944d8cd 100644
>         --- a/sparse-index.c
>         +++ b/sparse-index.c
>         @@ -403,6 +403,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate)
>                 for (i = 0; i < istate->cache_nr; i++) {
>                         struct cache_entry *ce = istate->cache[i];
>
>         +               assert(*ce->name);
>                         if (ce_skip_worktree(ce) &&
>                             path_found(ce->name, &last_dirname, &dir_len, &dir_found)) {
>                                 if (S_ISSPARSEDIR(ce->ce_mode)) {
>
> I.e. isn't this undue paranoia about the cache API giving us zero-length
> paths?

Nope, not related at all, for two reasons: the code above was checking
for NULL pointers rather than NUL characters, and the argument I was
checking was last_dirname, not ce->name.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching
  2022-02-16 16:30       ` Elijah Newren
@ 2022-02-17  4:40         ` Elijah Newren
  0 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren @ 2022-02-17  4:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Victoria Dye,
	Derrick Stolee, Lessley Dennington

On Wed, Feb 16, 2022 at 8:30 AM Elijah Newren <newren@gmail.com> wrote:
>
> On Wed, Feb 16, 2022 at 1:37 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> >
> > On Fri, Jan 14 2022, Elijah Newren via GitGitGadget wrote:
> >
...
> > > +
> > > +     /* Free previous dirname, and cache path's dirname */
> > > +     *dirname = path;
> > > +     *dir_len = newdir - path + 1;
> > > +
> > > +     tmp = xstrndup(path, *dir_len);
> > > +     *dir_found = !lstat(tmp, &st);
> >
> > In most other places we're a bit more careful about lstat() error handling, e.g.:
> >
> >     builtin/init-db.c:              if (lstat(path->buf, &st_git)) {
> >     builtin/init-db.c-                      if (errno != ENOENT)
> >     builtin/init-db.c-                              die_errno(_("cannot stat '%s'"), path->buf);
> >     builtin/init-db.c-              }
> >
> > Shouldn't we do the same here and at least error() on return values of
> > -1 with an accompanying errno that isn't ENOENT?
>
> If we should do that everywhere, should we have an xlstat in wrapper.[ch]?

Turns out we already DO have a file_exists() wrapper function, but
it's found in dir.c rather than wrapper.[ch].  However, it doesn't
bother to check errno either...

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-01-14 15:59   ` [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
  2022-02-16  8:57     ` Ævar Arnfjörð Bjarmason
@ 2022-02-19  1:06     ` Jonathan Nieder
  2022-02-19 16:42       ` Elijah Newren
  2022-02-20 16:56       ` Derrick Stolee
  1 sibling, 2 replies; 30+ messages in thread
From: Jonathan Nieder @ 2022-02-19  1:06 UTC (permalink / raw)
  To: Elijah Newren via GitGitGadget
  Cc: git, Victoria Dye, Derrick Stolee, Lessley Dennington,
	Elijah Newren, Jonathan Tan, jabolopes, Jeff Hostetler

(cc-ing Jonathan Tan, Jose Lopes, and Jeff Hostetler, vfs experts)
Hi Elijah,

Elijah Newren wrote[1]:

> The fix is short (~30 lines), but the description is not.  Sorry.
>
> There is a set of problems caused by files in what I'll refer to as the
> "present-despite-SKIP_WORKTREE" state.  This commit aims to not just fix
> these problems, but remove the entire class as a possibility -- for
> those using sparse checkouts.  But first, we need to understand the
> problems this class presents.  A quick outline:
>
>    * Problems
>      * User facing issues
>      * Problem space complexity
>      * Maintenance and code correctness challenges
>    * SKIP_WORKTREE expectations in Git
>    * Suggested solution
>    * Pros/Cons of suggested solution
>    * Notes on testcase modifications

Thanks for a clear explanation!  This is very helpful.

> === User facing issues ===
>
> There are various ways for users to get files to be present in the
> working copy despite having the SKIP_WORKTREE bit set for that file in
> the index.  This may come from:
>   * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
>   * users grabbing files from elsewhere and writing them to the worktree
>     (perhaps even cached in their editor)
>   * users attempting to "abort" a sparse-checkout operation with a
>     not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
>     working tree is not atomic)[3].
>
> Once users have present-despite-SKIP_WORKTREE files, any modifications
> users make to these files will be ignored, possibly to users' confusion.
[...]
> The suggests a simple solution: present-despite-SKIP_WORKTREE files
> should not exist, for those using sparse-checkouts.

This patch just reached "next", so at $DAYJOB a test for our vfsd[2]
noticed this change.  The trick behind a Git-based virtual filesystem
is typically:

- since we control the filesystem, we can pretend all files are
  already present.  On access, we obtain the file content from the git
  object store.  On write, we update the sparse-checkout pattern so
  that Git knows to start tracking the file.

- by keeping the sparse-checkout pattern narrow, we minimize the time
  commands like "git status" need to spend looking for changes in
  unmodified files.  Controlling the filesystem means we don't need to
  worry about changes to files that don't match that pattern (since
  any modification would also trigger a sparse-checkout pattern
  update).

If I understand the intent behind this change correctly, it's
incompatible with that trick.  How would you recommend handling that?
In the not too far away future, I'd expect something like the "VFS
projection hook" to handle this use case, but in the meantime, I would
expect this change to break VFS for Git performance.  A few options:

 a. We could guard it with a config option.  It would still be a
    regression for VFS for Git users, but they'd be able to use the
    config option to restore the expected behavior.  (Or
    alternatively, such a config option could be disabled by default,
    but I suspect that would defeat the purpose described for the
    patch.)

 b. We could distinguish between the vfsd and the "interrupted and
    forgot to update SKIP_WORKTREE bits in the index" cases somehow.
    This sounds complex.

 c. Something else?

(b) and (c) aren't sounding obviously good, so (a) seems tempting.
What do you think?

Thanks,
Jonathan

[1] https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/
[2] see
https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
for what I mean by "vfsd"

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-02-19  1:06     ` Jonathan Nieder
@ 2022-02-19 16:42       ` Elijah Newren
  2022-02-19 18:14         ` Jonathan Nieder
  2022-02-20 16:56       ` Derrick Stolee
  1 sibling, 1 reply; 30+ messages in thread
From: Elijah Newren @ 2022-02-19 16:42 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Victoria Dye,
	Derrick Stolee, Lessley Dennington, Jonathan Tan, Jose Lopes,
	Jeff Hostetler

On Fri, Feb 18, 2022 at 5:07 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> (cc-ing Jonathan Tan, Jose Lopes, and Jeff Hostetler, vfs experts)
> Hi Elijah,
>
> Elijah Newren wrote[1]:
>
> > The fix is short (~30 lines), but the description is not.  Sorry.
> >
> > There is a set of problems caused by files in what I'll refer to as the
> > "present-despite-SKIP_WORKTREE" state.  This commit aims to not just fix
> > these problems, but remove the entire class as a possibility -- for
> > those using sparse checkouts.  But first, we need to understand the
> > problems this class presents.  A quick outline:
> >
> >    * Problems
> >      * User facing issues
> >      * Problem space complexity
> >      * Maintenance and code correctness challenges
> >    * SKIP_WORKTREE expectations in Git
> >    * Suggested solution
> >    * Pros/Cons of suggested solution
> >    * Notes on testcase modifications
>
> Thanks for a clear explanation!  This is very helpful.
>
> > === User facing issues ===
> >
> > There are various ways for users to get files to be present in the
> > working copy despite having the SKIP_WORKTREE bit set for that file in
> > the index.  This may come from:
> >   * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
> >   * users grabbing files from elsewhere and writing them to the worktree
> >     (perhaps even cached in their editor)
> >   * users attempting to "abort" a sparse-checkout operation with a
> >     not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
> >     working tree is not atomic)[3].
> >
> > Once users have present-despite-SKIP_WORKTREE files, any modifications
> > users make to these files will be ignored, possibly to users' confusion.
> [...]
> > The suggests a simple solution: present-despite-SKIP_WORKTREE files
> > should not exist, for those using sparse-checkouts.
>
> This patch just reached "next", so at $DAYJOB a test for our vfsd[2]
> noticed this change.  The trick behind a Git-based virtual filesystem
> is typically:
>
> - since we control the filesystem, we can pretend all files are
>   already present.  On access, we obtain the file content from the git
>   object store.  On write, we update the sparse-checkout pattern so
>   that Git knows to start tracking the file.
>
> - by keeping the sparse-checkout pattern narrow, we minimize the time
>   commands like "git status" need to spend looking for changes in
>   unmodified files.  Controlling the filesystem means we don't need to
>   worry about changes to files that don't match that pattern (since
>   any modification would also trigger a sparse-checkout pattern
>   update).

Sorry for the headache.

Let me try to restate the problem I'm solving, then attempt to put
your above situation into my own words, to verify I'm understanding...

So the primary challenge I was trying to address with this patch
series, was that keeping the filesystem and the "sparsity" state
in-sync was just *difficult*.  Things like checkout-index or even
diff+apply will write to the working tree without even thinking about
updating the sparsity state in the index to match.  And that's only
looking at Git-related commands.  People can write new copies of files
manually in a myriad of ways.  Detecting and handling that
inconsistent state between the working tree and index wasn't as easy
as it looked.

One way to solve this problem would be to have a vfs, where any time
someone writes to a file, you update the index to clear the
SKIP_WORKTREE bit for the written file.  Sounds like you have such a
thing. (...or close to it, since you'd update the sparsity patterns,
and the tooling to check mismatches of sparsity patterns and the index
state are good and those get updated nicely.  We just struggle to have
something that correctly updates between index and working tree
mismatches of sparsity state).  So, that basically means you wouldn't
benefit from this change.

But for those of us without some kind of vfs that detects writes and
auto-updates either the sparsity patterns or the SKIP_WORKTREE bit in
the index, we want something that will manually check the working tree
to see if files have been written and update the index accordingly.

And, of course, you're trying to do more than just detect
inconsistencies -- you want the vfs to fully control the sparsity
patterns and expand them based on dynamic file accesses by the user.
That dynamic bit doesn't play well with the non-vfs workaround.


Does that sound right?

> If I understand the intent behind this change correctly, it's
> incompatible with that trick.  How would you recommend handling that?
> In the not too far away future, I'd expect something like the "VFS
> projection hook" to handle this use case, but in the meantime, I would
> expect this change to break VFS for Git performance.  A few options:

Side note: I thought Microsoft's vfs was first named GVFS and then
based on naming collisions renamed to VFS for Git.  Sounds like you
have something that is probably a bit different, but which you are
also calling VFS for Git?  Is there some potential confusion here, or
are you banking on Microsoft eventually dropping their project?  Or
that you'll both keep your projects internal and not share them so the
naming collision doesn't matter?  Just curious...

>  a. We could guard it with a config option.  It would still be a
>     regression for VFS for Git users, but they'd be able to use the
>     config option to restore the expected behavior.  (Or
>     alternatively, such a config option could be disabled by default,
>     but I suspect that would defeat the purpose described for the
>     patch.)
>
>  b. We could distinguish between the vfsd and the "interrupted and
>     forgot to update SKIP_WORKTREE bits in the index" cases somehow.
>     This sounds complex.
>
>  c. Something else?
>
> (b) and (c) aren't sounding obviously good, so (a) seems tempting.
> What do you think?

Yeah, I'm having a hard time coming up with a way that either the VFS
could recognize these special Git present-despite-skip-worktree checks
and treat them differently, or having Git recognize that it is running
under a special VFS that likes to aggressively and automagically
expand the sparsity patterns.  So (a) seems tempting to me too.

Got any name suggestions?  core.avoidPresentDespiteSkipWorktreeCheck
(defaulting to false)?  core.sparsityConsistencyCheck (defaulting to
true)?

Did your team want to implement that on top of
en/present-despite-skipped since you can verify if it works for you,
or did you want me to take a stab at it?  Should be a pretty simple
change.

> Thanks,
> Jonathan
>
> [1] https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/
> [2] see
> https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
> for what I mean by "vfsd"

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-02-19 16:42       ` Elijah Newren
@ 2022-02-19 18:14         ` Jonathan Nieder
  2022-02-20  5:28           ` Elijah Newren
  0 siblings, 1 reply; 30+ messages in thread
From: Jonathan Nieder @ 2022-02-19 18:14 UTC (permalink / raw)
  To: Elijah Newren
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Victoria Dye,
	Derrick Stolee, Lessley Dennington, Jonathan Tan, Jose Lopes,
	Jeff Hostetler

Hi,

Elijah Newren wrote:

> And, of course, you're trying to do more than just detect
> inconsistencies -- you want the vfs to fully control the sparsity
> patterns and expand them based on dynamic file accesses by the user.
> That dynamic bit doesn't play well with the non-vfs workaround.
>
>
> Does that sound right?

You captured some of it.  There's a bit more: typically when using
sparse checkout the traditional way, you will not have files in your
working directory that do not match the sparse checkout pattern.  That
way, the disk usage in the working copy is nice and small.  But with a
vfsd like described in [2], having other files in the working
directory does not cost disk usage because the corresponding data even
in compressed (git object) form does not have to be present locally
until the files are accessed.  So a vfsd gives an end user the
illusion that all files are present, whereas git only operates on a
small subset (the working set).

With this change, git would start operating on all the files.

[...]
> Side note: I thought Microsoft's vfs was first named GVFS and then
> based on naming collisions renamed to VFS for Git.  Sounds like you
> have something that is probably a bit different, but which you are
> also calling VFS for Git?

No, sorry for the lack of clarity.  When I say "VFS for Git", I
genuinely mean https://vfsforgit.org/, which was authored by Microsoft
and to my understanding is still used by Microsoft's Windows team and
is available for anyone to use.  (That page currently returns a cert
error because their SSL cert expired 10 days ago.  But hopefully it
conveys the idea, and the content is still there if you go through the
interstitial.)

I agree that it can be kind of confusing to talk about that alongside
VFSes in general, but I didn't choose the name. :)

[...]
> On Fri, Feb 18, 2022 at 5:07 PM Jonathan Nieder <jrnieder@gmail.com> wrote:

>>  a. We could guard it with a config option.  It would still be a
>>     regression for VFS for Git users, but they'd be able to use the
>>     config option to restore the expected behavior.
[...]
>>  b. We could distinguish between the vfsd and the "interrupted and
>>     forgot to update SKIP_WORKTREE bits in the index" cases somehow.
>>     This sounds complex.
>>
>>  c. Something else?
>>
>> (b) and (c) aren't sounding obviously good, so (a) seems tempting.
>> What do you think?
>
> Yeah, I'm having a hard time coming up with a way that either the VFS
> could recognize these special Git present-despite-skip-worktree checks
> and treat them differently, or having Git recognize that it is running
> under a special VFS that likes to aggressively and automagically
> expand the sparsity patterns.  So (a) seems tempting to me too.

Thanks.  In a way it feels like giving up (isn't it better when things
automagically Just Work?), but I think it's the right thing to do.

> Got any name suggestions?  core.avoidPresentDespiteSkipWorktreeCheck
> (defaulting to false)?  core.sparsityConsistencyCheck (defaulting to
> true)?
>
> Did your team want to implement that on top of
> en/present-despite-skipped since you can verify if it works for you,
> or did you want me to take a stab at it?  Should be a pretty simple
> change.

Monday is a holiday here; it shouldn't be hard to get a patch out
later this week.  Happy to write one but I also won't be at all offended
if someone else writes it first.

Ideally the config name should describe the intent from the user's
perspective instead of the implementation.  So something like
"sparsecheckout.expectFilesOutsideSparsePattern".

Thanks,
Jonathan

>> [2] see
>> https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/
>> for what I mean by "vfsd"

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-02-19 18:14         ` Jonathan Nieder
@ 2022-02-20  5:28           ` Elijah Newren
  0 siblings, 0 replies; 30+ messages in thread
From: Elijah Newren @ 2022-02-20  5:28 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: Elijah Newren via GitGitGadget, Git Mailing List, Victoria Dye,
	Derrick Stolee, Lessley Dennington, Jonathan Tan, Jose Lopes,
	Jeff Hostetler

On Sat, Feb 19, 2022 at 10:14 AM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Hi,
>
> Elijah Newren wrote:
>
> > And, of course, you're trying to do more than just detect
> > inconsistencies -- you want the vfs to fully control the sparsity
> > patterns and expand them based on dynamic file accesses by the user.
> > That dynamic bit doesn't play well with the non-vfs workaround.
> >
> >
> > Does that sound right?
>
> You captured some of it.  There's a bit more: typically when using
> sparse checkout the traditional way, you will not have files in your
> working directory that do not match the sparse checkout pattern.  That
> way, the disk usage in the working copy is nice and small.  But with a
> vfsd like described in [2], having other files in the working
> directory does not cost disk usage because the corresponding data even
> in compressed (git object) form does not have to be present locally
> until the files are accessed.  So a vfsd gives an end user the
> illusion that all files are present, whereas git only operates on a
> small subset (the working set).
>
> With this change, git would start operating on all the files.
>
> [...]
> > Side note: I thought Microsoft's vfs was first named GVFS and then
> > based on naming collisions renamed to VFS for Git.  Sounds like you
> > have something that is probably a bit different, but which you are
> > also calling VFS for Git?
>
> No, sorry for the lack of clarity.  When I say "VFS for Git", I
> genuinely mean https://vfsforgit.org/, which was authored by Microsoft
> and to my understanding is still used by Microsoft's Windows team and
> is available for anyone to use.  (That page currently returns a cert
> error because their SSL cert expired 10 days ago.  But hopefully it
> conveys the idea, and the content is still there if you go through the
> interstitial.)
>
> I agree that it can be kind of confusing to talk about that alongside
> VFSes in general, but I didn't choose the name. :)
>
> [...]
> > On Fri, Feb 18, 2022 at 5:07 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> >>  a. We could guard it with a config option.  It would still be a
> >>     regression for VFS for Git users, but they'd be able to use the
> >>     config option to restore the expected behavior.
> [...]
> >>  b. We could distinguish between the vfsd and the "interrupted and
> >>     forgot to update SKIP_WORKTREE bits in the index" cases somehow.
> >>     This sounds complex.
> >>
> >>  c. Something else?
> >>
> >> (b) and (c) aren't sounding obviously good, so (a) seems tempting.
> >> What do you think?
> >
> > Yeah, I'm having a hard time coming up with a way that either the VFS
> > could recognize these special Git present-despite-skip-worktree checks
> > and treat them differently, or having Git recognize that it is running
> > under a special VFS that likes to aggressively and automagically
> > expand the sparsity patterns.  So (a) seems tempting to me too.
>
> Thanks.  In a way it feels like giving up (isn't it better when things
> automagically Just Work?), but I think it's the right thing to do.
>
> > Got any name suggestions?  core.avoidPresentDespiteSkipWorktreeCheck
> > (defaulting to false)?  core.sparsityConsistencyCheck (defaulting to
> > true)?
> >
> > Did your team want to implement that on top of
> > en/present-despite-skipped since you can verify if it works for you,
> > or did you want me to take a stab at it?  Should be a pretty simple
> > change.
>
> Monday is a holiday here; it shouldn't be hard to get a patch out
> later this week.  Happy to write one but I also won't be at all offended
> if someone else writes it first.
>
> Ideally the config name should describe the intent from the user's
> perspective instead of the implementation.  So something like
> "sparsecheckout.expectFilesOutsideSparsePattern".
>
> Thanks,
> Jonathan

I took a stab at it over here:
https://lore.kernel.org/git/pull.1153.git.1645333542011.gitgitgadget@gmail.com/

Let me know if that works for you.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-02-19  1:06     ` Jonathan Nieder
  2022-02-19 16:42       ` Elijah Newren
@ 2022-02-20 16:56       ` Derrick Stolee
  2022-02-22 23:17         ` Jonathan Nieder
  1 sibling, 1 reply; 30+ messages in thread
From: Derrick Stolee @ 2022-02-20 16:56 UTC (permalink / raw)
  To: Jonathan Nieder, Elijah Newren via GitGitGadget
  Cc: git, Victoria Dye, Derrick Stolee, Lessley Dennington,
	Elijah Newren, Jonathan Tan, jabolopes, Jeff Hostetler

On 2/18/2022 8:06 PM, Jonathan Nieder wrote:
> (cc-ing Jonathan Tan, Jose Lopes, and Jeff Hostetler, vfs experts)
> Hi Elijah,
> 
> Elijah Newren wrote[1]:
> 
>> The fix is short (~30 lines), but the description is not.  Sorry.
>>
>> There is a set of problems caused by files in what I'll refer to as the
>> "present-despite-SKIP_WORKTREE" state.  This commit aims to not just fix
>> these problems, but remove the entire class as a possibility -- for
>> those using sparse checkouts.  But first, we need to understand the
>> problems this class presents.  A quick outline:
>>
>>    * Problems
>>      * User facing issues
>>      * Problem space complexity
>>      * Maintenance and code correctness challenges
>>    * SKIP_WORKTREE expectations in Git
>>    * Suggested solution
>>    * Pros/Cons of suggested solution
>>    * Notes on testcase modifications
> 
> Thanks for a clear explanation!  This is very helpful.
> 
>> === User facing issues ===
>>
>> There are various ways for users to get files to be present in the
>> working copy despite having the SKIP_WORKTREE bit set for that file in
>> the index.  This may come from:
>>   * various git commands not really supporting the SKIP_WORKTREE bit[1,2]
>>   * users grabbing files from elsewhere and writing them to the worktree
>>     (perhaps even cached in their editor)
>>   * users attempting to "abort" a sparse-checkout operation with a
>>     not-so-early Ctrl+C (updating $GIT_DIR/info/sparse-checkout and the
>>     working tree is not atomic)[3].
>>
>> Once users have present-despite-SKIP_WORKTREE files, any modifications
>> users make to these files will be ignored, possibly to users' confusion.
> [...]
>> The suggests a simple solution: present-despite-SKIP_WORKTREE files
>> should not exist, for those using sparse-checkouts.
> 
> This patch just reached "next", so at $DAYJOB a test for our vfsd[2]
> noticed this change.  The trick behind a Git-based virtual filesystem
> is typically:
> 
> - since we control the filesystem, we can pretend all files are
>   already present.  On access, we obtain the file content from the git
>   object store.  On write, we update the sparse-checkout pattern so
>   that Git knows to start tracking the file.
> 
> - by keeping the sparse-checkout pattern narrow, we minimize the time
>   commands like "git status" need to spend looking for changes in
>   unmodified files.  Controlling the filesystem means we don't need to
>   worry about changes to files that don't match that pattern (since
>   any modification would also trigger a sparse-checkout pattern
>   update).
> 
> If I understand the intent behind this change correctly, it's
> incompatible with that trick.  How would you recommend handling that?
> In the not too far away future, I'd expect something like the "VFS
> projection hook" to handle this use case, but in the meantime, I would
> expect this change to break VFS for Git performance.  A few options:
> 
>  a. We could guard it with a config option.  It would still be a
>     regression for VFS for Git users, but they'd be able to use the
>     config option to restore the expected behavior.  (Or
>     alternatively, such a config option could be disabled by default,
>     but I suspect that would defeat the purpose described for the
>     patch.)
> 
>  b. We could distinguish between the vfsd and the "interrupted and
>     forgot to update SKIP_WORKTREE bits in the index" cases somehow.
>     This sounds complex.
> 
>  c. Something else?
> 
> (b) and (c) aren't sounding obviously good, so (a) seems tempting.
> What do you think?

Just chiming in here to say that we've dealt with these issues in
microsoft/git by special-casing them behind our core_virtualfilesystem
global. A recent example is the changes to 'git add' to prevent
adding a file that is marked as sparse (unless --sparse is specified).
We always allow this when in the virtualized scenario [1].

[1] https://github.com/microsoft/git/blob/2f6531aced2e77a6c1000a923967ae0105383930/builtin/add.c#L50-L54

This seems like something that should be on vfsd to handle, and should
not prevent upstream Git from making changes that benefit its users.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree
  2022-02-20 16:56       ` Derrick Stolee
@ 2022-02-22 23:17         ` Jonathan Nieder
  0 siblings, 0 replies; 30+ messages in thread
From: Jonathan Nieder @ 2022-02-22 23:17 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Elijah Newren via GitGitGadget, git, Victoria Dye,
	Derrick Stolee, Lessley Dennington, Elijah Newren, Jonathan Tan,
	jabolopes, Jeff Hostetler

Hi,

Derrick Stolee wrote:

> Just chiming in here to say that we've dealt with these issues in
> microsoft/git by special-casing them behind our core_virtualfilesystem
> global. A recent example is the changes to 'git add' to prevent
> adding a file that is marked as sparse (unless --sparse is specified).
> We always allow this when in the virtualized scenario [1].
>
> [1] https://github.com/microsoft/git/blob/2f6531aced2e77a6c1000a923967ae0105383930/builtin/add.c#L50-L54

Oh!  Thanks for that pointer --- it's very helpful.  I hadn't realized
that VFS for Git still requires a patched Git.

> This seems like something that should be on vfsd to handle, and should
> not prevent upstream Git from making changes that benefit its users.

Separate from the "should" questions (in general I'm in favor of
unpatched Git being able to serve as a solid platform for both Scalar
and this kind of VFS use case), this context tells me that this is not
in fact a regression for VFS for Git users, so it's much appreciated.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2022-02-22 23:17 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-13 16:43 [PATCH 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
2022-01-13 16:43 ` [PATCH 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
2022-01-13 23:35   ` Elijah Newren
2022-01-14 15:59 ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Elijah Newren via GitGitGadget
2022-01-14 15:59   ` [PATCH v2 1/5] t1011: add testcase demonstrating accidental loss of user modifications Elijah Newren via GitGitGadget
2022-02-16  8:51     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:02       ` Elijah Newren
2022-01-14 15:59   ` [PATCH v2 2/5] unpack-trees: fix accidental loss of user changes Elijah Newren via GitGitGadget
2022-01-14 15:59   ` [PATCH v2 3/5] repo_read_index: clear SKIP_WORKTREE bit from files present in worktree Elijah Newren via GitGitGadget
2022-02-16  8:57     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:08       ` Elijah Newren
2022-02-19  1:06     ` Jonathan Nieder
2022-02-19 16:42       ` Elijah Newren
2022-02-19 18:14         ` Jonathan Nieder
2022-02-20  5:28           ` Elijah Newren
2022-02-20 16:56       ` Derrick Stolee
2022-02-22 23:17         ` Jonathan Nieder
2022-01-14 15:59   ` [PATCH v2 4/5] Update documentation related to sparsity and the skip-worktree bit Elijah Newren via GitGitGadget
2022-02-16  9:15     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:21       ` Elijah Newren
2022-01-14 15:59   ` [PATCH v2 5/5] Accelerate clear_skip_worktree_from_present_files() by caching Elijah Newren via GitGitGadget
2022-01-15  1:39     ` Victoria Dye
2022-02-16  9:32     ` Ævar Arnfjörð Bjarmason
2022-02-16 16:30       ` Elijah Newren
2022-02-17  4:40         ` Elijah Newren
2022-01-15  1:51   ` [PATCH v2 0/5] Remove the present-despite-SKIP_WORKTREE class of bugs (for sparse-checkouts) Victoria Dye

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.