git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Victoria Dye <vdye@github.com>
To: Junio C Hamano <gitster@pobox.com>,
	Kevin Willford via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org, stolee@gmail.com, newren@gmail.com,
	Kevin Willford <kewillf@microsoft.com>
Subject: Re: [PATCH 1/7] reset: behave correctly with sparse-checkout
Date: Fri, 1 Oct 2021 10:55:22 -0400	[thread overview]
Message-ID: <53263148-84b1-f76a-263d-afd00819e7a7@github.com> (raw)
In-Reply-To: <xmqqpmspc3pn.fsf@gitster.g>

Junio C Hamano wrote:
> "Kevin Willford via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> @@ -127,12 +129,49 @@ static void update_index_from_diff(struct diff_queue_struct *q,
>>  		struct diff_options *opt, void *data)
>>  {
>>  	int i;
>> +	int pos;
>>  	int intent_to_add = *(int *)data;
>>  
>>  	for (i = 0; i < q->nr; i++) {
>>  		struct diff_filespec *one = q->queue[i]->one;
>> +		struct diff_filespec *two = q->queue[i]->two;
>>  		int is_missing = !(one->mode && !is_null_oid(&one->oid));
>> +		int was_missing = !two->mode && is_null_oid(&two->oid);
> 
> Not a problem introduced by this patch per-se, but is_missing is a
> counter-intuitive name for what the boolean wants to represent, I
> think, which was brought in by b4b313f9 (reset: support "--mixed
> --intent-to-add" mode, 2014-02-04).  Before the commit, we used to
> say
> 
>  	for (i = 0; i < q->nr; i++) {
>  		struct diff_filespec *one = q->queue[i]->one;
> 		if (one->mode && !is_null_sha1(one->sha1)) {
> 			... create ce out of one and add to the	index ...
> 		} else
>  			remove_file_from_cache(one->path);
> 		...
> 
> i.e. "if one is not missing, create a ce and add it, otherwise
> remove the path".
> 
> It should have been called "one_is_missing" if we wanted to
> literally express the condition the code checked, but an even better
> name would have been given after the intent of what the code wants
> to do with the information.  If the resetted-to tree (that is what
> 'one' side of the comparison in diff_cache() is) has a valid blob,
> we want it to be in the index, and otherwise, we do not want it in
> the index.
> 
> Now, the patch makes things worse and I had to do the above digging
> to see why the new code is even more confusing.  The 'two' side of
> the comparison is what is in the to-be-corrected-by-reset index.
> "was_missing" in contrast to "is_missing" makes it sound as if it
> was the state before whatever "is_missing" tries to represent, but
> that is not what is happening.  "is_missing" does not mean "the
> entry is currently not there in the index", but "was_missing" does
> mean exactly that: "the entry is currently not there in the index".
> 
> There isn't any "was" missing about it.  It "is" missing in the
> index.  Instead of renaming, I wonder if we can do without this new
> variable.  Let's read on the patch.

The new variable can most likely be refactored away, but based on this it's
probably worth renaming "is_missing" to "is_missing_in_reset_tree" (or
inverting the boolean and using "is_in_reset_tree").

> Also, now the code uses both sides of the filepair, we must double
> check that our do_diff_cache() is *not* doing any rename detection.
> It might be even prudent to ensure that 
> 
> 	if (strcmp(one->path, two->path))
> 		BUG("reset drove diff-cache with rename detection");
> 
> but it might be with too much paranoia.  I dunno.

I don't think a rename would break what this change intends to do (although
it does break some of the current assumptions in the patch). I'll make sure
to verify the rename case works before submitting a new version, just in 
case.

>>  		struct cache_entry *ce;
>> +		struct cache_entry *ce_before;
>> +		struct checkout state = CHECKOUT_INIT;
> 
> These two new variables do not need this wide a scope, I would
> think.  Shouldn't it be inside the body of the new "if" statement
> this patch adds?

I will likely need to make other changes to this patch and re-roll, so I'll
fix the scoping of all of the variables added here when I do.

>> +		/*
>> +		 * When using the sparse-checkout feature the cache entries
>> +		 * that are added here will not have the skip-worktree bit
>> +		 * set. Without this code there is data that is lost because
>> +		 * the files that would normally be in the working directory
>> +		 * are not there and show as deleted for the next status.
>> +		 * In the case of added files, they just disappear.
>> +		 *
>> +		 * We need to create the previous version of the files in
>> +		 * the working directory so that they will have the right
>> +		 * content and the next status call will show modified or
>> +		 * untracked files correctly.
>> +		 */
>> +		if (core_apply_sparse_checkout && !file_exists(two->path)) {
> 
> In a sparsely checked out working tree, there is nothing in the
> working tree at the path.  It may be because it is sparse and we
> didn't want to have anything there, or it may be because the user
> wanted to get rid of it and said "rm path" (not "git rm path") and
> this part of the tree were of interest even if the sparse checkout
> feature was used to hide other parts of the tree.  With the above
> two checks alone, we cannot tell which.  Let's read on.
> 
>> +			pos = cache_name_pos(two->path, strlen(two->path));
> 
> We check the index to see if there is an entry for it.  I suspect
> that because we need to do this check anyway, we shouldn't even have
> to look at 'two' (and add a new 'was_missing' variable), because
> 'one' and 'two' came from a comparison between the resetted-to tree
> object and the current index, and if cache_name_pos() for the path
> (we can use 'one->path') says there is an entry in the index, by
> definition, 'two' would not be showing a "removed" state (i.e. "the
> resetted-to tree had it, the index does not" is what "was_missing"
> wants to say).
> 
> So I wonder if it is better to
> 
>  - use one->path for !file_exists() above and cache_name_pos() here
>    instead of two->path.
> 
>  - drop the confusingly named 'was_missing', because (pos < 0) is
>    equivalent to it after this point, and we didn't need it up to
>    this point.
> 
>> +			if ((pos >= 0 && ce_skip_worktree(active_cache[pos])) &&
> 
> And we do find an entry for it.  So this path is not something
> sparse cone specifies not to check out (otherwise we would have a
> tree-like entry that covers this path in the index and not an entry
> for this specific path)?
> 
> Anyway, if it is marked with the skip-worktree bit, does that mean
> there is no risk that the reason why two->path does not exist in the
> working tree is because we earlier gave it in the working tree but
> it was later removed by the user?  Just making sure that we are not
> breaking the end-user's wish that the path should be removed by
> resurrecting it in the working tree with a new call to
> checkout_entry().
> 
>> +			    (is_missing || !was_missing)) {
> 
> And in such a case, if the resetted-to tree says we shouldn't have
> the path in the resulting index, or if the original state in the
> index had this path (but because (0 <= pos) must be true for us to 
> reach this point, I am not sure if "was_missing" can ever be true
> here), then do the following, which is ...
> 
>> +				state.force = 1;
>> +				state.refresh_cache = 1;
>> +				state.istate = &the_index;
>> +
>> +				ce_before = make_cache_entry(&the_index, two->mode,
>> +							     &two->oid, two->path,
>> +							     0, 0);
>> +				if (!ce_before)
>> +					die(_("make_cache_entry failed for path '%s'"),
>> +						two->path);
>> +
>> +				checkout_entry(ce_before, &state, NULL, NULL);
> 
> ... to resurrect the last "git add"ed state from the index and write
> it out to the working tree.  As I suspected, ce_before and state
> should be scoped inside this block and not visible outside, no?
> 
> I am not sure why this behaviour is desirable.  The "mixed" reset
> should not have to touch the working tree in the first place.
> 
> The large comment before this block says "... will not have the
> skip-worktree bit set", but we are dealing with a case where the
> original index had a cache entry there with skip-worktree bit set,
> so isn't the more desirable outcome that the cache entry added back
> to the index has the skip-worktree bit still set and there is no
> working tree file that the user did not desire to have?
> 
> And isn't it the matter of preserving the skip-worktree bit when the
> code in the post context of this hunk this patch did not touch adds
> the entry back to the index?
> 
>> +			}
>> +		}
>>  
>>  		if (is_missing && !intent_to_add) {
>>  			remove_file_from_cache(one->path);
> 
> If we look at the code after this point, we do use "is_missing"
> information to tweak ce->ce_flags with the intent-to-add bit.
> 
> Perhaps we can do a similar tweak to the cache entry to mark it with
> skip-worktree bit if the index had a cache entry at the path with
> the bit set?  The code that needs to do so would only have to
> remember if the one->path is in the current index and the cache
> entry for the path has the skip-worktree bit in the body of the new
> if() statement about (core_apply_sparse_checkout && !file_exists())
> added by this patch (I am not sure if !file_exists() even matters,
> though, as the approach I am suggesting is to preserve the skip bit
> and not disturb the working tree files at all).

I think it might easier to address these points as a whole rather than
inline.

The problem this patch is attempting to solve is that, while (as you noted)
`git reset --mixed` should not touch the working tree, it is *also* expected
to preserve the files of the pre-reset state (both statements paraphrased
from the `--mixed` option doc). Normally these statements don't conflict,
but if `skip-worktree` is respected and nothing is done to the working tree
before resetting the index, `skip-worktree` files will effectively be `reset
--hard`. So, to force preservation of the pre-reset state, the files are
checked out.

Based on that high-level intent, the implementation here can be simplified
(and clarified). The condition on checking out a file (to avoid the `reset 
--hard`) would be "if the path exists in the current index and the entry 
in the index has `skip-worktree` enabled".

* "if the path exists in the current index" - if it does not exist in the
  index, there's nothing to preserve.
* "if the entry in the index has `skip-worktree` enabled" - if it does not,
  it's already in the working tree so we don't need to checkout.

Then, `checkout_entry()` can then be run on the index entry found (rather
than a "fake" one created with `make_cache_entry`). This eliminates a lot of
unnecessary usage of `one` and `two`, which hopefully addresses some of your
concerns about them. After that, the index reset proceeds as normal (without
manual changes to the `skip-worktree` bit).

As for the issue of ignoring `skip-worktree`: all of this could be
conditioned on a "--ignore-skip-worktree-bits" flag (or something like it)
if you'd prefer the default behavior is "don't touch the working tree".

  reply	other threads:[~2021-10-01 14:55 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-30 14:50 [PATCH 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
2021-09-30 14:50 ` [PATCH 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
2021-09-30 18:34   ` Junio C Hamano
2021-10-01 14:55     ` Victoria Dye [this message]
2021-09-30 14:50 ` [PATCH 2/7] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
2021-09-30 19:17   ` Taylor Blau
2021-09-30 20:11     ` Victoria Dye
2021-09-30 21:32       ` Junio C Hamano
2021-09-30 22:59         ` Victoria Dye
2021-10-01  0:04           ` Junio C Hamano
2021-10-04 13:47             ` Victoria Dye
2021-10-01  9:14   ` Bagas Sanjaya
2021-09-30 14:50 ` [PATCH 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-09-30 14:50 ` [PATCH 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-09-30 14:50 ` [PATCH 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-09-30 14:51 ` [PATCH 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-10-01 15:03   ` Victoria Dye
2021-09-30 14:51 ` [PATCH 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-10-05 13:20 ` [PATCH v2 0/7] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
2021-10-05 13:20   ` [PATCH v2 1/7] reset: behave correctly with sparse-checkout Kevin Willford via GitGitGadget
2021-10-05 19:30     ` Junio C Hamano
2021-10-05 21:59       ` Victoria Dye
2021-10-06 12:44         ` Junio C Hamano
2021-10-06  1:46     ` Elijah Newren
2021-10-06 20:09       ` Victoria Dye
2021-10-06 10:31     ` Bagas Sanjaya
2021-10-05 13:20   ` [PATCH v2 2/7] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
2021-10-06  2:00     ` Elijah Newren
2021-10-06 20:40       ` Victoria Dye
2021-10-08  3:42         ` Elijah Newren
2021-10-08 17:11           ` Junio C Hamano
2021-10-06 10:33     ` Bagas Sanjaya
2021-10-05 13:20   ` [PATCH v2 3/7] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-10-06  2:04     ` Elijah Newren
2021-10-05 13:20   ` [PATCH v2 4/7] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-10-06  2:15     ` Elijah Newren
2021-10-06 17:48       ` Junio C Hamano
2021-10-05 13:20   ` [PATCH v2 5/7] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-10-06  3:43     ` Elijah Newren
2021-10-06 20:56       ` Victoria Dye
2021-10-06 10:34     ` Bagas Sanjaya
2021-10-05 13:20   ` [PATCH v2 6/7] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-10-06  4:43     ` Elijah Newren
2021-10-07 14:34       ` Victoria Dye
2021-10-05 13:20   ` [PATCH v2 7/7] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-10-06 10:37     ` Bagas Sanjaya
2021-10-05 15:34   ` [PATCH v2 0/7] Sparse Index: integrate with reset Ævar Arnfjörð Bjarmason
2021-10-05 20:44     ` Victoria Dye
2021-10-06  5:46   ` Elijah Newren
2021-10-07 21:15   ` [PATCH v3 0/8] " Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 2/8] reset: preserve skip-worktree bit in mixed reset Kevin Willford via GitGitGadget
2021-10-08  9:04       ` Junio C Hamano
2021-10-07 21:15     ` [PATCH v3 3/8] update-index: add --force-full-index option for expand/collapse test Victoria Dye via GitGitGadget
2021-10-08  2:50       ` Bagas Sanjaya
2021-10-08  5:24       ` Junio C Hamano
2021-10-08 15:47         ` Victoria Dye
2021-10-08 17:19           ` Junio C Hamano
2021-10-11 14:12             ` Derrick Stolee
2021-10-11 15:05               ` Victoria Dye
2021-10-11 15:24               ` Junio C Hamano
2021-10-07 21:15     ` [PATCH v3 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-10-07 21:15     ` [PATCH v3 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-10-08 11:09       ` Phillip Wood
2021-10-08 17:14         ` Victoria Dye
2021-10-08 18:31           ` Junio C Hamano
2021-10-09 11:18             ` Phillip Wood
2021-10-10 22:03               ` Junio C Hamano
2021-10-11 15:55                 ` Victoria Dye
2021-10-11 16:16                   ` Junio C Hamano
2021-10-12 10:16                 ` Phillip Wood
2021-10-12 19:15                   ` Junio C Hamano
2021-10-12 10:17           ` Phillip Wood
2021-10-07 21:15     ` [PATCH v3 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-11-20 22:02       ` Elijah Newren
2021-11-22 16:46         ` Victoria Dye
2021-10-07 21:15     ` [PATCH v3 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-10-11 20:30     ` [PATCH v4 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 2/8] reset: preserve skip-worktree bit in mixed reset Victoria Dye via GitGitGadget
2021-10-22  4:19         ` Emily Shaffer
2021-10-25 15:59           ` Victoria Dye
2021-10-11 20:30       ` [PATCH v4 3/8] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-10-11 20:30       ` [PATCH v4 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-10-27 14:39       ` [PATCH v5 0/8] Sparse Index: integrate with reset Victoria Dye via GitGitGadget
2021-10-27 14:39         ` [PATCH v5 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
2021-10-27 14:39         ` [PATCH v5 2/8] reset: preserve skip-worktree bit in mixed reset Victoria Dye via GitGitGadget
2021-10-27 14:39         ` [PATCH v5 3/8] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
2021-10-27 14:39         ` [PATCH v5 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-10-27 14:39         ` [PATCH v5 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-10-27 14:39         ` [PATCH v5 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-10-27 14:39         ` [PATCH v5 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-11-20 22:05           ` Elijah Newren
2021-11-22 16:54             ` Victoria Dye
2021-10-27 14:39         ` [PATCH v5 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-11-20 22:13         ` [PATCH v5 0/8] Sparse Index: integrate with reset Elijah Newren
2021-11-29 15:52         ` [PATCH v6 " Victoria Dye via GitGitGadget
2021-11-29 15:52           ` [PATCH v6 1/8] reset: rename is_missing to !is_in_reset_tree Victoria Dye via GitGitGadget
2021-11-29 15:52           ` [PATCH v6 2/8] reset: preserve skip-worktree bit in mixed reset Victoria Dye via GitGitGadget
2021-11-29 15:52           ` [PATCH v6 3/8] sparse-index: update command for expand/collapse test Victoria Dye via GitGitGadget
2021-11-29 15:52           ` [PATCH v6 4/8] reset: expand test coverage for sparse checkouts Victoria Dye via GitGitGadget
2021-11-29 15:52           ` [PATCH v6 5/8] reset: integrate with sparse index Victoria Dye via GitGitGadget
2021-11-29 15:52           ` [PATCH v6 6/8] reset: make sparse-aware (except --mixed) Victoria Dye via GitGitGadget
2021-11-29 15:52           ` [PATCH v6 7/8] reset: make --mixed sparse-aware Victoria Dye via GitGitGadget
2021-11-29 15:52           ` [PATCH v6 8/8] unpack-trees: improve performance of next_cache_entry Victoria Dye via GitGitGadget
2021-11-29 18:37           ` [PATCH v6 0/8] Sparse Index: integrate with reset Elijah Newren
2021-11-29 19:44             ` Victoria Dye
2021-11-30  3:59               ` Elijah Newren
2021-12-01 20:26               ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53263148-84b1-f76a-263d-afd00819e7a7@github.com \
    --to=vdye@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    --cc=kewillf@microsoft.com \
    --cc=newren@gmail.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).