git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marc Strapetz <marc.strapetz@syntevo.com>
To: Junio C Hamano <gitster@pobox.com>,
	Marc Strapetz via GitGitGadget <gitgitgadget@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] update-index: refresh should rewrite index in case of racy timestamps
Date: Thu, 23 Dec 2021 19:24:32 +0100	[thread overview]
Message-ID: <b97672fa-837f-1e28-f7f2-aee80e52d374@syntevo.com> (raw)
In-Reply-To: <xmqqfsqkdwo4.fsf@gitster.g>

On 23/12/2021 00:52, Junio C Hamano wrote:
> Ah, there are cases where we do clear active_cache_changed when we
> notice that an operation detected an error, to avoid spreading the
> breakage by writing the index file out, and I think that is the
> right thing to do.  Which means that the above patch is not quite
> right.  Perhaps taking all of the above together, something like
> this?
> 
> 	*o->has_errors |= refresh_cache(o->flags | flag);
> 	if (*o->has_errors)
> 		active_cache_changed = 0;
> 	else if (has_racy_timestamps(&the_index))
>          	/*
> 		 * Even if nothing else has changed, updating the file
> 		 * increases the chance that racy timestamps become
> 		 * non-racy, helping future run-time performance.
> 		 */
> 		active_cache_changed |= SOMETHING_CHANGED;

I think it's safe to write the index even if refresh_cache() reports an 
"error" and we should actually do that:

The underlying refresh_index() will report an "error" only for "file: 
needs merge" and "file: needs update". In both cases, the corresponding 
entries will not have been updated. Every entry which has been updated 
is good on its own and writing these updates makes the index a little 
bit better. Subsequent calls to refresh_index() won't have to do the 
same work again (like invoking the quite expensive LFS filter).

This is also how cmd_status() currently works: it does not pay attention 
to the return value of refresh_index() and will always write the index 
if racy timestamps are encountered.

Overall, the "error" handling in update-index.c might not always do what 
one expects. Let's consider your suggested fix. When invoking:

update-index --refresh

this won't fix racy timestamps, however:

update-index --refresh --add untracked

will do. I think this is caused by active_cache_changed being used in 
two different ways: to indicate that the cache should be written and to 
indicate that it must not be written. It might be a good idea to take 
the latter "block index write" to a separate static variable in 
update-index.c.

Candidate usages of this new "block index write" variable will be in the 
existing callbacks: errors detected in unresolve_callback() should 
probably continue to block an index write, to ensure that either all or 
none of the specified files will be unresolved. For the 
reupdate_callback(), the underlying do_reupdate() seems to return 0 
always, so there is dead code in the callback (or am I completely 
blind?). To stay on the safe side, we may still continue to block an 
index write here. The refresh_callback() will never block an index write.

Does it make sense to clarify error handling in some preceding commit 
and only then address the razy timestamps? It will probably make this 
second commit clearer.

>> +}
>> +
>> +update_assert_changed() {
>> +	local ts1=$(test-tool chmtime --get .git/index) &&
>> +	test_might_fail git update-index $1 &&
>> +	local ts2=$(test-tool chmtime --get .git/index) &&
>> +	[ $ts1 -ne $ts2 ]
>> +}
>> +
>> +test_expect_success 'setup' '
>> +	touch .git/fs-tstamp &&
> 
> Not that it is wrong, but do we need to create such a throw-away
> file inside the .git directory?

We actually only need a timestamp for which we know that it is before 
the timestamp the next file system operation would create. I agree that 
it should be easy to rewrite that using "test-tool chmtime". This should 
also simplify reset_mtime().

Regarding all other comments, thanks, I'll address them as suggested for 
the next patch. And sorry for not checking CodingGuidelines before (I 
had completely missed this document).

-Marc

  reply	other threads:[~2021-12-23 18:24 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-22 13:56 [PATCH] update-index: refresh should rewrite index in case of racy timestamps Marc Strapetz via GitGitGadget
2021-12-22 23:52 ` Junio C Hamano
2021-12-23 18:24   ` Marc Strapetz [this message]
2022-01-05 13:15 ` [PATCH v2 0/2] " Marc Strapetz via GitGitGadget
2022-01-05 13:15   ` [PATCH v2 1/2] t7508: add tests capturing racy timestamp handling Marc Strapetz via GitGitGadget
2022-01-05 20:59     ` Junio C Hamano
2022-01-06 10:21       ` Marc Strapetz
2022-01-05 13:15   ` [PATCH v2 2/2] update-index: refresh should rewrite index in case of racy timestamps Marc Strapetz via GitGitGadget
2022-01-05 21:03     ` Junio C Hamano
2022-01-06 22:34   ` [PATCH v3 0/4] " Marc Strapetz via GitGitGadget
2022-01-06 22:34     ` [PATCH v3 1/4] test-lib: introduce API for verifying file mtime Marc Strapetz via GitGitGadget
2022-01-06 23:55       ` Junio C Hamano
2022-01-06 22:34     ` [PATCH v3 2/4] t7508: fix bogus mtime verification Marc Strapetz via GitGitGadget
2022-01-06 22:34     ` [PATCH v3 3/4] t7508: add tests capturing racy timestamp handling Marc Strapetz via GitGitGadget
2022-01-06 22:34     ` [PATCH v3 4/4] update-index: refresh should rewrite index in case of racy timestamps Marc Strapetz via GitGitGadget
2022-01-07 11:17     ` [PATCH v4 0/4] " Marc Strapetz via GitGitGadget
2022-01-07 11:17       ` [PATCH v4 1/4] test-lib: introduce API for verifying file mtime Marc Strapetz via GitGitGadget
2022-01-07 11:17       ` [PATCH v4 2/4] t7508: fix bogus mtime verification Marc Strapetz via GitGitGadget
2022-01-07 11:17       ` [PATCH v4 3/4] t7508: add tests capturing racy timestamp handling Marc Strapetz via GitGitGadget
2022-01-07 11:17       ` [PATCH v4 4/4] update-index: refresh should rewrite index in case of racy timestamps Marc Strapetz via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b97672fa-837f-1e28-f7f2-aee80e52d374@syntevo.com \
    --to=marc.strapetz@syntevo.com \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).