git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Stefan Beller" <stefanbeller@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs
Date: Thu, 21 Mar 2019 15:10:08 +0100	[thread overview]
Message-ID: <87pnqkco8v.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>


On Tue, Mar 19 2019, Michael Haggerty wrote:

> Thanks for your work and for your thorough explanation of the change!

Hi. Yes, thanks a lot for the feedback. Just hadn't gotten around to
looping back to this yet & digging into the issue you raised.

> On 3/15/19 4:59 PM, Ævar Arnfjörð Bjarmason wrote:
>> During reflog expiry, the cmd_reflog_expire() function first iterates
>> over all reflogs in logs/*, and then one-by-one acquires the lock for
>> each one to expire its reflog by getting a *.lock file on the
>> corresponding loose ref[1] (even if the actual ref is packed).
>>
>> This lock is needed, but what isn't needed is locking the loose ref as
>> a function of the OID we found from that first iteration. By the time
>> we get around to re-visiting the reference some of the OIDs may have
>> changed.
>
> Instead of "what isn't needed is locking the loose ref as a function of
> the OID we found from that first iteration", I suggest "what isn't
> needed is to insist that the reference still has the OID that we found
> in that first iteration".
>
>> Thus the verify_lock() function called by the lock_ref_oid_basic()
>> function being changed here would fail with e.g. "ref '%s' is at %s
>> but expected %s" if the repository was being updated concurrent to the
>> "reflog expire".
>>
>> By not passing the OID to it we'll try to lock the reference
>> regardless of it last known OID. Locking as a function of the OID
>
> s/it/its/
>
>> would make "reflog expire" exit with a non-zero exit status under such
>> contention, which in turn meant that a "gc" command (which expires
>> reflogs before forking to the background) would encounter a hard
>> error.
>
> The last sentence seems mostly redundant with the previous paragraph.
>
>> This behavior of considering the OID when locking has been here ever
>> since "reflog expire" was initially implemented in 4264dc15e1 ("git
>> reflog expire", 2006-12-19). As seen in that simpler initial version
>> of the code we subsequently use the OID to inform the expiry (and
>> still do), but never needed to use it to lock the reference associated
>> with the reflog.
>>
>> By locking the reference without considering what OID we last saw it
>> at, we won't encounter user-visible contention to the extent that
>> core.filesRefLockTimeout mitigates it. See 4ff0f01cb7 ("refs: retry
>> acquiring reference locks for 100ms", 2017-08-21).
>>
>> Unfortunately this sort of probabilistic contention is hard to turn
>> into a test. I've tested this by running the following three subshells
>> in concurrent terminals:
>>
>>     (
>>         cd /tmp &&
>>         rm -rf git &&
>>         git init git &&
>>         cd git &&
>>         while true
>>         do
>>             head -c 10 /dev/urandom | hexdump >out &&
>>             git add out &&
>>             git commit -m"out"
>>         done
>>     )
>>
>>     (
>>         cd /tmp &&
>>         rm -rf git-clone &&
>>         git clone file:///tmp/git git-clone &&
>>         cd git-clone &&
>>         while git pull
>>         do
>>             date
>>         done
>>     )
>>
>>     (
>>         cd /tmp/git-clone &&
>>         while git reflog expire --all
>>         do
>>             date
>>         done
>>     )
>>
>> Before this change the "reflog expire" would fail really quickly with
>> a "but expected" error. After this change both the "pull" and "reflog
>> expire" will run for a while, but eventually fail because I get
>> unlucky with core.filesRefLockTimeout (the "reflog expire" is in a
>> really tight loop). That can be resolved by being more generous with
>> higher values of core.filesRefLockTimeout than the 100ms default.
>>
>> As noted in the commentary being added here we also need to handle the
>> case of references being racily deleted, that can be tested by adding
>> this to the above:
>>
>>     (
>>         cd /tmp/git-clone &&
>>         while git branch topic master && git branch -D topic
>>         do
>>             date
>>         done
>>     )
>>
>> We could change lock_ref_oid_basic() to always pass down
>> RESOLVE_REF_READING to refs_resolve_ref_unsafe() and then
>> files_reflog_expire() to detect the "is it deleted?" state. But let's
>> not bother, in the event of such a race we're going to redundantly
>> create a lock on the deleted reference, and shortly afterwards handle
>> that case and others with the refs_reflog_exists() check.
>>
>> 1. https://public-inbox.org/git/54857871.5090805@alum.mit.edu/
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>  refs/files-backend.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index ef053f716c3..c7ed1792b3b 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -3036,8 +3036,14 @@ static int files_reflog_expire(struct ref_store *ref_store,
>>  	 * The reflog file is locked by holding the lock on the
>>  	 * reference itself, plus we might need to update the
>>  	 * reference if --updateref was specified:
>> +	 *
>> +	 * We don't pass down the oid here because we'd like to be
>> +	 * tolerant to the OID of the ref having changed, and to
>> +	 * gracefully handle the case where it's been deleted (see oid
>> +	 * -> mustexist -> RESOLVE_REF_READING in
>> +	 * lock_ref_oid_basic()) ...
>>  	 */
>> -	lock = lock_ref_oid_basic(refs, refname, oid,
>> +	lock = lock_ref_oid_basic(refs, refname, NULL,
>>  				  NULL, NULL, REF_NO_DEREF,
>>  				  &type, &err);
>
> This seems totally reasonable. But then later, where `oid` is passed to
> `(*prepare_fn)()`, I think you must pass `&(lock->old_oid)` instead,
> since we no longer have a guarantee that `oid` reflects the correct
> state of the reference. And after that, there is no need for this
> function to take an `oid` parameter at all (which also makes sense from
> an abstract point of view). Which means that the signatures of
> `refs_reflog_expire()`, `reflog_expire()`, `packed_reflog_expire()`, and
> `reflog_expire_fn` can also be changed, along with callers.
>
> I haven't had time yet to inspect those callers to see whether they
> might actually care that the `oid` that they used to pass to
> `reflog_expire()` isn't necessarily the one that gets passed back to
> their callbacks, but following the trail that I just outlined should
> make it possible to determine that.
>
>>  	if (!lock) {
>> @@ -3045,6 +3051,13 @@ static int files_reflog_expire(struct ref_store *ref_store,
>>  		strbuf_release(&err);
>>  		return -1;
>>  	}
>> +	/*
>> +	 * When refs are deleted their reflog is deleted before the
>> +	 * loose ref is deleted. This catches that case, i.e. when
>> +	 * racing against a ref deletion lock_ref_oid_basic() will
>> +	 * have acquired a lock on the now-deleted ref, but here's
>> +	 * where we find out it has no reflog anymore.
>> +	 */
>>  	if (!refs_reflog_exists(ref_store, refname)) {
>>  		unlock_ref(lock);
>>  		return 0;
>>
>
> Cheers,
> Michael

  parent reply	other threads:[~2019-03-21 14:10 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-12 23:28 BUG: Race condition due to reflog expiry in "gc" Ævar Arnfjörð Bjarmason
2019-03-13  1:43 ` Junio C Hamano
2019-03-13 10:28   ` Ævar Arnfjörð Bjarmason
2019-03-13 16:02     ` Jeff King
2019-03-13 16:22       ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54         ` [PATCH 0/5] gc: minor code cleanup + contention fixes Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 0/7] " Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 0/8] " Ævar Arnfjörð Bjarmason
2019-03-18  6:13               ` Junio C Hamano
2019-03-28 16:14               ` [PATCH v4 0/7] gc: tests and handle reflog expire config Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 5/7] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 6/7] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 7/7] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 1/8] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 2/8] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 3/8] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 4/8] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 5/8] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 6/8] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-19  6:20               ` Jeff King
2019-03-15 15:59             ` [PATCH v3 7/8] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
     [not found]               ` <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>
2019-03-21 14:10                 ` Ævar Arnfjörð Bjarmason [this message]
2019-03-19  6:27             ` [PATCH v2 0/7] gc: minor code cleanup + contention fixes Jeff King
2019-03-14 12:34           ` [PATCH v2 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15  9:51             ` Duy Nguyen
2019-03-15 10:42               ` Ævar Arnfjörð Bjarmason
2019-03-15 15:49                 ` Johannes Schindelin
2019-03-14 12:34           ` [PATCH v2 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 5/7] reflog: exit early if there's no work to do Ævar Arnfjörð Bjarmason
2019-03-15 10:01             ` Duy Nguyen
2019-03-15 15:51               ` Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 6/7] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-15 10:05             ` Duy Nguyen
2019-03-15 10:24               ` Ævar Arnfjörð Bjarmason
2019-03-15 10:32                 ` Duy Nguyen
2019-03-15 15:51                 ` Johannes Schindelin
2019-03-18  6:04                 ` Junio C Hamano
2019-03-14 12:34           ` [PATCH v2 7/7] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-15 11:10             ` Duy Nguyen
2019-03-15 15:52               ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54         ` [PATCH 1/5] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14  0:25           ` Jeff King
2019-03-13 23:54         ` [PATCH 2/5] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-14  0:26           ` Jeff King
2019-03-13 23:54         ` [PATCH 3/5] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14  0:30           ` Jeff King
2019-03-13 23:54         ` [PATCH 4/5] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-14  0:40           ` Jeff King
2019-03-14  4:51             ` Junio C Hamano
2019-03-14 19:26               ` Jeff King
2019-03-13 23:54         ` [PATCH 5/5] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-14  0:44           ` Jeff King
2019-03-14  0:25         ` BUG: Race condition due to reflog expiry in "gc" Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pnqkco8v.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=stefanbeller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).