git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Michael Haggerty" <mhagger@alum.mit.edu>,
	"Stefan Beller" <stefanbeller@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v2 0/7] gc: minor code cleanup + contention fixes
Date: Thu, 14 Mar 2019 13:34:32 +0100	[thread overview]
Message-ID: <20190314123439.4347-1-avarab@gmail.com> (raw)
In-Reply-To: <20190313235439.30439-1-avarab@gmail.com>

Addresse Peff's comments to v1. For a couple of patches I've faked up
his SOB where I copy/pasted a comment or code from a v1 comment
verbatim. Will see if he acks that.

The main change is a better commit message in the last patch (now
7/7), and 2x new "reflog" patches to make it early exit in addition to
"gc" when there's nothing to do.

There was the "why do it at all in gc" feedback on 6/7 in v1. I've
adjusted the commit message there to justify it, but we'll see what
Peff things about it this time around...

Ævar Arnfjörð Bjarmason (7):
  gc: remove redundant check for gc_auto_threshold
  gc: convert to using the_hash_algo
  gc: refactor a "call me once" pattern
  reflog tests: make use of "test_config" idiom
  reflog: exit early if there's no work to do
  gc: don't run "reflog expire" when keeping reflogs
  reflog expire: don't assert the OID when locking refs

 builtin/gc.c         | 37 +++++++++++++++++++++++++++++--------
 builtin/reflog.c     |  7 +++++++
 refs/files-backend.c |  2 +-
 t/t1410-reflog.sh    | 17 ++++++++---------
 t/t6500-gc.sh        | 19 +++++++++++++++++++
 5 files changed, 64 insertions(+), 18 deletions(-)

Range-diff:
1:  1635c7fb22 ! 1:  e18433f9c6 gc: convert to using the_hash_algo
    @@ -11,16 +11,26 @@
         the case for SHA-256, and reading between the lines in
         hash-function-transition.txt the format is planned to be the same.
     
    -    However, we may want to modify this code for the hash function
    -    transition. There's a potential pathological case here where we'll
    -    only consider the loose objects for the currently active hash, but
    -    objects for that hash will share a directory storage with the other
    -    hash.
    +    In the future we may want to further modify this code for the hash
    +    function transition. There's a potential pathological case here where
    +    we'll only consider the loose objects for the currently active hash,
    +    but objects for that hash will share a directory storage with the
    +    other hash.
     
         Thus we could theoretically have 1k SHA-1 loose objects, and say 1
         million SHA-256 objects, and not notice because we're currently using
         SHA-1.
     
    +    So assuming that "gc" eventually learns to pack up both SHA-1 and
    +    SHA-256 objects regardless of what the current the_hash_alg is perhaps
    +    this check should be changed to consider all files in objects/17/
    +    matching [0-9a-f] 38 or 62 characters in length (i.e. both SHA-1 and
    +    SHA-256).
    +
    +    But none of that is something we need to worry about now, and
    +    supporting both 38 and 62 characters depending on "the_hash_algo"
    +    removes another case of SHA-1 hardcoding.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      diff --git a/builtin/gc.c b/builtin/gc.c
    @@ -30,8 +40,7 @@
      	int auto_threshold;
      	int num_loose = 0;
      	int needed = 0;
    -+	const unsigned hexsz = the_hash_algo->hexsz;
    -+	const unsigned hexsz_loose = hexsz - 2;
    ++	const unsigned hexsz_loose = the_hash_algo->hexsz - 2;
      
      	dir = opendir(git_path("objects/17"));
      	if (!dir)
2:  ced972826d ! 2:  54e4bce91c gc: refactor a "call me once" pattern
    @@ -10,6 +10,7 @@
         "prune_reflogs" variables, so let's not leave the reader wondering if
         they're being zero'd out for later use somewhere else.
     
    +    Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      diff --git a/builtin/gc.c b/builtin/gc.c
    @@ -19,6 +20,11 @@
      
      static void gc_before_repack(void)
      {
    ++	/*
    ++	 * We may be called twice, as both the pre- and
    ++	 * post-daemonized phases will call us, but running these
    ++	 * commands more than once is pointless and wasteful.
    ++	 */
     +	static int done = 0;
     +	if (done++)
     +		return;
-:  ---------- > 3:  82f87db134 reflog tests: make use of "test_config" idiom
-:  ---------- > 4:  c79608dbbb reflog: exit early if there's no work to do
3:  599772f2bd ! 5:  c47dedab58 gc: don't run "reflog expire" when keeping reflogs
    @@ -5,10 +5,18 @@
         Don't redundantly run "git reflog expire --all" when gc.reflogExpire
         and gc.reflogExpireUnreachable are set to "never".
     
    -    I'm being careful to not repeat the issue fixed in
    -    8ab5aa4bd8 ("parseopt: handle malformed --expire arguments more
    -    nicely", 2018-04-21). We'll die early if the config variables are set
    -    to invalid values.
    +    An earlier change taught "git reflog expire" itself to exit early
    +    under this scenario, so in some sense this isn't strictly
    +    necessary. Reasons to also do it here:
    +
    +     1) Similar to 8ab5aa4bd8 ("parseopt: handle malformed --expire
    +        arguments more nicely", 2018-04-21). We'll die early if the config
    +        variables are set to invalid values. We run "pack-refs" before
    +        "reflog expire", which can take a while, only to then die on an
    +        invalid gc.reflogExpire{Unreachable,} configuration.
    +
    +     2) Not invoking the command at all means it won't show up in trace
    +        output, which makes what's going on more obvious.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
4:  c127265828 ! 6:  55dd203a04 reflog expire: don't assert the OID when locking refs
    @@ -2,35 +2,39 @@
     
         reflog expire: don't assert the OID when locking refs
     
    -    The locking protocol for reflogs involves getting a *.lock file on the
    -    loose ref[1] (even if the actual ref is packed). This isn't needed to
    -    expire the reflog, and needlessly results promotes reference update
    -    contention to hard errors in e.g. "gc".
    -
         During reflog expiry, the cmd_reflog_expire() function first iterates
    -    over all known references, and then one-by-one acquires the lock for
    -    each one to expire its reflog. By the time it gets around to
    -    re-visiting the references some of the OIDs may have changed.
    +    over all reflogs in logs/*, and then one-by-one acquires the lock for
    +    each one to expire its reflog by getting a *.lock file on the
    +    corresponding loose ref[1] (even if the actual ref is packed).
    +
    +    This lock is needed, but what isn't needed is locking the loose ref as
    +    a function of the OID we found from that first iteration. By the time
    +    we get around to re-visiting the reference some of the OIDs may have
    +    changed.
     
    -    This has been the case ever since "reflog expire" was initially
    -    implemented in 4264dc15e1 ("git reflog expire", 2006-12-19). As seen
    -    in that simpler initial version of the code (and the same is the case
    -    before this change) we subsequently use the OID to inform the expiry,
    -    but never needed to use it to lock the reference associated with the
    -    reflog.
    +    Thus the verify_lock() function called by the lock_ref_oid_basic()
    +    function being changed here would fail with e.g. "ref '%s' is at %s
    +    but expected %s" if the repository was being updated concurrent to the
    +    "reflog expire".
     
    -    Thus the verify_lock() function would fail with e.g. "ref '%s' is at
    -    %s but expected %s" if the repository was being updated concurrent to
    -    the "reflog expire".
    +    By not passing the OID to it we'll try to lock the reference
    +    regardless of it last known OID. Locking as a function of the OID
    +    would make "reflog expire" exit with a non-zero exit status under such
    +    contention, which in turn meant that a "gc" command (which expires
    +    reflogs before forking to the background) would encounter a hard
    +    error.
     
    -    This made "reflog expire" exit with a non-zero exit status, which in
    -    turn meant that a "gc" command (which expires reflogs before forking
    -    to the background) would encounter a hard error in such a scenario.
    +    This behavior of considering the OID when locking has been here ever
    +    since "reflog expire" was initially implemented in 4264dc15e1 ("git
    +    reflog expire", 2006-12-19). As seen in that simpler initial version
    +    of the code we subsequently use the OID to inform the expiry (and
    +    still do), but never needed to use it to lock the reference associated
    +    with the reflog.
     
    -    If we instead lock the reference without considering what OID we last
    -    saw it at, we won't encounter user-visible contention to the extent
    -    that core.filesRefLockTimeout mitigates it. See 4ff0f01cb7 ("refs:
    -    retry acquiring reference locks for 100ms", 2017-08-21).
    +    By locking the reference without considering what OID we last saw it
    +    at, we won't encounter user-visible contention to the extent that
    +    core.filesRefLockTimeout mitigates it. See 4ff0f01cb7 ("refs: retry
    +    acquiring reference locks for 100ms", 2017-08-21).
     
         Unfortunately this sort of probabilistic contention is hard to turn
         into a test. I've tested this by running the following three subshells
-- 
2.21.0.360.g471c308f928


  reply	other threads:[~2019-03-14 12:34 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-12 23:28 BUG: Race condition due to reflog expiry in "gc" Ævar Arnfjörð Bjarmason
2019-03-13  1:43 ` Junio C Hamano
2019-03-13 10:28   ` Ævar Arnfjörð Bjarmason
2019-03-13 16:02     ` Jeff King
2019-03-13 16:22       ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54         ` [PATCH 0/5] gc: minor code cleanup + contention fixes Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` Ævar Arnfjörð Bjarmason [this message]
2019-03-15 15:59             ` [PATCH v3 0/8] " Ævar Arnfjörð Bjarmason
2019-03-18  6:13               ` Junio C Hamano
2019-03-28 16:14               ` [PATCH v4 0/7] gc: tests and handle reflog expire config Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 5/7] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 6/7] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 7/7] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 1/8] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 2/8] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 3/8] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 4/8] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 5/8] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 6/8] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-19  6:20               ` Jeff King
2019-03-15 15:59             ` [PATCH v3 7/8] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
     [not found]               ` <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>
2019-03-21 14:10                 ` Ævar Arnfjörð Bjarmason
2019-03-19  6:27             ` [PATCH v2 0/7] gc: minor code cleanup + contention fixes Jeff King
2019-03-14 12:34           ` [PATCH v2 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15  9:51             ` Duy Nguyen
2019-03-15 10:42               ` Ævar Arnfjörð Bjarmason
2019-03-15 15:49                 ` Johannes Schindelin
2019-03-14 12:34           ` [PATCH v2 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 5/7] reflog: exit early if there's no work to do Ævar Arnfjörð Bjarmason
2019-03-15 10:01             ` Duy Nguyen
2019-03-15 15:51               ` Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 6/7] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-15 10:05             ` Duy Nguyen
2019-03-15 10:24               ` Ævar Arnfjörð Bjarmason
2019-03-15 10:32                 ` Duy Nguyen
2019-03-15 15:51                 ` Johannes Schindelin
2019-03-18  6:04                 ` Junio C Hamano
2019-03-14 12:34           ` [PATCH v2 7/7] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-15 11:10             ` Duy Nguyen
2019-03-15 15:52               ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54         ` [PATCH 1/5] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14  0:25           ` Jeff King
2019-03-13 23:54         ` [PATCH 2/5] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-14  0:26           ` Jeff King
2019-03-13 23:54         ` [PATCH 3/5] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14  0:30           ` Jeff King
2019-03-13 23:54         ` [PATCH 4/5] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-14  0:40           ` Jeff King
2019-03-14  4:51             ` Junio C Hamano
2019-03-14 19:26               ` Jeff King
2019-03-13 23:54         ` [PATCH 5/5] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-14  0:44           ` Jeff King
2019-03-14  0:25         ` BUG: Race condition due to reflog expiry in "gc" Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190314123439.4347-1-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=stefanbeller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).