All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Michael Heemskerk <mheemskerk@atlassian.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Git Users" <git@vger.kernel.org>
Subject: Re: reference-transaction regression in 2.36.0-rc1
Date: Mon, 2 May 2022 13:12:08 +0200	[thread overview]
Message-ID: <Ym+8iCCy5TlfmObq@ncase> (raw)
In-Reply-To: <CAJDSCnM767fdo6qy085jc9sqezvbBqDD4ikXz1y5tHEjSYED2A@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3792 bytes --]

On Mon, May 02, 2022 at 11:41:26AM +0200, Michael Heemskerk wrote:
> > > I'd be happy to provide the fix to files_delete_refs in a patch
> > (including
> > > some extra tests) on top of Patrick's
> > > avoid-unnecessary-hook-invocation-with-packed-refs series if and
> > > when that series is unreverted on 'next'.
> >
> > That would be great. To be clear, do the fixes you have also fix the
> > pre-v2.36.0 issues Bryan has mentioned?
> >
> 
> That is correct. The pre-v2.36.0 issues are caused by files_delete_refs
> first
> preparing and committing a transaction for the packed_ref_store, and then
> iterating over each of the to-be-deleted refs and preparing and committing
> another transaction per ref. This latter transaction triggers the usual
> ABORTED (from packed_ref_store), PREPARED (ref_store),
> COMMITTED (ref_store) callbacks.
> 
> As far as I can tell, all we need to do is remove the separate transaction
> for
> the packed_ref_store.

Do you mean that we should just get rid of the transaction and instead
delete the refs in the packet-refs backend one by one? If so, then I
think that would be a performance regressions in a lot of places. If you
are deleting a bunch of references at the same time, then you do indeed
want to make this a single packed-refs transaction or otherwise we'd
repeatedly rewrite the whole file. And given that this file can easily
range in the hundreds of megabytes this can easily go into pathological
cases.

I think what we need to do instead is a bit more complicated:

    1. Lock the packed-refs backend. This is to ensure that it's not
       being updated while we update loose refs.

    2. Create a transaction for the loose-refs and queue all refs that
       we are about to delete. Now we're safe to prepate the loose refs,
       and at this point in time nothing has been pruned yet. As a
       result, it is still possible for the reference-transaction hook
       to intervene even if the refs only existed as packed-refs file.

    3. Now we have to prepare and commit the packed-refs file. This is
       to avoid a race where deleting loose refs would uncover anything
       that existed in the packed-refs file already.

    4. Unlock the packed-refs backend.

    5. Commit the loose-refs transaction we have prepared already.

This should be race-free and fix the usecase for aborting ref deletions
via the reference-transaction hook. It has the downside though that
packed-refs are locked for far longer than they had been before because
the lock also spans over preparation of the loose-refs transaction.

> Another thing we need to decide is whether or not to backport the fix to
> 2.36 and possibly older. My suggestion would be to only apply the fix to
> 'next' and NOT backport it to older git versions as changing the
> reference-transaction semantics in a patch release would be unexpected.

The way it sounds like this has been a longer-standing issue.
Furthermore, it's not a regression: it just hasn't ever been working. So
I don't feel like it's critical to backport this.

> > Please let me know whether you want to pursue this and whether you need
> > any further help with it.
> >
> 
> I'm happy to provide the patch, but assume that I need to wait for Patrick's
> series to be reapplied to 'next'? Alternatively, I can send the patch to
> Patrick
> to be included in a reroll of the series.
> 
> Michael

Given that you said that my patch series made the preexisting problem
worse I'd prefer to first land a proper fix for this issue, and only
then reapply my own patches on top. I'm also happy to fix the issue
myself -- at GitLab, we also care about this working correctly. Just let
me know your preference.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

      parent reply	other threads:[~2022-05-02 11:12 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-12  9:20 reference-transaction regression in 2.36.0-rc1 Bryan Turner
2022-04-12 20:53 ` Bryan Turner
2022-04-13 14:34   ` Ævar Arnfjörð Bjarmason
2022-04-13 16:21     ` René Scharfe
2022-04-13 19:52     ` Junio C Hamano
2022-04-13 22:44       ` Junio C Hamano
2022-04-13 22:55         ` Bryan Turner
2022-04-13 22:56         ` Junio C Hamano
2022-04-13 23:31           ` Junio C Hamano
2022-04-15  2:37       ` Bryan Turner
2022-04-15 13:27         ` Ævar Arnfjörð Bjarmason
2022-04-15 16:53           ` Junio C Hamano
     [not found]             ` <CAJDSCnOUQm-doY-xTobPk64oeiSsTd+vSFzsb_cz9BLORAxXGA@mail.gmail.com>
2022-04-27 11:05               ` Patrick Steinhardt
     [not found]                 ` <CAJDSCnM767fdo6qy085jc9sqezvbBqDD4ikXz1y5tHEjSYED2A@mail.gmail.com>
2022-05-02 11:12                   ` Patrick Steinhardt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Ym+8iCCy5TlfmObq@ncase \
    --to=ps@pks.im \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mheemskerk@atlassian.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.