From: Michael Heemskerk <mheemskerk@atlassian.com>
To: Jiang Xin <worldhello.net@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>, Git List <git@vger.kernel.org>,
	Jiang Xin <zhiyou.jx@alibaba-inc.com>
Subject: Re: [PATCH 9/9] refs: reimplement refs_delete_refs() and run hook once
Date: Tue, 2 Aug 2022 14:42:01 +0200
Message-ID: <CAJDSCnMHHdYGeyXKj=ztUKBv2vRTn5BEXUR_7fAfATJxn_uwww@mail.gmail.com>
In-Reply-To: <20220729101245.6469-10-worldhello.net@gmail.com>

Let me re-share some questions/suggestions/objections I got on a patch I
shared with similar changes:
https://lore.kernel.org/git/pull.1228.git.1651676435634.gitgitgadget@gmail.com/

There's a lot to like about the change; it fixes the incorrect invocation of
the reference-transaction hooks when (bulk) deleting refs, but there is a
downside that Patrick pointed out. We never reached a satisfactory solution,
so let me re-share his feedback to pick up the discussion.

Patrick:
> I really like these changes given that they simplify things, but I
> wonder whether we can do them. In the preimage we're eagerly removing
> loose refs: any error encountered when deleting a reference is recorded,
> but we keep on trying to remove the other refs, as well. With the new
> behaviour we now create a single transaction for all refs and try to
> commit it. This also means that we'll abort the transaction when locking
> any of the refs fails, which is a change in behaviour.
>
> The current behaviour is explicitly documented in `refs.h:refs_delete_refs()`:
>
>     /*
>      * Delete the specified references. If there are any problems, emit
>      * errors but attempt to keep going (i.e., the deletes are not done in
>      * an all-or-nothing transaction). msg and flags are passed through to
>      * ref_transaction_delete().
>      */
>     int refs_delete_refs(struct ref_store *refs, const char *msg,
>                          struct string_list *refnames, unsigned int flags);
>
> There are multiple callsites of this function via `delete_refs()`. Now
> honestly, most of these callsites look somewhat broken:
>
>     - `bisect.c` simply does its best to clean up bisect state. This
>       usecase looks fine to me.
>
>     - `builtin/branch.c` reports the branches as deleted even if
>       `delete_refs()` failed.
>
>     - `builtin/remote.c` also misreports the deleted branches for the
>       `prune` verb. The `rm` verb looks alright: if deletion of any
>       branch failed then it doesn't prune the remote's config in the end
>       and reports an error.
>
>     - `builtin/fetch.c` also misreports deleted branches with `--prune`.
>
> So most of these commands incorrectly handle the case where only a
> subset of branches has been deleted. This raises the question whether
> the interface provided by `refs_delete_refs()` is actually sensible if
> it's so easy to get wrong. It doesn't even report which branches could
> be removed and which couldn't. Furthermore, the question is whether new
> backends like the reftable backend which write all refs into a single
> slice would actually even be in a position to efficiently retain
> semantics of this function.
>
> I'm torn. There are valid usecases for eagerly deleting refs even if a
> subset of deletions failed, making this change a tough sell, but most of
> the callsites don't actually handle this correctly in the first place.

At the time, the only solution I could see was to switch to
transaction-per-ref semantics, but this results in bad performance when
deleting tens of thousands of refs.

One option might be to optimistically try to delete the refs in a single
transaction. If that fails for whatever reason and multiple ref deletions are
requested, we could fall back to a transaction-per-ref approach. That'd keep
the common case fast, and still provide best-effort deletes.
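
To make the idea concrete, here is a rough sketch of the control flow against
a toy in-memory store. None of these names (`toy_store`, `toy_find()`,
`toy_delete_atomic()`, `toy_delete_refs()`) exist in git; they are made up for
illustration. A real version would presumably go through
`ref_store_transaction_begin()`, `ref_transaction_delete()` and
`ref_transaction_commit()` instead, and the sketch does not model the
reference-transaction hook at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a ref store; not git's API. */
struct toy_ref {
	const char *name;
	bool exists;
	bool locked;	/* simulates a ref whose lock cannot be acquired */
};

struct toy_store {
	struct toy_ref *refs;
	size_t nr;
};

static struct toy_ref *toy_find(struct toy_store *store, const char *name)
{
	for (size_t i = 0; i < store->nr; i++)
		if (!strcmp(store->refs[i].name, name))
			return &store->refs[i];
	return NULL;
}

/*
 * All-or-nothing delete: returns 0 after deleting every named ref, or
 * -1 without deleting anything if any of them cannot be locked.
 */
static int toy_delete_atomic(struct toy_store *store,
			     const char **names, size_t nr)
{
	assert(store && names);

	/* "Prepare": every ref must exist and be lockable. */
	for (size_t i = 0; i < nr; i++) {
		struct toy_ref *ref = toy_find(store, names[i]);
		if (!ref || !ref->exists || ref->locked)
			return -1;
	}
	/* "Commit": nothing can fail any more. */
	for (size_t i = 0; i < nr; i++)
		toy_find(store, names[i])->exists = false;
	return 0;
}

/*
 * Fast path: one transaction for all refs. Fallback: one transaction
 * per ref, so that every deletable ref still gets deleted.
 */
static int toy_delete_refs(struct toy_store *store,
			   const char **names, size_t nr)
{
	int result = 0;

	if (!nr)
		return 0;
	if (!toy_delete_atomic(store, names, nr))
		return 0;
	if (nr == 1)
		return -1;	/* nothing to fall back to */

	for (size_t i = 0; i < nr; i++)
		if (toy_delete_atomic(store, &names[i], 1))
			result = -1;
	return result;
}
```

With three refs of which the second is locked, the single transaction fails,
the fallback deletes the first and third, and the function returns -1. The
price is that a failing bulk delete costs one extra attempt on top of the
per-ref pass; the all-succeed case stays at a single transaction.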

Thoughts?

Cheers,
Michael Heemskerk

On Fri, Jul 29, 2022 at 12:13 PM Jiang Xin <worldhello.net@gmail.com> wrote:
>
> From: Jiang Xin <zhiyou.jx@alibaba-inc.com>
>
> When deleting references using "git branch -d" or "git tag -d", there will
> be duplicate calls of "reference-transaction committed" for the same refs.
> This is because "refs_delete_refs()" is called twice, once for the
> files-backend and once for the packed-backend, and we used to reinvent the
> wheel in "files_delete_refs()" and "packed_delete_refs()". By removing
> "packed_delete_refs()" and reimplementing "files_delete_refs()", the
> "reference-transaction" hook will run only once for deleted branches and
> tags.
>
> The behavior of the following git commands and the last two testcases
> have been fixed in t1416:
>
>  * git branch -d <branch>
>  * git tag -d <tag>
>
> A testcase in t5510 is broken because we used to call the function
> "packed_refs_lock()", but it is not necessary if the deleted reference
> is not in the "packed-refs" file.
>
> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
> ---
>  refs/files-backend.c             | 21 ++++++-------
>  refs/packed-backend.c            | 51 +-------------------------------
>  t/t1416-ref-transaction-hooks.sh |  4 +--
>  t/t5510-fetch.sh                 | 17 +++++++++++
>  4 files changed, 29 insertions(+), 64 deletions(-)
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 8baea66e58..21426efaae 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1268,31 +1268,27 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
>  static int files_delete_refs(struct ref_store *ref_store, const char *msg,
>                              struct string_list *refnames, unsigned int flags)
>  {
> -       struct files_ref_store *refs =
> -               files_downcast(ref_store, REF_STORE_WRITE, "delete_refs");
> +       struct ref_transaction *transaction;
>         struct strbuf err = STRBUF_INIT;
>         int i, result = 0;
>
>         if (!refnames->nr)
>                 return 0;
>
> -       if (packed_refs_lock(refs->packed_ref_store, 0, &err))
> -               goto error;
> -
> -       if (refs_delete_refs(refs->packed_ref_store, msg, refnames, flags)) {
> -               packed_refs_unlock(refs->packed_ref_store);
> +       transaction = ref_store_transaction_begin(ref_store, &err);
> +       if (!transaction)
>                 goto error;
> -       }
> -
> -       packed_refs_unlock(refs->packed_ref_store);
>
>         for (i = 0; i < refnames->nr; i++) {
>                 const char *refname = refnames->items[i].string;
> -
> -               if (refs_delete_ref(&refs->base, msg, refname, NULL, flags))
> +               if (ref_transaction_delete(transaction, refname, NULL,
> +                                          flags, msg, &err))
>                         result |= error(_("could not remove reference %s"), refname);
>         }
> +       if (ref_transaction_commit(transaction, &err))
> +               goto error;
>
> +       ref_transaction_free(transaction);
>         strbuf_release(&err);
>         return result;
>
> @@ -1309,6 +1305,7 @@ static int files_delete_refs(struct ref_store *ref_store, const char *msg,
>         else
>                 error(_("could not delete references: %s"), err.buf);
>
> +       ref_transaction_free(transaction);
>         strbuf_release(&err);
>         return -1;
>  }
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 97b6837767..fdb7a0a52c 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1519,55 +1519,6 @@ static int packed_initial_transaction_commit(struct ref_store *ref_store,
>         return ref_transaction_commit(transaction, err);
>  }
>
> -static int packed_delete_refs(struct ref_store *ref_store, const char *msg,
> -                            struct string_list *refnames, unsigned int flags)
> -{
> -       struct packed_ref_store *refs =
> -               packed_downcast(ref_store, REF_STORE_WRITE, "delete_refs");
> -       struct strbuf err = STRBUF_INIT;
> -       struct ref_transaction *transaction;
> -       struct string_list_item *item;
> -       int ret;
> -
> -       (void)refs; /* We need the check above, but don't use the variable */
> -
> -       if (!refnames->nr)
> -               return 0;
> -
> -       /*
> -        * Since we don't check the references' old_oids, the
> -        * individual updates can't fail, so we can pack all of the
> -        * updates into a single transaction.
> -        */
> -
> -       transaction = ref_store_transaction_begin(ref_store, &err);
> -       if (!transaction)
> -               return -1;
> -
> -       for_each_string_list_item(item, refnames) {
> -               if (ref_transaction_delete(transaction, item->string, NULL,
> -                                          flags, msg, &err)) {
> -                       warning(_("could not delete reference %s: %s"),
> -                               item->string, err.buf);
> -                       strbuf_reset(&err);
> -               }
> -       }
> -
> -       ret = ref_transaction_commit(transaction, &err);
> -
> -       if (ret) {
> -               if (refnames->nr == 1)
> -                       error(_("could not delete reference %s: %s"),
> -                             refnames->items[0].string, err.buf);
> -               else
> -                       error(_("could not delete references: %s"), err.buf);
> -       }
> -
> -       ref_transaction_free(transaction);
> -       strbuf_release(&err);
> -       return ret;
> -}
> -
>  static int packed_pack_refs(struct ref_store *ref_store, unsigned int flags)
>  {
>         /*
> @@ -1595,7 +1546,7 @@ struct ref_storage_be refs_be_packed = {
>
>         .pack_refs = packed_pack_refs,
>         .create_symref = NULL,
> -       .delete_refs = packed_delete_refs,
> +       .delete_refs = NULL,
>         .rename_ref = NULL,
>         .copy_ref = NULL,
>
> diff --git a/t/t1416-ref-transaction-hooks.sh b/t/t1416-ref-transaction-hooks.sh
> index df75e5727c..f64166f9d7 100755
> --- a/t/t1416-ref-transaction-hooks.sh
> +++ b/t/t1416-ref-transaction-hooks.sh
> @@ -744,7 +744,7 @@ test_expect_success "branch: rename branches" '
>         test_cmp_heads_and_tags -C workdir expect
>  '
>
> -test_expect_failure "branch: remove branches" '
> +test_expect_success "branch: remove branches" '
>         test_when_finished "rm -f $HOOK_OUTPUT" &&
>
>         cat >expect <<-EOF &&
> @@ -873,7 +873,7 @@ test_expect_success "tag: update refs to create loose refs" '
>         test_cmp_heads_and_tags -C workdir expect
>  '
>
> -test_expect_failure "tag: remove tags with mixed ref_stores" '
> +test_expect_success "tag: remove tags with mixed ref_stores" '
>         test_when_finished "rm -f $HOOK_OUTPUT" &&
>
>         cat >expect <<-EOF &&
> diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
> index b45879a760..22de7ac9ec 100755
> --- a/t/t5510-fetch.sh
> +++ b/t/t5510-fetch.sh
> @@ -168,6 +168,8 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' '
>         cd "$D" &&
>         git clone . prune-fail &&
>         cd prune-fail &&
> +       git update-ref refs/remotes/origin/extrabranch main~ &&
> +       git pack-refs --all &&
>         git update-ref refs/remotes/origin/extrabranch main &&
>         : this will prevent --prune from locking packed-refs for deleting refs, but adding loose refs still succeeds  &&
>         >.git/packed-refs.new &&
> @@ -175,6 +177,21 @@ test_expect_success REFFILES 'fetch --prune fails to delete branches' '
>         test_must_fail git fetch --prune origin
>  '
>
> +test_expect_success REFFILES 'fetch --prune ok for loose refs not in locked packed-refs' '
> +       test_when_finished "cd \"$D\"; rm -rf \"prune-ok-ref-not-packed\"" &&
> +       cd "$D" &&
> +       git clone . prune-ok-ref-not-packed &&
> +       (
> +               cd prune-ok-ref-not-packed &&
> +               git update-ref refs/remotes/origin/extrabranch main &&
> +               : for loose refs not in packed-refs, we can delete them even the packed-refs is locked &&
> +               :>.git/packed-refs.new &&
> +
> +               git fetch --prune origin &&
> +               test_must_fail git rev-parse refs/remotes/origin/extrabranch --
> +       )
> +'
> +
>  test_expect_success 'fetch --atomic works with a single branch' '
>         test_when_finished "rm -rf \"$D\"/atomic" &&
>
> --
> 2.36.1.25.gc87d5ad63a.dirty
>
