git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiang Xin <worldhello.net@gmail.com>
To: Junio C Hamano <gitster@pobox.com>,
	Patrick Steinhardt <ps@pks.im>,
	Michael Heemskerk <mheemskerk@atlassian.com>,
	Git List <git@vger.kernel.org>
Cc: Jiang Xin <zhiyou.jx@alibaba-inc.com>,
	Jiang Xin <worldhello.net@gmail.com>
Subject: [PATCH v2 0/9] Fix issues of refx-txn hook for various git commands
Date: Fri, 19 Aug 2022 11:21:38 +0800	[thread overview]
Message-ID: <20220819032147.28841-1-worldhello.net@gmail.com> (raw)
In-Reply-To: <CANYiYbFw71bX827akAG87RSKOozPk313Hoe573O9dQ65_U6sLQ@mail.gmail.com>

From: Jiang Xin <zhiyou.jx@alibaba-inc.com>

Before Patrick introduced the reference-transaction hook in git 2.28,
We had implemented some internal transaction hooks "pre-txn" and
"post-txn" for our internal git infrastructure. The "pre-txn" hook is
used to check our site-wide lockfile to disable write services for
one or multiple repositories. The "post-txn" hook is used to update
the timestamp and checksum of the repository. Recently I wanted to
migrate our internal implementation to use ref-txn hooks, but I ran
into the same issues reported by Michael and Bryan:

 * https://lore.kernel.org/git/CAGyf7-GYYA2DOnAVYW--=QEc2WjSHzUhp2OQyuyOr3EOtFDm6g@mail.gmail.com/
 * https://lore.kernel.org/git/pull.1228.git.1651676435634.gitgitgadget@gmail.com/

This patch series try to fix the issues I encountered.

Patch 1/9 adds more testcases in t1416:

 * Some commands failed because the expected "<old-oid>" was
   "<zero-oid>". Such as: "git update-ref -d <ref>".

 * Some commands failed because the "reference-transaction committed"
   command was repeated multiple times for the same changes. Such as:
   "git update-ref -d <ref>" and "git tag -d <tag>".

 * Some commands should not trigger the "reference-transaction" hook
   because no real changes have occurred to the repository. Such as:
   "git pack-refs --all" and "git gc".

 * Some commands did not execute the "reference-transaction" hook at
   all. E.g.: "git branch -c <src> <dest>", "git branch -m <old> <new>".

Patch 2/9 propagates real old-oid found in lockfile to the update
entries for transaction, so we can get proper old-oid in "prepared" and
"committed" stage for refs-txn hook.

Patch 3/9 adds a new flag in transaction, and we can turn off running
the entire hook for git-pack-refs in patch 4/9. We can also turn off the
"committed" and "aborted" stages of the ref-txn hook for
"packed-ref-store", but we can still run "prepared" stage of the hook.
See patch 5/9.

Patch 6/9 and 7/9 create an extended function
"ref_transaction_update_extended()"
to be used in patch 8/9 to reimplement branch copy and rename.

Patch 9/9 reimplents "files_delete_refs()" to fix failed testcases for
"git branch -d" and "git tag -d".

## Commit log changed since v1:

 1:  bd0ceab4a2 !  1:  2860af63ee t1416: more testcases for reference-transaction hook
    @@ Commit message
          * git tag <new-tag> <oid>                      # create new tag
          * git update-ref --stdin                       # create new refs
          * git update-ref <ref> <oid>                   # create new ref
    +     * git update-ref <ref> <new-oid> <old-oid>     # update ref
     
    -    But 16 testcases failed.
    +    But 17 testcases failed.
     
         Some commands failed because the expected "<old-oid>" became
         "<zero-oid>". E.g.:
    @@ Commit message
          * git tag -d <tag>
          * git update-ref --stdin                       # update/delete refs
          * git update-ref -d <ref>
    -     * git update-ref <ref> <new-oid> [<old-oid>]   # update ref
    +     * git update-ref <ref> <new-oid>               # update ref
     
         Some commands failed because the "reference-transaction committed"
         command was repeated multiple times for the same changes. E.g.:
    @@ Commit message
         We will fix the failed testcases in later commits.
     
         Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
    +    Helped-by: Eric Sunshine <sunshine@sunshineco.com>
     
 2:  915ff63007 !  2:  1dbd4251aa refs: update missing old-oid in transaction from lockfile
    @@ Commit message
         in any case, get proper old_oid from the lock file and propagate it to
         the corresponding update entry of a transaction.
     
    -    The behavior of the following git commands and four testcases have been
    +    The behavior of the following git commands and five testcases have been
         fixed in t1416:
     
    -     * git branch [-f] <ref> <new-oid>              # update branch
    +     * git branch [-f] <ref> <new-oid>      # update branch
          * git cherry-pick <oid>
          * git rebase
          * git tag -d <tag>
    -     * git update-ref --stdin                       # update refs
    +     * git update-ref --stdin               # update refs
          * git update-ref -d <ref>
    -     * git update-ref <ref> <new-oid> [<old-oid>]   # update ref
    +     * git update-ref <ref> <new-oid>       # update ref
     
         Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>

 3:  cf7be214c6 !  3:  58658c782d refs: add new field in transaction for running transaction hook
    @@ Commit message
         and aborted. To update a reference, git may create two seperate
         transactions, one for loose reference and one for packed ref-store. This
         may cause duplicate running of the hook for same references. The new
    -    field "hook_flags" in the transaction can turn off running of a specific
    +    field "hook_flags" in the transaction can turn off running a specific
         transaction. In some scenarios, we may only want to turn off certain
         states of a transaction, such as "committed" and "aborted", but want to
         turn on the "prepared" state of the hook to do some pre-checks, so the
    @@ Commit message
         By calling the "ref_store_transaction_begin()" function, all the flags
         of the "hook_flags" field for the new initialized transaction will be
         turned on. The new function "ref_store_transaction_begin_extended()"
    -    will be used in later commits to custom the "hook_flags" field for a
    -    new initialized transaction.
    +    will be used in later commits with a specific "hook_flags" field to
    +    initialize a new transaction.
     
         Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
     
 4:  34f6a5803a !  4:  ec0c6a1aed refs: do not run transaction hook for git-pack-refs
    @@ Metadata
      ## Commit message ##
         refs: do not run transaction hook for git-pack-refs
     
    -    git-pack-refs will call "files_pack_refs()" to pack loose references
    -    into the "packed-refs" file, and there are no real changes to the
    -    repository. Therefore, by initializing a transaction with an empty
    -    "hook_flags" field, the "reference-transaction" hook will not run.
    +    The git-pack-refs command will call "files_pack_refs()" to pack loose
    +    references into the "packed-refs" file, and there are no real changes
    +    to the repository. Therefore, by initializing a transaction with an
    +    empty "hook_flags" field, the "reference-transaction" hook will not run.
     
         The "prune_refs()" and "prune_ref()" functions are called from
         "file_pack_refs()", and they are used to prune loose refereces which

 5:  649a5734c3 !  5:  d04425af29 refs: avoid duplicate running of the reference-transaction hook
    @@ Commit message
         If there are references to be deleted in a transaction, we should remove
         each reference from both loose references and the "packed-refs" file.
         The "reference-transaction" hook will run twice, once for the primary
    -    ref-store (loose references), and another for the second ref-store (i.e.
    +    ref-store (loose ref-store), and another for the second ref-store (i.e.
         packed ref-store).
     
         To avoid duplicate running of the "reference-trancaction" hook, we pass
         a special "hook-flags" parameter to initialize the second ref-store.
         The "REF_TRANSACTION_RUN_PREPARED_HOOK" bit is preserved for the
         transaction of the second ref-store because we may still want to call
    -    command "reference-trancaction prepared" for some pre-checks, such as
    -    terminate unwanted transaction for the "packed-refs" file.
    +    command "reference-trancaction prepared" to do some checks. E.g.: We
    +    can create a global lock file (such as "site.lock") to disable writing
    +    permissions for one or multiple repositories.
     
         The behavior of the following git commands and five testcases have been
         fixed in t1416:

 6:  9c5ae8e592 !  6:  7443764f6f refs: add reflog_info to hold more fields for reflog entry
    @@ Commit message
         a new structure "reflog_info" to hold more custom fields for the new
         reflog entry, and add two additional extended version functions.
     
    -    We will use this extension in a later commit to reimplement
    -    "files_copy_or_rename_ref()" using "refs_update_ref_extended()" to
    -    create new reference in a transaction and add proper reflog entry.
    +    We will use this extension in a later commit to reimplement the function
    +    "files_copy_or_rename_ref()" by calling "refs_update_ref_extended()" to
    +    create a new reference in a transaction, while adding proper reflog
    +    entry.
     
         Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com>
     
 7:  602deefb6d !  7:  0347885acc refs: get error message via refs_update_ref_extended()
    @@ Metadata
      ## Commit message ##
         refs: get error message via refs_update_ref_extended()
     
    -    The last parameter for "refs_update_ref_extended()" is an enum
    -    "action_on_err", and we can not use this function to get the specific
    -    error message. Extend this function again to get error message.
    +    The last parameter of the function "refs_update_ref()" is an enum
    +    "action_on_err", and we can not use this function to get the error
    +    message. Extend this function to get error message.
     
         We will use the function "refs_update_ref_extended()" to reimplement
         the function "files_copy_or_rename_ref()" in later commit.

 8:  16f3c34db1 !  8:  3c41c0ac8d refs: reimplement files_copy_or_rename_ref() to run hook
    @@ Metadata
     Author: Jiang Xin <zhiyou.jx@alibaba-inc.com>
     
      ## Commit message ##
    -    refs: reimplement files_copy_or_rename_ref() to run hook
    +    refs: reimplement files_copy_or_rename_ref() to run refs-txn hook
     
         When copying or renaming a branch, the "reference-transaction" hook is
    -    not executed. This is because we used to reinvent the wheel in function
    -    "files_copy_or_rename_ref()". By calling "refs_update_ref_extended()" to
    -    reimplement "files_copy_or_rename_ref()", the "reference-transaction"
    -    hook will run correctly.
    +    not executed. This is because we called two low-level functions
    +    "lock_ref_oid_basic()" and "write_ref_to_lockfile()", and reinvented
    +    the wheel in "commit_ref_update()" to update the reference instead of
    +    implementing "files_copy_or_rename_ref()" by calling "refs_update_ref()"
    +    to update a reference in a transaction. The reason for this is that we
    +    want to create a proper reflog for the newly copied reference.
    +
    +    Refactor "files_copy_or_rename_ref()" by calling the extended version
    +    of "refs_update_ref", i.e. "refs_update_ref_extended()", so we can
    +    create the target branch for copying or renaming a branch and generate
    +    a correct reflog file at the same time.
     
         The behavior of the following git commands and two testcases have been
         fixed in t1416:

 9:  c206059145 !  9:  f201436741 refs: reimplement refs_delete_refs() and run hook once
    @@ Metadata
      ## Commit message ##
         refs: reimplement refs_delete_refs() and run hook once
     
    -    When delete references using "git branch -d" or "git tag -d", there will
    -    be duplicate call of "reference-transaction committed" for same refs.
    -    This is because "refs_delete_refs()" is called twice, once for
    -    files-backend and once for packed-backend, and we used to reinvented the
    -    wheel in "files_delete_refs()" and "packed_delete_refs()". By removing
    -    "packed_delete_refs()" and reimplement "files_delete_refs()", the
    -    "reference-transaction" hook will run only once for deleted branches and
    -    tags.
    +    When using "git branch -d" or "git tag -d" to delete one or multiple
    +    references, command "reference-transaction committed" will be called
    +    repeatedly for the same references. This is because the function
    +    "refs_delete_refs()" is called twice, once for loose ref-store and once
    +    for packed ref-store.
    +
    +    The old story starts when running the function "refs_delete_refs()" on a
    +    loose ref-store:
    +
    +     1. Try to remove the references from packed ref-store.
    +
    +        1.1. Lock the packed-ref-store by calling "packed_refs_lock()" in
    +             "files_delete_refs()".
    +
    +        1.2. Call "refs_delete_refs()" on packed-ref-store, and then call
    +             "packed_delete_refs()".
    +
    +        1.3. Create a transaction for packed-ref-store in function
    +             "packed_delete_refs()" by calling the function
    +             "ref_store_transaction_begin()".
    +
    +        2.2. Add update entries for all the references to be removed into
    +             this transaction by calling "ref_transaction_delete()".
    +
    +        2.3. Call "ref_transaction_commit()" to commit the transaction.
    +
    +        2.4. Unlock the packed-ref-store.
    +
    +     2. Try to remove the references one by one by calling the function
    +        "refs_delete_ref()".
    +
    +        2.1. Create a new transaction on loose-ref-store by calling
    +             "ref_store_transaction_begin()".
    +
    +        2.2. Call "ref_transaction_delete()" to add a update entry
    +             for the reference to be deleted into the transaction.
    +
    +        2.3. In "ref_transaction_commit()", it will call functions
    +             "files_transaction_prepare()" and "files_transaction_finish()"
    +             to commit the transaction.
    +
    +             2.3.1. Lock the loose reference.
    +
    +             2.3.2. Create a new packed-transaction, and add a new update
    +                    entry to this packed-transaction. The previous step 1
    +                    makes this operation unnecessary.
    +
    +             2.3.3. Lock the packed-ref-store and call fucntion
    +                    "is_packed_transaction_needed()" to check whether it
    +                    is necessary to commit the transaction, and then
    +                    abort the transaction because the reference is already
    +                    removed from the packed-ref-store in step 1.
    +
    +             2.3.4. Remove the reflog and the loose reference file for
    +                    the reference to be deleted.
    +
    +             2.3.4. Unlock the loose reference.
    +
    +    From the above steps, we can see that "refs_delete_refs()" is not an
    +    atomic operation, but a semi-atomic operation. The operation is atomic
    +    if all references to be deleted are in the packed ref-store, but not
    +    if some references are loose references because we delete the loose
    +    references one by one by calling "refs_delete_ref()" .
    +
    +    Refactored function "files_delete_refs()" to delete references within a
    +    transaction, so the "reference-transaction" hook will only run once for
    +    deleted branches and tags.
     
         The behavior of the following git commands and the last two testcases
         have been fixed in t1416:

## Source code changed since v1:

 2:  915ff63007 !  2:  1dbd4251aa refs: update missing old-oid in transaction from lockfile
    
      ## refs/files-backend.c ##
     @@ refs/files-backend.c: static int lock_ref_for_update(struct files_ref_store *refs,
    - 			goto out;
    - 		}
    + 	update->backend_data = lock;
      
    + 	if (update->type & REF_ISSYMREF) {
    ++		const char *ref;
    ++
     +		/*
    -+		 * Propagate old_oid from the lock to the update entry, so we can
    -+		 * provide a real old-oid of to the "reference-transaction" hook.
    ++		 * Read the referent even though we won't use it as part
    ++		 * of the transaction, because we want to set a proper
    ++		 * old_oid for this symref using the oid we got.
     +		 */
    -+		if (!(update->flags & REF_HAVE_OLD)) {
    -+			oidcpy(&update->old_oid, &lock->old_oid);
    -+			update->flags |= REF_HAVE_OLD;
    -+		}
    ++		ref = refs_resolve_ref_unsafe(&refs->base,
    ++					      referent.buf, 0,
    ++					      &lock->old_oid, NULL);
    ++
    + 		if (update->flags & REF_NO_DEREF) {
    + 			/*
    + 			 * We won't be reading the referent as part of
    +-			 * the transaction, so we have to read it here
    +-			 * to record and possibly check old_oid:
    ++			 * the transaction, so we may need to check
    ++			 * old_oid here:
    + 			 */
    +-			if (!refs_resolve_ref_unsafe(&refs->base,
    +-						     referent.buf, 0,
    +-						     &lock->old_oid, NULL)) {
    ++			if (!ref) {
    + 				if (update->flags & REF_HAVE_OLD) {
    + 					strbuf_addf(err, "cannot lock ref '%s': "
    + 						    "error reading reference",
    +@@ refs/files-backend.c: static int lock_ref_for_update(struct files_ref_store *refs,
    + 		}
    + 	}
    + 
    ++	/*
    ++	 * Propagate old_oid from the lock to the update entry, so we can
    ++	 * provide a proper old-oid of to the "reference-transaction" hook.
    ++	 */
    ++	if (!(update->flags & REF_HAVE_OLD) && !is_null_oid(&lock->old_oid)) {
    ++		oidcpy(&update->old_oid, &lock->old_oid);
    ++		update->flags |= REF_HAVE_OLD;
    ++	}
     +
    - 		/*
    - 		 * If this update is happening indirectly because of a
    - 		 * symref update, record the old OID in the parent
    + 	if ((update->flags & REF_HAVE_NEW) &&
    + 	    !(update->flags & REF_DELETING) &&
    + 	    !(update->flags & REF_LOG_ONLY)) {
     
 6:  9c5ae8e592 !  6:  7443764f6f refs: add reflog_info to hold more fields for reflog entry

    @@ refs.c: struct ref_update *ref_transaction_add_update(
      		oidcpy(&update->old_oid, old_oid);
     -	update->msg = normalize_reflog_message(msg);
     +	if (reflog_info) {
    -+		update->reflog_info = xmalloc(sizeof(*reflog_info));
    ++		update->reflog_info = xcalloc(1, sizeof(*reflog_info));
     +		update->reflog_info->msg = normalize_reflog_message(reflog_info->msg);
     +		if (reflog_info->old_oid)
     +			update->reflog_info->old_oid = oiddup(reflog_info->old_oid);

## Testcases changed since v1 are not list here

--

Jiang Xin (9):
  t1416: more testcases for reference-transaction hook
  refs: update missing old-oid in transaction from lockfile
  refs: add new field in transaction for running transaction hook
  refs: do not run transaction hook for git-pack-refs
  refs: avoid duplicate running of the reference-transaction hook
  refs: add reflog_info to hold more fields for reflog entry
  refs: get error message via refs_update_ref_extended()
  refs: reimplement files_copy_or_rename_ref() to run refs-txn hook
  refs: reimplement refs_delete_refs() and run hook once

 refs.c                           |   99 ++-
 refs.h                           |   23 +
 refs/debug.c                     |    2 +-
 refs/files-backend.c             |  177 ++---
 refs/packed-backend.c            |   51 +-
 refs/refs-internal.h             |   25 +-
 t/t1416-ref-transaction-hooks.sh | 1115 +++++++++++++++++++++++++++++-
 t/t5510-fetch.sh                 |   16 +
 8 files changed, 1318 insertions(+), 190 deletions(-)

-- 
2.36.1.25.gc87d5ad63a.dirty


  reply	other threads:[~2022-08-19  3:22 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-29 10:12 [PATCH 0/9] Fix issues of reference-transaction hook for various git commands Jiang Xin
2022-07-29 10:12 ` [PATCH 1/9] t1416: more testcases for reference-transaction hook Jiang Xin
2022-07-30  6:44   ` Eric Sunshine
2022-07-31  3:25     ` Jiang Xin
2022-07-29 10:12 ` [PATCH 2/9] refs: update missing old-oid in transaction from lockfile Jiang Xin
2022-07-29 10:12 ` [PATCH 3/9] refs: add new field in transaction for running transaction hook Jiang Xin
2022-07-29 10:12 ` [PATCH 4/9] refs: do not run transaction hook for git-pack-refs Jiang Xin
2022-07-29 10:12 ` [PATCH 5/9] refs: avoid duplicate running of the reference-transaction hook Jiang Xin
2022-08-02 12:18   ` Michael Heemskerk
2022-08-05  1:41     ` Jiang Xin
2022-08-19  3:21       ` Jiang Xin [this message]
2022-08-19  3:21       ` [PATCH v2 1/9] t1416: more testcases for " Jiang Xin
2022-08-19  3:21       ` [PATCH v2 2/9] refs: update missing old-oid in transaction from lockfile Jiang Xin
2022-08-19  3:21       ` [PATCH v2 3/9] refs: add new field in transaction for running transaction hook Jiang Xin
2022-08-19  3:21       ` [PATCH v2 4/9] refs: do not run transaction hook for git-pack-refs Jiang Xin
2022-08-19  3:21       ` [PATCH v2 5/9] refs: avoid duplicate running of the reference-transaction hook Jiang Xin
2022-08-19  3:21       ` [PATCH v2 6/9] refs: add reflog_info to hold more fields for reflog entry Jiang Xin
2022-08-19  3:21       ` [PATCH v2 7/9] refs: get error message via refs_update_ref_extended() Jiang Xin
2022-08-19  3:21       ` [PATCH v2 8/9] refs: reimplement files_copy_or_rename_ref() to run refs-txn hook Jiang Xin
2022-08-19  3:21       ` [PATCH v2 9/9] refs: reimplement refs_delete_refs() and run hook once Jiang Xin
2022-07-29 10:12 ` [PATCH 6/9] refs: add reflog_info to hold more fields for reflog entry Jiang Xin
2022-08-01 11:32   ` Jiang Xin
2022-07-29 10:12 ` [PATCH 7/9] refs: get error message via refs_update_ref_extended() Jiang Xin
2022-07-29 10:12 ` [PATCH 8/9] refs: reimplement files_copy_or_rename_ref() to run hook Jiang Xin
2022-07-29 10:12 ` [PATCH 9/9] refs: reimplement refs_delete_refs() and run hook once Jiang Xin
2022-08-02 12:42   ` Michael Heemskerk
2022-08-09 11:05     ` Patrick Steinhardt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220819032147.28841-1-worldhello.net@gmail.com \
    --to=worldhello.net@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mheemskerk@atlassian.com \
    --cc=ps@pks.im \
    --cc=zhiyou.jx@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).