[RFC] fetch: update refs in a single transaction

* [RFC] fetch: update refs in a single transaction
@ 2021-12-07 14:24 Patrick Steinhardt
  2021-12-08  8:15 ` Christian Couder
  2021-12-09 21:53 ` Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Patrick Steinhardt @ 2021-12-07 14:24 UTC (permalink / raw)
  To: git; +Cc: Han-Wen Nienhuys, Jeff King

[-- Attachment #1: Type: text/plain, Size: 6798 bytes --]

When git-fetch(1) isn't called with the `--atomic` flag, then each
reference will be updated in a separate transaction. As a result, even
if we failed to update a subset of transactions, the remaining refs will
be modified after the command has finished. While this pattern is
reasonably efficient with the files backend where we'd lock and write
each file separately anyway, the upcoming reftable backend could handle
such an update a lot more efficiently if it was batched into a single
transaction given that it could then create a single new reftable slice
instead of creating one slice per updated ref. While this inefficiency
can be easily mitigated by using the `--atomic` flag, this flag cannot
be used in contexts where we want partial-update semantics.

Convert git-fetch(1) to always use a single reference transaction,
regardless of whether it is called with `--atomic` or not. The only
remaining difference between both modes is that with `--atomic` set,
we'd abort the transaciton in case at least one reference cannot be
updated.

Note that this slightly changes semantics of git-fetch(1): if we hit any
unexpected errors like the reference update racing with another process,
then we'll now fail to update any references, too.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/fetch.c | 78 ++++++++++++++++---------------------------------
 1 file changed, 25 insertions(+), 53 deletions(-)

Hi,

until now, we have been quite lenient with creating lots of reference
transactions even though they could've been bundled together into a
single transaction. After all, it didn't have much of a downside in most
contexts with the files backend: we'd have to lock each loose ref
separately anyway. I'd like to get some feedback on changing our stance
here, due to multiple reasons:

    - The upcoming reftable backend will be more efficient if we use a
      single transaction to bundle together multiple ref updates given
      that it only needs to write one new slice instead of one per
      update.

    - Syncing refs to disk can be batched more efficiently if we bundle
      ref updates. See my initial patch series to implement fsync-ing
      refs [1] and Peff's benchmarks [2] demonstrating that fetches may
      become a lot slower.

    - The reference-transaction hook can be used more efficiently given
      that it would only need to execute twice, instead of twice per
      updated ref. It also has a more global view of what's happening.
      While this is a concern of mine, it's a non-reason in the context
      of the Git project given that we really ought not to change
      semantics only to appease this hook.

With these reasons in mind, I'm wondering whether we want to accept
refactorings which convert existing code to use batched reference
transactions. While the potential performance improvements are a rather
obvious upside, the downside is that it would definitely change the
failure mode in many cases.

The example I have here with git-fetch(1) would mean that if we were to
race with any other process or if any other unexpected error occurs
which leads us to die, then we'll not commit any change to disk. This
can be seen as an improvement in consistency, but it can also be seen as
a change which breaks current semantics of trying to do as much work as
possible.

I'd thus love to hear about any opinions on this topic.

Patrick

[1]: <cover.1636544377.git.ps@pks.im>
[2]: <YYwvVy6AWDjkWazn@coredump.intra.peff.net>

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f7abbc31ff..c4cfd55452 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -641,9 +641,6 @@ static struct ref *get_ref_map(struct remote *remote,
 	return ref_map;
 }
 
-#define STORE_REF_ERROR_OTHER 1
-#define STORE_REF_ERROR_DF_CONFLICT 2
-
 static int s_update_ref(const char *action,
 			struct ref *ref,
 			struct ref_transaction *transaction,
@@ -651,7 +648,6 @@ static int s_update_ref(const char *action,
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -661,44 +657,12 @@ static int s_update_ref(const char *action,
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	/*
-	 * If no transaction was passed to us, we manage the transaction
-	 * ourselves. Otherwise, we trust the caller to handle the transaction
-	 * lifecycle.
-	 */
-	if (!transaction) {
-		transaction = our_transaction = ref_transaction_begin(&err);
-		if (!transaction) {
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
 				     check_old ? &ref->old_oid : NULL,
 				     0, msg, &err);
-	if (ret) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
-	}
-
-	if (our_transaction) {
-		switch (ref_transaction_commit(our_transaction, &err)) {
-		case 0:
-			break;
-		case TRANSACTION_NAME_CONFLICT:
-			ret = STORE_REF_ERROR_DF_CONFLICT;
-			goto out;
-		default:
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
-out:
-	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
+
 	strbuf_release(&err);
 	free(msg);
 	return ret;
@@ -1107,12 +1071,10 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (atomic_fetch) {
-		transaction = ref_transaction_begin(&err);
-		if (!transaction) {
-			error("%s", err.buf);
-			goto abort;
-		}
+	transaction = ref_transaction_begin(&err);
+	if (!transaction) {
+		error("%s", err.buf);
+		goto abort;
 	}
 
 	prepare_format_display(ref_map);
@@ -1229,21 +1191,31 @@ static int store_updated_refs(const char *raw_url, const char *remote_name,
 		}
 	}
 
-	if (!rc && transaction) {
-		rc = ref_transaction_commit(transaction, &err);
-		if (rc) {
-			error("%s", err.buf);
-			goto abort;
-		}
+	if (rc && atomic_fetch) {
+		error(_("aborting reference updates because of atomic fetch"));
+		goto abort;
 	}
 
-	if (!rc)
-		commit_fetch_head(&fetch_head);
-
-	if (rc & STORE_REF_ERROR_DF_CONFLICT)
+	switch (ref_transaction_commit(transaction, &err)) {
+	case 0:
+		break;
+	case TRANSACTION_NAME_CONFLICT:
 		error(_("some local refs could not be updated; try running\n"
 		      " 'git remote prune %s' to remove any old, conflicting "
 		      "branches"), remote_name);
+		rc = -1;
+		break;
+	default:
+		error("%s", err.buf);
+		rc = -1;
+		break;
+	}
+
+	if (rc && atomic_fetch)
+		goto abort;
+
+	if (!rc)
+		commit_fetch_head(&fetch_head);
 
 	if (advice_enabled(ADVICE_FETCH_SHOW_FORCED_UPDATES)) {
 		if (!fetch_show_forced_updates) {
-- 
2.34.1


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread