All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes
@ 2015-06-22 14:02 Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 01/19] delete_ref(): move declaration to refs.h Michael Haggerty
                   ` (18 more replies)
  0 siblings, 19 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

This is v3 of the series. I think I have addressed all of the feedback
from v1 [1] and v2 [2]. Thanks to Stefan, Junio, and Peff for their
feedback about v2.

There are three significant changes since v2:

* Add a patch

      delete_refs(): bail early if the packed-refs file cannot be rewritten

  that changes delete_refs() to bail early if repack_without_refs()
  fails. See the log message for the explanation.

* Add a patch

      prune_refs(): use delete_refs()

  that changes "git fetch --prune" to use delete_refs() as well. This
  is analogous to the existing patch

      prune_remote(): use delete_refs()

  (Both of these changes remove quadratic behavior that can be
  extremely expensive in repositories with many packed refs.)

* Append four more commits that change how delete_ref() interprets its
  `old_sha1` parameter, to make it consistent with
  ref_transaction_delete() and friends:

      check_branch_commit(): make first parameter const
      update_ref(): don't read old reference value before delete
      cmd_update_ref(): make logic more straightforward
      delete_ref(): use the usual convention for old_sha1

  This change was suggested by Junio [3].

The last change is good for consistency, but less important than
expected in the real world. The reason is that the convention of
ref_transaction_delete() is that passing NULL_SHA1 as old_sha1 is a
bug (because why would somebody delete a reference that he knows not
to exist?) So we don't really gain a useful third case, though we do
gain a consistency check that might be useful someday. However, if you
don't like this change, the last four commits can be dropped
(actually, commits N-3 through N-1 are marginally useful cleanups
anyway so probably only the last commit should be dropped in this
case).

Other, minor changes since v2:

* Improve the commit message of "delete_refs(): make error message
  more generic".

* Better explain the change to error output caused by "prune_remote():
  use delete_refs()" in that commit's log message.

* Split the "add names for function parameters" changes into a
  separate patch.

* Fix how die() is invoked in "initial_ref_transaction_commit():
  function for initial ref creation" and remove some superfluous
  braces.

This series is also available from my GitHub account [4] as branch
"init-delete-refs-api".

[1] http://thread.gmane.org/gmane.comp.version-control.git/271017
[2] http://thread.gmane.org/gmane.comp.version-control.git/271552
[3] http://article.gmane.org/gmane.comp.version-control.git/271697
[4] https://github.com/mhagger/git

Michael Haggerty (19):
  delete_ref(): move declaration to refs.h
  remove_branches(): remove temporary
  delete_ref(): handle special case more explicitly
  delete_refs(): new function for the refs API
  delete_refs(): make error message more generic
  delete_refs(): bail early if the packed-refs file cannot be rewritten
  prune_remote(): use delete_refs()
  prune_refs(): use delete_refs()
  repack_without_refs(): make function private
  initial_ref_transaction_commit(): function for initial ref creation
  refs: remove some functions from the module's public interface
  initial_ref_transaction_commit(): check for duplicate refs
  initial_ref_transaction_commit(): check for ref D/F conflicts
  refs: move the remaining ref module declarations to refs.h
  refs.h: add some parameter names to function declarations
  check_branch_commit(): make first parameter const
  update_ref(): don't read old reference value before delete
  cmd_update_ref(): make logic more straightforward
  delete_ref(): use the usual convention for old_sha1

 archive.c               |   1 +
 builtin/blame.c         |   1 +
 builtin/branch.c        |   5 +-
 builtin/clone.c         |  18 ++++-
 builtin/fast-export.c   |   1 +
 builtin/fetch.c         |  25 ++++--
 builtin/fmt-merge-msg.c |   1 +
 builtin/init-db.c       |   1 +
 builtin/log.c           |   1 +
 builtin/remote.c        |  33 +-------
 builtin/update-ref.c    |  21 ++++-
 cache.h                 |  68 ----------------
 fast-import.c           |   6 +-
 refs.c                  | 182 ++++++++++++++++++++++++++++++++++++++---
 refs.h                  | 211 +++++++++++++++++++++++++++++++-----------------
 remote-testsvn.c        |   1 +
 16 files changed, 371 insertions(+), 205 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 01/19] delete_ref(): move declaration to refs.h
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
@ 2015-06-22 14:02 ` Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 02/19] remove_branches(): remove temporary Michael Haggerty
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Also

* Add a docstring

* Rename the second parameter to "old_sha1", to be consistent with the
  convention used elsewhere in the refs module

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 cache.h |  2 --
 refs.c  |  5 +++--
 refs.h  | 10 ++++++++++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 571c98f..be92121 100644
--- a/cache.h
+++ b/cache.h
@@ -585,8 +585,6 @@ extern void update_index_if_able(struct index_state *, struct lock_file *);
 extern int hold_locked_index(struct lock_file *, int);
 extern void set_alternate_index_output(const char *);
 
-extern int delete_ref(const char *, const unsigned char *sha1, unsigned int flags);
-
 /* Environment bits from configuration mechanism */
 extern int trust_executable_bit;
 extern int trust_ctime;
diff --git a/refs.c b/refs.c
index a742d79..b575bb8 100644
--- a/refs.c
+++ b/refs.c
@@ -2789,7 +2789,8 @@ static int delete_ref_loose(struct ref_lock *lock, int flag, struct strbuf *err)
 	return 0;
 }
 
-int delete_ref(const char *refname, const unsigned char *sha1, unsigned int flags)
+int delete_ref(const char *refname, const unsigned char *old_sha1,
+	       unsigned int flags)
 {
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
@@ -2797,7 +2798,7 @@ int delete_ref(const char *refname, const unsigned char *sha1, unsigned int flag
 	transaction = ref_transaction_begin(&err);
 	if (!transaction ||
 	    ref_transaction_delete(transaction, refname,
-				   (sha1 && !is_null_sha1(sha1)) ? sha1 : NULL,
+				   (old_sha1 && !is_null_sha1(old_sha1)) ? old_sha1 : NULL,
 				   flags, NULL, &err) ||
 	    ref_transaction_commit(transaction, &err)) {
 		error("%s", err.buf);
diff --git a/refs.h b/refs.h
index 8c3d433..68b5e81 100644
--- a/refs.h
+++ b/refs.h
@@ -202,6 +202,16 @@ extern int read_ref_at(const char *refname, unsigned int flags,
 /** Check if a particular reflog exists */
 extern int reflog_exists(const char *refname);
 
+/*
+ * Delete the specified reference. If old_sha1 is non-NULL and not
+ * NULL_SHA1, then verify that the current value of the reference is
+ * old_sha1 before deleting it. If old_sha1 is NULL or NULL_SHA1,
+ * delete the reference if it exists, regardless of its old value.
+ * flags is passed through to ref_transaction_delete().
+ */
+extern int delete_ref(const char *refname, const unsigned char *old_sha1,
+		      unsigned int flags);
+
 /** Delete a reflog */
 extern int delete_reflog(const char *refname);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 02/19] remove_branches(): remove temporary
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 01/19] delete_ref(): move declaration to refs.h Michael Haggerty
@ 2015-06-22 14:02 ` Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 03/19] delete_ref(): handle special case more explicitly Michael Haggerty
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/remote.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index f4a6ec9..53b8e13 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -756,8 +756,7 @@ static int remove_branches(struct string_list *branches)
 	strbuf_release(&err);
 
 	for (i = 0; i < branches->nr; i++) {
-		struct string_list_item *item = branches->items + i;
-		const char *refname = item->string;
+		const char *refname = branches->items[i].string;
 
 		if (delete_ref(refname, NULL, 0))
 			result |= error(_("Could not remove branch %s"), refname);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 03/19] delete_ref(): handle special case more explicitly
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 01/19] delete_ref(): move declaration to refs.h Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 02/19] remove_branches(): remove temporary Michael Haggerty
@ 2015-06-22 14:02 ` Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 04/19] delete_refs(): new function for the refs API Michael Haggerty
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

delete_ref() uses a different convention for its old_sha1 parameter
than, say, ref_transaction_delete(): NULL_SHA1 means not to check the
old value. Make this fact a little bit clearer in the code by handling
it in explicit, commented code rather than burying it in a conditional
expression.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index b575bb8..f0b6cec 100644
--- a/refs.c
+++ b/refs.c
@@ -2795,10 +2795,17 @@ int delete_ref(const char *refname, const unsigned char *old_sha1,
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
 
+	/*
+	 * Treat NULL_SHA1 and NULL alike, to mean "we don't care what
+	 * the old value of the reference was (or even if it didn't
+	 * exist)":
+	 */
+	if (old_sha1 && is_null_sha1(old_sha1))
+		old_sha1 = NULL;
+
 	transaction = ref_transaction_begin(&err);
 	if (!transaction ||
-	    ref_transaction_delete(transaction, refname,
-				   (old_sha1 && !is_null_sha1(old_sha1)) ? old_sha1 : NULL,
+	    ref_transaction_delete(transaction, refname, old_sha1,
 				   flags, NULL, &err) ||
 	    ref_transaction_commit(transaction, &err)) {
 		error("%s", err.buf);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 04/19] delete_refs(): new function for the refs API
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (2 preceding siblings ...)
  2015-06-22 14:02 ` [PATCH v3 03/19] delete_ref(): handle special case more explicitly Michael Haggerty
@ 2015-06-22 14:02 ` Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 05/19] delete_refs(): make error message more generic Michael Haggerty
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Move the function remove_branches() from builtin/remote.c to refs.c,
rename it to delete_refs(), and make it public.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/remote.c | 21 +--------------------
 refs.c           | 19 +++++++++++++++++++
 refs.h           |  7 +++++++
 3 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 53b8e13..c8dc724 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -746,25 +746,6 @@ static int mv(int argc, const char **argv)
 	return 0;
 }
 
-static int remove_branches(struct string_list *branches)
-{
-	struct strbuf err = STRBUF_INIT;
-	int i, result = 0;
-
-	if (repack_without_refs(branches, &err))
-		result |= error("%s", err.buf);
-	strbuf_release(&err);
-
-	for (i = 0; i < branches->nr; i++) {
-		const char *refname = branches->items[i].string;
-
-		if (delete_ref(refname, NULL, 0))
-			result |= error(_("Could not remove branch %s"), refname);
-	}
-
-	return result;
-}
-
 static int rm(int argc, const char **argv)
 {
 	struct option options[] = {
@@ -821,7 +802,7 @@ static int rm(int argc, const char **argv)
 	strbuf_release(&buf);
 
 	if (!result)
-		result = remove_branches(&branches);
+		result = delete_refs(&branches);
 	string_list_clear(&branches, 0);
 
 	if (skipped.nr) {
diff --git a/refs.c b/refs.c
index f0b6cec..6f62bd1 100644
--- a/refs.c
+++ b/refs.c
@@ -2818,6 +2818,25 @@ int delete_ref(const char *refname, const unsigned char *old_sha1,
 	return 0;
 }
 
+int delete_refs(struct string_list *refnames)
+{
+	struct strbuf err = STRBUF_INIT;
+	int i, result = 0;
+
+	if (repack_without_refs(refnames, &err))
+		result |= error("%s", err.buf);
+	strbuf_release(&err);
+
+	for (i = 0; i < refnames->nr; i++) {
+		const char *refname = refnames->items[i].string;
+
+		if (delete_ref(refname, NULL, 0))
+			result |= error(_("Could not remove branch %s"), refname);
+	}
+
+	return result;
+}
+
 /*
  * People using contrib's git-new-workdir have .git/logs/refs ->
  * /some/other/path/.git/logs/refs, and that may live on another device.
diff --git a/refs.h b/refs.h
index 68b5e81..1a5d44a 100644
--- a/refs.h
+++ b/refs.h
@@ -212,6 +212,13 @@ extern int reflog_exists(const char *refname);
 extern int delete_ref(const char *refname, const unsigned char *old_sha1,
 		      unsigned int flags);
 
+/*
+ * Delete the specified references. If there are any problems, emit
+ * errors but attempt to keep going (i.e., the deletes are not done in
+ * an all-or-nothing transaction).
+ */
+extern int delete_refs(struct string_list *refnames);
+
 /** Delete a reflog */
 extern int delete_reflog(const char *refname);
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 05/19] delete_refs(): make error message more generic
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (3 preceding siblings ...)
  2015-06-22 14:02 ` [PATCH v3 04/19] delete_refs(): new function for the refs API Michael Haggerty
@ 2015-06-22 14:02 ` Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 06/19] delete_refs(): bail early if the packed-refs file cannot be rewritten Michael Haggerty
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Change the error message from

    Could not remove branch %s

to

    could not remove reference %s

First of all, the old error message referred to "branch
refs/remotes/origin/foo", which was awkward even for the existing
caller. Normally we would refer to a reference like that as either
"remote-tracking branch origin/foo" or "reference
refs/remotes/origin/foo". Here I take the lazier alternative.

Moreover, now that this function is part of the refs API, it might be
called for refs that are neither branches nor remote-tracking
branches.

While we're at it, convert the error message to lower case, as per our
usual convention.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 6f62bd1..5386077 100644
--- a/refs.c
+++ b/refs.c
@@ -2831,7 +2831,7 @@ int delete_refs(struct string_list *refnames)
 		const char *refname = refnames->items[i].string;
 
 		if (delete_ref(refname, NULL, 0))
-			result |= error(_("Could not remove branch %s"), refname);
+			result |= error(_("could not remove reference %s"), refname);
 	}
 
 	return result;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 06/19] delete_refs(): bail early if the packed-refs file cannot be rewritten
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (4 preceding siblings ...)
  2015-06-22 14:02 ` [PATCH v3 05/19] delete_refs(): make error message more generic Michael Haggerty
@ 2015-06-22 14:02 ` Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 07/19] prune_remote(): use delete_refs() Michael Haggerty
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

If we fail to delete the doomed references from the packed-refs file,
then it is unsafe to delete their loose references, because doing so
might expose a value from the packed-refs file that is obsolete and
perhaps even points at an object that has been garbage collected.

So if repack_without_refs() fails, emit a more explicit error message
and bail.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c | 25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index 5386077..19afc4d 100644
--- a/refs.c
+++ b/refs.c
@@ -2823,9 +2823,26 @@ int delete_refs(struct string_list *refnames)
 	struct strbuf err = STRBUF_INIT;
 	int i, result = 0;
 
-	if (repack_without_refs(refnames, &err))
-		result |= error("%s", err.buf);
-	strbuf_release(&err);
+	if (!refnames->nr)
+		return 0;
+
+	result = repack_without_refs(refnames, &err);
+	if (result) {
+		/*
+		 * If we failed to rewrite the packed-refs file, then
+		 * it is unsafe to try to remove loose refs, because
+		 * doing so might expose an obsolete packed value for
+		 * a reference that might even point at an object that
+		 * has been garbage collected.
+		 */
+		if (refnames->nr == 1)
+			error(_("could not delete reference %s: %s"),
+			      refnames->items[0].string, err.buf);
+		else
+			error(_("could not delete references: %s"), err.buf);
+
+		goto out;
+	}
 
 	for (i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
@@ -2834,6 +2851,8 @@ int delete_refs(struct string_list *refnames)
 			result |= error(_("could not remove reference %s"), refname);
 	}
 
+out:
+	strbuf_release(&err);
 	return result;
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 07/19] prune_remote(): use delete_refs()
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (5 preceding siblings ...)
  2015-06-22 14:02 ` [PATCH v3 06/19] delete_refs(): bail early if the packed-refs file cannot be rewritten Michael Haggerty
@ 2015-06-22 14:02 ` Michael Haggerty
  2015-06-22 14:02 ` [PATCH v3 08/19] prune_refs(): " Michael Haggerty
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

This slightly changes how errors are reported. The old and new code
both report errors that come from repack_without_refs() the same way.
But if an error occurs within delete_ref(), the old version only
emitted an error within delete_ref() without further comment. The new
version (in delete_refs()) still emits that error, but then follows it
up with

    error(_("could not remove reference %s"), refname)

The "could not remove reference" error originally came from a similar
message in remove_branches() (from builtin/remote.c).

This is an improvement, because the error from delete_ref() (which
usually comes from ref_transaction_commit()) can be pretty low-level,
like

    Cannot lock ref '%s': unable to resolve reference %s: %s

where the last "%s" is the original strerror.

In any case, I don't think we need to sweat the details too much,
because these errors should only ever be seen in the case of a broken
repository or a race between two processes; i.e., only in pretty rare
and anomalous situations.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/remote.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index c8dc724..cc3c741 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -1314,19 +1314,12 @@ static int prune_remote(const char *remote, int dry_run)
 		string_list_append(&refs_to_prune, item->util);
 	string_list_sort(&refs_to_prune);
 
-	if (!dry_run) {
-		struct strbuf err = STRBUF_INIT;
-		if (repack_without_refs(&refs_to_prune, &err))
-			result |= error("%s", err.buf);
-		strbuf_release(&err);
-	}
+	if (!dry_run)
+		result |= delete_refs(&refs_to_prune);
 
 	for_each_string_list_item(item, &states.stale) {
 		const char *refname = item->util;
 
-		if (!dry_run)
-			result |= delete_ref(refname, NULL, 0);
-
 		if (dry_run)
 			printf_ln(_(" * [would prune] %s"),
 			       abbrev_ref(refname, "refs/remotes/"));
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 08/19] prune_refs(): use delete_refs()
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (6 preceding siblings ...)
  2015-06-22 14:02 ` [PATCH v3 07/19] prune_remote(): use delete_refs() Michael Haggerty
@ 2015-06-22 14:02 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 09/19] repack_without_refs(): make function private Michael Haggerty
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

The old version just looped over the references to delete, calling
delete_ref() on each one. But that has quadratic behavior, because
each call to delete_ref() might have to rewrite the packed-refs file.
This can be very expensive in a repository with a large number of
references. In some (admittedly extreme) repositories, we've seen
cases where the ref-pruning part of fetch takes multiple tens of
minutes.

Instead call delete_refs(), which (aside from being less code) has the
optimization that it only rewrites the packed-refs file a single time.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/fetch.c | 25 +++++++++++++++++--------
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 8d5b2db..34b6f5e 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -790,20 +790,29 @@ static int prune_refs(struct refspec *refs, int ref_count, struct ref *ref_map,
 	if (4 < i && !strncmp(".git", url + i - 3, 4))
 		url_len = i - 3;
 
-	for (ref = stale_refs; ref; ref = ref->next) {
-		if (!dry_run)
-			result |= delete_ref(ref->name, NULL, 0);
-		if (verbosity >= 0 && !shown_url) {
-			fprintf(stderr, _("From %.*s\n"), url_len, url);
-			shown_url = 1;
-		}
-		if (verbosity >= 0) {
+	if (!dry_run) {
+		struct string_list refnames = STRING_LIST_INIT_NODUP;
+
+		for (ref = stale_refs; ref; ref = ref->next)
+			string_list_append(&refnames, ref->name);
+
+		result = delete_refs(&refnames);
+		string_list_clear(&refnames, 0);
+	}
+
+	if (verbosity >= 0) {
+		for (ref = stale_refs; ref; ref = ref->next) {
+			if (!shown_url) {
+				fprintf(stderr, _("From %.*s\n"), url_len, url);
+				shown_url = 1;
+			}
 			fprintf(stderr, " x %-*s %-*s -> %s\n",
 				TRANSPORT_SUMMARY(_("[deleted]")),
 				REFCOL_WIDTH, _("(none)"), prettify_refname(ref->name));
 			warn_dangling_symref(stderr, dangling_msg, ref->name);
 		}
 	}
+
 	free(url);
 	free_refs(stale_refs);
 	return result;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 09/19] repack_without_refs(): make function private
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (7 preceding siblings ...)
  2015-06-22 14:02 ` [PATCH v3 08/19] prune_refs(): " Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 10/19] initial_ref_transaction_commit(): function for initial ref creation Michael Haggerty
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

It is no longer called from outside of the refs module. Also move its
docstring and change it to imperative voice.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |  9 ++++++++-
 refs.h | 11 -----------
 2 files changed, 8 insertions(+), 12 deletions(-)

diff --git a/refs.c b/refs.c
index 19afc4d..b21bb55 100644
--- a/refs.c
+++ b/refs.c
@@ -2724,7 +2724,14 @@ int pack_refs(unsigned int flags)
 	return 0;
 }
 
-int repack_without_refs(struct string_list *refnames, struct strbuf *err)
+/*
+ * Rewrite the packed-refs file, omitting any refs listed in
+ * 'refnames'. On error, leave packed-refs unchanged, write an error
+ * message to 'err', and return a nonzero value.
+ *
+ * The refs in 'refnames' needn't be sorted. `err` must not be NULL.
+ */
+static int repack_without_refs(struct string_list *refnames, struct strbuf *err)
 {
 	struct ref_dir *packed;
 	struct string_list_item *refname;
diff --git a/refs.h b/refs.h
index 1a5d44a..5f3bea7 100644
--- a/refs.h
+++ b/refs.h
@@ -154,17 +154,6 @@ extern void rollback_packed_refs(void);
  */
 int pack_refs(unsigned int flags);
 
-/*
- * Rewrite the packed-refs file, omitting any refs listed in
- * 'refnames'. On error, packed-refs will be unchanged, the return
- * value is nonzero, and a message about the error is written to the
- * 'err' strbuf.
- *
- * The refs in 'refnames' needn't be sorted. `err` must not be NULL.
- */
-extern int repack_without_refs(struct string_list *refnames,
-			       struct strbuf *err);
-
 extern int ref_exists(const char *);
 
 extern int is_branch(const char *refname);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 10/19] initial_ref_transaction_commit(): function for initial ref creation
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (8 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 09/19] repack_without_refs(): make function private Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 11/19] refs: remove some functions from the module's public interface Michael Haggerty
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

"git clone" uses shortcuts when creating the initial set of
references:

* It writes them directly to packed-refs.

* It doesn't lock the individual references (though it does lock the
  packed-refs file).

* It doesn't check for refname conflicts between two new references or
  between one new reference and any hypothetical old ones.

* It doesn't create reflog entries for the reference creations.

This functionality was implemented in builtin/clone.c. But really that
file shouldn't have such intimate knowledge of how references are
stored. So provide a new function in the refs API,
initial_ref_transaction_commit(), which can be used for initial
reference creation. The new function is based on the ref_transaction
interface.

This means that we can make some other functions private to the refs
module. That will be done in a followup commit.

It would seem to make sense to add a test here that there are no
existing references, because that is how the function *should* be
used. But in fact, the "testgit" remote helper appears to call it
*after* having set up refs/remotes/<name>/HEAD and
refs/remotes/<name>/master, so we can't be so strict. For now, the
function trusts its caller to only call it when it makes sense. Future
commits will add some more limited sanity checks.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/clone.c | 18 ++++++++++++++----
 refs.c          | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 refs.h          | 14 ++++++++++++++
 3 files changed, 75 insertions(+), 4 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index b878252..771d450 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -500,16 +500,26 @@ static void write_remote_refs(const struct ref *local_refs)
 {
 	const struct ref *r;
 
-	lock_packed_refs(LOCK_DIE_ON_ERROR);
+	struct ref_transaction *t;
+	struct strbuf err = STRBUF_INIT;
+
+	t = ref_transaction_begin(&err);
+	if (!t)
+		die("%s", err.buf);
 
 	for (r = local_refs; r; r = r->next) {
 		if (!r->peer_ref)
 			continue;
-		add_packed_ref(r->peer_ref->name, r->old_sha1);
+		if (ref_transaction_create(t, r->peer_ref->name, r->old_sha1,
+					   0, NULL, &err))
+			die("%s", err.buf);
 	}
 
-	if (commit_packed_refs())
-		die_errno("unable to overwrite old ref-pack file");
+	if (initial_ref_transaction_commit(t, &err))
+		die("%s", err.buf);
+
+	strbuf_release(&err);
+	ref_transaction_free(t);
 }
 
 static void write_followtags(const struct ref *refs, const char *msg)
diff --git a/refs.c b/refs.c
index b21bb55..2367572 100644
--- a/refs.c
+++ b/refs.c
@@ -4064,6 +4064,53 @@ cleanup:
 	return ret;
 }
 
+int initial_ref_transaction_commit(struct ref_transaction *transaction,
+				   struct strbuf *err)
+{
+	int ret = 0, i;
+	int n = transaction->nr;
+	struct ref_update **updates = transaction->updates;
+
+	assert(err);
+
+	if (transaction->state != REF_TRANSACTION_OPEN)
+		die("BUG: commit called for transaction that is not open");
+
+	for (i = 0; i < n; i++) {
+		struct ref_update *update = updates[i];
+
+		if ((update->flags & REF_HAVE_OLD) &&
+		    !is_null_sha1(update->old_sha1))
+			die("BUG: initial ref transaction with old_sha1 set");
+	}
+
+	if (lock_packed_refs(0)) {
+		strbuf_addf(err, "unable to lock packed-refs file: %s",
+			    strerror(errno));
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
+	for (i = 0; i < n; i++) {
+		struct ref_update *update = updates[i];
+
+		if ((update->flags & REF_HAVE_NEW) &&
+		    !is_null_sha1(update->new_sha1))
+			add_packed_ref(update->refname, update->new_sha1);
+	}
+
+	if (commit_packed_refs()) {
+		strbuf_addf(err, "unable to commit packed-refs file: %s",
+			    strerror(errno));
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
+cleanup:
+	transaction->state = REF_TRANSACTION_CLOSED;
+	return ret;
+}
+
 char *shorten_unambiguous_ref(const char *refname, int strict)
 {
 	int i;
diff --git a/refs.h b/refs.h
index 5f3bea7..9602889 100644
--- a/refs.h
+++ b/refs.h
@@ -366,6 +366,20 @@ int ref_transaction_commit(struct ref_transaction *transaction,
 			   struct strbuf *err);
 
 /*
+ * Like ref_transaction_commit(), but optimized for creating
+ * references when originally initializing a repository (e.g., by "git
+ * clone"). It writes the new references directly to packed-refs
+ * without locking the individual references.
+ *
+ * It is a bug to call this function when there might be other
+ * processes accessing the repository or if there are existing
+ * references that might conflict with the ones being created. All
+ * old_sha1 values must either be absent or NULL_SHA1.
+ */
+int initial_ref_transaction_commit(struct ref_transaction *transaction,
+				   struct strbuf *err);
+
+/*
  * Free an existing transaction and all associated data.
  */
 void ref_transaction_free(struct ref_transaction *transaction);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 11/19] refs: remove some functions from the module's public interface
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (9 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 10/19] initial_ref_transaction_commit(): function for initial ref creation Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs Michael Haggerty
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

The following functions are no longer used from outside the refs
module:

* lock_packed_refs()
* add_packed_ref()
* commit_packed_refs()
* rollback_packed_refs()

So make these functions private.

This is an important step, because it means that nobody outside of the
refs module needs to know the difference between loose and packed
references.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c | 31 ++++++++++++++++++++++++-------
 refs.h | 30 ------------------------------
 2 files changed, 24 insertions(+), 37 deletions(-)

diff --git a/refs.c b/refs.c
index 2367572..31661c7 100644
--- a/refs.c
+++ b/refs.c
@@ -1314,7 +1314,13 @@ static struct ref_dir *get_packed_refs(struct ref_cache *refs)
 	return get_packed_ref_dir(get_packed_ref_cache(refs));
 }
 
-void add_packed_ref(const char *refname, const unsigned char *sha1)
+/*
+ * Add a reference to the in-memory packed reference cache.  This may
+ * only be called while the packed-refs file is locked (see
+ * lock_packed_refs()).  To actually write the packed-refs file, call
+ * commit_packed_refs().
+ */
+static void add_packed_ref(const char *refname, const unsigned char *sha1)
 {
 	struct packed_ref_cache *packed_ref_cache =
 		get_packed_ref_cache(&ref_cache);
@@ -2503,8 +2509,12 @@ static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
 	return 0;
 }
 
-/* This should return a meaningful errno on failure */
-int lock_packed_refs(int flags)
+/*
+ * Lock the packed-refs file for writing. Flags is passed to
+ * hold_lock_file_for_update(). Return 0 on success. On errors, set
+ * errno appropriately and return a nonzero value.
+ */
+static int lock_packed_refs(int flags)
 {
 	static int timeout_configured = 0;
 	static int timeout_value = 1000;
@@ -2534,10 +2544,12 @@ int lock_packed_refs(int flags)
 }
 
 /*
- * Commit the packed refs changes.
- * On error we must make sure that errno contains a meaningful value.
+ * Write the current version of the packed refs cache from memory to
+ * disk. The packed-refs file must already be locked for writing (see
+ * lock_packed_refs()). Return zero on success. On errors, set errno
+ * and return a nonzero value
  */
-int commit_packed_refs(void)
+static int commit_packed_refs(void)
 {
 	struct packed_ref_cache *packed_ref_cache =
 		get_packed_ref_cache(&ref_cache);
@@ -2566,7 +2578,12 @@ int commit_packed_refs(void)
 	return error;
 }
 
-void rollback_packed_refs(void)
+/*
+ * Rollback the lockfile for the packed-refs file, and discard the
+ * in-memory packed reference cache.  (The packed-refs file will be
+ * read anew if it is needed again after this function is called.)
+ */
+static void rollback_packed_refs(void)
 {
 	struct packed_ref_cache *packed_ref_cache =
 		get_packed_ref_cache(&ref_cache);
diff --git a/refs.h b/refs.h
index 9602889..cd87f2f 100644
--- a/refs.h
+++ b/refs.h
@@ -111,36 +111,6 @@ extern void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refn
 extern void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list *refnames);
 
 /*
- * Lock the packed-refs file for writing.  Flags is passed to
- * hold_lock_file_for_update().  Return 0 on success.
- * Errno is set to something meaningful on error.
- */
-extern int lock_packed_refs(int flags);
-
-/*
- * Add a reference to the in-memory packed reference cache.  This may
- * only be called while the packed-refs file is locked (see
- * lock_packed_refs()).  To actually write the packed-refs file, call
- * commit_packed_refs().
- */
-extern void add_packed_ref(const char *refname, const unsigned char *sha1);
-
-/*
- * Write the current version of the packed refs cache from memory to
- * disk.  The packed-refs file must already be locked for writing (see
- * lock_packed_refs()).  Return zero on success.
- * Sets errno to something meaningful on error.
- */
-extern int commit_packed_refs(void);
-
-/*
- * Rollback the lockfile for the packed-refs file, and discard the
- * in-memory packed reference cache.  (The packed-refs file will be
- * read anew if it is needed again after this function is called.)
- */
-extern void rollback_packed_refs(void);
-
-/*
  * Flags for controlling behaviour of pack_refs()
  * PACK_REFS_PRUNE: Prune loose refs after packing
  * PACK_REFS_ALL:   Pack _all_ refs, not just tags and already packed refs
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (10 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 11/19] refs: remove some functions from the module's public interface Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 21:06   ` Junio C Hamano
  2015-06-22 14:03 ` [PATCH v3 13/19] initial_ref_transaction_commit(): check for ref D/F conflicts Michael Haggerty
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Error out if the ref_transaction includes more than one update for any
refname.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/refs.c b/refs.c
index 31661c7..53d9e45 100644
--- a/refs.c
+++ b/refs.c
@@ -4087,12 +4087,22 @@ int initial_ref_transaction_commit(struct ref_transaction *transaction,
 	int ret = 0, i;
 	int n = transaction->nr;
 	struct ref_update **updates = transaction->updates;
+	struct string_list affected_refnames = STRING_LIST_INIT_NODUP;
 
 	assert(err);
 
 	if (transaction->state != REF_TRANSACTION_OPEN)
 		die("BUG: commit called for transaction that is not open");
 
+	/* Fail if a refname appears more than once in the transaction: */
+	for (i = 0; i < n; i++)
+		string_list_append(&affected_refnames, updates[i]->refname);
+	string_list_sort(&affected_refnames);
+	if (ref_update_reject_duplicates(&affected_refnames, err)) {
+		ret = TRANSACTION_GENERIC_ERROR;
+		goto cleanup;
+	}
+
 	for (i = 0; i < n; i++) {
 		struct ref_update *update = updates[i];
 
@@ -4125,6 +4135,7 @@ int initial_ref_transaction_commit(struct ref_transaction *transaction,
 
 cleanup:
 	transaction->state = REF_TRANSACTION_CLOSED;
+	string_list_clear(&affected_refnames, 0);
 	return ret;
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 13/19] initial_ref_transaction_commit(): check for ref D/F conflicts
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (11 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 14/19] refs: move the remaining ref module declarations to refs.h Michael Haggerty
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

In initial_ref_transaction_commit(), check for D/F conflicts (i.e.,
the type of conflict that exists between "refs/foo" and
"refs/foo/bar") among the references being created and between the
references being created and any hypothetical existing references.

Ideally, there shouldn't *be* any existing references when this
function is called. But, at least in the case of the "testgit" remote
helper, "clone" can be called after the remote-tracking "HEAD" and
"master" branches have already been created. So let's just do the
full-blown check.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---

Notes (discussion):
    I'm a bit torn on this commit. Part of me says that we should avoid
    the overhead of the extra checks against loose_refs and packed_refs
    because there shouldn't be any references there anyway. Instead, the
    "testgit" remote helper should be fixed.
    
    But these tests should be pretty cheap, and we would want to check for
    ref D/F conflicts among the new references in any case, so it seems
    safer to include these checks. Also, our documentation suggests that
    users refer to git-remote-testgit(1) as an example when writing their
    own remote helpers, so it could be that there is other code out there
    that has imitated its light misuse of this functionality.

 refs.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/refs.c b/refs.c
index 53d9e45..4513ec7 100644
--- a/refs.c
+++ b/refs.c
@@ -4081,9 +4081,19 @@ cleanup:
 	return ret;
 }
 
+static int ref_present(const char *refname,
+		       const struct object_id *oid, int flags, void *cb_data)
+{
+	struct string_list *affected_refnames = cb_data;
+
+	return string_list_has_string(affected_refnames, refname);
+}
+
 int initial_ref_transaction_commit(struct ref_transaction *transaction,
 				   struct strbuf *err)
 {
+	struct ref_dir *loose_refs = get_loose_refs(&ref_cache);
+	struct ref_dir *packed_refs = get_packed_refs(&ref_cache);
 	int ret = 0, i;
 	int n = transaction->nr;
 	struct ref_update **updates = transaction->updates;
@@ -4103,12 +4113,36 @@ int initial_ref_transaction_commit(struct ref_transaction *transaction,
 		goto cleanup;
 	}
 
+	/*
+	 * It's really undefined to call this function in an active
+	 * repository or when there are existing references: we are
+	 * only locking and changing packed-refs, so (1) any
+	 * simultaneous processes might try to change a reference at
+	 * the same time we do, and (2) any existing loose versions of
+	 * the references that we are setting would have precedence
+	 * over our values. But some remote helpers create the remote
+	 * "HEAD" and "master" branches before calling this function,
+	 * so here we really only check that none of the references
+	 * that we are creating already exists.
+	 */
+	if (for_each_rawref(ref_present, &affected_refnames))
+		die("BUG: initial ref transaction called with existing refs");
+
 	for (i = 0; i < n; i++) {
 		struct ref_update *update = updates[i];
 
 		if ((update->flags & REF_HAVE_OLD) &&
 		    !is_null_sha1(update->old_sha1))
 			die("BUG: initial ref transaction with old_sha1 set");
+		if (verify_refname_available(update->refname,
+					     &affected_refnames, NULL,
+					     loose_refs, err) ||
+		    verify_refname_available(update->refname,
+					     &affected_refnames, NULL,
+					     packed_refs, err)) {
+			ret = TRANSACTION_NAME_CONFLICT;
+			goto cleanup;
+		}
 	}
 
 	if (lock_packed_refs(0)) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 14/19] refs: move the remaining ref module declarations to refs.h
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (12 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 13/19] initial_ref_transaction_commit(): check for ref D/F conflicts Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 15/19] refs.h: add some parameter names to function declarations Michael Haggerty
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Some functions from the refs module were still declared in cache.h.
Move them to refs.h.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 archive.c               |   1 +
 builtin/blame.c         |   1 +
 builtin/fast-export.c   |   1 +
 builtin/fmt-merge-msg.c |   1 +
 builtin/init-db.c       |   1 +
 builtin/log.c           |   1 +
 cache.h                 |  66 --------------------------
 refs.c                  |   6 ++-
 refs.h                  | 121 +++++++++++++++++++++++++++++++++++++++---------
 remote-testsvn.c        |   1 +
 10 files changed, 109 insertions(+), 91 deletions(-)

diff --git a/archive.c b/archive.c
index d37c41d..936a594 100644
--- a/archive.c
+++ b/archive.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "refs.h"
 #include "commit.h"
 #include "tree-walk.h"
 #include "attr.h"
diff --git a/builtin/blame.c b/builtin/blame.c
index b3e948e..1c998cb 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -6,6 +6,7 @@
  */
 
 #include "cache.h"
+#include "refs.h"
 #include "builtin.h"
 #include "blob.h"
 #include "commit.h"
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index b8182c2..d23f3be 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -5,6 +5,7 @@
  */
 #include "builtin.h"
 #include "cache.h"
+#include "refs.h"
 #include "commit.h"
 #include "object.h"
 #include "tag.h"
diff --git a/builtin/fmt-merge-msg.c b/builtin/fmt-merge-msg.c
index 05f4c26..4ba7f28 100644
--- a/builtin/fmt-merge-msg.c
+++ b/builtin/fmt-merge-msg.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "refs.h"
 #include "commit.h"
 #include "diff.h"
 #include "revision.h"
diff --git a/builtin/init-db.c b/builtin/init-db.c
index 4335738..49df78d 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -4,6 +4,7 @@
  * Copyright (C) Linus Torvalds, 2005
  */
 #include "cache.h"
+#include "refs.h"
 #include "builtin.h"
 #include "exec_cmd.h"
 #include "parse-options.h"
diff --git a/builtin/log.c b/builtin/log.c
index e67671e..3caa917 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -5,6 +5,7 @@
  *		 2006 Junio Hamano
  */
 #include "cache.h"
+#include "refs.h"
 #include "color.h"
 #include "commit.h"
 #include "diff.h"
diff --git a/cache.h b/cache.h
index be92121..1c00098 100644
--- a/cache.h
+++ b/cache.h
@@ -1009,76 +1009,10 @@ extern int get_oid_hex(const char *hex, struct object_id *sha1);
 
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
 extern char *oid_to_hex(const struct object_id *oid);	/* same static buffer as sha1_to_hex */
-extern int read_ref_full(const char *refname, int resolve_flags,
-			 unsigned char *sha1, int *flags);
-extern int read_ref(const char *refname, unsigned char *sha1);
 
-/*
- * Resolve a reference, recursively following symbolic refererences.
- *
- * Store the referred-to object's name in sha1 and return the name of
- * the non-symbolic reference that ultimately pointed at it.  The
- * return value, if not NULL, is a pointer into either a static buffer
- * or the input ref.
- *
- * If the reference cannot be resolved to an object, the behavior
- * depends on the RESOLVE_REF_READING flag:
- *
- * - If RESOLVE_REF_READING is set, return NULL.
- *
- * - If RESOLVE_REF_READING is not set, clear sha1 and return the name of
- *   the last reference name in the chain, which will either be a non-symbolic
- *   reference or an undefined reference.  If this is a prelude to
- *   "writing" to the ref, the return value is the name of the ref
- *   that will actually be created or changed.
- *
- * If the RESOLVE_REF_NO_RECURSE flag is passed, only resolves one
- * level of symbolic reference.  The value stored in sha1 for a symbolic
- * reference will always be null_sha1 in this case, and the return
- * value is the reference that the symref refers to directly.
- *
- * If flags is non-NULL, set the value that it points to the
- * combination of REF_ISPACKED (if the reference was found among the
- * packed references), REF_ISSYMREF (if the initial reference was a
- * symbolic reference), REF_BAD_NAME (if the reference name is ill
- * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN
- * (if the ref is malformed or has a bad name). See refs.h for more detail
- * on each flag.
- *
- * If ref is not a properly-formatted, normalized reference, return
- * NULL.  If more than MAXDEPTH recursive symbolic lookups are needed,
- * give up and return NULL.
- *
- * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their
- * name is invalid according to git-check-ref-format(1).  If the name
- * is bad then the value stored in sha1 will be null_sha1 and the two
- * flags REF_ISBROKEN and REF_BAD_NAME will be set.
- *
- * Even with RESOLVE_REF_ALLOW_BAD_NAME, names that escape the refs/
- * directory and do not consist of all caps and underscores cannot be
- * resolved. The function returns NULL for such ref names.
- * Caps and underscores refers to the special refs, such as HEAD,
- * FETCH_HEAD and friends, that all live outside of the refs/ directory.
- */
-#define RESOLVE_REF_READING 0x01
-#define RESOLVE_REF_NO_RECURSE 0x02
-#define RESOLVE_REF_ALLOW_BAD_NAME 0x04
-extern const char *resolve_ref_unsafe(const char *ref, int resolve_flags, unsigned char *sha1, int *flags);
-extern char *resolve_refdup(const char *ref, int resolve_flags, unsigned char *sha1, int *flags);
-
-extern int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref);
-extern int dwim_log(const char *str, int len, unsigned char *sha1, char **ref);
 extern int interpret_branch_name(const char *str, int len, struct strbuf *);
 extern int get_sha1_mb(const char *str, unsigned char *sha1);
 
-/*
- * Return true iff abbrev_name is a possible abbreviation for
- * full_name according to the rules defined by ref_rev_parse_rules in
- * refs.c.
- */
-extern int refname_match(const char *abbrev_name, const char *full_name);
-
-extern int create_symref(const char *ref, const char *refs_heads_master, const char *logmsg);
 extern int validate_headref(const char *ref);
 
 extern int base_name_compare(const char *name1, int len1, int mode1, const char *name2, int len2, int mode2);
diff --git a/refs.c b/refs.c
index 4513ec7..2ba618e 100644
--- a/refs.c
+++ b/refs.c
@@ -1732,9 +1732,11 @@ const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
 	return ret;
 }
 
-char *resolve_refdup(const char *ref, int resolve_flags, unsigned char *sha1, int *flags)
+char *resolve_refdup(const char *refname, int resolve_flags,
+		     unsigned char *sha1, int *flags)
 {
-	return xstrdup_or_null(resolve_ref_unsafe(ref, resolve_flags, sha1, flags));
+	return xstrdup_or_null(resolve_ref_unsafe(refname, resolve_flags,
+						  sha1, flags));
 }
 
 /* The argument to filter_refs */
diff --git a/refs.h b/refs.h
index cd87f2f..b22e308 100644
--- a/refs.h
+++ b/refs.h
@@ -2,6 +2,98 @@
 #define REFS_H
 
 /*
+ * Resolve a reference, recursively following symbolic refererences.
+ *
+ * Store the referred-to object's name in sha1 and return the name of
+ * the non-symbolic reference that ultimately pointed at it.  The
+ * return value, if not NULL, is a pointer into either a static buffer
+ * or the input ref.
+ *
+ * If the reference cannot be resolved to an object, the behavior
+ * depends on the RESOLVE_REF_READING flag:
+ *
+ * - If RESOLVE_REF_READING is set, return NULL.
+ *
+ * - If RESOLVE_REF_READING is not set, clear sha1 and return the name of
+ *   the last reference name in the chain, which will either be a non-symbolic
+ *   reference or an undefined reference.  If this is a prelude to
+ *   "writing" to the ref, the return value is the name of the ref
+ *   that will actually be created or changed.
+ *
+ * If the RESOLVE_REF_NO_RECURSE flag is passed, only resolves one
+ * level of symbolic reference.  The value stored in sha1 for a symbolic
+ * reference will always be null_sha1 in this case, and the return
+ * value is the reference that the symref refers to directly.
+ *
+ * If flags is non-NULL, set the value that it points to the
+ * combination of REF_ISPACKED (if the reference was found among the
+ * packed references), REF_ISSYMREF (if the initial reference was a
+ * symbolic reference), REF_BAD_NAME (if the reference name is ill
+ * formed --- see RESOLVE_REF_ALLOW_BAD_NAME below), and REF_ISBROKEN
+ * (if the ref is malformed or has a bad name). See refs.h for more detail
+ * on each flag.
+ *
+ * If ref is not a properly-formatted, normalized reference, return
+ * NULL.  If more than MAXDEPTH recursive symbolic lookups are needed,
+ * give up and return NULL.
+ *
+ * RESOLVE_REF_ALLOW_BAD_NAME allows resolving refs even when their
+ * name is invalid according to git-check-ref-format(1).  If the name
+ * is bad then the value stored in sha1 will be null_sha1 and the two
+ * flags REF_ISBROKEN and REF_BAD_NAME will be set.
+ *
+ * Even with RESOLVE_REF_ALLOW_BAD_NAME, names that escape the refs/
+ * directory and do not consist of all caps and underscores cannot be
+ * resolved. The function returns NULL for such ref names.
+ * Caps and underscores refers to the special refs, such as HEAD,
+ * FETCH_HEAD and friends, that all live outside of the refs/ directory.
+ */
+#define RESOLVE_REF_READING 0x01
+#define RESOLVE_REF_NO_RECURSE 0x02
+#define RESOLVE_REF_ALLOW_BAD_NAME 0x04
+
+extern const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
+				      unsigned char *sha1, int *flags);
+
+extern char *resolve_refdup(const char *refname, int resolve_flags,
+			    unsigned char *sha1, int *flags);
+
+extern int read_ref_full(const char *refname, int resolve_flags,
+			 unsigned char *sha1, int *flags);
+extern int read_ref(const char *refname, unsigned char *sha1);
+
+extern int ref_exists(const char *);
+
+extern int is_branch(const char *refname);
+
+/*
+ * If refname is a non-symbolic reference that refers to a tag object,
+ * and the tag can be (recursively) dereferenced to a non-tag object,
+ * store the SHA1 of the referred-to object to sha1 and return 0.  If
+ * any of these conditions are not met, return a non-zero value.
+ * Symbolic references are considered unpeelable, even if they
+ * ultimately resolve to a peelable tag.
+ */
+extern int peel_ref(const char *refname, unsigned char *sha1);
+
+/**
+ * Resolve refname in the nested "gitlink" repository that is located
+ * at path.  If the resolution is successful, return 0 and set sha1 to
+ * the name of the object; otherwise, return a non-zero value.
+ */
+extern int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *sha1);
+
+/*
+ * Return true iff abbrev_name is a possible abbreviation for
+ * full_name according to the rules defined by ref_rev_parse_rules in
+ * refs.c.
+ */
+extern int refname_match(const char *abbrev_name, const char *full_name);
+
+extern int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref);
+extern int dwim_log(const char *str, int len, unsigned char *sha1, char **ref);
+
+/*
  * A ref_transaction represents a collection of ref updates
  * that should succeed or fail together.
  *
@@ -99,14 +191,14 @@ extern int for_each_remote_ref_submodule(const char *submodule, each_ref_fn fn,
 extern int head_ref_namespaced(each_ref_fn fn, void *cb_data);
 extern int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
 
+/* can be used to learn about broken ref and symref */
+extern int for_each_rawref(each_ref_fn, void *);
+
 static inline const char *has_glob_specials(const char *pattern)
 {
 	return strpbrk(pattern, "?*[");
 }
 
-/* can be used to learn about broken ref and symref */
-extern int for_each_rawref(each_ref_fn, void *);
-
 extern void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname);
 extern void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list *refnames);
 
@@ -124,20 +216,6 @@ extern void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct st
  */
 int pack_refs(unsigned int flags);
 
-extern int ref_exists(const char *);
-
-extern int is_branch(const char *refname);
-
-/*
- * If refname is a non-symbolic reference that refers to a tag object,
- * and the tag can be (recursively) dereferenced to a non-tag object,
- * store the SHA1 of the referred-to object to sha1 and return 0.  If
- * any of these conditions are not met, return a non-zero value.
- * Symbolic references are considered unpeelable, even if they
- * ultimately resolve to a peelable tag.
- */
-extern int peel_ref(const char *refname, unsigned char *sha1);
-
 /*
  * Flags controlling ref_transaction_update(), ref_transaction_create(), etc.
  * REF_NODEREF: act on the ref directly, instead of dereferencing
@@ -206,17 +284,13 @@ extern int for_each_reflog(each_ref_fn, void *);
 extern int check_refname_format(const char *refname, int flags);
 
 extern const char *prettify_refname(const char *refname);
+
 extern char *shorten_unambiguous_ref(const char *refname, int strict);
 
 /** rename ref, return 0 on success **/
 extern int rename_ref(const char *oldref, const char *newref, const char *logmsg);
 
-/**
- * Resolve refname in the nested "gitlink" repository that is located
- * at path.  If the resolution is successful, return 0 and set sha1 to
- * the name of the object; otherwise, return a non-zero value.
- */
-extern int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *sha1);
+extern int create_symref(const char *ref, const char *refs_heads_master, const char *logmsg);
 
 enum action_on_err {
 	UPDATE_REFS_MSG_ON_ERR,
@@ -367,6 +441,7 @@ int update_ref(const char *msg, const char *refname,
 	       unsigned int flags, enum action_on_err onerr);
 
 extern int parse_hide_refs_config(const char *var, const char *value, const char *);
+
 extern int ref_is_hidden(const char *);
 
 enum expire_reflog_flags {
diff --git a/remote-testsvn.c b/remote-testsvn.c
index 48bf6eb..f599c37 100644
--- a/remote-testsvn.c
+++ b/remote-testsvn.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "refs.h"
 #include "remote.h"
 #include "strbuf.h"
 #include "url.h"
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 15/19] refs.h: add some parameter names to function declarations
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (13 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 14/19] refs: move the remaining ref module declarations to refs.h Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 16/19] check_branch_commit(): make first parameter const Michael Haggerty
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.h | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/refs.h b/refs.h
index b22e308..c9596ea 100644
--- a/refs.h
+++ b/refs.h
@@ -62,7 +62,7 @@ extern int read_ref_full(const char *refname, int resolve_flags,
 			 unsigned char *sha1, int *flags);
 extern int read_ref(const char *refname, unsigned char *sha1);
 
-extern int ref_exists(const char *);
+extern int ref_exists(const char *refname);
 
 extern int is_branch(const char *refname);
 
@@ -170,15 +170,15 @@ typedef int each_ref_fn(const char *refname,
  * modifies the reference also returns a nonzero value to immediately
  * stop the iteration.
  */
-extern int head_ref(each_ref_fn, void *);
-extern int for_each_ref(each_ref_fn, void *);
-extern int for_each_ref_in(const char *, each_ref_fn, void *);
-extern int for_each_tag_ref(each_ref_fn, void *);
-extern int for_each_branch_ref(each_ref_fn, void *);
-extern int for_each_remote_ref(each_ref_fn, void *);
-extern int for_each_replace_ref(each_ref_fn, void *);
-extern int for_each_glob_ref(each_ref_fn, const char *pattern, void *);
-extern int for_each_glob_ref_in(each_ref_fn, const char *pattern, const char* prefix, void *);
+extern int head_ref(each_ref_fn fn, void *cb_data);
+extern int for_each_ref(each_ref_fn fn, void *cb_data);
+extern int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
+extern int for_each_tag_ref(each_ref_fn fn, void *cb_data);
+extern int for_each_branch_ref(each_ref_fn fn, void *cb_data);
+extern int for_each_remote_ref(each_ref_fn fn, void *cb_data);
+extern int for_each_replace_ref(each_ref_fn fn, void *cb_data);
+extern int for_each_glob_ref(each_ref_fn fn, const char *pattern, void *cb_data);
+extern int for_each_glob_ref_in(each_ref_fn fn, const char *pattern, const char *prefix, void *cb_data);
 
 extern int head_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
 extern int for_each_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
@@ -192,7 +192,7 @@ extern int head_ref_namespaced(each_ref_fn fn, void *cb_data);
 extern int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
 
 /* can be used to learn about broken ref and symref */
-extern int for_each_rawref(each_ref_fn, void *);
+extern int for_each_rawref(each_ref_fn fn, void *cb_data);
 
 static inline const char *has_glob_specials(const char *pattern)
 {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 16/19] check_branch_commit(): make first parameter const
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (14 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 15/19] refs.h: add some parameter names to function declarations Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 17/19] update_ref(): don't read old reference value before delete Michael Haggerty
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Make it clear that this function does not overwrite its first
argument.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/branch.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index b42e5b6..47e3eb9 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -160,7 +160,7 @@ static int branch_merged(int kind, const char *name,
 }
 
 static int check_branch_commit(const char *branchname, const char *refname,
-			       unsigned char *sha1, struct commit *head_rev,
+			       const unsigned char *sha1, struct commit *head_rev,
 			       int kinds, int force)
 {
 	struct commit *rev = lookup_commit_reference(sha1);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 17/19] update_ref(): don't read old reference value before delete
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (15 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 16/19] check_branch_commit(): make first parameter const Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 18/19] cmd_update_ref(): make logic more straightforward Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 19/19] delete_ref(): use the usual convention for old_sha1 Michael Haggerty
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

If we are deleting the reference, then we don't need to read the
reference's old value. It doesn't provide any race safety, because the
value read just before the delete is no "better" than the value that
would be read under lock during the delete. And even if the reference
previously didn't exist, we can call delete_ref() on it if we don't
provide an old_sha1 value.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 fast-import.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fast-import.c b/fast-import.c
index 6378726..d7ed065 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1692,13 +1692,13 @@ static int update_branch(struct branch *b)
 	unsigned char old_sha1[20];
 	struct strbuf err = STRBUF_INIT;
 
-	if (read_ref(b->name, old_sha1))
-		hashclr(old_sha1);
 	if (is_null_sha1(b->sha1)) {
 		if (b->delete)
-			delete_ref(b->name, old_sha1, 0);
+			delete_ref(b->name, NULL, 0);
 		return 0;
 	}
+	if (read_ref(b->name, old_sha1))
+		hashclr(old_sha1);
 	if (!force_update && !is_null_sha1(old_sha1)) {
 		struct commit *old_cmit, *new_cmit;
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 18/19] cmd_update_ref(): make logic more straightforward
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (16 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 17/19] update_ref(): don't read old reference value before delete Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 14:03 ` [PATCH v3 19/19] delete_ref(): use the usual convention for old_sha1 Michael Haggerty
  18 siblings, 0 replies; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

Restructure the code to avoid clearing oldsha1 when oldval is unset.
It's value is not needed in that case, so this change makes it more
obvious that its initialization is consistent with its later use.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/update-ref.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 3d79a46..160c7ac 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -408,9 +408,16 @@ int cmd_update_ref(int argc, const char **argv, const char *prefix)
 			die("%s: not a valid SHA1", value);
 	}
 
-	hashclr(oldsha1); /* all-zero hash in case oldval is the empty string */
-	if (oldval && *oldval && get_sha1(oldval, oldsha1))
-		die("%s: not a valid old SHA1", oldval);
+	if (oldval) {
+		if (!*oldval)
+			/*
+			 * The empty string implies that the reference
+			 * must not already exist:
+			 */
+			hashclr(oldsha1);
+		else if (get_sha1(oldval, oldsha1))
+			die("%s: not a valid old SHA1", oldval);
+	}
 
 	if (no_deref)
 		flags = REF_NODEREF;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 19/19] delete_ref(): use the usual convention for old_sha1
  2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
                   ` (17 preceding siblings ...)
  2015-06-22 14:03 ` [PATCH v3 18/19] cmd_update_ref(): make logic more straightforward Michael Haggerty
@ 2015-06-22 14:03 ` Michael Haggerty
  2015-06-22 21:10   ` Junio C Hamano
  18 siblings, 1 reply; 24+ messages in thread
From: Michael Haggerty @ 2015-06-22 14:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git, Michael Haggerty

The ref_transaction_update() family of functions use the following
convention for their old_sha1 parameters:

* old_sha1 == NULL: Don't check the old value at all.
* is_null_sha1(old_sha1): Ensure that the reference didn't exist
  before the transaction.
* otherwise: Ensure that the reference had the specified value before
  the transaction.

delete_ref() had a different convention, namely treating
is_null_sha1(old_sha1) as "don't care". Change it to adhere to the
standard convention to reduce the scope for confusion.

Please note that it is now a bug to pass old_sha1=NULL_SHA1 to
delete_ref() (because it doesn't make sense to delete a reference that
you already know doesn't exist). This is consistent with the behavior
of ref_transaction_delete().

Most of the callers of delete_ref() never pass old_sha1=NULL_SHA1 to
delete_ref(), and are therefore unaffected by this change. The
two exceptions are:

* The call in cmd_update_ref(), which passed NULL_SHA1 if the old
  value passed in on the command line was 0{40} or the empty string.
  Change that caller to pass NULL in those cases.

  Arguably, it should be an error to call "update-ref -d" with the old
  value set to "does not exist", just as it is for the `--stdin`
  command "delete". But since this usage was accepted until now,
  continue to accept it.

* The call in delete_branches(), which could pass NULL_SHA1 if
  deleting a broken or symbolic ref. Change it to pass NULL in these
  cases.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/branch.c     |  3 ++-
 builtin/update-ref.c |  8 +++++++-
 refs.c               |  8 --------
 refs.h               | 10 +++++-----
 4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 47e3eb9..58aa84f 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -253,7 +253,8 @@ static int delete_branches(int argc, const char **argv, int force, int kinds,
 			continue;
 		}
 
-		if (delete_ref(name, sha1, REF_NODEREF)) {
+		if (delete_ref(name, is_null_sha1(sha1) ? NULL : sha1,
+			       REF_NODEREF)) {
 			error(remote_branch
 			      ? _("Error deleting remote-tracking branch '%s'")
 			      : _("Error deleting branch '%s'"),
diff --git a/builtin/update-ref.c b/builtin/update-ref.c
index 160c7ac..6763cf1 100644
--- a/builtin/update-ref.c
+++ b/builtin/update-ref.c
@@ -422,7 +422,13 @@ int cmd_update_ref(int argc, const char **argv, const char *prefix)
 	if (no_deref)
 		flags = REF_NODEREF;
 	if (delete)
-		return delete_ref(refname, oldval ? oldsha1 : NULL, flags);
+		/*
+		 * For purposes of backwards compatibility, we treat
+		 * NULL_SHA1 as "don't care" here:
+		 */
+		return delete_ref(refname,
+				  (oldval && !is_null_sha1(oldsha1)) ? oldsha1 : NULL,
+				  flags);
 	else
 		return update_ref(msg, refname, sha1, oldval ? oldsha1 : NULL,
 				  flags, UPDATE_REFS_DIE_ON_ERR);
diff --git a/refs.c b/refs.c
index 2ba618e..adf3dc2 100644
--- a/refs.c
+++ b/refs.c
@@ -2821,14 +2821,6 @@ int delete_ref(const char *refname, const unsigned char *old_sha1,
 	struct ref_transaction *transaction;
 	struct strbuf err = STRBUF_INIT;
 
-	/*
-	 * Treat NULL_SHA1 and NULL alike, to mean "we don't care what
-	 * the old value of the reference was (or even if it didn't
-	 * exist)":
-	 */
-	if (old_sha1 && is_null_sha1(old_sha1))
-		old_sha1 = NULL;
-
 	transaction = ref_transaction_begin(&err);
 	if (!transaction ||
 	    ref_transaction_delete(transaction, refname, old_sha1,
diff --git a/refs.h b/refs.h
index c9596ea..e82fca5 100644
--- a/refs.h
+++ b/refs.h
@@ -240,11 +240,11 @@ extern int read_ref_at(const char *refname, unsigned int flags,
 extern int reflog_exists(const char *refname);
 
 /*
- * Delete the specified reference. If old_sha1 is non-NULL and not
- * NULL_SHA1, then verify that the current value of the reference is
- * old_sha1 before deleting it. If old_sha1 is NULL or NULL_SHA1,
- * delete the reference if it exists, regardless of its old value.
- * flags is passed through to ref_transaction_delete().
+ * Delete the specified reference. If old_sha1 is non-NULL, then
+ * verify that the current value of the reference is old_sha1 before
+ * deleting it. If old_sha1 is NULL, delete the reference if it
+ * exists, regardless of its old value. It is an error for old_sha1 to
+ * be NULL_SHA1. flags is passed through to ref_transaction_delete().
  */
 extern int delete_ref(const char *refname, const unsigned char *old_sha1,
 		      unsigned int flags);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs
  2015-06-22 14:03 ` [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs Michael Haggerty
@ 2015-06-22 21:06   ` Junio C Hamano
  2015-06-23  7:11     ` Michael Haggerty
  0 siblings, 1 reply; 24+ messages in thread
From: Junio C Hamano @ 2015-06-22 21:06 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Stefan Beller, Jeff King, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Error out if the ref_transaction includes more than one update for any
> refname.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  refs.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)

This somehow feels like "ehh, I now know better and this function
should have been like this from the beginning" to me.

But that is OK.

Is the initial creation logic too fragile to deserve its own
function to force callers to think about it, by the way?

What I am wondering is if we could turn the safety logic that appear
here (i.e. no existing refs must be assumed by the set of updates,
etc.)  into an optimization cue and implement this as a special case
helper to ref_transaction_commit(), i.e.

	ref_transaction_commit(...)
        {
		if (updates are all initial creation &&
                    no existing refs in repository)
			return initial_ref_transaction_commit(...);
		/* otherwise we do the usual thing */
		...
	}

and have "clone" call ref_transaction_commit() as usual.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 19/19] delete_ref(): use the usual convention for old_sha1
  2015-06-22 14:03 ` [PATCH v3 19/19] delete_ref(): use the usual convention for old_sha1 Michael Haggerty
@ 2015-06-22 21:10   ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2015-06-22 21:10 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Stefan Beller, Jeff King, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> The ref_transaction_update() family of functions use the following
> convention for their old_sha1 parameters:
>
> * old_sha1 == NULL: Don't check the old value at all.
> * is_null_sha1(old_sha1): Ensure that the reference didn't exist
>   before the transaction.
> * otherwise: Ensure that the reference had the specified value before
>   the transaction.
>
> delete_ref() had a different convention, namely treating
> is_null_sha1(old_sha1) as "don't care". Change it to adhere to the
> standard convention to reduce the scope for confusion.
>
> Please note that it is now a bug to pass old_sha1=NULL_SHA1 to
> delete_ref() (because it doesn't make sense to delete a reference that
> you already know doesn't exist). This is consistent with the behavior
> of ref_transaction_delete().

Nice.

Everything I read in this round of changes makes sense (except for
the ones I had minor comments on, which were separately sent).

Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs
  2015-06-22 21:06   ` Junio C Hamano
@ 2015-06-23  7:11     ` Michael Haggerty
  2015-06-23 17:44       ` Junio C Hamano
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Haggerty @ 2015-06-23  7:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, Jeff King, git

On 06/22/2015 11:06 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> Error out if the ref_transaction includes more than one update for any
>> refname.
>>
>> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
>> ---
>>  refs.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
> 
> This somehow feels like "ehh, I now know better and this function
> should have been like this from the beginning" to me.
> 
> But that is OK.
> 
> Is the initial creation logic too fragile to deserve its own
> function to force callers to think about it, by the way?
> 
> What I am wondering is if we could turn the safety logic that appear
> here (i.e. no existing refs must be assumed by the set of updates,
> etc.)  into an optimization cue and implement this as a special case
> helper to ref_transaction_commit(), i.e.
> 
> 	ref_transaction_commit(...)
>         {
> 		if (updates are all initial creation &&
>                     no existing refs in repository)
> 			return initial_ref_transaction_commit(...);
> 		/* otherwise we do the usual thing */
> 		...
> 	}
> 
> and have "clone" call ref_transaction_commit() as usual.

The safety logic in this function is (approximately) necessary, but not
sufficient, to guarantee safety. One of the shortcuts that it takes is
not locking the references while they are being created. Therefore, it
would be unsafe for one process to call ref_transaction_commit() while
another is calling initial_ref_transaction_commit(). So the caller has
to "know" somehow that no other processes are working in the repository
for this optimization to be safe. It conveys that knowledge by calling
initial_ref_transaction_commit() rather than ref_transaction_commit().

Of course the next question is, "How does `git clone` know that no other
process is working in the new repository?" Actually, it doesn't. For
example, I just verified that I can run

    git clone $URL mygit &
    sleep 0.1
    cd mygit
    git commit --allow-empty -m "New root commit"

and thereby "overwrite" the upstream `master` without the usual
non-fast-forward protection. I guess we are just relying on the user's
common sense not to run Git commands in a new repository before its
creation is complete.

I suppose we *could* special-case `git clone` to not finish the
initialization of the repository (for example, not write the `config`
file) until *after* the packed-refs file is written. This would prevent
other git processes from recognizing the directory as a Git repository
and so prevent them from running before the clone is finished.

But I think if anything it would make more sense to go the other direction:

* Teach ref_transaction_commit() an option that asks it to write
  references updates to packed-refs instead of loose refs (but
  locking the references as usual).

* Change clone to use ref_transaction_commit() like everybody
  else, passing it the new REFS_WRITE_TO_PACKED_REFS option.

Then clone would participate in the normal locking protocol, and it
wouldn't *matter* if another process runs before the clone is finished.
There would also be some consistency benefits. For example, if
core.logallrefupdates is set globally or on the command line, the
initial reference creations would be reflogged. And other operations
that write references in bulk could use the new
REFS_WRITE_TO_PACKED_REFS option to prevent loose reference proliferation.

But I don't think any of this is a problem in practice, and I think we
can live with using the optimized-but-not-100%-safe
initial_ref_transaction_commit() for cloning.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs
  2015-06-23  7:11     ` Michael Haggerty
@ 2015-06-23 17:44       ` Junio C Hamano
  0 siblings, 0 replies; 24+ messages in thread
From: Junio C Hamano @ 2015-06-23 17:44 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Stefan Beller, Jeff King, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> On 06/22/2015 11:06 PM, Junio C Hamano wrote:
> ...
>> What I am wondering is if we could turn the safety logic that appear
>> here (i.e. no existing refs must be assumed by the set of updates,
>> etc.)  into an optimization cue and implement this as a special case
>> helper to ref_transaction_commit(), i.e.
>> 
>> 	ref_transaction_commit(...)
>>         {
>> 		if (updates are all initial creation &&
>>                     no existing refs in repository)
>> 			return initial_ref_transaction_commit(...);
>> 		/* otherwise we do the usual thing */
>> 		...
>> 	}
>> 
>> and have "clone" call ref_transaction_commit() as usual.
>
> The safety logic in this function is (approximately) necessary, but not
> sufficient, to guarantee safety.

Oh, no question about it, and you do not even have to bring up an
insane "user runs random commands while Git is hard working on it"
non use-case ;-)

> One of the shortcuts that it takes is
> not locking the references while they are being created. Therefore, it
> would be unsafe for one process to call ref_transaction_commit() while
> another is calling initial_ref_transaction_commit(). So the caller has
> to "know" somehow that no other processes are working in the repository
> for this optimization to be safe. It conveys that knowledge by calling
> initial_ref_transaction_commit() rather than ref_transaction_commit().

OK.  So the answer to my first question "is the initial creation
logic too fragile" is a resounding "yes"; the caller should know
that it is too crazy for the user to be competing with what it is
doing before deciding to call initial_ref_transaction_commit(),
hence we cannot automatically detect if it is safe from within
ref_transaction_commit() to use this logic as an optimization.

> But I think if anything it would make more sense to go the other direction:
>
> * Teach ref_transaction_commit() an option that asks it to write
>   references updates to packed-refs instead of loose refs (but
>   locking the references as usual).
>
> * Change clone to use ref_transaction_commit() like everybody
>   else, passing it the new REFS_WRITE_TO_PACKED_REFS option.
>
> Then clone would participate in the normal locking protocol, and it
> wouldn't *matter* if another process runs before the clone is finished.

Yeah, I thought that was actually I was driving at, and doing so
without that write-to-packed-refs option, which I'd prefer to leave
it as an optimization inside ref_transaction_commit().

Except that I missed that the initial_* variant is even more
aggressive (i.e. not locking), so no such optimization is safe.

> There would also be some consistency benefits. For example, if
> core.logallrefupdates is set globally or on the command line, the
> initial reference creations would be reflogged. And other operations
> that write references in bulk could use the new
> REFS_WRITE_TO_PACKED_REFS option to prevent loose reference proliferation.
>
> But I don't think any of this is a problem in practice, and I think we
> can live with using the optimized-but-not-100%-safe
> initial_ref_transaction_commit() for cloning.

OK.

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-06-23 17:44 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-22 14:02 [PATCH v3 00/19] Improve "refs" encapsulation and speed up deletes Michael Haggerty
2015-06-22 14:02 ` [PATCH v3 01/19] delete_ref(): move declaration to refs.h Michael Haggerty
2015-06-22 14:02 ` [PATCH v3 02/19] remove_branches(): remove temporary Michael Haggerty
2015-06-22 14:02 ` [PATCH v3 03/19] delete_ref(): handle special case more explicitly Michael Haggerty
2015-06-22 14:02 ` [PATCH v3 04/19] delete_refs(): new function for the refs API Michael Haggerty
2015-06-22 14:02 ` [PATCH v3 05/19] delete_refs(): make error message more generic Michael Haggerty
2015-06-22 14:02 ` [PATCH v3 06/19] delete_refs(): bail early if the packed-refs file cannot be rewritten Michael Haggerty
2015-06-22 14:02 ` [PATCH v3 07/19] prune_remote(): use delete_refs() Michael Haggerty
2015-06-22 14:02 ` [PATCH v3 08/19] prune_refs(): " Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 09/19] repack_without_refs(): make function private Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 10/19] initial_ref_transaction_commit(): function for initial ref creation Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 11/19] refs: remove some functions from the module's public interface Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs Michael Haggerty
2015-06-22 21:06   ` Junio C Hamano
2015-06-23  7:11     ` Michael Haggerty
2015-06-23 17:44       ` Junio C Hamano
2015-06-22 14:03 ` [PATCH v3 13/19] initial_ref_transaction_commit(): check for ref D/F conflicts Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 14/19] refs: move the remaining ref module declarations to refs.h Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 15/19] refs.h: add some parameter names to function declarations Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 16/19] check_branch_commit(): make first parameter const Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 17/19] update_ref(): don't read old reference value before delete Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 18/19] cmd_update_ref(): make logic more straightforward Michael Haggerty
2015-06-22 14:03 ` [PATCH v3 19/19] delete_ref(): use the usual convention for old_sha1 Michael Haggerty
2015-06-22 21:10   ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.