git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Remove a user of extra_refs in clone
@ 2012-01-17  5:50 mhagger
  2012-01-17  5:50 ` [PATCH 1/4] pack_refs(): remove redundant check mhagger
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: mhagger @ 2012-01-17  5:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Jakub Narebski, Heiko Voigt, Johan Herland,
	Michael Haggerty

From: Michael Haggerty <mhagger@alum.mit.edu>

When cloning, write_remote_refs() creates local packed refs from the
refs read from the remote repository.  It does this by creating extra
refs for the references then calling pack_refs() to bake the extra
refs into the packed-refs file, then calling clear_extra_refs().

This is silly and relies on the kludgy extra_refs mechanism, which I
want to get rid of.  Instead, add a function call add_packed_ref() to
the refs API, and use it to create packed refs (in the memory cache)
directly.  Then call pack_refs() as before to write the packed-refs
file.

Because the new add_packed_ref() function allows references (perhaps
many of them) to be added to an existing ref_array, it would be
inefficient to re-sort the list after every addition.  So instead,
append new entries to the end of the ref_array and note that the array
is unsorted.  Then, before the ref_array is used, check if it is
unsorted and sort it if necessary.

A side effect of this change is that the new packed references are
left in the in-memory packed reference cache after the return from
write_remote_refs() (whereas previously, the refs were stored as
temporary extra refs that were purged before return from the
function).  I can't see any place in the following code where this
would make a difference, but there is quite a bit of code there so it
is hard to audit.  Confirmation that this is OK would be welcome.

Michael Haggerty (4):
  pack_refs(): remove redundant check
  ref_array: keep track of whether references are sorted
  add_packed_ref(): new function in the refs API.
  write_remote_refs(): create packed (rather than extra) refs

 builtin/clone.c |    3 +--
 pack-refs.c     |    3 +--
 refs.c          |   39 ++++++++++++++++++++++++++++++++-------
 refs.h          |    6 ++++++
 4 files changed, 40 insertions(+), 11 deletions(-)

-- 
1.7.8.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] pack_refs(): remove redundant check
  2012-01-17  5:50 [PATCH 0/4] Remove a user of extra_refs in clone mhagger
@ 2012-01-17  5:50 ` mhagger
  2012-01-17  5:50 ` [PATCH 2/4] ref_array: keep track of whether references are sorted mhagger
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: mhagger @ 2012-01-17  5:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Jakub Narebski, Heiko Voigt, Johan Herland,
	Michael Haggerty

From: Michael Haggerty <mhagger@alum.mit.edu>

handle_one_ref() only adds refs to the cbdata.ref_to_prune list if
(cbdata.flags & PACK_REFS_PRUNE) is set.  So any references in this
list at the end of pack_refs() can be pruned unconditionally.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
Trivial simplification, not essential to the rest of this patch
series.

 pack-refs.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/pack-refs.c b/pack-refs.c
index 23bbd00..f09a054 100644
--- a/pack-refs.c
+++ b/pack-refs.c
@@ -143,7 +143,6 @@ int pack_refs(unsigned int flags)
 	packed.fd = -1;
 	if (commit_lock_file(&packed) < 0)
 		die_errno("unable to overwrite old ref-pack file");
-	if (cbdata.flags & PACK_REFS_PRUNE)
-		prune_refs(cbdata.ref_to_prune);
+	prune_refs(cbdata.ref_to_prune);
 	return 0;
 }
-- 
1.7.8.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/4] ref_array: keep track of whether references are sorted
  2012-01-17  5:50 [PATCH 0/4] Remove a user of extra_refs in clone mhagger
  2012-01-17  5:50 ` [PATCH 1/4] pack_refs(): remove redundant check mhagger
@ 2012-01-17  5:50 ` mhagger
  2012-01-17  5:50 ` [PATCH 3/4] add_packed_ref(): new function in the refs API mhagger
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: mhagger @ 2012-01-17  5:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Jakub Narebski, Heiko Voigt, Johan Herland,
	Michael Haggerty

From: Michael Haggerty <mhagger@alum.mit.edu>

Keep track of how many entries in a ref_array are already sorted.  In
sort_ref_array(), only call qsort() if the dir contains unsorted
entries (i.e., if references have been appended to the end of the list
since the last call to sort_ref_array()).

Sort ref_arrays only when needed, namely in search_ref_array() and in
do_for_each_ref().  However, never sort the extra_refs, because
extra_refs can contain multiple entries with the same name.

This change is currently not useful, because entries are not added to
ref_arrays after they are created.  But in a moment they will be...

Implementation note: we could store a binary "sorted" value instead of
an integer, but storing the number of sorted entries leaves the way
open for a couple of possible future optimizations:

* In sort_ref_array(), sort *only* the unsorted entries, then merge
  them with the sorted entries.  This should be faster if most of the
  entries are already sorted.

* Teach search_ref_array() to do a binary search of any sorted
  entries, and if unsuccessful do a linear search of any unsorted
  entries.  This would avoid the need to sort the list every time that
  search_ref_array() is called, and (given some intelligence about how
  often to sort) could significantly improve the speed in certain
  hypothetical usage patterns.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c |   33 ++++++++++++++++++++++++++-------
 1 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/refs.c b/refs.c
index 6f436f1..268816f 100644
--- a/refs.c
+++ b/refs.c
@@ -17,6 +17,15 @@ struct ref_entry {
 
 struct ref_array {
 	int nr, alloc;
+
+	/*
+	 * Entries with index 0 <= i < sorted are sorted by name.  New
+	 * entries are appended to the list unsorted, and are sorted
+	 * only when required; thus we avoid the need to sort the list
+	 * after the addition of every reference.
+	 */
+	int sorted;
+
 	struct ref_entry **refs;
 };
 
@@ -105,12 +114,18 @@ static int is_dup_ref(const struct ref_entry *ref1, const struct ref_entry *ref2
 	}
 }
 
+/*
+ * Sort the entries in array (if they are not already sorted).
+ */
 static void sort_ref_array(struct ref_array *array)
 {
 	int i, j;
 
-	/* Nothing to sort unless there are at least two entries */
-	if (array->nr < 2)
+	/*
+	 * This check also prevents passing a zero-length array to qsort(),
+	 * which is a problem on some platforms.
+	 */
+	if (array->sorted == array->nr)
 		return;
 
 	qsort(array->refs, array->nr, sizeof(*array->refs), ref_entry_cmp);
@@ -124,7 +139,7 @@ static void sort_ref_array(struct ref_array *array)
 		}
 		array->refs[++i] = array->refs[j];
 	}
-	array->nr = i + 1;
+	array->sorted = array->nr = i + 1;
 }
 
 static struct ref_entry *search_ref_array(struct ref_array *array, const char *refname)
@@ -137,7 +152,7 @@ static struct ref_entry *search_ref_array(struct ref_array *array, const char *r
 
 	if (!array->nr)
 		return NULL;
-
+	sort_ref_array(array);
 	len = strlen(refname) + 1;
 	e = xmalloc(sizeof(struct ref_entry) + len);
 	memcpy(e->name, refname, len);
@@ -168,6 +183,10 @@ static struct ref_cache {
 
 static struct ref_entry *current_ref;
 
+/*
+ * The extra_refs is never sorted, because it is allowed to contain entries
+ * with duplicate names.
+ */
 static struct ref_array extra_refs;
 
 static void clear_ref_array(struct ref_array *array)
@@ -176,7 +195,7 @@ static void clear_ref_array(struct ref_array *array)
 	for (i = 0; i < array->nr; i++)
 		free(array->refs[i]);
 	free(array->refs);
-	array->nr = array->alloc = 0;
+	array->sorted = array->nr = array->alloc = 0;
 	array->refs = NULL;
 }
 
@@ -268,7 +287,6 @@ static void read_packed_refs(FILE *f, struct ref_array *array)
 		    !get_sha1_hex(refline + 1, sha1))
 			hashcpy(last->peeled, sha1);
 	}
-	sort_ref_array(array);
 }
 
 void add_extra_ref(const char *refname, const unsigned char *sha1, int flag)
@@ -404,7 +422,6 @@ static struct ref_array *get_loose_refs(struct ref_cache *refs)
 {
 	if (!refs->did_loose) {
 		get_ref_dir(refs, "refs", &refs->loose);
-		sort_ref_array(&refs->loose);
 		refs->did_loose = 1;
 	}
 	return &refs->loose;
@@ -720,6 +737,8 @@ static int do_for_each_ref(const char *submodule, const char *base, each_ref_fn
 	for (i = 0; i < extra->nr; i++)
 		retval = do_one_ref(base, fn, trim, flags, cb_data, extra->refs[i]);
 
+	sort_ref_array(packed);
+	sort_ref_array(loose);
 	while (p < packed->nr && l < loose->nr) {
 		struct ref_entry *entry;
 		int cmp = strcmp(packed->refs[p]->name, loose->refs[l]->name);
-- 
1.7.8.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/4] add_packed_ref(): new function in the refs API.
  2012-01-17  5:50 [PATCH 0/4] Remove a user of extra_refs in clone mhagger
  2012-01-17  5:50 ` [PATCH 1/4] pack_refs(): remove redundant check mhagger
  2012-01-17  5:50 ` [PATCH 2/4] ref_array: keep track of whether references are sorted mhagger
@ 2012-01-17  5:50 ` mhagger
  2012-01-17  5:50 ` [PATCH 4/4] write_remote_refs(): create packed (rather than extra) refs mhagger
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: mhagger @ 2012-01-17  5:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Jakub Narebski, Heiko Voigt, Johan Herland,
	Michael Haggerty

From: Michael Haggerty <mhagger@alum.mit.edu>

Add a new function add_packed_ref() that adds a reference directly to
the in-memory packed reference cache.  This will be useful for
creating local references while cloning.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
This new function call only stores the new reference to the in-memory
cache.  The user has to remember to call pack_refs() to actually write
the new reference(s) to the packed-refs file.  (I don't think it is
practical to make the write happen automatically.)

 refs.c |    6 ++++++
 refs.h |    6 ++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/refs.c b/refs.c
index 268816f..14f764d 100644
--- a/refs.c
+++ b/refs.c
@@ -319,6 +319,12 @@ static struct ref_array *get_packed_refs(struct ref_cache *refs)
 	return &refs->packed;
 }
 
+void add_packed_ref(const char *refname, const unsigned char *sha1)
+{
+	add_ref(get_packed_refs(get_ref_cache(NULL)),
+			create_ref_entry(refname, sha1, REF_ISPACKED, 1));
+}
+
 static void get_ref_dir(struct ref_cache *refs, const char *base,
 			struct ref_array *array)
 {
diff --git a/refs.h b/refs.h
index d498291..00ba1e2 100644
--- a/refs.h
+++ b/refs.h
@@ -51,6 +51,12 @@ extern int for_each_rawref(each_ref_fn, void *);
 extern void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname);
 
 /*
+ * Add a reference to the in-memory packed reference cache.  To actually
+ * write the reference to the packed-refs file, call pack_refs().
+ */
+extern void add_packed_ref(const char *refname, const unsigned char *sha1);
+
+/*
  * Extra refs will be listed by for_each_ref() before any actual refs
  * for the duration of this process or until clear_extra_refs() is
  * called. Only extra refs added before for_each_ref() is called will
-- 
1.7.8.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 4/4] write_remote_refs(): create packed (rather than extra) refs
  2012-01-17  5:50 [PATCH 0/4] Remove a user of extra_refs in clone mhagger
                   ` (2 preceding siblings ...)
  2012-01-17  5:50 ` [PATCH 3/4] add_packed_ref(): new function in the refs API mhagger
@ 2012-01-17  5:50 ` mhagger
  2012-01-17  6:35 ` [PATCH 0/4] Remove a user of extra_refs in clone Jeff King
  2012-01-17  6:51 ` Junio C Hamano
  5 siblings, 0 replies; 7+ messages in thread
From: mhagger @ 2012-01-17  5:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Jeff King, Jakub Narebski, Heiko Voigt, Johan Herland,
	Michael Haggerty

From: Michael Haggerty <mhagger@alum.mit.edu>

write_remote_refs() creates new packed refs from references obtained
from the remote repository, which is "out of thin air" as far as the
local repository is concerned.  Previously it did this by creating
"extra" refs, then calling pack_refs() to bake them into the
packed-refs file.  Instead, create packed refs (in the packed
reference cache) directly, then call pack_refs().

Aside from being more logical, this is another step towards removing
extra refs entirely.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/clone.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 86db954..9413537 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -441,11 +441,10 @@ static void write_remote_refs(const struct ref *local_refs)
 	for (r = local_refs; r; r = r->next) {
 		if (!r->peer_ref)
 			continue;
-		add_extra_ref(r->peer_ref->name, r->old_sha1, 0);
+		add_packed_ref(r->peer_ref->name, r->old_sha1);
 	}
 
 	pack_refs(PACK_REFS_ALL);
-	clear_extra_refs();
 }
 
 static int write_one_config(const char *key, const char *value, void *data)
-- 
1.7.8.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/4] Remove a user of extra_refs in clone
  2012-01-17  5:50 [PATCH 0/4] Remove a user of extra_refs in clone mhagger
                   ` (3 preceding siblings ...)
  2012-01-17  5:50 ` [PATCH 4/4] write_remote_refs(): create packed (rather than extra) refs mhagger
@ 2012-01-17  6:35 ` Jeff King
  2012-01-17  6:51 ` Junio C Hamano
  5 siblings, 0 replies; 7+ messages in thread
From: Jeff King @ 2012-01-17  6:35 UTC (permalink / raw)
  To: mhagger; +Cc: Junio C Hamano, git, Jakub Narebski, Heiko Voigt, Johan Herland

On Tue, Jan 17, 2012 at 06:50:30AM +0100, mhagger@alum.mit.edu wrote:

> When cloning, write_remote_refs() creates local packed refs from the
> refs read from the remote repository.  It does this by creating extra
> refs for the references then calling pack_refs() to bake the extra
> refs into the packed-refs file, then calling clear_extra_refs().
> 
> This is silly and relies on the kludgy extra_refs mechanism, which I
> want to get rid of.  Instead, add a function call add_packed_ref() to
> the refs API, and use it to create packed refs (in the memory cache)
> directly.  Then call pack_refs() as before to write the packed-refs
> file.

I certainly approve of the goal.

> Because the new add_packed_ref() function allows references (perhaps
> many of them) to be added to an existing ref_array, it would be
> inefficient to re-sort the list after every addition.  So instead,
> append new entries to the end of the ref_array and note that the array
> is unsorted.  Then, before the ref_array is used, check if it is
> unsorted and sort it if necessary.

Makes sense.

> A side effect of this change is that the new packed references are
> left in the in-memory packed reference cache after the return from
> write_remote_refs() (whereas previously, the refs were stored as
> temporary extra refs that were purged before return from the
> function).  I can't see any place in the following code where this
> would make a difference, but there is quite a bit of code there so it
> is hard to audit.  Confirmation that this is OK would be welcome.

Actually, I think you may be fixing an extremely minor bug with this.

If later code in clone tries to resolve one of the refs in
refs/remotes/<origin>/, the current code will see that it doesn't exist
as a ref file (because we wrote it packed) and call get_packed_ref. That
checks the cached refs list, which will claim that did_packed is true,
but the "packed" array will be empty. Which is wrong; we _do_ have that
ref, and our cache is stale. After writing the packed list, the current
code probably ought to be calling invalidate_ref_cache(). It was only
the fact that most of the remaining code didn't care that this wasn't a
bug in the first place.

Your code makes more sense, in that it will keep the packed_refs list up
to date, and later calls to resolve_ref will properly find those refs.

The only place where I can detect a change in behavior is in the reflog
creation. When we write the refs/remotes/<origin>/HEAD ref, we call
create_symref, which in turn will decide whether to write a reflog entry
or not based on whether the pointed-to ref exists (because if we are
making a symref to something that doesn't exist, we have no sha1 to
write in the reflog entry). So before, we got no reflog for
refs/remotes/<origin>/HEAD (because we erroneously thought that
refs/remotes/origin/master (or whatever) did not exist). With your
patches, the reflog entry is created.

I doubt anyone ever noticed, but now that code is at least working as
intended.

> Michael Haggerty (4):
>   pack_refs(): remove redundant check
>   ref_array: keep track of whether references are sorted
>   add_packed_ref(): new function in the refs API.
>   write_remote_refs(): create packed (rather than extra) refs

I won't respond to each patch individually. All of them looked good to
me. Thanks for a very pleasant read.

-Peff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 0/4] Remove a user of extra_refs in clone
  2012-01-17  5:50 [PATCH 0/4] Remove a user of extra_refs in clone mhagger
                   ` (4 preceding siblings ...)
  2012-01-17  6:35 ` [PATCH 0/4] Remove a user of extra_refs in clone Jeff King
@ 2012-01-17  6:51 ` Junio C Hamano
  5 siblings, 0 replies; 7+ messages in thread
From: Junio C Hamano @ 2012-01-17  6:51 UTC (permalink / raw)
  To: mhagger; +Cc: git, Jeff King, Jakub Narebski, Heiko Voigt, Johan Herland

Yay ;-)

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-01-17  6:51 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-17  5:50 [PATCH 0/4] Remove a user of extra_refs in clone mhagger
2012-01-17  5:50 ` [PATCH 1/4] pack_refs(): remove redundant check mhagger
2012-01-17  5:50 ` [PATCH 2/4] ref_array: keep track of whether references are sorted mhagger
2012-01-17  5:50 ` [PATCH 3/4] add_packed_ref(): new function in the refs API mhagger
2012-01-17  5:50 ` [PATCH 4/4] write_remote_refs(): create packed (rather than extra) refs mhagger
2012-01-17  6:35 ` [PATCH 0/4] Remove a user of extra_refs in clone Jeff King
2012-01-17  6:51 ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).