All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/13] Reference iterators
@ 2016-05-30  7:55 Michael Haggerty
  2016-05-30  7:55 ` [PATCH 01/13] refs: remove unnecessary "extern" keywords Michael Haggerty
                   ` (12 more replies)
  0 siblings, 13 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

This patch series implements a new iteration paradigm for iterating
over references using iterators. This approach was proposed earlier as
an RFC [1].

The justification for this change is laid out in the RFC [1] and in
the commit message for patch 09/13 [2]. Please refer to those, as I
won't repeat them here.

There are several obvious followup steps that are not included in this
patch series (because they don't help with the initial transition to
pluggable reference backends). I've written prototypes of several of
these:

* The iterator interface could be made public and callers could start
  using it directly.

* A filter_ref_iterator could let references be filtered based on a
  callback function (this would be useful, for example, for
  for_each_glob_ref()).

* A single_ref_iterator could "iterate" over a single reference (e.g.,
  HEAD).

* A series_ref_iterator could iterate over multiple iterators, for
  callers that want to operate on, say, HEAD plus all branches.

* The ref_cache could be fed from an iterator, to better decouple
  caching from reading packed and loose references, and to make it
  easy to use caching with other reference storage backends.

* Per-worktree refs could be overlaid on top of shared references
  using merge_ref_iterator rather than mixing them up in the same
  ref_cache.

* The dir_iterator could be used in more places; for example, when
  reading loose references from disk.

Table of contents of changes:

* The first eight patches are cleanups.

  * Patch 05/13 fixes a code path that unlinks symrefs directly
    instead of using the refs API.

* Patch 09/13 is the most important part of this series. It introduces
  not only reference iterators, but also (1) a pattern that other
  iterator interfaces can follow (along with some useful constants in
  iterator.h), and (2) a pattern for building OO code with
  inheritance.

* Patch 10/13 actually uses the new reference iteration mechanism.

* Patch 11/13 avoids aborting reflog iterations if an unexpected file
  is found under `$GIT_DIR/logs`. This fixes a bug that could cause
  objects needed by reflogs to be pruned, breaking the reflogs.

* Patch 12/13 adds an iterator interface for iterating over
  directories (using the same model as patch 09/13).

* Patch 13/13 implements for_each_reflog() on top of an iterator
  interface (essentially the analogue of patch 10/13, but for
  reflogs).

Note that it is not necessary to rebuild for_each_reflog_ent() on top
of iterators at this time, because that function deals with only a
single reference at a time. Therefore, composability is not important
here (for example, it won't have to deal with multiple refs backends
at the same time).

This patch series applies on top of mh/split-under-lock. It is also
available from my GitHub repository [3] as branch "ref-iterators".

I haven't yet completely rebased the ref-store changes (virtualization
of the refs API) on top of all of these changes, but I will work on
that next.

Michael

[1] http://thread.gmane.org/gmane.comp.version-control.git/290409
[2] http://mid.gmane.org/89634d216544d1102dafd5d18247bff2581d48a8.1464537050.git.mhagger@alum.mit.edu
[3] https://github.com/mhagger/git

Michael Haggerty (13):
  refs: remove unnecessary "extern" keywords
  do_for_each_ref(): move docstring to the header file
  refs: use name "prefix" consistently
  delete_refs(): add a flags argument
  remote rm: handle symbolic refs correctly
  get_ref_cache(): only create an instance if there is a submodule
  entry_resolves_to_object(): rename function from
    ref_resolves_to_object()
  ref_resolves_to_object(): new function
  refs: introduce an iterator interface
  do_for_each_ref(): reimplement using reference iteration
  for_each_reflog(): don't abort for bad references
  dir_iterator: new API for iterating over a directory tree
  for_each_reflog(): reimplement using iterators

 Makefile             |   2 +
 builtin/fetch.c      |   2 +-
 builtin/remote.c     |   8 +-
 dir-iterator.c       | 180 +++++++++++++++
 dir-iterator.h       |  86 +++++++
 iterator.h           |  81 +++++++
 refs.c               |  20 ++
 refs.h               | 139 +++++++-----
 refs/files-backend.c | 630 +++++++++++++++++++++++++++++++--------------------
 refs/iterator.c      | 376 ++++++++++++++++++++++++++++++
 refs/refs-internal.h | 225 +++++++++++++++++-
 11 files changed, 1427 insertions(+), 322 deletions(-)
 create mode 100644 dir-iterator.c
 create mode 100644 dir-iterator.h
 create mode 100644 iterator.h
 create mode 100644 refs/iterator.c

-- 
2.8.1

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 01/13] refs: remove unnecessary "extern" keywords
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 02/13] do_for_each_ref(): move docstring to the header file Michael Haggerty
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

There's continuing work in this area, so clean up unneeded "extern"
keywords rather than schlepping them around. Also split up some overlong
lines and add parameter names in a couple of places.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.h | 132 +++++++++++++++++++++++++++++++++++------------------------------
 1 file changed, 72 insertions(+), 60 deletions(-)

diff --git a/refs.h b/refs.h
index 9230d47..21874f0 100644
--- a/refs.h
+++ b/refs.h
@@ -52,19 +52,19 @@
 #define RESOLVE_REF_NO_RECURSE 0x02
 #define RESOLVE_REF_ALLOW_BAD_NAME 0x04
 
-extern const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
-				      unsigned char *sha1, int *flags);
+const char *resolve_ref_unsafe(const char *refname, int resolve_flags,
+			       unsigned char *sha1, int *flags);
 
-extern char *resolve_refdup(const char *refname, int resolve_flags,
-			    unsigned char *sha1, int *flags);
+char *resolve_refdup(const char *refname, int resolve_flags,
+		     unsigned char *sha1, int *flags);
 
-extern int read_ref_full(const char *refname, int resolve_flags,
-			 unsigned char *sha1, int *flags);
-extern int read_ref(const char *refname, unsigned char *sha1);
+int read_ref_full(const char *refname, int resolve_flags,
+		  unsigned char *sha1, int *flags);
+int read_ref(const char *refname, unsigned char *sha1);
 
-extern int ref_exists(const char *refname);
+int ref_exists(const char *refname);
 
-extern int is_branch(const char *refname);
+int is_branch(const char *refname);
 
 /*
  * If refname is a non-symbolic reference that refers to a tag object,
@@ -74,24 +74,25 @@ extern int is_branch(const char *refname);
  * Symbolic references are considered unpeelable, even if they
  * ultimately resolve to a peelable tag.
  */
-extern int peel_ref(const char *refname, unsigned char *sha1);
+int peel_ref(const char *refname, unsigned char *sha1);
 
 /**
  * Resolve refname in the nested "gitlink" repository that is located
  * at path.  If the resolution is successful, return 0 and set sha1 to
  * the name of the object; otherwise, return a non-zero value.
  */
-extern int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *sha1);
+int resolve_gitlink_ref(const char *path, const char *refname,
+			unsigned char *sha1);
 
 /*
  * Return true iff abbrev_name is a possible abbreviation for
  * full_name according to the rules defined by ref_rev_parse_rules in
  * refs.c.
  */
-extern int refname_match(const char *abbrev_name, const char *full_name);
+int refname_match(const char *abbrev_name, const char *full_name);
 
-extern int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref);
-extern int dwim_log(const char *str, int len, unsigned char *sha1, char **ref);
+int dwim_ref(const char *str, int len, unsigned char *sha1, char **ref);
+int dwim_log(const char *str, int len, unsigned char *sha1, char **ref);
 
 /*
  * A ref_transaction represents a collection of ref updates
@@ -182,38 +183,45 @@ typedef int each_ref_fn(const char *refname,
  * modifies the reference also returns a nonzero value to immediately
  * stop the iteration.
  */
-extern int head_ref(each_ref_fn fn, void *cb_data);
-extern int for_each_ref(each_ref_fn fn, void *cb_data);
-extern int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
-extern int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data, unsigned int broken);
-extern int for_each_tag_ref(each_ref_fn fn, void *cb_data);
-extern int for_each_branch_ref(each_ref_fn fn, void *cb_data);
-extern int for_each_remote_ref(each_ref_fn fn, void *cb_data);
-extern int for_each_replace_ref(each_ref_fn fn, void *cb_data);
-extern int for_each_glob_ref(each_ref_fn fn, const char *pattern, void *cb_data);
-extern int for_each_glob_ref_in(each_ref_fn fn, const char *pattern, const char *prefix, void *cb_data);
+int head_ref(each_ref_fn fn, void *cb_data);
+int for_each_ref(each_ref_fn fn, void *cb_data);
+int for_each_ref_in(const char *prefix, each_ref_fn fn, void *cb_data);
+int for_each_fullref_in(const char *prefix, each_ref_fn fn, void *cb_data,
+			unsigned int broken);
+int for_each_tag_ref(each_ref_fn fn, void *cb_data);
+int for_each_branch_ref(each_ref_fn fn, void *cb_data);
+int for_each_remote_ref(each_ref_fn fn, void *cb_data);
+int for_each_replace_ref(each_ref_fn fn, void *cb_data);
+int for_each_glob_ref(each_ref_fn fn, const char *pattern, void *cb_data);
+int for_each_glob_ref_in(each_ref_fn fn, const char *pattern,
+			 const char *prefix, void *cb_data);
 
-extern int head_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
-extern int for_each_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
-extern int for_each_ref_in_submodule(const char *submodule, const char *prefix,
+int head_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
+int for_each_ref_submodule(const char *submodule,
+			   each_ref_fn fn, void *cb_data);
+int for_each_ref_in_submodule(const char *submodule, const char *prefix,
 		each_ref_fn fn, void *cb_data);
-extern int for_each_tag_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
-extern int for_each_branch_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
-extern int for_each_remote_ref_submodule(const char *submodule, each_ref_fn fn, void *cb_data);
+int for_each_tag_ref_submodule(const char *submodule,
+			       each_ref_fn fn, void *cb_data);
+int for_each_branch_ref_submodule(const char *submodule,
+				  each_ref_fn fn, void *cb_data);
+int for_each_remote_ref_submodule(const char *submodule,
+				  each_ref_fn fn, void *cb_data);
 
-extern int head_ref_namespaced(each_ref_fn fn, void *cb_data);
-extern int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
+int head_ref_namespaced(each_ref_fn fn, void *cb_data);
+int for_each_namespaced_ref(each_ref_fn fn, void *cb_data);
 
 /* can be used to learn about broken ref and symref */
-extern int for_each_rawref(each_ref_fn fn, void *cb_data);
+int for_each_rawref(each_ref_fn fn, void *cb_data);
 
 static inline const char *has_glob_specials(const char *pattern)
 {
 	return strpbrk(pattern, "?*[");
 }
 
-extern void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname);
-extern void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list *refnames);
+void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname);
+void warn_dangling_symrefs(FILE *fp, const char *msg_fmt,
+			   const struct string_list *refnames);
 
 /*
  * Flags for controlling behaviour of pack_refs()
@@ -245,13 +253,13 @@ int pack_refs(unsigned int flags);
 int safe_create_reflog(const char *refname, int force_create, struct strbuf *err);
 
 /** Reads log for the value of ref during at_time. **/
-extern int read_ref_at(const char *refname, unsigned int flags,
-		       unsigned long at_time, int cnt,
-		       unsigned char *sha1, char **msg,
-		       unsigned long *cutoff_time, int *cutoff_tz, int *cutoff_cnt);
+int read_ref_at(const char *refname, unsigned int flags,
+		unsigned long at_time, int cnt,
+		unsigned char *sha1, char **msg,
+		unsigned long *cutoff_time, int *cutoff_tz, int *cutoff_cnt);
 
 /** Check if a particular reflog exists */
-extern int reflog_exists(const char *refname);
+int reflog_exists(const char *refname);
 
 /*
  * Delete the specified reference. If old_sha1 is non-NULL, then
@@ -260,21 +268,25 @@ extern int reflog_exists(const char *refname);
  * exists, regardless of its old value. It is an error for old_sha1 to
  * be NULL_SHA1. flags is passed through to ref_transaction_delete().
  */
-extern int delete_ref(const char *refname, const unsigned char *old_sha1,
-		      unsigned int flags);
+int delete_ref(const char *refname, const unsigned char *old_sha1,
+	       unsigned int flags);
 
 /*
  * Delete the specified references. If there are any problems, emit
  * errors but attempt to keep going (i.e., the deletes are not done in
  * an all-or-nothing transaction).
  */
-extern int delete_refs(struct string_list *refnames);
+int delete_refs(struct string_list *refnames);
 
 /** Delete a reflog */
-extern int delete_reflog(const char *refname);
+int delete_reflog(const char *refname);
 
 /* iterate over reflog entries */
-typedef int each_reflog_ent_fn(unsigned char *osha1, unsigned char *nsha1, const char *, unsigned long, int, const char *, void *);
+typedef int each_reflog_ent_fn(
+		unsigned char *old_sha1, unsigned char *new_sha1,
+		const char *committer, unsigned long timestamp,
+		int tz, const char *msg, void *cb_data);
+
 int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
 
@@ -282,7 +294,7 @@ int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void
  * Calls the specified function for each reflog file until it returns nonzero,
  * and returns the value
  */
-extern int for_each_reflog(each_ref_fn, void *);
+int for_each_reflog(each_ref_fn fn, void *cb_data);
 
 #define REFNAME_ALLOW_ONELEVEL 1
 #define REFNAME_REFSPEC_PATTERN 2
@@ -295,16 +307,16 @@ extern int for_each_reflog(each_ref_fn, void *);
  * allow a single "*" wildcard character in the refspec. No leading or
  * repeated slashes are accepted.
  */
-extern int check_refname_format(const char *refname, int flags);
+int check_refname_format(const char *refname, int flags);
 
-extern const char *prettify_refname(const char *refname);
+const char *prettify_refname(const char *refname);
 
-extern char *shorten_unambiguous_ref(const char *refname, int strict);
+char *shorten_unambiguous_ref(const char *refname, int strict);
 
 /** rename ref, return 0 on success **/
-extern int rename_ref(const char *oldref, const char *newref, const char *logmsg);
+int rename_ref(const char *oldref, const char *newref, const char *logmsg);
 
-extern int create_symref(const char *refname, const char *target, const char *logmsg);
+int create_symref(const char *refname, const char *target, const char *logmsg);
 
 /*
  * Update HEAD of the specified gitdir.
@@ -313,7 +325,7 @@ extern int create_symref(const char *refname, const char *target, const char *lo
  * $GIT_DIR points to.
  * Return 0 if successful, non-zero otherwise.
  * */
-extern int set_worktree_head_symref(const char *gitdir, const char *target);
+int set_worktree_head_symref(const char *gitdir, const char *target);
 
 enum action_on_err {
 	UPDATE_REFS_MSG_ON_ERR,
@@ -463,7 +475,7 @@ int update_ref(const char *msg, const char *refname,
 	       const unsigned char *new_sha1, const unsigned char *old_sha1,
 	       unsigned int flags, enum action_on_err onerr);
 
-extern int parse_hide_refs_config(const char *var, const char *value, const char *);
+int parse_hide_refs_config(const char *var, const char *value, const char *);
 
 /*
  * Check whether a ref is hidden. If no namespace is set, both the first and
@@ -473,7 +485,7 @@ extern int parse_hide_refs_config(const char *var, const char *value, const char
  * the ref is outside that namespace, the first parameter is NULL. The second
  * parameter always points to the full ref name.
  */
-extern int ref_is_hidden(const char *, const char *);
+int ref_is_hidden(const char *, const char *);
 
 enum ref_type {
 	REF_TYPE_PER_WORKTREE,
@@ -522,11 +534,11 @@ typedef void reflog_expiry_cleanup_fn(void *cb_data);
  * enum expire_reflog_flags. The three function pointers are described
  * above. On success, return zero.
  */
-extern int reflog_expire(const char *refname, const unsigned char *sha1,
-			 unsigned int flags,
-			 reflog_expiry_prepare_fn prepare_fn,
-			 reflog_expiry_should_prune_fn should_prune_fn,
-			 reflog_expiry_cleanup_fn cleanup_fn,
-			 void *policy_cb_data);
+int reflog_expire(const char *refname, const unsigned char *sha1,
+		  unsigned int flags,
+		  reflog_expiry_prepare_fn prepare_fn,
+		  reflog_expiry_should_prune_fn should_prune_fn,
+		  reflog_expiry_cleanup_fn cleanup_fn,
+		  void *policy_cb_data);
 
 #endif /* REFS_H */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 02/13] do_for_each_ref(): move docstring to the header file
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
  2016-05-30  7:55 ` [PATCH 01/13] refs: remove unnecessary "extern" keywords Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 03/13] refs: use name "prefix" consistently Michael Haggerty
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c |  9 ---------
 refs/refs-internal.h | 10 +++++++++-
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 1230dfb..68db3e8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1878,15 +1878,6 @@ static int do_for_each_entry(struct ref_cache *refs, const char *base,
 	return retval;
 }
 
-/*
- * Call fn for each reference in the specified ref_cache for which the
- * refname begins with base.  If trim is non-zero, then trim that many
- * characters off the beginning of each refname before passing the
- * refname to fn.  flags can be DO_FOR_EACH_INCLUDE_BROKEN to include
- * broken references in the iteration.  If fn ever returns a non-zero
- * value, stop the iteration and return that value; otherwise, return
- * 0.
- */
 int do_for_each_ref(const char *submodule, const char *base,
 		    each_ref_fn fn, int trim, int flags, void *cb_data)
 {
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 1bb3d87..b4dd545 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -249,7 +249,15 @@ int rename_ref_available(const char *oldname, const char *newname);
 #define DO_FOR_EACH_INCLUDE_BROKEN 0x01
 
 /*
- * The common backend for the for_each_*ref* functions
+ * Call fn for each reference in the specified submodule for which the
+ * refname begins with base. If trim is non-zero, then trim that many
+ * characters off the beginning of each refname before passing the
+ * refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to include
+ * broken references in the iteration. If fn ever returns a non-zero
+ * value, stop the iteration and return that value; otherwise, return
+ * 0.
+ *
+ * This is the common backend for the for_each_*ref* functions.
  */
 int do_for_each_ref(const char *submodule, const char *base,
 		    each_ref_fn fn, int trim, int flags, void *cb_data);
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 03/13] refs: use name "prefix" consistently
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
  2016-05-30  7:55 ` [PATCH 01/13] refs: remove unnecessary "extern" keywords Michael Haggerty
  2016-05-30  7:55 ` [PATCH 02/13] do_for_each_ref(): move docstring to the header file Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 04/13] delete_refs(): add a flags argument Michael Haggerty
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

In the context of the for_each_ref() functions, call the prefix that
references must start with "prefix". (In some places it was called
"base".) This is clearer, and also prevents confusion with another
planned use of the word "base".

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 24 ++++++++++++------------
 refs/refs-internal.h | 14 +++++++-------
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 68db3e8..5af7441 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -542,7 +542,7 @@ static struct ref_entry *current_ref;
 typedef int each_ref_entry_fn(struct ref_entry *entry, void *cb_data);
 
 struct ref_entry_cb {
-	const char *base;
+	const char *prefix;
 	int trim;
 	int flags;
 	each_ref_fn *fn;
@@ -559,7 +559,7 @@ static int do_one_ref(struct ref_entry *entry, void *cb_data)
 	struct ref_entry *old_current_ref;
 	int retval;
 
-	if (!starts_with(entry->name, data->base))
+	if (!starts_with(entry->name, data->prefix))
 		return 0;
 
 	if (!(data->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
@@ -1824,12 +1824,12 @@ int peel_ref(const char *refname, unsigned char *sha1)
 
 /*
  * Call fn for each reference in the specified ref_cache, omitting
- * references not in the containing_dir of base.  fn is called for all
- * references, including broken ones.  If fn ever returns a non-zero
+ * references not in the containing_dir of prefix. Call fn for all
+ * references, including broken ones. If fn ever returns a non-zero
  * value, stop the iteration and return that value; otherwise, return
  * 0.
  */
-static int do_for_each_entry(struct ref_cache *refs, const char *base,
+static int do_for_each_entry(struct ref_cache *refs, const char *prefix,
 			     each_ref_entry_fn fn, void *cb_data)
 {
 	struct packed_ref_cache *packed_ref_cache;
@@ -1846,8 +1846,8 @@ static int do_for_each_entry(struct ref_cache *refs, const char *base,
 	 * disk.
 	 */
 	loose_dir = get_loose_refs(refs);
-	if (base && *base) {
-		loose_dir = find_containing_dir(loose_dir, base, 0);
+	if (prefix && *prefix) {
+		loose_dir = find_containing_dir(loose_dir, prefix, 0);
 	}
 	if (loose_dir)
 		prime_ref_dir(loose_dir);
@@ -1855,8 +1855,8 @@ static int do_for_each_entry(struct ref_cache *refs, const char *base,
 	packed_ref_cache = get_packed_ref_cache(refs);
 	acquire_packed_ref_cache(packed_ref_cache);
 	packed_dir = get_packed_ref_dir(packed_ref_cache);
-	if (base && *base) {
-		packed_dir = find_containing_dir(packed_dir, base, 0);
+	if (prefix && *prefix) {
+		packed_dir = find_containing_dir(packed_dir, prefix, 0);
 	}
 
 	if (packed_dir && loose_dir) {
@@ -1878,14 +1878,14 @@ static int do_for_each_entry(struct ref_cache *refs, const char *base,
 	return retval;
 }
 
-int do_for_each_ref(const char *submodule, const char *base,
+int do_for_each_ref(const char *submodule, const char *prefix,
 		    each_ref_fn fn, int trim, int flags, void *cb_data)
 {
 	struct ref_entry_cb data;
 	struct ref_cache *refs;
 
 	refs = get_ref_cache(submodule);
-	data.base = base;
+	data.prefix = prefix;
 	data.trim = trim;
 	data.flags = flags;
 	data.fn = fn;
@@ -1896,7 +1896,7 @@ int do_for_each_ref(const char *submodule, const char *base,
 	if (ref_paranoia)
 		data.flags |= DO_FOR_EACH_INCLUDE_BROKEN;
 
-	return do_for_each_entry(refs, base, do_one_ref, &data);
+	return do_for_each_entry(refs, prefix, do_one_ref, &data);
 }
 
 /*
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index b4dd545..8ad02d8 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -250,16 +250,16 @@ int rename_ref_available(const char *oldname, const char *newname);
 
 /*
  * Call fn for each reference in the specified submodule for which the
- * refname begins with base. If trim is non-zero, then trim that many
- * characters off the beginning of each refname before passing the
- * refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to include
- * broken references in the iteration. If fn ever returns a non-zero
- * value, stop the iteration and return that value; otherwise, return
- * 0.
+ * refname begins with prefix. If trim is non-zero, then trim that
+ * many characters off the beginning of each refname before passing
+ * the refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to
+ * include broken references in the iteration. If fn ever returns a
+ * non-zero value, stop the iteration and return that value;
+ * otherwise, return 0.
  *
  * This is the common backend for the for_each_*ref* functions.
  */
-int do_for_each_ref(const char *submodule, const char *base,
+int do_for_each_ref(const char *submodule, const char *prefix,
 		    each_ref_fn fn, int trim, int flags, void *cb_data);
 
 /*
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 04/13] delete_refs(): add a flags argument
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (2 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 03/13] refs: use name "prefix" consistently Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 05/13] remote rm: handle symbolic refs correctly Michael Haggerty
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

This will be useful for passing REF_NODEREF through.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/fetch.c      | 2 +-
 builtin/remote.c     | 4 ++--
 refs.h               | 5 +++--
 refs/files-backend.c | 4 ++--
 4 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index f8455bd..b55c83c 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -806,7 +806,7 @@ static int prune_refs(struct refspec *refs, int ref_count, struct ref *ref_map,
 		for (ref = stale_refs; ref; ref = ref->next)
 			string_list_append(&refnames, ref->name);
 
-		result = delete_refs(&refnames);
+		result = delete_refs(&refnames, 0);
 		string_list_clear(&refnames, 0);
 	}
 
diff --git a/builtin/remote.c b/builtin/remote.c
index fda5c2e..1bbf9b4 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -788,7 +788,7 @@ static int rm(int argc, const char **argv)
 	strbuf_release(&buf);
 
 	if (!result)
-		result = delete_refs(&branches);
+		result = delete_refs(&branches, 0);
 	string_list_clear(&branches, 0);
 
 	if (skipped.nr) {
@@ -1303,7 +1303,7 @@ static int prune_remote(const char *remote, int dry_run)
 	string_list_sort(&refs_to_prune);
 
 	if (!dry_run)
-		result |= delete_refs(&refs_to_prune);
+		result |= delete_refs(&refs_to_prune, 0);
 
 	for_each_string_list_item(item, &states.stale) {
 		const char *refname = item->util;
diff --git a/refs.h b/refs.h
index 21874f0..6d515a4 100644
--- a/refs.h
+++ b/refs.h
@@ -274,9 +274,10 @@ int delete_ref(const char *refname, const unsigned char *old_sha1,
 /*
  * Delete the specified references. If there are any problems, emit
  * errors but attempt to keep going (i.e., the deletes are not done in
- * an all-or-nothing transaction).
+ * an all-or-nothing transaction). flags is passed through to
+ * ref_transaction_delete().
  */
-int delete_refs(struct string_list *refnames);
+int delete_refs(struct string_list *refnames, unsigned int flags);
 
 /** Delete a reflog */
 int delete_reflog(const char *refname);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 5af7441..a0d09f4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2403,7 +2403,7 @@ static int delete_ref_loose(struct ref_lock *lock, int flag, struct strbuf *err)
 	return 0;
 }
 
-int delete_refs(struct string_list *refnames)
+int delete_refs(struct string_list *refnames, unsigned int flags)
 {
 	struct strbuf err = STRBUF_INIT;
 	int i, result = 0;
@@ -2432,7 +2432,7 @@ int delete_refs(struct string_list *refnames)
 	for (i = 0; i < refnames->nr; i++) {
 		const char *refname = refnames->items[i].string;
 
-		if (delete_ref(refname, NULL, 0))
+		if (delete_ref(refname, NULL, flags))
 			result |= error(_("could not remove reference %s"), refname);
 	}
 
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 05/13] remote rm: handle symbolic refs correctly
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (3 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 04/13] delete_refs(): add a flags argument Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 06/13] get_ref_cache(): only create an instance if there is a submodule Michael Haggerty
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

In the modern world of reference backends, it is not OK to delete a
symref by unlink()ing the file directly. This must be done via the refs
API.

We do so by adding the symref to the list of references to delete along
with the non-symbolic references, then calling delete_refs() with the
new flags option set to REF_NODEREF.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 builtin/remote.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 1bbf9b4..c4b4d67 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -539,10 +539,6 @@ static int add_branch_for_removal(const char *refname,
 		return 0;
 	}
 
-	/* make sure that symrefs are deleted */
-	if (flags & REF_ISSYMREF)
-		return unlink(git_path("%s", refname));
-
 	string_list_append(branches->branches, refname);
 
 	return 0;
@@ -788,7 +784,7 @@ static int rm(int argc, const char **argv)
 	strbuf_release(&buf);
 
 	if (!result)
-		result = delete_refs(&branches, 0);
+		result = delete_refs(&branches, REF_NODEREF);
 	string_list_clear(&branches, 0);
 
 	if (skipped.nr) {
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 06/13] get_ref_cache(): only create an instance if there is a submodule
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (4 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 05/13] remote rm: handle symbolic refs correctly Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 07/13] entry_resolves_to_object(): rename function from ref_resolves_to_object() Michael Haggerty
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

If there is not a nonbare repository where a submodule is supposedly
located, then don't instantiate a ref_cache for it.

The analogous check can be removed from resolve_gitlink_ref().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
This doesn't actually reduce the number of ref_cache instances
generated by out test suite, but it is a more logical place for
the check that was added in

  a2d5156c resolve_gitlink_ref: ignore non-repository paths

 refs/files-backend.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index a0d09f4..142c977 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -954,15 +954,26 @@ static struct ref_cache *lookup_ref_cache(const char *submodule)
 
 /*
  * Return a pointer to a ref_cache for the specified submodule. For
- * the main repository, use submodule==NULL. The returned structure
- * will be allocated and initialized but not necessarily populated; it
- * should not be freed.
+ * the main repository, use submodule==NULL; such a call cannot fail.
+ * For a submodule, the submodule must exist and be a nonbare
+ * repository, otherwise return NULL.
+ *
+ * The returned structure will be allocated and initialized but not
+ * necessarily populated; it should not be freed.
  */
 static struct ref_cache *get_ref_cache(const char *submodule)
 {
 	struct ref_cache *refs = lookup_ref_cache(submodule);
-	if (!refs)
-		refs = create_ref_cache(submodule);
+
+	if (!refs) {
+		struct strbuf submodule_sb = STRBUF_INIT;
+
+		strbuf_addstr(&submodule_sb, submodule);
+		if (is_nonbare_repository_dir(&submodule_sb))
+			refs = create_ref_cache(submodule);
+		strbuf_release(&submodule_sb);
+	}
+
 	return refs;
 }
 
@@ -1341,13 +1352,10 @@ int resolve_gitlink_ref(const char *path, const char *refname, unsigned char *sh
 		return -1;
 
 	strbuf_add(&submodule, path, len);
-	refs = lookup_ref_cache(submodule.buf);
+	refs = get_ref_cache(submodule.buf);
 	if (!refs) {
-		if (!is_nonbare_repository_dir(&submodule)) {
-			strbuf_release(&submodule);
-			return -1;
-		}
-		refs = create_ref_cache(submodule.buf);
+		strbuf_release(&submodule);
+		return -1;
 	}
 	strbuf_release(&submodule);
 
@@ -1885,6 +1893,9 @@ int do_for_each_ref(const char *submodule, const char *prefix,
 	struct ref_cache *refs;
 
 	refs = get_ref_cache(submodule);
+	if (!refs)
+		return 0;
+
 	data.prefix = prefix;
 	data.trim = trim;
 	data.flags = flags;
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 07/13] entry_resolves_to_object(): rename function from ref_resolves_to_object()
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (5 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 06/13] get_ref_cache(): only create an instance if there is a submodule Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 08/13] ref_resolves_to_object(): new function Michael Haggerty
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

Free up the old name for a more general purpose.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 142c977..1a46f32 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -517,7 +517,7 @@ static void sort_ref_dir(struct ref_dir *dir)
  * an object in the database.  Emit a warning if the referred-to
  * object does not exist.
  */
-static int ref_resolves_to_object(struct ref_entry *entry)
+static int entry_resolves_to_object(struct ref_entry *entry)
 {
 	if (entry->flag & REF_ISBROKEN)
 		return 0;
@@ -563,7 +563,7 @@ static int do_one_ref(struct ref_entry *entry, void *cb_data)
 		return 0;
 
 	if (!(data->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
-	      !ref_resolves_to_object(entry))
+	      !entry_resolves_to_object(entry))
 		return 0;
 
 	/* Store the old value, in case this is a recursive call: */
@@ -2228,7 +2228,7 @@ static int pack_if_possible_fn(struct ref_entry *entry, void *cb_data)
 		return 0;
 
 	/* Do not pack symbolic or broken refs: */
-	if ((entry->flag & REF_ISSYMREF) || !ref_resolves_to_object(entry))
+	if ((entry->flag & REF_ISSYMREF) || !entry_resolves_to_object(entry))
 		return 0;
 
 	/* Add a packed ref cache entry equivalent to the loose entry. */
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 08/13] ref_resolves_to_object(): new function
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (6 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 07/13] entry_resolves_to_object(): rename function from ref_resolves_to_object() Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 09/13] refs: introduce an iterator interface Michael Haggerty
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

Extract new function ref_resolves_to_object() from
entry_resolves_to_object(). It can be used even if there is no ref_entry
at hand.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 1a46f32..8ab4d5f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -513,19 +513,32 @@ static void sort_ref_dir(struct ref_dir *dir)
 }
 
 /*
- * Return true iff the reference described by entry can be resolved to
- * an object in the database.  Emit a warning if the referred-to
- * object does not exist.
+ * Return true if refname, which has the specified oid and flags, can
+ * be resolved to an object in the database. If the referred-to object
+ * does not exist, emit a warning and return false.
+ */
+static int ref_resolves_to_object(const char *refname,
+				  const struct object_id *oid,
+				  unsigned int flags)
+{
+	if (flags & REF_ISBROKEN)
+		return 0;
+	if (!has_sha1_file(oid->hash)) {
+		error("%s does not point to a valid object!", refname);
+		return 0;
+	}
+	return 1;
+}
+
+/*
+ * Return true if the reference described by entry can be resolved to
+ * an object in the database; otherwise, emit a warning and return
+ * false.
  */
 static int entry_resolves_to_object(struct ref_entry *entry)
 {
-	if (entry->flag & REF_ISBROKEN)
-		return 0;
-	if (!has_sha1_file(entry->u.value.oid.hash)) {
-		error("%s does not point to a valid object!", entry->name);
-		return 0;
-	}
-	return 1;
+	return ref_resolves_to_object(entry->name,
+				      &entry->u.value.oid, entry->flag);
 }
 
 /*
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 09/13] refs: introduce an iterator interface
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (7 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 08/13] ref_resolves_to_object(): new function Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30 15:22   ` Ramsay Jones
                     ` (3 more replies)
  2016-05-30  7:55 ` [PATCH 10/13] do_for_each_ref(): reimplement using reference iteration Michael Haggerty
                   ` (3 subsequent siblings)
  12 siblings, 4 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

Currently, the API for iterating over references is via a family of
for_each_ref()-type functions that invoke a callback function for each
selected reference. All of these eventually call do_for_each_ref(),
which knows how to do one thing: iterate in parallel through two
ref_caches, one for loose and one for packed refs, giving loose
references precedence over packed refs. This is rather complicated code,
and is quite specialized to the files backend. It also requires callers
to encapsulate their work into a callback function, which often means
that they have to define and use a "cb_data" struct to manage their
context.

The current design is already bursting at the seams, and will become
even more awkward in the upcoming world of multiple reference storage
backends:

* Per-worktree vs. shared references are currently handled via a kludge
  in git_path() rather than iterating over each part of the reference
  namespace separately and merging the results. This kludge will cease
  to work when we have multiple reference storage backends.

* The current scheme is inflexible. What if we sometimes want to bypass
  the ref_cache, or use it only for packed or only for loose refs? What
  if we want to store symbolic refs in one type of storage backend and
  non-symbolic ones in another?

In the future, each reference backend will need to define its own way of
iterating over references. The crux of the problem with the current
design is that it is impossible to compose for_each_ref()-style
iterations, because the flow of control is owned by the for_each_ref()
function. There is nothing that a caller can do but iterate through all
references in a single burst, so there is no way for it to interleave
references from multiple backends and present the result to the rest of
the world as a single compound backend.

This commit introduces a new iteration primitive for references: a
ref_iterator. A ref_iterator is a polymorphic object that a reference
storage backend can be asked to instantiate. There are three functions
that can be applied to a ref_iterator:

* ref_iterator_advance(): move to the next reference in the iteration
* ref_iterator_abort(): end the iteration before it is exhausted
* ref_iterator_peel(): peel the reference currently being looked at

Iterating using a ref_iterator leaves the flow of control in the hands
of the caller, which means that ref_iterators from multiple
sources (e.g., loose and packed refs) can be composed and presented to
the world as a single compound ref_iterator.

It also means that the backend code for implementing reference iteration
will sometimes be more complicated. For example, the
cache_ref_iterator (which iterates over a ref_cache) can't use the C
stack to recurse; instead, it must manage its own stack internally as
explicit data structures. There is also a lot of boilerplate connected
with object-oriented programming in C.

Eventually, end-user callers will be able to be written in a more
natural way—managing their own flow of control rather than having to
work via callbacks. Since there will only be a few reference backends
but there are many consumers of this API, this is a good tradeoff.

More importantly, we gain composability, and especially the possibility
of writing interchangeable parts that can work with any ref_iterator.

For example, merge_ref_iterator implements a generic way of merging the
contents of any two ref_iterators. It is used to merge loose + packed
refs as part of the implementation of the files_ref_iterator. But it
will also be possible to use it to merge other pairs of reference
sources (e.g., per-worktree vs. shared refs).

Another example is prefix_ref_iterator, which can be used to trim a
prefix off the front of reference names before presenting them to the
caller (e.g., "refs/heads/master" -> "master").

In this patch, we introduce the iterator abstraction and many utilities,
and implement a reference iterator for the files ref storage backend.
(I've written several other obvious utilities, for example a generic way
to filter references being iterated over. These will probably be useful
in the future. But they are not needed for this patch series, so I am
not including them at this time.)

In a moment we will rewrite do_for_each_ref() to work via reference
iterators (allowing some special-purpose code to be discarded), and do
something similar for reflogs. In future patch series, we will expose
the ref_iterator abstraction in the public refs API so that callers can
use it directly.

Implementation note: I tried abstracting this a layer further to allow
generic iterators (over arbitrary types of objects) and generic
utilities like a generic merge_iterator. But the implementation in C was
very cumbersome, involving (in my opinion) too much boilerplate and too
much unsafe casting, some of which would have had to be done on the
caller side. However, I did put a few iterator-related constants in a
top-level header file, iterator.h, as they will be useful in a moment to
implement iteration over directory trees and possibly other types of
iterators in the future.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Makefile             |   1 +
 iterator.h           |  81 ++++++++++++
 refs.h               |   4 +-
 refs/files-backend.c | 282 +++++++++++++++++++++++++++++++++++++++++
 refs/iterator.c      | 347 +++++++++++++++++++++++++++++++++++++++++++++++++++
 refs/refs-internal.h | 193 ++++++++++++++++++++++++++++
 6 files changed, 907 insertions(+), 1 deletion(-)
 create mode 100644 iterator.h
 create mode 100644 refs/iterator.c

diff --git a/Makefile b/Makefile
index a83e322..ac8f365 100644
--- a/Makefile
+++ b/Makefile
@@ -786,6 +786,7 @@ LIB_OBJS += read-cache.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
+LIB_OBJS += refs/iterator.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += remote.o
 LIB_OBJS += replace_object.o
diff --git a/iterator.h b/iterator.h
new file mode 100644
index 0000000..0f6900e
--- /dev/null
+++ b/iterator.h
@@ -0,0 +1,81 @@
+#ifndef ITERATOR_H
+#define ITERATOR_H
+
+/*
+ * Generic constants related to iterators.
+ */
+
+/*
+ * The attempt to advance the iterator was successful; the iterator
+ * reflects the new current entry.
+ */
+#define ITER_OK 0
+
+/*
+ * The iterator is exhausted and has been freed.
+ */
+#define ITER_DONE -1
+
+/*
+ * The iterator experienced an error. The iteration has been aborted
+ * and the iterator has been freed.
+ */
+#define ITER_ERROR -2
+
+/*
+ * Return values for selector functions for merge iterators. The
+ * numerical values of these constants are important and must be
+ * compatible with ITER_DONE and ITER_ERROR.
+ */
+enum iterator_selection {
+	/* End the iteration without an error: */
+	ITER_SELECT_DONE = ITER_DONE,
+
+	/* Report an error and abort the iteration: */
+	ITER_SELECT_ERROR = ITER_ERROR,
+
+	/*
+	 * The next group of constants are masks that are useful
+	 * mainly internally.
+	 */
+
+	/* The LSB selects whether iter0/iter1 is the "current" iterator: */
+	ITER_CURRENT_SELECTION_MASK = 0x01,
+
+	/* iter0 is the "current" iterator this round: */
+	ITER_CURRENT_SELECTION_0 = 0x00,
+
+	/* iter1 is the "current" iterator this round: */
+	ITER_CURRENT_SELECTION_1 = 0x01,
+
+	/* Yield the value from the current iterator? */
+	ITER_YIELD_CURRENT = 0x02,
+
+	/* Discard the value from the secondary iterator? */
+	ITER_SKIP_SECONDARY = 0x04,
+
+	/*
+	 * The constants that a selector function should usually
+	 * return.
+	 */
+
+	/* Yield the value from iter0: */
+	ITER_SELECT_0 = ITER_CURRENT_SELECTION_0 | ITER_YIELD_CURRENT,
+
+	/* Yield the value from iter0 and discard the one from iter1: */
+	ITER_SELECT_0_SKIP_1 = ITER_SELECT_0 | ITER_SKIP_SECONDARY,
+
+	/* Discard the value from iter0 without yielding anything this round: */
+	ITER_SKIP_0 = ITER_CURRENT_SELECTION_1 | ITER_SKIP_SECONDARY,
+
+	/* Yield the value from iter1: */
+	ITER_SELECT_1 = ITER_CURRENT_SELECTION_1 | ITER_YIELD_CURRENT,
+
+	/* Yield the value from iter1 and discard the one from iter0: */
+	ITER_SELECT_1_SKIP_0 = ITER_SELECT_1 | ITER_SKIP_SECONDARY,
+
+	/* Discard the value from iter1 without yielding anything this round: */
+	ITER_SKIP_1 = ITER_CURRENT_SELECTION_0 | ITER_SKIP_SECONDARY
+};
+
+#endif /* ITERATOR_H */
diff --git a/refs.h b/refs.h
index 6d515a4..442c1a5 100644
--- a/refs.h
+++ b/refs.h
@@ -141,7 +141,9 @@ int dwim_log(const char *str, int len, unsigned char *sha1, char **ref);
 struct ref_transaction;
 
 /*
- * Bit values set in the flags argument passed to each_ref_fn():
+ * Bit values set in the flags argument passed to each_ref_fn() and
+ * stored in ref_iterator::flags. Other bits are for internal use
+ * only:
  */
 
 /* Reference is a symbolic reference. */
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8ab4d5f..dbf1587 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,6 +1,7 @@
 #include "../cache.h"
 #include "../refs.h"
 #include "refs-internal.h"
+#include "../iterator.h"
 #include "../lockfile.h"
 #include "../object.h"
 #include "../dir.h"
@@ -704,6 +705,154 @@ static void prime_ref_dir(struct ref_dir *dir)
 	}
 }
 
+/*
+ * A level in the reference hierarchy that is currently being iterated
+ * through.
+ */
+struct cache_ref_iterator_level {
+	/*
+	 * The ref_dir being iterated over at this level. The ref_dir
+         * is sorted before being stored here.
+	 */
+	struct ref_dir *dir;
+
+	/*
+	 * The index of the current entry within dir (which might
+	 * itself be a directory). If index == -1, then the iteration
+	 * hasn't yet begun. If index == dir->nr, then the iteration
+	 * through this level is over.
+	 */
+	int index;
+};
+
+/*
+ * Represent an iteration through a ref_dir in the memory cache. The
+ * iteration recurses through subdirectories.
+ */
+struct cache_ref_iterator {
+	struct ref_iterator base;
+
+	/*
+	 * The number of levels currently on the stack. This is always
+	 * at least 1, because when it becomes zero the iteration is
+	 * ended and this struct is freed.
+	 */
+	size_t levels_nr;
+
+	/* The number of levels that have been allocated on the stack */
+	size_t levels_alloc;
+
+	/*
+	 * A stack of levels. levels[0] is the uppermost level that is
+	 * being iterated over in this iteration. (This is not
+	 * necessary the top level in the references hierarchy. If we
+	 * are iterating through a subtree, then levels[0] will hold
+	 * the ref_dir for that subtree, and subsequent levels will go
+	 * on from there.)
+	 */
+	struct cache_ref_iterator_level *levels;
+};
+
+static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	while (1) {
+		struct cache_ref_iterator_level *level =
+			&iter->levels[iter->levels_nr - 1];
+		struct ref_dir *dir = level->dir;
+		struct ref_entry *entry;
+
+		if (level->index == -1)
+			sort_ref_dir(dir);
+
+		if (++level->index == level->dir->nr) {
+			/* This level is exhausted; pop up a level */
+			if (--iter->levels_nr == 0)
+				return ref_iterator_abort(ref_iterator);
+
+			continue;
+		}
+
+		entry = dir->entries[level->index];
+
+		if (entry->flag & REF_DIR) {
+			/* push down a level */
+			ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+				   iter->levels_alloc);
+
+			level = &iter->levels[iter->levels_nr++];
+			level->dir = get_ref_dir(entry);
+			sort_ref_dir(level->dir);
+			level->index = -1;
+		} else {
+			iter->base.refname = entry->name;
+			iter->base.oid = &entry->u.value.oid;
+			iter->base.flags = entry->flag;
+			return ITER_OK;
+		}
+	}
+}
+
+static enum peel_status peel_entry(struct ref_entry *entry, int repeel);
+
+static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+	struct cache_ref_iterator_level *level;
+	struct ref_entry *entry;
+
+	level = &iter->levels[iter->levels_nr - 1];
+
+	if (level->index == -1)
+		die("BUG: peel called before advance for cache iterator");
+
+	entry = level->dir->entries[level->index];
+
+	if (peel_entry(entry, 0))
+		return -1;
+	hashcpy(peeled->hash, entry->u.value.peeled.hash);
+	return 0;
+}
+
+static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	free(iter->levels);
+	base_ref_iterator_free(ref_iterator);
+	return ITER_DONE;
+}
+
+struct ref_iterator_vtable cache_ref_iterator_vtable = {
+	cache_ref_iterator_advance,
+	cache_ref_iterator_peel,
+	cache_ref_iterator_abort
+};
+
+static struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir)
+{
+	struct cache_ref_iterator *iter;
+	struct ref_iterator *ref_iterator;
+	struct cache_ref_iterator_level *level;
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
+	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
+	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+
+	iter->levels_nr = 1;
+	level = &iter->levels[0];
+	level->index = -1;
+	level->dir = dir;
+
+	return ref_iterator;
+}
+
 struct nonmatching_ref_data {
 	const struct string_list *skip;
 	const char *conflicting_refname;
@@ -1843,6 +1992,139 @@ int peel_ref(const char *refname, unsigned char *sha1)
 	return peel_object(base, sha1);
 }
 
+struct files_ref_iterator {
+	struct ref_iterator base;
+
+	struct packed_ref_cache *packed_ref_cache;
+	struct ref_iterator *iter0;
+	unsigned int flags;
+};
+
+static int files_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+	int ok;
+
+	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
+		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
+		    !ref_resolves_to_object(iter->iter0->refname,
+					    iter->iter0->oid,
+					    iter->iter0->flags))
+			continue;
+
+		iter->base.refname = iter->iter0->refname;
+		iter->base.oid = iter->iter0->oid;
+		iter->base.flags = iter->iter0->flags;
+		return ITER_OK;
+	}
+
+	iter->iter0 = NULL;
+	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+		ok = ITER_ERROR;
+
+	return ok;
+}
+
+static int files_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+
+	return ref_iterator_peel(iter->iter0, peeled);
+}
+
+static int files_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct files_ref_iterator *iter =
+		(struct files_ref_iterator *)ref_iterator;
+	int ok = ITER_DONE;
+
+	if (iter->iter0)
+		ok = ref_iterator_abort(iter->iter0);
+
+	release_packed_ref_cache(iter->packed_ref_cache);
+	base_ref_iterator_free(ref_iterator);
+	return ok;
+}
+
+struct ref_iterator_vtable files_ref_iterator_vtable = {
+	files_ref_iterator_advance,
+	files_ref_iterator_peel,
+	files_ref_iterator_abort
+};
+
+struct ref_iterator *files_ref_iterator_begin(
+		const char *submodule,
+		const char *prefix, unsigned int flags)
+{
+	struct ref_cache *refs = get_ref_cache(submodule);
+	struct ref_dir *loose_dir, *packed_dir;
+	struct ref_iterator *loose_iter, *packed_iter;
+	struct files_ref_iterator *iter;
+	struct ref_iterator *ref_iterator;
+
+	if (!refs)
+		return empty_ref_iterator_begin();
+
+	if (ref_paranoia < 0)
+		ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0);
+	if (ref_paranoia)
+		flags |= DO_FOR_EACH_INCLUDE_BROKEN;
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
+	base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable);
+
+	/*
+	 * We must make sure that all loose refs are read before
+	 * accessing the packed-refs file; this avoids a race
+	 * condition if loose refs are migrated to the packed-refs
+	 * file by a simultaneous process, but our in-memory view is
+	 * from before the migration. We ensure this as follows:
+	 * First, we call prime_ref_dir(), which pre-reads the loose
+	 * references for the subtree into the cache. (If they've
+	 * already been read, that's OK; we only need to guarantee
+	 * that they're read before the packed refs, not *how much*
+	 * before.) After that, we call get_packed_ref_cache(), which
+	 * internally checks whether the packed-ref cache is up to
+	 * date with what is on disk, and re-reads it if not.
+	 */
+
+	loose_dir = get_loose_refs(refs);
+
+	if (prefix && *prefix)
+		loose_dir = find_containing_dir(loose_dir, prefix, 0);
+
+	if (loose_dir) {
+		prime_ref_dir(loose_dir);
+		loose_iter = cache_ref_iterator_begin(loose_dir);
+	} else {
+		/* There's nothing to iterate over. */
+		loose_iter = empty_ref_iterator_begin();
+	}
+
+	iter->packed_ref_cache = get_packed_ref_cache(refs);
+	acquire_packed_ref_cache(iter->packed_ref_cache);
+	packed_dir = get_packed_ref_dir(iter->packed_ref_cache);
+
+	if (prefix && *prefix)
+		packed_dir = find_containing_dir(packed_dir, prefix, 0);
+
+	if (packed_dir) {
+		packed_iter = cache_ref_iterator_begin(packed_dir);
+	} else {
+		/* There's nothing to iterate over. */
+		packed_iter = empty_ref_iterator_begin();
+	}
+
+	iter->iter0 = overlay_ref_iterator_begin(packed_iter, loose_iter);
+	iter->flags = flags;
+
+	return ref_iterator;
+}
+
 /*
  * Call fn for each reference in the specified ref_cache, omitting
  * references not in the containing_dir of prefix. Call fn for all
diff --git a/refs/iterator.c b/refs/iterator.c
new file mode 100644
index 0000000..82a6948
--- /dev/null
+++ b/refs/iterator.c
@@ -0,0 +1,347 @@
+/*
+ * Generic reference iterator infrastructure. See refs-internal.h for
+ * documentation about the design and use of reference iterators.
+ */
+
+#include "cache.h"
+#include "refs.h"
+#include "refs/refs-internal.h"
+#include "iterator.h"
+
+int ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	return ref_iterator->vtable->advance(ref_iterator);
+}
+
+int ref_iterator_peel(struct ref_iterator *ref_iterator,
+		      struct object_id *peeled)
+{
+	return ref_iterator->vtable->peel(ref_iterator, peeled);
+}
+
+int ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	return ref_iterator->vtable->abort(ref_iterator);
+}
+
+void base_ref_iterator_init(struct ref_iterator *iter,
+			    struct ref_iterator_vtable *vtable)
+{
+	iter->vtable = vtable;
+	iter->refname = NULL;
+	iter->oid = NULL;
+	iter->flags = 0;
+}
+
+void base_ref_iterator_free(struct ref_iterator *iter)
+{
+	/* Help make use-after-free bugs fail quickly: */
+	iter->vtable = NULL;
+	free(iter);
+}
+
+struct empty_ref_iterator {
+	struct ref_iterator base;
+};
+
+static int empty_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	return ref_iterator_abort(ref_iterator);
+}
+
+static int empty_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	die("BUG: peel called for empty iterator");
+}
+
+static int empty_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	base_ref_iterator_free(ref_iterator);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable empty_ref_iterator_vtable = {
+	empty_ref_iterator_advance,
+	empty_ref_iterator_peel,
+	empty_ref_iterator_abort
+};
+
+struct ref_iterator *empty_ref_iterator_begin(void)
+{
+	struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter));
+	struct ref_iterator *ref_iterator = &iter->base;
+
+	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable);
+	return ref_iterator;
+}
+
+int is_empty_ref_iterator(struct ref_iterator *ref_iterator)
+{
+	return ref_iterator->vtable == &empty_ref_iterator_vtable;
+}
+
+struct merge_ref_iterator {
+	struct ref_iterator base;
+
+	struct ref_iterator *iter0, *iter1;
+
+	ref_iterator_select_fn *select;
+	void *cb_data;
+
+	/*
+	 * A pointer to iter0 or iter1 (whichever is supplying the
+	 * current value), or NULL if advance has not yet been called.
+	 */
+	struct ref_iterator **current;
+};
+
+static int merge_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+	int ok;
+
+	if (!iter->current) {
+		/* Initialize: advance both iterators to their first entries */
+		if ((ok = ref_iterator_advance(iter->iter0)) != ITER_OK) {
+			iter->iter0 = NULL;
+			if (ok == ITER_ERROR)
+				goto error;
+		}
+		if ((ok = ref_iterator_advance(iter->iter1)) != ITER_OK) {
+			iter->iter1 = NULL;
+			if (ok == ITER_ERROR)
+				goto error;
+		}
+	} else {
+		/*
+		 * Advance the current iterator past the just-used
+		 * entry:
+		 */
+		if ((ok = ref_iterator_advance(*iter->current)) != ITER_OK) {
+			*iter->current = NULL;
+			if (ok == ITER_ERROR)
+				goto error;
+		}
+	}
+
+	/* Loop until we find an entry that we can yield. */
+	while (1) {
+		struct ref_iterator **secondary;
+		enum iterator_selection selection =
+			iter->select(iter->iter0, iter->iter1, iter->cb_data);
+
+		if (selection == ITER_SELECT_DONE) {
+			return ref_iterator_abort(ref_iterator);
+		} else if (selection == ITER_SELECT_ERROR) {
+			ref_iterator_abort(ref_iterator);
+			return ITER_ERROR;
+		}
+
+		if ((selection & ITER_CURRENT_SELECTION_MASK) == 0) {
+			iter->current = &iter->iter0;
+			secondary = &iter->iter1;
+		} else {
+			iter->current = &iter->iter1;
+			secondary = &iter->iter0;
+		}
+
+		if (selection & ITER_SKIP_SECONDARY) {
+			if ((ok = ref_iterator_advance(*secondary)) != ITER_OK) {
+				*secondary = NULL;
+				if (ok == ITER_ERROR)
+					goto error;
+			}
+		}
+
+		if (selection & ITER_YIELD_CURRENT) {
+			iter->base.refname = (*iter->current)->refname;
+			iter->base.oid = (*iter->current)->oid;
+			iter->base.flags = (*iter->current)->flags;
+			return ITER_OK;
+		}
+	}
+
+error:
+	ref_iterator_abort(ref_iterator);
+	return ITER_ERROR;
+}
+
+static int merge_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+
+	if (!iter->current) {
+		die("BUG: peel called before advance for merge iterator");
+	}
+	return ref_iterator_peel(*iter->current, peeled);
+}
+
+static int merge_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct merge_ref_iterator *iter =
+		(struct merge_ref_iterator *)ref_iterator;
+	int ok = ITER_DONE;
+
+	if (iter->iter0) {
+		if (ref_iterator_abort(iter->iter0) != ITER_DONE)
+			ok = ITER_ERROR;
+	}
+	if (iter->iter1) {
+		if (ref_iterator_abort(iter->iter1) != ITER_DONE)
+			ok = ITER_ERROR;
+	}
+	base_ref_iterator_free(ref_iterator);
+	return ok;
+}
+
+static struct ref_iterator_vtable merge_ref_iterator_vtable = {
+	merge_ref_iterator_advance,
+	merge_ref_iterator_peel,
+	merge_ref_iterator_abort
+};
+
+struct ref_iterator *merge_ref_iterator_begin(
+		struct ref_iterator *iter0, struct ref_iterator *iter1,
+		ref_iterator_select_fn *select, void *cb_data)
+{
+	struct merge_ref_iterator *iter = xcalloc(1, sizeof(*iter));
+	struct ref_iterator *ref_iterator = &iter->base;
+
+	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
+	iter->iter0 = iter0;
+	iter->iter1 = iter1;
+	iter->select = select;
+	iter->cb_data = cb_data;
+	iter->current = NULL;
+	return ref_iterator;
+}
+
+/*
+ * A ref_iterator_select_fn that overlays the items from iter1 on top
+ * of those from iter0 (like loose refs over packed refs). See
+ * overlay_ref_iterator_begin().
+ */
+static enum iterator_selection overlay_iterator_select(
+		struct ref_iterator *iter0, struct ref_iterator *iter1,
+		void *cb_data)
+{
+	int cmp;
+
+	if (!iter0)
+		return iter1 ? ITER_SELECT_1 : ITER_SELECT_DONE;
+	else if (!iter1)
+		return ITER_SELECT_0;
+
+	cmp = strcmp(iter0->refname, iter1->refname);
+
+	if (cmp < 0)
+		return ITER_SELECT_0;
+	else if (cmp > 0)
+		return ITER_SELECT_1;
+	else
+		return ITER_SELECT_1_SKIP_0;
+}
+
+struct ref_iterator *overlay_ref_iterator_begin(struct ref_iterator *iter0,
+						struct ref_iterator *iter1)
+{
+	/*
+	 * Optimization: if one of the iterators is empty, return the
+	 * other one rather than incurring the overhead of wrapping
+	 * them.
+	 */
+	if (is_empty_ref_iterator(iter0)) {
+		ref_iterator_abort(iter0);
+		return iter1;
+	} else if (is_empty_ref_iterator(iter1)) {
+		ref_iterator_abort(iter1);
+		return iter0;
+	}
+
+	return merge_ref_iterator_begin(iter0, iter1,
+					overlay_iterator_select, NULL);
+}
+
+struct prefix_ref_iterator {
+	struct ref_iterator base;
+
+	struct ref_iterator *iter0;
+	char *prefix;
+	int trim;
+};
+
+static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+	int ok;
+
+	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
+		if (!starts_with(iter->iter0->refname, iter->prefix))
+			continue;
+
+		iter->base.refname = iter->iter0->refname + iter->trim;
+		iter->base.oid = iter->iter0->oid;
+		iter->base.flags = iter->iter0->flags;
+		return ITER_OK;
+	}
+
+	iter->iter0 = NULL;
+	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
+		return ITER_ERROR;
+	return ok;
+}
+
+static int prefix_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				    struct object_id *peeled)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+
+	return ref_iterator_peel(iter->iter0, peeled);
+}
+
+static int prefix_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct prefix_ref_iterator *iter =
+		(struct prefix_ref_iterator *)ref_iterator;
+	int ok = ITER_DONE;
+
+	if (iter->iter0)
+		ok = ref_iterator_abort(iter->iter0);
+	free(iter->prefix);
+	base_ref_iterator_free(ref_iterator);
+	return ok;
+}
+
+static struct ref_iterator_vtable prefix_ref_iterator_vtable = {
+	prefix_ref_iterator_advance,
+	prefix_ref_iterator_peel,
+	prefix_ref_iterator_abort
+};
+
+struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
+					       const char *prefix,
+					       int trim)
+{
+	struct prefix_ref_iterator *iter;
+	struct ref_iterator *ref_iterator;
+
+	if (!*prefix && !trim)
+		return iter0; /* optimization: no need to wrap iterator */
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
+
+	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable);
+
+	iter->iter0 = iter0;
+	iter->prefix = xstrdup(prefix);
+	iter->trim = trim;
+
+	return ref_iterator;
+}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 8ad02d8..11182b4 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -249,6 +249,199 @@ int rename_ref_available(const char *oldname, const char *newname);
 #define DO_FOR_EACH_INCLUDE_BROKEN 0x01
 
 /*
+ * Reference iterators
+ *
+ * A reference iterator encapsulates the state of an in-progress
+ * iteration over references. Create an instance of `struct
+ * ref_iterator` via one of the functions in this module.
+ *
+ * A freshly-created ref_iterator doesn't yet point at a reference. To
+ * advance the iterator, call ref_iterator_advance(). If successful,
+ * this sets the iterator's refname, oid, and flags fields to describe
+ * the next reference and returns ITER_OK. The data pointed at by
+ * refname and oid belong to the iterator; if you want to retain them
+ * after calling ref_iterator_advance() again or calling
+ * ref_iterator_abort(), you must make a copy. When the iteration has
+ * been exhausted, ref_iterator_advance() releases any resources
+ * assocated with the iteration, frees the ref_iterator object, and
+ * returns ITER_DONE. If you want to abort the iteration early, call
+ * ref_iterator_abort(), which also frees the ref_iterator object and
+ * any associated resources. If there was an internal error advancing
+ * to the next entry, ref_iterator_advance() aborts the iteration,
+ * frees the ref_iterator, and returns ITER_ERROR.
+ *
+ * The reference currently being looked at can be peeled by calling
+ * ref_iterator_peel(). This function is often faster than peel_ref(),
+ * so it should be preferred when iterating over references.
+ *
+ * Putting it all together, a typical iteration looks like this:
+ *
+ *     int ok;
+ *     struct ref_iterator *iter = ...;
+ *
+ *     while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+ *             if (want_to_stop_iteration()) {
+ *                     ok = ref_iterator_abort(iter);
+ *                     break;
+ *             }
+ *
+ *             // Access information about the current reference:
+ *             if (!(iter->flags & REF_ISSYMREF))
+ *                     printf("%s is %s\n", iter->refname, oid_to_hex(&iter->oid));
+ *
+ *             // If you need to peel the reference:
+ *             ref_iterator_peel(iter, &oid);
+ *     }
+ *
+ *     if (ok != ITER_DONE)
+ *             handle_error();
+ */
+struct ref_iterator {
+	struct ref_iterator_vtable *vtable;
+	const char *refname;
+	const struct object_id *oid;
+	unsigned int flags;
+};
+
+/*
+ * Advance the iterator to the first or next item and return ITER_OK.
+ * If the iteration is exhausted, free the resources associated with
+ * the ref_iterator and return ITER_DONE. On errors, free the iterator
+ * resources and return ITER_ERROR. It is a bug to use ref_iterator or
+ * call this function again after it has returned false.
+ */
+int ref_iterator_advance(struct ref_iterator *ref_iterator);
+
+/*
+ * If possible, peel the reference currently being viewed by the
+ * iterator. Return 0 on success.
+ */
+int ref_iterator_peel(struct ref_iterator *ref_iterator,
+		      struct object_id *peeled);
+
+/*
+ * End the iteration before it has been exhausted, freeing the
+ * reference iterator and any associated resources and returning
+ * ITER_DONE. If the abort itself failed, return ITER_ERROR.
+ */
+int ref_iterator_abort(struct ref_iterator *ref_iterator);
+
+/*
+ * An iterator over nothing (its first ref_iterator_advance() call
+ * returns 0).
+ */
+struct ref_iterator *empty_ref_iterator_begin(void);
+
+/*
+ * Return true iff ref_iterator is an empty_ref_iterator.
+ */
+int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
+
+/*
+ * A callback function used to instruct merge_ref_iterator how to
+ * interleave the entries from iter0 and iter1. The function should
+ * return one of the constants defined in enum iterator_selection. It
+ * must not advance either of the iterators itself.
+ *
+ * The function must be prepared to handle the case that iter0 and/or
+ * iter1 is NULL, which indicates that the corresponding sub-iterator
+ * has been exhausted. Its return value must be consistent with the
+ * current states of the iterators; e.g., it must not return
+ * ITER_SKIP_1 if iter1 has already been exhausted.
+ */
+typedef enum iterator_selection ref_iterator_select_fn(
+		struct ref_iterator *iter0, struct ref_iterator *iter1,
+		void *cb_data);
+
+/*
+ * Iterate over the intries from iter0 and iter1, with the values
+ * interleaved as directed by the select function. The iterator takes
+ * ownership of iter0 and iter1 and frees them when the iteration is
+ * over.
+ */
+struct ref_iterator *merge_ref_iterator_begin(
+		struct ref_iterator *iter0, struct ref_iterator *iter1,
+		ref_iterator_select_fn *select, void *cb_data);
+
+/*
+ * An iterator consisting of the union of the entries from iter0 and
+ * iter1. If there are entries common to the two sub-iterators, use
+ * the one from iter1. Each iterator must iterate over its entries in
+ * strcmp() order by refname for this to work.
+ *
+ * The new iterator takes ownership of its arguments and frees them
+ * when the iteration is over. As a convenience to callers, if iter0
+ * or iter1 is_empty_ref_iterator(), then abort that one immediately
+ * and return the other iterator directly, without wrapping it.
+ */
+struct ref_iterator *overlay_ref_iterator_begin(struct ref_iterator *iter0,
+						struct ref_iterator *iter1);
+
+/*
+ * Wrap iter0, only letting through the references whose names start
+ * with prefix. If trim is set, set iter->refname to the name of the
+ * reference with that many characters trimmed off the front;
+ * otherwise set it to the full refname. The new iterator takes over
+ * ownership of iter0 and frees it when iteration is over. It makes
+ * its own copy of prefix.
+ *
+ * As an convenience to callers, if prefix is the empty string and
+ * trim is zero, this function returns iter0 directly, without
+ * wrapping it.
+ */
+struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
+					       const char *prefix,
+					       int trim);
+
+/*
+ * Iterate over the packed and loose references in the specified
+ * submodule that are within find_containing_dir(prefix). If prefix is
+ * NULL or the empty string, iterate over all references in the
+ * submodule.
+ */
+struct ref_iterator *files_ref_iterator_begin(const char *submodule,
+					      const char *prefix,
+					      unsigned int flags);
+
+/* Internal implementation of reference iteration: */
+
+/*
+ * Base class constructor for ref_iterators. Initialize the
+ * ref_iterator part of iter, setting its vtable pointer as specified.
+ * This is meant to be called only by the initializers of derived
+ * classes.
+ */
+void base_ref_iterator_init(struct ref_iterator *iter,
+			    struct ref_iterator_vtable *vtable);
+
+/*
+ * Base class destructor for ref_iterators. Destroy the ref_iterator
+ * part of iter and shallow-free the object. This is meant to be
+ * called only by the destructors of derived classes.
+ */
+void base_ref_iterator_free(struct ref_iterator *iter);
+
+/* Virtual function declarations for ref_iterators: */
+
+typedef int ref_iterator_advance_fn(struct ref_iterator *ref_iterator);
+
+typedef int ref_iterator_peel_fn(struct ref_iterator *ref_iterator,
+				 struct object_id *peeled);
+
+/*
+ * Implementations of this function should free any resources specific
+ * to the derived class, then call base_ref_iterator_free() to clean
+ * up and free the ref_iterator object.
+ */
+typedef int ref_iterator_abort_fn(struct ref_iterator *ref_iterator);
+
+struct ref_iterator_vtable {
+	ref_iterator_advance_fn *advance;
+	ref_iterator_peel_fn *peel;
+	ref_iterator_abort_fn *abort;
+};
+
+/*
  * Call fn for each reference in the specified submodule for which the
  * refname begins with prefix. If trim is non-zero, then trim that
  * many characters off the beginning of each refname before passing
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 10/13] do_for_each_ref(): reimplement using reference iteration
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (8 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 09/13] refs: introduce an iterator interface Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 11/13] for_each_reflog(): don't abort for bad references Michael Haggerty
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

Use the reference iterator interface to implement do_for_each_ref().
Delete a bunch of code supporting the old for_each_ref() implementation.
And now that do_for_each_ref() is generic code (it is no longer tied to
the files backend), move it to refs.c.

The implementation is via a new function, do_for_each_ref_iterator(),
which takes a reference iterator as argument and calls a callback
function for each of the references in the iterator.

This change requires the current_ref performance hack for peel_ref() to
be implemented via ref_iterator_peel() rather than peel_entry() because
we don't have a ref_entry handy (it is hidden under three layers:
file_ref_iterator, merge_ref_iterator, and cache_ref_iterator). So:

* do_for_each_ref_iterator() records the active iterator in
  current_ref_iter while it is running.

* peel_ref() checks whether current_ref_iter is pointing at the
  requested reference. If so, it asks the iterator to peel the
  reference (which it can do efficiently via its "peel" virtual
  function). For extra safety, we do the optimization only if the
  refname *addresses* are the same, not only if the refname *strings*
  are the same, to forestall possible mixups between refnames that come
  from different ref_iterators.

Please note that this optimization of peel_ref() is only available when
iterating via do_for_each_ref_iterator() (including all of the
for_each_ref() functions, which call it indirectly). It would be
complicated to implement a similar optimization when iterating directly
using a reference iterator, because multiple reference iterators can be
in use at the same time, with interleaved calls to
ref_iterator_advance(). (In fact we do exactly that in
merge_ref_iterator.)

But that is not necessary. peel_ref() is only called while iterating
over references. Callers who iterate using the for_each_ref() functions
benefit from the optimization described above. Callers who iterate using
reference iterators directly have access to the ref_iterator, so they
can call ref_iterator_peel() themselves to get an analogous optimization
in a more straightforward manner.

If we rewrite all callers to use the reference iteration API, then we
can remove the current_ref_iter hack permanently.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c               |  20 +++++
 refs/files-backend.c | 206 ++-------------------------------------------------
 refs/iterator.c      |  29 ++++++++
 refs/refs-internal.h |  33 ++++++---
 4 files changed, 76 insertions(+), 212 deletions(-)

diff --git a/refs.c b/refs.c
index 842c5c7..814cad3 100644
--- a/refs.c
+++ b/refs.c
@@ -1120,6 +1120,26 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return head_ref_submodule(NULL, fn, cb_data);
 }
 
+/*
+ * Call fn for each reference in the specified submodule for which the
+ * refname begins with prefix. If trim is non-zero, then trim that
+ * many characters off the beginning of each refname before passing
+ * the refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to
+ * include broken references in the iteration. If fn ever returns a
+ * non-zero value, stop the iteration and return that value;
+ * otherwise, return 0.
+ */
+static int do_for_each_ref(const char *submodule, const char *prefix,
+			   each_ref_fn fn, int trim, int flags, void *cb_data)
+{
+	struct ref_iterator *iter;
+
+	iter = files_ref_iterator_begin(submodule, prefix, flags);
+	iter = prefix_ref_iterator_begin(iter, prefix, trim);
+
+	return do_for_each_ref_iterator(iter, fn, cb_data);
+}
+
 int for_each_ref(each_ref_fn fn, void *cb_data)
 {
 	return do_for_each_ref(NULL, "", fn, 0, 0, cb_data);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index dbf1587..c48a006 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -542,53 +542,8 @@ static int entry_resolves_to_object(struct ref_entry *entry)
 				      &entry->u.value.oid, entry->flag);
 }
 
-/*
- * current_ref is a performance hack: when iterating over references
- * using the for_each_ref*() functions, current_ref is set to the
- * current reference's entry before calling the callback function.  If
- * the callback function calls peel_ref(), then peel_ref() first
- * checks whether the reference to be peeled is the current reference
- * (it usually is) and if so, returns that reference's peeled version
- * if it is available.  This avoids a refname lookup in a common case.
- */
-static struct ref_entry *current_ref;
-
 typedef int each_ref_entry_fn(struct ref_entry *entry, void *cb_data);
 
-struct ref_entry_cb {
-	const char *prefix;
-	int trim;
-	int flags;
-	each_ref_fn *fn;
-	void *cb_data;
-};
-
-/*
- * Handle one reference in a do_for_each_ref*()-style iteration,
- * calling an each_ref_fn for each entry.
- */
-static int do_one_ref(struct ref_entry *entry, void *cb_data)
-{
-	struct ref_entry_cb *data = cb_data;
-	struct ref_entry *old_current_ref;
-	int retval;
-
-	if (!starts_with(entry->name, data->prefix))
-		return 0;
-
-	if (!(data->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
-	      !entry_resolves_to_object(entry))
-		return 0;
-
-	/* Store the old value, in case this is a recursive call: */
-	old_current_ref = current_ref;
-	current_ref = entry;
-	retval = data->fn(entry->name + data->trim, &entry->u.value.oid,
-			  entry->flag, data->cb_data);
-	current_ref = old_current_ref;
-	return retval;
-}
-
 /*
  * Call fn for each reference in dir that has index in the range
  * offset <= index < dir->nr.  Recurse into subdirectories that are in
@@ -618,78 +573,6 @@ static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
 }
 
 /*
- * Call fn for each reference in the union of dir1 and dir2, in order
- * by refname.  Recurse into subdirectories.  If a value entry appears
- * in both dir1 and dir2, then only process the version that is in
- * dir2.  The input dirs must already be sorted, but subdirs will be
- * sorted as needed.  fn is called for all references, including
- * broken ones.
- */
-static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
-				     struct ref_dir *dir2,
-				     each_ref_entry_fn fn, void *cb_data)
-{
-	int retval;
-	int i1 = 0, i2 = 0;
-
-	assert(dir1->sorted == dir1->nr);
-	assert(dir2->sorted == dir2->nr);
-	while (1) {
-		struct ref_entry *e1, *e2;
-		int cmp;
-		if (i1 == dir1->nr) {
-			return do_for_each_entry_in_dir(dir2, i2, fn, cb_data);
-		}
-		if (i2 == dir2->nr) {
-			return do_for_each_entry_in_dir(dir1, i1, fn, cb_data);
-		}
-		e1 = dir1->entries[i1];
-		e2 = dir2->entries[i2];
-		cmp = strcmp(e1->name, e2->name);
-		if (cmp == 0) {
-			if ((e1->flag & REF_DIR) && (e2->flag & REF_DIR)) {
-				/* Both are directories; descend them in parallel. */
-				struct ref_dir *subdir1 = get_ref_dir(e1);
-				struct ref_dir *subdir2 = get_ref_dir(e2);
-				sort_ref_dir(subdir1);
-				sort_ref_dir(subdir2);
-				retval = do_for_each_entry_in_dirs(
-						subdir1, subdir2, fn, cb_data);
-				i1++;
-				i2++;
-			} else if (!(e1->flag & REF_DIR) && !(e2->flag & REF_DIR)) {
-				/* Both are references; ignore the one from dir1. */
-				retval = fn(e2, cb_data);
-				i1++;
-				i2++;
-			} else {
-				die("conflict between reference and directory: %s",
-				    e1->name);
-			}
-		} else {
-			struct ref_entry *e;
-			if (cmp < 0) {
-				e = e1;
-				i1++;
-			} else {
-				e = e2;
-				i2++;
-			}
-			if (e->flag & REF_DIR) {
-				struct ref_dir *subdir = get_ref_dir(e);
-				sort_ref_dir(subdir);
-				retval = do_for_each_entry_in_dir(
-						subdir, 0, fn, cb_data);
-			} else {
-				retval = fn(e, cb_data);
-			}
-		}
-		if (retval)
-			return retval;
-	}
-}
-
-/*
  * Load all of the refs from the dir into our in-memory cache. The hard work
  * of loading loose refs is done by get_ref_dir(), so we just need to recurse
  * through all of the sub-directories. We do not even need to care about
@@ -1960,11 +1843,12 @@ int peel_ref(const char *refname, unsigned char *sha1)
 	int flag;
 	unsigned char base[20];
 
-	if (current_ref && (current_ref->name == refname
-			    || !strcmp(current_ref->name, refname))) {
-		if (peel_entry(current_ref, 0))
+	if (current_ref_iter && current_ref_iter->refname == refname) {
+		struct object_id peeled;
+
+		if (ref_iterator_peel(current_ref_iter, &peeled))
 			return -1;
-		hashcpy(sha1, current_ref->u.value.peeled.hash);
+		hashcpy(sha1, peeled.hash);
 		return 0;
 	}
 
@@ -2126,86 +2010,6 @@ struct ref_iterator *files_ref_iterator_begin(
 }
 
 /*
- * Call fn for each reference in the specified ref_cache, omitting
- * references not in the containing_dir of prefix. Call fn for all
- * references, including broken ones. If fn ever returns a non-zero
- * value, stop the iteration and return that value; otherwise, return
- * 0.
- */
-static int do_for_each_entry(struct ref_cache *refs, const char *prefix,
-			     each_ref_entry_fn fn, void *cb_data)
-{
-	struct packed_ref_cache *packed_ref_cache;
-	struct ref_dir *loose_dir;
-	struct ref_dir *packed_dir;
-	int retval = 0;
-
-	/*
-	 * We must make sure that all loose refs are read before accessing the
-	 * packed-refs file; this avoids a race condition in which loose refs
-	 * are migrated to the packed-refs file by a simultaneous process, but
-	 * our in-memory view is from before the migration. get_packed_ref_cache()
-	 * takes care of making sure our view is up to date with what is on
-	 * disk.
-	 */
-	loose_dir = get_loose_refs(refs);
-	if (prefix && *prefix) {
-		loose_dir = find_containing_dir(loose_dir, prefix, 0);
-	}
-	if (loose_dir)
-		prime_ref_dir(loose_dir);
-
-	packed_ref_cache = get_packed_ref_cache(refs);
-	acquire_packed_ref_cache(packed_ref_cache);
-	packed_dir = get_packed_ref_dir(packed_ref_cache);
-	if (prefix && *prefix) {
-		packed_dir = find_containing_dir(packed_dir, prefix, 0);
-	}
-
-	if (packed_dir && loose_dir) {
-		sort_ref_dir(packed_dir);
-		sort_ref_dir(loose_dir);
-		retval = do_for_each_entry_in_dirs(
-				packed_dir, loose_dir, fn, cb_data);
-	} else if (packed_dir) {
-		sort_ref_dir(packed_dir);
-		retval = do_for_each_entry_in_dir(
-				packed_dir, 0, fn, cb_data);
-	} else if (loose_dir) {
-		sort_ref_dir(loose_dir);
-		retval = do_for_each_entry_in_dir(
-				loose_dir, 0, fn, cb_data);
-	}
-
-	release_packed_ref_cache(packed_ref_cache);
-	return retval;
-}
-
-int do_for_each_ref(const char *submodule, const char *prefix,
-		    each_ref_fn fn, int trim, int flags, void *cb_data)
-{
-	struct ref_entry_cb data;
-	struct ref_cache *refs;
-
-	refs = get_ref_cache(submodule);
-	if (!refs)
-		return 0;
-
-	data.prefix = prefix;
-	data.trim = trim;
-	data.flags = flags;
-	data.fn = fn;
-	data.cb_data = cb_data;
-
-	if (ref_paranoia < 0)
-		ref_paranoia = git_env_bool("GIT_REF_PARANOIA", 0);
-	if (ref_paranoia)
-		data.flags |= DO_FOR_EACH_INCLUDE_BROKEN;
-
-	return do_for_each_entry(refs, prefix, do_one_ref, &data);
-}
-
-/*
  * Verify that the reference locked by lock has the value old_sha1.
  * Fail if the reference doesn't exist and mustexist is set. Return 0
  * on success. On error, write an error message to err, set errno, and
diff --git a/refs/iterator.c b/refs/iterator.c
index 82a6948..c1670ab 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -345,3 +345,32 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 
 	return ref_iterator;
 }
+
+struct ref_iterator *current_ref_iter = NULL;
+
+int do_for_each_ref_iterator(struct ref_iterator *iter,
+			     each_ref_fn fn, void *cb_data)
+{
+	int retval = 0, ok;
+	struct ref_iterator *old_ref_iter = current_ref_iter;
+
+	current_ref_iter = iter;
+	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+		retval = fn(iter->refname, iter->oid, iter->flags, cb_data);
+		if (retval) {
+			/*
+			 * If ref_iterator_abort() returns ITER_ERROR,
+			 * we ignore that error in deference to the
+			 * callback function's return value.
+			 */
+			ref_iterator_abort(iter);
+			goto out;
+		}
+	}
+
+out:
+	current_ref_iter = old_ref_iter;
+	if (ok == ITER_ERROR)
+		return -1;
+	return retval;
+}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 11182b4..f16a38e 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -442,18 +442,29 @@ struct ref_iterator_vtable {
 };
 
 /*
- * Call fn for each reference in the specified submodule for which the
- * refname begins with prefix. If trim is non-zero, then trim that
- * many characters off the beginning of each refname before passing
- * the refname to fn. flags can be DO_FOR_EACH_INCLUDE_BROKEN to
- * include broken references in the iteration. If fn ever returns a
- * non-zero value, stop the iteration and return that value;
- * otherwise, return 0.
- *
- * This is the common backend for the for_each_*ref* functions.
+ * current_ref_iter is a performance hack: when iterating over
+ * references using the for_each_ref*() functions, current_ref_iter is
+ * set to the reference iterator before calling the callback function.
+ * If the callback function calls peel_ref(), then peel_ref() first
+ * checks whether the reference to be peeled is the one referred to by
+ * the iterator (it usually is) and if so, asks the iterator for the
+ * peeled version of the reference if it is available. This avoids a
+ * refname lookup in a common case. current_ref_iter is set to NULL
+ * when the iteration is over.
  */
-int do_for_each_ref(const char *submodule, const char *prefix,
-		    each_ref_fn fn, int trim, int flags, void *cb_data);
+extern struct ref_iterator *current_ref_iter;
+
+/*
+ * The common backend for the for_each_*ref* functions. Call fn for
+ * each reference in iter. If the iterator itself ever returns
+ * ITER_ERROR, return -1. If fn ever returns a non-zero value, stop
+ * the iteration and return that value. Otherwise, return 0. In any
+ * case, free the iterator when done. This function is basically an
+ * adapter between the callback style of reference iteration and the
+ * iterator style.
+ */
+int do_for_each_ref_iterator(struct ref_iterator *iter,
+			     each_ref_fn fn, void *cb_data);
 
 /*
  * Read the specified reference from the filesystem or packed refs
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 11/13] for_each_reflog(): don't abort for bad references
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (9 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 10/13] do_for_each_ref(): reimplement using reference iteration Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-05-30  7:55 ` [PATCH 12/13] dir_iterator: new API for iterating over a directory tree Michael Haggerty
  2016-05-30  7:55 ` [PATCH 13/13] for_each_reflog(): reimplement using iterators Michael Haggerty
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

If there is a file under "$GIT_DIR/logs" with no corresponding
reference, the old code was emitting an error message, aborting the
reflog iteration, and returning -1. But

* None of the callers was checking the exit value

* The callers all want to find all legitimate reflogs (sometimes for the
  purpose of determining object reachability!) and wouldn't benefit from
  a truncated iteration anyway.

So instead, emit an error message and skip the "broken" reflog, but
continue with the iteration.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index c48a006..a7cc0e2 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3325,7 +3325,7 @@ static int do_for_each_reflog(struct strbuf *name, each_ref_fn fn, void *cb_data
 				struct object_id oid;
 
 				if (read_ref_full(name->buf, 0, oid.hash, NULL))
-					retval = error("bad ref for %s", name->buf);
+					error("bad ref for %s", name->buf);
 				else
 					retval = fn(name->buf, &oid, 0, cb_data);
 			}
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 12/13] dir_iterator: new API for iterating over a directory tree
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (10 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 11/13] for_each_reflog(): don't abort for bad references Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  2016-06-01  0:12   ` David Turner
  2016-05-30  7:55 ` [PATCH 13/13] for_each_reflog(): reimplement using iterators Michael Haggerty
  12 siblings, 1 reply; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

The iterator interface is modeled on that for references, though no
vtable is necessary because there is (so far?) only one type of
dir_iterator.

There are obviously a lot of features that could easily be added to this
class:

* Skip/include directory paths in the iteration
* Shallow/deep iteration
* Letting the caller decide which subdirectories to recurse into (e.g.,
  via a dir_iterator_advance_into() function)
* Option to iterate in sorted order
* Option to iterate over directory paths before vs. after their contents

But these are not needed for the current patch series, so I refrain.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Makefile       |   1 +
 dir-iterator.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 dir-iterator.h |  86 +++++++++++++++++++++++++++
 3 files changed, 267 insertions(+)
 create mode 100644 dir-iterator.c
 create mode 100644 dir-iterator.h

diff --git a/Makefile b/Makefile
index ac8f365..b4ffc11 100644
--- a/Makefile
+++ b/Makefile
@@ -722,6 +722,7 @@ LIB_OBJS += diff-lib.o
 LIB_OBJS += diff-no-index.o
 LIB_OBJS += diff.o
 LIB_OBJS += dir.o
+LIB_OBJS += dir-iterator.o
 LIB_OBJS += editor.o
 LIB_OBJS += entry.o
 LIB_OBJS += environment.o
diff --git a/dir-iterator.c b/dir-iterator.c
new file mode 100644
index 0000000..e1d60f0
--- /dev/null
+++ b/dir-iterator.c
@@ -0,0 +1,180 @@
+#include "cache.h"
+#include "dir.h"
+#include "iterator.h"
+#include "dir-iterator.h"
+
+struct dir_iterator_level {
+	int initialized;
+
+	DIR *dir;
+
+	/*
+	 * The length of the directory part of refname at this level
+	 * (including the trailing '/'):
+	 */
+	size_t prefix_len;
+
+	/*
+	 * The last action that has been taken with the current entry
+	 * (needed for directories, which have to be included in the
+	 * iteration and also iterated into):
+	 */
+	enum {
+		DIR_STATE_ITER,
+		DIR_STATE_RECURSE
+	} dir_state;
+};
+
+/*
+ * The full data structure used to manage the internal directory
+ * iteration state. It includes members that are not part of the
+ * public interface.
+ */
+struct dir_iterator_int {
+	struct dir_iterator base;
+
+	/*
+	 * The number of levels currently on the stack. This is always
+	 * at least 1, because when it becomes zero the iteration is
+	 * ended and this struct is freed.
+	 */
+	size_t levels_nr;
+
+	/* The number of levels that have been allocated on the stack */
+	size_t levels_alloc;
+
+	/*
+	 * A stack of levels. levels[0] is the uppermost directory
+	 * that will be included in this iteration.
+	 */
+	struct dir_iterator_level *levels;
+};
+
+int dir_iterator_advance(struct dir_iterator *dir_iterator)
+{
+	struct dir_iterator_int *iter =
+		(struct dir_iterator_int *)dir_iterator;
+
+	while (1) {
+		struct dir_iterator_level *level =
+			&iter->levels[iter->levels_nr - 1];
+		struct dirent *de;
+
+		if (!level->initialized) {
+			if (!is_dir_sep(iter->base.path.buf[iter->base.path.len - 1]))
+				strbuf_addch(&iter->base.path, '/');
+			level->prefix_len = iter->base.path.len;
+
+			/* opendir() errors are handled below */
+			level->dir = opendir(iter->base.path.buf);
+
+			level->initialized = 1;
+		} else if (S_ISDIR(iter->base.st.st_mode)) {
+			if (level->dir_state == DIR_STATE_ITER) {
+				/*
+				 * The directory was just iterated
+				 * over; now prepare to iterate into
+				 * it.
+				 */
+				level->dir_state = DIR_STATE_RECURSE;
+				ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+					   iter->levels_alloc);
+				level = &iter->levels[iter->levels_nr++];
+				level->initialized = 0;
+				continue;
+			} else {
+				/*
+				 * The directory has already been
+				 * iterated over and iterated into;
+				 * we're done with it.
+				 */
+			}
+		}
+
+		if (!level->dir) {
+			/*
+			 * This level is exhausted (or wasn't opened
+			 * successfully); pop up a level.
+			 */
+			if (--iter->levels_nr == 0) {
+				return dir_iterator_abort(dir_iterator);
+			}
+			continue;
+		}
+
+		/*
+		 * Loop until we find an entry that we can give back
+		 * to the caller:
+		 */
+		while (1) {
+			strbuf_setlen(&iter->base.path, level->prefix_len);
+			de = readdir(level->dir);
+
+			if (!de) {
+				/* This level is exhausted; pop up a level. */
+				closedir(level->dir);
+				level->dir = NULL;
+				if (--iter->levels_nr == 0)
+					return dir_iterator_abort(dir_iterator);
+				break;
+			}
+
+			if (is_dot_or_dotdot(de->d_name))
+				continue;
+
+			strbuf_addstr(&iter->base.path, de->d_name);
+			if (lstat(iter->base.path.buf, &iter->base.st) < 0)
+				continue; /* silently skip */
+
+			/*
+			 * We have to set these each time because
+			 * the path strbuf might have been realloc()ed.
+			 */
+
+			iter->base.relative_path =
+				iter->base.path.buf + iter->levels[0].prefix_len;
+			iter->base.basename =
+				iter->base.path.buf + level->prefix_len;
+			level->dir_state = DIR_STATE_ITER;
+
+			return ITER_OK;
+		}
+	}
+}
+
+int dir_iterator_abort(struct dir_iterator *dir_iterator)
+{
+	struct dir_iterator_int *iter = (struct dir_iterator_int *)dir_iterator;
+
+	while (iter->levels_nr) {
+		struct dir_iterator_level *level =
+			&iter->levels[--iter->levels_nr];
+
+		if (level->dir)
+			closedir(level->dir);
+	}
+
+	free(iter->levels);
+	strbuf_release(&iter->base.path);
+	free(iter);
+	return ITER_DONE;
+}
+
+struct dir_iterator *dir_iterator_begin(const char *path)
+{
+	struct dir_iterator_int *iter = xcalloc(1, sizeof(*iter));
+	struct dir_iterator *dir_iterator = &iter->base;
+
+	if (!path || !*path)
+		die("BUG: empty path passed to dir_iterator_begin()");
+
+	strbuf_init(&iter->base.path, PATH_MAX);
+	strbuf_addstr(&iter->base.path, path);
+
+	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+
+	iter->levels_nr = 1;
+	iter->levels[0].initialized = 0;
+
+	return dir_iterator;
+}
diff --git a/dir-iterator.h b/dir-iterator.h
new file mode 100644
index 0000000..8eb1f4c
--- /dev/null
+++ b/dir-iterator.h
@@ -0,0 +1,86 @@
+#ifndef DIR_ITERATOR_H
+#define DIR_ITERATOR_H
+
+/*
+ * Iterate over a directory tree.
+ *
+ * Iterate over a directory tree, recursively, including paths of all
+ * types and hidden paths. Skip "." and ".." entries and don't follow
+ * symlinks except for the original path.
+ *
+ * Every time dir_iterator_advance() is called, update the members of
+ * the dir_iterator structure to reflect the next path in the
+ * iteration. The order that paths are iterated over within a
+ * directory is undefined, but directory paths are always iterated
+ * over before the subdirectory contents.
+ *
+ * A typical iteration looks like this:
+ *
+ *     int ok;
+ *     struct iterator *iter = dir_iterator_begin(path);
+ *
+ *     while ((ok = dir_iterator_advance(iter)) == ITER_OK) {
+ *             if (want_to_stop_iteration()) {
+ *                     ok = dir_iterator_abort(iter);
+ *                     break;
+ *             }
+ *
+ *             // Access information about the current path:
+ *             if (S_ISDIR(iter->st.st_mode))
+ *                     printf("%s is a directory\n", iter->relative_path);
+ *     }
+ *
+ *     if (ok != ITER_DONE)
+ *             handle_error();
+ *
+ * Callers are allowed to modify iter->path while they are working,
+ * but they must restore it to its original contents before calling
+ * dir_iterator_advance() again.
+ */
+
+struct dir_iterator {
+	/* The current path: */
+	struct strbuf path;
+
+	/*
+	 * The current path relative to the starting path. This part
+	 * of the path always uses "/" characters to separate path
+	 * components:
+	 */
+	const char *relative_path;
+
+	/* The current basename: */
+	const char *basename;
+
+	/* The result of calling lstat() on path: */
+	struct stat st;
+};
+
+/*
+ * Start a directory iteration over path. Return a dir_iterator that
+ * holds the internal state of the iteration.
+ *
+ * The iteration includes all paths under path, not including path
+ * itself and not including "." or ".." entries.
+ *
+ * path is the starting directory. An internal copy will be made.
+ */
+struct dir_iterator *dir_iterator_begin(const char *path);
+
+/*
+ * Advance the iterator to the first or next item and return ITER_OK.
+ * If the iteration is exhausted, free the resources associated with
+ * the iterator and return ITER_DONE. On error, return ITER_ERROR. It
+ * is a bug to use iterator or call this function again after it has
+ * returned false.
+ */
+int dir_iterator_advance(struct dir_iterator *iterator);
+
+/*
+ * End the iteration before it has been exhausted. Free the reference
+ * iterator and any associated resources and return ITER_DONE. Return
+ * ITER_ERROR on error.
+ */
+int dir_iterator_abort(struct dir_iterator *iterator);
+
+#endif
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 13/13] for_each_reflog(): reimplement using iterators
  2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
                   ` (11 preceding siblings ...)
  2016-05-30  7:55 ` [PATCH 12/13] dir_iterator: new API for iterating over a directory tree Michael Haggerty
@ 2016-05-30  7:55 ` Michael Haggerty
  12 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-30  7:55 UTC (permalink / raw)
  To: Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git, Michael Haggerty

Allow references with reflogs to be iterated over using a ref_iterator.
The latter is implemented as a files_reflog_iterator, which in turn uses
dir_iterator to read the "logs" directory.

Note that reflog iteration doesn't correctly handle per-worktree
reflogs (either before or after this patch).

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 113 ++++++++++++++++++++++++++++++++-------------------
 refs/refs-internal.h |   7 ++++
 2 files changed, 78 insertions(+), 42 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index a7cc0e2..2e0d006 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2,6 +2,7 @@
 #include "../refs.h"
 #include "refs-internal.h"
 #include "../iterator.h"
+#include "../dir-iterator.h"
 #include "../lockfile.h"
 #include "../object.h"
 #include "../dir.h"
@@ -3292,60 +3293,88 @@ int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_dat
 	strbuf_release(&sb);
 	return ret;
 }
-/*
- * Call fn for each reflog in the namespace indicated by name.  name
- * must be empty or end with '/'.  Name will be used as a scratch
- * space, but its contents will be restored before return.
- */
-static int do_for_each_reflog(struct strbuf *name, each_ref_fn fn, void *cb_data)
+
+struct files_reflog_iterator {
+	struct ref_iterator base;
+
+	struct dir_iterator *dir_iterator;
+	struct object_id oid;
+};
+
+static int files_reflog_iterator_advance(struct ref_iterator *ref_iterator)
 {
-	DIR *d = opendir(git_path("logs/%s", name->buf));
-	int retval = 0;
-	struct dirent *de;
-	int oldlen = name->len;
+	struct files_reflog_iterator *iter =
+		(struct files_reflog_iterator *)ref_iterator;
+	struct dir_iterator *diter = iter->dir_iterator;
+	int ok;
 
-	if (!d)
-		return name->len ? errno : 0;
+	while ((ok = dir_iterator_advance(diter)) == ITER_OK) {
+		int flags;
 
-	while ((de = readdir(d)) != NULL) {
-		struct stat st;
-
-		if (de->d_name[0] == '.')
+		if (!S_ISREG(diter->st.st_mode))
+			continue;
+		if (diter->basename[0] == '.')
 			continue;
-		if (ends_with(de->d_name, ".lock"))
+		if (ends_with(diter->basename, ".lock"))
 			continue;
-		strbuf_addstr(name, de->d_name);
-		if (stat(git_path("logs/%s", name->buf), &st) < 0) {
-			; /* silently ignore */
-		} else {
-			if (S_ISDIR(st.st_mode)) {
-				strbuf_addch(name, '/');
-				retval = do_for_each_reflog(name, fn, cb_data);
-			} else {
-				struct object_id oid;
 
-				if (read_ref_full(name->buf, 0, oid.hash, NULL))
-					error("bad ref for %s", name->buf);
-				else
-					retval = fn(name->buf, &oid, 0, cb_data);
-			}
-			if (retval)
-				break;
+		if (read_ref_full(diter->relative_path, 0,
+				  iter->oid.hash, &flags)) {
+			error("bad ref for %s", diter->path.buf);
+			continue;
 		}
-		strbuf_setlen(name, oldlen);
+
+		iter->base.refname = diter->relative_path;
+		iter->base.oid = &iter->oid;
+		iter->base.flags = flags;
+		return ITER_OK;
 	}
-	closedir(d);
-	return retval;
+
+	iter->dir_iterator = NULL;
+	if (ref_iterator_abort(ref_iterator) == ITER_ERROR)
+		ok = ITER_ERROR;
+	return ok;
+}
+
+static int files_reflog_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	die("BUG: ref_iterator_peel() called for reflog_iterator");
+}
+
+static int files_reflog_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct files_reflog_iterator *iter =
+		(struct files_reflog_iterator *)ref_iterator;
+	int ok = ITER_DONE;
+
+	if (iter->dir_iterator)
+		ok = dir_iterator_abort(iter->dir_iterator);
+
+	base_ref_iterator_free(ref_iterator);
+	return ok;
+}
+
+struct ref_iterator_vtable files_reflog_iterator_vtable = {
+	files_reflog_iterator_advance,
+	files_reflog_iterator_peel,
+	files_reflog_iterator_abort
+};
+
+struct ref_iterator *files_reflog_iterator_begin(void)
+{
+	struct files_reflog_iterator *iter = xcalloc(1, sizeof(*iter));
+	struct ref_iterator *ref_iterator = &iter->base;
+
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
+	iter->dir_iterator = dir_iterator_begin(git_path("logs"));
+	return ref_iterator;
 }
 
 int for_each_reflog(each_ref_fn fn, void *cb_data)
 {
-	int retval;
-	struct strbuf name;
-	strbuf_init(&name, PATH_MAX);
-	retval = do_for_each_reflog(&name, fn, cb_data);
-	strbuf_release(&name);
-	return retval;
+	return do_for_each_ref_iterator(files_reflog_iterator_begin(),
+					fn, cb_data);
 }
 
 static int ref_update_reject_duplicates(struct string_list *refnames,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index f16a38e..87ad8b1 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -403,6 +403,13 @@ struct ref_iterator *files_ref_iterator_begin(const char *submodule,
 					      const char *prefix,
 					      unsigned int flags);
 
+/*
+ * Iterate over the references in the main ref_store that have a
+ * reflog. The paths within a directory are iterated over in arbitrary
+ * order.
+ */
+struct ref_iterator *files_reflog_iterator_begin(void);
+
 /* Internal implementation of reference iteration: */
 
 /*
-- 
2.8.1

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-30  7:55 ` [PATCH 09/13] refs: introduce an iterator interface Michael Haggerty
@ 2016-05-30 15:22   ` Ramsay Jones
  2016-05-30 16:57     ` Ramsay Jones
  2016-05-31  5:29   ` Eric Sunshine
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Ramsay Jones @ 2016-05-30 15:22 UTC (permalink / raw)
  To: Michael Haggerty, Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git



On 30/05/16 08:55, Michael Haggerty wrote:
[snip]

>  /* Reference is a symbolic reference. */
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 8ab4d5f..dbf1587 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -1,6 +1,7 @@
>  #include "../cache.h"
>  #include "../refs.h"
>  #include "refs-internal.h"
> +#include "../iterator.h"
>  #include "../lockfile.h"
>  #include "../object.h"
>  #include "../dir.h"
> @@ -704,6 +705,154 @@ static void prime_ref_dir(struct ref_dir *dir)
>  	}
>  }
>  
> +/*
> + * A level in the reference hierarchy that is currently being iterated
> + * through.
> + */
> +struct cache_ref_iterator_level {
> +	/*
> +	 * The ref_dir being iterated over at this level. The ref_dir
> +         * is sorted before being stored here.
> +	 */
> +	struct ref_dir *dir;
> +
> +	/*
> +	 * The index of the current entry within dir (which might
> +	 * itself be a directory). If index == -1, then the iteration
> +	 * hasn't yet begun. If index == dir->nr, then the iteration
> +	 * through this level is over.
> +	 */
> +	int index;
> +};
> +
> +/*
> + * Represent an iteration through a ref_dir in the memory cache. The
> + * iteration recurses through subdirectories.
> + */
> +struct cache_ref_iterator {
> +	struct ref_iterator base;
> +
> +	/*
> +	 * The number of levels currently on the stack. This is always
> +	 * at least 1, because when it becomes zero the iteration is
> +	 * ended and this struct is freed.
> +	 */
> +	size_t levels_nr;
> +
> +	/* The number of levels that have been allocated on the stack */
> +	size_t levels_alloc;
> +
> +	/*
> +	 * A stack of levels. levels[0] is the uppermost level that is
> +	 * being iterated over in this iteration. (This is not
> +	 * necessary the top level in the references hierarchy. If we
> +	 * are iterating through a subtree, then levels[0] will hold
> +	 * the ref_dir for that subtree, and subsequent levels will go
> +	 * on from there.)
> +	 */
> +	struct cache_ref_iterator_level *levels;
> +};
> +
> +static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
> +{
> +	struct cache_ref_iterator *iter =
> +		(struct cache_ref_iterator *)ref_iterator;
> +
> +	while (1) {
> +		struct cache_ref_iterator_level *level =
> +			&iter->levels[iter->levels_nr - 1];
> +		struct ref_dir *dir = level->dir;
> +		struct ref_entry *entry;
> +
> +		if (level->index == -1)
> +			sort_ref_dir(dir);

do you need to sort here ...
> +
> +		if (++level->index == level->dir->nr) {
> +			/* This level is exhausted; pop up a level */
> +			if (--iter->levels_nr == 0)
> +				return ref_iterator_abort(ref_iterator);
> +
> +			continue;
> +		}
> +
> +		entry = dir->entries[level->index];
> +
> +		if (entry->flag & REF_DIR) {
> +			/* push down a level */
> +			ALLOC_GROW(iter->levels, iter->levels_nr + 1,
> +				   iter->levels_alloc);
> +
> +			level = &iter->levels[iter->levels_nr++];
> +			level->dir = get_ref_dir(entry);
> +			sort_ref_dir(level->dir);

... given that you sort here?

> +			level->index = -1;
> +		} else {
> +			iter->base.refname = entry->name;
> +			iter->base.oid = &entry->u.value.oid;
> +			iter->base.flags = entry->flag;
> +			return ITER_OK;
> +		}
> +	}
> +}
> +

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-30 15:22   ` Ramsay Jones
@ 2016-05-30 16:57     ` Ramsay Jones
  2016-05-31  1:16       ` Michael Haggerty
  0 siblings, 1 reply; 27+ messages in thread
From: Ramsay Jones @ 2016-05-30 16:57 UTC (permalink / raw)
  To: Michael Haggerty, Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git



On 30/05/16 16:22, Ramsay Jones wrote:
> 
> 
> On 30/05/16 08:55, Michael Haggerty wrote:
> [snip]
> 
>>  /* Reference is a symbolic reference. */
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index 8ab4d5f..dbf1587 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -1,6 +1,7 @@
>>  #include "../cache.h"
>>  #include "../refs.h"
>>  #include "refs-internal.h"
>> +#include "../iterator.h"
>>  #include "../lockfile.h"
>>  #include "../object.h"
>>  #include "../dir.h"
>> @@ -704,6 +705,154 @@ static void prime_ref_dir(struct ref_dir *dir)
>>  	}
>>  }
>>  
>> +/*
>> + * A level in the reference hierarchy that is currently being iterated
>> + * through.
>> + */
>> +struct cache_ref_iterator_level {
>> +	/*
>> +	 * The ref_dir being iterated over at this level. The ref_dir
>> +         * is sorted before being stored here.
>> +	 */
>> +	struct ref_dir *dir;
>> +
>> +	/*
>> +	 * The index of the current entry within dir (which might
>> +	 * itself be a directory). If index == -1, then the iteration
>> +	 * hasn't yet begun. If index == dir->nr, then the iteration
>> +	 * through this level is over.
>> +	 */
>> +	int index;
>> +};
>> +
>> +/*
>> + * Represent an iteration through a ref_dir in the memory cache. The
>> + * iteration recurses through subdirectories.
>> + */
>> +struct cache_ref_iterator {
>> +	struct ref_iterator base;
>> +
>> +	/*
>> +	 * The number of levels currently on the stack. This is always
>> +	 * at least 1, because when it becomes zero the iteration is
>> +	 * ended and this struct is freed.
>> +	 */
>> +	size_t levels_nr;
>> +
>> +	/* The number of levels that have been allocated on the stack */
>> +	size_t levels_alloc;
>> +
>> +	/*
>> +	 * A stack of levels. levels[0] is the uppermost level that is
>> +	 * being iterated over in this iteration. (This is not
>> +	 * necessary the top level in the references hierarchy. If we
>> +	 * are iterating through a subtree, then levels[0] will hold
>> +	 * the ref_dir for that subtree, and subsequent levels will go
>> +	 * on from there.)
>> +	 */
>> +	struct cache_ref_iterator_level *levels;
>> +};
>> +
>> +static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
>> +{
>> +	struct cache_ref_iterator *iter =
>> +		(struct cache_ref_iterator *)ref_iterator;
>> +
>> +	while (1) {
>> +		struct cache_ref_iterator_level *level =
>> +			&iter->levels[iter->levels_nr - 1];
>> +		struct ref_dir *dir = level->dir;
>> +		struct ref_entry *entry;
>> +
>> +		if (level->index == -1)
>> +			sort_ref_dir(dir);
> 
> do you need to sort here ...
>> +
>> +		if (++level->index == level->dir->nr) {
>> +			/* This level is exhausted; pop up a level */
>> +			if (--iter->levels_nr == 0)
>> +				return ref_iterator_abort(ref_iterator);
>> +
>> +			continue;
>> +		}
>> +
>> +		entry = dir->entries[level->index];
>> +
>> +		if (entry->flag & REF_DIR) {
>> +			/* push down a level */
>> +			ALLOC_GROW(iter->levels, iter->levels_nr + 1,
>> +				   iter->levels_alloc);
>> +
>> +			level = &iter->levels[iter->levels_nr++];
>> +			level->dir = get_ref_dir(entry);
>> +			sort_ref_dir(level->dir);
> 
> ... given that you sort here?

I had intended to say 'or vice versa' here. When I wrote this, I had not
finished reading this patch (let alone the series). Now, I suspect that
you can simply drop this 'sort_ref_dir()' call site. Unless I've misread
the code, of course! ;-)

ATB,
Ramsay Jones

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-30 16:57     ` Ramsay Jones
@ 2016-05-31  1:16       ` Michael Haggerty
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-31  1:16 UTC (permalink / raw)
  To: Ramsay Jones, Junio C Hamano, David Turner
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git

On 05/30/2016 06:57 PM, Ramsay Jones wrote:
> 
> 
> On 30/05/16 16:22, Ramsay Jones wrote:
>>
>>
>> On 30/05/16 08:55, Michael Haggerty wrote:
>> [snip]
>>
>>>  /* Reference is a symbolic reference. */
>>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>>> index 8ab4d5f..dbf1587 100644
>>> --- a/refs/files-backend.c
>>> +++ b/refs/files-backend.c
>>> [...]
>>> +			sort_ref_dir(dir);
>>
>> do you need to sort here ...
>>> [...]
>>> +			sort_ref_dir(level->dir);
>>
>> ... given that you sort here?
> 
> I had intended to say 'or vice versa' here. When I wrote this, I had not
> finished reading this patch (let alone the series). Now, I suspect that
> you can simply drop this 'sort_ref_dir()' call site. Unless I've misread
> the code, of course! ;-)

Yes, you are right. Thanks for catching this! I'll fix it in v2.

Michael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-30  7:55 ` [PATCH 09/13] refs: introduce an iterator interface Michael Haggerty
  2016-05-30 15:22   ` Ramsay Jones
@ 2016-05-31  5:29   ` Eric Sunshine
  2016-05-31  7:59     ` Michael Haggerty
  2016-05-31  6:10   ` Junio C Hamano
  2016-06-02 10:08   ` Duy Nguyen
  3 siblings, 1 reply; 27+ messages in thread
From: Eric Sunshine @ 2016-05-31  5:29 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, David Turner, Jeff King,
	Nguyễn Thái Ngọc Duy, Git List

On Mon, May 30, 2016 at 3:55 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> [...]
> This commit introduces a new iteration primitive for references: a
> ref_iterator. A ref_iterator is a polymorphic object that a reference
> storage backend can be asked to instantiate. There are three functions
> that can be applied to a ref_iterator:
>
> * ref_iterator_advance(): move to the next reference in the iteration
> * ref_iterator_abort(): end the iteration before it is exhausted
> * ref_iterator_peel(): peel the reference currently being looked at
> [...]
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> @@ -249,6 +249,199 @@ int rename_ref_available(const char *oldname, const char *newname);
> +/*
> + * Advance the iterator to the first or next item and return ITER_OK.
> + * If the iteration is exhausted, free the resources associated with
> + * the ref_iterator and return ITER_DONE. On errors, free the iterator
> + * resources and return ITER_ERROR. It is a bug to use ref_iterator or
> + * call this function again after it has returned false.
> + */

Either:

    s/false/something other than ITER_OK/

or:

    s/false/ITER_DONE or ITER_ERROR/

> +int ref_iterator_advance(struct ref_iterator *ref_iterator);
> +
> +/*
> + * An iterator over nothing (its first ref_iterator_advance() call
> + * returns 0).
> + */

s/0/ITER_DONE/

> +struct ref_iterator *empty_ref_iterator_begin(void);
> +
> +/*
> + * Return true iff ref_iterator is an empty_ref_iterator.
> + */
> +int is_empty_ref_iterator(struct ref_iterator *ref_iterator);

I can see that you used this function as an optimization or
convenience in overlay_ref_iterator_begin(), but do you expect it to
be generally useful otherwise? Is it worth publishing? Do you have
other use-cases in mind?

Also, can you explain why the merge iterator doesn't also perform the
optimization/convenience of checking if one iterator is an empty
iterator?

> +/*
> + * Iterate over the intries from iter0 and iter1, with the values

s/intries/entries/

> + * interleaved as directed by the select function. The iterator takes
> + * ownership of iter0 and iter1 and frees them when the iteration is
> + * over.
> + */
> +struct ref_iterator *merge_ref_iterator_begin(
> +               struct ref_iterator *iter0, struct ref_iterator *iter1,
> +               ref_iterator_select_fn *select, void *cb_data);
> +
> +/*
> + * An iterator consisting of the union of the entries from iter0 and
> + * iter1. If there are entries common to the two sub-iterators, use
> + * the one from iter1. Each iterator must iterate over its entries in
> + * strcmp() order by refname for this to work.
> + *
> + * The new iterator takes ownership of its arguments and frees them
> + * when the iteration is over. As a convenience to callers, if iter0
> + * or iter1 is_empty_ref_iterator(), then abort that one immediately
> + * and return the other iterator directly, without wrapping it.
> + */
> +struct ref_iterator *overlay_ref_iterator_begin(struct ref_iterator *iter0,
> +                                               struct ref_iterator *iter1);

When reading about the overlay iterator (both code and documentation),
my expectation was that iter0 would shadow iter1, not the other way
around as implemented here. Of course, that's entirely subjective, but
the generic names don't provide any useful clues as to which shadows
which. Perhaps giving them more meaningful names would help.

> +/*
> + * Wrap iter0, only letting through the references whose names start
> + * with prefix. If trim is set, set iter->refname to the name of the
> + * reference with that many characters trimmed off the front;
> + * otherwise set it to the full refname. The new iterator takes over
> + * ownership of iter0 and frees it when iteration is over. It makes
> + * its own copy of prefix.
> + *
> + * As an convenience to callers, if prefix is the empty string and
> + * trim is zero, this function returns iter0 directly, without
> + * wrapping it.
> + */
> +struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
> +                                              const char *prefix,
> +                                              int trim);

Minor: Similarly, when reading the code and documentation, I wondered
why this was named 'iter0' when no 'iter1' was in sight. Perhaps name
it simply 'iter'.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-30  7:55 ` [PATCH 09/13] refs: introduce an iterator interface Michael Haggerty
  2016-05-30 15:22   ` Ramsay Jones
  2016-05-31  5:29   ` Eric Sunshine
@ 2016-05-31  6:10   ` Junio C Hamano
  2016-05-31  8:45     ` Michael Haggerty
  2016-06-02 10:08   ` Duy Nguyen
  3 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2016-05-31  6:10 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: David Turner, Jeff King, Nguyễn Thái Ngọc Duy, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> This commit introduces a new iteration primitive for references: a
> ref_iterator. A ref_iterator is a polymorphic object that a reference
> storage backend can be asked to instantiate. There are three functions
> that can be applied to a ref_iterator:
>
> * ref_iterator_advance(): move to the next reference in the iteration
> * ref_iterator_abort(): end the iteration before it is exhausted
> * ref_iterator_peel(): peel the reference currently being looked at

This part looked somewhat strange in that it makes "peel" sound
something very special.  Even though I understand why, it made me
feel uneasy.  I do not think of another operation like peel that may
want to have such a specialized helper, so I'll let it pass, but the
primary uneasiness I felt comes from the fact that "iterator-peel"
is not an essential service of the API that needs for correctness,
but is a pure optimization (i.e. you could grab a ref from the
normal iterator call, and then ask "please peel this ref" to the
usual ref API that does not know anything about iteration).

> Iterating using a ref_iterator leaves the flow of control in the hands
> of the caller, which means that ref_iterators from multiple
> sources (e.g., loose and packed refs) can be composed and presented to
> the world as a single compound ref_iterator.

Yes, this is a very good move.  I am happy to see us going in this
direction.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-31  5:29   ` Eric Sunshine
@ 2016-05-31  7:59     ` Michael Haggerty
  2016-05-31 23:12       ` Eric Sunshine
  0 siblings, 1 reply; 27+ messages in thread
From: Michael Haggerty @ 2016-05-31  7:59 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Junio C Hamano, David Turner, Jeff King,
	Nguyễn Thái Ngọc Duy, Git List

On 05/31/2016 07:29 AM, Eric Sunshine wrote:
> On Mon, May 30, 2016 at 3:55 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> [...]
> [...]
> Either:
> 
>     s/false/something other than ITER_OK/
> 
> or:
> 
>     s/false/ITER_DONE or ITER_ERROR/

Thanks.

>> +int ref_iterator_advance(struct ref_iterator *ref_iterator);
>> +
>> +/*
>> + * An iterator over nothing (its first ref_iterator_advance() call
>> + * returns 0).
>> + */
> 
> s/0/ITER_DONE/

Thanks. I guess you can guess what an earlier draft of this interface
looked like :-)

>> +struct ref_iterator *empty_ref_iterator_begin(void);
>> +
>> +/*
>> + * Return true iff ref_iterator is an empty_ref_iterator.
>> + */
>> +int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
> 
> I can see that you used this function as an optimization or
> convenience in overlay_ref_iterator_begin(), but do you expect it to
> be generally useful otherwise? Is it worth publishing? Do you have
> other use-cases in mind?

It is only "published" within the refs module, in refs/refs-internal.h.
This header file is not meant to be used by code outside of the refs module.

My thinking was that it might be useful to other reference backends. The
function is pretty safe for anybody to call, though I admit that it is
not very general.

I don't have a strong feeling either way. If nobody else chimes in, I'll
remove it from the header file as you suggested. We can always add it
back if somebody needs it.

> Also, can you explain why the merge iterator doesn't also perform the
> optimization/convenience of checking if one iterator is an empty
> iterator?

That's because the merge iterator doesn't know what its select function
will do. For example, you could imagine an "intersect" select function
that only lets through references that were in *both* sub-iterators. In
that case, your suggested "optimization" would be incorrect.

Incidentally, that's also why I decided to leave the select function in
charge even after one or both of the sub-iterators is exhausted—because
it lets merge_ref_iterator implement more diverse behavior.

>> +/*
>> + * Iterate over the intries from iter0 and iter1, with the values
> 
> s/intries/entries/

Thanks.

>> + * interleaved as directed by the select function. The iterator takes
>> + * ownership of iter0 and iter1 and frees them when the iteration is
>> + * over.
>> + */
>> +struct ref_iterator *merge_ref_iterator_begin(
>> +               struct ref_iterator *iter0, struct ref_iterator *iter1,
>> +               ref_iterator_select_fn *select, void *cb_data);
>> +
>> +/*
>> + * An iterator consisting of the union of the entries from iter0 and
>> + * iter1. If there are entries common to the two sub-iterators, use
>> + * the one from iter1. Each iterator must iterate over its entries in
>> + * strcmp() order by refname for this to work.
>> + *
>> + * The new iterator takes ownership of its arguments and frees them
>> + * when the iteration is over. As a convenience to callers, if iter0
>> + * or iter1 is_empty_ref_iterator(), then abort that one immediately
>> + * and return the other iterator directly, without wrapping it.
>> + */
>> +struct ref_iterator *overlay_ref_iterator_begin(struct ref_iterator *iter0,
>> +                                               struct ref_iterator *iter1);
> 
> When reading about the overlay iterator (both code and documentation),
> my expectation was that iter0 would shadow iter1, not the other way
> around as implemented here. Of course, that's entirely subjective, but
> the generic names don't provide any useful clues as to which shadows
> which. Perhaps giving them more meaningful names would help.

That's a good idea. I also found myself having to refer back to the
documentation to remind myself which was which.

How about I rename them "back" and "front"? I will also reverse the
order of the arguments.

(But I will leave the names "iter0" and "iter1" in merge_ref_iterator,
and also the constants like ITER_SELECT_0, because these don't
necessarily have the interpretation of "back" and "front".)

>> +/*
>> + * Wrap iter0, only letting through the references whose names start
>> + * with prefix. If trim is set, set iter->refname to the name of the
>> + * reference with that many characters trimmed off the front;
>> + * otherwise set it to the full refname. The new iterator takes over
>> + * ownership of iter0 and frees it when iteration is over. It makes
>> + * its own copy of prefix.
>> + *
>> + * As an convenience to callers, if prefix is the empty string and
>> + * trim is zero, this function returns iter0 directly, without
>> + * wrapping it.
>> + */
>> +struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
>> +                                              const char *prefix,
>> +                                              int trim);
> 
> Minor: Similarly, when reading the code and documentation, I wondered
> why this was named 'iter0' when no 'iter1' was in sight. Perhaps name
> it simply 'iter'.

I found that it got a little bit confusing, because the constructor and
method implementations all use `iter` as a local variable. In particular
in the constructor there would want to be an argument "iter" and also
the local variable "iter" for the iterator being constructed, so a new
name would otherwise have to be invented for one or the other. Between
all the "iter" and "iter" and "iter->iter", I found that naming the
sub-iterator "iter0" made things a little bit less bewildering.

If you don't like that, we could name the embedded iterators something
like "subiter", "subiter0", and "subiter1". But the current convention
is a bit more succinct so I slightly prefer it.

Thanks for all your comments!

Michael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-31  6:10   ` Junio C Hamano
@ 2016-05-31  8:45     ` Michael Haggerty
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-05-31  8:45 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: David Turner, Jeff King, Nguyễn Thái Ngọc Duy, git

On 05/31/2016 08:10 AM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> This commit introduces a new iteration primitive for references: a
>> ref_iterator. A ref_iterator is a polymorphic object that a reference
>> storage backend can be asked to instantiate. There are three functions
>> that can be applied to a ref_iterator:
>>
>> * ref_iterator_advance(): move to the next reference in the iteration
>> * ref_iterator_abort(): end the iteration before it is exhausted
>> * ref_iterator_peel(): peel the reference currently being looked at
> 
> This part looked somewhat strange in that it makes "peel" sound
> something very special.  Even though I understand why, it made me
> feel uneasy.  I do not think of another operation like peel that may
> want to have such a specialized helper, so I'll let it pass, but the
> primary uneasiness I felt comes from the fact that "iterator-peel"
> is not an essential service of the API that needs for correctness,
> but is a pure optimization (i.e. you could grab a ref from the
> normal iterator call, and then ask "please peel this ref" to the
> usual ref API that does not know anything about iteration).

I agree that this is inelegant. The underlying reason is that the
iterator embodies both the iteration and also the reference currently
being iterated over. So ref_iterator_advance() and ref_iterator_abort()
are really iterator methods, whereas ref_iterator_peel() is really a
method on the current reference. For that matter, the refname etc fields
are conceptually parts of the current reference, too, not of the iteration.

One of my many coding spikes tried to separate the two concepts by
having the iterator give back a separate object representing the
reference. That object would have had a peel() method. But I wasn't
happy with that approach:

* Anything that made it look like the reference object had a lifetime
independent of that of the iterator seemed like asking for trouble.

* The peel() method would probably have to be a real method that knows
where the reference came from (as opposed to a simple function) because
of things like prefix_ref_iterator, where the refname stored in the
ref_iterator might have had its prefix stripped off.

The complexity level was getting beyond what I felt was tenable in C, so
I made the compromise that you see.

If I had more time I'd do some benchmarking to see whether the "peeling"
optimization is really still needed. It comes from the days when
references were stored in giant lists. Now that we store refs in tries,
and each level gets sorted as part of the iteration anyway, the lookup
might be fast enough that nobody would care.

> [...]

Michael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-31  7:59     ` Michael Haggerty
@ 2016-05-31 23:12       ` Eric Sunshine
  2016-06-03 11:48         ` Michael Haggerty
  0 siblings, 1 reply; 27+ messages in thread
From: Eric Sunshine @ 2016-05-31 23:12 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, David Turner, Jeff King,
	Nguyễn Thái Ngọc Duy, Git List

On Tue, May 31, 2016 at 3:59 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 05/31/2016 07:29 AM, Eric Sunshine wrote:
>> On Mon, May 30, 2016 at 3:55 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>> +struct ref_iterator *empty_ref_iterator_begin(void);
>>> +
>>> +/*
>>> + * Return true iff ref_iterator is an empty_ref_iterator.
>>> + */
>>> +int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
>>
>> I can see that you used this function as an optimization or
>> convenience in overlay_ref_iterator_begin(), but do you expect it to
>> be generally useful otherwise? Is it worth publishing? Do you have
>> other use-cases in mind?
>
> It is only "published" within the refs module, in refs/refs-internal.h.
> This header file is not meant to be used by code outside of the refs module.

Ah, I forgot about that. In that case, it's probably less of an issue.

> My thinking was that it might be useful to other reference backends. The
> function is pretty safe for anybody to call, though I admit that it is
> not very general.
>
> I don't have a strong feeling either way. If nobody else chimes in, I'll
> remove it from the header file as you suggested. We can always add it
> back if somebody needs it.

I don't feel strongly about it either.

>> Also, can you explain why the merge iterator doesn't also perform the
>> optimization/convenience of checking if one iterator is an empty
>> iterator?
>
> That's because the merge iterator doesn't know what its select function
> will do. For example, you could imagine an "intersect" select function
> that only lets through references that were in *both* sub-iterators. In
> that case, your suggested "optimization" would be incorrect.

Makes sense. Thanks for explaining. I wonder if this deserves a
comment somewhere in code or commit message to make the situation
clear to a future developer who might think it a good idea to promote
the "optimization" to the merge iterator.

>>> +/*
>>> + * An iterator consisting of the union of the entries from iter0 and
>>> + * iter1. If there are entries common to the two sub-iterators, use
>>> + * the one from iter1. Each iterator must iterate over its entries in
>>> + * strcmp() order by refname for this to work.
>>> + *
>>> + * The new iterator takes ownership of its arguments and frees them
>>> + * when the iteration is over. As a convenience to callers, if iter0
>>> + * or iter1 is_empty_ref_iterator(), then abort that one immediately
>>> + * and return the other iterator directly, without wrapping it.
>>> + */
>>> +struct ref_iterator *overlay_ref_iterator_begin(struct ref_iterator *iter0,
>>> +                                               struct ref_iterator *iter1);
>>
>> When reading about the overlay iterator (both code and documentation),
>> my expectation was that iter0 would shadow iter1, not the other way
>> around as implemented here. Of course, that's entirely subjective, but
>> the generic names don't provide any useful clues as to which shadows
>> which. Perhaps giving them more meaningful names would help.
>
> That's a good idea. I also found myself having to refer back to the
> documentation to remind myself which was which.
>
> How about I rename them "back" and "front"? I will also reverse the
> order of the arguments.

I had a hard time coming up with better names, which is why I didn't
suggest any in my review. The best I had was "shadower" and
"shadowee", but they are far too similar (and long) for my tastes.
"back" and "front" feel a bit off, but are better than anything I
thought of.

As for the argument order, I can't explain why my expectation was that
it would be the other way around, so I certainly don't insist that
they be swapped.

> (But I will leave the names "iter0" and "iter1" in merge_ref_iterator,
> and also the constants like ITER_SELECT_0, because these don't
> necessarily have the interpretation of "back" and "front".)

Yes, those names are fine in the merge iterator.

>>> +/*
>>> + * Wrap iter0, only letting through the references whose names start
>>> + * with prefix. If trim is set, set iter->refname to the name of the
>>> + * reference with that many characters trimmed off the front;
>>> + * otherwise set it to the full refname. The new iterator takes over
>>> + * ownership of iter0 and frees it when iteration is over. It makes
>>> + * its own copy of prefix.
>>> + *
>>> + * As an convenience to callers, if prefix is the empty string and
>>> + * trim is zero, this function returns iter0 directly, without
>>> + * wrapping it.
>>> + */
>>> +struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
>>> +                                              const char *prefix,
>>> +                                              int trim);
>>
>> Minor: Similarly, when reading the code and documentation, I wondered
>> why this was named 'iter0' when no 'iter1' was in sight. Perhaps name
>> it simply 'iter'.
>
> I found that it got a little bit confusing, because the constructor and
> method implementations all use `iter` as a local variable. In particular
> in the constructor there would want to be an argument "iter" and also
> the local variable "iter" for the iterator being constructed, so a new
> name would otherwise have to be invented for one or the other. Between
> all the "iter" and "iter" and "iter->iter", I found that naming the
> sub-iterator "iter0" made things a little bit less bewildering.
>
> If you don't like that, we could name the embedded iterators something
> like "subiter", "subiter0", and "subiter1". But the current convention
> is a bit more succinct so I slightly prefer it.

I'd probably have called the sub-iterator "child" (which is the same
length as "iter0") or "wrapped". If you're not interested changing the
name in the code, perhaps in the header alone you could call it simply
"iter" or omit the name altogether, but this a very minor issue and
probably not worth much time or effort to address.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 12/13] dir_iterator: new API for iterating over a directory tree
  2016-05-30  7:55 ` [PATCH 12/13] dir_iterator: new API for iterating over a directory tree Michael Haggerty
@ 2016-06-01  0:12   ` David Turner
  2016-06-03 11:57     ` Michael Haggerty
  0 siblings, 1 reply; 27+ messages in thread
From: David Turner @ 2016-06-01  0:12 UTC (permalink / raw)
  To: Michael Haggerty, Junio C Hamano
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git

On Mon, 2016-05-30 at 09:55 +0200, Michael Haggerty wrote:
> +struct dir_iterator_level {
> +	int initialized;
> +
> +	DIR *dir;
> +
> +	/*
> +	 * The length of the directory part of refname at this level


"refname"? Isn't this was for directories?

> +			if (lstat(iter->base.path.buf, &iter
->base.st) < 0)
> +				continue; /* silently skip */

Should we warn on non-ENOENT errors here?

> +/*
> + * End the iteration before it has been exhausted. Free the
> reference

s/reference/directory/

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-30  7:55 ` [PATCH 09/13] refs: introduce an iterator interface Michael Haggerty
                     ` (2 preceding siblings ...)
  2016-05-31  6:10   ` Junio C Hamano
@ 2016-06-02 10:08   ` Duy Nguyen
  2016-06-02 16:23     ` Michael Haggerty
  3 siblings, 1 reply; 27+ messages in thread
From: Duy Nguyen @ 2016-06-02 10:08 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, David Turner, Jeff King, Git Mailing List

On Mon, May 30, 2016 at 2:55 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> Currently, the API for iterating over references is via a family of
> for_each_ref()-type functions that invoke a callback function for each
> selected reference. All of these eventually call do_for_each_ref(),
> which knows how to do one thing: iterate in parallel through two
> ref_caches, one for loose and one for packed refs, giving loose
> references precedence over packed refs. This is rather complicated code,
> and is quite specialized to the files backend. It also requires callers
> to encapsulate their work into a callback function, which often means
> that they have to define and use a "cb_data" struct to manage their
> context.
>
> The current design is already bursting at the seams, and will become
> even more awkward in the upcoming world of multiple reference storage
> backends:
>
> * Per-worktree vs. shared references are currently handled via a kludge
>   in git_path() rather than iterating over each part of the reference
>   namespace separately and merging the results. This kludge will cease
>   to work when we have multiple reference storage backends.

Question from a refs user. Right now worktree.c:get_worktrees() peeks
directly to "$GIT_DIR/worktrees/xxx/HEAD" and parses the content
itself, something that I  promised to fix but never got around to do
it. Will we have an iterator to go through all worktrees' HEAD, or
will  there be an API to say "resolve ref HEAD from worktree XYZ"?
-- 
Duy

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-06-02 10:08   ` Duy Nguyen
@ 2016-06-02 16:23     ` Michael Haggerty
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-06-02 16:23 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Junio C Hamano, David Turner, Jeff King, Git Mailing List

On 06/02/2016 12:08 PM, Duy Nguyen wrote:
> On Mon, May 30, 2016 at 2:55 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> Currently, the API for iterating over references is via a family of
>> for_each_ref()-type functions that invoke a callback function for each
>> selected reference. All of these eventually call do_for_each_ref(),
>> which knows how to do one thing: iterate in parallel through two
>> ref_caches, one for loose and one for packed refs, giving loose
>> references precedence over packed refs. This is rather complicated code,
>> and is quite specialized to the files backend. It also requires callers
>> to encapsulate their work into a callback function, which often means
>> that they have to define and use a "cb_data" struct to manage their
>> context.
>>
>> The current design is already bursting at the seams, and will become
>> even more awkward in the upcoming world of multiple reference storage
>> backends:
>>
>> * Per-worktree vs. shared references are currently handled via a kludge
>>   in git_path() rather than iterating over each part of the reference
>>   namespace separately and merging the results. This kludge will cease
>>   to work when we have multiple reference storage backends.
> 
> Question from a refs user. Right now worktree.c:get_worktrees() peeks
> directly to "$GIT_DIR/worktrees/xxx/HEAD" and parses the content
> itself, something that I  promised to fix but never got around to do
> it. Will we have an iterator to go through all worktrees' HEAD, or
> will  there be an API to say "resolve ref HEAD from worktree XYZ"?

My preference is that there is a way to say "create a ref_store object
representing the loose references stored physically under
"$GIT_DIR/worktrees/xxx". Then that ref_store could be asked to read its
`HEAD` (or iterate over all of the refs under that path or whatever).
You could even write `HEAD` through the same ref_store.

Michael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 09/13] refs: introduce an iterator interface
  2016-05-31 23:12       ` Eric Sunshine
@ 2016-06-03 11:48         ` Michael Haggerty
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-06-03 11:48 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Junio C Hamano, David Turner, Jeff King,
	Nguyễn Thái Ngọc Duy, Git List

On 06/01/2016 01:12 AM, Eric Sunshine wrote:
> On Tue, May 31, 2016 at 3:59 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> On 05/31/2016 07:29 AM, Eric Sunshine wrote:
>>> On Mon, May 30, 2016 at 3:55 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>>> +struct ref_iterator *empty_ref_iterator_begin(void);
>>>> +
>>>> +/*
>>>> + * Return true iff ref_iterator is an empty_ref_iterator.
>>>> + */
>>>> +int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
>>>
>>> I can see that you used this function as an optimization or
>>> convenience in overlay_ref_iterator_begin(), but do you expect it to
>>> be generally useful otherwise? Is it worth publishing? Do you have
>>> other use-cases in mind?
>>
>> It is only "published" within the refs module, in refs/refs-internal.h.
>> This header file is not meant to be used by code outside of the refs module.
> 
> Ah, I forgot about that. In that case, it's probably less of an issue.
> 
>> My thinking was that it might be useful to other reference backends. The
>> function is pretty safe for anybody to call, though I admit that it is
>> not very general.
>>
>> I don't have a strong feeling either way. If nobody else chimes in, I'll
>> remove it from the header file as you suggested. We can always add it
>> back if somebody needs it.
> 
> I don't feel strongly about it either.

OK then, I'll leave it as-is.

>>> Also, can you explain why the merge iterator doesn't also perform the
>>> optimization/convenience of checking if one iterator is an empty
>>> iterator?
>>
>> That's because the merge iterator doesn't know what its select function
>> will do. For example, you could imagine an "intersect" select function
>> that only lets through references that were in *both* sub-iterators. In
>> that case, your suggested "optimization" would be incorrect.
> 
> Makes sense. Thanks for explaining. I wonder if this deserves a
> comment somewhere in code or commit message to make the situation
> clear to a future developer who might think it a good idea to promote
> the "optimization" to the merge iterator.

Good idea. I'll add a comment.

> [...]

Thanks,
Michael

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 12/13] dir_iterator: new API for iterating over a directory tree
  2016-06-01  0:12   ` David Turner
@ 2016-06-03 11:57     ` Michael Haggerty
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2016-06-03 11:57 UTC (permalink / raw)
  To: David Turner, Junio C Hamano
  Cc: Jeff King, Nguyễn Thái Ngọc Duy, git

On 06/01/2016 02:12 AM, David Turner wrote:
> On Mon, 2016-05-30 at 09:55 +0200, Michael Haggerty wrote:
>> +struct dir_iterator_level {
>> +	int initialized;
>> +
>> +	DIR *dir;
>> +
>> +	/*
>> +	 * The length of the directory part of refname at this level
> 
> 
> "refname"? Isn't this was for directories?

Thanks.

>> +			if (lstat(iter->base.path.buf, &iter
> ->base.st) < 0)
>> +				continue; /* silently skip */
> 
> Should we warn on non-ENOENT errors here?

Good idea. I'll add that.

>> +/*
>> + * End the iteration before it has been exhausted. Free the
>> reference
> 
> s/reference/directory/

Thanks. Will fix.

Michael

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2016-06-03 11:58 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-30  7:55 [PATCH 00/13] Reference iterators Michael Haggerty
2016-05-30  7:55 ` [PATCH 01/13] refs: remove unnecessary "extern" keywords Michael Haggerty
2016-05-30  7:55 ` [PATCH 02/13] do_for_each_ref(): move docstring to the header file Michael Haggerty
2016-05-30  7:55 ` [PATCH 03/13] refs: use name "prefix" consistently Michael Haggerty
2016-05-30  7:55 ` [PATCH 04/13] delete_refs(): add a flags argument Michael Haggerty
2016-05-30  7:55 ` [PATCH 05/13] remote rm: handle symbolic refs correctly Michael Haggerty
2016-05-30  7:55 ` [PATCH 06/13] get_ref_cache(): only create an instance if there is a submodule Michael Haggerty
2016-05-30  7:55 ` [PATCH 07/13] entry_resolves_to_object(): rename function from ref_resolves_to_object() Michael Haggerty
2016-05-30  7:55 ` [PATCH 08/13] ref_resolves_to_object(): new function Michael Haggerty
2016-05-30  7:55 ` [PATCH 09/13] refs: introduce an iterator interface Michael Haggerty
2016-05-30 15:22   ` Ramsay Jones
2016-05-30 16:57     ` Ramsay Jones
2016-05-31  1:16       ` Michael Haggerty
2016-05-31  5:29   ` Eric Sunshine
2016-05-31  7:59     ` Michael Haggerty
2016-05-31 23:12       ` Eric Sunshine
2016-06-03 11:48         ` Michael Haggerty
2016-05-31  6:10   ` Junio C Hamano
2016-05-31  8:45     ` Michael Haggerty
2016-06-02 10:08   ` Duy Nguyen
2016-06-02 16:23     ` Michael Haggerty
2016-05-30  7:55 ` [PATCH 10/13] do_for_each_ref(): reimplement using reference iteration Michael Haggerty
2016-05-30  7:55 ` [PATCH 11/13] for_each_reflog(): don't abort for bad references Michael Haggerty
2016-05-30  7:55 ` [PATCH 12/13] dir_iterator: new API for iterating over a directory tree Michael Haggerty
2016-06-01  0:12   ` David Turner
2016-06-03 11:57     ` Michael Haggerty
2016-05-30  7:55 ` [PATCH 13/13] for_each_reflog(): reimplement using iterators Michael Haggerty

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.