From: mhagger@alum.mit.edu
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>,
	Drew Northup <drew.northup@maine.edu>,
	Jakub Narebski <jnareb@gmail.com>,
	Heiko Voigt <hvoigt@hvoigt.net>,
	Johan Herland <johan@herland.net>,
	Julian Phillips <julian@quantumfyre.co.uk>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH v2 48/51] refs: read loose references lazily
Date: Mon, 12 Dec 2011 06:38:55 +0100
Message-ID: <1323668338-1764-49-git-send-email-mhagger@alum.mit.edu>
In-Reply-To: <1323668338-1764-1-git-send-email-mhagger@alum.mit.edu>

From: Michael Haggerty <mhagger@alum.mit.edu>

Instead of reading the entire tree of loose references the first
time any of them is needed, read them on demand, one directory at a
time.

Use a new ref_entry flag value, REF_DIR_INCOMPLETE, to indicate that
an entry represents a directory that hasn't been read yet.  Whenever
any entry from such a directory is needed, read all of the loose
references in that directory (but not in its subdirectories).

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
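
The flag scheme described above can be sketched roughly as follows.
This is an illustration only and not part of the patch: the three
REF_DIR* values are copied from the diff below, but struct toy_entry,
is_dir(), toy_read_loose() and toy_count_entries() are hypothetical
stand-ins, and the "directory read" is simulated by copying a counter
rather than calling opendir()/readdir().

```c
#include <assert.h>

/* Flag values as defined in the patch. */
#define REF_DIR 0x30            /* mask: any bit set means "directory" */
#define REF_DIR_COMPLETE 0x10   /* directory already read */
#define REF_DIR_INCOMPLETE 0x20 /* loose directory not yet read */

/* Hypothetical, heavily reduced stand-in for struct ref_entry. */
struct toy_entry {
	int flag;
	int on_disk; /* entries a real directory scan would find */
	int loaded;  /* entries currently held in the cache */
	int reads;   /* how many simulated scans were performed */
};

/* An entry is a directory if either REF_DIR bit is set. */
static int is_dir(const struct toy_entry *e)
{
	return (e->flag & REF_DIR) != 0;
}

/*
 * Mirrors the guard added to read_loose_refs(): do nothing unless
 * the directory is still marked incomplete, then mark it complete.
 */
static void toy_read_loose(struct toy_entry *e)
{
	assert(is_dir(e));
	if ((e->flag & REF_DIR) != REF_DIR_INCOMPLETE)
		return;
	e->loaded = e->on_disk; /* stands in for the directory scan */
	e->reads++;
	e->flag = REF_DIR_COMPLETE;
}

/*
 * Mirrors how search_ref_dir() and sort_ref_dir() trigger the read
 * before touching the directory's entries.
 */
static int toy_count_entries(struct toy_entry *e)
{
	toy_read_loose(e);
	return e->loaded;
}
```

Calling toy_count_entries() twice on an entry that starts out as
REF_DIR_INCOMPLETE performs the simulated scan only once; the second
call sees REF_DIR_COMPLETE and returns immediately.  Note that a test
such as (flag & REF_DIR) only says "this is a directory"; telling the
two states apart requires comparing the masked value, as the guard
does.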
 refs.c |  112 ++++++++++++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 88 insertions(+), 24 deletions(-)

diff --git a/refs.c b/refs.c
index a85a8a5..3cd8e04 100644
--- a/refs.c
+++ b/refs.c
@@ -101,6 +101,12 @@ int check_refname_format(const char *refname, int flags)
 
 struct ref_entry;
 
+/*
+ * Information used (along with the information in ref_entry) to
+ * describe a single cached reference.  This data structure only
+ * occurs embedded in a union in struct ref_entry, and only when
+ * (ref_entry->flag & REF_DIR) is zero.
+ */
 struct ref_value {
 	unsigned char sha1[20];
 	unsigned char peeled[20];
@@ -108,6 +114,34 @@ struct ref_value {
 
 struct ref_cache;
 
+/*
+ * Information used (along with the information in ref_entry) to
+ * describe a level in the hierarchy of references.  This data
+ * structure only occurs embedded in a union in struct ref_entry, and
+ * only when (ref_entry.flag & REF_DIR) is nonzero.  In that case,
+ * (ref_entry.flag & REF_DIR) can take the following values:
+ *
+ *     REF_DIR_COMPLETE -- a directory of loose or packed references,
+ *         already read.
+ *
+ *     REF_DIR_INCOMPLETE -- a directory of loose references that
+ *         hasn't been read yet (nor has any of its subdirectories).
+ *
+ * Entries within a directory are stored within a growable array of
+ * pointers to ref_entries (entries, nr, alloc).  Entries 0 <= i <
+ * sorted are sorted by their component name in strcmp() order and the
+ * remaining entries are unsorted.
+ *
+ * Loose references are read lazily, one directory at a time.  When a
+ * directory of loose references is read, then all of the references
+ * in that directory are stored, and REF_DIR_INCOMPLETE stubs are
+ * created for any subdirectories, but the subdirectories themselves
+ * are not read.  The reading is triggered either by search_ref_dir()
+ * (called when single references are added or interrogated), by
+ * sort_ref_dir(), or by iteration over a subdirectory of references
+ * using one of the for_each_ref*() functions (which calls
+ * sort_ref_dir() for each subdirectory).
+ */
 struct ref_dir {
 	int nr, alloc;
 
@@ -125,19 +159,33 @@ struct ref_dir {
 
 /* ISSYMREF=0x01, ISPACKED=0x02, and ISBROKEN=0x04 are public interfaces */
 #define REF_KNOWS_PEELED 0x08
-#define REF_DIR 0x10
+
+/* If any of these bits are set, the entry represents a directory: */
+#define REF_DIR 0x30
+
+/* A directory that has already been fully read. */
+#define REF_DIR_COMPLETE 0x10
+
+/* A directory of loose references that has not yet been fully read. */
+#define REF_DIR_INCOMPLETE 0x20
 
 /*
  * A ref_entry represents either a reference or a "subdirectory" of
- * references.  Each directory in the reference namespace is
- * represented by a ref_entry with (flags & REF_DIR) set and
- * containing a subdir member that holds the entries in that
- * directory.  References are represented by a ref_entry with (flags &
- * REF_DIR) unset and a value member that describes the reference's
- * value.  The flag member is at the ref_entry level, but it is also
- * needed to interpret the contents of the value field (in other
- * words, a ref_value object is not very much use without the
- * enclosing ref_entry).
+ * references.
+ *
+ * Each directory in the reference namespace is represented by a
+ * ref_entry with (flag & REF_DIR) set and containing a subdir member
+ * that holds the entries in that directory that have been read so
+ * far.  If (flag & REF_DIR) == REF_DIR_INCOMPLETE, then the
+ * directory and its subdirectories haven't been read yet.
+ * REF_DIR_INCOMPLETE is only used for loose references.
+ *
+ * References are represented by a ref_entry with (flag & REF_DIR) ==
+ * 0 and a value member that describes the reference's value.  The
+ * flag member is at the ref_entry level, but it is also needed to
+ * interpret the contents of the value field (in other words, a
+ * ref_value object is not very much use without the enclosing
+ * ref_entry).
  *
  * Reference names cannot end with slash and directories' names are
  * always stored with a trailing slash (except for the top-level
@@ -229,19 +277,21 @@ static void clear_ref_dir(struct ref_dir *dir)
 	dir->entries = NULL;
 }
 
+static void read_loose_refs(struct ref_entry *direntry);
+
 /*
  * Create a struct ref_entry object for the specified dirname.
  * dirname is the name of the directory with a trailing slash (e.g.,
  * "refs/heads/") or "" for the top-level directory.
  */
 static struct ref_entry *create_dir_entry(struct ref_cache *ref_cache,
-					  const char *dirname)
+					  const char *dirname, int flag)
 {
 	struct ref_entry *direntry;
 	int len = strlen(dirname);
 	direntry = xcalloc(1, sizeof(struct ref_entry) + len + 1);
 	memcpy(direntry->name, dirname, len + 1);
-	direntry->flag = REF_DIR;
+	direntry->flag = flag;
 	direntry->u.subdir.ref_cache = ref_cache;
 	return direntry;
 }
@@ -266,6 +316,7 @@ static struct ref_entry *search_ref_dir(struct ref_entry *direntry, const char *
 	struct ref_dir *dir;
 
 	assert(direntry->flag & REF_DIR);
+	read_loose_refs(direntry);
 	dir = &direntry->u.subdir;
 	if (refname == NULL || !dir->nr)
 		return NULL;
@@ -322,8 +373,14 @@ static struct ref_entry *find_containing_direntry(struct ref_entry *direntry,
 				direntry = NULL;
 				break;
 			}
+			/*
+			 * If search_ref_dir() above didn't make the
+			 * entry spring into existence, then this must
+			 * not be an unread loose reference tree, so
+			 * the correct flag is REF_DIR_COMPLETE.
+			 */
 			entry = create_dir_entry(direntry->u.subdir.ref_cache,
-						 refname_copy);
+						 refname_copy, REF_DIR_COMPLETE);
 			add_entry(direntry, entry);
 		}
 		slash[1] = tmp;
@@ -399,6 +456,7 @@ static void sort_ref_dir(struct ref_entry *direntry)
 	struct ref_entry *last = NULL;
 	struct ref_dir *dir;
 	assert(direntry->flag & REF_DIR);
+	read_loose_refs(direntry);
 	dir = &direntry->u.subdir;
 	if (dir->sorted == dir->nr)
 		return; /* This directory is already sorted and de-duped */
@@ -449,8 +507,8 @@ static int do_for_each_ref_in_dir(struct ref_entry *direntry, int offset,
 	int i;
 	struct ref_dir *dir;
 	assert(direntry->flag & REF_DIR);
-	dir = &direntry->u.subdir;
 	sort_ref_dir(direntry);
+	dir = &direntry->u.subdir;
 	for (i = offset; i < dir->nr; i++) {
 		struct ref_entry *entry = dir->entries[i];
 		int retval;
@@ -477,10 +535,10 @@ static int do_for_each_ref_in_dirs(struct ref_entry *direntry1,
 
 	assert(direntry1->flag & REF_DIR);
 	assert(direntry2->flag & REF_DIR);
-	dir1 = &direntry1->u.subdir;
-	dir2 = &direntry2->u.subdir;
 	sort_ref_dir(direntry1);
 	sort_ref_dir(direntry2);
+	dir1 = &direntry1->u.subdir;
+	dir2 = &direntry2->u.subdir;
 	while (1) {
 		struct ref_entry *e1, *e2, *entry;
 		int cmp;
@@ -737,7 +795,7 @@ static void read_packed_refs(FILE *f, struct ref_entry *direntry)
 void add_extra_ref(const char *refname, const unsigned char *sha1, int flag)
 {
 	if (!extra_refs)
-		extra_refs = create_dir_entry(NULL, "");
+		extra_refs = create_dir_entry(NULL, "", REF_DIR_COMPLETE);
 	add_ref(extra_refs, create_ref_entry(refname, sha1, flag, 0));
 }
 
@@ -755,7 +813,7 @@ static struct ref_entry *get_packed_refs(struct ref_cache *refs)
 		const char *packed_refs_file;
 		FILE *f;
 
-		refs->packed = create_dir_entry(refs, "");
+		refs->packed = create_dir_entry(refs, "", REF_DIR_COMPLETE);
 		if (*refs->name)
 			packed_refs_file = git_path_submodule(refs->name, "packed-refs");
 		else
@@ -777,11 +835,14 @@ static void read_loose_refs(struct ref_entry *direntry)
 	DIR *d;
 	char *path;
 	char *dirname = direntry->name;
-	int dirnamelen = strlen(dirname);
+	int dirnamelen;
 	int pathlen;
 	struct ref_cache *refs;
 
 	assert(direntry->flag & REF_DIR);
+	if ((direntry->flag & REF_DIR) != REF_DIR_INCOMPLETE)
+		return;
+	dirnamelen = strlen(dirname);
 	assert(dirnamelen && direntry->name[dirnamelen - 1] == '/');
 	refs = direntry->u.subdir.ref_cache;
 	if (*refs->name)
@@ -819,11 +880,12 @@ static void read_loose_refs(struct ref_entry *direntry)
 			if (stat(refdir, &st) < 0)
 				continue;
 			if (S_ISDIR(st.st_mode)) {
+				struct ref_entry *subdirentry;
 				refname[dirnamelen + namelen] = '/';
 				refname[dirnamelen + namelen + 1] = '\0';
-				read_loose_refs(find_containing_direntry(
-								refs->loose,
-								refname, 1));
+				subdirentry = create_dir_entry(direntry->u.subdir.ref_cache,
+							       refname, REF_DIR_INCOMPLETE);
+				add_entry(direntry, subdirentry);
 				continue;
 			}
 			if (*refs->name) {
@@ -842,13 +904,15 @@ static void read_loose_refs(struct ref_entry *direntry)
 		free(refname);
 		closedir(d);
 	}
+	direntry->flag = REF_DIR_COMPLETE;
 }
 
 static struct ref_entry *get_loose_refs(struct ref_cache *refs)
 {
 	if (!refs->loose) {
-		refs->loose = create_dir_entry(refs, "");
-		read_loose_refs(find_containing_direntry(refs->loose, "refs/", 1));
+		refs->loose = create_dir_entry(refs, "", REF_DIR_COMPLETE);
+		add_entry(refs->loose,
+			  create_dir_entry(refs, "refs/", REF_DIR_INCOMPLETE));
 	}
 	return refs->loose;
 }
-- 
1.7.8
