git.vger.kernel.org archive mirror
* [PATCH 00/20] Read `packed-refs` using mmap()
@ 2017-09-13 17:15 Michael Haggerty
  2017-09-13 17:15 ` [PATCH 01/20] ref_iterator: keep track of whether the iterator output is ordered Michael Haggerty
                   ` (22 more replies)
  0 siblings, 23 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Previously, any time we wanted to read even a single reference from
the `packed-refs` file, we parsed the whole file and stored it in an
elaborate structure in memory called a `ref_cache`. Subsequent
reference lookups or iterations over some or all of the references
could be done by reading from the `ref_cache`.

But for large `packed-refs` files, the time needed to parse the file,
and the memory needed to cache its contents, can be quite significant.
This is partly because building the cache requires lots of little
memory allocations. (And lest you think that most Git commands can be
executed by reading at most a couple of loose references, remember
that almost any command that reads objects has to look for replace
references (unless they are turned off in the config), and *that*
necessarily entails reading packed references.)

Following lots of work to extract the `packed_ref_store` into a
separate module and decouple it from the `files_ref_store`, it is now
possible to fundamentally change how the `packed-refs` file is read.

* `mmap()` the whole file rather than `read()`ing it.

* Instead of parsing the complete file at once into a `ref_cache`,
  parse the references out of the file contents on demand.

* Use a binary search to find, very quickly, the reference or group of
  references that needs to be read. Parse *only* those references out
  of the file contents, without creating in-memory data structures at
  all.

In rare cases this change might force parts of the `packed-refs` file
to be read multiple times, but that cost is far outweighed by the fact
that usually most of the `packed-refs` file doesn't have to be read
*at all*.
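
To illustrate the first two points, here is a minimal standalone
sketch (hypothetical code, not what the series actually adds; the
real implementation lives in `refs/packed-backend.c`) of `mmap()`ing a
file of newline-terminated records and walking it in place, parsing
records on demand instead of copying every line into its own buffer:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Illustration only: mmap a file of newline-terminated records and
 * count them by scanning the mapped buffer in place, with no
 * per-line copying. Returns -1 on error.
 */
static int count_records(const char *path)
{
	int fd = open(path, O_RDONLY);
	struct stat st;
	char *buf;
	const char *pos, *eof, *eol;
	int n = 0;

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	if (!st.st_size) {
		/* mmap() of a zero-length file would fail. */
		close(fd);
		return 0;
	}
	buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (buf == MAP_FAILED) {
		close(fd);
		return -1;
	}
	pos = buf;
	eof = buf + st.st_size;
	while (pos < eof) {
		eol = memchr(pos, '\n', eof - pos);
		if (!eol)
			break;	/* unterminated final line; a real parser would die() */
		/* A real parser would parse the record in [pos, eol) here. */
		n++;
		pos = eol + 1;
	}
	munmap(buf, st.st_size);
	close(fd);
	return n;
}
```

The key property is that untouched pages of the mapping are never
read from disk at all, which is what makes the binary search below pay
off.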

Note that the binary search optimization requires the `packed-refs`
file to be sorted by reference name. Git has always written the file
sorted, but just in case some other client doesn't, we treat the file
as unsorted unless its header lists a `sorted` trait. From now on, we
write the file with that trait. If the file is not sorted, it is
sorted on the fly in memory.

For a repository with only a couple thousand references and a warm
disk cache, this change doesn't make a very significant difference.
But for repositories with very large numbers of references, the
difference starts to be significant:

A repository with 54k references (warm cache):

                                  git 2.13.1         this branch
git for-each-ref                      464 ms              452 ms
git for-each-ref (no output)           66 ms               47 ms
git for-each-ref (0.6% of refs)        47 ms                9 ms
git for-each-ref (0.6%, no output)     41 ms                2 ms
git rev-parse                          32 ms                2 ms

A repository (admittedly insane, but a real-life example) with 60M
references (warm cache):

                                  git 2.13.1         this branch
git for-each-ref (no output)        84000 ms            61000 ms
git rev-parse                       40000 ms                2 ms

This branch applies on top of mh/packed-ref-transactions. It can also
be obtained from my git fork [1] as branch `mmap-packed-refs`.

Michael

[1] https://github.com/mhagger/git

Jeff King (1):
  prefix_ref_iterator: break when we leave the prefix

Michael Haggerty (19):
  ref_iterator: keep track of whether the iterator output is ordered
  packed_ref_cache: add a backlink to the associated `packed_ref_store`
  die_unterminated_line(), die_invalid_line(): new functions
  read_packed_refs(): use mmap to read the `packed-refs` file
  read_packed_refs(): only check for a header at the top of the file
  read_packed_refs(): make parsing of the header line more robust
  read_packed_refs(): read references with minimal copying
  packed_ref_cache: remember the file-wide peeling state
  mmapped_ref_iterator: add iterator over a packed-refs file
  mmapped_ref_iterator_advance(): no peeled value for broken refs
  packed_ref_cache: keep the `packed-refs` file open and mmapped
  read_packed_refs(): ensure that references are ordered when read
  packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`
  packed_read_raw_ref(): read the reference from the mmapped buffer
  ref_store: implement `refs_peel_ref()` generically
  packed_ref_store: get rid of the `ref_cache` entirely
  ref_cache: remove support for storing peeled values
  mmapped_ref_iterator: inline into `packed_ref_iterator`
  packed-backend.c: rename a bunch of things and update comments

 refs.c                |  22 +-
 refs/files-backend.c  |  54 +--
 refs/iterator.c       |  47 ++-
 refs/packed-backend.c | 896 +++++++++++++++++++++++++++++++++++++-------------
 refs/ref-cache.c      |  44 +--
 refs/ref-cache.h      |  35 +-
 refs/refs-internal.h  |  26 +-
 7 files changed, 761 insertions(+), 363 deletions(-)

-- 
2.14.1



* [PATCH 01/20] ref_iterator: keep track of whether the iterator output is ordered
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
@ 2017-09-13 17:15 ` Michael Haggerty
  2017-09-13 17:15 ` [PATCH 02/20] prefix_ref_iterator: break when we leave the prefix Michael Haggerty
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

References are iterated over in order by refname, but reflogs are not.
Some consumers of reference iteration care about the difference. Teach
each `ref_iterator` to keep track of whether its output is ordered.

`overlay_ref_iterator` is one of the picky consumers. Add a sanity
check in `overlay_ref_iterator_begin()` to verify that its inputs are
ordered.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c                |  4 ++++
 refs/files-backend.c  | 16 +++++++++-------
 refs/iterator.c       | 15 ++++++++++-----
 refs/packed-backend.c |  2 +-
 refs/ref-cache.c      |  2 +-
 refs/ref-cache.h      |  3 ++-
 refs/refs-internal.h  | 23 +++++++++++++++++++----
 7 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/refs.c b/refs.c
index b0106b8162..101c107ee8 100644
--- a/refs.c
+++ b/refs.c
@@ -1309,6 +1309,10 @@ struct ref_iterator *refs_ref_iterator_begin(
 	if (trim)
 		iter = prefix_ref_iterator_begin(iter, "", trim);
 
+	/* Sanity check for subclasses: */
+	if (!iter->ordered)
+		BUG("reference iterator is not ordered");
+
 	return iter;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 961424a4ea..35648c89fc 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -762,7 +762,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 		const char *prefix, unsigned int flags)
 {
 	struct files_ref_store *refs;
-	struct ref_iterator *loose_iter, *packed_iter;
+	struct ref_iterator *loose_iter, *packed_iter, *overlay_iter;
 	struct files_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -772,10 +772,6 @@ static struct ref_iterator *files_ref_iterator_begin(
 
 	refs = files_downcast(ref_store, required_flags, "ref_iterator_begin");
 
-	iter = xcalloc(1, sizeof(*iter));
-	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable);
-
 	/*
 	 * We must make sure that all loose refs are read before
 	 * accessing the packed-refs file; this avoids a race
@@ -811,7 +807,13 @@ static struct ref_iterator *files_ref_iterator_begin(
 			refs->packed_ref_store, prefix, 0,
 			DO_FOR_EACH_INCLUDE_BROKEN);
 
-	iter->iter0 = overlay_ref_iterator_begin(loose_iter, packed_iter);
+	overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter);
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
+	base_ref_iterator_init(ref_iterator, &files_ref_iterator_vtable,
+			       overlay_iter->ordered);
+	iter->iter0 = overlay_iter;
 	iter->flags = flags;
 
 	return ref_iterator;
@@ -2084,7 +2086,7 @@ static struct ref_iterator *files_reflog_iterator_begin(struct ref_store *ref_st
 	struct ref_iterator *ref_iterator = &iter->base;
 	struct strbuf sb = STRBUF_INIT;
 
-	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable);
+	base_ref_iterator_init(ref_iterator, &files_reflog_iterator_vtable, 0);
 	files_reflog_path(refs, &sb, NULL);
 	iter->dir_iterator = dir_iterator_begin(sb.buf);
 	iter->ref_store = ref_store;
diff --git a/refs/iterator.c b/refs/iterator.c
index 4cf449ef66..c475360f0a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -25,9 +25,11 @@ int ref_iterator_abort(struct ref_iterator *ref_iterator)
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
-			    struct ref_iterator_vtable *vtable)
+			    struct ref_iterator_vtable *vtable,
+			    int ordered)
 {
 	iter->vtable = vtable;
+	iter->ordered = !!ordered;
 	iter->refname = NULL;
 	iter->oid = NULL;
 	iter->flags = 0;
@@ -72,7 +74,7 @@ struct ref_iterator *empty_ref_iterator_begin(void)
 	struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter));
 	struct ref_iterator *ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable);
+	base_ref_iterator_init(ref_iterator, &empty_ref_iterator_vtable, 1);
 	return ref_iterator;
 }
 
@@ -205,6 +207,7 @@ static struct ref_iterator_vtable merge_ref_iterator_vtable = {
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
+		int ordered,
 		struct ref_iterator *iter0, struct ref_iterator *iter1,
 		ref_iterator_select_fn *select, void *cb_data)
 {
@@ -219,7 +222,7 @@ struct ref_iterator *merge_ref_iterator_begin(
 	 * references through only if they exist in both iterators.
 	 */
 
-	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable);
+	base_ref_iterator_init(ref_iterator, &merge_ref_iterator_vtable, ordered);
 	iter->iter0 = iter0;
 	iter->iter1 = iter1;
 	iter->select = select;
@@ -268,9 +271,11 @@ struct ref_iterator *overlay_ref_iterator_begin(
 	} else if (is_empty_ref_iterator(back)) {
 		ref_iterator_abort(back);
 		return front;
+	} else if (!front->ordered || !back->ordered) {
+		BUG("overlay_ref_iterator requires ordered inputs");
 	}
 
-	return merge_ref_iterator_begin(front, back,
+	return merge_ref_iterator_begin(1, front, back,
 					overlay_iterator_select, NULL);
 }
 
@@ -361,7 +366,7 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
 
-	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable);
+	base_ref_iterator_init(ref_iterator, &prefix_ref_iterator_vtable, iter0->ordered);
 
 	iter->iter0 = iter0;
 	iter->prefix = xstrdup(prefix);
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 0279aeceea..e411501871 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -437,7 +437,7 @@ static struct ref_iterator *packed_ref_iterator_begin(
 
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable);
+	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
 
 	/*
 	 * Note that get_packed_ref_cache() internally checks whether
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 76bb723c86..54dfb5218c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -574,7 +574,7 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
+	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable, 1);
 	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
 
 	iter->levels_nr = 1;
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 794f000fd3..a082bfb06c 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -245,7 +245,8 @@ struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
  * Start iterating over references in `cache`. If `prefix` is
  * specified, only include references whose names start with that
  * prefix. If `prime_dir` is true, then fill any incomplete
- * directories before beginning the iteration.
+ * directories before beginning the iteration. The output is ordered
+ * by refname.
  */
 struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      const char *prefix,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index d7d344de73..d7f233beba 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -329,6 +329,13 @@ int refs_rename_ref_available(struct ref_store *refs,
  */
 struct ref_iterator {
 	struct ref_iterator_vtable *vtable;
+
+	/*
+	 * Does this `ref_iterator` iterate over references in order
+	 * by refname?
+	 */
+	unsigned int ordered : 1;
+
 	const char *refname;
 	const struct object_id *oid;
 	unsigned int flags;
@@ -374,7 +381,7 @@ int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
  * which the refname begins with prefix. If trim is non-zero, then
  * trim that many characters off the beginning of each refname. flags
  * can be DO_FOR_EACH_INCLUDE_BROKEN to include broken references in
- * the iteration.
+ * the iteration. The output is ordered by refname.
  */
 struct ref_iterator *refs_ref_iterator_begin(
 		struct ref_store *refs,
@@ -400,9 +407,11 @@ typedef enum iterator_selection ref_iterator_select_fn(
  * Iterate over the entries from iter0 and iter1, with the values
  * interleaved as directed by the select function. The iterator takes
  * ownership of iter0 and iter1 and frees them when the iteration is
- * over.
+ * over. A derived class should set `ordered` to 1 or 0 based on
+ * whether it generates its output in order by reference name.
  */
 struct ref_iterator *merge_ref_iterator_begin(
+		int ordered,
 		struct ref_iterator *iter0, struct ref_iterator *iter1,
 		ref_iterator_select_fn *select, void *cb_data);
 
@@ -431,6 +440,8 @@ struct ref_iterator *overlay_ref_iterator_begin(
  * As a convenience to callers, if prefix is the empty string and
  * trim is zero, this function returns iter0 directly, without
  * wrapping it.
+ *
+ * The resulting ref_iterator is ordered if iter0 is.
  */
 struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 					       const char *prefix,
@@ -441,11 +452,14 @@ struct ref_iterator *prefix_ref_iterator_begin(struct ref_iterator *iter0,
 /*
  * Base class constructor for ref_iterators. Initialize the
  * ref_iterator part of iter, setting its vtable pointer as specified.
+ * `ordered` should be set to 1 if the iterator will iterate over
+ * references in order by refname; otherwise it should be set to 0.
  * This is meant to be called only by the initializers of derived
  * classes.
  */
 void base_ref_iterator_init(struct ref_iterator *iter,
-			    struct ref_iterator_vtable *vtable);
+			    struct ref_iterator_vtable *vtable,
+			    int ordered);
 
 /*
  * Base class destructor for ref_iterators. Destroy the ref_iterator
@@ -564,7 +578,8 @@ typedef int rename_ref_fn(struct ref_store *ref_store,
  * Iterate over the references in `ref_store` whose names start with
  * `prefix`. `prefix` is matched as a literal string, without regard
  * for path separators. If prefix is NULL or the empty string, iterate
- * over all references in `ref_store`.
+ * over all references in `ref_store`. The output is ordered by
+ * refname.
  */
 typedef struct ref_iterator *ref_iterator_begin_fn(
 		struct ref_store *ref_store,
-- 
2.14.1



* [PATCH 02/20] prefix_ref_iterator: break when we leave the prefix
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
  2017-09-13 17:15 ` [PATCH 01/20] ref_iterator: keep track of whether the iterator output is ordered Michael Haggerty
@ 2017-09-13 17:15 ` Michael Haggerty
  2017-09-13 17:15 ` [PATCH 03/20] packed_ref_cache: add a backlink to the associated `packed_ref_store` Michael Haggerty
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

From: Jeff King <peff@peff.net>

If the underlying iterator is ordered, then `prefix_ref_iterator` can
stop as soon as it sees a refname that comes after the prefix. This
will rarely make a big difference now, because `ref_cache_iterator`
only iterates over the directory containing the prefix (and usually
the prefix will span a whole directory anyway). But if *hint, hint* a
future reference backend doesn't itself know where to stop the
iteration, then this optimization will be a big win.

Note that there is no guarantee that the underlying iterator doesn't
include output preceding the prefix, so we have to skip over any
unwanted references before we get to the ones that we want.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
Note that the implementation of `compare_prefix()` here is a bit
different from Peff's original version.

 refs/iterator.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index c475360f0a..bd35da4e62 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -287,6 +287,20 @@ struct prefix_ref_iterator {
 	int trim;
 };
 
+/* Return -1, 0, 1 if refname is before, inside, or after the prefix. */
+static int compare_prefix(const char *refname, const char *prefix)
+{
+	while (*prefix) {
+		if (*refname != *prefix)
+			return ((unsigned char)*refname < (unsigned char)*prefix) ? -1 : +1;
+
+		refname++;
+		prefix++;
+	}
+
+	return 0;
+}
+
 static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 {
 	struct prefix_ref_iterator *iter =
@@ -294,9 +308,25 @@ static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	int ok;
 
 	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
-		if (!starts_with(iter->iter0->refname, iter->prefix))
+		int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
+
+		if (cmp < 0)
 			continue;
 
+		if (cmp > 0) {
+			/*
+			 * If the source iterator is ordered, then we
+			 * can stop the iteration as soon as we see a
+			 * refname that comes after the prefix:
+			 */
+			if (iter->iter0->ordered) {
+				ok = ref_iterator_abort(iter->iter0);
+				break;
+			} else {
+				continue;
+			}
+		}
+
 		if (iter->trim) {
 			/*
 			 * It is nonsense to trim off characters that
-- 
2.14.1



* [PATCH 03/20] packed_ref_cache: add a backlink to the associated `packed_ref_store`
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
  2017-09-13 17:15 ` [PATCH 01/20] ref_iterator: keep track of whether the iterator output is ordered Michael Haggerty
  2017-09-13 17:15 ` [PATCH 02/20] prefix_ref_iterator: break when we leave the prefix Michael Haggerty
@ 2017-09-13 17:15 ` Michael Haggerty
  2017-09-13 17:15 ` [PATCH 04/20] die_unterminated_line(), die_invalid_line(): new functions Michael Haggerty
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

It will prove convenient in upcoming patches.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e411501871..a3d9210cb0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -7,7 +7,15 @@
 #include "../iterator.h"
 #include "../lockfile.h"
 
+struct packed_ref_store;
+
 struct packed_ref_cache {
+	/*
+	 * A back-pointer to the packed_ref_store with which this
+	 * cache is associated:
+	 */
+	struct packed_ref_store *refs;
+
 	struct ref_cache *cache;
 
 	/*
@@ -154,7 +162,7 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
 }
 
 /*
- * Read from `packed_refs_file` into a newly-allocated
+ * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
  * have its reference count incremented.
  *
@@ -182,7 +190,7 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
  *      compatibility with older clients, but we do not require it
  *      (i.e., "peeled" is a no-op if "fully-peeled" is set).
  */
-static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
+static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
 	FILE *f;
 	struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
@@ -191,11 +199,12 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
 	struct ref_dir *dir;
 
+	packed_refs->refs = refs;
 	acquire_packed_ref_cache(packed_refs);
 	packed_refs->cache = create_ref_cache(NULL, NULL);
 	packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 
-	f = fopen(packed_refs_file, "r");
+	f = fopen(refs->path, "r");
 	if (!f) {
 		if (errno == ENOENT) {
 			/*
@@ -205,7 +214,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 			 */
 			return packed_refs;
 		} else {
-			die_errno("couldn't read %s", packed_refs_file);
+			die_errno("couldn't read %s", refs->path);
 		}
 	}
 
@@ -218,7 +227,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 		const char *traits;
 
 		if (!line.len || line.buf[line.len - 1] != '\n')
-			die("unterminated line in %s: %s", packed_refs_file, line.buf);
+			die("unterminated line in %s: %s", refs->path, line.buf);
 
 		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
 			if (strstr(traits, " fully-peeled "))
@@ -258,7 +267,7 @@ static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
 			last->flag |= REF_KNOWS_PEELED;
 		} else {
 			strbuf_setlen(&line, line.len - 1);
-			die("unexpected line in %s: %s", packed_refs_file, line.buf);
+			die("unexpected line in %s: %s", refs->path, line.buf);
 		}
 	}
 
@@ -293,7 +302,7 @@ static struct packed_ref_cache *get_packed_ref_cache(struct packed_ref_store *re
 		validate_packed_ref_cache(refs);
 
 	if (!refs->cache)
-		refs->cache = read_packed_refs(refs->path);
+		refs->cache = read_packed_refs(refs);
 
 	return refs->cache;
 }
-- 
2.14.1



* [PATCH 04/20] die_unterminated_line(), die_invalid_line(): new functions
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (2 preceding siblings ...)
  2017-09-13 17:15 ` [PATCH 03/20] packed_ref_cache: add a backlink to the associated `packed_ref_store` Michael Haggerty
@ 2017-09-13 17:15 ` Michael Haggerty
  2017-09-13 17:15 ` [PATCH 05/20] read_packed_refs(): use mmap to read the `packed-refs` file Michael Haggerty
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Extract some helper functions for reporting errors. While we're at it,
prevent them from spewing unlimited output to the terminal. These
functions will soon have more callers.

These functions accept the problematic line as a `(ptr, len)` pair
rather than a NUL-terminated string, and `die_invalid_line()` checks
for an EOL itself, because these calling conventions will be
convenient for future callers. (Efficiency is not a concern here
because these functions are only ever called if the `packed-refs` file
is corrupt.)

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 28 +++++++++++++++++++++++++---
 1 file changed, 25 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a3d9210cb0..5c50c223ef 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -161,6 +161,29 @@ static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
 	return ref;
 }
 
+static NORETURN void die_unterminated_line(const char *path,
+					   const char *p, size_t len)
+{
+	if (len < 80)
+		die("unterminated line in %s: %.*s", path, (int)len, p);
+	else
+		die("unterminated line in %s: %.75s...", path, p);
+}
+
+static NORETURN void die_invalid_line(const char *path,
+				      const char *p, size_t len)
+{
+	const char *eol = memchr(p, '\n', len);
+
+	if (!eol)
+		die_unterminated_line(path, p, len);
+	else if (eol - p < 80)
+		die("unexpected line in %s: %.*s", path, (int)(eol - p), p);
+	else
+		die("unexpected line in %s: %.75s...", path, p);
+
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
@@ -227,7 +250,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		const char *traits;
 
 		if (!line.len || line.buf[line.len - 1] != '\n')
-			die("unterminated line in %s: %s", refs->path, line.buf);
+			die_unterminated_line(refs->path, line.buf, line.len);
 
 		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
 			if (strstr(traits, " fully-peeled "))
@@ -266,8 +289,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 			 */
 			last->flag |= REF_KNOWS_PEELED;
 		} else {
-			strbuf_setlen(&line, line.len - 1);
-			die("unexpected line in %s: %s", refs->path, line.buf);
+			die_invalid_line(refs->path, line.buf, line.len);
 		}
 	}
 
-- 
2.14.1



* [PATCH 05/20] read_packed_refs(): use mmap to read the `packed-refs` file
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (3 preceding siblings ...)
  2017-09-13 17:15 ` [PATCH 04/20] die_unterminated_line(), die_invalid_line(): new functions Michael Haggerty
@ 2017-09-13 17:15 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 06/20] read_packed_refs(): only check for a header at the top of the file Michael Haggerty
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:15 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

It's still done in a pretty stupid way, involving more data copying
than necessary. That will improve in future commits.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 42 ++++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 5c50c223ef..154abbd83a 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -215,8 +215,12 @@ static NORETURN void die_invalid_line(const char *path,
  */
 static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
-	FILE *f;
 	struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
+	int fd;
+	struct stat st;
+	size_t size;
+	char *buf;
+	const char *pos, *eol, *eof;
 	struct ref_entry *last = NULL;
 	struct strbuf line = STRBUF_INIT;
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
@@ -227,8 +231,8 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	packed_refs->cache = create_ref_cache(NULL, NULL);
 	packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 
-	f = fopen(refs->path, "r");
-	if (!f) {
+	fd = open(refs->path, O_RDONLY);
+	if (fd < 0) {
 		if (errno == ENOENT) {
 			/*
 			 * This is OK; it just means that no
@@ -241,16 +245,27 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		}
 	}
 
-	stat_validity_update(&packed_refs->validity, fileno(f));
+	stat_validity_update(&packed_refs->validity, fd);
+
+	if (fstat(fd, &st) < 0)
+		die_errno("couldn't stat %s", refs->path);
+
+	size = xsize_t(st.st_size);
+	buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
+	pos = buf;
+	eof = buf + size;
 
 	dir = get_ref_dir(packed_refs->cache->root);
-	while (strbuf_getwholeline(&line, f, '\n') != EOF) {
+	while (pos < eof) {
 		struct object_id oid;
 		const char *refname;
 		const char *traits;
 
-		if (!line.len || line.buf[line.len - 1] != '\n')
-			die_unterminated_line(refs->path, line.buf, line.len);
+		eol = memchr(pos, '\n', eof - pos);
+		if (!eol)
+			die_unterminated_line(refs->path, pos, eof - pos);
+
+		strbuf_add(&line, pos, eol + 1 - pos);
 
 		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
 			if (strstr(traits, " fully-peeled "))
@@ -258,7 +273,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 			else if (strstr(traits, " peeled "))
 				peeled = PEELED_TAGS;
 			/* perhaps other traits later as well */
-			continue;
+			goto next_line;
 		}
 
 		refname = parse_ref_line(&line, &oid);
@@ -291,11 +306,18 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		} else {
 			die_invalid_line(refs->path, line.buf, line.len);
 		}
+
+	next_line:
+		/* The "+ 1" is for the LF character. */
+		pos = eol + 1;
+		strbuf_reset(&line);
 	}
 
-	fclose(f);
-	strbuf_release(&line);
+	if (munmap(buf, size))
+		die_errno("error unmapping packed-refs file");
+	close(fd);
 
+	strbuf_release(&line);
 	return packed_refs;
 }
 
-- 
2.14.1



* [PATCH 06/20] read_packed_refs(): only check for a header at the top of the file
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (4 preceding siblings ...)
  2017-09-13 17:15 ` [PATCH 05/20] read_packed_refs(): use mmap to read the `packed-refs` file Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 07/20] read_packed_refs(): make parsing of the header line more robust Michael Haggerty
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

This tightens up the parsing a bit; previously, stray header-looking
lines would have been processed.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 35 ++++++++++++++++++++++++-----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 154abbd83a..141f02b9c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -255,11 +255,34 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	pos = buf;
 	eof = buf + size;
 
+	/* If the file has a header line, process it: */
+	if (pos < eof && *pos == '#') {
+		const char *traits;
+
+		eol = memchr(pos, '\n', eof - pos);
+		if (!eol)
+			die_unterminated_line(refs->path, pos, eof - pos);
+
+		strbuf_add(&line, pos, eol + 1 - pos);
+
+		if (!skip_prefix(line.buf, "# pack-refs with:", &traits))
+			die_invalid_line(refs->path, pos, eof - pos);
+
+		if (strstr(traits, " fully-peeled "))
+			peeled = PEELED_FULLY;
+		else if (strstr(traits, " peeled "))
+			peeled = PEELED_TAGS;
+		/* perhaps other traits later as well */
+
+		/* The "+ 1" is for the LF character. */
+		pos = eol + 1;
+		strbuf_reset(&line);
+	}
+
 	dir = get_ref_dir(packed_refs->cache->root);
 	while (pos < eof) {
 		struct object_id oid;
 		const char *refname;
-		const char *traits;
 
 		eol = memchr(pos, '\n', eof - pos);
 		if (!eol)
@@ -267,15 +290,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
 		strbuf_add(&line, pos, eol + 1 - pos);
 
-		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
-			if (strstr(traits, " fully-peeled "))
-				peeled = PEELED_FULLY;
-			else if (strstr(traits, " peeled "))
-				peeled = PEELED_TAGS;
-			/* perhaps other traits later as well */
-			goto next_line;
-		}
-
 		refname = parse_ref_line(&line, &oid);
 		if (refname) {
 			int flag = REF_ISPACKED;
@@ -307,7 +321,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 			die_invalid_line(refs->path, line.buf, line.len);
 		}
 
-	next_line:
 		/* The "+ 1" is for the LF character. */
 		pos = eol + 1;
 		strbuf_reset(&line);
-- 
2.14.1



* [PATCH 07/20] read_packed_refs(): make parsing of the header line more robust
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (5 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 06/20] read_packed_refs(): only check for a header at the top of the file Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 08/20] read_packed_refs(): read references with minimal copying Michael Haggerty
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

The old code parsed the traits in the `packed-refs` header by looking
for the string " trait " (i.e., the name of the trait with a space on
either side) in the header line. This is fragile, because if any other
implementation of Git forgets to write the trailing space, the last
trait would silently be ignored (and the error might never be
noticed).

So instead, use `string_list_split_in_place()` to split the traits
into tokens, then use `unsorted_string_list_has_string()` to look for
the tokens we are interested in. This means that we can read the
traits correctly even if the header line is missing a trailing
space (or indeed, if it is missing the space after the colon, or if it
has multiple spaces somewhere).

However, older Git clients (and perhaps other Git implementations)
still require the surrounding spaces, so we still have to output the
header with a trailing space.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 141f02b9c8..a45e3ff92f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -257,25 +257,30 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
 	/* If the file has a header line, process it: */
 	if (pos < eof && *pos == '#') {
-		const char *traits;
+		char *p;
+		struct string_list traits = STRING_LIST_INIT_NODUP;
 
 		eol = memchr(pos, '\n', eof - pos);
 		if (!eol)
 			die_unterminated_line(refs->path, pos, eof - pos);
 
-		strbuf_add(&line, pos, eol + 1 - pos);
+		strbuf_add(&line, pos, eol - pos);
 
-		if (!skip_prefix(line.buf, "# pack-refs with:", &traits))
+		if (!skip_prefix(line.buf, "# pack-refs with:", (const char **)&p))
 			die_invalid_line(refs->path, pos, eof - pos);
 
-		if (strstr(traits, " fully-peeled "))
+		string_list_split_in_place(&traits, p, ' ', -1);
+
+		if (unsorted_string_list_has_string(&traits, "fully-peeled"))
 			peeled = PEELED_FULLY;
-		else if (strstr(traits, " peeled "))
+		else if (unsorted_string_list_has_string(&traits, "peeled"))
 			peeled = PEELED_TAGS;
 		/* perhaps other traits later as well */
 
 		/* The "+ 1" is for the LF character. */
 		pos = eol + 1;
+
+		string_list_clear(&traits, 0);
 		strbuf_reset(&line);
 	}
 
@@ -610,7 +615,11 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 
 /*
  * The packed-refs header line that we write out.  Perhaps other
- * traits will be added later.  The trailing space is required.
+ * traits will be added later.
+ *
+ * Note that earlier versions of Git used to parse these traits by
+ * looking for " trait " in the line. For this reason, the space after
+ * the colon and the trailing space are required.
  */
 static const char PACKED_REFS_HEADER[] =
 	"# pack-refs with: peeled fully-peeled \n";
-- 
2.14.1



* [PATCH 08/20] read_packed_refs(): read references with minimal copying
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (6 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 07/20] read_packed_refs(): make parsing of the header line more robust Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 09/20] packed_ref_cache: remember the file-wide peeling state Michael Haggerty
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Instead of copying data from the `packed-refs` file one line at a time
and then processing it, process the data in place as much as possible.

Also, instead of processing one line per iteration of the main loop,
process a reference line plus its corresponding peeled line (if
present) together.

Note that this change slightly tightens up the parsing of the
`packed-refs` file. Previously, the parser would have accepted multiple
"peeled" lines for a single reference (ignoring all but the last one).
Now it rejects that.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 101 ++++++++++++++++++++------------------------------
 1 file changed, 40 insertions(+), 61 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a45e3ff92f..2b80f244c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -134,33 +134,6 @@ static void clear_packed_ref_cache(struct packed_ref_store *refs)
 	}
 }
 
-/* The length of a peeled reference line in packed-refs, including EOL: */
-#define PEELED_LINE_LENGTH 42
-
-/*
- * Parse one line from a packed-refs file.  Write the SHA1 to sha1.
- * Return a pointer to the refname within the line (null-terminated),
- * or NULL if there was a problem.
- */
-static const char *parse_ref_line(struct strbuf *line, struct object_id *oid)
-{
-	const char *ref;
-
-	if (parse_oid_hex(line->buf, oid, &ref) < 0)
-		return NULL;
-	if (!isspace(*ref++))
-		return NULL;
-
-	if (isspace(*ref))
-		return NULL;
-
-	if (line->buf[line->len - 1] != '\n')
-		return NULL;
-	line->buf[--line->len] = 0;
-
-	return ref;
-}
-
 static NORETURN void die_unterminated_line(const char *path,
 					   const char *p, size_t len)
 {
@@ -221,8 +194,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	size_t size;
 	char *buf;
 	const char *pos, *eol, *eof;
-	struct ref_entry *last = NULL;
-	struct strbuf line = STRBUF_INIT;
+	struct strbuf tmp = STRBUF_INIT;
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
 	struct ref_dir *dir;
 
@@ -264,9 +236,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		if (!eol)
 			die_unterminated_line(refs->path, pos, eof - pos);
 
-		strbuf_add(&line, pos, eol - pos);
+		strbuf_add(&tmp, pos, eol - pos);
 
-		if (!skip_prefix(line.buf, "# pack-refs with:", (const char **)&p))
+		if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char **)&p))
 			die_invalid_line(refs->path, pos, eof - pos);
 
 		string_list_split_in_place(&traits, p, ' ', -1);
@@ -281,61 +253,68 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		pos = eol + 1;
 
 		string_list_clear(&traits, 0);
-		strbuf_reset(&line);
+		strbuf_reset(&tmp);
 	}
 
 	dir = get_ref_dir(packed_refs->cache->root);
 	while (pos < eof) {
+		const char *p = pos;
 		struct object_id oid;
 		const char *refname;
+		int flag = REF_ISPACKED;
+		struct ref_entry *entry = NULL;
 
-		eol = memchr(pos, '\n', eof - pos);
+		if (eof - pos < GIT_SHA1_HEXSZ + 2 ||
+		    parse_oid_hex(p, &oid, &p) ||
+		    !isspace(*p++))
+			die_invalid_line(refs->path, pos, eof - pos);
+
+		eol = memchr(p, '\n', eof - p);
 		if (!eol)
 			die_unterminated_line(refs->path, pos, eof - pos);
 
-		strbuf_add(&line, pos, eol + 1 - pos);
+		strbuf_add(&tmp, p, eol - p);
+		refname = tmp.buf;
+
+		if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
+			if (!refname_is_safe(refname))
+				die("packed refname is dangerous: %s", refname);
+			oidclr(&oid);
+			flag |= REF_BAD_NAME | REF_ISBROKEN;
+		}
+		if (peeled == PEELED_FULLY ||
+		    (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/")))
+			flag |= REF_KNOWS_PEELED;
+		entry = create_ref_entry(refname, &oid, flag);
+		add_ref_entry(dir, entry);
 
-		refname = parse_ref_line(&line, &oid);
-		if (refname) {
-			int flag = REF_ISPACKED;
+		pos = eol + 1;
+
+		if (pos < eof && *pos == '^') {
+			p = pos + 1;
+			if (eof - p < GIT_SHA1_HEXSZ + 1 ||
+			    parse_oid_hex(p, &entry->u.value.peeled, &p) ||
+			    *p++ != '\n')
+				die_invalid_line(refs->path, pos, eof - pos);
 
-			if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
-				if (!refname_is_safe(refname))
-					die("packed refname is dangerous: %s", refname);
-				oidclr(&oid);
-				flag |= REF_BAD_NAME | REF_ISBROKEN;
-			}
-			last = create_ref_entry(refname, &oid, flag);
-			if (peeled == PEELED_FULLY ||
-			    (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/")))
-				last->flag |= REF_KNOWS_PEELED;
-			add_ref_entry(dir, last);
-		} else if (last &&
-		    line.buf[0] == '^' &&
-		    line.len == PEELED_LINE_LENGTH &&
-		    line.buf[PEELED_LINE_LENGTH - 1] == '\n' &&
-		    !get_oid_hex(line.buf + 1, &oid)) {
-			oidcpy(&last->u.value.peeled, &oid);
 			/*
 			 * Regardless of what the file header said,
 			 * we definitely know the value of *this*
 			 * reference:
 			 */
-			last->flag |= REF_KNOWS_PEELED;
-		} else {
-			die_invalid_line(refs->path, line.buf, line.len);
+			entry->flag |= REF_KNOWS_PEELED;
+
+			pos = p;
 		}
 
-		/* The "+ 1" is for the LF character. */
-		pos = eol + 1;
-		strbuf_reset(&line);
+		strbuf_reset(&tmp);
 	}
 
 	if (munmap(buf, size))
 		die_errno("error ummapping packed-refs file");
 	close(fd);
 
-	strbuf_release(&line);
+	strbuf_release(&tmp);
 	return packed_refs;
 }
 
-- 
2.14.1



* [PATCH 09/20] packed_ref_cache: remember the file-wide peeling state
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (7 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 08/20] read_packed_refs(): read references with minimal copying Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 10/20] mmapped_ref_iterator: add iterator over a packed-refs file Michael Haggerty
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Rather than store the peeling state (i.e., the one defined by traits
in the `packed-refs` file header line) in a local variable in
`read_packed_refs()`, store it permanently in `packed_ref_cache`. This
will be needed when we stop reading all packed refs at once.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 2b80f244c8..ae276f3445 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -18,6 +18,12 @@ struct packed_ref_cache {
 
 	struct ref_cache *cache;
 
+	/*
+	 * What is the peeled state of this cache? (This is usually
+	 * determined from the header of the "packed-refs" file.)
+	 */
+	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled;
+
 	/*
 	 * Count of references to the data structure in this instance,
 	 * including the pointer from files_ref_store::packed if any.
@@ -195,13 +201,13 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	char *buf;
 	const char *pos, *eol, *eof;
 	struct strbuf tmp = STRBUF_INIT;
-	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
 	struct ref_dir *dir;
 
 	packed_refs->refs = refs;
 	acquire_packed_ref_cache(packed_refs);
 	packed_refs->cache = create_ref_cache(NULL, NULL);
 	packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
+	packed_refs->peeled = PEELED_NONE;
 
 	fd = open(refs->path, O_RDONLY);
 	if (fd < 0) {
@@ -244,9 +250,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		string_list_split_in_place(&traits, p, ' ', -1);
 
 		if (unsorted_string_list_has_string(&traits, "fully-peeled"))
-			peeled = PEELED_FULLY;
+			packed_refs->peeled = PEELED_FULLY;
 		else if (unsorted_string_list_has_string(&traits, "peeled"))
-			peeled = PEELED_TAGS;
+			packed_refs->peeled = PEELED_TAGS;
 		/* perhaps other traits later as well */
 
 		/* The "+ 1" is for the LF character. */
@@ -282,8 +288,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 			oidclr(&oid);
 			flag |= REF_BAD_NAME | REF_ISBROKEN;
 		}
-		if (peeled == PEELED_FULLY ||
-		    (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/")))
+		if (packed_refs->peeled == PEELED_FULLY ||
+		    (packed_refs->peeled == PEELED_TAGS &&
+		     starts_with(refname, "refs/tags/")))
 			flag |= REF_KNOWS_PEELED;
 		entry = create_ref_entry(refname, &oid, flag);
 		add_ref_entry(dir, entry);
-- 
2.14.1



* [PATCH 10/20] mmapped_ref_iterator: add iterator over a packed-refs file
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (8 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 09/20] packed_ref_cache: remember the file-wide peeling state Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 11/20] mmapped_ref_iterator_advance(): no peeled value for broken refs Michael Haggerty
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Add a new `mmapped_ref_iterator`, which can iterate over the
references in an mmapped `packed-refs` file directly. Use this
iterator from `read_packed_refs()` to fill the packed refs cache.

Note that we are not yet willing to promise that the new iterator
generates its output in order. That doesn't matter for now, because
the packed refs cache doesn't care in what order it is filled.

This change adds a lot of boilerplate without providing any obvious
benefits. The benefits will come soon, when we get rid of the
`ref_cache` for packed references altogether.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 207 ++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 152 insertions(+), 55 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index ae276f3445..312116a99d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -163,6 +163,141 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
+/*
+ * An iterator over a packed-refs file that is currently mmapped.
+ */
+struct mmapped_ref_iterator {
+	struct ref_iterator base;
+
+	struct packed_ref_cache *packed_refs;
+
+	/* The current position in the mmapped file: */
+	const char *pos;
+
+	/* The end of the mmapped file: */
+	const char *eof;
+
+	struct object_id oid, peeled;
+
+	struct strbuf refname_buf;
+};
+
+static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct mmapped_ref_iterator *iter =
+		(struct mmapped_ref_iterator *)ref_iterator;
+	const char *p = iter->pos, *eol;
+
+	strbuf_reset(&iter->refname_buf);
+
+	if (iter->pos == iter->eof)
+		return ref_iterator_abort(ref_iterator);
+
+	iter->base.flags = REF_ISPACKED;
+
+	if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
+	    parse_oid_hex(p, &iter->oid, &p) ||
+	    !isspace(*p++))
+		die_invalid_line(iter->packed_refs->refs->path,
+				 iter->pos, iter->eof - iter->pos);
+
+	eol = memchr(p, '\n', iter->eof - p);
+	if (!eol)
+		die_unterminated_line(iter->packed_refs->refs->path,
+				      iter->pos, iter->eof - iter->pos);
+
+	strbuf_add(&iter->refname_buf, p, eol - p);
+	iter->base.refname = iter->refname_buf.buf;
+
+	if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) {
+		if (!refname_is_safe(iter->base.refname))
+			die("packed refname is dangerous: %s",
+			    iter->base.refname);
+		oidclr(&iter->oid);
+		iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
+	}
+	if (iter->packed_refs->peeled == PEELED_FULLY ||
+	    (iter->packed_refs->peeled == PEELED_TAGS &&
+	     starts_with(iter->base.refname, "refs/tags/")))
+		iter->base.flags |= REF_KNOWS_PEELED;
+
+	iter->pos = eol + 1;
+
+	if (iter->pos < iter->eof && *iter->pos == '^') {
+		p = iter->pos + 1;
+		if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
+		    parse_oid_hex(p, &iter->peeled, &p) ||
+		    *p++ != '\n')
+			die_invalid_line(iter->packed_refs->refs->path,
+					 iter->pos, iter->eof - iter->pos);
+		iter->pos = p;
+
+		/*
+		 * Regardless of what the file header said, we
+		 * definitely know the value of *this* reference:
+		 */
+		iter->base.flags |= REF_KNOWS_PEELED;
+	} else {
+		oidclr(&iter->peeled);
+	}
+
+	return ITER_OK;
+}
+
+static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				    struct object_id *peeled)
+{
+	struct mmapped_ref_iterator *iter =
+		(struct mmapped_ref_iterator *)ref_iterator;
+
+	if ((iter->base.flags & REF_KNOWS_PEELED)) {
+		oidcpy(peeled, &iter->peeled);
+		return is_null_oid(&iter->peeled) ? -1 : 0;
+	} else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
+		return -1;
+	} else {
+		return !!peel_object(iter->oid.hash, peeled->hash);
+	}
+}
+
+static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct mmapped_ref_iterator *iter =
+		(struct mmapped_ref_iterator *)ref_iterator;
+
+	release_packed_ref_cache(iter->packed_refs);
+	strbuf_release(&iter->refname_buf);
+	base_ref_iterator_free(ref_iterator);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable mmapped_ref_iterator_vtable = {
+	mmapped_ref_iterator_advance,
+	mmapped_ref_iterator_peel,
+	mmapped_ref_iterator_abort
+};
+
+struct ref_iterator *mmapped_ref_iterator_begin(
+		const char *packed_refs_file,
+		struct packed_ref_cache *packed_refs,
+		const char *pos, const char *eof)
+{
+	struct mmapped_ref_iterator *iter = xcalloc(1, sizeof(*iter));
+	struct ref_iterator *ref_iterator = &iter->base;
+
+	base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 0);
+
+	iter->packed_refs = packed_refs;
+	acquire_packed_ref_cache(iter->packed_refs);
+	iter->pos = pos;
+	iter->eof = eof;
+	strbuf_init(&iter->refname_buf, 0);
+
+	iter->base.oid = &iter->oid;
+
+	return ref_iterator;
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
@@ -199,9 +334,10 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	struct stat st;
 	size_t size;
 	char *buf;
-	const char *pos, *eol, *eof;
-	struct strbuf tmp = STRBUF_INIT;
+	const char *pos, *eof;
 	struct ref_dir *dir;
+	struct ref_iterator *iter;
+	int ok;
 
 	packed_refs->refs = refs;
 	acquire_packed_ref_cache(packed_refs);
@@ -235,7 +371,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 
 	/* If the file has a header line, process it: */
 	if (pos < eof && *pos == '#') {
+		struct strbuf tmp = STRBUF_INIT;
 		char *p;
+		const char *eol;
 		struct string_list traits = STRING_LIST_INIT_NODUP;
 
 		eol = memchr(pos, '\n', eof - pos);
@@ -259,69 +397,28 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		pos = eol + 1;
 
 		string_list_clear(&traits, 0);
-		strbuf_reset(&tmp);
+		strbuf_release(&tmp);
 	}
 
 	dir = get_ref_dir(packed_refs->cache->root);
-	while (pos < eof) {
-		const char *p = pos;
-		struct object_id oid;
-		const char *refname;
-		int flag = REF_ISPACKED;
-		struct ref_entry *entry = NULL;
-
-		if (eof - pos < GIT_SHA1_HEXSZ + 2 ||
-		    parse_oid_hex(p, &oid, &p) ||
-		    !isspace(*p++))
-			die_invalid_line(refs->path, pos, eof - pos);
+	iter = mmapped_ref_iterator_begin(refs->path, packed_refs, pos, eof);
+	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+		struct ref_entry *entry =
+			create_ref_entry(iter->refname, iter->oid, iter->flags);
 
-		eol = memchr(p, '\n', eof - p);
-		if (!eol)
-			die_unterminated_line(refs->path, pos, eof - pos);
-
-		strbuf_add(&tmp, p, eol - p);
-		refname = tmp.buf;
-
-		if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
-			if (!refname_is_safe(refname))
-				die("packed refname is dangerous: %s", refname);
-			oidclr(&oid);
-			flag |= REF_BAD_NAME | REF_ISBROKEN;
-		}
-		if (packed_refs->peeled == PEELED_FULLY ||
-		    (packed_refs->peeled == PEELED_TAGS &&
-		     starts_with(refname, "refs/tags/")))
-			flag |= REF_KNOWS_PEELED;
-		entry = create_ref_entry(refname, &oid, flag);
+		if ((iter->flags & REF_KNOWS_PEELED))
+			ref_iterator_peel(iter, &entry->u.value.peeled);
 		add_ref_entry(dir, entry);
-
-		pos = eol + 1;
-
-		if (pos < eof && *pos == '^') {
-			p = pos + 1;
-			if (eof - p < GIT_SHA1_HEXSZ + 1 ||
-			    parse_oid_hex(p, &entry->u.value.peeled, &p) ||
-			    *p++ != '\n')
-				die_invalid_line(refs->path, pos, eof - pos);
-
-			/*
-			 * Regardless of what the file header said,
-			 * we definitely know the value of *this*
-			 * reference:
-			 */
-			entry->flag |= REF_KNOWS_PEELED;
-
-			pos = p;
-		}
-
-		strbuf_reset(&tmp);
 	}
 
+	if (ok != ITER_DONE)
+		die("error reading packed-refs file %s", refs->path);
+
 	if (munmap(buf, size))
-		die_errno("error ummapping packed-refs file");
+		die_errno("error ummapping packed-refs file %s", refs->path);
+
 	close(fd);
 
-	strbuf_release(&tmp);
 	return packed_refs;
 }
 
-- 
2.14.1



* [PATCH 11/20] mmapped_ref_iterator_advance(): no peeled value for broken refs
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (9 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 10/20] mmapped_ref_iterator: add iterator over a packed-refs file Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 12/20] packed_ref_cache: keep the `packed-refs` file open and mmapped Michael Haggerty
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

If a reference is broken, suppress its peeled value.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 312116a99d..724c88631d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -234,9 +234,15 @@ static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
 
 		/*
 		 * Regardless of what the file header said, we
-		 * definitely know the value of *this* reference:
+		 * definitely know the value of *this* reference. But
+		 * we suppress it if the reference is broken:
 		 */
-		iter->base.flags |= REF_KNOWS_PEELED;
+		if ((iter->base.flags & REF_ISBROKEN)) {
+			oidclr(&iter->peeled);
+			iter->base.flags &= ~REF_KNOWS_PEELED;
+		} else {
+			iter->base.flags |= REF_KNOWS_PEELED;
+		}
 	} else {
 		oidclr(&iter->peeled);
 	}
-- 
2.14.1



* [PATCH 12/20] packed_ref_cache: keep the `packed-refs` file open and mmapped
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (10 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 11/20] mmapped_ref_iterator_advance(): no peeled value for broken refs Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 13/20] read_packed_refs(): ensure that references are ordered when read Michael Haggerty
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

As long as a `packed_ref_cache` object is in use, keep the
corresponding `packed-refs` file open and mmapped.

This is still pointless, because we only read the file immediately
after instantiating the `packed_ref_cache`. But that will soon change.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 137 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 89 insertions(+), 48 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 724c88631d..d209c8e5a0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -18,6 +18,22 @@ struct packed_ref_cache {
 
 	struct ref_cache *cache;
 
+	/*
+	 * The file descriptor of the `packed-refs` file (open in
+	 * read-only mode), or -1 if it is not open.
+	 */
+	int fd;
+
+	/*
+	 * The range of memory to which the `packed-refs` file is
+	 * mmapped. If there were no contents (e.g., because the file
+	 * didn't exist), `buf` and `eof` are both NULL.
+	 */
+	char *buf, *eof;
+
+	/* The size of the header line, if any; otherwise, 0: */
+	size_t header_len;
+
 	/*
 	 * What is the peeled state of this cache? (This is usually
 	 * determined from the header of the "packed-refs" file.)
@@ -36,30 +52,6 @@ struct packed_ref_cache {
 	struct stat_validity validity;
 };
 
-/*
- * Increment the reference count of *packed_refs.
- */
-static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-	packed_refs->referrers++;
-}
-
-/*
- * Decrease the reference count of *packed_refs.  If it goes to zero,
- * free *packed_refs and return true; otherwise return false.
- */
-static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-	if (!--packed_refs->referrers) {
-		free_ref_cache(packed_refs->cache);
-		stat_validity_clear(&packed_refs->validity);
-		free(packed_refs);
-		return 1;
-	} else {
-		return 0;
-	}
-}
-
 /*
  * A container for `packed-refs`-related data. It is not (yet) a
  * `ref_store`.
@@ -92,6 +84,50 @@ struct packed_ref_store {
 	struct tempfile tempfile;
 };
 
+/*
+ * Increment the reference count of *packed_refs.
+ */
+static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+	packed_refs->referrers++;
+}
+
+/*
+ * If the buffer in `packed_refs` is active, munmap the memory, close the
+ * file, and set the buffer pointers to NULL.
+ */
+static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs)
+{
+	if (packed_refs->buf) {
+		if (munmap(packed_refs->buf,
+			   packed_refs->eof - packed_refs->buf))
+			die_errno("error ummapping packed-refs file %s",
+				  packed_refs->refs->path);
+		packed_refs->buf = packed_refs->eof = NULL;
+		packed_refs->header_len = 0;
+	}
+
+	if (packed_refs->fd >= 0)
+		close(packed_refs->fd);
+}
+
+/*
+ * Decrease the reference count of *packed_refs.  If it goes to zero,
+ * free *packed_refs and return true; otherwise return false.
+ */
+static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+	if (!--packed_refs->referrers) {
+		free_ref_cache(packed_refs->cache);
+		stat_validity_clear(&packed_refs->validity);
+		release_packed_ref_buffer(packed_refs);
+		free(packed_refs);
+		return 1;
+	} else {
+		return 0;
+	}
+}
+
 struct ref_store *packed_ref_store_create(const char *path,
 					  unsigned int store_flags)
 {
@@ -284,13 +320,15 @@ static struct ref_iterator_vtable mmapped_ref_iterator_vtable = {
 };
 
 struct ref_iterator *mmapped_ref_iterator_begin(
-		const char *packed_refs_file,
 		struct packed_ref_cache *packed_refs,
 		const char *pos, const char *eof)
 {
 	struct mmapped_ref_iterator *iter = xcalloc(1, sizeof(*iter));
 	struct ref_iterator *ref_iterator = &iter->base;
 
+	if (!packed_refs->buf)
+		return empty_ref_iterator_begin();
+
 	base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 0);
 
 	iter->packed_refs = packed_refs;
@@ -336,11 +374,8 @@ struct ref_iterator *mmapped_ref_iterator_begin(
 static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
 	struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
-	int fd;
 	struct stat st;
 	size_t size;
-	char *buf;
-	const char *pos, *eof;
 	struct ref_dir *dir;
 	struct ref_iterator *iter;
 	int ok;
@@ -351,13 +386,15 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 	packed_refs->peeled = PEELED_NONE;
 
-	fd = open(refs->path, O_RDONLY);
-	if (fd < 0) {
+	packed_refs->fd = open(refs->path, O_RDONLY);
+	if (packed_refs->fd < 0) {
 		if (errno == ENOENT) {
 			/*
 			 * This is OK; it just means that no
 			 * "packed-refs" file has been written yet,
-			 * which is equivalent to it being empty.
+			 * which is equivalent to it being empty,
+			 * which is its state when initialized with
+			 * zeros.
 			 */
 			return packed_refs;
 		} else {
@@ -365,31 +402,37 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		}
 	}
 
-	stat_validity_update(&packed_refs->validity, fd);
+	stat_validity_update(&packed_refs->validity, packed_refs->fd);
 
-	if (fstat(fd, &st) < 0)
+	if (fstat(packed_refs->fd, &st) < 0)
 		die_errno("couldn't stat %s", refs->path);
 
 	size = xsize_t(st.st_size);
-	buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
-	pos = buf;
-	eof = buf + size;
+	packed_refs->buf = xmmap(NULL, size,
+				 PROT_READ, MAP_PRIVATE,
+				 packed_refs->fd, 0);
+	packed_refs->eof = packed_refs->buf + size;
 
 	/* If the file has a header line, process it: */
-	if (pos < eof && *pos == '#') {
+	if (packed_refs->buf < packed_refs->eof && *packed_refs->buf == '#') {
 		struct strbuf tmp = STRBUF_INIT;
 		char *p;
 		const char *eol;
 		struct string_list traits = STRING_LIST_INIT_NODUP;
 
-		eol = memchr(pos, '\n', eof - pos);
+		eol = memchr(packed_refs->buf, '\n',
+			     packed_refs->eof - packed_refs->buf);
 		if (!eol)
-			die_unterminated_line(refs->path, pos, eof - pos);
+			die_unterminated_line(refs->path,
+					      packed_refs->buf,
+					      packed_refs->eof - packed_refs->buf);
 
-		strbuf_add(&tmp, pos, eol - pos);
+		strbuf_add(&tmp, packed_refs->buf, eol - packed_refs->buf);
 
 		if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char **)&p))
-			die_invalid_line(refs->path, pos, eof - pos);
+			die_invalid_line(refs->path,
+					 packed_refs->buf,
+					 packed_refs->eof - packed_refs->buf);
 
 		string_list_split_in_place(&traits, p, ' ', -1);
 
@@ -400,14 +443,17 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		/* perhaps other traits later as well */
 
 		/* The "+ 1" is for the LF character. */
-		pos = eol + 1;
+		packed_refs->header_len = eol + 1 - packed_refs->buf;
 
 		string_list_clear(&traits, 0);
 		strbuf_release(&tmp);
 	}
 
 	dir = get_ref_dir(packed_refs->cache->root);
-	iter = mmapped_ref_iterator_begin(refs->path, packed_refs, pos, eof);
+	iter = mmapped_ref_iterator_begin(
+			packed_refs,
+			packed_refs->buf + packed_refs->header_len,
+			packed_refs->eof);
 	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
 		struct ref_entry *entry =
 			create_ref_entry(iter->refname, iter->oid, iter->flags);
@@ -420,11 +466,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	if (ok != ITER_DONE)
 		die("error reading packed-refs file %s", refs->path);
 
-	if (munmap(buf, size))
-		die_errno("error ummapping packed-refs file %s", refs->path);
-
-	close(fd);
-
 	return packed_refs;
 }
 
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 13/20] read_packed_refs(): ensure that references are ordered when read
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (11 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 12/20] packed_ref_cache: keep the `packed-refs` file open and mmapped Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 14/20] packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator` Michael Haggerty
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

It doesn't actually matter now, because the references are only
iterated over to fill the associated `ref_cache`, which itself puts
them in the correct order. But we want to get rid of the `ref_cache`,
so we want to be able to iterate directly over the `packed-refs`
buffer, and then the iteration will need to be ordered correctly.

In fact, we already write the `packed-refs` file sorted, but it is
possible that other Git clients don't get it right. So let's not
assume that a `packed-refs` file is sorted unless it is explicitly
declared to be so via a `sorted` trait in its header line.

If it is *not* declared to be sorted, then scan quickly through the
file to check. If it is found to be out of order, then sort the
records into a new memory-only copy rather than using the mmapped file
contents. This checking and sorting is done quickly, without parsing
the full file contents. However, it needs a little bit of care to
avoid reading past the end of the buffer even if the `packed-refs`
file is corrupt.

If `packed-refs` is sorted (i.e., its file header contains the
`sorted` trait or our check shows that it is), then use the mmapped
file contents directly.

Since we write the file correctly sorted, include that trait when we
write or rewrite a `packed-refs` file.
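The check-then-sort pass can be illustrated with a simplified, hypothetical sketch. It operates on bare "refname\n" records, whereas the real code skips each record's object-name prefix, keeps '^' peel lines attached to their records, and uses ALLOC_GROW/QSORT; the function names here are made up:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct entry { const char *start; size_t len; };

static int cmp_entry(const void *a, const void *b)
{
	const struct entry *e1 = a, *e2 = b;
	size_t n = e1->len < e2->len ? e1->len : e2->len;
	int c = memcmp(e1->start, e2->start, n);
	if (c)
		return c;
	return e1->len < e2->len ? -1 : e1->len != e2->len;
}

/*
 * Return a newly-allocated sorted copy of buf, or NULL if the
 * records are already sorted (so the caller can keep using the
 * mmapped buffer directly).
 */
static char *sort_records(const char *buf, size_t len)
{
	struct entry *entries = NULL;
	size_t alloc = 0, nr = 0, i;
	int sorted = 1;
	const char *pos = buf, *eof = buf + len;
	char *out, *dst;

	/* One linear pass: index the records and check their order. */
	while (pos < eof) {
		const char *eol = memchr(pos, '\n', eof - pos);
		if (!eol)
			return NULL; /* corrupt; the real code dies here */
		eol++;
		if (nr + 1 > alloc) {
			alloc = alloc ? 2 * alloc : 16;
			entries = realloc(entries, alloc * sizeof(*entries));
		}
		entries[nr].start = pos;
		entries[nr].len = eol - pos;
		if (sorted && nr > 0 &&
		    cmp_entry(&entries[nr - 1], &entries[nr]) >= 0)
			sorted = 0;
		nr++;
		pos = eol;
	}

	if (sorted) {
		free(entries);
		return NULL;
	}

	/* Sort the index, then copy the records out in that order. */
	qsort(entries, nr, sizeof(*entries), cmp_entry);
	out = dst = malloc(len + 1);
	for (i = 0; i < nr; i++) {
		memcpy(dst, entries[i].start, entries[i].len);
		dst += entries[i].len;
	}
	*dst = '\0';
	free(entries);
	return out;
}
```

When the buffer is already sorted, the cost is a single linear scan and no copy, which is the common case for files that we wrote ourselves.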

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 222 ++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 206 insertions(+), 16 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index d209c8e5a0..2a36dd9afb 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -25,9 +25,12 @@ struct packed_ref_cache {
 	int fd;
 
 	/*
-	 * The range of memory to which the `packed-refs` file is
-	 * mmapped. If there were no contents (e.g., because the file
-	 * didn't exist), `buf` and `eof` are both NULL.
+	 * The contents of the `packed-refs` file. If the file was
+	 * already sorted, this points at the mmapped contents of the
+	 * file. If not, this points at heap-allocated memory
+	 * containing the contents, sorted. If there were no contents
+	 * (e.g., because the file didn't exist), `buf` and `eof` are
+	 * both NULL.
 	 */
 	char *buf, *eof;
 
@@ -93,22 +96,24 @@ static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
 }
 
 /*
- * If the buffer in `packed_refs` is active, munmap the memory, close the
- * file, and set the buffer pointers to NULL.
+ * If the buffer in `packed_refs` is active, then either munmap the
+ * memory and close the file, or free the memory. Then set the buffer
+ * pointers to NULL.
  */
 static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs)
 {
-	if (packed_refs->buf) {
+	if (packed_refs->fd >= 0) {
 		if (munmap(packed_refs->buf,
 			   packed_refs->eof - packed_refs->buf))
 			die_errno("error ummapping packed-refs file %s",
 				  packed_refs->refs->path);
-		packed_refs->buf = packed_refs->eof = NULL;
-		packed_refs->header_len = 0;
-	}
-
-	if (packed_refs->fd >= 0)
 		close(packed_refs->fd);
+		packed_refs->fd = -1;
+	} else {
+		free(packed_refs->buf);
+	}
+	packed_refs->buf = packed_refs->eof = NULL;
+	packed_refs->header_len = 0;
 }
 
 /*
@@ -329,7 +334,7 @@ struct ref_iterator *mmapped_ref_iterator_begin(
 	if (!packed_refs->buf)
 		return empty_ref_iterator_begin();
 
-	base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 0);
+	base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 1);
 
 	iter->packed_refs = packed_refs;
 	acquire_packed_ref_cache(iter->packed_refs);
@@ -342,6 +347,170 @@ struct ref_iterator *mmapped_ref_iterator_begin(
 	return ref_iterator;
 }
 
+struct packed_ref_entry {
+	const char *start;
+	size_t len;
+};
+
+static int cmp_packed_ref_entries(const void *v1, const void *v2)
+{
+	const struct packed_ref_entry *e1 = v1, *e2 = v2;
+	const char *r1 = e1->start + GIT_SHA1_HEXSZ + 1;
+	const char *r2 = e2->start + GIT_SHA1_HEXSZ + 1;
+
+	while (1) {
+		if (*r1 == '\n')
+			return *r2 == '\n' ? 0 : -1;
+		if (*r1 != *r2) {
+			if (*r2 == '\n')
+				return 1;
+			else
+				return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
+		}
+		r1++;
+		r2++;
+	}
+}
+
+/*
+ * `packed_refs->buf` is not known to be sorted. Check whether it is,
+ * and if not, sort it into new memory and munmap/free the old
+ * storage.
+ */
+static void sort_packed_refs(struct packed_ref_cache *packed_refs)
+{
+	struct packed_ref_entry *entries = NULL;
+	size_t alloc = 0, nr = 0;
+	int sorted = 1;
+	const char *pos, *eof, *eol;
+	size_t len, i;
+	char *new_buffer, *dst;
+
+	pos = packed_refs->buf + packed_refs->header_len;
+	eof = packed_refs->eof;
+	len = eof - pos;
+
+	if (!len)
+		return;
+
+	/*
+	 * Initialize entries based on a crude estimate of the number
+	 * of references in the file (we'll grow it below if needed):
+	 */
+	ALLOC_GROW(entries, len / 80 + 20, alloc);
+
+	while (pos < eof) {
+		eol = memchr(pos, '\n', eof - pos);
+		if (!eol)
+			/* The safety check should prevent this. */
+			BUG("unterminated line found in packed-refs");
+		if (eol - pos < GIT_SHA1_HEXSZ + 2)
+			die_invalid_line(packed_refs->refs->path,
+					 pos, eof - pos);
+		eol++;
+		if (eol < eof && *eol == '^') {
+			/*
+			 * Keep any peeled line together with its
+			 * reference:
+			 */
+			const char *peeled_start = eol;
+
+			eol = memchr(peeled_start, '\n', eof - peeled_start);
+			if (!eol)
+				/* The safety check should prevent this. */
+				BUG("unterminated peeled line found in packed-refs");
+			eol++;
+		}
+
+		ALLOC_GROW(entries, nr + 1, alloc);
+		entries[nr].start = pos;
+		entries[nr].len = eol - pos;
+		nr++;
+
+		if (sorted &&
+		    nr > 1 &&
+		    cmp_packed_ref_entries(&entries[nr - 2],
+					   &entries[nr - 1]) >= 0)
+			sorted = 0;
+
+		pos = eol;
+	}
+
+	if (sorted)
+		goto cleanup;
+
+	/* We need to sort the memory. First we sort the entries array: */
+	QSORT(entries, nr, cmp_packed_ref_entries);
+
+	/*
+	 * Allocate a new chunk of memory, and copy the old memory to
+	 * the new in the order indicated by `entries` (not bothering
+	 * with the header line):
+	 */
+	new_buffer = xmalloc(len);
+	for (dst = new_buffer, i = 0; i < nr; i++) {
+		memcpy(dst, entries[i].start, entries[i].len);
+		dst += entries[i].len;
+	}
+
+	/*
+	 * Now munmap the old buffer and use the sorted buffer in its
+	 * place:
+	 */
+	release_packed_ref_buffer(packed_refs);
+	packed_refs->buf = new_buffer;
+	packed_refs->eof = new_buffer + len;
+	packed_refs->header_len = 0;
+
+cleanup:
+	free(entries);
+}
+
+/*
+ * Return a pointer to the start of the record that contains the
+ * character `*p` (which must be within the buffer). If no other
+ * record start is found, return `buf`.
+ */
+static const char *find_start_of_record(const char *buf, const char *p)
+{
+	while (p > buf && (p[-1] != '\n' || p[0] == '^'))
+		p--;
+	return p;
+}
+
+/*
+ * We want to be able to compare mmapped reference records quickly,
+ * without totally parsing them. We can do so because the records are
+ * LF-terminated, and the refname should start exactly (GIT_SHA1_HEXSZ
+ * + 1) bytes past the beginning of the record.
+ *
+ * But what if the `packed-refs` file contains garbage? We're willing
+ * to tolerate not detecting the problem, as long as we don't produce
+ * totally garbled output (we can't afford to check the integrity of
+ * the whole file during every Git invocation). But we do want to be
+ * sure that we never read past the end of the buffer in memory and
+ * perform an illegal memory access.
+ *
+ * Guarantee that minimum level of safety by verifying that the last
+ * record in the file is LF-terminated, and that it has at least
+ * (GIT_SHA1_HEXSZ + 1) characters before the LF. Die if either of
+ * these checks fails.
+ */
+static void verify_buffer_safe(struct packed_ref_cache *packed_refs)
+{
+	const char *buf = packed_refs->buf + packed_refs->header_len;
+	const char *eof = packed_refs->eof;
+	const char *last_line;
+
+	if (buf == eof)
+		return;
+
+	last_line = find_start_of_record(buf, eof - 1);
+	if (*(eof - 1) != '\n' || eof - last_line < GIT_SHA1_HEXSZ + 2)
+		die_invalid_line(packed_refs->refs->path,
+				 last_line, eof - last_line);
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
@@ -350,19 +519,19 @@ struct ref_iterator *mmapped_ref_iterator_begin(
  * A comment line of the form "# pack-refs with: " may contain zero or
  * more traits. We interpret the traits as follows:
  *
- *   No traits:
+ *   Neither `peeled` nor `fully-peeled`:
  *
  *      Probably no references are peeled. But if the file contains a
  *      peeled value for a reference, we will use it.
  *
- *   peeled:
+ *   `peeled`:
  *
  *      References under "refs/tags/", if they *can* be peeled, *are*
  *      peeled in this file. References outside of "refs/tags/" are
  *      probably not peeled even if they could have been, but if we find
  *      a peeled value for such a reference we will use it.
  *
- *   fully-peeled:
+ *   `fully-peeled`:
  *
  *      All references in the file that can be peeled are peeled.
  *      Inversely (and this is more important), any references in the
@@ -370,6 +539,10 @@ struct ref_iterator *mmapped_ref_iterator_begin(
  *      trait should typically be written alongside "peeled" for
  *      compatibility with older clients, but we do not require it
  *      (i.e., "peeled" is a no-op if "fully-peeled" is set).
+ *
+ *   `sorted`:
+ *
+ *      The references in this file are known to be sorted by refname.
  */
 static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
@@ -378,6 +551,7 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	size_t size;
 	struct ref_dir *dir;
 	struct ref_iterator *iter;
+	int sorted = 0;
 	int ok;
 
 	packed_refs->refs = refs;
@@ -440,6 +614,9 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 			packed_refs->peeled = PEELED_FULLY;
 		else if (unsorted_string_list_has_string(&traits, "peeled"))
 			packed_refs->peeled = PEELED_TAGS;
+
+		sorted = unsorted_string_list_has_string(&traits, "sorted");
+
 		/* perhaps other traits later as well */
 
 		/* The "+ 1" is for the LF character. */
@@ -449,6 +626,19 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		strbuf_release(&tmp);
 	}
 
+	verify_buffer_safe(packed_refs);
+
+	if (!sorted) {
+		sort_packed_refs(packed_refs);
+
+		/*
+		 * Reordering the records might have moved a short one
+		 * to the end of the buffer, so verify the buffer's
+		 * safety again:
+		 */
+		verify_buffer_safe(packed_refs);
+	}
+
 	dir = get_ref_dir(packed_refs->cache->root);
 	iter = mmapped_ref_iterator_begin(
 			packed_refs,
@@ -752,7 +942,7 @@ int packed_refs_is_locked(struct ref_store *ref_store)
  * the colon and the trailing space are required.
  */
 static const char PACKED_REFS_HEADER[] =
-	"# pack-refs with: peeled fully-peeled \n";
+	"# pack-refs with: peeled fully-peeled sorted \n";
 
 static int packed_init_db(struct ref_store *ref_store, struct strbuf *err)
 {
-- 
2.14.1



* [PATCH 14/20] packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Now that we have an efficient way to iterate, in order, over the
mmapped contents of the `packed-refs` file, we can use that directly
to implement reference iteration for the `packed_ref_store`, rather
than iterating over the `ref_cache`. This is the next step towards
getting rid of the `ref_cache` entirely.
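The record-oriented binary search that makes this direct iteration cheap can be sketched as follows. This is a hypothetical simplification: it uses bare "refname\n" records (no object-name prefix, no '^' peel lines) and returns NULL on a miss instead of the real code's `mustexist` handling:

```c
#include <assert.h>
#include <string.h>

/* Step back to the start of the record containing *p. */
static const char *start_of_record(const char *buf, const char *p)
{
	while (p > buf && p[-1] != '\n')
		p--;
	return p;
}

/* Step forward to the start of the record after the one containing *p. */
static const char *end_of_record(const char *p, const char *end)
{
	while (++p < end && p[-1] != '\n')
		;
	return p;
}

/* Compare an LF-terminated record to a NUL-terminated refname. */
static int cmp_record_to_refname(const char *rec, const char *refname)
{
	while (1) {
		if (*rec == '\n')
			return *refname ? -1 : 0;
		if (!*refname)
			return 1;
		if (*rec != *refname)
			return (unsigned char)*rec < (unsigned char)*refname ?
				-1 : 1;
		rec++;
		refname++;
	}
}

/* Return the record for refname, or NULL if it is not present. */
static const char *find_record(const char *buf, const char *eof,
			       const char *refname)
{
	const char *lo = buf, *hi = eof;

	while (lo < hi) {
		/* Snap the probe point back to a record boundary: */
		const char *mid = lo + (hi - lo) / 2;
		const char *rec = start_of_record(lo, mid);
		int cmp = cmp_record_to_refname(rec, refname);

		if (cmp < 0)
			lo = end_of_record(mid, hi);
		else if (cmp > 0)
			hi = rec;
		else
			return rec;
	}
	return NULL;
}
```

The only twist relative to a garden-variety binary search is that each probe must first be snapped back to the start of a record before it can be compared.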

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 106 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 2a36dd9afb..a018a6c8ad 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -372,6 +372,27 @@ static int cmp_packed_ref_entries(const void *v1, const void *v2)
 	}
 }
 
+/*
+ * Compare a packed-refs record pointed to by `rec` to the specified
+ * NUL-terminated refname.
+ */
+static int cmp_entry_to_refname(const char *rec, const char *refname)
+{
+	const char *r1 = rec + GIT_SHA1_HEXSZ + 1;
+	const char *r2 = refname;
+
+	while (1) {
+		if (*r1 == '\n')
+			return *r2 ? -1 : 0;
+		if (!*r2)
+			return 1;
+		if (*r1 != *r2)
+			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
+		r1++;
+		r2++;
+	}
+}
+
 /*
  * `packed_refs->buf` is not known to be sorted. Check whether it is,
  * and if not, sort it into new memory and munmap/free the old
@@ -478,6 +499,17 @@ static const char *find_start_of_record(const char *buf, const char *p)
 	return p;
 }
 
+/*
+ * Return a pointer to the start of the record following the record
+ * that contains `*p`. If none is found before `end`, return `end`.
+ */
+static const char *find_end_of_record(const char *p, const char *end)
+{
+	while (++p < end && (p[-1] != '\n' || p[0] == '^'))
+		;
+	return p;
+}
+
 /*
  * We want to be able to compare mmapped reference records quickly,
  * without totally parsing them. We can do so because the records are
@@ -511,6 +543,65 @@ static void verify_buffer_safe(struct packed_ref_cache *packed_refs)
 				 last_line, eof - last_line);
 }
 
+/*
+ * Find the place in `cache->buf` where the start of the record for
+ * `refname` starts. If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted. In the latter mode, `refname` doesn't have to be a proper
+ * reference name; for example, one could search for "refs/replace/"
+ * to find the start of any replace references.
+ *
+ * The record is sought using a binary search, so `cache->buf` must be
+ * sorted.
+ */
+static const char *find_reference_location(struct packed_ref_cache *cache,
+					   const char *refname, int mustexist)
+{
+	/*
+	 * This is not *quite* a garden-variety binary search, because
+	 * the data we're searching is made up of records, and we
+	 * always need to find the beginning of a record to do a
+	 * comparison. A "record" here is one line for the reference
+	 * itself and zero or one peel lines that start with '^'. Our
+	 * loop invariant is described in the next two comments.
+	 */
+
+	/*
+	 * A pointer to the character at the start of a record whose
+	 * preceding records all have reference names that come
+	 * *before* `refname`.
+	 */
+	const char *lo = cache->buf + cache->header_len;
+
+	/*
+	 * A pointer to the first character of a record whose
+	 * reference name comes *after* `refname`.
+	 */
+	const char *hi = cache->eof;
+
+	while (lo < hi) {
+		const char *mid, *rec;
+		int cmp;
+
+		mid = lo + (hi - lo) / 2;
+		rec = find_start_of_record(lo, mid);
+		cmp = cmp_entry_to_refname(rec, refname);
+		if (cmp < 0) {
+			lo = find_end_of_record(mid, hi);
+		} else if (cmp > 0) {
+			hi = rec;
+		} else {
+			return rec;
+		}
+	}
+
+	if (mustexist)
+		return NULL;
+	else
+		return lo;
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
@@ -818,6 +909,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
 		const char *prefix, unsigned int flags)
 {
 	struct packed_ref_store *refs;
+	struct packed_ref_cache *packed_refs;
+	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -835,13 +928,23 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 * the packed-ref cache is up to date with what is on disk,
 	 * and re-reads it if not.
 	 */
+	iter->cache = packed_refs = get_packed_ref_cache(refs);
+	acquire_packed_ref_cache(packed_refs);
 
-	iter->cache = get_packed_ref_cache(refs);
-	acquire_packed_ref_cache(iter->cache);
-	iter->iter0 = cache_ref_iterator_begin(iter->cache->cache, prefix, 0);
+	if (prefix && *prefix)
+		start = find_reference_location(packed_refs, prefix, 0);
+	else
+		start = packed_refs->buf + packed_refs->header_len;
+
+	iter->iter0 = mmapped_ref_iterator_begin(
+			packed_refs, start, packed_refs->eof);
 
 	iter->flags = flags;
 
+	if (prefix && *prefix)
+		/* Stop iteration after we've gone *past* prefix: */
+		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
+
 	return ref_iterator;
 }
 
-- 
2.14.1



* [PATCH 15/20] packed_read_raw_ref(): read the reference from the mmapped buffer
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Instead of reading the reference from the `ref_cache`, read it
directly from the mmapped buffer.
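A hypothetical sketch of that direct read: once the binary search has located the record for `refname` in the sorted buffer, only the record's leading 40 hex characters need to be parsed (record layout: "<40-hex-sha1> SP refname LF"). The helper names here are made up; the real code uses `get_sha1_hex()`:

```c
#include <assert.h>
#include <string.h>

#define SHA1_HEXSZ 40

static int hexval(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/*
 * Parse the object name at the start of a packed-refs record into
 * 20 binary bytes. Return 0 on success, -1 on a malformed record
 * (the real code dies with die_invalid_line() instead).
 */
static int parse_record_sha1(const char *rec, unsigned char *sha1)
{
	int i;

	for (i = 0; i < SHA1_HEXSZ; i += 2) {
		int hi = hexval(rec[i]), lo = hexval(rec[i + 1]);
		if (hi < 0 || lo < 0)
			return -1;
		sha1[i / 2] = (hi << 4) | lo;
	}
	/* The object name must be followed by a single space. */
	return rec[SHA1_HEXSZ] == ' ' ? 0 : -1;
}
```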

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a018a6c8ad..9b10532eb3 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -806,18 +806,22 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 {
 	struct packed_ref_store *refs =
 		packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
-
-	struct ref_entry *entry;
+	struct packed_ref_cache *packed_refs = get_packed_ref_cache(refs);
+	const char *rec;
 
 	*type = 0;
 
-	entry = get_packed_ref(refs, refname);
-	if (!entry) {
+	rec = find_reference_location(packed_refs, refname, 1);
+
+	if (!rec) {
+		/* refname is not a packed reference. */
 		errno = ENOENT;
 		return -1;
 	}
 
-	hashcpy(sha1, entry->u.value.oid.hash);
+	if (get_sha1_hex(rec, sha1))
+		die_invalid_line(refs->path, rec, packed_refs->eof - rec);
+
 	*type = REF_ISPACKED;
 	return 0;
 }
-- 
2.14.1



* [PATCH 16/20] ref_store: implement `refs_peel_ref()` generically
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

We're about to stop storing packed refs in a `ref_cache`. That means
that the only way we have left to optimize `peel_ref()` is by checking
whether the reference being peeled is the one currently being iterated
over (in `current_ref_iter`), and if so, using `ref_iterator_peel()`.
But this can be done generically; it doesn't have to be implemented
per-backend.

So implement `refs_peel_ref()` in `refs.c` and remove the `peel_ref()`
method from the refs API.

This removes the last callers of a couple of functions, so delete
them. More cleanup to come...

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c                | 18 +++++++++++++++++-
 refs/files-backend.c  | 38 --------------------------------------
 refs/packed-backend.c | 36 ------------------------------------
 refs/refs-internal.h  |  3 ---
 4 files changed, 17 insertions(+), 78 deletions(-)

diff --git a/refs.c b/refs.c
index 101c107ee8..c5e6f79c77 100644
--- a/refs.c
+++ b/refs.c
@@ -1735,7 +1735,23 @@ int refs_pack_refs(struct ref_store *refs, unsigned int flags)
 int refs_peel_ref(struct ref_store *refs, const char *refname,
 		  unsigned char *sha1)
 {
-	return refs->be->peel_ref(refs, refname, sha1);
+	int flag;
+	unsigned char base[20];
+
+	if (current_ref_iter && current_ref_iter->refname == refname) {
+		struct object_id peeled;
+
+		if (ref_iterator_peel(current_ref_iter, &peeled))
+			return -1;
+		hashcpy(sha1, peeled.hash);
+		return 0;
+	}
+
+	if (refs_read_ref_full(refs, refname,
+			       RESOLVE_REF_READING, base, &flag))
+		return -1;
+
+	return peel_object(base, sha1);
 }
 
 int peel_ref(const char *refname, unsigned char *sha1)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 35648c89fc..7d12de88d0 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -655,43 +655,6 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
-static int files_peel_ref(struct ref_store *ref_store,
-			  const char *refname, unsigned char *sha1)
-{
-	struct files_ref_store *refs =
-		files_downcast(ref_store, REF_STORE_READ | REF_STORE_ODB,
-			       "peel_ref");
-	int flag;
-	unsigned char base[20];
-
-	if (current_ref_iter && current_ref_iter->refname == refname) {
-		struct object_id peeled;
-
-		if (ref_iterator_peel(current_ref_iter, &peeled))
-			return -1;
-		hashcpy(sha1, peeled.hash);
-		return 0;
-	}
-
-	if (refs_read_ref_full(ref_store, refname,
-			       RESOLVE_REF_READING, base, &flag))
-		return -1;
-
-	/*
-	 * If the reference is packed, read its ref_entry from the
-	 * cache in the hope that we already know its peeled value.
-	 * We only try this optimization on packed references because
-	 * (a) forcing the filling of the loose reference cache could
-	 * be expensive and (b) loose references anyway usually do not
-	 * have REF_KNOWS_PEELED.
-	 */
-	if (flag & REF_ISPACKED &&
-	    !refs_peel_ref(refs->packed_ref_store, refname, sha1))
-		return 0;
-
-	return peel_object(base, sha1);
-}
-
 struct files_ref_iterator {
 	struct ref_iterator base;
 
@@ -3012,7 +2975,6 @@ struct ref_storage_be refs_be_files = {
 	files_initial_transaction_commit,
 
 	files_pack_refs,
-	files_peel_ref,
 	files_create_symref,
 	files_delete_refs,
 	files_rename_ref,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 9b10532eb3..88242a49fe 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -780,26 +780,6 @@ static struct packed_ref_cache *get_packed_ref_cache(struct packed_ref_store *re
 	return refs->cache;
 }
 
-static struct ref_dir *get_packed_ref_dir(struct packed_ref_cache *packed_ref_cache)
-{
-	return get_ref_dir(packed_ref_cache->cache->root);
-}
-
-static struct ref_dir *get_packed_refs(struct packed_ref_store *refs)
-{
-	return get_packed_ref_dir(get_packed_ref_cache(refs));
-}
-
-/*
- * Return the ref_entry for the given refname from the packed
- * references.  If it does not exist, return NULL.
- */
-static struct ref_entry *get_packed_ref(struct packed_ref_store *refs,
-					const char *refname)
-{
-	return find_ref_entry(get_packed_refs(refs), refname);
-}
-
 static int packed_read_raw_ref(struct ref_store *ref_store,
 			       const char *refname, unsigned char *sha1,
 			       struct strbuf *referent, unsigned int *type)
@@ -826,21 +806,6 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 	return 0;
 }
 
-static int packed_peel_ref(struct ref_store *ref_store,
-			   const char *refname, unsigned char *sha1)
-{
-	struct packed_ref_store *refs =
-		packed_downcast(ref_store, REF_STORE_READ | REF_STORE_ODB,
-				"peel_ref");
-	struct ref_entry *r = get_packed_ref(refs, refname);
-
-	if (!r || peel_entry(r, 0))
-		return -1;
-
-	hashcpy(sha1, r->u.value.peeled.hash);
-	return 0;
-}
-
 struct packed_ref_iterator {
 	struct ref_iterator base;
 
@@ -1526,7 +1491,6 @@ struct ref_storage_be refs_be_packed = {
 	packed_initial_transaction_commit,
 
 	packed_pack_refs,
-	packed_peel_ref,
 	packed_create_symref,
 	packed_delete_refs,
 	packed_rename_ref,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index d7f233beba..cc6c373f59 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -562,8 +562,6 @@ typedef int ref_transaction_commit_fn(struct ref_store *refs,
 				      struct strbuf *err);
 
 typedef int pack_refs_fn(struct ref_store *ref_store, unsigned int flags);
-typedef int peel_ref_fn(struct ref_store *ref_store,
-			const char *refname, unsigned char *sha1);
 typedef int create_symref_fn(struct ref_store *ref_store,
 			     const char *ref_target,
 			     const char *refs_heads_master,
@@ -668,7 +666,6 @@ struct ref_storage_be {
 	ref_transaction_commit_fn *initial_transaction_commit;
 
 	pack_refs_fn *pack_refs;
-	peel_ref_fn *peel_ref;
 	create_symref_fn *create_symref;
 	delete_refs_fn *delete_refs;
 	rename_ref_fn *rename_ref;
-- 
2.14.1



* [PATCH 17/20] packed_ref_store: get rid of the `ref_cache` entirely
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Now that everything has been changed to read what it needs directly
out of the `packed-refs` file, `packed_ref_store` doesn't need to
maintain a `ref_cache` at all. So get rid of it.

First of all, this will save a lot of memory and lots of little
allocations. Instead of needing to store complicated parsed data
structures in memory, we just mmap the file (potentially sharing
memory with other processes) and parse only what we need.

Moreover, since the mmapped access reads only the parts of the file
that it needs, we might avoid reading most of the file from disk at
all (at least if the file starts out sorted).

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 29 ++---------------------------
 1 file changed, 2 insertions(+), 27 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 88242a49fe..9868a5c198 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -16,8 +16,6 @@ struct packed_ref_cache {
 	 */
 	struct packed_ref_store *refs;
 
-	struct ref_cache *cache;
-
 	/*
 	 * The file descriptor of the `packed-refs` file (open in
 	 * read-only mode), or -1 if it is not open.
@@ -123,7 +121,6 @@ static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs)
 static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
 {
 	if (!--packed_refs->referrers) {
-		free_ref_cache(packed_refs->cache);
 		stat_validity_clear(&packed_refs->validity);
 		release_packed_ref_buffer(packed_refs);
 		free(packed_refs);
@@ -640,15 +637,10 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 	struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
 	struct stat st;
 	size_t size;
-	struct ref_dir *dir;
-	struct ref_iterator *iter;
 	int sorted = 0;
-	int ok;
 
 	packed_refs->refs = refs;
 	acquire_packed_ref_cache(packed_refs);
-	packed_refs->cache = create_ref_cache(NULL, NULL);
-	packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 	packed_refs->peeled = PEELED_NONE;
 
 	packed_refs->fd = open(refs->path, O_RDONLY);
@@ -730,23 +722,6 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 		verify_buffer_safe(packed_refs);
 	}
 
-	dir = get_ref_dir(packed_refs->cache->root);
-	iter = mmapped_ref_iterator_begin(
-			packed_refs,
-			packed_refs->buf + packed_refs->header_len,
-			packed_refs->eof);
-	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
-		struct ref_entry *entry =
-			create_ref_entry(iter->refname, iter->oid, iter->flags);
-
-		if ((iter->flags & REF_KNOWS_PEELED))
-			ref_iterator_peel(iter, &entry->u.value.peeled);
-		add_ref_entry(dir, entry);
-	}
-
-	if (ok != ITER_DONE)
-		die("error reading packed-refs file %s", refs->path);
-
 	return packed_refs;
 }
 
@@ -905,8 +880,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	else
 		start = packed_refs->buf + packed_refs->header_len;
 
-	iter->iter0 = mmapped_ref_iterator_begin(
-			packed_refs, start, packed_refs->eof);
+	iter->iter0 = mmapped_ref_iterator_begin(packed_refs,
+						 start, packed_refs->eof);
 
 	iter->flags = flags;
 
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 18/20] ref_cache: remove support for storing peeled values
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (16 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 17/20] packed_ref_store: get rid of the `ref_cache` entirely Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 19/20] mmapped_ref_iterator: inline into `packed_ref_iterator` Michael Haggerty
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Now that the `packed-refs` backend doesn't use `ref_cache`, there is
nobody left who might want to store peeled values of references in
`ref_cache`. So remove that feature.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c |  9 ++++++++-
 refs/ref-cache.c      | 42 +-----------------------------------------
 refs/ref-cache.h      | 32 ++------------------------------
 3 files changed, 11 insertions(+), 72 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 9868a5c198..d76051c7f5 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -2,7 +2,6 @@
 #include "../config.h"
 #include "../refs.h"
 #include "refs-internal.h"
-#include "ref-cache.h"
 #include "packed-backend.h"
 #include "../iterator.h"
 #include "../lockfile.h"
@@ -201,6 +200,14 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
+/*
+ * This value is set in `base.flags` if the peeled value of the
+ * current reference is known. In that case, `peeled` contains the
+ * correct peeled value for the reference, which might be `null_sha1`
+ * if the reference is not a tag or if it is broken.
+ */
+#define REF_KNOWS_PEELED 0x40
+
 /*
  * An iterator over a packed-refs file that is currently mmapped.
  */
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 54dfb5218c..4f850e1b5c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -38,7 +38,6 @@ struct ref_entry *create_ref_entry(const char *refname,
 
 	FLEX_ALLOC_STR(ref, name, refname);
 	oidcpy(&ref->u.value.oid, oid);
-	oidclr(&ref->u.value.peeled);
 	ref->flag = flag;
 	return ref;
 }
@@ -491,49 +490,10 @@ static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
 	}
 }
 
-enum peel_status peel_entry(struct ref_entry *entry, int repeel)
-{
-	enum peel_status status;
-
-	if (entry->flag & REF_KNOWS_PEELED) {
-		if (repeel) {
-			entry->flag &= ~REF_KNOWS_PEELED;
-			oidclr(&entry->u.value.peeled);
-		} else {
-			return is_null_oid(&entry->u.value.peeled) ?
-				PEEL_NON_TAG : PEEL_PEELED;
-		}
-	}
-	if (entry->flag & REF_ISBROKEN)
-		return PEEL_BROKEN;
-	if (entry->flag & REF_ISSYMREF)
-		return PEEL_IS_SYMREF;
-
-	status = peel_object(entry->u.value.oid.hash, entry->u.value.peeled.hash);
-	if (status == PEEL_PEELED || status == PEEL_NON_TAG)
-		entry->flag |= REF_KNOWS_PEELED;
-	return status;
-}
-
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
 				   struct object_id *peeled)
 {
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
-	struct cache_ref_iterator_level *level;
-	struct ref_entry *entry;
-
-	level = &iter->levels[iter->levels_nr - 1];
-
-	if (level->index == -1)
-		die("BUG: peel called before advance for cache iterator");
-
-	entry = level->dir->entries[level->index];
-
-	if (peel_entry(entry, 0))
-		return -1;
-	oidcpy(peeled, &entry->u.value.peeled);
-	return 0;
+	return peel_object(ref_iterator->oid->hash, peeled->hash);
 }
 
 static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index a082bfb06c..eda65e73ed 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -38,14 +38,6 @@ struct ref_value {
 	 * referred to by the last reference in the symlink chain.
 	 */
 	struct object_id oid;
-
-	/*
-	 * If REF_KNOWS_PEELED, then this field holds the peeled value
-	 * of this reference, or null if the reference is known not to
-	 * be peelable.  See the documentation for peel_ref() for an
-	 * exact definition of "peelable".
-	 */
-	struct object_id peeled;
 };
 
 /*
@@ -97,21 +89,14 @@ struct ref_dir {
  * public values; see refs.h.
  */
 
-/*
- * The field ref_entry->u.value.peeled of this value entry contains
- * the correct peeled value for the reference, which might be
- * null_sha1 if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x10
-
 /* ref_entry represents a directory of references */
-#define REF_DIR 0x20
+#define REF_DIR 0x10
 
 /*
  * Entry has not yet been read from disk (used only for REF_DIR
  * entries representing loose references)
  */
-#define REF_INCOMPLETE 0x40
+#define REF_INCOMPLETE 0x20
 
 /*
  * A ref_entry represents either a reference or a "subdirectory" of
@@ -252,17 +237,4 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      const char *prefix,
 					      int prime_dir);
 
-/*
- * Peel the entry (if possible) and return its new peel_status.  If
- * repeel is true, re-peel the entry even if there is an old peeled
- * value that is already stored in it.
- *
- * It is OK to call this function with a packed reference entry that
- * might be stale and might even refer to an object that has since
- * been garbage-collected.  In such a case, if the entry has
- * REF_KNOWS_PEELED then leave the status unchanged and return
- * PEEL_PEELED or PEEL_NON_TAG; otherwise, return PEEL_INVALID.
- */
-enum peel_status peel_entry(struct ref_entry *entry, int repeel);
-
 #endif /* REFS_REF_CACHE_H */
-- 
2.14.1




* [PATCH 19/20] mmapped_ref_iterator: inline into `packed_ref_iterator`
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (17 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 18/20] ref_cache: remove support for storing peeled values Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 17:16 ` [PATCH 20/20] packed-backend.c: rename a bunch of things and update comments Michael Haggerty
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

Since `packed_ref_iterator` is now delegating to
`mmapped_ref_iterator` rather than `cache_ref_iterator` to do the
heavy lifting, there is no need to keep the two iterators separate. So
"inline" `mmapped_ref_iterator` into `packed_ref_iterator`. This
removes a bunch of boilerplate.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 284 ++++++++++++++++++++------------------------------
 1 file changed, 114 insertions(+), 170 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index d76051c7f5..a3f8a19b9b 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -200,157 +200,6 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
-/*
- * This value is set in `base.flags` if the peeled value of the
- * current reference is known. In that case, `peeled` contains the
- * correct peeled value for the reference, which might be `null_sha1`
- * if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x40
-
-/*
- * An iterator over a packed-refs file that is currently mmapped.
- */
-struct mmapped_ref_iterator {
-	struct ref_iterator base;
-
-	struct packed_ref_cache *packed_refs;
-
-	/* The current position in the mmapped file: */
-	const char *pos;
-
-	/* The end of the mmapped file: */
-	const char *eof;
-
-	struct object_id oid, peeled;
-
-	struct strbuf refname_buf;
-};
-
-static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
-{
-	struct mmapped_ref_iterator *iter =
-		(struct mmapped_ref_iterator *)ref_iterator;
-	const char *p = iter->pos, *eol;
-
-	strbuf_reset(&iter->refname_buf);
-
-	if (iter->pos == iter->eof)
-		return ref_iterator_abort(ref_iterator);
-
-	iter->base.flags = REF_ISPACKED;
-
-	if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
-	    parse_oid_hex(p, &iter->oid, &p) ||
-	    !isspace(*p++))
-		die_invalid_line(iter->packed_refs->refs->path,
-				 iter->pos, iter->eof - iter->pos);
-
-	eol = memchr(p, '\n', iter->eof - p);
-	if (!eol)
-		die_unterminated_line(iter->packed_refs->refs->path,
-				      iter->pos, iter->eof - iter->pos);
-
-	strbuf_add(&iter->refname_buf, p, eol - p);
-	iter->base.refname = iter->refname_buf.buf;
-
-	if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) {
-		if (!refname_is_safe(iter->base.refname))
-			die("packed refname is dangerous: %s",
-			    iter->base.refname);
-		oidclr(&iter->oid);
-		iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
-	}
-	if (iter->packed_refs->peeled == PEELED_FULLY ||
-	    (iter->packed_refs->peeled == PEELED_TAGS &&
-	     starts_with(iter->base.refname, "refs/tags/")))
-		iter->base.flags |= REF_KNOWS_PEELED;
-
-	iter->pos = eol + 1;
-
-	if (iter->pos < iter->eof && *iter->pos == '^') {
-		p = iter->pos + 1;
-		if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
-		    parse_oid_hex(p, &iter->peeled, &p) ||
-		    *p++ != '\n')
-			die_invalid_line(iter->packed_refs->refs->path,
-					 iter->pos, iter->eof - iter->pos);
-		iter->pos = p;
-
-		/*
-		 * Regardless of what the file header said, we
-		 * definitely know the value of *this* reference. But
-		 * we suppress it if the reference is broken:
-		 */
-		if ((iter->base.flags & REF_ISBROKEN)) {
-			oidclr(&iter->peeled);
-			iter->base.flags &= ~REF_KNOWS_PEELED;
-		} else {
-			iter->base.flags |= REF_KNOWS_PEELED;
-		}
-	} else {
-		oidclr(&iter->peeled);
-	}
-
-	return ITER_OK;
-}
-
-static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator,
-				    struct object_id *peeled)
-{
-	struct mmapped_ref_iterator *iter =
-		(struct mmapped_ref_iterator *)ref_iterator;
-
-	if ((iter->base.flags & REF_KNOWS_PEELED)) {
-		oidcpy(peeled, &iter->peeled);
-		return is_null_oid(&iter->peeled) ? -1 : 0;
-	} else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
-		return -1;
-	} else {
-		return !!peel_object(iter->oid.hash, peeled->hash);
-	}
-}
-
-static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator)
-{
-	struct mmapped_ref_iterator *iter =
-		(struct mmapped_ref_iterator *)ref_iterator;
-
-	release_packed_ref_cache(iter->packed_refs);
-	strbuf_release(&iter->refname_buf);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
-}
-
-static struct ref_iterator_vtable mmapped_ref_iterator_vtable = {
-	mmapped_ref_iterator_advance,
-	mmapped_ref_iterator_peel,
-	mmapped_ref_iterator_abort
-};
-
-struct ref_iterator *mmapped_ref_iterator_begin(
-		struct packed_ref_cache *packed_refs,
-		const char *pos, const char *eof)
-{
-	struct mmapped_ref_iterator *iter = xcalloc(1, sizeof(*iter));
-	struct ref_iterator *ref_iterator = &iter->base;
-
-	if (!packed_refs->buf)
-		return empty_ref_iterator_begin();
-
-	base_ref_iterator_init(ref_iterator, &mmapped_ref_iterator_vtable, 1);
-
-	iter->packed_refs = packed_refs;
-	acquire_packed_ref_cache(iter->packed_refs);
-	iter->pos = pos;
-	iter->eof = eof;
-	strbuf_init(&iter->refname_buf, 0);
-
-	iter->base.oid = &iter->oid;
-
-	return ref_iterator;
-}
-
 struct packed_ref_entry {
 	const char *start;
 	size_t len;
@@ -788,38 +637,120 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 	return 0;
 }
 
+/*
+ * This value is set in `base.flags` if the peeled value of the
+ * current reference is known. In that case, `peeled` contains the
+ * correct peeled value for the reference, which might be `null_sha1`
+ * if the reference is not a tag or if it is broken.
+ */
+#define REF_KNOWS_PEELED 0x40
+
+/*
+ * An iterator over a packed-refs file that is currently mmapped.
+ */
 struct packed_ref_iterator {
 	struct ref_iterator base;
 
-	struct packed_ref_cache *cache;
-	struct ref_iterator *iter0;
+	struct packed_ref_cache *packed_refs;
+
+	/* The current position in the mmapped file: */
+	const char *pos;
+
+	/* The end of the mmapped file: */
+	const char *eof;
+
+	struct object_id oid, peeled;
+
+	struct strbuf refname_buf;
+
 	unsigned int flags;
 };
 
+static int next_record(struct packed_ref_iterator *iter)
+{
+	const char *p = iter->pos, *eol;
+
+	strbuf_reset(&iter->refname_buf);
+
+	if (iter->pos == iter->eof)
+		return ITER_DONE;
+
+	iter->base.flags = REF_ISPACKED;
+
+	if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
+	    parse_oid_hex(p, &iter->oid, &p) ||
+	    !isspace(*p++))
+		die_invalid_line(iter->packed_refs->refs->path,
+				 iter->pos, iter->eof - iter->pos);
+
+	eol = memchr(p, '\n', iter->eof - p);
+	if (!eol)
+		die_unterminated_line(iter->packed_refs->refs->path,
+				      iter->pos, iter->eof - iter->pos);
+
+	strbuf_add(&iter->refname_buf, p, eol - p);
+	iter->base.refname = iter->refname_buf.buf;
+
+	if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) {
+		if (!refname_is_safe(iter->base.refname))
+			die("packed refname is dangerous: %s",
+			    iter->base.refname);
+		oidclr(&iter->oid);
+		iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
+	}
+	if (iter->packed_refs->peeled == PEELED_FULLY ||
+	    (iter->packed_refs->peeled == PEELED_TAGS &&
+	     starts_with(iter->base.refname, "refs/tags/")))
+		iter->base.flags |= REF_KNOWS_PEELED;
+
+	iter->pos = eol + 1;
+
+	if (iter->pos < iter->eof && *iter->pos == '^') {
+		p = iter->pos + 1;
+		if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
+		    parse_oid_hex(p, &iter->peeled, &p) ||
+		    *p++ != '\n')
+			die_invalid_line(iter->packed_refs->refs->path,
+					 iter->pos, iter->eof - iter->pos);
+		iter->pos = p;
+
+		/*
+		 * Regardless of what the file header said, we
+		 * definitely know the value of *this* reference. But
+		 * we suppress it if the reference is broken:
+		 */
+		if ((iter->base.flags & REF_ISBROKEN)) {
+			oidclr(&iter->peeled);
+			iter->base.flags &= ~REF_KNOWS_PEELED;
+		} else {
+			iter->base.flags |= REF_KNOWS_PEELED;
+		}
+	} else {
+		oidclr(&iter->peeled);
+	}
+
+	return ITER_OK;
+}
+
 static int packed_ref_iterator_advance(struct ref_iterator *ref_iterator)
 {
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
 	int ok;
 
-	while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
+	while ((ok = next_record(iter)) == ITER_OK) {
 		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
-		    ref_type(iter->iter0->refname) != REF_TYPE_PER_WORKTREE)
+		    ref_type(iter->base.refname) != REF_TYPE_PER_WORKTREE)
 			continue;
 
 		if (!(iter->flags & DO_FOR_EACH_INCLUDE_BROKEN) &&
-		    !ref_resolves_to_object(iter->iter0->refname,
-					    iter->iter0->oid,
-					    iter->iter0->flags))
+		    !ref_resolves_to_object(iter->base.refname, &iter->oid,
+					    iter->flags))
 			continue;
 
-		iter->base.refname = iter->iter0->refname;
-		iter->base.oid = iter->iter0->oid;
-		iter->base.flags = iter->iter0->flags;
 		return ITER_OK;
 	}
 
-	iter->iter0 = NULL;
 	if (ref_iterator_abort(ref_iterator) != ITER_DONE)
 		ok = ITER_ERROR;
 
@@ -832,7 +763,14 @@ static int packed_ref_iterator_peel(struct ref_iterator *ref_iterator,
 	struct packed_ref_iterator *iter =
 		(struct packed_ref_iterator *)ref_iterator;
 
-	return ref_iterator_peel(iter->iter0, peeled);
+	if ((iter->base.flags & REF_KNOWS_PEELED)) {
+		oidcpy(peeled, &iter->peeled);
+		return is_null_oid(&iter->peeled) ? -1 : 0;
+	} else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
+		return -1;
+	} else {
+		return !!peel_object(iter->oid.hash, peeled->hash);
+	}
 }
 
 static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
@@ -841,10 +779,8 @@ static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
 		(struct packed_ref_iterator *)ref_iterator;
 	int ok = ITER_DONE;
 
-	if (iter->iter0)
-		ok = ref_iterator_abort(iter->iter0);
-
-	release_packed_ref_cache(iter->cache);
+	strbuf_release(&iter->refname_buf);
+	release_packed_ref_cache(iter->packed_refs);
 	base_ref_iterator_free(ref_iterator);
 	return ok;
 }
@@ -870,6 +806,11 @@ static struct ref_iterator *packed_ref_iterator_begin(
 		required_flags |= REF_STORE_ODB;
 	refs = packed_downcast(ref_store, required_flags, "ref_iterator_begin");
 
+	packed_refs = get_packed_ref_cache(refs);
+
+	if (!packed_refs->buf)
+		return empty_ref_iterator_begin();
+
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
@@ -879,7 +820,7 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 * the packed-ref cache is up to date with what is on disk,
 	 * and re-reads it if not.
 	 */
-	iter->cache = packed_refs = get_packed_ref_cache(refs);
+	iter->packed_refs = packed_refs;
 	acquire_packed_ref_cache(packed_refs);
 
 	if (prefix && *prefix)
@@ -887,8 +828,11 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	else
 		start = packed_refs->buf + packed_refs->header_len;
 
-	iter->iter0 = mmapped_ref_iterator_begin(packed_refs,
-						 start, packed_refs->eof);
+	iter->pos = start;
+	iter->eof = packed_refs->eof;
+	strbuf_init(&iter->refname_buf, 0);
+
+	iter->base.oid = &iter->oid;
 
 	iter->flags = flags;
 
-- 
2.14.1



* [PATCH 20/20] packed-backend.c: rename a bunch of things and update comments
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (18 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 19/20] mmapped_ref_iterator: inline into `packed_ref_iterator` Michael Haggerty
@ 2017-09-13 17:16 ` Michael Haggerty
  2017-09-13 23:00   ` Stefan Beller
  2017-09-13 21:35 ` [PATCH 00/20] Read `packed-refs` using mmap() Junio C Hamano
                   ` (2 subsequent siblings)
  22 siblings, 1 reply; 27+ messages in thread
From: Michael Haggerty @ 2017-09-13 17:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Michael Haggerty

We've made huge changes to this file, and some of the old names and
comments are no longer very fitting. So rename a bunch of things:

* `struct packed_ref_cache` → `struct snapshot`
* `acquire_packed_ref_cache()` → `acquire_snapshot()`
* `release_packed_ref_buffer()` → `clear_snapshot_buffer()`
* `release_packed_ref_cache()` → `release_snapshot()`
* `clear_packed_ref_cache()` → `clear_snapshot()`
* `struct packed_ref_entry` → `struct snapshot_record`
* `cmp_packed_ref_entries()` → `cmp_packed_ref_records()`
* `cmp_entry_to_refname()` → `cmp_record_to_refname()`
* `sort_packed_refs()` → `sort_snapshot()`
* `read_packed_refs()` → `create_snapshot()`
* `validate_packed_ref_cache()` → `validate_snapshot()`
* `get_packed_ref_cache()` → `get_snapshot()`
* Renamed local variables and struct members accordingly.

Also update a bunch of comments to reflect the renaming and the
accumulated changes that the code has undergone.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 392 ++++++++++++++++++++++++++++----------------------
 1 file changed, 217 insertions(+), 175 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a3f8a19b9b..8235ac8506 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -8,10 +8,30 @@
 
 struct packed_ref_store;
 
-struct packed_ref_cache {
+/*
+ * A `snapshot` represents one snapshot of a `packed-refs` file.
+ *
+ * Normally, this will be a mmapped view of the contents of the
+ * `packed-refs` file at the time the snapshot was created. However,
+ * if the `packed-refs` file was not sorted, this might point at heap
+ * memory holding the contents of the `packed-refs` file with its
+ * records sorted by refname.
+ *
+ * `snapshot` instances are reference counted (via
+ * `acquire_snapshot()` and `release_snapshot()`). This is to prevent
+ * an instance from disappearing while an iterator is still iterating
+ * over it. Instances are garbage collected when their `referrers`
+ * count goes to zero.
+ *
+ * The most recent `snapshot`, if available, is referenced by the
+ * `packed_ref_store`. Its freshness is checked whenever
+ * `get_snapshot()` is called; if the existing snapshot is obsolete, a
+ * new snapshot is taken.
+ */
+struct snapshot {
 	/*
 	 * A back-pointer to the packed_ref_store with which this
-	 * cache is associated:
+	 * snapshot is associated:
 	 */
 	struct packed_ref_store *refs;
 
@@ -35,26 +55,42 @@ struct packed_ref_cache {
 	size_t header_len;
 
 	/*
-	 * What is the peeled state of this cache? (This is usually
-	 * determined from the header of the "packed-refs" file.)
+	 * What is the peeled state of the `packed-refs` file that
+	 * this snapshot represents? (This is usually determined from
+	 * the file's header.)
 	 */
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled;
 
 	/*
-	 * Count of references to the data structure in this instance,
-	 * including the pointer from files_ref_store::packed if any.
-	 * The data will not be freed as long as the reference count
-	 * is nonzero.
+	 * Count of references to this instance, including the pointer
+	 * from `packed_ref_store::snapshot`, if any. The instance
+	 * will not be freed as long as the reference count is
+	 * nonzero.
 	 */
 	unsigned int referrers;
 
-	/* The metadata from when this packed-refs cache was read */
+	/*
+	 * The metadata of the `packed-refs` file from which this
+	 * snapshot was created, used to tell if the file has been
+	 * replaced since we read it.
+	 */
 	struct stat_validity validity;
 };
 
 /*
- * A container for `packed-refs`-related data. It is not (yet) a
- * `ref_store`.
+ * A `ref_store` representing references stored in a `packed-refs`
+ * file. It implements the `ref_store` interface, though it has some
+ * limitations:
+ *
+ * - It cannot store symbolic references.
+ *
+ * - It cannot store reflogs.
+ *
+ * - It does not support reference renaming (though it could).
+ *
+ * On the other hand, it can be locked outside of a reference
+ * transaction. In that case, it remains locked even after the
+ * transaction is done and the new `packed-refs` file is activated.
  */
 struct packed_ref_store {
 	struct ref_store base;
@@ -65,10 +101,10 @@ struct packed_ref_store {
 	char *path;
 
 	/*
-	 * A cache of the values read from the `packed-refs` file, if
-	 * it might still be current; otherwise, NULL.
+	 * A snapshot of the values read from the `packed-refs` file,
+	 * if it might still be current; otherwise, NULL.
 	 */
-	struct packed_ref_cache *cache;
+	struct snapshot *snapshot;
 
 	/*
 	 * Lock used for the "packed-refs" file. Note that this (and
@@ -85,44 +121,43 @@ struct packed_ref_store {
 };
 
 /*
- * Increment the reference count of *packed_refs.
+ * Increment the reference count of `*snapshot`.
  */
-static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
+static void acquire_snapshot(struct snapshot *snapshot)
 {
-	packed_refs->referrers++;
+	snapshot->referrers++;
 }
 
 /*
- * If the buffer in `packed_refs` is active, then either munmap the
+ * If the buffer in `snapshot` is active, then either munmap the
  * memory and close the file, or free the memory. Then set the buffer
  * pointers to NULL.
  */
-static void release_packed_ref_buffer(struct packed_ref_cache *packed_refs)
+static void clear_snapshot_buffer(struct snapshot *snapshot)
 {
-	if (packed_refs->fd >= 0) {
-		if (munmap(packed_refs->buf,
-			   packed_refs->eof - packed_refs->buf))
+	if (snapshot->fd >= 0) {
+		if (munmap(snapshot->buf, snapshot->eof - snapshot->buf))
 			die_errno("error ummapping packed-refs file %s",
-				  packed_refs->refs->path);
-		close(packed_refs->fd);
-		packed_refs->fd = -1;
+				  snapshot->refs->path);
+		close(snapshot->fd);
+		snapshot->fd = -1;
 	} else {
-		free(packed_refs->buf);
+		free(snapshot->buf);
 	}
-	packed_refs->buf = packed_refs->eof = NULL;
-	packed_refs->header_len = 0;
+	snapshot->buf = snapshot->eof = NULL;
+	snapshot->header_len = 0;
 }
 
 /*
- * Decrease the reference count of *packed_refs.  If it goes to zero,
- * free *packed_refs and return true; otherwise return false.
+ * Decrease the reference count of `*snapshot`. If it goes to zero,
+ * free `*snapshot` and return true; otherwise return false.
  */
-static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
+static int release_snapshot(struct snapshot *snapshot)
 {
-	if (!--packed_refs->referrers) {
-		stat_validity_clear(&packed_refs->validity);
-		release_packed_ref_buffer(packed_refs);
-		free(packed_refs);
+	if (!--snapshot->referrers) {
+		stat_validity_clear(&snapshot->validity);
+		clear_snapshot_buffer(snapshot);
+		free(snapshot);
 		return 1;
 	} else {
 		return 0;
@@ -167,13 +202,13 @@ static struct packed_ref_store *packed_downcast(struct ref_store *ref_store,
 	return refs;
 }
 
-static void clear_packed_ref_cache(struct packed_ref_store *refs)
+static void clear_snapshot(struct packed_ref_store *refs)
 {
-	if (refs->cache) {
-		struct packed_ref_cache *cache = refs->cache;
+	if (refs->snapshot) {
+		struct snapshot *snapshot = refs->snapshot;
 
-		refs->cache = NULL;
-		release_packed_ref_cache(cache);
+		refs->snapshot = NULL;
+		release_snapshot(snapshot);
 	}
 }
 
@@ -200,14 +235,14 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
-struct packed_ref_entry {
+struct snapshot_record {
 	const char *start;
 	size_t len;
 };
 
-static int cmp_packed_ref_entries(const void *v1, const void *v2)
+static int cmp_packed_ref_records(const void *v1, const void *v2)
 {
-	const struct packed_ref_entry *e1 = v1, *e2 = v2;
+	const struct snapshot_record *e1 = v1, *e2 = v2;
 	const char *r1 = e1->start + GIT_SHA1_HEXSZ + 1;
 	const char *r2 = e2->start + GIT_SHA1_HEXSZ + 1;
 
@@ -226,10 +261,10 @@ static int cmp_packed_ref_entries(const void *v1, const void *v2)
 }
 
 /*
- * Compare a packed-refs record pointed to by `rec` to the specified
- * NUL-terminated refname.
+ * Compare a snapshot record at `rec` to the specified NUL-terminated
+ * refname.
  */
-static int cmp_entry_to_refname(const char *rec, const char *refname)
+static int cmp_record_to_refname(const char *rec, const char *refname)
 {
 	const char *r1 = rec + GIT_SHA1_HEXSZ + 1;
 	const char *r2 = refname;
@@ -247,31 +282,30 @@ static int cmp_entry_to_refname(const char *rec, const char *refname)
 }
 
 /*
- * `packed_refs->buf` is not known to be sorted. Check whether it is,
- * and if not, sort it into new memory and munmap/free the old
- * storage.
+ * `snapshot->buf` is not known to be sorted. Check whether it is, and
+ * if not, sort it into new memory and munmap/free the old storage.
  */
-static void sort_packed_refs(struct packed_ref_cache *packed_refs)
+static void sort_snapshot(struct snapshot *snapshot)
 {
-	struct packed_ref_entry *entries = NULL;
+	struct snapshot_record *records = NULL;
 	size_t alloc = 0, nr = 0;
 	int sorted = 1;
 	const char *pos, *eof, *eol;
 	size_t len, i;
 	char *new_buffer, *dst;
 
-	pos = packed_refs->buf + packed_refs->header_len;
-	eof = packed_refs->eof;
+	pos = snapshot->buf + snapshot->header_len;
+	eof = snapshot->eof;
 	len = eof - pos;
 
 	if (!len)
 		return;
 
 	/*
-	 * Initialize entries based on a crude estimate of the number
+	 * Initialize records based on a crude estimate of the number
 	 * of references in the file (we'll grow it below if needed):
 	 */
-	ALLOC_GROW(entries, len / 80 + 20, alloc);
+	ALLOC_GROW(records, len / 80 + 20, alloc);
 
 	while (pos < eof) {
 		eol = memchr(pos, '\n', eof - pos);
@@ -279,7 +313,7 @@ static void sort_packed_refs(struct packed_ref_cache *packed_refs)
 			/* The safety check should prevent this. */
 			BUG("unterminated line found in packed-refs");
 		if (eol - pos < GIT_SHA1_HEXSZ + 2)
-			die_invalid_line(packed_refs->refs->path,
+			die_invalid_line(snapshot->refs->path,
 					 pos, eof - pos);
 		eol++;
 		if (eol < eof && *eol == '^') {
@@ -296,15 +330,15 @@ static void sort_packed_refs(struct packed_ref_cache *packed_refs)
 			eol++;
 		}
 
-		ALLOC_GROW(entries, nr + 1, alloc);
-		entries[nr].start = pos;
-		entries[nr].len = eol - pos;
+		ALLOC_GROW(records, nr + 1, alloc);
+		records[nr].start = pos;
+		records[nr].len = eol - pos;
 		nr++;
 
 		if (sorted &&
 		    nr > 1 &&
-		    cmp_packed_ref_entries(&entries[nr - 2],
-					   &entries[nr - 1]) >= 0)
+		    cmp_packed_ref_records(&records[nr - 2],
+					   &records[nr - 1]) >= 0)
 			sorted = 0;
 
 		pos = eol;
@@ -313,31 +347,31 @@ static void sort_packed_refs(struct packed_ref_cache *packed_refs)
 	if (sorted)
 		goto cleanup;
 
-	/* We need to sort the memory. First we sort the entries array: */
-	QSORT(entries, nr, cmp_packed_ref_entries);
+	/* We need to sort the memory. First we sort the records array: */
+	QSORT(records, nr, cmp_packed_ref_records);
 
 	/*
 	 * Allocate a new chunk of memory, and copy the old memory to
-	 * the new in the order indicated by `entries` (not bothering
+	 * the new in the order indicated by `records` (not bothering
 	 * with the header line):
 	 */
 	new_buffer = xmalloc(len);
 	for (dst = new_buffer, i = 0; i < nr; i++) {
-		memcpy(dst, entries[i].start, entries[i].len);
-		dst += entries[i].len;
+		memcpy(dst, records[i].start, records[i].len);
+		dst += records[i].len;
 	}
 
 	/*
 	 * Now munmap the old buffer and use the sorted buffer in its
 	 * place:
 	 */
-	release_packed_ref_buffer(packed_refs);
-	packed_refs->buf = new_buffer;
-	packed_refs->eof = new_buffer + len;
-	packed_refs->header_len = 0;
+	clear_snapshot_buffer(snapshot);
+	snapshot->buf = new_buffer;
+	snapshot->eof = new_buffer + len;
+	snapshot->header_len = 0;
 
 cleanup:
-	free(entries);
+	free(records);
 }
 
 /*
@@ -381,10 +415,10 @@ static const char *find_end_of_record(const char *p, const char *end)
  * (GIT_SHA1_HEXSZ + 1) characters before the LF. Die if either of
  * these checks fails.
  */
-static void verify_buffer_safe(struct packed_ref_cache *packed_refs)
+static void verify_buffer_safe(struct snapshot *snapshot)
 {
-	const char *buf = packed_refs->buf + packed_refs->header_len;
-	const char *eof = packed_refs->eof;
+	const char *buf = snapshot->buf + snapshot->header_len;
+	const char *eof = snapshot->eof;
 	const char *last_line;
 
 	if (buf == eof)
@@ -392,12 +426,12 @@ static void verify_buffer_safe(struct packed_ref_cache *packed_refs)
 
 	last_line = find_start_of_record(buf, eof - 1);
 	if (*(eof - 1) != '\n' || eof - last_line < GIT_SHA1_HEXSZ + 2)
-		die_invalid_line(packed_refs->refs->path,
+		die_invalid_line(snapshot->refs->path,
 				 last_line, eof - last_line);
 }
 
 /*
- * Find the place in `cache->buf` where the start of the record for
+ * Find the place in `snapshot->buf` where the start of the record for
  * `refname` starts. If `mustexist` is true and the reference doesn't
  * exist, then return NULL. If `mustexist` is false and the reference
  * doesn't exist, then return the point where that reference would be
@@ -405,10 +439,10 @@ static void verify_buffer_safe(struct packed_ref_cache *packed_refs)
  * reference name; for example, one could search for "refs/replace/"
  * to find the start of any replace references.
  *
- * The record is sought using a binary search, so `cache->buf` must be
- * sorted.
+ * The record is sought using a binary search, so `snapshot->buf` must
+ * be sorted.
  */
-static const char *find_reference_location(struct packed_ref_cache *cache,
+static const char *find_reference_location(struct snapshot *snapshot,
 					   const char *refname, int mustexist)
 {
 	/*
@@ -425,13 +459,13 @@ static const char *find_reference_location(struct packed_ref_cache *cache,
 	 * preceding records all have reference names that come
 	 * *before* `refname`.
 	 */
-	const char *lo = cache->buf + cache->header_len;
+	const char *lo = snapshot->buf + snapshot->header_len;
 
 	/*
	 * A pointer to the first character of a record whose
 	 * reference name comes *after* `refname`.
 	 */
-	const char *hi = cache->eof;
+	const char *hi = snapshot->eof;
 
 	while (lo < hi) {
 		const char *mid, *rec;
@@ -439,7 +473,7 @@ static const char *find_reference_location(struct packed_ref_cache *cache,
 
 		mid = lo + (hi - lo) / 2;
 		rec = find_start_of_record(lo, mid);
-		cmp = cmp_entry_to_refname(rec, refname);
+		cmp = cmp_record_to_refname(rec, refname);
 		if (cmp < 0) {
 			lo = find_end_of_record(mid, hi);
 		} else if (cmp > 0) {
@@ -456,9 +490,9 @@ static const char *find_reference_location(struct packed_ref_cache *cache,
 }
 
 /*
- * Read from the `packed-refs` file into a newly-allocated
- * `packed_ref_cache` and return it. The return value will already
- * have its reference count incremented.
+ * Create a newly-allocated `snapshot` of the `packed-refs` file in
+ * its current state and return it. The return value will already have
+ * its reference count incremented.
  *
  * A comment line of the form "# pack-refs with: " may contain zero or
  * more traits. We interpret the traits as follows:
@@ -488,19 +522,19 @@ static const char *find_reference_location(struct packed_ref_cache *cache,
  *
  *      The references in this file are known to be sorted by refname.
  */
-static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
+static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 {
-	struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
+	struct snapshot *snapshot = xcalloc(1, sizeof(*snapshot));
 	struct stat st;
 	size_t size;
 	int sorted = 0;
 
-	packed_refs->refs = refs;
-	acquire_packed_ref_cache(packed_refs);
-	packed_refs->peeled = PEELED_NONE;
+	snapshot->refs = refs;
+	acquire_snapshot(snapshot);
+	snapshot->peeled = PEELED_NONE;
 
-	packed_refs->fd = open(refs->path, O_RDONLY);
-	if (packed_refs->fd < 0) {
+	snapshot->fd = open(refs->path, O_RDONLY);
+	if (snapshot->fd < 0) {
 		if (errno == ENOENT) {
 			/*
 			 * This is OK; it just means that no
@@ -509,106 +543,106 @@ static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 			 * which is its state when initialized with
 			 * zeros.
 			 */
-			return packed_refs;
+			return snapshot;
 		} else {
 			die_errno("couldn't read %s", refs->path);
 		}
 	}
 
-	stat_validity_update(&packed_refs->validity, packed_refs->fd);
+	stat_validity_update(&snapshot->validity, snapshot->fd);
 
-	if (fstat(packed_refs->fd, &st) < 0)
+	if (fstat(snapshot->fd, &st) < 0)
 		die_errno("couldn't stat %s", refs->path);
 
 	size = xsize_t(st.st_size);
-	packed_refs->buf = xmmap(NULL, size,
-				 PROT_READ, MAP_PRIVATE,
-				 packed_refs->fd, 0);
-	packed_refs->eof = packed_refs->buf + size;
+	snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE,
+			      snapshot->fd, 0);
+	snapshot->eof = snapshot->buf + size;
 
 	/* If the file has a header line, process it: */
-	if (packed_refs->buf < packed_refs->eof && *packed_refs->buf == '#') {
+	if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') {
 		struct strbuf tmp = STRBUF_INIT;
 		char *p;
 		const char *eol;
 		struct string_list traits = STRING_LIST_INIT_NODUP;
 
-		eol = memchr(packed_refs->buf, '\n',
-			     packed_refs->eof - packed_refs->buf);
+		eol = memchr(snapshot->buf, '\n',
+			     snapshot->eof - snapshot->buf);
 		if (!eol)
 			die_unterminated_line(refs->path,
-					      packed_refs->buf,
-					      packed_refs->eof - packed_refs->buf);
+					      snapshot->buf,
+					      snapshot->eof - snapshot->buf);
 
-		strbuf_add(&tmp, packed_refs->buf, eol - packed_refs->buf);
+		strbuf_add(&tmp, snapshot->buf, eol - snapshot->buf);
 
 		if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char **)&p))
 			die_invalid_line(refs->path,
-					 packed_refs->buf,
-					 packed_refs->eof - packed_refs->buf);
+					 snapshot->buf,
+					 snapshot->eof - snapshot->buf);
 
 		string_list_split_in_place(&traits, p, ' ', -1);
 
 		if (unsorted_string_list_has_string(&traits, "fully-peeled"))
-			packed_refs->peeled = PEELED_FULLY;
+			snapshot->peeled = PEELED_FULLY;
 		else if (unsorted_string_list_has_string(&traits, "peeled"))
-			packed_refs->peeled = PEELED_TAGS;
+			snapshot->peeled = PEELED_TAGS;
 
 		sorted = unsorted_string_list_has_string(&traits, "sorted");
 
 		/* perhaps other traits later as well */
 
 		/* The "+ 1" is for the LF character. */
-		packed_refs->header_len = eol + 1 - packed_refs->buf;
+		snapshot->header_len = eol + 1 - snapshot->buf;
 
 		string_list_clear(&traits, 0);
 		strbuf_release(&tmp);
 	}
 
-	verify_buffer_safe(packed_refs);
+	verify_buffer_safe(snapshot);
 
 	if (!sorted) {
-		sort_packed_refs(packed_refs);
+		sort_snapshot(snapshot);
 
 		/*
 		 * Reordering the records might have moved a short one
 		 * to the end of the buffer, so verify the buffer's
 		 * safety again:
 		 */
-		verify_buffer_safe(packed_refs);
+		verify_buffer_safe(snapshot);
 	}
 
-	return packed_refs;
+	return snapshot;
 }
 
 /*
- * Check that the packed refs cache (if any) still reflects the
- * contents of the file. If not, clear the cache.
+ * Check that `refs->snapshot` (if present) still reflects the
+ * contents of the `packed-refs` file. If not, clear the snapshot.
  */
-static void validate_packed_ref_cache(struct packed_ref_store *refs)
+static void validate_snapshot(struct packed_ref_store *refs)
 {
-	if (refs->cache &&
-	    !stat_validity_check(&refs->cache->validity, refs->path))
-		clear_packed_ref_cache(refs);
+	if (refs->snapshot &&
+	    !stat_validity_check(&refs->snapshot->validity, refs->path))
+		clear_snapshot(refs);
 }
 
 /*
- * Get the packed_ref_cache for the specified packed_ref_store,
- * creating and populating it if it hasn't been read before or if the
- * file has been changed (according to its `validity` field) since it
- * was last read. On the other hand, if we hold the lock, then assume
- * that the file hasn't been changed out from under us, so skip the
- * extra `stat()` call in `stat_validity_check()`.
+ * Get the `snapshot` for the specified packed_ref_store, creating and
+ * populating it if it hasn't been read before or if the file has been
+ * changed (according to its `validity` field) since it was last read.
+ * On the other hand, if we hold the lock, then assume that the file
+ * hasn't been changed out from under us, so skip the extra `stat()`
+ * call in `stat_validity_check()`. This function does *not* increase
+ * the snapshot's reference count on behalf of the caller.
  */
-static struct packed_ref_cache *get_packed_ref_cache(struct packed_ref_store *refs)
+static struct snapshot *get_snapshot(struct packed_ref_store *refs)
 {
 	if (!is_lock_file_locked(&refs->lock))
-		validate_packed_ref_cache(refs);
+		validate_snapshot(refs);
 
-	if (!refs->cache)
-		refs->cache = read_packed_refs(refs);
+	if (!refs->snapshot)
+		refs->snapshot = create_snapshot(refs);
 
-	return refs->cache;
+	return refs->snapshot;
 }
 
 static int packed_read_raw_ref(struct ref_store *ref_store,
@@ -617,12 +651,12 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 {
 	struct packed_ref_store *refs =
 		packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
-	struct packed_ref_cache *packed_refs = get_packed_ref_cache(refs);
+	struct snapshot *snapshot = get_snapshot(refs);
 	const char *rec;
 
 	*type = 0;
 
-	rec = find_reference_location(packed_refs, refname, 1);
+	rec = find_reference_location(snapshot, refname, 1);
 
 	if (!rec) {
 		/* refname is not a packed reference. */
@@ -631,7 +665,7 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 	}
 
 	if (get_sha1_hex(rec, sha1))
-		die_invalid_line(refs->path, rec, packed_refs->eof - rec);
+		die_invalid_line(refs->path, rec, snapshot->eof - rec);
 
 	*type = REF_ISPACKED;
 	return 0;
@@ -646,26 +680,33 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 #define REF_KNOWS_PEELED 0x40
 
 /*
- * An iterator over a packed-refs file that is currently mmapped.
+ * An iterator over a snapshot of a `packed-refs` file.
  */
 struct packed_ref_iterator {
 	struct ref_iterator base;
 
-	struct packed_ref_cache *packed_refs;
+	struct snapshot *snapshot;
 
-	/* The current position in the mmapped file: */
+	/* The current position in the snapshot's buffer: */
 	const char *pos;
 
-	/* The end of the mmapped file: */
+	/* The end of the part of the buffer that will be iterated over: */
 	const char *eof;
 
+	/* Scratch space for current values: */
 	struct object_id oid, peeled;
-
 	struct strbuf refname_buf;
 
 	unsigned int flags;
 };
 
+/*
+ * Move the iterator to the next record in the snapshot, without
+ * respect for whether the record is actually required by the current
+ * iteration. Adjust the fields in `iter` and return `ITER_OK` or
+ * `ITER_DONE`. This function does not free the iterator in the case
+ * of `ITER_DONE`.
+ */
 static int next_record(struct packed_ref_iterator *iter)
 {
 	const char *p = iter->pos, *eol;
@@ -680,12 +721,12 @@ static int next_record(struct packed_ref_iterator *iter)
 	if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
 	    parse_oid_hex(p, &iter->oid, &p) ||
 	    !isspace(*p++))
-		die_invalid_line(iter->packed_refs->refs->path,
+		die_invalid_line(iter->snapshot->refs->path,
 				 iter->pos, iter->eof - iter->pos);
 
 	eol = memchr(p, '\n', iter->eof - p);
 	if (!eol)
-		die_unterminated_line(iter->packed_refs->refs->path,
+		die_unterminated_line(iter->snapshot->refs->path,
 				      iter->pos, iter->eof - iter->pos);
 
 	strbuf_add(&iter->refname_buf, p, eol - p);
@@ -698,8 +739,8 @@ static int next_record(struct packed_ref_iterator *iter)
 		oidclr(&iter->oid);
 		iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
 	}
-	if (iter->packed_refs->peeled == PEELED_FULLY ||
-	    (iter->packed_refs->peeled == PEELED_TAGS &&
+	if (iter->snapshot->peeled == PEELED_FULLY ||
+	    (iter->snapshot->peeled == PEELED_TAGS &&
 	     starts_with(iter->base.refname, "refs/tags/")))
 		iter->base.flags |= REF_KNOWS_PEELED;
 
@@ -710,7 +751,7 @@ static int next_record(struct packed_ref_iterator *iter)
 		if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
 		    parse_oid_hex(p, &iter->peeled, &p) ||
 		    *p++ != '\n')
-			die_invalid_line(iter->packed_refs->refs->path,
+			die_invalid_line(iter->snapshot->refs->path,
 					 iter->pos, iter->eof - iter->pos);
 		iter->pos = p;
 
@@ -780,7 +821,7 @@ static int packed_ref_iterator_abort(struct ref_iterator *ref_iterator)
 	int ok = ITER_DONE;
 
 	strbuf_release(&iter->refname_buf);
-	release_packed_ref_cache(iter->packed_refs);
+	release_snapshot(iter->snapshot);
 	base_ref_iterator_free(ref_iterator);
 	return ok;
 }
@@ -796,7 +837,7 @@ static struct ref_iterator *packed_ref_iterator_begin(
 		const char *prefix, unsigned int flags)
 {
 	struct packed_ref_store *refs;
-	struct packed_ref_cache *packed_refs;
+	struct snapshot *snapshot;
 	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
@@ -806,30 +847,30 @@ static struct ref_iterator *packed_ref_iterator_begin(
 		required_flags |= REF_STORE_ODB;
 	refs = packed_downcast(ref_store, required_flags, "ref_iterator_begin");
 
-	packed_refs = get_packed_ref_cache(refs);
+	/*
+	 * Note that `get_snapshot()` internally checks whether the
+	 * snapshot is up to date with what is on disk, and re-reads
+	 * it if not.
+	 */
+	snapshot = get_snapshot(refs);
 
-	if (!packed_refs->buf)
+	if (!snapshot->buf)
 		return empty_ref_iterator_begin();
 
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &packed_ref_iterator_vtable, 1);
 
-	/*
-	 * Note that get_packed_ref_cache() internally checks whether
-	 * the packed-ref cache is up to date with what is on disk,
-	 * and re-reads it if not.
-	 */
-	iter->packed_refs = packed_refs;
-	acquire_packed_ref_cache(packed_refs);
+	iter->snapshot = snapshot;
+	acquire_snapshot(snapshot);
 
 	if (prefix && *prefix)
-		start = find_reference_location(packed_refs, prefix, 0);
+		start = find_reference_location(snapshot, prefix, 0);
 	else
-		start = packed_refs->buf + packed_refs->header_len;
+		start = snapshot->buf + snapshot->header_len;
 
 	iter->pos = start;
-	iter->eof = packed_refs->eof;
+	iter->eof = snapshot->eof;
 	strbuf_init(&iter->refname_buf, 0);
 
 	iter->base.oid = &iter->oid;
@@ -893,19 +934,19 @@ int packed_refs_lock(struct ref_store *ref_store, int flags, struct strbuf *err)
 
 	/*
 	 * Now that we hold the `packed-refs` lock, make sure that our
-	 * cache matches the current version of the file. Normally
-	 * `get_packed_ref_cache()` does that for us, but that
-	 * function assumes that when the file is locked, any existing
-	 * cache is still valid. We've just locked the file, but it
-	 * might have changed the moment *before* we locked it.
+	 * snapshot matches the current version of the file. Normally
+	 * `get_snapshot()` does that for us, but that function
+	 * assumes that when the file is locked, any existing snapshot
+	 * is still valid. We've just locked the file, but it might
+	 * have changed the moment *before* we locked it.
 	 */
-	validate_packed_ref_cache(refs);
+	validate_snapshot(refs);
 
 	/*
 	 * Now make sure that the packed-refs file as it exists in the
-	 * locked state is loaded into the cache:
+	 * locked state is loaded into the snapshot:
 	 */
-	get_packed_ref_cache(refs);
+	get_snapshot(refs);
 	return 0;
 }
 
@@ -932,8 +973,8 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 }
 
 /*
- * The packed-refs header line that we write out.  Perhaps other
- * traits will be added later.
+ * The packed-refs header line that we write out. Perhaps other traits
+ * will be added later.
  *
  * Note that earlier versions of Git used to parse these traits by
  * looking for " trait " in the line. For this reason, the space after
@@ -949,9 +990,9 @@ static int packed_init_db(struct ref_store *ref_store, struct strbuf *err)
 }
 
 /*
- * Write the packed-refs from the cache to the packed-refs tempfile,
- * incorporating any changes from `updates`. `updates` must be a
- * sorted string list whose keys are the refnames and whose util
+ * Write the packed refs from the current snapshot to the packed-refs
+ * tempfile, incorporating any changes from `updates`. `updates` must
+ * be a sorted string list whose keys are the refnames and whose util
  * values are `struct ref_update *`. On error, rollback the tempfile,
  * write an error message to `err`, and return a nonzero value.
  *
@@ -1192,9 +1233,10 @@ static int packed_transaction_prepare(struct ref_store *ref_store,
 	/*
 	 * Note that we *don't* skip transactions with zero updates,
 	 * because such a transaction might be executed for the side
-	 * effect of ensuring that all of the references are peeled.
-	 * If the caller wants to optimize away empty transactions, it
-	 * should do so itself.
+	 * effect of ensuring that all of the references are peeled or
+	 * ensuring that the `packed-refs` file is sorted. If the
+	 * caller wants to optimize away empty transactions, it should
+	 * do so itself.
 	 */
 
 	data = xcalloc(1, sizeof(*data));
@@ -1267,7 +1309,7 @@ static int packed_transaction_finish(struct ref_store *ref_store,
 		goto cleanup;
 	}
 
-	clear_packed_ref_cache(refs);
+	clear_snapshot(refs);
 	ret = 0;
 
 cleanup:
-- 
2.14.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 00/20] Read `packed-refs` using mmap()
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (19 preceding siblings ...)
  2017-09-13 17:16 ` [PATCH 20/20] packed-backend.c: rename a bunch of things and update comments Michael Haggerty
@ 2017-09-13 21:35 ` Junio C Hamano
  2017-09-14 16:12 ` Michael Haggerty
  2017-09-14 20:23 ` Johannes Schindelin
  22 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2017-09-13 21:35 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Following lots of work to extract the `packed_ref_store` into a
> separate module and decouple it from the `files_ref_store`, it is now
> possible to fundamentally change how the `packed-refs` file is read.
>
> * `mmap()` the whole file rather than `read()`ing it.
>
> * Instead of parsing the complete file at once into a `ref_cache`,
>   parse the references out of the file contents on demand.
>
> * Use a binary search to find, very quickly, the reference or group of
>   references that needs to be read. Parse *only* those references out
>   of the file contents, without creating in-memory data structures at
>   all.

Oooh, juicy and very exciting.

> ...
> This branch applies on top of mh/packed-ref-transactions. It can also
> be obtained from my git fork [1] as branch `mmap-packed-refs`.
>
> Michael
>
> [1] https://github.com/mhagger/git

Thanks for working on this. I expect that it will take more than a
few hours for me to pick it up, but I am looking forward to it.



* Re: [PATCH 20/20] packed-backend.c: rename a bunch of things and update comments
  2017-09-13 17:16 ` [PATCH 20/20] packed-backend.c: rename a bunch of things and update comments Michael Haggerty
@ 2017-09-13 23:00   ` Stefan Beller
  0 siblings, 0 replies; 27+ messages in thread
From: Stefan Beller @ 2017-09-13 23:00 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git

On Wed, Sep 13, 2017 at 10:16 AM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> We've made huge changes to this file, and some of the old names and
> comments are no longer very fitting. So rename a bunch of things:
>
> * `struct packed_ref_cache` → `struct snapshot`
> * `acquire_packed_ref_cache()` → `acquire_snapshot()`
> * `release_packed_ref_buffer()` → `clear_snapshot_buffer()`
> * `release_packed_ref_cache()` → `release_snapshot()`
> * `clear_packed_ref_cache()` → `clear_snapshot()`
> * `struct packed_ref_entry` → `struct snapshot_record`
> * `cmp_packed_ref_entries()` → `cmp_packed_ref_records()`
> * `cmp_entry_to_refname()` → `cmp_record_to_refname()`
> * `sort_packed_refs()` → `sort_snapshot()`
> * `read_packed_refs()` → `create_snapshot()`
> * `validate_packed_ref_cache()` → `validate_snapshot()`
> * `get_packed_ref_cache()` → `get_snapshot()`
> * Renamed local variables and struct members accordingly.
>
> Also update a bunch of comments to reflect the renaming and the
> accumulated changes that the code has undergone.
>
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>

I have skimmed this series and it looks good.
Thanks,
Stefan


* Re: [PATCH 00/20] Read `packed-refs` using mmap()
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (20 preceding siblings ...)
  2017-09-13 21:35 ` [PATCH 00/20] Read `packed-refs` using mmap() Junio C Hamano
@ 2017-09-14 16:12 ` Michael Haggerty
  2017-09-14 22:16   ` Johannes Schindelin
  2017-09-14 20:23 ` Johannes Schindelin
  22 siblings, 1 reply; 27+ messages in thread
From: Michael Haggerty @ 2017-09-14 16:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, Stefan Beller, Jeff King,
	Ævar Arnfjörð Bjarmason, Brandon Williams, git,
	Johannes Schindelin

On 09/13/2017 07:15 PM, Michael Haggerty wrote:
> [...]
> * `mmap()` the whole file rather than `read()`ing it.
> [...]
Apparently this doesn't work on Windows, because the `snapshot` is
keeping the `packed-refs` file open too long, so the new file can't be
renamed on top of it.

I didn't realize that this is even allowed, but TIL that you can close a
file while keeping it mmapped. Does that technique work on Windows? If
so, I'll change v2 to do it as sketched below.

Michael

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 8235ac8506..95c1cd2a27 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -35,11 +35,8 @@ struct snapshot {
         */
        struct packed_ref_store *refs;

-       /*
-        * The file descriptor of the `packed-refs` file (open in
-        * read-only mode), or -1 if it is not open.
-        */
-       int fd;
+       /* Is the `packed-refs` file currently mmapped? */
+       int mmapped;

        /*
         * The contents of the `packed-refs` file. If the file was
@@ -135,12 +132,11 @@ static void acquire_snapshot(struct snapshot
*snapshot)
  */
 static void clear_snapshot_buffer(struct snapshot *snapshot)
 {
-       if (snapshot->fd >= 0) {
+       if (snapshot->mmapped) {
                if (munmap(snapshot->buf, snapshot->eof - snapshot->buf))
                        die_errno("error ummapping packed-refs file %s",
                                  snapshot->refs->path);
-               close(snapshot->fd);
-               snapshot->fd = -1;
+               snapshot->mmapped = 0;
        } else {
                free(snapshot->buf);
        }
@@ -525,6 +521,7 @@ static const char *find_reference_location(struct
snapshot *snapshot,
 static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 {
        struct snapshot *snapshot = xcalloc(1, sizeof(*snapshot));
+       int fd;
        struct stat st;
        size_t size;
        int sorted = 0;
@@ -533,8 +530,8 @@ static struct snapshot *create_snapshot(struct
packed_ref_store *refs)
        acquire_snapshot(snapshot);
        snapshot->peeled = PEELED_NONE;

-       snapshot->fd = open(refs->path, O_RDONLY);
-       if (snapshot->fd < 0) {
+       fd = open(refs->path, O_RDONLY);
+       if (fd < 0) {
                if (errno == ENOENT) {
                        /*
                         * This is OK; it just means that no
@@ -549,15 +546,16 @@ static struct snapshot *create_snapshot(struct
packed_ref_store *refs)
                }
        }

-       stat_validity_update(&snapshot->validity, snapshot->fd);
+       stat_validity_update(&snapshot->validity, fd);

-       if (fstat(snapshot->fd, &st) < 0)
+       if (fstat(fd, &st) < 0)
                die_errno("couldn't stat %s", refs->path);

        size = xsize_t(st.st_size);
-       snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE,
-                             snapshot->fd, 0);
+       snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
        snapshot->eof = snapshot->buf + size;
+       snapshot->mmapped = 1;
+       close(fd);

        /* If the file has a header line, process it: */
        if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') {


* Re: [PATCH 00/20] Read `packed-refs` using mmap()
  2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
                   ` (21 preceding siblings ...)
  2017-09-14 16:12 ` Michael Haggerty
@ 2017-09-14 20:23 ` Johannes Schindelin
  2017-09-15  4:21   ` Michael Haggerty
  22 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2017-09-14 20:23 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Stefan Beller, Jeff King, Ævar Arnfjörð Bjarmason,
	Brandon Williams, git

Hi Michael,

On Wed, 13 Sep 2017, Michael Haggerty wrote:

> * `mmap()` the whole file rather than `read()`ing it.

On Windows, a memory-mapped file cannot be renamed. As a consequence, the
following tests fail on `pu`:

t1400-update-ref.sh
t1404-update-ref-errors.sh
t1405-main-ref-store.sh
t1408-packed-refs.sh
t1410-reflog.sh
t1430-bad-ref-name.sh
t1450-fsck.sh
t1507-rev-parse-upstream.sh
t2018-checkout-branch.sh
t2020-checkout-detach.sh
t2024-checkout-dwim.sh
t3200-branch.sh
t3204-branch-name-interpretation.sh
t3210-pack-refs.sh
t3211-peel-ref.sh
t3306-notes-prune.sh
t3400-rebase.sh
t3404-rebase-interactive.sh
t3420-rebase-autostash.sh
t3903-stash.sh
t3905-stash-include-untracked.sh
t4151-am-abort.sh
t5304-prune.sh
t5312-prune-corruption.sh
t5400-send-pack.sh
t5404-tracking-branches.sh
t5500-fetch-pack.sh
t5505-remote.sh
t5510-fetch.sh
t5514-fetch-multiple.sh
t5515-fetch-merge-logic.sh
t5516-fetch-push.sh
t5520-pull.sh
t5533-push-cas.sh
t5572-pull-submodule.sh
t5600-clone-fail-cleanup.sh
t5607-clone-bundle.sh
t6016-rev-list-graph-simplify-history.sh
t6030-bisect-porcelain.sh
t6032-merge-large-rename.sh
t6040-tracking-info.sh
t6050-replace.sh
t6500-gc.sh
t6501-freshen-objects.sh
t7003-filter-branch.sh
t7004-tag.sh
t7102-reset.sh
t7201-co.sh
t7406-submodule-update.sh
t7504-commit-msg-hook.sh
t7701-repack-unpack-unreachable.sh
t9300-fast-import.sh
t9902-completion.sh
t9903-bash-prompt.sh

This diff:

-- snip --
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index f9c5e771bdd..ee8b3603624 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -1306,13 +1308,13 @@ static int packed_transaction_finish(struct
ref_store *ref_store,
 	char *packed_refs_path;
 
 	packed_refs_path = get_locked_file_path(&refs->lock);
+	clear_snapshot(refs);
 	if (rename_tempfile(&refs->tempfile, packed_refs_path)) {
 		strbuf_addf(err, "error replacing %s: %s",
 			    refs->path, strerror(errno));
 		goto cleanup;
 	}
 
-	clear_snapshot(refs);
 	ret = 0;
 
 cleanup:
-- snap --

reduces the failed tests to

t1410-reflog.counts.sh
t3210-pack-refs.counts.sh
t3211-peel-ref.counts.sh
t5505-remote.counts.sh
t5510-fetch.counts.sh
t5600-clone-fail-cleanup.counts.sh

However, much as I tried to wrap my head around it, I could not even start
to come up with a solution for e.g. the t1410 regression. The failure
happens in `git branch -d one/two`.

The culprit: there is yet another unreleased snapshot while we try to
finish the transaction.

It is the snapshot of the main_worktree_ref_store as acquired by
add_head_info() (which is called from get_main_worktree(), which in turn
was called from get_worktrees(), in turn having been called from
find_shared_symref() that was called from delete_branches()), and it seems
to me that there was never any plan to have that released, including its
snapshot.

And the snapshot gets initialized upon add_head_info() calling
refs_resolve_ref_unsafe().

Further down in the delete_branches() function, however, we call
delete_ref() which in turn wants to overwrite the packed-refs file via an
atomic rename. That rename fails now because there is still that main
worktree's ref_store that has the snapshot mmap()ed.

I imagine that the rest of the test failures stem from the same root
cause.

Do you have any idea how to ensure that all mmap()ed packed refs are
released before attempting to finish a transaction?

Ciao,
Dscho


* Re: [PATCH 00/20] Read `packed-refs` using mmap()
  2017-09-14 16:12 ` Michael Haggerty
@ 2017-09-14 22:16   ` Johannes Schindelin
  0 siblings, 0 replies; 27+ messages in thread
From: Johannes Schindelin @ 2017-09-14 22:16 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Stefan Beller, Jeff King, Ævar Arnfjörð Bjarmason,
	Brandon Williams, git

Hi Michael,

On Thu, 14 Sep 2017, Michael Haggerty wrote:

> On 09/13/2017 07:15 PM, Michael Haggerty wrote:
> > [...]
> > * `mmap()` the whole file rather than `read()`ing it.
> > [...]
>
> Apparently this doesn't work on Windows, because the `snapshot` is
> keeping the `packed-refs` file open too long, so the new file can't be
> renamed on top of it.

Indeed, I sent you a mail about it before I checked for new mails ;-)

> I didn't realize that this is even allowed, but TIL that you can close a
> file while keeping it mmapped. Does that technique work on Windows? If
> so, I'll change v2 to do it as sketched below.

I do not know whether that technique works on Windows, but it would not
solve the problem *entirely*, as I outlined in my reply: in
delete_branches(), for example, the main_worktree_ref_store is implicitly
getting its snapshot initialized, and nothing seems to release it.

This is only one example, as I figure that multiple worktrees could cause
even more ref_stores to have unreleased snapshots of the packed-refs file.

I imagine that something like close_all_packs() is needed
("clear_all_snapshots()"?).

Ciao,
Dscho


* Re: [PATCH 00/20] Read `packed-refs` using mmap()
  2017-09-14 20:23 ` Johannes Schindelin
@ 2017-09-15  4:21   ` Michael Haggerty
  0 siblings, 0 replies; 27+ messages in thread
From: Michael Haggerty @ 2017-09-15  4:21 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Stefan Beller, Jeff King, Ævar Arnfjörð Bjarmason,
	Brandon Williams, git

On 09/14/2017 10:23 PM, Johannes Schindelin wrote:
> On Wed, 13 Sep 2017, Michael Haggerty wrote:
> 
>> * `mmap()` the whole file rather than `read()`ing it.
> 
> On Windows, a memory-mapped file cannot be renamed. As a consequence, the
> following tests fail on `pu`:
> 
> [...]

Thanks for your quick investigation.

> This diff:
> 
> -- snip --
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index f9c5e771bdd..ee8b3603624 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -1306,13 +1308,13 @@ static int packed_transaction_finish(struct
> ref_store *ref_store,
>  	char *packed_refs_path;
>  
>  	packed_refs_path = get_locked_file_path(&refs->lock);
> +	clear_snapshot(refs);
>  	if (rename_tempfile(&refs->tempfile, packed_refs_path)) {
>  		strbuf_addf(err, "error replacing %s: %s",
>  			    refs->path, strerror(errno));
>  		goto cleanup;
>  	}
>  
> -	clear_snapshot(refs);
>  	ret = 0;
>  
>  cleanup:
> -- snap --
> 
> reduces the failed tests to
> 
> t1410-reflog.counts.sh
> t3210-pack-refs.counts.sh
> t3211-peel-ref.counts.sh
> t5505-remote.counts.sh
> t5510-fetch.counts.sh
> t5600-clone-fail-cleanup.counts.sh

That change is a strict improvement on all systems; I'll integrate it
into the series.

> However, much as I tried to wrap my head around it, I could not even start
> to come up with a solution for e.g. the t1410 regression. The failure
> happens in `git branch -d one/two`.
> 
> The culprit: there is yet another unreleased snapshot while we try to
> finish the transaction.
> 
> It is the snapshot of the main_worktree_ref_store as acquired by
> add_head_info() (which is called from get_main_worktree(), which in turn
> was called from get_worktrees(), in turn having been called from
> find_shared_symref() that was called from delete_branches()), and it seems
> to me that there was never any plan to have that released, including its
> snapshot.

Yeah the idea was that the default situation would be that whenever a
`packed-refs` file needs to be read, it would be kept mmapped for the
life of the program (or until the file was detected to have been
changed). This was meant to be a replacement for the explicit
`ref_cache`. So any long-lived git process could be expected to have the
`packed-refs` file (or even multiple `packed-refs` files in the case of
submodules) mmapped.

That's obviously not going to work on Windows.

> [...]
> Do you have any idea how to ensure that all mmap()ed packed refs are
> released before attempting to finish a transaction?

[And from your other email:]
> This is only one example, as I figure that multiple worktrees could cause
> even more ref_stores to have unreleased snapshots of the packed-refs file.
> 
> I imagine that something like close_all_packs() is needed
> ("clear_all_snapshots()"?).

Yes, I wasn't really familiar with `close_all_packs()`, but it sounds
like it solves a similar problem.

Within a single process, we could make this work via an arduous process
of figuring out what functions might want to overwrite the `packed-refs`
file, and making sure that nobody up the call stack is iterating over
references during those function calls. If that condition is met, then
calling `clear_snapshot()` earlier, as you propose above, would decrease
the reference count to zero, causing the old snapshot to be freed, and
allowing the rename to succeed. We could even do

    if (!clear_snapshot(refs))
            BUG("packed-refs snapshot is still in use");

, analogously to `close_all_packs()`, to help find code that violates
the condition.
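
To make the intended semantics concrete, here is a toy sketch of that
refcounting. None of these names match git's real internals (actual
snapshots hold the mmapped buffer, validity info, etc.); it only
illustrates the "return whether the snapshot was actually freed"
contract that the `BUG()` check above relies on:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical sketch (not git's real API): a store owns one
 * reference to its snapshot, and each active iterator owns another.
 * clear_snapshot() drops the store's reference and reports whether
 * the snapshot was actually freed, so a caller can BUG() if somebody
 * up the call stack is still iterating.
 */
struct snapshot {
	int referrers;
	/* in git: the mmapped buffer, its length, validity data, ... */
};

struct packed_ref_store {
	struct snapshot *snapshot;
};

/* Drop the store's reference; return 1 iff the snapshot was freed. */
static int clear_snapshot(struct packed_ref_store *refs)
{
	struct snapshot *s = refs->snapshot;

	if (!s)
		return 1;
	refs->snapshot = NULL;
	if (--s->referrers > 0)
		return 0;	/* an iterator still holds a reference */
	free(s);		/* in git this would also munmap() */
	return 1;
}
```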

Similarly, one wouldn't be allowed to invoke a subprocess or hook
function while iterating over references, and would have to clear any
current snapshots before doing so. It might even make sense to do
that at the same time as `close_all_packs()`, for example in a new
function `close_all_unnecessary_files()`.

But what about unrelated processes? Anybody reading `packed-refs` would
block anybody who wants to modify it, for example to delete a reference
or pack the references. I think this is worse than the packfile
situation. Packfiles all have unique names, so they never need to be
overwritten; at worst they are deleted. And the deletion is never
critical to correctness. If you can't delete a packfile, the only
consequence is that it hangs around until the next GC.

However, the `packed-refs` file always has the same name, and
overwriting it is sometimes critical for correctness. So it sounds to me
like even *readers* mustn't keep the file open or mmapped longer than
necessary!

(This makes me wonder, doesn't `rename_tempfile()` for the `packed-refs`
file *already* fail sometimes on Windows due to a concurrent reader?
Shouldn't it retry if it gets whatever error indicates that the failure
was due to the file being open?)
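
For illustration only, such a retry might look like the sketch below.
The function name and the choice of which errno values to treat as
transient are hypothetical; this is not what `rename_tempfile()`
actually does today:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical sketch: retry a rename a few times when it fails with
 * an error that, on Windows, typically means another process still
 * has the destination open. This is not git's rename_tempfile().
 */
static int rename_with_retry(const char *src, const char *dst,
			     int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		if (!rename(src, dst))
			return 0;	/* success */
		/*
		 * EACCES/EPERM is what the Windows runtime commonly
		 * reports for a sharing violation; anything else is a
		 * hard failure. Real code would back off briefly
		 * before retrying.
		 */
		if (errno != EACCES && errno != EPERM)
			return -1;
	}
	return -1;		/* still busy after max_tries */
}
```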

Anyway, if all of my reasoning is correct, it seems that the inescapable
conclusion is that having the `packed-refs` file open or mmapped is
tantamount to holding a reader lock on the file, because it causes
writers to fail or at least have to retry. Therefore, even if you are
only reading the `packed-refs` file, you should read it and then
close/munmap it as quickly as possible.

And if *that* is true, then ISTM that this mmap idea is unworkable on
Windows. We would either need to keep the old `ref_cache`-based code
around for use there, or we would need to copy the file contents into
memory immediately and use *that* as our cache. The latter would be
pretty easy to implement, actually, because something similar is done in
`sort_snapshot()` to handle the case that the `packed-refs` file on disk
is not correctly ordered by refname. I think that approach would be
competitive with the `ref_cache`-based code, which also reads (and
parses!) the whole file whenever any packed reference needs to be read.
It is possible that memory usage might go up a bit because the SHA-1s
would be stored in memory in hex rather than binary form; on the
other hand, this would spare the overhead of allocating lots of
little objects and the pointers that string them together, so it's
probably a wash.

Please let me know if you agree with my reasoning. If so, I'll implement
"always copy `packed-refs` file contents to RAM" for Windows.
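
A minimal sketch of that fallback follows. The helper is hypothetical
(the real implementation would use git's strbuf/xmalloc helpers and
its error conventions, and `sort_snapshot()` already does something
similar), but it shows the key property: the file is closed before the
function returns, so a reader never blocks a writer's rename:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of the "copy to RAM" fallback: slurp the whole
 * `packed-refs` file into a heap buffer and close the file
 * immediately, so the reader never holds the file open long enough
 * to block a writer's rename. Returns NULL on any error.
 */
static char *slurp_file(const char *path, long *size_out)
{
	FILE *f = fopen(path, "rb");
	char *buf = NULL;
	long size = 0;

	if (!f)
		return NULL;
	if (fseek(f, 0, SEEK_END) || (size = ftell(f)) < 0 ||
	    fseek(f, 0, SEEK_SET))
		goto fail;
	buf = malloc(size + 1);
	if (!buf || fread(buf, 1, size, f) != (size_t)size)
		goto fail;
	buf[size] = '\0';	/* NUL-terminate for easy parsing */
	fclose(f);	/* closed right away; only the copy lives on */
	*size_out = size;
	return buf;
fail:
	free(buf);
	fclose(f);
	return NULL;
}
```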

A possible future optimization might be that when iterating over a
subset of the references (e.g., `refs/replace/`), only that part of the
file is copied to RAM. But that would be a bigger change, and it's not
obvious whether it might make some read access patterns slower.

Michael

Thread overview: 27+ messages
2017-09-13 17:15 [PATCH 00/20] Read `packed-refs` using mmap() Michael Haggerty
2017-09-13 17:15 ` [PATCH 01/20] ref_iterator: keep track of whether the iterator output is ordered Michael Haggerty
2017-09-13 17:15 ` [PATCH 02/20] prefix_ref_iterator: break when we leave the prefix Michael Haggerty
2017-09-13 17:15 ` [PATCH 03/20] packed_ref_cache: add a backlink to the associated `packed_ref_store` Michael Haggerty
2017-09-13 17:15 ` [PATCH 04/20] die_unterminated_line(), die_invalid_line(): new functions Michael Haggerty
2017-09-13 17:15 ` [PATCH 05/20] read_packed_refs(): use mmap to read the `packed-refs` file Michael Haggerty
2017-09-13 17:16 ` [PATCH 06/20] read_packed_refs(): only check for a header at the top of the file Michael Haggerty
2017-09-13 17:16 ` [PATCH 07/20] read_packed_refs(): make parsing of the header line more robust Michael Haggerty
2017-09-13 17:16 ` [PATCH 08/20] read_packed_refs(): read references with minimal copying Michael Haggerty
2017-09-13 17:16 ` [PATCH 09/20] packed_ref_cache: remember the file-wide peeling state Michael Haggerty
2017-09-13 17:16 ` [PATCH 10/20] mmapped_ref_iterator: add iterator over a packed-refs file Michael Haggerty
2017-09-13 17:16 ` [PATCH 11/20] mmapped_ref_iterator_advance(): no peeled value for broken refs Michael Haggerty
2017-09-13 17:16 ` [PATCH 12/20] packed_ref_cache: keep the `packed-refs` file open and mmapped Michael Haggerty
2017-09-13 17:16 ` [PATCH 13/20] read_packed_refs(): ensure that references are ordered when read Michael Haggerty
2017-09-13 17:16 ` [PATCH 14/20] packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator` Michael Haggerty
2017-09-13 17:16 ` [PATCH 15/20] packed_read_raw_ref(): read the reference from the mmapped buffer Michael Haggerty
2017-09-13 17:16 ` [PATCH 16/20] ref_store: implement `refs_peel_ref()` generically Michael Haggerty
2017-09-13 17:16 ` [PATCH 17/20] packed_ref_store: get rid of the `ref_cache` entirely Michael Haggerty
2017-09-13 17:16 ` [PATCH 18/20] ref_cache: remove support for storing peeled values Michael Haggerty
2017-09-13 17:16 ` [PATCH 19/20] mmapped_ref_iterator: inline into `packed_ref_iterator` Michael Haggerty
2017-09-13 17:16 ` [PATCH 20/20] packed-backend.c: rename a bunch of things and update comments Michael Haggerty
2017-09-13 23:00   ` Stefan Beller
2017-09-13 21:35 ` [PATCH 00/20] Read `packed-refs` using mmap() Junio C Hamano
2017-09-14 16:12 ` Michael Haggerty
2017-09-14 22:16   ` Johannes Schindelin
2017-09-14 20:23 ` Johannes Schindelin
2017-09-15  4:21   ` Michael Haggerty
