git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] packed_ref_cache: don't use mmap() for small files
@ 2018-01-13 16:11 Kim Gybels
  2018-01-13 18:56 ` Johannes Schindelin
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Kim Gybels @ 2018-01-13 16:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels

Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use
read() instead of mmap() for small packed-refs files.

This also fixes the problem[1] where xmmap() returns NULL for zero
length[2], for which munmap() later fails.

Alternatively, we could simply check for NULL before munmap(), or
introduce an xmunmap() that could be used together with xmmap().

[1] https://github.com/git-for-windows/git/issues/1410
[2] Logic introduced in commit 9130ac1e1966adb9922e64f645730d0d45383495

Signed-off-by: Kim Gybels <kgybels@infogroep.be>
---
 refs/packed-backend.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index dab8a85d9a..7177e5bc2f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 				 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
 		die_errno("couldn't stat %s", snapshot->refs->path);
 	size = xsize_t(st.st_size);
 
-	switch (mmap_strategy) {
-	case MMAP_NONE:
+	if (!size) {
+		snapshot->buf = NULL;
+		snapshot->eof = NULL;
+		snapshot->mmapped = 0;
+	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
 		snapshot->buf = xmalloc(size);
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
 			die_errno("couldn't read %s", snapshot->refs->path);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 0;
-		break;
-	case MMAP_TEMPORARY:
-	case MMAP_OK:
+	} else {
 		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 1;
-		break;
 	}
 	close(fd);
 
-- 
2.15.1.windows.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] packed_ref_cache: don't use mmap() for small files
  2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
@ 2018-01-13 18:56 ` Johannes Schindelin
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
  2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
  2 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2018-01-13 18:56 UTC (permalink / raw)
  To: Kim Gybels; +Cc: git, Junio C Hamano, Michael Haggerty

Hi,

On Sat, 13 Jan 2018, Kim Gybels wrote:

> Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use

Maybe use

	ea68b0ce9f8 (hash-object: don't use mmap() for small files,
	2010-02-21)

instead of the full commit name?

> read() instead of mmap() for small packed-refs files.
> 
> This also fixes the problem[1] where xmmap() returns NULL for zero
> length[2], for which munmap() later fails.
> 
> Alternatively, we could simply check for NULL before munmap(), or
> introduce an xmunmap() that could be used together with xmmap().
> 
> [1] https://github.com/git-for-windows/git/issues/1410
> [2] Logic introduced in commit 9130ac1e1966adb9922e64f645730d0d45383495
> 
> Signed-off-by: Kim Gybels <kgybels@infogroep.be>
> ---
>  refs/packed-backend.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index dab8a85d9a..7177e5bc2f 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
>  				 last_line, eof - last_line);
>  }
>  
> +#define SMALL_FILE_SIZE (32*1024)
> +
>  /*
>   * Depending on `mmap_strategy`, either mmap or read the contents of
>   * the `packed-refs` file into the snapshot. Return 1 if the file
> @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
>  		die_errno("couldn't stat %s", snapshot->refs->path);
>  	size = xsize_t(st.st_size);
>  
> -	switch (mmap_strategy) {
> -	case MMAP_NONE:
> +	if (!size) {
> +		snapshot->buf = NULL;
> +		snapshot->eof = NULL;
> +		snapshot->mmapped = 0;
> +	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
>  		snapshot->buf = xmalloc(size);
>  		bytes_read = read_in_full(fd, snapshot->buf, size);
>  		if (bytes_read < 0 || bytes_read != size)
>  			die_errno("couldn't read %s", snapshot->refs->path);
>  		snapshot->eof = snapshot->buf + size;
>  		snapshot->mmapped = 0;
> -		break;
> -	case MMAP_TEMPORARY:
> -	case MMAP_OK:
> +	} else {
>  		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
>  		snapshot->eof = snapshot->buf + size;
>  		snapshot->mmapped = 1;
> -		break;
>  	}
>  	close(fd);

Nicely explained, and nicely solved, for a potential extra performance
benefit ;-)

Thank you!
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2] packed_ref_cache: don't use mmap() for small files
  2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
  2018-01-13 18:56 ` Johannes Schindelin
@ 2018-01-14 19:14 ` Kim Gybels
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
  2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
  2 siblings, 1 reply; 33+ messages in thread
From: Kim Gybels @ 2018-01-14 19:14 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels

Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
small files, 2010-02-21) and use read() instead of mmap() for small
packed-refs files.

This also fixes the problem[1] where xmmap() returns NULL for zero
length[2], for which munmap() later fails.

Alternatively, we could simply check for NULL before munmap(), or
introduce xmunmap() that could be used together with xmmap().

[1] https://github.com/git-for-windows/git/issues/1410
[2] Logic introduced in commit 9130ac1e196 (Better error messages for
    corrupt databases, 2007-01-11)

Signed-off-by: Kim Gybels <kgybels@infogroep.be>
---
Change since v1: reworded commit message based on Johannes Schindelin's
feedback: shorter commit hashes, and included commit titles.

 refs/packed-backend.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index dab8a85d9a..7177e5bc2f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 				 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
 		die_errno("couldn't stat %s", snapshot->refs->path);
 	size = xsize_t(st.st_size);
 
-	switch (mmap_strategy) {
-	case MMAP_NONE:
+	if (!size) {
+		snapshot->buf = NULL;
+		snapshot->eof = NULL;
+		snapshot->mmapped = 0;
+	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
 		snapshot->buf = xmalloc(size);
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
 			die_errno("couldn't read %s", snapshot->refs->path);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 0;
-		break;
-	case MMAP_TEMPORARY:
-	case MMAP_OK:
+	} else {
 		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 1;
-		break;
 	}
 	close(fd);
 
-- 
2.16.0.rc2.windows.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
@ 2018-01-15 12:17   ` Michael Haggerty
  2018-01-15 12:17     ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty
                       ` (4 more replies)
  0 siblings, 5 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw)
  To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty

Thanks for your patch. I haven't measured the performance difference
of `mmap()` vs. `read()` for small `packed-refs` files, but it's not
surprising that `read()` would be faster.

I especially like the fix for zero-length `packed-refs` files. (Even
though AFAIK Git never writes such files, they are totally legitimate
and shouldn't cause Git to fail.) With or without the additions
mentioned below,

Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>

While reviewing your patch, I realized that some areas of the existing
code use constructs that are undefined according to the C standard,
such as computing `NULL + 0` and `NULL - NULL`. This was already wrong
(and would come up more frequently after your change). Even though
these are unlikely to be problems in the real world, it would be good
to avoid them.

So I will follow up this email with three patches:

1. Mention that `snapshot::buf` can be NULL for empty files

   I suggest squashing this into your patch, to make it clear that
   `snapshot::buf` and `snapshot::eof` can also be NULL if the
   `packed-refs` file is empty.

2. create_snapshot(): exit early if the file was empty

   Avoid undefined behavior by returning early if `snapshot->buf` is
   NULL.

3. find_reference_location(): don't invoke if `snapshot->buf` is NULL

   Avoid undefined behavior and confusing semantics by not calling
   `find_reference_location()` when `snapshot->buf` is NULL.

Michael

Michael Haggerty (3):
  SQUASH? Mention that `snapshot::buf` can be NULL for empty files
  create_snapshot(): exit early if the file was empty
  find_reference_location(): don't invoke if `snapshot->buf` is NULL

 refs/packed-backend.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

-- 
2.14.2


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
@ 2018-01-15 12:17     ` Michael Haggerty
  2018-01-15 12:17     ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw)
  To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 01a13cb817..f20f05b4df 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -69,11 +69,11 @@ struct snapshot {
 
 	/*
 	 * The contents of the `packed-refs` file. If the file was
-	 * already sorted, this points at the mmapped contents of the
-	 * file. If not, this points at heap-allocated memory
-	 * containing the contents, sorted. If there were no contents
-	 * (e.g., because the file didn't exist), `buf` and `eof` are
-	 * both NULL.
+	 * already sorted and if mmapping is allowed, this points at
+	 * the mmapped contents of the file. If not, this points at
+	 * heap-allocated memory containing the contents, sorted. If
+	 * there were no contents (e.g., because the file didn't exist
+	 * or was empty), `buf` and `eof` are both NULL.
 	 */
 	char *buf, *eof;
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/3] create_snapshot(): exit early if the file was empty
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
  2018-01-15 12:17     ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty
@ 2018-01-15 12:17     ` Michael Haggerty
  2018-01-15 12:17     ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty
                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw)
  To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty

If the `packed_refs` files is entirely empty (i.e., not even a header
line), then `load_contents()` returns 1 even though `snapshot->buf`
and `snapshot->eof` both end up set to NULL. In that case, the
subsequent processing that `create_snapshot()` does is unnecessary,
and also involves computing `NULL - NULL` and `NULL + 0`, which are
probably safe in real life but are technically undefined in C.

Sidestep both issues by exiting early if `snapshot->buf` is NULL.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index f20f05b4df..36796d65f0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -613,7 +613,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 	acquire_snapshot(snapshot);
 	snapshot->peeled = PEELED_NONE;
 
-	if (!load_contents(snapshot))
+	if (!load_contents(snapshot) || !snapshot->buf)
 		return snapshot;
 
 	/* If the file has a header line, process it: */
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
  2018-01-15 12:17     ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty
  2018-01-15 12:17     ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty
@ 2018-01-15 12:17     ` Michael Haggerty
  2018-01-17 20:23     ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Johannes Schindelin
  2018-01-17 21:52     ` Junio C Hamano
  4 siblings, 0 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-15 12:17 UTC (permalink / raw)
  To: Kim Gybels; +Cc: Johannes Schindelin, Junio C Hamano, git, Michael Haggerty

If `snapshot->buf` is NULL, then `find_reference_location()` has two
problems:

1. It relies on behavior that is technically undefined in C, such as
   computing `NULL + 0`.

2. It returns NULL if the reference doesn't exist, even if `mustexist`
   is not set. This problem doesn't come up in the current code,
   because we never call this function with `snapshot->buf == NULL`
   and `mustexist` set. But it is something that future callers need
   to be aware of.

We could fix the first problem by adding some extra logic to the
function. But considering both problems together, it is more
straightforward to document that the function should only be called if
`snapshot->buf` is non-NULL.

Adjust `packed_read_raw_ref()` to return early if `snapshot->buf` is
NULL rather than calling `find_reference_location()`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 36796d65f0..ed2b396bef 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -521,8 +521,9 @@ static int load_contents(struct snapshot *snapshot)
  * reference name; for example, one could search for "refs/replace/"
  * to find the start of any replace references.
  *
+ * This function must only be called if `snapshot->buf` is non-NULL.
  * The record is sought using a binary search, so `snapshot->buf` must
- * be sorted.
+ * also be sorted.
  */
 static const char *find_reference_location(struct snapshot *snapshot,
 					   const char *refname, int mustexist)
@@ -728,6 +729,12 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
 
 	*type = 0;
 
+	if (!snapshot->buf) {
+		/* There are no packed references */
+		errno = ENOENT;
+		return -1;
+	}
+
 	rec = find_reference_location(snapshot, refname, 1);
 
 	if (!rec) {
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH] packed_ref_cache: don't use mmap() for small files
  2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
  2018-01-13 18:56 ` Johannes Schindelin
  2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
@ 2018-01-15 21:15 ` Jeff King
  2018-01-15 23:37   ` Kim Gybels
  2 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2018-01-15 21:15 UTC (permalink / raw)
  To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty

On Sat, Jan 13, 2018 at 05:11:49PM +0100, Kim Gybels wrote:

> Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use
> read() instead of mmap() for small packed-refs files.
> 
> This also fixes the problem[1] where xmmap() returns NULL for zero
> length[2], for which munmap() later fails.
> 
> Alternatively, we could simply check for NULL before munmap(), or
> introduce an xmunmap() that could be used together with xmmap().

This looks good to me, and since it's a recent-ish regression, I think
we should take the minimal fix here.

But it does make me wonder whether xmmap() ought to be doing this "small
mmap" optimization for us. Obviously that only works when we do
MAP_PRIVATE and never write to the result. But that's how we always use
it anyway, and we're restricted to that to work with the NO_MMAP wrapper
in compat/mmap.c.

> @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
>  		die_errno("couldn't stat %s", snapshot->refs->path);
>  	size = xsize_t(st.st_size);
>  
> -	switch (mmap_strategy) {
> -	case MMAP_NONE:
> +	if (!size) {
> +		snapshot->buf = NULL;
> +		snapshot->eof = NULL;
> +		snapshot->mmapped = 0;
> +	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
>  		snapshot->buf = xmalloc(size);
>  		bytes_read = read_in_full(fd, snapshot->buf, size);
>  		if (bytes_read < 0 || bytes_read != size)
>  			die_errno("couldn't read %s", snapshot->refs->path);
>  		snapshot->eof = snapshot->buf + size;
>  		snapshot->mmapped = 0;

If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE",
then we'd try to xmalloc(0), which is guaranteed to work (we fallback to
a 1-byte allocation if necessary). Would that make things simpler and
more consistent for the rest of the code to always have snapshot->buf be
a valid pointer (just based on seeing Michael's follow-up patches)?

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] packed_ref_cache: don't use mmap() for small files
  2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
@ 2018-01-15 23:37   ` Kim Gybels
  2018-01-15 23:52     ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Kim Gybels @ 2018-01-15 23:37 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty

On (15/01/18 16:15), Jeff King wrote:

> On Sat, Jan 13, 2018 at 05:11:49PM +0100, Kim Gybels wrote:
> 
> > Take a hint from commit ea68b0ce9f8ce8da3e360aed3cbd6720159ffbee and use
> > read() instead of mmap() for small packed-refs files.
> > 
> > This also fixes the problem[1] where xmmap() returns NULL for zero
> > length[2], for which munmap() later fails.
> > 
> > Alternatively, we could simply check for NULL before munmap(), or
> > introduce an xmunmap() that could be used together with xmmap().
> 
> This looks good to me, and since it's a recent-ish regression, I think
> we should take the minimal fix here.

The minimal fix being a simple NULL check before munmap()?

> But it does make me wonder whether xmmap() ought to be doing this "small
> mmap" optimization for us. Obviously that only works when we do
> MAP_PRIVATE and never write to the result. But that's how we always use
> it anyway, and we're restricted to that to work with the NO_MMAP wrapper
> in compat/mmap.c.

Maybe I should have left the optimization for small files out of the patch for
the zero length regression. After all, read() vs mmap() performance might
depend on other factors than just size.

> > @@ -489,21 +491,21 @@ static int load_contents(struct snapshot *snapshot)
> >  		die_errno("couldn't stat %s", snapshot->refs->path);
> >  	size = xsize_t(st.st_size);
> >  
> > -	switch (mmap_strategy) {
> > -	case MMAP_NONE:
> > +	if (!size) {
> > +		snapshot->buf = NULL;
> > +		snapshot->eof = NULL;
> > +		snapshot->mmapped = 0;
> > +	} else if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
> >  		snapshot->buf = xmalloc(size);
> >  		bytes_read = read_in_full(fd, snapshot->buf, size);
> >  		if (bytes_read < 0 || bytes_read != size)
> >  			die_errno("couldn't read %s", snapshot->refs->path);
> >  		snapshot->eof = snapshot->buf + size;
> >  		snapshot->mmapped = 0;
> 
> If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE",
> then we'd try to xmalloc(0), which is guaranteed to work (we fallback to
> a 1-byte allocation if necessary). Would that make things simpler and
> more consistent for the rest of the code to always have snapshot->buf be
> a valid pointer (just based on seeing Michael's follow-up patches)?

Indeed, all those patches are to avoid using the NULL pointers in ways that are
undefined. We could also copy index_core's way of handling the zero length
case:
ret = index_mem(sha1, "", size, type, path, flags);

Point to some static memory instead of NULL, then all the pointer arithmetic is defined.

-Kim

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH] packed_ref_cache: don't use mmap() for small files
  2018-01-15 23:37   ` Kim Gybels
@ 2018-01-15 23:52     ` Jeff King
  2018-01-16 19:38       ` [PATCH v3] " Kim Gybels
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2018-01-15 23:52 UTC (permalink / raw)
  To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty

On Tue, Jan 16, 2018 at 12:37:51AM +0100, Kim Gybels wrote:

> > This looks good to me, and since it's a recent-ish regression, I think
> > we should take the minimal fix here.
> 
> The minimal fix being a simple NULL check before munmap()?

Sorry to be unclear. I just meant that your patch is probably fine
as-is. I didn't want to hold up a regression fix with a bunch of
nit-picking or possible future work, when we could build that on top
later.

> > But it does make me wonder whether xmmap() ought to be doing this "small
> > mmap" optimization for us. Obviously that only works when we do
> > MAP_PRIVATE and never write to the result. But that's how we always use
> > it anyway, and we're restricted to that to work with the NO_MMAP wrapper
> > in compat/mmap.c.
> 
> Maybe I should have left the optimization for small files out of the patch for
> the zero length regression. After all, read() vs mmap() performance might
> depend on other factors than just size.

I'd be OK including it here, since there's prior art in the commit you
referenced. Though of course actual numbers are always good when
claiming an optimization. :)

> > If the "!size" case is just lumped in with "size <= SMALL_FILE_SIZE",
> > then we'd try to xmalloc(0), which is guaranteed to work (we fallback to
> > a 1-byte allocation if necessary). Would that make things simpler and
> > more consistent for the rest of the code to always have snapshot->buf be
> > a valid pointer (just based on seeing Michael's follow-up patches)?
> 
> Indeed, all those patches are to avoid using the NULL pointers in ways that are
> undefined. We could also copy index_core's way of handling the zero length
> case:
> ret = index_mem(sha1, "", size, type, path, flags);
> 
> Point to some static memory instead of NULL, then all the pointer arithmetic is defined.

Yep, that would work, too. I don't think the overhead of a
once-per-process xmalloc(0) is a big deal, though, if it keeps the code
simpler (though I admit it is not that complex either way).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-15 23:52     ` Jeff King
@ 2018-01-16 19:38       ` Kim Gybels
  2018-01-17 22:09         ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Kim Gybels @ 2018-01-16 19:38 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty, Kim Gybels

Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
small files, 2010-02-21) and use read() instead of mmap() for small
packed-refs files.

This also fixes the problem[1] where xmmap() returns NULL for zero
length[2], for which munmap() later fails.

Alternatively, we could simply check for NULL before munmap(), or
introduce xmunmap() that could be used together with xmmap(). However,
always setting snapshot->buf to a valid pointer, by relying on
xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
easier.

[1] https://github.com/git-for-windows/git/issues/1410
[2] Logic introduced in commit 9130ac1e196 (Better error messages for
    corrupt databases, 2007-01-11)

Signed-off-by: Kim Gybels <kgybels@infogroep.be>
---

Change since v2: removed separate case for zero length as suggested by Peff,
ensuring that snapshot->buf is always a valid pointer.

 refs/packed-backend.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index dab8a85d9a..b6e2bc3c1d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -455,6 +455,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 				 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -489,21 +491,17 @@ static int load_contents(struct snapshot *snapshot)
 		die_errno("couldn't stat %s", snapshot->refs->path);
 	size = xsize_t(st.st_size);
 
-	switch (mmap_strategy) {
-	case MMAP_NONE:
+	if (size <= SMALL_FILE_SIZE || mmap_strategy == MMAP_NONE) {
 		snapshot->buf = xmalloc(size);
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
 			die_errno("couldn't read %s", snapshot->refs->path);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 0;
-		break;
-	case MMAP_TEMPORARY:
-	case MMAP_OK:
+	} else {
 		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 1;
-		break;
 	}
 	close(fd);
 
-- 
2.15.1.windows.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
                       ` (2 preceding siblings ...)
  2018-01-15 12:17     ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty
@ 2018-01-17 20:23     ` Johannes Schindelin
  2018-01-17 21:52     ` Junio C Hamano
  4 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2018-01-17 20:23 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Kim Gybels, Junio C Hamano, git

Hi Michael,

On Mon, 15 Jan 2018, Michael Haggerty wrote:

> Thanks for your patch. I haven't measured the performance difference
> of `mmap()` vs. `read()` for small `packed-refs` files, but it's not
> surprising that `read()` would be faster.
> 
> I especially like the fix for zero-length `packed-refs` files. (Even
> though AFAIK Git never writes such files, they are totally legitimate
> and shouldn't cause Git to fail.) With or without the additions
> mentioned below,
> 
> Reviewed-by: Michael Haggerty <mhagger@alum.mit.edu>
> 
> While reviewing your patch, I realized that some areas of the existing
> code use constructs that are undefined according to the C standard,
> such as computing `NULL + 0` and `NULL - NULL`. This was already wrong
> (and would come up more frequently after your change). Even though
> these are unlikely to be problems in the real world, it would be good
> to avoid them.
> 
> So I will follow up this email with three patches:
> 
> 1. Mention that `snapshot::buf` can be NULL for empty files
> 
>    I suggest squashing this into your patch, to make it clear that
>    `snapshot::buf` and `snapshot::eof` can also be NULL if the
>    `packed-refs` file is empty.
> 
> 2. create_snapshot(): exit early if the file was empty
> 
>    Avoid undefined behavior by returning early if `snapshot->buf` is
>    NULL.
> 
> 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL
> 
>    Avoid undefined behavior and confusing semantics by not calling
>    `find_reference_location()` when `snapshot->buf` is NULL.
> 
> Michael
> 
> Michael Haggerty (3):
>   SQUASH? Mention that `snapshot::buf` can be NULL for empty files
>   create_snapshot(): exit early if the file was empty
>   find_reference_location(): don't invoke if `snapshot->buf` is NULL

I reviewed those patches and find the straight-forward (and obviously
good).

Thanks,
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files"
  2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
                       ` (3 preceding siblings ...)
  2018-01-17 20:23     ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Johannes Schindelin
@ 2018-01-17 21:52     ` Junio C Hamano
  4 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2018-01-17 21:52 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Kim Gybels, Johannes Schindelin, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> So I will follow up this email with three patches:
>
> 1. Mention that `snapshot::buf` can be NULL for empty files
>
>    I suggest squashing this into your patch, to make it clear that
>    `snapshot::buf` and `snapshot::eof` can also be NULL if the
>    `packed-refs` file is empty.
>
> 2. create_snapshot(): exit early if the file was empty
>
>    Avoid undefined behavior by returning early if `snapshot->buf` is
>    NULL.
>
> 3. find_reference_location(): don't invoke if `snapshot->buf` is NULL
>
>    Avoid undefined behavior and confusing semantics by not calling
>    `find_reference_location()` when `snapshot->buf` is NULL.

These look all sensible with today's code and with v2 from this
thread.

With the v3, i.e. "do the xmalloc() even for size==0", however,
snapshot->buf would never be NULL, so I'd shelve them for now,
though.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-16 19:38       ` [PATCH v3] " Kim Gybels
@ 2018-01-17 22:09         ` Jeff King
  2018-01-21  4:41           ` Michael Haggerty
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2018-01-17 22:09 UTC (permalink / raw)
  To: Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin, Michael Haggerty

On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote:

> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
> small files, 2010-02-21) and use read() instead of mmap() for small
> packed-refs files.
> 
> This also fixes the problem[1] where xmmap() returns NULL for zero
> length[2], for which munmap() later fails.
> 
> Alternatively, we could simply check for NULL before munmap(), or
> introduce xmunmap() that could be used together with xmmap(). However,
> always setting snapshot->buf to a valid pointer, by relying on
> xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
> easier.
> 
> [1] https://github.com/git-for-windows/git/issues/1410
> [2] Logic introduced in commit 9130ac1e196 (Better error messages for
>     corrupt databases, 2007-01-11)
> 
> Signed-off-by: Kim Gybels <kgybels@infogroep.be>
> ---
> 
> Change since v2: removed separate case for zero length as suggested by Peff,
> ensuring that snapshot->buf is always a valid pointer.

Thanks, this looks fine to me (I'd be curious to hear from Michael if
this eliminates the need for the other patches).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-17 22:09         ` Jeff King
@ 2018-01-21  4:41           ` Michael Haggerty
  2018-01-22 19:31             ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Haggerty @ 2018-01-21  4:41 UTC (permalink / raw)
  To: Jeff King, Kim Gybels; +Cc: git, Junio C Hamano, Johannes Schindelin

On 01/17/2018 11:09 PM, Jeff King wrote:
> On Tue, Jan 16, 2018 at 08:38:15PM +0100, Kim Gybels wrote:
> 
>> Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
>> small files, 2010-02-21) and use read() instead of mmap() for small
>> packed-refs files.
>>
>> This also fixes the problem[1] where xmmap() returns NULL for zero
>> length[2], for which munmap() later fails.
>>
>> Alternatively, we could simply check for NULL before munmap(), or
>> introduce xmunmap() that could be used together with xmmap(). However,
>> always setting snapshot->buf to a valid pointer, by relying on
>> xmalloc(0)'s fallback to 1-byte allocation, makes using snapshots
>> easier.
>>
>> [1] https://github.com/git-for-windows/git/issues/1410
>> [2] Logic introduced in commit 9130ac1e196 (Better error messages for
>>     corrupt databases, 2007-01-11)
>>
>> Signed-off-by: Kim Gybels <kgybels@infogroep.be>
>> ---
>>
>> Change since v2: removed separate case for zero length as suggested by Peff,
>> ensuring that snapshot->buf is always a valid pointer.
> 
> Thanks, this looks fine to me (I'd be curious to hear from Michael if
> this eliminates the need for the other patches).

`snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
(see the earlier code path in `load_contents()`). So either that code
path *also* has to get the `xmalloc()` treatment, or my third patch is
still necessary. (My second patch wouldn't be necessary because the
ENOENT case makes `load_contents()` return 0, triggering the early exit
from `create_snapshot()`.)

I don't have a strong preference either way.

Michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-21  4:41           ` Michael Haggerty
@ 2018-01-22 19:31             ` Junio C Hamano
  2018-01-24 11:05               ` Michael Haggerty
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2018-01-22 19:31 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jeff King, Kim Gybels, git, Johannes Schindelin

Michael Haggerty <mhagger@alum.mit.edu> writes:

> `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
> (see the earlier code path in `load_contents()`). So either that code
> path *also* has to get the `xmalloc()` treatment, or my third patch is
> still necessary. (My second patch wouldn't be necessary because the
> ENOENT case makes `load_contents()` return 0, triggering the early exit
> from `create_snapshot()`.)
>
> I don't have a strong preference either way.

Which would be a two-liner, like the attached, which does not look
too bad by itself.

The direction, if we take this approach, means that we are declaring
that .buf being NULL is an invalid state for a snapshot to be in,
instead of saying "an empty snapshot looks exactly like one that was
freshly initialized", which seems to be the intention of the original
design.

After Kim's fix and with 3/3 in your follow-up series, various
helpers are still unsafe against .buf being NULL, like
sort_snapshot(), verify_buffer_safe(), clear_snapshot_buffer() (only
when mmapped bit is set), find_reference_location().

packed_ref_iterator_begin() checks if snapshot->buf is NULL and
returns early.  At the first glance, this appears a useful short cut
to optimize the empty case away, but the check also is acting as a
guard to prevent a snapshot with NULL .buf from being fed to an
unsafe find_reference_location().  An implicit guard like this feels
a bit more brittle than my liking.  If we ensure .buf is never NULL,
that check can become a pure short-cut optimization and stop being
a correctness thing.

So...


 refs/packed-backend.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b6e2bc3c1d..1eeb5c7f80 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -473,12 +473,11 @@ static int load_contents(struct snapshot *snapshot)
 	if (fd < 0) {
 		if (errno == ENOENT) {
 			/*
-			 * This is OK; it just means that no
-			 * "packed-refs" file has been written yet,
-			 * which is equivalent to it being empty,
-			 * which is its state when initialized with
-			 * zeros.
+			 * Treat missing "packed-refs" as equivalent to
+			 * it being empty.
 			 */
+			snapshot->eof = snapshot->buf = xmalloc(0);
+			snapshot->mmapped = 0;
 			return 0;
 		} else {
 			die_errno("couldn't read %s", snapshot->refs->path);

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-22 19:31             ` Junio C Hamano
@ 2018-01-24 11:05               ` Michael Haggerty
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
  2018-01-24 18:05                 ` [PATCH v3] packed_ref_cache: don't use mmap() for small files Junio C Hamano
  0 siblings, 2 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-24 11:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Kim Gybels, git, Johannes Schindelin

On 01/22/2018 08:31 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> `snapshot->buf` can still be NULL if the `packed-refs` file didn't exist
>> (see the earlier code path in `load_contents()`). So either that code
>> path *also* has to get the `xmalloc()` treatment, or my third patch is
>> still necessary. (My second patch wouldn't be necessary because the
>> ENOENT case makes `load_contents()` return 0, triggering the early exit
>> from `create_snapshot()`.)
>>
>> I don't have a strong preference either way.
> 
> Which would be a two-liner, like the attached, which does not look
> too bad by itself.
> 
> The direction, if we take this approach, means that we are declaring
> that .buf being NULL is an invalid state for a snapshot to be in,
> instead of saying "an empty snapshot looks exactly like one that was
> freshly initialized", which seems to be the intention of the original
> design.
> 
> After Kim's fix and with 3/3 in your follow-up series, various
> helpers are still unsafe against .buf being NULL, like
> sort_snapshot(), verify_buffer_safe(), clear_snapshot_buffer() (only
> when mmapped bit is set), find_reference_location().
> 
> packed_ref_iterator_begin() checks if snapshot->buf is NULL and
> returns early.  At the first glance, this appears a useful short cut
> to optimize the empty case away, but the check also is acting as a
> guard to prevent a snapshot with NULL .buf from being fed to an
> unsafe find_reference_location().  An implicit guard like this feels
> a bit more brittle than my liking.  If we ensure .buf is never NULL,
> that check can become a pure short-cut optimization and stop being
> a correctness thing.
> 
> So...
> 
> 
>  refs/packed-backend.c | 9 ++++-----
>  1 file changed, 4 insertions(+), 5 deletions(-)
> 
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index b6e2bc3c1d..1eeb5c7f80 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> @@ -473,12 +473,11 @@ static int load_contents(struct snapshot *snapshot)
>  	if (fd < 0) {
>  		if (errno == ENOENT) {
>  			/*
> -			 * This is OK; it just means that no
> -			 * "packed-refs" file has been written yet,
> -			 * which is equivalent to it being empty,
> -			 * which is its state when initialized with
> -			 * zeros.
> +			 * Treat missing "packed-refs" as equivalent to
> +			 * it being empty.
>  			 */
> +			snapshot->eof = snapshot->buf = xmalloc(0);
> +			snapshot->mmapped = 0;
>  			return 0;
>  		} else {
>  			die_errno("couldn't read %s", snapshot->refs->path);
> 

That would work, though if you go this way, please also change the
docstring for `snapshot::buf`, which still says that `buf` and `eof` can
be `NULL`.

The other alternative, making `snapshot` safe for NULLs, becomes easier
if `snapshot` stores a pointer to the start of the reference section of
the `packed-refs` contents (i.e., after the header line), rather than
repeatedly computing that address from `snapshot->buf +
snapshot->header_len`. With this change, code that is technically
undefined when the fields are NULL can more easily be replaced with code
that is safe for NULL. For example,

    pos = snapshot->buf + snapshot->header_len

becomes

    pos = snapshot->start

, and

    len = snapshot->eof - pos;
    if (!len) [...]

becomes

    if (pos == snapshot->eof) [...]
    len = snapshot->eof - pos;

. In this way, most of the special-casing for NULL goes away (and some
code becomes simpler, as well).

In a moment I'll send a patch series illustrating this approach. I think
patches 01, 02, and 04 are improvements regardless of whether we decide
to make NULL safe.

The change to using `read()` rather than `mmap()` for small
`packed-refs` feels like it should be an improvement, but it occurred to
me that the performance numbers quoted in ea68b0ce9f8 (hash-object:
don't use mmap() for small files, 2010-02-21) are not directly
applicable to the `packed-refs` file. As far as I understand, the file
mmapped in `index_fd()` is always read in full, whereas the main point
of mmapping the packed-refs file is to avoid having to read the whole
file at all in some situations. That being said, a 32 KiB file would
only be 8 pages (assuming a page size of 4 KiB), and by the time you've
read the header and binary-searched to find the desired record, you've
probably paged in most of the file anyway. Reading the whole file at
once, in order, is almost certainly cheaper.

Michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 0/6] Yet another approach to handling empty snapshots
  2018-01-24 11:05               ` Michael Haggerty
@ 2018-01-24 11:14                 ` Michael Haggerty
  2018-01-24 11:14                   ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty
                                     ` (8 more replies)
  2018-01-24 18:05                 ` [PATCH v3] packed_ref_cache: don't use mmap() for small files Junio C Hamano
  1 sibling, 9 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty

This patch series fixes the handling of empty packed-refs snapshots
(i.e., those with `snapshot->buf` and friends equal to `NULL`), partly
by changing `snapshot` to store a pointer to the start of the
post-header `packed-refs` content instead of `header_len`. It makes a
couple of other improvements as well.

I'm not sure whether I like this approach better than the alternative
of always setting `snapshot->buf` to a non-NULL value, by allocating a
length-1 bit of RAM if necessary. The latter is less intrusive, though
even if that approach is taken, I think patches 01, 02, and 04 from
this patch series would be worthwhile improvements.

Michael

Kim Gybels (1):
  packed_ref_cache: don't use mmap() for small files

Michael Haggerty (5):
  struct snapshot: store `start` rather than `header_len`
  create_snapshot(): use `xmemdupz()` rather than a strbuf
  find_reference_location(): make function safe for empty snapshots
  packed_ref_iterator_begin(): make optimization more general
  load_contents(): don't try to mmap an empty file

 refs/packed-backend.c | 106 ++++++++++++++++++++++++++------------------------
 1 file changed, 55 insertions(+), 51 deletions(-)

-- 
2.14.2


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/6] struct snapshot: store `start` rather than `header_len`
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
@ 2018-01-24 11:14                   ` Michael Haggerty
  2018-01-24 20:36                     ` Jeff King
  2018-01-24 11:14                   ` [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf Michael Haggerty
                                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty

Store a pointer to the start of the actual references within the
`packed-refs` contents rather than storing the length of the header.
This is more convenient for most users of this field.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 64 ++++++++++++++++++++++++++-------------------------
 1 file changed, 33 insertions(+), 31 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 023243fd5f..b872267f02 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -68,17 +68,21 @@ struct snapshot {
 	int mmapped;
 
 	/*
-	 * The contents of the `packed-refs` file. If the file was
-	 * already sorted, this points at the mmapped contents of the
-	 * file. If not, this points at heap-allocated memory
-	 * containing the contents, sorted. If there were no contents
-	 * (e.g., because the file didn't exist), `buf` and `eof` are
-	 * both NULL.
+	 * The contents of the `packed-refs` file:
+	 *
+	 * - buf -- a pointer to the start of the memory
+	 * - start -- a pointer to the first byte of actual references
+         *   (i.e., after the header line, if one is present)
+	 * - eof -- a pointer just past the end of the reference
+         *   contents
+	 *
+	 * If the `packed-refs` file was already sorted, `buf` points
+	 * at the mmapped contents of the file. If not, it points at
+	 * heap-allocated memory containing the contents, sorted. If
+	 * there were no contents (e.g., because the file didn't
+	 * exist), `buf`, `start`, and `eof` are all NULL.
 	 */
-	char *buf, *eof;
-
-	/* The size of the header line, if any; otherwise, 0: */
-	size_t header_len;
+	char *buf, *start, *eof;
 
 	/*
 	 * What is the peeled state of the `packed-refs` file that
@@ -169,8 +173,7 @@ static void clear_snapshot_buffer(struct snapshot *snapshot)
 	} else {
 		free(snapshot->buf);
 	}
-	snapshot->buf = snapshot->eof = NULL;
-	snapshot->header_len = 0;
+	snapshot->buf = snapshot->start = snapshot->eof = NULL;
 }
 
 /*
@@ -319,13 +322,14 @@ static void sort_snapshot(struct snapshot *snapshot)
 	size_t len, i;
 	char *new_buffer, *dst;
 
-	pos = snapshot->buf + snapshot->header_len;
+	pos = snapshot->start;
 	eof = snapshot->eof;
-	len = eof - pos;
 
-	if (!len)
+	if (pos == eof)
 		return;
 
+	len = eof - pos;
+
 	/*
 	 * Initialize records based on a crude estimate of the number
 	 * of references in the file (we'll grow it below if needed):
@@ -391,9 +395,8 @@ static void sort_snapshot(struct snapshot *snapshot)
 	 * place:
 	 */
 	clear_snapshot_buffer(snapshot);
-	snapshot->buf = new_buffer;
+	snapshot->buf = snapshot->start = new_buffer;
 	snapshot->eof = new_buffer + len;
-	snapshot->header_len = 0;
 
 cleanup:
 	free(records);
@@ -442,14 +445,14 @@ static const char *find_end_of_record(const char *p, const char *end)
  */
 static void verify_buffer_safe(struct snapshot *snapshot)
 {
-	const char *buf = snapshot->buf + snapshot->header_len;
+	const char *start = snapshot->start;
 	const char *eof = snapshot->eof;
 	const char *last_line;
 
-	if (buf == eof)
+	if (start == eof)
 		return;
 
-	last_line = find_start_of_record(buf, eof - 1);
+	last_line = find_start_of_record(start, eof - 1);
 	if (*(eof - 1) != '\n' || eof - last_line < GIT_SHA1_HEXSZ + 2)
 		die_invalid_line(snapshot->refs->path,
 				 last_line, eof - last_line);
@@ -495,18 +498,19 @@ static int load_contents(struct snapshot *snapshot)
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
 			die_errno("couldn't read %s", snapshot->refs->path);
-		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 0;
 		break;
 	case MMAP_TEMPORARY:
 	case MMAP_OK:
 		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
-		snapshot->eof = snapshot->buf + size;
 		snapshot->mmapped = 1;
 		break;
 	}
 	close(fd);
 
+	snapshot->start = snapshot->buf;
+	snapshot->eof = snapshot->buf + size;
+
 	return 1;
 }
 
@@ -539,7 +543,7 @@ static const char *find_reference_location(struct snapshot *snapshot,
 	 * preceding records all have reference names that come
 	 * *before* `refname`.
 	 */
-	const char *lo = snapshot->buf + snapshot->header_len;
+	const char *lo = snapshot->start;
 
 	/*
 	 * A pointer to a the first character of a record whose
@@ -617,8 +621,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 	/* If the file has a header line, process it: */
 	if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') {
 		struct strbuf tmp = STRBUF_INIT;
-		char *p;
-		const char *eol;
+		char *p, *eol;
 		struct string_list traits = STRING_LIST_INIT_NODUP;
 
 		eol = memchr(snapshot->buf, '\n',
@@ -647,7 +650,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 		/* perhaps other traits later as well */
 
 		/* The "+ 1" is for the LF character. */
-		snapshot->header_len = eol + 1 - snapshot->buf;
+		snapshot->start = eol + 1;
 
 		string_list_clear(&traits, 0);
 		strbuf_release(&tmp);
@@ -671,13 +674,12 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 		 * We don't want to leave the file mmapped, so we are
 		 * forced to make a copy now:
 		 */
-		size_t size = snapshot->eof -
-			(snapshot->buf + snapshot->header_len);
+		size_t size = snapshot->eof - snapshot->start;
 		char *buf_copy = xmalloc(size);
 
-		memcpy(buf_copy, snapshot->buf + snapshot->header_len, size);
+		memcpy(buf_copy, snapshot->start, size);
 		clear_snapshot_buffer(snapshot);
-		snapshot->buf = buf_copy;
+		snapshot->buf = snapshot->start = buf_copy;
 		snapshot->eof = buf_copy + size;
 	}
 
@@ -937,7 +939,7 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	if (prefix && *prefix)
 		start = find_reference_location(snapshot, prefix, 0);
 	else
-		start = snapshot->buf + snapshot->header_len;
+		start = snapshot->start;
 
 	iter->pos = start;
 	iter->eof = snapshot->eof;
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
  2018-01-24 11:14                   ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty
@ 2018-01-24 11:14                   ` Michael Haggerty
  2018-01-24 11:14                   ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty
                                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty

It's lighter weight.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b872267f02..08698de6ea 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -620,8 +620,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 
 	/* If the file has a header line, process it: */
 	if (snapshot->buf < snapshot->eof && *snapshot->buf == '#') {
-		struct strbuf tmp = STRBUF_INIT;
-		char *p, *eol;
+		char *tmp, *p, *eol;
 		struct string_list traits = STRING_LIST_INIT_NODUP;
 
 		eol = memchr(snapshot->buf, '\n',
@@ -631,9 +630,9 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 					      snapshot->buf,
 					      snapshot->eof - snapshot->buf);
 
-		strbuf_add(&tmp, snapshot->buf, eol - snapshot->buf);
+		tmp = xmemdupz(snapshot->buf, eol - snapshot->buf);
 
-		if (!skip_prefix(tmp.buf, "# pack-refs with:", (const char **)&p))
+		if (!skip_prefix(tmp, "# pack-refs with:", (const char **)&p))
 			die_invalid_line(refs->path,
 					 snapshot->buf,
 					 snapshot->eof - snapshot->buf);
@@ -653,7 +652,7 @@ static struct snapshot *create_snapshot(struct packed_ref_store *refs)
 		snapshot->start = eol + 1;
 
 		string_list_clear(&traits, 0);
-		strbuf_release(&tmp);
+		free(tmp);
 	}
 
 	verify_buffer_safe(snapshot);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/6] find_reference_location(): make function safe for empty snapshots
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
  2018-01-24 11:14                   ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty
  2018-01-24 11:14                   ` [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf Michael Haggerty
@ 2018-01-24 11:14                   ` Michael Haggerty
  2018-01-24 20:27                     ` Jeff King
  2018-01-24 11:14                   ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty
                                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty

This function had two problems if called for an empty snapshot (i.e.,
`snapshot->start == snapshot->eof == NULL`):

* It checked `NULL < NULL`, which is undefined by C (albeit highly
  unlikely to fail in the real world).

* (Assuming the above comparison behaved as expected), it returned
  NULL when `mustexist` was false, contrary to its docstring.

Change the check and fix the docstring.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 08698de6ea..361affd7ad 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -519,9 +519,11 @@ static int load_contents(struct snapshot *snapshot)
  * `refname` starts. If `mustexist` is true and the reference doesn't
  * exist, then return NULL. If `mustexist` is false and the reference
  * doesn't exist, then return the point where that reference would be
- * inserted. In the latter mode, `refname` doesn't have to be a proper
- * reference name; for example, one could search for "refs/replace/"
- * to find the start of any replace references.
+ * inserted, or `snapshot->eof` (which might be NULL) if it would be
+ * inserted at the end of the file. In the latter mode, `refname`
+ * doesn't have to be a proper reference name; for example, one could
+ * search for "refs/replace/" to find the start of any replace
+ * references.
  *
  * The record is sought using a binary search, so `snapshot->buf` must
  * be sorted.
@@ -551,7 +553,7 @@ static const char *find_reference_location(struct snapshot *snapshot,
 	 */
 	const char *hi = snapshot->eof;
 
-	while (lo < hi) {
+	while (lo != hi) {
 		const char *mid, *rec;
 		int cmp;
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
                                     ` (2 preceding siblings ...)
  2018-01-24 11:14                   ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty
@ 2018-01-24 11:14                   ` Michael Haggerty
  2018-01-24 20:32                     ` Jeff King
  2018-01-24 11:14                   ` [PATCH 5/6] load_contents(): don't try to mmap an empty file Michael Haggerty
                                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 33+ messages in thread
From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty

We can return an empty iterator not only if the `packed-refs` file is
missing, but also if it is empty or if there are no references whose
names succeed `prefix`. Optimize away those cases as well by moving
the call to `find_reference_location()` higher in the function and
checking whether the determined start position is the same as
`snapshot->eof`. (This is possible now because the previous commit
made `find_reference_location()` robust against empty snapshots.)

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 361affd7ad..988c45402b 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -927,7 +927,12 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 */
 	snapshot = get_snapshot(refs);
 
-	if (!snapshot->buf)
+	if (prefix && *prefix)
+		start = find_reference_location(snapshot, prefix, 0);
+	else
+		start = snapshot->start;
+
+	if (start == snapshot->eof)
 		return empty_ref_iterator_begin();
 
 	iter = xcalloc(1, sizeof(*iter));
@@ -937,11 +942,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	iter->snapshot = snapshot;
 	acquire_snapshot(snapshot);
 
-	if (prefix && *prefix)
-		start = find_reference_location(snapshot, prefix, 0);
-	else
-		start = snapshot->start;
-
 	iter->pos = start;
 	iter->eof = snapshot->eof;
 	strbuf_init(&iter->refname_buf, 0);
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/6] load_contents(): don't try to mmap an empty file
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
                                     ` (3 preceding siblings ...)
  2018-01-24 11:14                   ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty
@ 2018-01-24 11:14                   ` Michael Haggerty
  2018-01-24 11:14                   ` [PATCH 6/6] packed_ref_cache: don't use mmap() for small files Michael Haggerty
                                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty

We don't actually create zero-length `packed-refs` files, but they are
valid and we should handle them correctly. The old code `xmmap()`ed
such files, which led to an error when `munmap()` was called. So, if
the `packed-refs` file is empty, leave the snapshot at its zero values
and return 0 without trying to read or mmap the file.

Returning 0 also makes `create_snapshot()` exit early, which avoids
the technically undefined comparison `NULL < NULL`.

Reported-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 988c45402b..e829cf206d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -461,7 +461,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
- * existed and was read, or 0 if the file was absent. Die on errors.
+ * existed and was read, or 0 if the file was absent or empty. Die on
+ * errors.
  */
 static int load_contents(struct snapshot *snapshot)
 {
@@ -492,19 +493,17 @@ static int load_contents(struct snapshot *snapshot)
 		die_errno("couldn't stat %s", snapshot->refs->path);
 	size = xsize_t(st.st_size);
 
-	switch (mmap_strategy) {
-	case MMAP_NONE:
+	if (!size) {
+		return 0;
+	} else if (mmap_strategy == MMAP_NONE) {
 		snapshot->buf = xmalloc(size);
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
 			die_errno("couldn't read %s", snapshot->refs->path);
 		snapshot->mmapped = 0;
-		break;
-	case MMAP_TEMPORARY:
-	case MMAP_OK:
+	} else {
 		snapshot->buf = xmmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 		snapshot->mmapped = 1;
-		break;
 	}
 	close(fd);
 
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/6] packed_ref_cache: don't use mmap() for small files
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
                                     ` (4 preceding siblings ...)
  2018-01-24 11:14                   ` [PATCH 5/6] load_contents(): don't try to mmap an empty file Michael Haggerty
@ 2018-01-24 11:14                   ` Michael Haggerty
  2018-01-24 20:38                   ` [PATCH 0/6] Yet another approach to handling empty snapshots Jeff King
                                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 33+ messages in thread
From: Michael Haggerty @ 2018-01-24 11:14 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kim Gybels, Johannes Schindelin, Jeff King, git, Michael Haggerty

From: Kim Gybels <kgybels@infogroep.be>

Take a hint from commit ea68b0ce9f8 (hash-object: don't use mmap() for
small files, 2010-02-21) and use read() instead of mmap() for small
packed-refs files.

Signed-off-by: Kim Gybels <kgybels@infogroep.be>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e829cf206d..8b4b45da67 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -458,6 +458,8 @@ static void verify_buffer_safe(struct snapshot *snapshot)
 				 last_line, eof - last_line);
 }
 
+#define SMALL_FILE_SIZE (32*1024)
+
 /*
  * Depending on `mmap_strategy`, either mmap or read the contents of
  * the `packed-refs` file into the snapshot. Return 1 if the file
@@ -495,7 +497,7 @@ static int load_contents(struct snapshot *snapshot)
 
 	if (!size) {
 		return 0;
-	} else if (mmap_strategy == MMAP_NONE) {
+	} else if (mmap_strategy == MMAP_NONE || size <= SMALL_FILE_SIZE) {
 		snapshot->buf = xmalloc(size);
 		bytes_read = read_in_full(fd, snapshot->buf, size);
 		if (bytes_read < 0 || bytes_read != size)
-- 
2.14.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v3] packed_ref_cache: don't use mmap() for small files
  2018-01-24 11:05               ` Michael Haggerty
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
@ 2018-01-24 18:05                 ` Junio C Hamano
  1 sibling, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2018-01-24 18:05 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Jeff King, Kim Gybels, git, Johannes Schindelin

Michael Haggerty <mhagger@alum.mit.edu> writes:

> The change to using `read()` rather than `mmap()` for small
> `packed-refs` feels like it should be an improvement, but it occurred to
> me that the performance numbers quoted in ea68b0ce9f8 (hash-object:
> don't use mmap() for small files, 2010-02-21) are not directly
> applicable to the `packed-refs` file. As far as I understand, the file
> mmapped in `index_fd()` is always read in full, whereas the main point
> of mmapping the packed-refs file is to avoid having to read the whole
> file at all in some situations. That being said, a 32 KiB file would
> only be 8 pages (assuming a page size of 4 KiB), and by the time you've
> read the header and binary-searched to find the desired record, you've
> probably paged in most of the file anyway. Reading the whole file at
> once, in order, is almost certainly cheaper.

Yup.  So unless your "small" is meaningfully large, we are likely to
be better off with read(2), but I suspect that this might not be
even measuable since we are only talking about "small" files.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] find_reference_location(): make function safe for empty snapshots
  2018-01-24 11:14                   ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty
@ 2018-01-24 20:27                     ` Jeff King
  2018-01-24 21:11                       ` Junio C Hamano
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2018-01-24 20:27 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Johannes Schindelin, git

On Wed, Jan 24, 2018 at 12:14:13PM +0100, Michael Haggerty wrote:

> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 08698de6ea..361affd7ad 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> [...]
> @@ -551,7 +553,7 @@ static const char *find_reference_location(struct snapshot *snapshot,
>  	 */
>  	const char *hi = snapshot->eof;
>  
> -	while (lo < hi) {
> +	while (lo != hi) {
>  		const char *mid, *rec;
>  		int cmp;

This tightens the binary search termination condition. If we ever did
see "hi > lo", we'd want to terminate the loop. Is that ever possible?

I think the answer is "no". Our "hi" here is an exclusive bound, so we
should never go past it via find_end_of_record() when assigning "lo".
And "hi" is always assigned from the start of the current record. That
can never cross "lo", because find_start_of_record() ensures it.

So I think it's fine, but I wanted to double check.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general
  2018-01-24 11:14                   ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty
@ 2018-01-24 20:32                     ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2018-01-24 20:32 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Johannes Schindelin, git

On Wed, Jan 24, 2018 at 12:14:14PM +0100, Michael Haggerty wrote:

> We can return an empty iterator not only if the `packed-refs` file is
> missing, but also if it is empty or if there are no references whose
> names succeed `prefix`. Optimize away those cases as well by moving
> the call to `find_reference_location()` higher in the function and
> checking whether the determined start position is the same as
> `snapshot->eof`. (This is possible now because the previous commit
> made `find_reference_location()` robust against empty snapshots.)

Makes sense.

> @@ -937,11 +942,6 @@ static struct ref_iterator *packed_ref_iterator_begin(
>  	iter->snapshot = snapshot;
>  	acquire_snapshot(snapshot);
>  
> -	if (prefix && *prefix)
> -		start = find_reference_location(snapshot, prefix, 0);
> -	else
> -		start = snapshot->start;
> -

I did a double-take here that we are now looking at the snapshot without
calling acquire_snapshot(). But that function is just about taking a
refcount on it. The actual acquisition of data happens in
get_snapshot().

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/6] struct snapshot: store `start` rather than `header_len`
  2018-01-24 11:14                   ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty
@ 2018-01-24 20:36                     ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2018-01-24 20:36 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Johannes Schindelin, git

On Wed, Jan 24, 2018 at 12:14:11PM +0100, Michael Haggerty wrote:

> Store a pointer to the start of the actual references within the
> `packed-refs` contents rather than storing the length of the header.
> This is more convenient for most users of this field.

This makes sense. It means that the "start" pointer needs to be
invalidated if "buf" ever changes. But that was pretty much the case
already with "header_len" (because "buf" is not a heap buffer that we
might realloc; if it ever changes it is because we re-read the file, and
we would have to re-parse the header length anyway).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/6] Yet another approach to handling empty snapshots
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
                                     ` (5 preceding siblings ...)
  2018-01-24 11:14                   ` [PATCH 6/6] packed_ref_cache: don't use mmap() for small files Michael Haggerty
@ 2018-01-24 20:38                   ` Jeff King
  2018-01-24 20:54                   ` Junio C Hamano
  2018-02-15 16:54                   ` Johannes Schindelin
  8 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2018-01-24 20:38 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Johannes Schindelin, git

On Wed, Jan 24, 2018 at 12:14:10PM +0100, Michael Haggerty wrote:

> This patch series fixes the handling of empty packed-refs snapshots
> (i.e., those with `snapshot->buf` and friends equal to `NULL`), partly
> by changing `snapshot` to store a pointer to the start of the
> post-header `packed-refs` content instead of `header_len`. It makes a
> couple of other improvements as well.
> 
> I'm not sure whether I like this approach better than the alternative
> of always setting `snapshot->buf` to a non-NULL value, by allocating a
> length-1 bit of RAM if necessary. The latter is less intrusive, though
> even if that approach is taken, I think patches 01, 02, and 04 from
> this patch series would be worthwhile improvements.

This looks good to me. I agree that 1, 2, and 4 are improvements
regardless (but 4 as it is now depends on 3, right?).

I don't have a strong opinion between this series and the other options
presented. It's probably not worth agonizing over, so we should pick one
and move on.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/6] Yet another approach to handling empty snapshots
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
                                     ` (6 preceding siblings ...)
  2018-01-24 20:38                   ` [PATCH 0/6] Yet another approach to handling empty snapshots Jeff King
@ 2018-01-24 20:54                   ` Junio C Hamano
  2018-02-15 16:54                   ` Johannes Schindelin
  8 siblings, 0 replies; 33+ messages in thread
From: Junio C Hamano @ 2018-01-24 20:54 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Kim Gybels, Johannes Schindelin, Jeff King, git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> This patch series fixes the handling of empty packed-refs snapshots
> (i.e., those with `snapshot->buf` and friends equal to `NULL`), partly
> by changing `snapshot` to store a pointer to the start of the
> post-header `packed-refs` content instead of `header_len`. It makes a
> couple of other improvements as well.
>
> I'm not sure whether I like this approach better than the alternative
> of always setting `snapshot->buf` to a non-NULL value, by allocating a
> length-1 bit of RAM if necessary. The latter is less intrusive, though
> even if that approach is taken, I think patches 01, 02, and 04 from
> this patch series would be worthwhile improvements.

I do not have a strong preference either way, but somehow feel that
this is more "coherent" ;-)  That is certainly subjective, though.

Thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] find_reference_location(): make function safe for empty snapshots
  2018-01-24 20:27                     ` Jeff King
@ 2018-01-24 21:11                       ` Junio C Hamano
  2018-01-24 21:34                         ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2018-01-24 21:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael Haggerty, Kim Gybels, Johannes Schindelin, git

Jeff King <peff@peff.net> writes:

> On Wed, Jan 24, 2018 at 12:14:13PM +0100, Michael Haggerty wrote:
>
>> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
>> index 08698de6ea..361affd7ad 100644
>> --- a/refs/packed-backend.c
>> +++ b/refs/packed-backend.c
>> [...]
>> @@ -551,7 +553,7 @@ static const char *find_reference_location(struct snapshot *snapshot,
>>  	 */
>>  	const char *hi = snapshot->eof;
>>  
>> -	while (lo < hi) {
>> +	while (lo != hi) {
>>  		const char *mid, *rec;
>>  		int cmp;
>
> This tightens the binary search termination condition. If we ever did
> see "hi > lo", we'd want to terminate the loop. Is that ever possible?

I think you meant "lo > hi", but I shared the same "Huh?" moment.

Because "While lo is strictly lower than hi" is a so well
established binary search pattern, even though we know that it is
equivalent to "While lo and hi is different" due to your analysis
below, the new code looks somewhat strange at the first glance.

> I think the answer is "no". Our "hi" here is an exclusive bound, so we
> should never go past it via find_end_of_record() when assigning "lo".
> And "hi" is always assigned from the start of the current record. That
> can never cross "lo", because find_start_of_record() ensures it.
>
> So I think it's fine, but I wanted to double check.

It would be much simpler to reason about if we instead do

	#define is_empty_snapshot(s) ((s)->start == NULL)

	if (is_empty_snapshot(snapshot))
		return NULL;

or something like that upfront.
	


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/6] find_reference_location(): make function safe for empty snapshots
  2018-01-24 21:11                       ` Junio C Hamano
@ 2018-01-24 21:34                         ` Jeff King
  0 siblings, 0 replies; 33+ messages in thread
From: Jeff King @ 2018-01-24 21:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, Kim Gybels, Johannes Schindelin, git

On Wed, Jan 24, 2018 at 01:11:00PM -0800, Junio C Hamano wrote:

> > This tightens the binary search termination condition. If we ever did
> > see "hi > lo", we'd want to terminate the loop. Is that ever possible?
> 
> I think you meant "lo > hi", but I shared the same "Huh?" moment.

Er, yeah. Sorry about that.

> Because "While lo is strictly lower than hi" is a so well
> established binary search pattern, even though we know that it is
> equivalent to "While lo and hi is different" due to your analysis
> below, the new code looks somewhat strange at the first glance.

I thought at first that this was due to the way the record-finding
happens, but I think even in our normal binary searches, it is an
invariant that "lo <= hi".

> > I think the answer is "no". Our "hi" here is an exclusive bound, so we
> > should never go past it via find_end_of_record() when assigning "lo".
> > And "hi" is always assigned from the start of the current record. That
> > can never cross "lo", because find_start_of_record() ensures it.
> >
> > So I think it's fine, but I wanted to double check.
> 
> It would be much simpler to reason about if we instead do
> 
> 	#define is_empty_snapshot(s) ((s)->start == NULL)
> 
> 	if (is_empty_snapshot(snapshot))
> 		return NULL;
> 
> or something like that upfront.

Yes, I agree that would also work.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 0/6] Yet another approach to handling empty snapshots
  2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
                                     ` (7 preceding siblings ...)
  2018-01-24 20:54                   ` Junio C Hamano
@ 2018-02-15 16:54                   ` Johannes Schindelin
  8 siblings, 0 replies; 33+ messages in thread
From: Johannes Schindelin @ 2018-02-15 16:54 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: Junio C Hamano, Kim Gybels, Jeff King, git

Hi Michael,

On Wed, 24 Jan 2018, Michael Haggerty wrote:

> This patch series fixes the handling of empty packed-refs snapshots
> (i.e., those with `snapshot->buf` and friends equal to `NULL`), partly
> by changing `snapshot` to store a pointer to the start of the
> post-header `packed-refs` content instead of `header_len`. It makes a
> couple of other improvements as well.
> 
> I'm not sure whether I like this approach better than the alternative
> of always setting `snapshot->buf` to a non-NULL value, by allocating a
> length-1 bit of RAM if necessary. The latter is less intrusive, though
> even if that approach is taken, I think patches 01, 02, and 04 from
> this patch series would be worthwhile improvements.

Thank you for Cc:ing me on this patch series. I tried to find some time to
review it, I really did, but failed. As I saw that others already had a
good look at it, I will just archive the mail thread.

I hope you do not mind!

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2018-02-15 16:54 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-13 16:11 [PATCH] packed_ref_cache: don't use mmap() for small files Kim Gybels
2018-01-13 18:56 ` Johannes Schindelin
2018-01-14 19:14 ` [PATCH v2] " Kim Gybels
2018-01-15 12:17   ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Michael Haggerty
2018-01-15 12:17     ` [PATCH 1/3] SQUASH? Mention that `snapshot::buf` can be NULL for empty files Michael Haggerty
2018-01-15 12:17     ` [PATCH 2/3] create_snapshot(): exit early if the file was empty Michael Haggerty
2018-01-15 12:17     ` [PATCH 3/3] find_reference_location(): don't invoke if `snapshot->buf` is NULL Michael Haggerty
2018-01-17 20:23     ` [PATCH 0/3] Supplements to "packed_ref_cache: don't use mmap() for small files" Johannes Schindelin
2018-01-17 21:52     ` Junio C Hamano
2018-01-15 21:15 ` [PATCH] packed_ref_cache: don't use mmap() for small files Jeff King
2018-01-15 23:37   ` Kim Gybels
2018-01-15 23:52     ` Jeff King
2018-01-16 19:38       ` [PATCH v3] " Kim Gybels
2018-01-17 22:09         ` Jeff King
2018-01-21  4:41           ` Michael Haggerty
2018-01-22 19:31             ` Junio C Hamano
2018-01-24 11:05               ` Michael Haggerty
2018-01-24 11:14                 ` [PATCH 0/6] Yet another approach to handling empty snapshots Michael Haggerty
2018-01-24 11:14                   ` [PATCH 1/6] struct snapshot: store `start` rather than `header_len` Michael Haggerty
2018-01-24 20:36                     ` Jeff King
2018-01-24 11:14                   ` [PATCH 2/6] create_snapshot(): use `xmemdupz()` rather than a strbuf Michael Haggerty
2018-01-24 11:14                   ` [PATCH 3/6] find_reference_location(): make function safe for empty snapshots Michael Haggerty
2018-01-24 20:27                     ` Jeff King
2018-01-24 21:11                       ` Junio C Hamano
2018-01-24 21:34                         ` Jeff King
2018-01-24 11:14                   ` [PATCH 4/6] packed_ref_iterator_begin(): make optimization more general Michael Haggerty
2018-01-24 20:32                     ` Jeff King
2018-01-24 11:14                   ` [PATCH 5/6] load_contents(): don't try to mmap an empty file Michael Haggerty
2018-01-24 11:14                   ` [PATCH 6/6] packed_ref_cache: don't use mmap() for small files Michael Haggerty
2018-01-24 20:38                   ` [PATCH 0/6] Yet another approach to handling empty snapshots Jeff King
2018-01-24 20:54                   ` Junio C Hamano
2018-02-15 16:54                   ` Johannes Schindelin
2018-01-24 18:05                 ` [PATCH v3] packed_ref_cache: don't use mmap() for small files Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).