All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] address packed-refs speed regressions
@ 2015-04-05  1:06 Jeff King
  2015-04-05  1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King
                   ` (7 more replies)
  0 siblings, 8 replies; 44+ messages in thread
From: Jeff King @ 2015-04-05  1:06 UTC (permalink / raw)
  To: git

As I've mentioned before, I have some repositories with rather large
numbers of refs. The worst one has ~13 million refs, for a 1.6GB
packed-refs file. So I was saddened by this:

  $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1
  real    0m6.840s
  user    0m6.404s
  sys     0m0.440s

  $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1
  real    0m19.432s
  user    0m18.996s
  sys     0m0.456s

The command isn't important; what I'm really measuring is loading the
packed-refs file. And yes, of course this repository is absolutely
ridiculous. But the slowdowns here are linear with the number of refs.
So _every_ git command got a little bit slower, even in less crazy
repositories. We just didn't notice it as much.

Here are the numbers after this series:

  real    0m8.539s
  user    0m8.052s
  sys     0m0.496s

Much better, but I'm frustrated that they are still 20% slower than the
original.

The main culprits seem to be d0f810f (which introduced some extra
expensive code for each ref) and my 10c497a, which switched from fgets()
to strbuf_getwholeline. It turns out that strbuf_getwholeline is really
slow.

There may be other problems lurking to account for the remaining 20%.
It's hard to find performance regressions with a bisection if there are
multiple of them; if you stop at a random commit and it is 500ms slow,
it is hard to tell which problem is causing it.

Note that while these are regressions, they are in v2.2.0 and v2.2.2
respectively. So this can wait until post-2.4.

  [1/6]: strbuf_getwholeline: use getc macro
  [2/6]: git-compat-util: add fallbacks for unlocked stdio
  [3/6]: strbuf_getwholeline: use get_unlocked
  [4/6]: strbuf: add an optimized 1-character strbuf_grow
  [5/6]: t1430: add another refs-escape test
  [6/6]: refname_is_safe: avoid expensive normalize_path_copy call

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/6] strbuf_getwholeline: use getc macro
  2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
@ 2015-04-05  1:07 ` Jeff King
  2015-04-05  1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-05  1:07 UTC (permalink / raw)
  To: git

strbuf_getwholeline calls fgetc in a tight loop. Using the
getc form, which can be implemented as a macro, should be
faster (and we do not care about it evaluating our argument
twice, as we just have a plain variable).

On my glibc system, running "git rev-parse
refs/heads/does-not-exist" on a file with an extremely large
(1.6GB) packed-refs file went from (best of 3 runs):

  real    0m19.383s
  user    0m18.876s
  sys     0m0.528s

to:

  real    0m18.900s
  user    0m18.472s
  sys     0m0.448s

for a wall-clock speedup of 2.5%.

Signed-off-by: Jeff King <peff@peff.net>
---
Not that exciting a speedup. But later we will switch to getc_unlocked,
and I wanted to measure how much of that was coming from the macro, and
how much from the locking.

 strbuf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 88cafd4..14f337d 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -443,7 +443,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 		return EOF;
 
 	strbuf_reset(sb);
-	while ((ch = fgetc(fp)) != EOF) {
+	while ((ch = getc(fp)) != EOF) {
 		strbuf_grow(sb, 1);
 		sb->buf[sb->len++] = ch;
 		if (ch == term)
-- 
2.4.0.rc0.363.gf9f328b

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio
  2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
  2015-04-05  1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King
@ 2015-04-05  1:08 ` Jeff King
  2015-04-05  1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-05  1:08 UTC (permalink / raw)
  To: git

POSIX.1-2001 specifies some functions for optimizing the
locking out of tight getc() loops. Not all systems are
POSIX, though, and even not all POSIX systems are required
to implement these functions. We can check for the
feature-test macro to see if they are available, and if not,
provide a noop implementation.

There's no Makefile knob here, because we should just detect
this automatically. If there are very bizarre systems, we
may need to add one, but it's not clear yet in which
direction:

  1. If a system defines _POSIX_THREAD_SAFE_FUNCTIONS but
     these functions are missing or broken, we would want a
     knob to manually turn them off.

  2. If a system has these functions but does not define
     _POSIX_THREAD_SAFE_FUNCTIONS, we would want a knob to
     manually turn them on.

We can add such a knob when we find a real-world system that
matches this.

Signed-off-by: Jeff King <peff@peff.net>
---
 git-compat-util.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index bc8fc8c..685a0a4 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -883,4 +883,10 @@ struct tm *git_gmtime_r(const time_t *, struct tm *);
 # define SHELL_PATH "/bin/sh"
 #endif
 
+#ifndef _POSIX_THREAD_SAFE_FUNCTIONS
+#define flockfile(fh)
+#define funlockfile(fh)
+#define getc_unlocked(fh) getc(fh)
+#endif
+
 #endif
-- 
2.4.0.rc0.363.gf9f328b

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
  2015-04-05  1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King
  2015-04-05  1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King
@ 2015-04-05  1:11 ` Jeff King
  2015-04-05  4:56   ` Jeff King
  2015-04-05  1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-04-05  1:11 UTC (permalink / raw)
  To: git

strbuf_getwholeline calls getc in a tight loop. On modern
libc implementations, the stdio code locks the handle for
every operation, which means we are paying a significant
overhead.  We can get around this by locking the handle for
the whole loop and using the unlocked variant.

Running "git rev-parse refs/heads/does-not-exist" on a repo
with an extremely large (1.6GB) packed-refs file went from:

  real    0m18.900s
  user    0m18.472s
  sys     0m0.448s

to:

  real    0m10.953s
  user    0m10.384s
  sys     0m0.580s

for a wall-clock speedup of 42%. All times are best-of-3,
and done on a glibc 2.19 system.

Note that we call into strbuf_grow while holding the lock.
It's possible for that function to call other stdio
functions (e.g., printing to stderr when dying due to malloc
error); however, the POSIX.1-2001 definition of flockfile
makes it clear that the locks are per-handle, so we are fine
unless somebody else tries to read from our same handle.
This doesn't ever happen in the current code, and is
unlikely to be added in the future (we would have to do
something exotic like add a die_routine that tried to read
from stdin).

Signed-off-by: Jeff King <peff@peff.net>
---
I don't think the complexity is worth it, but if we wanted to be more
careful about the locks, I think it would probably involve growing the
buffer, locking, doing unlocked reads until it's full, and then
unlocking for the next round of growth.

I also considered optimizing the "term == '\n'" case by using fgets, but
it gets rather complex (you have to pick a size, fgets into it, and then
keep going if you didn't get a newline). Also, fgets sucks, because you
have to call strlen() immediately after to find out how many bytes you
got!

 strbuf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 14f337d..af2bad4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -443,12 +443,14 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 		return EOF;
 
 	strbuf_reset(sb);
-	while ((ch = getc(fp)) != EOF) {
+	flockfile(fp);
+	while ((ch = getc_unlocked(fp)) != EOF) {
 		strbuf_grow(sb, 1);
 		sb->buf[sb->len++] = ch;
 		if (ch == term)
 			break;
 	}
+	funlockfile(fp);
 	if (ch == EOF && sb->len == 0)
 		return EOF;
 
-- 
2.4.0.rc0.363.gf9f328b

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow
  2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
                   ` (2 preceding siblings ...)
  2015-04-05  1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King
@ 2015-04-05  1:11 ` Jeff King
  2015-04-06  2:13   ` Eric Sunshine
  2015-04-05  1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-04-05  1:11 UTC (permalink / raw)
  To: git

We have to call strbuf_grow anytime we are going to add data
to a strbuf. In most cases, it's a noop (since we grow the
buffer aggressively), and the cost of the function call and
size check is dwarfed by the actual buffer operation.

For a tight loop of single-character additions, though, this
overhead is noticeable. Furthermore, the single-character
case is much easier to check; since the "extra" parameter is
1, we can do it without worrying about overflow.

This patch adds a simple inline function for checking
single-character growth. For the growth case, it just calls
into the regular strbuf_grow(). This is redundant, as
strbuf_grow will check again whether we need to grow. But it
keeps our inline code simple, and most calls will not need
to grow, so it's OK to treat this as a rare "slow path".

We apply the new function to strbuf_getwholeline. Running
"git rev-parse refs/heads/does-not-exist" on a repo with an
extremely large (1.6GB) packed-refs file went from
(best-of-3):

  real    0m10.953s
  user    0m10.384s
  sys     0m0.580s

to:

  real    0m8.910s
  user    0m8.452s
  sys     0m0.468s

for a wall-clock speedup of 18%.

Signed-off-by: Jeff King <peff@peff.net>
---
 strbuf.c | 2 +-
 strbuf.h | 9 +++++++++
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index af2bad4..2facd5f 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -445,7 +445,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 	strbuf_reset(sb);
 	flockfile(fp);
 	while ((ch = getc_unlocked(fp)) != EOF) {
-		strbuf_grow(sb, 1);
+		strbuf_grow_ch(sb);
 		sb->buf[sb->len++] = ch;
 		if (ch == term)
 			break;
diff --git a/strbuf.h b/strbuf.h
index 1883494..ef41151 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -137,6 +137,15 @@ static inline size_t strbuf_avail(const struct strbuf *sb)
  */
 extern void strbuf_grow(struct strbuf *, size_t);
 
+/*
+ * An optimized version of strbuf_grow() for a single character.
+ */
+static inline void strbuf_grow_ch(struct strbuf *sb)
+{
+	if (!sb->alloc || sb->alloc - 1 <= sb->len)
+		strbuf_grow(sb, 1);
+}
+
 /**
  * Set the length of the buffer to a given value. This function does *not*
  * allocate new memory, so you should not perform a `strbuf_setlen()` to a
-- 
2.4.0.rc0.363.gf9f328b

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 5/6] t1430: add another refs-escape test
  2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
                   ` (3 preceding siblings ...)
  2015-04-05  1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King
@ 2015-04-05  1:11 ` Jeff King
  2015-04-05  1:15 ` [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call Jeff King
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-05  1:11 UTC (permalink / raw)
  To: git

In t1430, we check whether deleting the branch "../../foo"
will delete ".git/foo". However, this is not that
interesting a test; the precious file ".git/foo" does not
look like a ref, so even if we did not notice the "escape"
from the "refs/" hierarchy, we would fail for that reason
(i.e., if you turned refname_is_safe into a noop, the test
still passes).

Let's add an additional test for the same thing, but with a
file that actually looks like a ref. That will make sure we
are exercising the refname_is_safe code. While we're at it,
let's also make the code work a little harder by adding some
extra paths and some empty path components.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1430-bad-ref-name.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
index 468e856..16d0b8b 100755
--- a/t/t1430-bad-ref-name.sh
+++ b/t/t1430-bad-ref-name.sh
@@ -68,6 +68,14 @@ test_expect_success 'branch -D cannot delete non-ref in .git dir' '
 	test_cmp expect .git/my-private-file
 '
 
+test_expect_success 'branch -D cannot delete ref in .git dir' '
+	git rev-parse HEAD >.git/my-private-file &&
+	git rev-parse HEAD >expect &&
+	git branch foo/legit &&
+	test_must_fail git branch -D foo////./././../../../my-private-file &&
+	test_cmp expect .git/my-private-file
+'
+
 test_expect_success 'branch -D cannot delete absolute path' '
 	git branch -f extra &&
 	test_must_fail git branch -D "$(pwd)/.git/refs/heads/extra" &&
-- 
2.4.0.rc0.363.gf9f328b

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call
  2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
                   ` (4 preceding siblings ...)
  2015-04-05  1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King
@ 2015-04-05  1:15 ` Jeff King
  2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
  7 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-05  1:15 UTC (permalink / raw)
  To: git

Usually refs are not allowed to contain a ".." component.
However, since d0f810f (refs.c: allow listing and deleting
badly named refs, 2014-09-03), we relax these rules in some
cases in order to help users examine and get rid of
badly-named refs. However, we do still check that these refs
cannot "escape" the refs hierarchy (e.g., "refs/../foo").

This check is implemented by calling normalize_path_copy,
which requires us to allocate a new buffer to hold the
result. But we don't care about the result; we only care
whether the "up" components outnumbered the "down".

We can therefore implement this check ourselves without
requiring any extra allocations. With this patch, running
"git rev-parse refs/heads/does-not-exist" on a repo with
large (1.6GB) packed-refs file went from:

  real    0m8.910s
  user    0m8.452s
  sys     0m0.468s

to:

  real    0m8.529s
  user    0m8.044s
  sys     0m0.492s

for a wall-clock speedup of 4%.

Signed-off-by: Jeff King <peff@peff.net>
---
This was a lot less than I was hoping for, especially considering that
going from d0f810f^ to d0f810f is more like a 15% slowdown (or in
absolute numbers, ~1.1s versus only 400ms here). What's doubly confusing
is that I think we were running check_ref_format before d0f810f, which
does way more than the check we're doing here. So we should have ended
up faster than either.

So this is certainly _an_ improvement, but I think there may be more
going on.

 cache.h |  7 +++++++
 path.c  | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 refs.c  | 16 ++--------------
 3 files changed, 73 insertions(+), 14 deletions(-)

diff --git a/cache.h b/cache.h
index 3d3244b..c2b3df4 100644
--- a/cache.h
+++ b/cache.h
@@ -836,6 +836,13 @@ char *strip_path_suffix(const char *path, const char *suffix);
 int daemon_avoid_alias(const char *path);
 extern int is_ntfs_dotgit(const char *name);
 
+/**
+ * Returns 1 if the path "escapes" its root (e.g., "foo/../../bar") and 0
+ * otherwise. Note that this is purely textual, and does not care about
+ * symlinks or other aspects of the filesystem.
+ */
+int check_path_escape(const char *path);
+
 /* object replacement */
 #define LOOKUP_REPLACE_OBJECT 1
 extern void *read_sha1_file_extended(const unsigned char *sha1, enum object_type *type, unsigned long *size, unsigned flag);
diff --git a/path.c b/path.c
index e608993..1127cbd 100644
--- a/path.c
+++ b/path.c
@@ -703,6 +703,70 @@ int normalize_path_copy(char *dst, const char *src)
 }
 
 /*
+ * We want to detect a path that "escapes" its root. The general strategy
+ * is to parse components left to right, keeping track of our depth,
+ * which is increased by non-empty components and decreased by ".."
+ * components.
+ */
+int check_path_escape(const char *path)
+{
+	int depth = 0;
+
+	while (*path) {
+		char ch = *path++;
+
+		/*
+		 * We always start our loop at the beginning of a path component. So
+		 * we can skip past any dir separators. This handles leading
+		 * "/", as well as any internal "////".
+		 */
+		if (is_dir_sep(ch))
+			continue;
+
+		/*
+		 * If we start with a dot, we care about the four cases
+		 * (similar to normalize_path_copy above):
+		 *
+		 *  (1)  "."  - does not affect depth; we are done
+		 *  (2)  "./" - does not affect depth; skip
+		 *  (3)  ".." - check depth and finish
+		 *  (4)  "../" - drop depth, check, and keep looking
+		 */
+		if (ch == '.') {
+			ch = *path++;
+
+			if (!ch)
+				return 1; /* case (1) */
+			if (is_dir_sep(ch))
+				continue; /* case (2) */
+			if (ch == '.') {
+				ch = *path++;
+				if (!ch)
+					return depth > 0; /* case (3) */
+				if (is_dir_sep(ch)) {
+					/* case (4) */
+					if (--depth < 0)
+						return 0;
+					continue;
+				}
+				/* otherwise, "..foo"; fall through */
+			}
+			/* otherwise ".foo"; fall through */
+		}
+
+		/*
+		 * We have a real component; inrement the depth and eat the
+		 * rest of the component
+		 */
+		depth++;
+		while (*path && !is_dir_sep(*path))
+			path++;
+	}
+
+	return 1;
+}
+
+/*
  * path = Canonical absolute path
  * prefixes = string_list containing normalized, absolute paths without
  * trailing slashes (except for the root directory, which is denoted by "/").
diff --git a/refs.c b/refs.c
index 47e4e53..daca499 100644
--- a/refs.c
+++ b/refs.c
@@ -312,20 +312,8 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry)
  */
 static int refname_is_safe(const char *refname)
 {
-	if (starts_with(refname, "refs/")) {
-		char *buf;
-		int result;
-
-		buf = xmalloc(strlen(refname) + 1);
-		/*
-		 * Does the refname try to escape refs/?
-		 * For example: refs/foo/../bar is safe but refs/foo/../../bar
-		 * is not.
-		 */
-		result = !normalize_path_copy(buf, refname + strlen("refs/"));
-		free(buf);
-		return result;
-	}
+	if (skip_prefix(refname, "refs/", &refname))
+		return check_path_escape(refname);
 	while (*refname) {
 		if (!isupper(*refname) && *refname != '_')
 			return 0;
-- 
2.4.0.rc0.363.gf9f328b

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05  1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King
@ 2015-04-05  4:56   ` Jeff King
  2015-04-05  5:27     ` Jeff King
                       ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Jeff King @ 2015-04-05  4:56 UTC (permalink / raw)
  To: git

On Sat, Apr 04, 2015 at 09:11:10PM -0400, Jeff King wrote:

> I also considered optimizing the "term == '\n'" case by using fgets, but
> it gets rather complex (you have to pick a size, fgets into it, and then
> keep going if you didn't get a newline). Also, fgets sucks, because you
> have to call strlen() immediately after to find out how many bytes you
> got!

My initial attempt at this had been to _just_ use fgets, but the
optimization becomes much simpler if you just do an initial fgets, and
then follow up with character reads. In most cases, the initial fgets
is big enough to get the whole line.

I.e., doing:

diff --git a/strbuf.c b/strbuf.c
index 2facd5f..f319d8d 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -443,6 +443,18 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 		return EOF;
 
 	strbuf_reset(sb);
+
+	if (term == '\n') {
+		strbuf_grow(sb, 256);
+		if (!fgets(sb->buf, sb->alloc - 1, fp)) {
+			strbuf_release(sb);
+			return EOF;
+		}
+		sb->len = strlen(sb->buf);
+		if (sb->buf[sb->len - 1] == '\n')
+			return 0;
+	}
+
 	flockfile(fp);
 	while ((ch = getc_unlocked(fp)) != EOF) {
 		strbuf_grow_ch(sb);

on top of the series drops me from:

  real    0m8.573s
  user    0m8.072s
  sys     0m0.508s

to:

  real    0m6.671s
  user    0m6.216s
  sys     0m0.460s

which is back to the v2.0.0 number. Even with the extra strlen, it seems
that what fgets does internally beats repeated getc calls. Which I guess
is not too surprising, as each getc() will have to check for underflow
in the buffer. Perhaps there is more room to micro-optimize
strbuf_getwholeline, but I kind of doubt it.

The big downside is that our input strings are no longer NUL-clean
(reading "foo\0bar\n" would yield just "foo". I doubt that matters in
the real world, but it does fail a few of the tests (e.g., t7008 tries
to read a list of patterns which includes NUL, and we silently truncate
the pattern rather than read in the NUL and barf).

So we'd have to either:

  1. Decide that doesn't matter.

  2. Have callers specify a "damn the NULs, I want it fast" flag.

  3. Find some alternative that is more robust than fgets, and faster
     than getc. I don't think there is anything in stdio, but I am not
     above dropping in a faster non-portable call if it is available,
     and then falling back to the current code otherwise.

-Peff

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05  4:56   ` Jeff King
@ 2015-04-05  5:27     ` Jeff King
  2015-04-05  5:35       ` Jeff King
  2015-04-05 14:36     ` Duy Nguyen
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-04-05  5:27 UTC (permalink / raw)
  To: git

On Sun, Apr 05, 2015 at 12:56:14AM -0400, Jeff King wrote:

> The big downside is that our input strings are no longer NUL-clean
> (reading "foo\0bar\n" would yield just "foo". I doubt that matters in
> the real world, but it does fail a few of the tests (e.g., t7008 tries
> to read a list of patterns which includes NUL, and we silently truncate
> the pattern rather than read in the NUL and barf).

So there is this trick:

diff --git a/strbuf.c b/strbuf.c
index f319d8d..5ceebb7 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -445,12 +445,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 	strbuf_reset(sb);
 
 	if (term == '\n') {
+		long pos = ftell(fp);
 		strbuf_grow(sb, 256);
 		if (!fgets(sb->buf, sb->alloc - 1, fp)) {
 			strbuf_release(sb);
 			return EOF;
 		}
-		sb->len = strlen(sb->buf);
+		sb->len = ftell(fp) - pos;
 		if (sb->buf[sb->len - 1] == '\n')
 			return 0;
 	}

but much to my surprise it actually runs slower than the strlen version!
It also has a 32-bit overflow issue. There's fgetpos() as an
alternative, but fpos_t is an opaque type, and we might not be able to
do arithmetic on it (for that matter, I am not sure if arithmetic is
strictly guaranteed on ftell() results). POSIX gives us ftello(), which
returns an off_t. That would probably be fine.

The ftello() version seems slower than the strlen, but faster than
ftell(). Puzzling.

-Peff

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05  5:27     ` Jeff King
@ 2015-04-05  5:35       ` Jeff King
  2015-04-05 20:49         ` Junio C Hamano
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-04-05  5:35 UTC (permalink / raw)
  To: git

On Sun, Apr 05, 2015 at 01:27:32AM -0400, Jeff King wrote:

> On Sun, Apr 05, 2015 at 12:56:14AM -0400, Jeff King wrote:
> 
> > The big downside is that our input strings are no longer NUL-clean
> > (reading "foo\0bar\n" would yield just "foo". I doubt that matters in
> > the real world, but it does fail a few of the tests (e.g., t7008 tries
> > to read a list of patterns which includes NUL, and we silently truncate
> > the pattern rather than read in the NUL and barf).
> 
> So there is this trick:
> 
> diff --git a/strbuf.c b/strbuf.c
> index f319d8d..5ceebb7 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -445,12 +445,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
>  	strbuf_reset(sb);
>  
>  	if (term == '\n') {
> +		long pos = ftell(fp);
>  		strbuf_grow(sb, 256);
>  		if (!fgets(sb->buf, sb->alloc - 1, fp)) {
>  			strbuf_release(sb);
>  			return EOF;
>  		}
> -		sb->len = strlen(sb->buf);
> +		sb->len = ftell(fp) - pos;
>  		if (sb->buf[sb->len - 1] == '\n')
>  			return 0;
>  	}
> 
> but much to my surprise it actually runs slower than the strlen version!
> It also has a 32-bit overflow issue. There's fgetpos() as an
> alternative, but fpos_t is an opaque type, and we might not be able to
> do arithmetic on it (for that matter, I am not sure if arithmetic is
> strictly guaranteed on ftell() results). POSIX gives us ftello(), which
> returns an off_t. That would probably be fine.

Actually, scratch that idea. ftell() always returns 0 on a non-seekable
file, so we can't use it in the general case. And that probably explains
the performance difference, too, if it is not keeping its own counter
and relies on lseek(fileno(fp)) or similar.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] address packed-refs speed regressions
  2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
                   ` (5 preceding siblings ...)
  2015-04-05  1:15 ` [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call Jeff King
@ 2015-04-05 13:41 ` René Scharfe
  2015-04-05 18:52   ` Jeff King
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
  7 siblings, 1 reply; 44+ messages in thread
From: René Scharfe @ 2015-04-05 13:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty

Am 05.04.2015 um 03:06 schrieb Jeff King:
> As I've mentioned before, I have some repositories with rather large
> numbers of refs. The worst one has ~13 million refs, for a 1.6GB
> packed-refs file. So I was saddened by this:
> 
>    $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1
>    real    0m6.840s
>    user    0m6.404s
>    sys     0m0.440s
> 
>    $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1
>    real    0m19.432s
>    user    0m18.996s
>    sys     0m0.456s
> 
> The command isn't important; what I'm really measuring is loading the
> packed-refs file. And yes, of course this repository is absolutely
> ridiculous. But the slowdowns here are linear with the number of refs.
> So _every_ git command got a little bit slower, even in less crazy
> repositories. We just didn't notice it as much.
> 
> Here are the numbers after this series:
> 
>    real    0m8.539s
>    user    0m8.052s
>    sys     0m0.496s
> 
> Much better, but I'm frustrated that they are still 20% slower than the
> original.
> 
> The main culprits seem to be d0f810f (which introduced some extra
> expensive code for each ref) and my 10c497a, which switched from fgets()
> to strbuf_getwholeline. It turns out that strbuf_getwholeline is really
> slow.

10c497a changed read_packed_refs(), which reads *all* packed refs.
Each is checked for validity.  That sounds expensive if the goal is
just to look up a single (non-existing) ref.

Would it help to defer any checks until a ref is actually accessed?
Can a binary search be used instead of reading the whole file?

I wonder if pluggable reference backends could help here.  Storing refs
in a database table indexed by refname should simplify things.

Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping
the packed refs file?  What numbers do you get with the following patch?

---
 refs.c | 36 ++++++++++++++++++++++++++++--------
 1 file changed, 28 insertions(+), 8 deletions(-)

diff --git a/refs.c b/refs.c
index 47e4e53..144255f 100644
--- a/refs.c
+++ b/refs.c
@@ -1153,16 +1153,35 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
  *      compatibility with older clients, but we do not require it
  *      (i.e., "peeled" is a no-op if "fully-peeled" is set).
  */
-static void read_packed_refs(FILE *f, struct ref_dir *dir)
+static void read_packed_refs(int fd, struct ref_dir *dir)
 {
 	struct ref_entry *last = NULL;
 	struct strbuf line = STRBUF_INIT;
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
+	struct stat st;
+	void *map;
+	size_t mapsz, len;
+	const char *p;
+
+	fstat(fd, &st);
+	mapsz = xsize_t(st.st_size);
+	if (!mapsz)
+		return;
+	map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0);
 
-	while (strbuf_getwholeline(&line, f, '\n') != EOF) {
+	for (p = map, len = mapsz; len; ) {
 		unsigned char sha1[20];
 		const char *refname;
 		const char *traits;
+		const char *nl;
+		size_t linelen;
+
+		nl = memchr(p, '\n', len);
+		linelen = nl ? nl - p + 1 : len;
+		strbuf_reset(&line);
+		strbuf_add(&line, p, linelen);
+		p += linelen;
+		len -= linelen;
 
 		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
 			if (strstr(traits, " fully-peeled "))
@@ -1204,6 +1223,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 	}
 
 	strbuf_release(&line);
+	munmap(map, mapsz);
 }
 
 /*
@@ -1224,16 +1244,16 @@ static struct packed_ref_cache *get_packed_ref_cache(struct ref_cache *refs)
 		clear_packed_ref_cache(refs);
 
 	if (!refs->packed) {
-		FILE *f;
+		int fd;
 
 		refs->packed = xcalloc(1, sizeof(*refs->packed));
 		acquire_packed_ref_cache(refs->packed);
 		refs->packed->root = create_dir_entry(refs, "", 0, 0);
-		f = fopen(packed_refs_file, "r");
-		if (f) {
-			stat_validity_update(&refs->packed->validity, fileno(f));
-			read_packed_refs(f, get_ref_dir(refs->packed->root));
-			fclose(f);
+		fd = open(packed_refs_file, O_RDONLY);
+		if (fd >= 0) {
+			stat_validity_update(&refs->packed->validity, fd);
+			read_packed_refs(fd, get_ref_dir(refs->packed->root));
+			close(fd);
 		}
 	}
 	return refs->packed;
-- 
2.3.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05  4:56   ` Jeff King
  2015-04-05  5:27     ` Jeff King
@ 2015-04-05 14:36     ` Duy Nguyen
  2015-04-05 18:24       ` Jeff King
  2015-04-05 20:09     ` Junio C Hamano
  2015-04-07 13:48     ` Rasmus Villemoes
  3 siblings, 1 reply; 44+ messages in thread
From: Duy Nguyen @ 2015-04-05 14:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List

On Sun, Apr 5, 2015 at 11:56 AM, Jeff King <peff@peff.net> wrote:
> So we'd have to either:
>
>   1. Decide that doesn't matter.
>
>   2. Have callers specify a "damn the NULs, I want it fast" flag.

2+. Avoid FILE* interface and go with syscalls for reading
packed-refs? If mmaping the entire file could be a problem for some
platform because it's too large, we have code for reading (with
bufferring) from fd somewhere, e.g. index-pack.

>   3. Find some alternative that is more robust than fgets, and faster
>      than getc. I don't think there is anything in stdio, but I am not
>      above dropping in a faster non-portable call if it is available,
>      and then falling back to the current code otherwise.
-- 
Duy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05 14:36     ` Duy Nguyen
@ 2015-04-05 18:24       ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-05 18:24 UTC (permalink / raw)
  To: Duy Nguyen; +Cc: Git Mailing List

On Sun, Apr 05, 2015 at 09:36:04PM +0700, Duy Nguyen wrote:

> On Sun, Apr 5, 2015 at 11:56 AM, Jeff King <peff@peff.net> wrote:
> > So we'd have to either:
> >
> >   1. Decide that doesn't matter.
> >
> >   2. Have callers specify a "damn the NULs, I want it fast" flag.
> 
> 2+. Avoid FILE* interface and go with syscalls for reading
> packed-refs? If mmaping the entire file could be a problem for some
> platform because it's too large, we have code for reading (with
> bufferring) from fd somewhere, e.g. index-pack.

There's strbuf_getwholeline_fd, but it's horrifically inefficient (one
syscall per character). But the other option is to implement your own
buffering, and we're generally better off letting stdio do that for us
(the exception here is that stdio does not have a good NUL-safe "read
until X" function).

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] address packed-refs speed regressions
  2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe
@ 2015-04-05 18:52   ` Jeff King
  2015-04-05 18:59     ` Jeff King
  2015-04-05 22:39     ` René Scharfe
  0 siblings, 2 replies; 44+ messages in thread
From: Jeff King @ 2015-04-05 18:52 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Michael Haggerty

On Sun, Apr 05, 2015 at 03:41:39PM +0200, René Scharfe wrote:

> > The main culprits seem to be d0f810f (which introduced some extra
> > expensive code for each ref) and my 10c497a, which switched from fgets()
> > to strbuf_getwholeline. It turns out that strbuf_getwholeline is really
> > slow.
> 
> 10c497a changed read_packed_refs(), which reads *all* packed refs.
> Each is checked for validity.  That sounds expensive if the goal is
> just to look up a single (non-existing) ref.
> 
> Would it help to defer any checks until a ref is actually accessed?
> Can a binary search be used instead of reading the whole file?

Yes, but addressing that is much more invasive.

Right now we parse all of the packed-refs file into an in-memory cache,
and then do single lookups from that cache. Doing an mmap() and a binary
search is way faster (and costs less memory) for doing individual
lookups. It relies on the list being sorted. This is generally true, but
not something we currently rely on (however, it would be easy to add a
"sorted" flag to top of the file and have the readers fall back when the
flag is missing). I've played with a patch to do this (it's not entirely
trivial, because you jump into the middle of a line, and then have to
walk backwards to find the start of the record).

For traversals, it's more complicated. Obviously if you are traversing
all refs, you have to read the whole thing anyway. If you are traversing
a subset of the refs, you can binary-search the start of the subset, and
then walk forward. But that's where it gets tricky with the current
code.

The ref_cache code expects to fill in from outer to inner. So if you
have "refs/foo", you should also have filled in all of "refs/" (but not
necessarily "refs/bar"). This matches the way we traverse loose ref
directories; we opendir "refs/", find out that it has "foo" and "bar",
and the descend into "foo", and so forth. But reading a subset of the
packed-ref file is "inside out". You fill in all of "refs/foo", but you
have no idea what else is in "refs/".

So going in that direction would involve some surgery to the ref_cache
code. It might even involve throwing it out entirely (i.e., just mmap
the packed-refs file and look through it directly, without any kind of
in-memory cache; we don't tend to do more than one ref-iteration per
program anyway, so I'm not sure the caching is buying us much anyway).
My big concern there would be that there are a lot of subtle race issues
between packed and loose refs, and the current state is the result of a
lot of tweaking. I'd be worried that a heavy rewrite there would risk
introducing subtle and rare corruptions.

Plus it would be a lot of work, which leads me to...

> I wonder if pluggable reference backends could help here.  Storing refs
> in a database table indexed by refname should simplify things.

...this. I think that effort might be better spent on a ref storage
format that's more efficient, simpler (with respect to subtle races and
such), and could provide other features (e.g., transactional atomicity).

The big plus side of packed-refs improvements is that they "just work"
without worrying about compatibility issues. But ref storage is local,
so I'm not sure how big a deal that is in practice.

> Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping
> the packed refs file?  What numbers do you get with the following patch?

It's about 9% faster than my series + the fgets optimization I posted
(or about 25% than using getc).  Which is certainly nice, but I was
really hoping to just make strbuf_getline faster for all callers, rather
than introducing special code for one call-site. Certainly we could
generalize the technique (i.e., a struct with the mmap data), but then I
feel we are somewhat reinventing stdio. Which is maybe a good thing,
because stdio has a lot of rough edges (as seen here), but it does feel
a bit like NIH syndrome.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] address packed-refs speed regressions
  2015-04-05 18:52   ` Jeff King
@ 2015-04-05 18:59     ` Jeff King
  2015-04-05 23:04       ` René Scharfe
  2015-04-05 22:39     ` René Scharfe
  1 sibling, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-04-05 18:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Michael Haggerty

On Sun, Apr 05, 2015 at 02:52:59PM -0400, Jeff King wrote:

> Right now we parse all of the packed-refs file into an in-memory cache,
> and then do single lookups from that cache. Doing an mmap() and a binary
> search is way faster (and costs less memory) for doing individual
> lookups. It relies on the list being sorted. This is generally true, but
> not something we currently rely on (however, it would be easy to add a
> "sorted" flag to top of the file and have the readers fall back when the
> flag is missing). I've played with a patch to do this (it's not entirely
> trivial, because you jump into the middle of a line, and then have to
> walk backwards to find the start of the record).
> 
> For traversals, it's more complicated. Obviously if you are traversing
> all refs, you have to read the whole thing anyway. If you are traversing
> a subset of the refs, you can binary-search the start of the subset, and
> then walk forward. But that's where it gets tricky with the current
> code.

In case you are curious, here is my proof-of-concept for the packed-refs
binary search. You'll note that it's a separate program, and not
integrated into refs.c. I wrote this last August, and after trying to
integrate it into refs.c, I found the ref_cache problems I described,
and I haven't touched it since.

I also seem to have saved the patch for stuffing it into refs.c, but I
am not sure if it even compiles (I wrote only "horrible wip" in the
commit message ;) ).

-- >8 --
Subject: [PATCH] add git-quick-list

This is a proof of concept for binary-searching the
packed-refs file in order to traverse an ordered subset of
it. Note that it _only_ reads the packed-refs file
currently. To really compare to for-each-ref, it would need
to also walk the loose ref area for its prefix. On a
mostly-packed repository that shouldn't make a big speed
difference, though.

And of course we don't _really_ want a separate command here
at all. This should be part of refs.c, and everyone who
calls for_each_ref should benefit from it.

Still, the numbers are promising. Here's are comparisons
against for-each-ref on torvalds/linux, which has a 218M
packed-refs file:

  $ time git for-each-ref \
      --format='%(objectname) %(refname)' \
      refs/remotes/2325298/ |
      wc -c
  44139

  real    0m1.649s
  user    0m1.332s
  sys     0m0.304s

  $ time ~peff/git-quick-list refs/remotes/2325298/ | wc -c
  44139

  real    0m0.012s
  user    0m0.004s
  sys     0m0.004s
---
 Makefile     |   1 +
 quick-list.c | 174 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 175 insertions(+)
 create mode 100644 quick-list.c

diff --git a/Makefile b/Makefile
index 2457065..aa32598 100644
--- a/Makefile
+++ b/Makefile
@@ -541,6 +541,7 @@ PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
 PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += remote-testsvn.o
+PROGRAM_OBJS += quick-list.o
 
 # Binary suffix, set to .exe for Windows builds
 X =
diff --git a/quick-list.c b/quick-list.c
new file mode 100644
index 0000000..e423f1f
--- /dev/null
+++ b/quick-list.c
@@ -0,0 +1,174 @@
+#include "cache.h"
+#include "refs.h"
+
+struct packed_refs_iterator {
+	const char *start;
+	const char *end;
+
+	const char *cur;
+	const char *ref;
+	const char *eol;
+	const char *next;
+};
+
+static void iterator_init(struct packed_refs_iterator *pos,
+			  const char *buf, size_t len)
+{
+	pos->start = buf;
+	pos->end = buf + len;
+
+	/* skip past header line */
+	if (pos->start < pos->end && *pos->start == '#') {
+		while (pos->start < pos->end && *pos->start != '\n')
+			pos->start++;
+		if (pos->start < pos->end)
+			pos->start++;
+	}
+}
+
+static int iterator_cmp(const char *key, struct packed_refs_iterator *pos)
+{
+	const char *ref = pos->ref;
+	for (; *key && ref < pos->eol; key++, ref++)
+		if (*key != *ref)
+			return (unsigned char)*key - (unsigned char)*ref;
+	return ref == pos->eol ? *key ? 1 : 0 : -1;
+}
+
+static const char *find_eol(const char *p, const char *end)
+{
+	p = memchr(p, '\n', end - p);
+	return p ? p : end;
+}
+
+static void parse_line(struct packed_refs_iterator *pos, const char *p)
+{
+	pos->cur = p;
+	if (pos->end - p < 41)
+		die("truncated packed-refs file");
+	p += 41;
+
+	pos->ref = p;
+	pos->eol = p = find_eol(p, pos->end);
+
+	/* skip newline, and then past any peel records */
+	if (p < pos->end)
+		p++;
+	while (p < pos->end && *p == '^') {
+		p = find_eol(p, pos->end);
+		if (p < pos->end)
+			p++;
+	}
+	pos->next = p;
+}
+
+static void iterator_next(struct packed_refs_iterator *pos)
+{
+	if (pos->next < pos->end)
+		parse_line(pos, pos->next);
+	else
+		pos->cur = NULL;
+}
+
+static void iterator_start(struct packed_refs_iterator *pos, const char *prefix)
+{
+	const char *lo = pos->start, *hi = pos->end;
+
+	while (lo < hi) {
+		const char *mi = lo + ((hi - lo) / 2);
+		int cmp;
+
+		/*
+		 * We landed somewhere on a line. Walk back to find
+		 * the start of the line.
+		 */
+		while (mi > lo && *(mi-1) != '\n')
+			mi--;
+
+		/*
+		 * We may have hit a peel-line. In that case, try
+		 * to walk back to the actual ref line (and skip as
+		 * many peel lines as we find, for future-proofing).
+		 */
+		while (*mi == '^') {
+			if (mi == lo)
+				die("peel line without a record before it?");
+			mi--;
+			if (mi == lo)
+				die("peel line with bare newline before it?");
+			mi--;
+			while (mi > lo && *(mi-1) != '\n')
+				mi--;
+		}
+
+		/* Now we should be at a real ref line. */
+		parse_line(pos, mi);
+		cmp = iterator_cmp(prefix, pos);
+		if (!cmp)
+			return;
+		else if (cmp < 0)
+			hi = pos->cur;
+		else
+			lo = pos->next;
+	}
+
+	if (hi < pos->end)
+		parse_line(pos, hi);
+	else
+		pos->cur = NULL;
+}
+
+static void quick_list(const char *prefix, each_ref_fn fn, void *data)
+{
+	int fd = open(git_path("packed-refs"), O_RDONLY);
+	struct stat st;
+	const char *buf = NULL;
+	size_t len;
+	struct packed_refs_iterator pos;
+
+	if (fd < 0)
+		goto out;
+	if (fstat(fd, &st) < 0)
+		goto out;
+	len = xsize_t(st.st_size);
+	buf = xmmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
+	if (!buf)
+		goto out;
+
+	iterator_init(&pos, buf, len);
+	for (iterator_start(&pos, prefix);
+	     pos.cur && starts_with(pos.ref, prefix);
+	     iterator_next(&pos)) {
+		unsigned char sha1[20];
+		char *refname;
+
+		if (get_sha1_hex(pos.cur, sha1) < 0)
+			die("packed-refs contained invalid sha1");
+		refname = xmemdupz(pos.ref, pos.eol - pos.ref);
+		fn(refname, sha1, 0, data);
+		free(refname);
+	}
+
+out:
+	close(fd);
+	if (buf)
+		munmap((void *)buf, len);
+}
+
+static int show_ref(const char *refname, const unsigned char *sha1,
+		    int flags, void *data)
+{
+	printf("%s %s\n", sha1_to_hex(sha1), refname);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	if (argc != 2)
+		usage("git quick-list <prefix>");
+
+	setup_git_directory();
+	quick_list(argv[1], show_ref, NULL);
+
+	return 0;
+}
-- 
2.4.0.rc0.363.gf9f328b

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05  4:56   ` Jeff King
  2015-04-05  5:27     ` Jeff King
  2015-04-05 14:36     ` Duy Nguyen
@ 2015-04-05 20:09     ` Junio C Hamano
  2015-04-07 13:48     ` Rasmus Villemoes
  3 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2015-04-05 20:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> So we'd have to either:
>
>   1. Decide that doesn't matter.
>
>   2. Have callers specify a "damn the NULs, I want it fast" flag.

The callers that used to call fgets and then later rewritten to
strbuf_getwholeline(), either of the above obviously should be OK,
and because the whole reason why we added strbuf_getline() interface
was to avoid having to repeatedly call fgets() and concatenate the
result if the fixed-size buffer we would give it is too small, I'd
say the callers that want to read lines terminated by LF and have NUL
as part of payload would be a tiny minority.

It depends on what we would find out after auditing all callers of
this function, but I would not be surprised if we decided #1 (i.e.
"this is about a _line_; what are you doing by having a NUL on
it?"), and the safest would be to do the inverse of #2, i.e. make it
fast by default and make oddball callers that care about NULs to
pass a flag.

Thanks for working on this.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05  5:35       ` Jeff King
@ 2015-04-05 20:49         ` Junio C Hamano
  0 siblings, 0 replies; 44+ messages in thread
From: Junio C Hamano @ 2015-04-05 20:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Sun, Apr 05, 2015 at 01:27:32AM -0400, Jeff King wrote:
>
>> On Sun, Apr 05, 2015 at 12:56:14AM -0400, Jeff King wrote:
>> 
>> > The big downside is that our input strings are no longer NUL-clean
>> > (reading "foo\0bar\n" would yield just "foo". I doubt that matters in
>> > the real world, but it does fail a few of the tests (e.g., t7008 tries
>> > to read a list of patterns which includes NUL, and we silently truncate
>> > the pattern rather than read in the NUL and barf).
>> 
>> So there is this trick:
>> 
>> diff --git a/strbuf.c b/strbuf.c
>> index f319d8d..5ceebb7 100644
>> --- a/strbuf.c
>> +++ b/strbuf.c
>> @@ -445,12 +445,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
>>  	strbuf_reset(sb);
>>  
>>  	if (term == '\n') {
>> +		long pos = ftell(fp);
>>  		strbuf_grow(sb, 256);
>>  		if (!fgets(sb->buf, sb->alloc - 1, fp)) {
>>  			strbuf_release(sb);
>>  			return EOF;
>>  		}
>> -		sb->len = strlen(sb->buf);
>> +		sb->len = ftell(fp) - pos;
>>  		if (sb->buf[sb->len - 1] == '\n')
>>  			return 0;
>>  	}
>> 
>> but much to my surprise it actually runs slower than the strlen version!


The later loop you have "oops, the thing turns out to be longer than
we thought, so let's do byte-by-byte" is protected with locking, but
this part is not, and that suggests me that the ftell-fgets-ftell
sequence we see here may have its own locking cost built-in in the
stdio library, too, perhaps?

>> It also has a 32-bit overflow issue. There's fgetpos() as an
>> alternative, but fpos_t is an opaque type, and we might not be able to
>> do arithmetic on it (for that matter, I am not sure if arithmetic is
>> strictly guaranteed on ftell() results). POSIX gives us ftello(), which
>> returns an off_t. That would probably be fine.
>
> Actually, scratch that idea. ftell() always returns 0 on a non-seekable
> file, so we can't use it in the general case. And that probably explains
> the performance difference, too, if it is not keeping its own counter
> and relies on lseek(fileno(fp)) or similar.

Looked so promising, though ;-) X-<.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] address packed-refs speed regressions
  2015-04-05 18:52   ` Jeff King
  2015-04-05 18:59     ` Jeff King
@ 2015-04-05 22:39     ` René Scharfe
  2015-04-06  4:49       ` Jeff King
  1 sibling, 1 reply; 44+ messages in thread
From: René Scharfe @ 2015-04-05 22:39 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty

Am 05.04.2015 um 20:52 schrieb Jeff King:
> On Sun, Apr 05, 2015 at 03:41:39PM +0200, René Scharfe wrote:
>> I wonder if pluggable reference backends could help here.  Storing refs
>> in a database table indexed by refname should simplify things.
>
> ...this. I think that effort might be better spent on a ref storage
> format that's more efficient, simpler (with respect to subtle races and
> such), and could provide other features (e.g., transactional atomicity).

Such as a DBMS? :-)  Leaving storage details to SQLite or whatever 
sounds attractive to me because I'm lazy.

> The big plus side of packed-refs improvements is that they "just work"
> without worrying about compatibility issues. But ref storage is local,
> so I'm not sure how big a deal that is in practice.

Adding a dependency is a big step, admittedly, so native improvements 
might be a better fit.  There's a chance that we'd run into issues 
already solved by specialized database engines long ago, though.

>> Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping
>> the packed refs file?  What numbers do you get with the following patch?
>
> It's about 9% faster than my series + the fgets optimization I posted
> (or about 25% than using getc).  Which is certainly nice, but I was
> really hoping to just make strbuf_getline faster for all callers, rather
> than introducing special code for one call-site.  Certainly we could
> generalize the technique (i.e., a struct with the mmap data), but then I
> feel we are somewhat reinventing stdio. Which is maybe a good thing,
> because stdio has a lot of rough edges (as seen here), but it does feel
> a bit like NIH syndrome.

Forgot to say: I like your changes.  But if strbuf_getline can only be 
made fast enough beyond that by duplicating stdio buffering then I feel 
it's better to take a different way.  E.g. dropping the requirement to 
handle NUL chars and basing it on fgets as Junio suggested in his reply 
to patch 3 sounds good.

In any case, the packed refs file seems special enough to receive 
special treatment.  Using mmap would make the most sense if we could 
also avoid copying lines to a strbuf for parsing, though.

René

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] address packed-refs speed regressions
  2015-04-05 18:59     ` Jeff King
@ 2015-04-05 23:04       ` René Scharfe
  0 siblings, 0 replies; 44+ messages in thread
From: René Scharfe @ 2015-04-05 23:04 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty

Am 05.04.2015 um 20:59 schrieb Jeff King:
> Still, the numbers are promising. Here's are comparisons
> against for-each-ref on torvalds/linux, which has a 218M
> packed-refs file:
>
>    $ time git for-each-ref \
>        --format='%(objectname) %(refname)' \
>        refs/remotes/2325298/ |
>        wc -c
>    44139
>
>    real    0m1.649s
>    user    0m1.332s
>    sys     0m0.304s
>
>    $ time ~peff/git-quick-list refs/remotes/2325298/ | wc -c
>    44139
>
>    real    0m0.012s
>    user    0m0.004s
>    sys     0m0.004s

Sweet numbers. :-P

I'm not familiar with refs.c, but its sheer size alone suggests that it 
won't be easy to integrate this prototype code there. :-/

René

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow
  2015-04-05  1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King
@ 2015-04-06  2:13   ` Eric Sunshine
  2015-04-06  5:05     ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Eric Sunshine @ 2015-04-06  2:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List

On Sat, Apr 4, 2015 at 9:11 PM, Jeff King <peff@peff.net> wrote:
> We have to call strbuf_grow anytime we are going to add data
> to a strbuf. In most cases, it's a noop (since we grow the
> buffer aggressively), and the cost of the function call and
> size check is dwarfed by the actual buffer operation.
>
> For a tight loop of single-character additions, though, this
> overhead is noticeable. Furthermore, the single-character
> case is much easier to check; since the "extra" parameter is
> 1, we can do it without worrying about overflow.
>
> This patch adds a simple inline function for checking
> single-character growth. For the growth case, it just calls
> into the regular strbuf_grow(). This is redundant, as
> strbuf_grow will check again whether we need to grow. But it
> keeps our inline code simple, and most calls will not need
> to grow, so it's OK to treat this as a rare "slow path".
>
> We apply the new function to strbuf_getwholeline. [...]
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> diff --git a/strbuf.c b/strbuf.c
> index af2bad4..2facd5f 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -445,7 +445,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
>         strbuf_reset(sb);
>         flockfile(fp);
>         while ((ch = getc_unlocked(fp)) != EOF) {
> -               strbuf_grow(sb, 1);
> +               strbuf_grow_ch(sb);

strbuf_grow_ch() seems overly special-case. What about instead taking
advantage of inline strbuf_avail() to do something like this?

    if (!strbuf_avail())
        strbuf_grow(sb, 1);

(Minor tangent: The 1 is still slightly magical and potentially
confusing for someone who doesn't know that the buffer is grown
aggressively, so changing it to a larger number might make it more
obvious to the casual reader that the buffer is in fact not being
grown on every iteration.)

>                 sb->buf[sb->len++] = ch;
>                 if (ch == term)
>                         break;
> diff --git a/strbuf.h b/strbuf.h
> index 1883494..ef41151 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -137,6 +137,15 @@ static inline size_t strbuf_avail(const struct strbuf *sb)
>   */
>  extern void strbuf_grow(struct strbuf *, size_t);
>
> +/*
> + * An optimized version of strbuf_grow() for a single character.
> + */
> +static inline void strbuf_grow_ch(struct strbuf *sb)
> +{
> +       if (!sb->alloc || sb->alloc - 1 <= sb->len)
> +               strbuf_grow(sb, 1);
> +}
> +
>  /**
>   * Set the length of the buffer to a given value. This function does *not*
>   * allocate new memory, so you should not perform a `strbuf_setlen()` to a
> --
> 2.4.0.rc0.363.gf9f328b

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 0/6] address packed-refs speed regressions
  2015-04-05 22:39     ` René Scharfe
@ 2015-04-06  4:49       ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-06  4:49 UTC (permalink / raw)
  To: René Scharfe; +Cc: git, Michael Haggerty

On Mon, Apr 06, 2015 at 12:39:15AM +0200, René Scharfe wrote:

> >...this. I think that effort might be better spent on a ref storage
> >format that's more efficient, simpler (with respect to subtle races and
> >such), and could provide other features (e.g., transactional atomicity).
> 
> Such as a DBMS? :-)  Leaving storage details to SQLite or whatever sounds
> attractive to me because I'm lazy.

Exactly. Though I think some folks were worried about the extra
dependency (e.g., I think SQLite is hard for JGit, because there's no
pure-java implementation, which makes Eclipse unhappy).

With pluggable backends we can make something like a SQLite backend
optional. I.e., use it if you want the benefits and can accept the
portability downsides. But that also risks fracturing the community, and
people on the "old" format being left behind.

> Forgot to say: I like your changes.  But if strbuf_getline can only be made
> fast enough beyond that by duplicating stdio buffering then I feel it's
> better to take a different way.  E.g. dropping the requirement to handle NUL
> chars and basing it on fgets as Junio suggested in his reply to patch 3
> sounds good.

Yeah, though we probably need to either audit the callers, or provide a
flag for each caller to turn on the speed-over-NULs behavior. I'll look
into that, but it may not be this week, as I'll be traveling starting
tomorrow.

> In any case, the packed refs file seems special enough to receive special
> treatment.  Using mmap would make the most sense if we could also avoid
> copying lines to a strbuf for parsing, though.

I had a similar thought. Below is hacky patch, on top of your mmap
patch, that does this. It does shave off another 300ms (around 5%).

I think we may be getting into a useless area of micro-optimizing here,
though.  The results are noticeable on this ridiculous repository, but
probably not so much on real ones. The low-hanging fruit (e.g., dropping
time in half by using getc_unlocked) seems to provide the most bang for
the buck.

---
diff --git a/refs.c b/refs.c
index 144255f..708b49b 100644
--- a/refs.c
+++ b/refs.c
@@ -334,27 +334,40 @@ static int refname_is_safe(const char *refname)
 	return 1;
 }
 
-static struct ref_entry *create_ref_entry(const char *refname,
-					  const unsigned char *sha1, int flag,
-					  int check_name)
+static struct ref_entry *create_ref_entry_len(const char *refname, size_t len,
+					      const unsigned char *sha1, int flag,
+					      int check_name)
 {
-	int len;
 	struct ref_entry *ref;
 
+	/*
+	 * allocate before checking, since the check functions require
+	 * a NUL-terminated refname. And since we die() anyway if
+	 * the check fails, the overhead of the extra malloc is negligible
+	 */
+	ref = xmalloc(sizeof(struct ref_entry) + len + 1);
+	hashcpy(ref->u.value.sha1, sha1);
+	hashclr(ref->u.value.peeled);
+	memcpy(ref->name, refname, len);
+	ref->name[len] = '\0';
+	ref->flag = flag;
+
 	if (check_name &&
 	    check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
 		die("Reference has invalid format: '%s'", refname);
 	if (!check_name && !refname_is_safe(refname))
 		die("Reference has invalid name: '%s'", refname);
-	len = strlen(refname) + 1;
-	ref = xmalloc(sizeof(struct ref_entry) + len);
-	hashcpy(ref->u.value.sha1, sha1);
-	hashclr(ref->u.value.peeled);
-	memcpy(ref->name, refname, len);
-	ref->flag = flag;
 	return ref;
 }
 
+static struct ref_entry *create_ref_entry(const char *refname,
+					 const unsigned char *sha1, int flag,
+					 int check_name)
+{
+	return create_ref_entry_len(refname, strlen(refname), sha1, flag,
+				    check_name);
+}
+
 static void clear_ref_dir(struct ref_dir *dir);
 
 static void free_ref_entry(struct ref_entry *entry)
@@ -1095,7 +1108,9 @@ static const char PACKED_REFS_HEADER[] =
  * Return a pointer to the refname within the line (null-terminated),
  * or NULL if there was a problem.
  */
-static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
+static const char *parse_ref_line(const char *line, int len,
+				  unsigned char *sha1,
+				  size_t *refname_len)
 {
 	const char *ref;
 
@@ -1107,22 +1122,22 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
 	 *  +1 (space in between hex and name)
 	 *  +1 (newline at the end of the line)
 	 */
-	if (line->len <= 42)
+	if (len <= 42)
 		return NULL;
 
-	if (get_sha1_hex(line->buf, sha1) < 0)
+	if (get_sha1_hex(line, sha1) < 0)
 		return NULL;
-	if (!isspace(line->buf[40]))
+	if (!isspace(line[40]))
 		return NULL;
 
-	ref = line->buf + 41;
+	ref = line + 41;
 	if (isspace(*ref))
 		return NULL;
 
-	if (line->buf[line->len - 1] != '\n')
+	if (line[len - 1] != '\n')
 		return NULL;
-	line->buf[--line->len] = 0;
 
+	*refname_len = len - (ref - line) - 1;
 	return ref;
 }
 
@@ -1156,7 +1171,6 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1)
 static void read_packed_refs(int fd, struct ref_dir *dir)
 {
 	struct ref_entry *last = NULL;
-	struct strbuf line = STRBUF_INIT;
 	enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
 	struct stat st;
 	void *map;
@@ -1172,18 +1186,20 @@ static void read_packed_refs(int fd, struct ref_dir *dir)
 	for (p = map, len = mapsz; len; ) {
 		unsigned char sha1[20];
 		const char *refname;
+		size_t refname_len;
 		const char *traits;
 		const char *nl;
+		const char *line;
 		size_t linelen;
 
 		nl = memchr(p, '\n', len);
+		line = p;
 		linelen = nl ? nl - p + 1 : len;
-		strbuf_reset(&line);
-		strbuf_add(&line, p, linelen);
 		p += linelen;
 		len -= linelen;
 
-		if (skip_prefix(line.buf, "# pack-refs with:", &traits)) {
+		/* XXX these should take care not to look past linelen */
+		if (skip_prefix(line, "# pack-refs with:", &traits)) {
 			if (strstr(traits, " fully-peeled "))
 				peeled = PEELED_FULLY;
 			else if (strstr(traits, " peeled "))
@@ -1192,7 +1208,7 @@ static void read_packed_refs(int fd, struct ref_dir *dir)
 			continue;
 		}
 
-		refname = parse_ref_line(&line, sha1);
+		refname = parse_ref_line(line, linelen, sha1, &refname_len);
 		if (refname) {
 			int flag = REF_ISPACKED;
 
@@ -1200,7 +1216,7 @@ static void read_packed_refs(int fd, struct ref_dir *dir)
 				hashclr(sha1);
 				flag |= REF_BAD_NAME | REF_ISBROKEN;
 			}
-			last = create_ref_entry(refname, sha1, flag, 0);
+			last = create_ref_entry_len(refname, len, sha1, flag, 0);
 			if (peeled == PEELED_FULLY ||
 			    (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/")))
 				last->flag |= REF_KNOWS_PEELED;
@@ -1208,10 +1224,10 @@ static void read_packed_refs(int fd, struct ref_dir *dir)
 			continue;
 		}
 		if (last &&
-		    line.buf[0] == '^' &&
-		    line.len == PEELED_LINE_LENGTH &&
-		    line.buf[PEELED_LINE_LENGTH - 1] == '\n' &&
-		    !get_sha1_hex(line.buf + 1, sha1)) {
+		    line[0] == '^' &&
+		    linelen == PEELED_LINE_LENGTH &&
+		    line[PEELED_LINE_LENGTH - 1] == '\n' &&
+		    !get_sha1_hex(line + 1, sha1)) {
 			hashcpy(last->u.value.peeled, sha1);
 			/*
 			 * Regardless of what the file header said,
@@ -1222,7 +1238,6 @@ static void read_packed_refs(int fd, struct ref_dir *dir)
 		}
 	}
 
-	strbuf_release(&line);
 	munmap(map, mapsz);
 }
 

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow
  2015-04-06  2:13   ` Eric Sunshine
@ 2015-04-06  5:05     ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-06  5:05 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List

On Sun, Apr 05, 2015 at 10:13:21PM -0400, Eric Sunshine wrote:

> > -               strbuf_grow(sb, 1);
> > +               strbuf_grow_ch(sb);
> 
> strbuf_grow_ch() seems overly special-case. What about instead taking
> advantage of inline strbuf_avail() to do something like this?
> 
>     if (!strbuf_avail())
>         strbuf_grow(sb, 1);

Thanks, I somehow missed that function (despite it being a few line
above the one I added!).

I agree that strbuf_avail is a much better generic interface, and it
turns out to be just as fast (actually, a tiny bit faster in my tests).
I'll use that in the re-roll.

> (Minor tangent: The 1 is still slightly magical and potentially
> confusing for someone who doesn't know that the buffer is grown
> aggressively, so changing it to a larger number might make it more
> obvious to the casual reader that the buffer is in fact not being
> grown on every iteration.)

I agree this is slightly confusing (and I had to double-check how
strbuf_grow worked while writing this series). OTOH, this is not so much
about the "1" here as about how strbufs work. We care about the
amortized asymptotic cost. strbuf_add() has the same issue; we add more
bytes in each chunk, but we would still want to make sure that there is
a sub-linear relationship between the number of adds and the number of
allocations).

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-05  4:56   ` Jeff King
                       ` (2 preceding siblings ...)
  2015-04-05 20:09     ` Junio C Hamano
@ 2015-04-07 13:48     ` Rasmus Villemoes
  2015-04-07 19:04       ` Jeff King
  3 siblings, 1 reply; 44+ messages in thread
From: Rasmus Villemoes @ 2015-04-07 13:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Sun, Apr 05 2015, Jeff King <peff@peff.net> wrote:

> which is back to the v2.0.0 number. Even with the extra strlen, it seems
> that what fgets does internally beats repeated getc calls. Which I guess
> is not too surprising, as each getc() will have to check for underflow
> in the buffer. Perhaps there is more room to micro-optimize
> strbuf_getwholeline, but I kind of doubt it.
>
> The big downside is that our input strings are no longer NUL-clean
> (reading "foo\0bar\n" would yield just "foo". I doubt that matters in
> the real world, but it does fail a few of the tests (e.g., t7008 tries
> to read a list of patterns which includes NUL, and we silently truncate
> the pattern rather than read in the NUL and barf).
>
> So we'd have to either:
>
>   1. Decide that doesn't matter.
>
>   2. Have callers specify a "damn the NULs, I want it fast" flag.
>
>   3. Find some alternative that is more robust than fgets, and faster
>      than getc. I don't think there is anything in stdio, but I am not
>      above dropping in a faster non-portable call if it is available,
>      and then falling back to the current code otherwise.

getdelim is in POSIX 2008
(http://pubs.opengroup.org/stage7tc1/functions/getdelim.html), so should
be available on any half-{d,r}ecent platform. It obviously has the
advantage of having access to the internal stdio buffer, and by
definition handles embedded NULs. No idea if using such modern
interfaces in git is ok, though. 

Implementation-wise, I think strbuf_getwholeline could be implemented
mostly as a simple wrapper for getdelim. If I'm reading the current code
and the posix spec for getdelim correctly, something like this should do
it (though obviously not meant to be included as-is):

diff --git a/strbuf.c b/strbuf.c
index 0346e74..d3338b9 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -434,6 +434,32 @@ int strbuf_getcwd(struct strbuf *sb)
 	return -1;
 }
 
+#if USE_GETDELIM
+/* Hacky declaration to avoid messing with feature test macros. */
+ssize_t getdelim(char **, size_t *, int, FILE *);
+int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
+{
+	ssize_t r;
+
+	if (feof(fp))
+		return EOF;
+
+	strbuf_reset(sb);
+	if (!sb->alloc)
+		sb->buf = NULL;
+
+	r = getdelim(&sb->buf, &sb->alloc, term, fp);
+
+	if (r > 0) {
+		sb->len = r;
+		return 0;
+	}
+	assert(r == -1);
+	if (!sb->buf)
+		strbuf_init(sb);
+	return EOF;
+}
+#else
 int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 {
 	int ch;
@@ -454,6 +480,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 	sb->buf[sb->len] = '\0';
 	return 0;
 }
+#endif
 
 int strbuf_getline(struct strbuf *sb, FILE *fp, int term)
 {


Rasmus

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-07 13:48     ` Rasmus Villemoes
@ 2015-04-07 19:04       ` Jeff King
  2015-04-07 22:43         ` Rasmus Villemoes
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-04-07 19:04 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: git

On Tue, Apr 07, 2015 at 03:48:33PM +0200, Rasmus Villemoes wrote:

> >   3. Find some alternative that is more robust than fgets, and faster
> >      than getc. I don't think there is anything in stdio, but I am not
> >      above dropping in a faster non-portable call if it is available,
> >      and then falling back to the current code otherwise.
> 
> getdelim is in POSIX 2008
> (http://pubs.opengroup.org/stage7tc1/functions/getdelim.html), so should
> be available on any half-{d,r}ecent platform. It obviously has the
> advantage of having access to the internal stdio buffer, and by
> definition handles embedded NULs. No idea if using such modern
> interfaces in git is ok, though.

Thanks, that's perfect. I knew about getline(), but not getdelim(), and
I had thought getline() unconditionally malloc'd. But it doesn't; it
behaves exactly as we are already doing here. :-/

> Implementation-wise, I think strbuf_getwholeline could be implemented
> mostly as a simple wrapper for getdelim. If I'm reading the current code
> and the posix spec for getdelim correctly, something like this should do
> it (though obviously not meant to be included as-is):

I think it's close to what we want. strbuf_grow calls xrealloc, which
will call try_to_free_routine() and possibly die() for us. So we would
probably want to check errno on failure and respond similarly if we get
ENOMEM.

Your patch performs even faster than my fgets version (about 8%).

I wonder if it is even worth doing the getc_unlocked dance at all. It
would potentially speed up the fallback code, but my hope that would be
that most systems would simply use the getdelim() version.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-07 19:04       ` Jeff King
@ 2015-04-07 22:43         ` Rasmus Villemoes
  2015-04-08  0:17           ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Rasmus Villemoes @ 2015-04-07 22:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Tue, Apr 07 2015, Jeff King <peff@peff.net> wrote:

> On Tue, Apr 07, 2015 at 03:48:33PM +0200, Rasmus Villemoes wrote:
>
>> Implementation-wise, I think strbuf_getwholeline could be implemented
>> mostly as a simple wrapper for getdelim. If I'm reading the current code
>> and the posix spec for getdelim correctly, something like this should do
>> it (though obviously not meant to be included as-is):
>
> I think it's close to what we want. strbuf_grow calls xrealloc, which
> will call try_to_free_routine() and possibly die() for us. So we would
> probably want to check errno on failure and respond similarly if we get
> ENOMEM.

Hm, I'm afraid it's not that simple. It seems that data may be lost from
the stream if getdelim encounters ENOMEM: Looking at the glibc
implementation (libio/iogetdelim.c), if reallocating the user buffer
fails, -1 is returned (presumably with errno==ENOMEM set by realloc), and
there's no way of knowing how many bytes were already copied to the
buffer (cur_len).

For regular files, I suppose one could do a ftell/fseek dance. For
other cases, I don't see a reliable way to retry upon ENOMEM.

Related thread on the posix mailing list:
http://thread.gmane.org/gmane.comp.standards.posix.austin.general/10091

> I wonder if it is even worth doing the getc_unlocked dance at all. It
> would potentially speed up the fallback code, but my hope that would be
> that most systems would simply use the getdelim() version.

Well, it's not really an intrusive patch, and 42% speedup is rather
significant. And, of course, the above ENOMEM issue may mean that
getdelim isn't usable after all.

Rasmus

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked
  2015-04-07 22:43         ` Rasmus Villemoes
@ 2015-04-08  0:17           ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-08  0:17 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: git

On Wed, Apr 08, 2015 at 12:43:09AM +0200, Rasmus Villemoes wrote:

> Hm, I'm afraid it's not that simple. It seems that data may be lost from
> the stream if getdelim encounters ENOMEM: Looking at the glibc
> implementation (libio/iogetdelim.c), if reallocating the user buffer
> fails, -1 is returned (presumably with errno==ENOMEM set by realloc), and
> there's no way of knowing how many bytes were already copied to the
> buffer (cur_len).
> 
> For regular files, I suppose one could do a ftell/fseek dance. For
> other cases, I don't see a reliable way to retry upon ENOMEM.

Ah, right, that makes sense.

I think it may be OK to just `die()` on ENOMEM here. The "try to free"
routine is basically about unmapping pack memory, in case it is clogging
up the address space (it is mmap'd, so it shouldn't cause real RAM
pressure; the OS can just drop the pages from the cache if it wants).

But getdelim() calls are not likely to be big allocations in the first
place (so they are not likely to fail at all, and if they do, we really
are horribly out of memory). I'd also question whether
release_pack_memory is doing anything in practice on 64-bit machines. We
have quite a bit of address space to work with in that case.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v2 0/9] address packed-refs speed regressions
  2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
                   ` (6 preceding siblings ...)
  2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe
@ 2015-04-16  8:47 ` Jeff King
  2015-04-16  8:48   ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King
                     ` (9 more replies)
  7 siblings, 10 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  8:47 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

On Sat, Apr 04, 2015 at 09:06:11PM -0400, Jeff King wrote:

> As I've mentioned before, I have some repositories with rather large
> numbers of refs. The worst one has ~13 million refs, for a 1.6GB
> packed-refs file. So I was saddened by this:
> 
>   $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1
>   real    0m6.840s
>   user    0m6.404s
>   sys     0m0.440s
> 
>   $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1
>   real    0m19.432s
>   user    0m18.996s
>   sys     0m0.456s

Here's a re-roll incorporating feedback from the list. Thanks everybody
for your comments. Last time the final number was ~8.5s, which was
disappointingly slower than v2.0.0. In this iteration, my final numbers
are:

  real    0m5.703s
  user    0m5.276s
  sys     0m0.432s

which is quite pleasing.

The big changes that resulted in this additional speedup are:

  1. Use getdelim() when it is available. This is much faster than even
     a getc_unlocked() loop.

  2. The slowdown from d0f810f was from adding in refname_is_safe calls.
     But what I didn't notice before is that we run them in _addition_
     to check_refname_format, rather than instead of it. So in the
     common case of a sanely-formatted refname, we can skip the call,
     rather than writing a lot of code to micro-optimize it.

It was also mentioned in a nearby thread that the config code could
benefit from some of the same micro-optimizations. It can't make use of
getdelim(), as it really does want to do character-by-character parsing.
But it can still use getc_unlocked() and the strbuf_avail() trick, which
speeds up config reading by 47%. Those patches are included here.

  [1/9]: strbuf_getwholeline: use getc macro
  [2/9]: git-compat-util: add fallbacks for unlocked stdio
  [3/9]: strbuf_getwholeline: use getc_unlocked
  [4/9]: config: use getc_unlocked when reading from file
  [5/9]: strbuf_addch: avoid calling strbuf_grow
  [6/9]: strbuf_getwholeline: avoid calling strbuf_grow
  [7/9]: strbuf_getwholeline: use getdelim if it is available
  [8/9]: read_packed_refs: avoid double-checking sane refs
  [9/9]: t1430: add another refs-escape test

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/9] strbuf_getwholeline: use getc macro
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
@ 2015-04-16  8:48   ` Jeff King
  2015-04-16  8:48   ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  8:48 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

strbuf_getwholeline calls fgetc in a tight loop. Using the
getc form, which can be implemented as a macro, should be
faster (and we do not care about it evaluating our argument
twice, as we just have a plain variable).

On my glibc system, running "git rev-parse
refs/heads/does-not-exist" on a file with an extremely large
(1.6GB) packed-refs file went from (best of 3 runs):

  real    0m19.383s
  user    0m18.876s
  sys     0m0.528s

to:

  real    0m18.900s
  user    0m18.472s
  sys     0m0.448s

for a wall-clock speedup of 2.5%.

Signed-off-by: Jeff King <peff@peff.net>
---
 strbuf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 88cafd4..14f337d 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -443,7 +443,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 		return EOF;
 
 	strbuf_reset(sb);
-	while ((ch = fgetc(fp)) != EOF) {
+	while ((ch = getc(fp)) != EOF) {
 		strbuf_grow(sb, 1);
 		sb->buf[sb->len++] = ch;
 		if (ch == term)
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
  2015-04-16  8:48   ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King
@ 2015-04-16  8:48   ` Jeff King
  2015-04-16  8:49   ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  8:48 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

POSIX.1-2001 specifies some functions for optimizing the
locking out of tight getc() loops. Not all systems are
POSIX, though, and even not all POSIX systems are required
to implement these functions. We can check for the
feature-test macro to see if they are available, and if not,
provide a noop implementation.

There's no Makefile knob here, because we should just detect
this automatically. If there are very bizarre systems, we
may need to add one, but it's not clear yet in which
direction:

  1. If a system defines _POSIX_THREAD_SAFE_FUNCTIONS but
     these functions are missing or broken, we would want a
     knob to manually turn them off.

  2. If a system has these functions but does not define
     _POSIX_THREAD_SAFE_FUNCTIONS, we would want a knob to
     manually turn them on.

We can add such a knob when we find a real-world system that
matches this.

Signed-off-by: Jeff King <peff@peff.net>
---
 git-compat-util.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index bc8fc8c..685a0a4 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -883,4 +883,10 @@ struct tm *git_gmtime_r(const time_t *, struct tm *);
 # define SHELL_PATH "/bin/sh"
 #endif
 
+#ifndef _POSIX_THREAD_SAFE_FUNCTIONS
+#define flockfile(fh)
+#define funlockfile(fh)
+#define getc_unlocked(fh) getc(fh)
+#endif
+
 #endif
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/9] strbuf_getwholeline: use getc_unlocked
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
  2015-04-16  8:48   ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King
  2015-04-16  8:48   ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King
@ 2015-04-16  8:49   ` Jeff King
  2015-04-16  8:51   ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  8:49 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

strbuf_getwholeline calls getc in a tight loop. On modern
libc implementations, the stdio code locks the handle for
every operation, which means we are paying a significant
overhead.  We can get around this by locking the handle for
the whole loop and using the unlocked variant.

Running "git rev-parse refs/heads/does-not-exist" on a repo
with an extremely large (1.6GB) packed-refs file went from:

  real    0m18.900s
  user    0m18.472s
  sys     0m0.448s

to:

  real    0m10.953s
  user    0m10.384s
  sys     0m0.580s

for a wall-clock speedup of 42%. All times are best-of-3,
and done on a glibc 2.19 system.

Note that we call into strbuf_grow while holding the lock.
It's possible for that function to call other stdio
functions (e.g., printing to stderr when dying due to malloc
error); however, the POSIX.1-2001 definition of flockfile
makes it clear that the locks are per-handle, so we are fine
unless somebody else tries to read from our same handle.
This doesn't ever happen in the current code, and is
unlikely to be added in the future (we would have to do
something exotic like add a die_routine that tried to read
from stdin).

Signed-off-by: Jeff King <peff@peff.net>
---
 strbuf.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index 14f337d..af2bad4 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -443,12 +443,14 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 		return EOF;
 
 	strbuf_reset(sb);
-	while ((ch = getc(fp)) != EOF) {
+	flockfile(fp);
+	while ((ch = getc_unlocked(fp)) != EOF) {
 		strbuf_grow(sb, 1);
 		sb->buf[sb->len++] = ch;
 		if (ch == term)
 			break;
 	}
+	funlockfile(fp);
 	if (ch == EOF && sb->len == 0)
 		return EOF;
 
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 4/9] config: use getc_unlocked when reading from file
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
                     ` (2 preceding siblings ...)
  2015-04-16  8:49   ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King
@ 2015-04-16  8:51   ` Jeff King
  2015-04-16  8:53   ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  8:51 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

We read config files character-by-character from a stdio
handle using fgetc(). This incurs significant locking
overhead, even though we know that only one thread can
possibly access the handle. We can speed this up by taking
the lock ourselves, and then using getc_unlocked to read
each character.

On a silly pathological case:

  perl -le '
    print "[core]";
    print "key$_ = value$_" for (1..1000000)
  ' >input
  git config -f input core.key1

this dropped the time to run git-config from:

  real    0m0.263s
  user    0m0.260s
  sys     0m0.000s

to:

  real    0m0.159s
  user    0m0.152s
  sys     0m0.004s

for a savings of 39%.  Most config files are not this big,
but the savings should be proportional to the size of the
file (i.e., we always save 39%, just of a much smaller
number).

Signed-off-by: Jeff King <peff@peff.net>
---
 config.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/config.c b/config.c
index 66c0a51..8b297fc 100644
--- a/config.c
+++ b/config.c
@@ -49,7 +49,7 @@ static struct config_set the_config_set;
 
 static int config_file_fgetc(struct config_source *conf)
 {
-	return fgetc(conf->u.file);
+	return getc_unlocked(conf->u.file);
 }
 
 static int config_file_ungetc(int c, struct config_source *conf)
@@ -1088,7 +1088,9 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data)
 
 	f = fopen(filename, "r");
 	if (f) {
+		flockfile(f);
 		ret = do_config_from_file(fn, filename, filename, f, data);
+		funlockfile(f);
 		fclose(f);
 	}
 	return ret;
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
                     ` (3 preceding siblings ...)
  2015-04-16  8:51   ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King
@ 2015-04-16  8:53   ` Jeff King
  2015-04-16  8:58   ` [PATCH 6/9] strbuf_getwholeline: " Jeff King
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  8:53 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

We mark strbuf_addch as inline, because we expect it may be
called from a tight loop. However, the first thing it does
is call the non-inline strbuf_grow(), which can handle
arbitrary-sized growth. Since we know that we only need a
single character, we can use the inline strbuf_avail() to
quickly check whether we need to grow at all.

Our check is redundant when we do call strbuf_grow(), but
that's OK. The common case is that we avoid calling it at
all, and we have made that case faster.

On a silly pathological case:

  perl -le '
    print "[core]";
    print "key$_ = value$_" for (1..1000000)
  ' >input
  git config -f input core.key1

this dropped the time to run git-config from:

  real    0m0.159s
  user    0m0.152s
  sys     0m0.004s

to:

  real    0m0.140s
  user    0m0.136s
  sys     0m0.004s

for a savings of 12%.

Signed-off-by: Jeff King <peff@peff.net>
---
I doubt anybody will really notice this in practice with config files,
and for the most part we do not have tight loops of strbuf_addch
elsewhere. But it is such an easy optimization, I'd rather do it now
while we're thinking about it.

 strbuf.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/strbuf.h b/strbuf.h
index 1883494..01c5c63 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -205,7 +205,8 @@ extern int strbuf_cmp(const struct strbuf *, const struct strbuf *);
  */
 static inline void strbuf_addch(struct strbuf *sb, int c)
 {
-	strbuf_grow(sb, 1);
+	if (!strbuf_avail(sb))
+		strbuf_grow(sb, 1);
 	sb->buf[sb->len++] = c;
 	sb->buf[sb->len] = '\0';
 }
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 6/9] strbuf_getwholeline: avoid calling strbuf_grow
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
                     ` (4 preceding siblings ...)
  2015-04-16  8:53   ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King
@ 2015-04-16  8:58   ` Jeff King
  2015-04-16  9:01   ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  8:58 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

As with the recent speedup to strbuf_addch, we can avoid
calling strbuf_grow() in a tight loop of single-character
adds by instead checking strbuf_avail.

Note that we would instead call strbuf_addch directly here,
but it does more work than necessary: it will NUL-terminate
the result for each character read. Instead, in this loop we
read the characters one by one and then add the terminator
manually at the end.

Running "git rev-parse refs/heads/does-not-exist" on a repo
with an extremely large (1.6GB) packed-refs file went from
(best-of-5):

  real    0m10.948s
  user    0m10.548s
  sys     0m0.412s

to:

  real    0m8.601s
  user    0m8.084s
  sys     0m0.524s

for a wall-clock speedup of 21%.

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Jeff King <peff@peff.net>
---
Our "don't write a NUL for each character" optimization is only possible
because we're intimate with the strbuf details here. I thought about
making a strbuf_addch_unsafe interface to let other callers do this,
too. But the only other caller that would use it is the config reader,
and I measured only a 3% speedup there. Which I don't think is worth the
extra API complexity.

Whereas here it does make a big difference. Switching to strbuf_addch
knocks us back up into the 9.5s range. I think the difference is that
our lines are much longer than the tokens we're parsing in the config
file. So the percentage of wasted NUL writes is much higher here.

 strbuf.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/strbuf.c b/strbuf.c
index af2bad4..921619e 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -445,7 +445,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 	strbuf_reset(sb);
 	flockfile(fp);
 	while ((ch = getc_unlocked(fp)) != EOF) {
-		strbuf_grow(sb, 1);
+		if (!strbuf_avail(sb))
+			strbuf_grow(sb, 1);
 		sb->buf[sb->len++] = ch;
 		if (ch == term)
 			break;
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
                     ` (5 preceding siblings ...)
  2015-04-16  8:58   ` [PATCH 6/9] strbuf_getwholeline: " Jeff King
@ 2015-04-16  9:01   ` Jeff King
  2015-04-17 10:16     ` Eric Sunshine
  2015-04-16  9:03   ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-04-16  9:01 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

We spend a lot of time in strbuf_getwholeline in a tight
loop reading characters from a stdio handle into a buffer.
The libc getdelim() function can do this for us with less
overhead. It's in POSIX.1-2008, and was a GNU extension
before that. Therefore we can't rely on it, but can fall
back to the existing getc loop when it is not available.

The HAVE_GETDELIM knob is turned on automatically for Linux,
where we have glibc. We don't need to set any new
feature-test macros, because we already define _GNU_SOURCE.
Other systems that implement getdelim may need to other
macros (probably _POSIX_C_SOURCE >= 200809L), but we can
address that along with setting the Makefile knob after
testing the feature on those systems.

Running "git rev-parse refs/heads/does-not-exist" on a repo
with an extremely large (1.6GB) packed-refs file went from
(best-of-5):

  real    0m8.601s
  user    0m8.084s
  sys     0m0.524s

to:

  real    0m6.768s
  user    0m6.340s
  sys     0m0.432s

for a wall-clock speedup of 21%.

Based on a patch from Rasmus Villemoes <rv@rasmusvillemoes.dk>.

Signed-off-by: Jeff King <peff@peff.net>
---
If somebody has a FreeBSD or OS X system to test on, I'd
love to see what is needed to compile with HAVE_GETDELIM
there. And to confirm that the performance is much better.
Sharing my 1.6GB packed-refs file would be hard, but you
should be able to generate something large and ridiculous.
I'll leave that as an exercise to the reader.

 Makefile         |  6 ++++++
 config.mak.uname |  1 +
 strbuf.c         | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

diff --git a/Makefile b/Makefile
index 5f3987f..36655d5 100644
--- a/Makefile
+++ b/Makefile
@@ -359,6 +359,8 @@ all::
 # compiler is detected to support it.
 #
 # Define HAVE_BSD_SYSCTL if your platform has a BSD-compatible sysctl function.
+#
+# Define HAVE_GETDELIM if your system has the getdelim() function.
 
 GIT-VERSION-FILE: FORCE
 	@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -1437,6 +1439,10 @@ ifdef HAVE_BSD_SYSCTL
 	BASIC_CFLAGS += -DHAVE_BSD_SYSCTL
 endif
 
+ifdef HAVE_GETDELIM
+	BASIC_CFLAGS += -DHAVE_GETDELIM
+endif
+
 ifeq ($(TCLTK_PATH),)
 NO_TCLTK = NoThanks
 endif
diff --git a/config.mak.uname b/config.mak.uname
index f4e77cb..d26665f 100644
--- a/config.mak.uname
+++ b/config.mak.uname
@@ -36,6 +36,7 @@ ifeq ($(uname_S),Linux)
 	HAVE_DEV_TTY = YesPlease
 	HAVE_CLOCK_GETTIME = YesPlease
 	HAVE_CLOCK_MONOTONIC = YesPlease
+	HAVE_GETDELIM = YesPlease
 endif
 ifeq ($(uname_S),GNU/kFreeBSD)
 	HAVE_ALLOCA_H = YesPlease
diff --git a/strbuf.c b/strbuf.c
index 921619e..0d4f4e5 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -435,6 +435,47 @@ int strbuf_getcwd(struct strbuf *sb)
 	return -1;
 }
 
+#ifdef HAVE_GETDELIM
+int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
+{
+	ssize_t r;
+
+	if (feof(fp))
+		return EOF;
+
+	strbuf_reset(sb);
+
+	/* Translate slopbuf to NULL, as we cannot call realloc on it */
+	if (!sb->alloc)
+		sb->buf = NULL;
+	r = getdelim(&sb->buf, &sb->alloc, term, fp);
+
+	if (r > 0) {
+		sb->len = r;
+		return 0;
+	}
+	assert(r == -1);
+
+	/*
+	 * Normally we would have called xrealloc, which will try to free
+	 * memory and recover. But we have no way to tell getdelim() to do so.
+	 * Worse, we cannot try to recover ENOMEM ourselves, because we have
+	 * no idea how many bytes were read by getdelim.
+	 *
+	 * Dying here is reasonable. It mirrors what xrealloc would do on
+	 * catastrophic memory failure. We skip the opportunity to free pack
+	 * memory and retry, but that's unlikely to help for a malloc small
+	 * enough to hold a single line of input, anyway.
+	 */
+	if (errno == ENOMEM)
+		die("Out of memory, getdelim failed");
+
+	/* Restore slopbuf that we moved out of the way before */
+	if (!sb->buf)
+		strbuf_init(sb, 0);
+	return EOF;
+}
+#else
 int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 {
 	int ch;
@@ -458,6 +499,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term)
 	sb->buf[sb->len] = '\0';
 	return 0;
 }
+#endif
 
 int strbuf_getline(struct strbuf *sb, FILE *fp, int term)
 {
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 8/9] read_packed_refs: avoid double-checking sane refs
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
                     ` (6 preceding siblings ...)
  2015-04-16  9:01   ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King
@ 2015-04-16  9:03   ` Jeff King
  2015-04-16  9:04   ` [PATCH 9/9] t1430: add another refs-escape test Jeff King
  2015-04-16  9:25   ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  9:03 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

Prior to d0f810f (refs.c: allow listing and deleting badly
named refs, 2014-09-03), read_packed_refs would barf on any
malformed refnames by virtue of calling create_ref_entry
with the "check" parameter set to 1. That commit loosened
our reading so that we call check_refname_format ourselves
and just set a REF_BAD_NAME flag.

We then call create_ref_entry with the check parameter set
to 0. That function learned to do an extra safety check even
when the check parameter is 0, so that we don't load any
dangerous refnames (like "../../../etc/passwd"). This is
implemented by calling refname_is_safe() in
create_ref_entry().

However, we can observe that refname_is_safe() can only be
true if check_refname_format() also failed. So in the common
case of a sanely named ref, we perform _both_ checks, even
though we know that the latter will never trigger. This has
a noticeable performance impact when the packed-refs file is
large.

Let's drop the refname_is_safe check from create_ref_entry(),
and make it the responsibility of the caller.  Of the three
callers that pass a check parameter of "0", two will have
just called check_refname_format(), and can check the
refname-safety only when it fails. The third case,
pack_if_possible_fn, is copying from an existing ref entry,
which must have previously passed our safety check.

With this patch, running "git rev-parse refs/heads/does-not-exist"
on a repo with a large (1.6GB) packed-refs file went from:

  real    0m6.768s
  user    0m6.340s
  sys     0m0.432s

to:

  real    0m5.703s
  user    0m5.276s
  sys     0m0.432s

for a wall-clock speedup of 15%.

Signed-off-by: Jeff King <peff@peff.net>
---
 refs.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index 47e4e53..f36ea75 100644
--- a/refs.c
+++ b/refs.c
@@ -344,8 +344,6 @@ static struct ref_entry *create_ref_entry(const char *refname,
 	if (check_name &&
 	    check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
 		die("Reference has invalid format: '%s'", refname);
-	if (!check_name && !refname_is_safe(refname))
-		die("Reference has invalid name: '%s'", refname);
 	len = strlen(refname) + 1;
 	ref = xmalloc(sizeof(struct ref_entry) + len);
 	hashcpy(ref->u.value.sha1, sha1);
@@ -1178,6 +1176,8 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 			int flag = REF_ISPACKED;
 
 			if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
+				if (!refname_is_safe(refname))
+					die("packed refname is dangerous: %s", refname);
 				hashclr(sha1);
 				flag |= REF_BAD_NAME | REF_ISBROKEN;
 			}
@@ -1323,6 +1323,8 @@ static void read_loose_refs(const char *dirname, struct ref_dir *dir)
 			}
 			if (check_refname_format(refname.buf,
 						 REFNAME_ALLOW_ONELEVEL)) {
+				if (!refname_is_safe(refname.buf))
+					die("loose refname is dangerous: %s", refname.buf);
 				hashclr(sha1);
 				flag |= REF_BAD_NAME | REF_ISBROKEN;
 			}
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 9/9] t1430: add another refs-escape test
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
                     ` (7 preceding siblings ...)
  2015-04-16  9:03   ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King
@ 2015-04-16  9:04   ` Jeff King
  2015-04-16  9:25   ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  9:04 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

In t1430, we check whether deleting the branch "../../foo"
will delete ".git/foo". However, this is not that
interesting a test; the precious file ".git/foo" does not
look like a ref, so even if we did not notice the "escape"
from the "refs/" hierarchy, we would fail for that reason
(i.e., if you turned refname_is_safe into a noop, the test
still passes).

Let's add an additional test for the same thing, but with a
file that actually looks like a ref. That will make sure we
are exercising the refname_is_safe code. While we're at it,
let's also make the code work a little harder by adding some
extra paths and some empty path components.

Signed-off-by: Jeff King <peff@peff.net>
---
This was originally included to exercise refname_is_safe(), because in
the v1 series I refactored it (here I just avoid calling it entirely).
So it's not as important in v2. But AFAICT, we do not exercise
refname_is_safe() at all in the test suite without this patch, so it's
probably a good thing to do regardless.

 t/t1430-bad-ref-name.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh
index 468e856..16d0b8b 100755
--- a/t/t1430-bad-ref-name.sh
+++ b/t/t1430-bad-ref-name.sh
@@ -68,6 +68,14 @@ test_expect_success 'branch -D cannot delete non-ref in .git dir' '
 	test_cmp expect .git/my-private-file
 '
 
+test_expect_success 'branch -D cannot delete ref in .git dir' '
+	git rev-parse HEAD >.git/my-private-file &&
+	git rev-parse HEAD >expect &&
+	git branch foo/legit &&
+	test_must_fail git branch -D foo////./././../../../my-private-file &&
+	test_cmp expect .git/my-private-file
+'
+
 test_expect_success 'branch -D cannot delete absolute path' '
 	git branch -f extra &&
 	test_must_fail git branch -D "$(pwd)/.git/refs/heads/extra" &&
-- 
2.4.0.rc2.384.g7297a4a

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v2 0/9] address packed-refs speed regressions
  2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
                     ` (8 preceding siblings ...)
  2015-04-16  9:04   ` [PATCH 9/9] t1430: add another refs-escape test Jeff King
@ 2015-04-16  9:25   ` Jeff King
  9 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-16  9:25 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine

On Thu, Apr 16, 2015 at 04:47:34AM -0400, Jeff King wrote:

> Here's a re-roll incorporating feedback from the list. Thanks everybody
> for your comments. Last time the final number was ~8.5s, which was
> disappointingly slower than v2.0.0. In this iteration, my final numbers
> are:
> 
>   real    0m5.703s
>   user    0m5.276s
>   sys     0m0.432s
> 
> which is quite pleasing.

I forgot to mention what I _didn't_ include.

We discussed using mmap instead of stdio. Applying René's mmap patch
drops my best-of-five here to 5.114s. Which is nice, but it is a bit
more invasive (and does not help other callers of strbuf_getline).

If I further apply my really nasty patch to avoid the strbuf entirely
(i.e., we parse straight out of the mmap), I can get it down to 4.835s.

I don't know if the complexity is worth it or not. Ultimately, I think
the best route to making packed-refs faster is to drop the whole
ref_cache structure and just iterate directly over the mmap data. It
would use less RAM, and it opens the possibility of binary-searching to
look at only a subset of the entries. That's a _lot_ more invasive,
though.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
  2015-04-16  9:01   ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King
@ 2015-04-17 10:16     ` Eric Sunshine
  2015-04-21 23:09       ` Jeff King
  2015-04-22 18:00       ` Johannes Schindelin
  0 siblings, 2 replies; 44+ messages in thread
From: Eric Sunshine @ 2015-04-17 10:16 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, René Scharfe, Rasmus Villemoes

On Thu, Apr 16, 2015 at 5:01 AM, Jeff King <peff@peff.net> wrote:
> We spend a lot of time in strbuf_getwholeline in a tight
> loop reading characters from a stdio handle into a buffer.
> The libc getdelim() function can do this for us with less
> overhead. It's in POSIX.1-2008, and was a GNU extension
> before that. Therefore we can't rely on it, but can fall
> back to the existing getc loop when it is not available.
>
> The HAVE_GETDELIM knob is turned on automatically for Linux,
> where we have glibc. We don't need to set any new
> feature-test macros, because we already define _GNU_SOURCE.
> Other systems that implement getdelim may need to other
> macros (probably _POSIX_C_SOURCE >= 200809L), but we can
> address that along with setting the Makefile knob after
> testing the feature on those systems.
> [...]
>
> Based on a patch from Rasmus Villemoes <rv@rasmusvillemoes.dk>.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> If somebody has a FreeBSD or OS X system to test on, I'd
> love to see what is needed to compile with HAVE_GETDELIM
> there.

Modern Mac OS X, 10.10.x Yosemite, has getdelim() and git builds fine
with HAVE_GETDELIM. I also tested on old Snow Leopard 10.5.8 from
2009. It does not have getdelim(). Unfortunately, I haven't been able
to determine when getdelim() was introduced on the Mac OS X, thus have
been unable to craft a simple rule for config.mak.uname.

> And to confirm that the performance is much better.
> Sharing my 1.6GB packed-refs file would be hard, but you
> should be able to generate something large and ridiculous.
> I'll leave that as an exercise to the reader.
>
> diff --git a/Makefile b/Makefile
> index 5f3987f..36655d5 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -359,6 +359,8 @@ all::
>  # compiler is detected to support it.
>  #
>  # Define HAVE_BSD_SYSCTL if your platform has a BSD-compatible sysctl function.
> +#
> +# Define HAVE_GETDELIM if your system has the getdelim() function.
>
>  GIT-VERSION-FILE: FORCE
>         @$(SHELL_PATH) ./GIT-VERSION-GEN
> @@ -1437,6 +1439,10 @@ ifdef HAVE_BSD_SYSCTL
>         BASIC_CFLAGS += -DHAVE_BSD_SYSCTL
>  endif
>
> +ifdef HAVE_GETDELIM
> +       BASIC_CFLAGS += -DHAVE_GETDELIM
> +endif
> +
>  ifeq ($(TCLTK_PATH),)
>  NO_TCLTK = NoThanks
>  endif
> diff --git a/config.mak.uname b/config.mak.uname
> index f4e77cb..d26665f 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -36,6 +36,7 @@ ifeq ($(uname_S),Linux)
>         HAVE_DEV_TTY = YesPlease
>         HAVE_CLOCK_GETTIME = YesPlease
>         HAVE_CLOCK_MONOTONIC = YesPlease
> +       HAVE_GETDELIM = YesPlease
>  endif
>  ifeq ($(uname_S),GNU/kFreeBSD)
>         HAVE_ALLOCA_H = YesPlease

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
  2015-04-17 10:16     ` Eric Sunshine
@ 2015-04-21 23:09       ` Jeff King
  2015-05-08 23:56         ` Eric Sunshine
  2015-04-22 18:00       ` Johannes Schindelin
  1 sibling, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-04-21 23:09 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, René Scharfe, Rasmus Villemoes

On Fri, Apr 17, 2015 at 06:16:48AM -0400, Eric Sunshine wrote:

> > If somebody has a FreeBSD or OS X system to test on, I'd
> > love to see what is needed to compile with HAVE_GETDELIM
> > there.
> 
> Modern Mac OS X, 10.10.x Yosemite, has getdelim() and git builds fine
> with HAVE_GETDELIM. I also tested on old Snow Leopard 10.5.8 from
> 2009. It does not have getdelim(). Unfortunately, I haven't been able
> to determine when getdelim() was introduced on the Mac OS X, thus have
> been unable to craft a simple rule for config.mak.uname.

Thanks for looking into it. Since there haven't been any other takers in
the meantime, do you want to prepare a patch that checks $(uname_R) for
10.10.x?  That's likely more conservative than is necessary, but we can
loosen it later if somebody on 10.9.x shows up with test results.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
  2015-04-17 10:16     ` Eric Sunshine
  2015-04-21 23:09       ` Jeff King
@ 2015-04-22 18:00       ` Johannes Schindelin
  2015-04-22 18:06         ` Jeff King
  1 sibling, 1 reply; 44+ messages in thread
From: Johannes Schindelin @ 2015-04-22 18:00 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Jeff King, Git List, René Scharfe, Rasmus Villemoes

Hi,

On 2015-04-17 12:16, Eric Sunshine wrote:
> On Thu, Apr 16, 2015 at 5:01 AM, Jeff King <peff@peff.net> wrote:
>> We spend a lot of time in strbuf_getwholeline in a tight
>> loop reading characters from a stdio handle into a buffer.
>> The libc getdelim() function can do this for us with less
>> overhead.

Just for the record: Git for Windows cannot lean on `getdelim()`, as it is not available on Windows. Do not let that stop you; if it turns out to impact performance, we will just have to come up with our own implementation of that function.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
  2015-04-22 18:00       ` Johannes Schindelin
@ 2015-04-22 18:06         ` Jeff King
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff King @ 2015-04-22 18:06 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Eric Sunshine, Git List, René Scharfe, Rasmus Villemoes

On Wed, Apr 22, 2015 at 08:00:55PM +0200, Johannes Schindelin wrote:

> On 2015-04-17 12:16, Eric Sunshine wrote:
> > On Thu, Apr 16, 2015 at 5:01 AM, Jeff King <peff@peff.net> wrote:
> >> We spend a lot of time in strbuf_getwholeline in a tight
> >> loop reading characters from a stdio handle into a buffer.
> >> The libc getdelim() function can do this for us with less
> >> overhead.
> 
> Just for the record: Git for Windows cannot lean on `getdelim()`, as
> it is not available on Windows. Do not let that stop you; if it turns
> out to impact performance, we will just have to come up with our own
> implementation of that function.

Hopefully the earlier patch in the series to avoid locking will help
on Windows. After the end of the series, it isn't used anymore on Linux,
but I kept it in exactly for those less-fortunate systems.

If you can find a Windows equivalent that does the same thing as
getdelim, it should be pretty easy to drop it into an alternate
strbuf_getwholeline implementation (or just provide a compat "getdelim"
if it is close enough to have the same interface).

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
  2015-04-21 23:09       ` Jeff King
@ 2015-05-08 23:56         ` Eric Sunshine
  2015-05-09  1:09           ` Jeff King
  0 siblings, 1 reply; 44+ messages in thread
From: Eric Sunshine @ 2015-05-08 23:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, René Scharfe, Rasmus Villemoes

On Tue, Apr 21, 2015 at 7:09 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Apr 17, 2015 at 06:16:48AM -0400, Eric Sunshine wrote:
>> > If somebody has a FreeBSD or OS X system to test on, I'd
>> > love to see what is needed to compile with HAVE_GETDELIM
>> > there.
>>
>> Modern Mac OS X, 10.10.x Yosemite, has getdelim() and git builds fine
>> with HAVE_GETDELIM. I also tested on old Snow Leopard 10.5.8 from
>> 2009. It does not have getdelim(). Unfortunately, I haven't been able
>> to determine when getdelim() was introduced on the Mac OS X, thus have
>> been unable to craft a simple rule for config.mak.uname.
>
> Thanks for looking into it. Since there haven't been any other takers in
> the meantime, do you want to prepare a patch that checks $(uname_R) for
> 10.10.x?  That's likely more conservative than is necessary, but we can
> loosen it later if somebody on 10.9.x shows up with test results.

I spent some time downloading old Xcode releases and poking through
the packages. Xcode 3.2.x seems to be the last in the Xcode 3 series,
and none of the Xcode 3.2.x versions I examined carried getdelim().
The first package in which I found getdelim() was Xcode 4.1.
(Unfortunately, Apple doesn't seem to make Xcode 4.0 available for
download anymore or it's only available to paying developers, so I
couldn't check it.) According to Wikipedia[1], Xcode 4.1 was released
the same day as Lion (OS X 10.7 [2]), but was also available to paying
developers for Snow Leopard (OS X 10.6).

Consequently, I think it's safe to say that getdelim() is available
for Lion (10.7) and later. If we don't mind being a bit less
conservative, then we might assume that it also is available for Snow
Leopard (10.6), which it definitely supported, but perhaps that's too
risky, since not everyone would have been a paid subscriber.

Alternately, we could make the test more dynamic and accurate by
grepping stdio.h for 'getdelim' or just by trying a test compile,
though that's probably too expensive.

[1]: http://en.wikipedia.org/wiki/Xcode
[2]: http://en.wikipedia.org/wiki/OS_X

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
  2015-05-08 23:56         ` Eric Sunshine
@ 2015-05-09  1:09           ` Jeff King
  2015-06-02 18:22             ` Eric Sunshine
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff King @ 2015-05-09  1:09 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Git List, René Scharfe, Rasmus Villemoes

On Fri, May 08, 2015 at 07:56:28PM -0400, Eric Sunshine wrote:

> I spent some time downloading old Xcode releases and poking through
> the packages. Xcode 3.2.x seems to be the last in the Xcode 3 series,
> and none of the Xcode 3.2.x versions I examined carried getdelim().
> The first package in which I found getdelim() was Xcode 4.1.
> (Unfortunately, Apple doesn't seem to make Xcode 4.0 available for
> download anymore or it's only available to paying developers, so I
> couldn't check it.) According to Wikipedia[1], Xcode 4.1 was released
> the same day as Lion (OS X 10.7 [2]), but was also available to paying
> developers for Snow Leopard (OS X 10.6).
> 
> Consequently, I think it's safe to say that getdelim() is available
> for Lion (10.7) and later. If we don't mind being a bit less
> conservative, then we might assume that it also is available for Snow
> Leopard (10.6), which it definitely supported, but perhaps that's too
> risky, since not everyone would have been a paid subscriber.

Thanks for digging. I'd argue for the conservative choice, simply
because this is a pure optimization. The old code should work just fine,
and people have been living with it for years.

I doubt it will affect many people either way, though. Lion is 4 years
old, and most OS X people seem to upgrade fairly regularly. It is not
like long-term server systems where we are supporting Solaris 7. :)

Want to roll a patch?

> Alternately, we could make the test more dynamic and accurate by
> grepping stdio.h for 'getdelim' or just by trying a test compile,
> though that's probably too expensive.

The natural place would be in configure.ac, and that is orthogonal to
the default Darwin setting, I think.

-Peff

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available
  2015-05-09  1:09           ` Jeff King
@ 2015-06-02 18:22             ` Eric Sunshine
  0 siblings, 0 replies; 44+ messages in thread
From: Eric Sunshine @ 2015-06-02 18:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Git List, René Scharfe, Rasmus Villemoes

On Fri, May 8, 2015 at 9:09 PM, Jeff King <peff@peff.net> wrote:
> On Fri, May 08, 2015 at 07:56:28PM -0400, Eric Sunshine wrote:
>> I spent some time downloading old Xcode releases and poking through
>> the packages. Xcode 3.2.x seems to be the last in the Xcode 3 series,
>> and none of the Xcode 3.2.x versions I examined carried getdelim().
>> The first package in which I found getdelim() was Xcode 4.1.
>> (Unfortunately, Apple doesn't seem to make Xcode 4.0 available for
>> download anymore or it's only available to paying developers, so I
>> couldn't check it.) According to Wikipedia[1], Xcode 4.1 was released
>> the same day as Lion (OS X 10.7 [2]), but was also available to paying
>> developers for Snow Leopard (OS X 10.6).
>>
>> Consequently, I think it's safe to say that getdelim() is available
>> for Lion (10.7) and later. If we don't mind being a bit less
>> conservative, then we might assume that it also is available for Snow
>> Leopard (10.6), which it definitely supported, but perhaps that's too
>> risky, since not everyone would have been a paid subscriber.
>
> Thanks for digging. I'd argue for the conservative choice, simply
> because this is a pure optimization. The old code should work just fine,
> and people have been living with it for years.
>
> I doubt it will affect many people either way, though. Lion is 4 years
> old, and most OS X people seem to upgrade fairly regularly. It is not
> like long-term server systems where we are supporting Solaris 7. :)
>
> Want to roll a patch?

After a long, long delay, here it is...[1]

>> Alternately, we could make the test more dynamic and accurate by
>> grepping stdio.h for 'getdelim' or just by trying a test compile,
>> though that's probably too expensive.
>
> The natural place would be in configure.ac, and that is orthogonal to
> the default Darwin setting, I think.

I added that too[1].

[1]: http://thread.gmane.org/gmane.comp.version-control.git/270576

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2015-06-02 18:22 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-05  1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King
2015-04-05  1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King
2015-04-05  1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King
2015-04-05  1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King
2015-04-05  4:56   ` Jeff King
2015-04-05  5:27     ` Jeff King
2015-04-05  5:35       ` Jeff King
2015-04-05 20:49         ` Junio C Hamano
2015-04-05 14:36     ` Duy Nguyen
2015-04-05 18:24       ` Jeff King
2015-04-05 20:09     ` Junio C Hamano
2015-04-07 13:48     ` Rasmus Villemoes
2015-04-07 19:04       ` Jeff King
2015-04-07 22:43         ` Rasmus Villemoes
2015-04-08  0:17           ` Jeff King
2015-04-05  1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King
2015-04-06  2:13   ` Eric Sunshine
2015-04-06  5:05     ` Jeff King
2015-04-05  1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King
2015-04-05  1:15 ` [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call Jeff King
2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe
2015-04-05 18:52   ` Jeff King
2015-04-05 18:59     ` Jeff King
2015-04-05 23:04       ` René Scharfe
2015-04-05 22:39     ` René Scharfe
2015-04-06  4:49       ` Jeff King
2015-04-16  8:47 ` [PATCH v2 0/9] " Jeff King
2015-04-16  8:48   ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King
2015-04-16  8:48   ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King
2015-04-16  8:49   ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King
2015-04-16  8:51   ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King
2015-04-16  8:53   ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King
2015-04-16  8:58   ` [PATCH 6/9] strbuf_getwholeline: " Jeff King
2015-04-16  9:01   ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King
2015-04-17 10:16     ` Eric Sunshine
2015-04-21 23:09       ` Jeff King
2015-05-08 23:56         ` Eric Sunshine
2015-05-09  1:09           ` Jeff King
2015-06-02 18:22             ` Eric Sunshine
2015-04-22 18:00       ` Johannes Schindelin
2015-04-22 18:06         ` Jeff King
2015-04-16  9:03   ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King
2015-04-16  9:04   ` [PATCH 9/9] t1430: add another refs-escape test Jeff King
2015-04-16  9:25   ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.