* [PATCH 0/6] address packed-refs speed regressions @ 2015-04-05 1:06 Jeff King 2015-04-05 1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King ` (7 more replies) 0 siblings, 8 replies; 44+ messages in thread From: Jeff King @ 2015-04-05 1:06 UTC (permalink / raw) To: git As I've mentioned before, I have some repositories with rather large numbers of refs. The worst one has ~13 million refs, for a 1.6GB packed-refs file. So I was saddened by this: $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1 real 0m6.840s user 0m6.404s sys 0m0.440s $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1 real 0m19.432s user 0m18.996s sys 0m0.456s The command isn't important; what I'm really measuring is loading the packed-refs file. And yes, of course this repository is absolutely ridiculous. But the slowdowns here are linear with the number of refs. So _every_ git command got a little bit slower, even in less crazy repositories. We just didn't notice it as much. Here are the numbers after this series: real 0m8.539s user 0m8.052s sys 0m0.496s Much better, but I'm frustrated that they are still 20% slower than the original. The main culprits seem to be d0f810f (which introduced some extra expensive code for each ref) and my 10c497a, which switched from fgets() to strbuf_getwholeline. It turns out that strbuf_getwholeline is really slow. There may be other problems lurking to account for the remaining 20%. It's hard to find performance regressions with a bisection if there are multiple of them; if you stop at a random commit and it is 500ms slow, it is hard to tell which problem is causing it. Note that while these are regressions, they are in v2.2.0 and v2.2.2 respectively. So this can wait until post-2.4. [1/6]: strbuf_getwholeline: use getc macro [2/6]: git-compat-util: add fallbacks for unlocked stdio [3/6]: strbuf_getwholeline: use get_unlocked [4/6]: strbuf: add an optimized 1-character strbuf_grow [5/6]: t1430: add another refs-escape test [6/6]: refname_is_safe: avoid expensive normalize_path_copy call -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 1/6] strbuf_getwholeline: use getc macro 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King @ 2015-04-05 1:07 ` Jeff King 2015-04-05 1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King ` (6 subsequent siblings) 7 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-05 1:07 UTC (permalink / raw) To: git strbuf_getwholeline calls fgetc in a tight loop. Using the getc form, which can be implemented as a macro, should be faster (and we do not care about it evaluating our argument twice, as we just have a plain variable). On my glibc system, running "git rev-parse refs/heads/does-not-exist" on a file with an extremely large (1.6GB) packed-refs file went from (best of 3 runs): real 0m19.383s user 0m18.876s sys 0m0.528s to: real 0m18.900s user 0m18.472s sys 0m0.448s for a wall-clock speedup of 2.5%. Signed-off-by: Jeff King <peff@peff.net> --- Not that exciting a speedup. But later we will switch to getc_unlocked, and I wanted to measure how much of that was coming from the macro, and how much from the locking. strbuf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index 88cafd4..14f337d 100644 --- a/strbuf.c +++ b/strbuf.c @@ -443,7 +443,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) return EOF; strbuf_reset(sb); - while ((ch = fgetc(fp)) != EOF) { + while ((ch = getc(fp)) != EOF) { strbuf_grow(sb, 1); sb->buf[sb->len++] = ch; if (ch == term) -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King 2015-04-05 1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King @ 2015-04-05 1:08 ` Jeff King 2015-04-05 1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King ` (5 subsequent siblings) 7 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-05 1:08 UTC (permalink / raw) To: git POSIX.1-2001 specifies some functions for optimizing the locking out of tight getc() loops. Not all systems are POSIX, though, and even not all POSIX systems are required to implement these functions. We can check for the feature-test macro to see if they are available, and if not, provide a noop implementation. There's no Makefile knob here, because we should just detect this automatically. If there are very bizarre systems, we may need to add one, but it's not clear yet in which direction: 1. If a system defines _POSIX_THREAD_SAFE_FUNCTIONS but these functions are missing or broken, we would want a knob to manually turn them off. 2. If a system has these functions but does not define _POSIX_THREAD_SAFE_FUNCTIONS, we would want a knob to manually turn them on. We can add such a knob when we find a real-world system that matches this. Signed-off-by: Jeff King <peff@peff.net> --- git-compat-util.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/git-compat-util.h b/git-compat-util.h index bc8fc8c..685a0a4 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -883,4 +883,10 @@ struct tm *git_gmtime_r(const time_t *, struct tm *); # define SHELL_PATH "/bin/sh" #endif +#ifndef _POSIX_THREAD_SAFE_FUNCTIONS +#define flockfile(fh) +#define funlockfile(fh) +#define getc_unlocked(fh) getc(fh) +#endif + #endif -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King 2015-04-05 1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King 2015-04-05 1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King @ 2015-04-05 1:11 ` Jeff King 2015-04-05 4:56 ` Jeff King 2015-04-05 1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King ` (4 subsequent siblings) 7 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2015-04-05 1:11 UTC (permalink / raw) To: git strbuf_getwholeline calls getc in a tight loop. On modern libc implementations, the stdio code locks the handle for every operation, which means we are paying a significant overhead. We can get around this by locking the handle for the whole loop and using the unlocked variant. Running "git rev-parse refs/heads/does-not-exist" on a repo with an extremely large (1.6GB) packed-refs file went from: real 0m18.900s user 0m18.472s sys 0m0.448s to: real 0m10.953s user 0m10.384s sys 0m0.580s for a wall-clock speedup of 42%. All times are best-of-3, and done on a glibc 2.19 system. Note that we call into strbuf_grow while holding the lock. It's possible for that function to call other stdio functions (e.g., printing to stderr when dying due to malloc error); however, the POSIX.1-2001 definition of flockfile makes it clear that the locks are per-handle, so we are fine unless somebody else tries to read from our same handle. This doesn't ever happen in the current code, and is unlikely to be added in the future (we would have to do something exotic like add a die_routine that tried to read from stdin). Signed-off-by: Jeff King <peff@peff.net> --- I don't think the complexity is worth it, but if we wanted to be more careful about the locks, I think it would probably involve growing the buffer, locking, doing unlocked reads until it's full, and then unlocking for the next round of growth. I also considered optimizing the "term == '\n'" case by using fgets, but it gets rather complex (you have to pick a size, fgets into it, and then keep going if you didn't get a newline). Also, fgets sucks, because you have to call strlen() immediately after to find out how many bytes you got! strbuf.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index 14f337d..af2bad4 100644 --- a/strbuf.c +++ b/strbuf.c @@ -443,12 +443,14 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) return EOF; strbuf_reset(sb); - while ((ch = getc(fp)) != EOF) { + flockfile(fp); + while ((ch = getc_unlocked(fp)) != EOF) { strbuf_grow(sb, 1); sb->buf[sb->len++] = ch; if (ch == term) break; } + funlockfile(fp); if (ch == EOF && sb->len == 0) return EOF; -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King @ 2015-04-05 4:56 ` Jeff King 2015-04-05 5:27 ` Jeff King ` (3 more replies) 0 siblings, 4 replies; 44+ messages in thread From: Jeff King @ 2015-04-05 4:56 UTC (permalink / raw) To: git On Sat, Apr 04, 2015 at 09:11:10PM -0400, Jeff King wrote: > I also considered optimizing the "term == '\n'" case by using fgets, but > it gets rather complex (you have to pick a size, fgets into it, and then > keep going if you didn't get a newline). Also, fgets sucks, because you > have to call strlen() immediately after to find out how many bytes you > got! My initial attempt at this had been to _just_ use fgets, but the optimization becomes much simpler if you just do an initial fgets, and then follow up with character reads. In most cases, the initial fgets is big enough to get the whole line. I.e., doing: diff --git a/strbuf.c b/strbuf.c index 2facd5f..f319d8d 100644 --- a/strbuf.c +++ b/strbuf.c @@ -443,6 +443,18 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) return EOF; strbuf_reset(sb); + + if (term == '\n') { + strbuf_grow(sb, 256); + if (!fgets(sb->buf, sb->alloc - 1, fp)) { + strbuf_release(sb); + return EOF; + } + sb->len = strlen(sb->buf); + if (sb->buf[sb->len - 1] == '\n') + return 0; + } + flockfile(fp); while ((ch = getc_unlocked(fp)) != EOF) { strbuf_grow_ch(sb); on top of the series drops me from: real 0m8.573s user 0m8.072s sys 0m0.508s to: real 0m6.671s user 0m6.216s sys 0m0.460s which is back to the v2.0.0 number. Even with the extra strlen, it seems that what fgets does internally beats repeated getc calls. Which I guess is not too surprising, as each getc() will have to check for underflow in the buffer. Perhaps there is more room to micro-optimize strbuf_getwholeline, but I kind of doubt it. The big downside is that our input strings are no longer NUL-clean (reading "foo\0bar\n" would yield just "foo". I doubt that matters in the real world, but it does fail a few of the tests (e.g., t7008 tries to read a list of patterns which includes NUL, and we silently truncate the pattern rather than read in the NUL and barf). So we'd have to either: 1. Decide that doesn't matter. 2. Have callers specify a "damn the NULs, I want it fast" flag. 3. Find some alternative that is more robust than fgets, and faster than getc. I don't think there is anything in stdio, but I am not above dropping in a faster non-portable call if it is available, and then falling back to the current code otherwise. -Peff ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 4:56 ` Jeff King @ 2015-04-05 5:27 ` Jeff King 2015-04-05 5:35 ` Jeff King 2015-04-05 14:36 ` Duy Nguyen ` (2 subsequent siblings) 3 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2015-04-05 5:27 UTC (permalink / raw) To: git On Sun, Apr 05, 2015 at 12:56:14AM -0400, Jeff King wrote: > The big downside is that our input strings are no longer NUL-clean > (reading "foo\0bar\n" would yield just "foo". I doubt that matters in > the real world, but it does fail a few of the tests (e.g., t7008 tries > to read a list of patterns which includes NUL, and we silently truncate > the pattern rather than read in the NUL and barf). So there is this trick: diff --git a/strbuf.c b/strbuf.c index f319d8d..5ceebb7 100644 --- a/strbuf.c +++ b/strbuf.c @@ -445,12 +445,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) strbuf_reset(sb); if (term == '\n') { + long pos = ftell(fp); strbuf_grow(sb, 256); if (!fgets(sb->buf, sb->alloc - 1, fp)) { strbuf_release(sb); return EOF; } - sb->len = strlen(sb->buf); + sb->len = ftell(fp) - pos; if (sb->buf[sb->len - 1] == '\n') return 0; } but much to my surprise it actually runs slower than the strlen version! It also has a 32-bit overflow issue. There's fgetpos() as an alternative, but fpos_t is an opaque type, and we might not be able to do arithmetic on it (for that matter, I am not sure if arithmetic is strictly guaranteed on ftell() results). POSIX gives us ftello(), which returns an off_t. That would probably be fine. The ftello() version seems slower than the strlen, but faster than ftell(). Puzzling. -Peff ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 5:27 ` Jeff King @ 2015-04-05 5:35 ` Jeff King 2015-04-05 20:49 ` Junio C Hamano 0 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2015-04-05 5:35 UTC (permalink / raw) To: git On Sun, Apr 05, 2015 at 01:27:32AM -0400, Jeff King wrote: > On Sun, Apr 05, 2015 at 12:56:14AM -0400, Jeff King wrote: > > > The big downside is that our input strings are no longer NUL-clean > > (reading "foo\0bar\n" would yield just "foo". I doubt that matters in > > the real world, but it does fail a few of the tests (e.g., t7008 tries > > to read a list of patterns which includes NUL, and we silently truncate > > the pattern rather than read in the NUL and barf). > > So there is this trick: > > diff --git a/strbuf.c b/strbuf.c > index f319d8d..5ceebb7 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -445,12 +445,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > strbuf_reset(sb); > > if (term == '\n') { > + long pos = ftell(fp); > strbuf_grow(sb, 256); > if (!fgets(sb->buf, sb->alloc - 1, fp)) { > strbuf_release(sb); > return EOF; > } > - sb->len = strlen(sb->buf); > + sb->len = ftell(fp) - pos; > if (sb->buf[sb->len - 1] == '\n') > return 0; > } > > but much to my surprise it actually runs slower than the strlen version! > It also has a 32-bit overflow issue. There's fgetpos() as an > alternative, but fpos_t is an opaque type, and we might not be able to > do arithmetic on it (for that matter, I am not sure if arithmetic is > strictly guaranteed on ftell() results). POSIX gives us ftello(), which > returns an off_t. That would probably be fine. Actually, scratch that idea. ftell() always returns 0 on a non-seekable file, so we can't use it in the general case. And that probably explains the performance difference, too, if it is not keeping its own counter and relies on lseek(fileno(fp)) or similar. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 5:35 ` Jeff King @ 2015-04-05 20:49 ` Junio C Hamano 0 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2015-04-05 20:49 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > On Sun, Apr 05, 2015 at 01:27:32AM -0400, Jeff King wrote: > >> On Sun, Apr 05, 2015 at 12:56:14AM -0400, Jeff King wrote: >> >> > The big downside is that our input strings are no longer NUL-clean >> > (reading "foo\0bar\n" would yield just "foo". I doubt that matters in >> > the real world, but it does fail a few of the tests (e.g., t7008 tries >> > to read a list of patterns which includes NUL, and we silently truncate >> > the pattern rather than read in the NUL and barf). >> >> So there is this trick: >> >> diff --git a/strbuf.c b/strbuf.c >> index f319d8d..5ceebb7 100644 >> --- a/strbuf.c >> +++ b/strbuf.c >> @@ -445,12 +445,13 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) >> strbuf_reset(sb); >> >> if (term == '\n') { >> + long pos = ftell(fp); >> strbuf_grow(sb, 256); >> if (!fgets(sb->buf, sb->alloc - 1, fp)) { >> strbuf_release(sb); >> return EOF; >> } >> - sb->len = strlen(sb->buf); >> + sb->len = ftell(fp) - pos; >> if (sb->buf[sb->len - 1] == '\n') >> return 0; >> } >> >> but much to my surprise it actually runs slower than the strlen version! The later loop you have "oops, the thing turns out to be longer than we thought, so let's do byte-by-byte" is protected with locking, but this part is not, and that suggests me that the ftell-fgets-ftell sequence we see here may have its own locking cost built-in in the stdio library, too, perhaps? >> It also has a 32-bit overflow issue. There's fgetpos() as an >> alternative, but fpos_t is an opaque type, and we might not be able to >> do arithmetic on it (for that matter, I am not sure if arithmetic is >> strictly guaranteed on ftell() results). POSIX gives us ftello(), which >> returns an off_t. That would probably be fine. > > Actually, scratch that idea. ftell() always returns 0 on a non-seekable > file, so we can't use it in the general case. And that probably explains > the performance difference, too, if it is not keeping its own counter > and relies on lseek(fileno(fp)) or similar. Looked so promising, though ;-) X-<. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 4:56 ` Jeff King 2015-04-05 5:27 ` Jeff King @ 2015-04-05 14:36 ` Duy Nguyen 2015-04-05 18:24 ` Jeff King 2015-04-05 20:09 ` Junio C Hamano 2015-04-07 13:48 ` Rasmus Villemoes 3 siblings, 1 reply; 44+ messages in thread From: Duy Nguyen @ 2015-04-05 14:36 UTC (permalink / raw) To: Jeff King; +Cc: Git Mailing List On Sun, Apr 5, 2015 at 11:56 AM, Jeff King <peff@peff.net> wrote: > So we'd have to either: > > 1. Decide that doesn't matter. > > 2. Have callers specify a "damn the NULs, I want it fast" flag. 2+. Avoid FILE* interface and go with syscalls for reading packed-refs? If mmaping the entire file could be a problem for some platform because it's too large, we have code for reading (with bufferring) from fd somewhere, e.g. index-pack. > 3. Find some alternative that is more robust than fgets, and faster > than getc. I don't think there is anything in stdio, but I am not > above dropping in a faster non-portable call if it is available, > and then falling back to the current code otherwise. -- Duy ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 14:36 ` Duy Nguyen @ 2015-04-05 18:24 ` Jeff King 0 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-05 18:24 UTC (permalink / raw) To: Duy Nguyen; +Cc: Git Mailing List On Sun, Apr 05, 2015 at 09:36:04PM +0700, Duy Nguyen wrote: > On Sun, Apr 5, 2015 at 11:56 AM, Jeff King <peff@peff.net> wrote: > > So we'd have to either: > > > > 1. Decide that doesn't matter. > > > > 2. Have callers specify a "damn the NULs, I want it fast" flag. > > 2+. Avoid FILE* interface and go with syscalls for reading > packed-refs? If mmaping the entire file could be a problem for some > platform because it's too large, we have code for reading (with > bufferring) from fd somewhere, e.g. index-pack. There's strbuf_getwholeline_fd, but it's horrifically inefficient (one syscall per character). But the other option is to implement your own buffering, and we're generally better off letting stdio do that for us (the exception here is that stdio does not have a good NUL-safe "read until X" function). -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 4:56 ` Jeff King 2015-04-05 5:27 ` Jeff King 2015-04-05 14:36 ` Duy Nguyen @ 2015-04-05 20:09 ` Junio C Hamano 2015-04-07 13:48 ` Rasmus Villemoes 3 siblings, 0 replies; 44+ messages in thread From: Junio C Hamano @ 2015-04-05 20:09 UTC (permalink / raw) To: Jeff King; +Cc: git Jeff King <peff@peff.net> writes: > So we'd have to either: > > 1. Decide that doesn't matter. > > 2. Have callers specify a "damn the NULs, I want it fast" flag. The callers that used to call fgets and then later rewritten to strbuf_getwholeline(), either of the above obviously should be OK, and because the whole reason why we added strbuf_getline() interface was to avoid having to repeatedly call fgets() and concatenate the result if the fixed-size buffer we would give it is too small, I'd say the callers that want to read lines terminated by LF and have NUL as part of payload would be a tiny minority. It depends on what we would find out after auditing all callers of this function, but I would not be surprised if we decided #1 (i.e. "this is about a _line_; what are you doing by having a NUL on it?"), and the safest would be to do the inverse of #2, i.e. make it fast by default and make oddball callers that care about NULs to pass a flag. Thanks for working on this. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-05 4:56 ` Jeff King ` (2 preceding siblings ...) 2015-04-05 20:09 ` Junio C Hamano @ 2015-04-07 13:48 ` Rasmus Villemoes 2015-04-07 19:04 ` Jeff King 3 siblings, 1 reply; 44+ messages in thread From: Rasmus Villemoes @ 2015-04-07 13:48 UTC (permalink / raw) To: Jeff King; +Cc: git On Sun, Apr 05 2015, Jeff King <peff@peff.net> wrote: > which is back to the v2.0.0 number. Even with the extra strlen, it seems > that what fgets does internally beats repeated getc calls. Which I guess > is not too surprising, as each getc() will have to check for underflow > in the buffer. Perhaps there is more room to micro-optimize > strbuf_getwholeline, but I kind of doubt it. > > The big downside is that our input strings are no longer NUL-clean > (reading "foo\0bar\n" would yield just "foo". I doubt that matters in > the real world, but it does fail a few of the tests (e.g., t7008 tries > to read a list of patterns which includes NUL, and we silently truncate > the pattern rather than read in the NUL and barf). > > So we'd have to either: > > 1. Decide that doesn't matter. > > 2. Have callers specify a "damn the NULs, I want it fast" flag. > > 3. Find some alternative that is more robust than fgets, and faster > than getc. I don't think there is anything in stdio, but I am not > above dropping in a faster non-portable call if it is available, > and then falling back to the current code otherwise. getdelim is in POSIX 2008 (http://pubs.opengroup.org/stage7tc1/functions/getdelim.html), so should be available on any half-{d,r}ecent platform. It obviously has the advantage of having access to the internal stdio buffer, and by definition handles embedded NULs. No idea if using such modern interfaces in git is ok, though. Implementation-wise, I think strbuf_getwholeline could be implemented mostly as a simple wrapper for getdelim. If I'm reading the current code and the posix spec for getdelim correctly, something like this should do it (though obviously not meant to be included as-is): diff --git a/strbuf.c b/strbuf.c index 0346e74..d3338b9 100644 --- a/strbuf.c +++ b/strbuf.c @@ -434,6 +434,32 @@ int strbuf_getcwd(struct strbuf *sb) return -1; } +#if USE_GETDELIM +/* Hacky declaration to avoid messing with feature test macros. */ +ssize_t getdelim(char **, size_t *, int, FILE *); +int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) +{ + ssize_t r; + + if (feof(fp)) + return EOF; + + strbuf_reset(sb); + if (!sb->alloc) + sb->buf = NULL; + + r = getdelim(&sb->buf, &sb->alloc, term, fp); + + if (r > 0) { + sb->len = r; + return 0; + } + assert(r == -1); + if (!sb->buf) + strbuf_init(sb); + return EOF; +} +#else int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) { int ch; @@ -454,6 +480,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) sb->buf[sb->len] = '\0'; return 0; } +#endif int strbuf_getline(struct strbuf *sb, FILE *fp, int term) { Rasmus ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-07 13:48 ` Rasmus Villemoes @ 2015-04-07 19:04 ` Jeff King 2015-04-07 22:43 ` Rasmus Villemoes 0 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2015-04-07 19:04 UTC (permalink / raw) To: Rasmus Villemoes; +Cc: git On Tue, Apr 07, 2015 at 03:48:33PM +0200, Rasmus Villemoes wrote: > > 3. Find some alternative that is more robust than fgets, and faster > > than getc. I don't think there is anything in stdio, but I am not > > above dropping in a faster non-portable call if it is available, > > and then falling back to the current code otherwise. > > getdelim is in POSIX 2008 > (http://pubs.opengroup.org/stage7tc1/functions/getdelim.html), so should > be available on any half-{d,r}ecent platform. It obviously has the > advantage of having access to the internal stdio buffer, and by > definition handles embedded NULs. No idea if using such modern > interfaces in git is ok, though. Thanks, that's perfect. I knew about getline(), but not getdelim(), and I had thought getline() unconditionally malloc'd. But it doesn't; it behaves exactly as we are already doing here. :-/ > Implementation-wise, I think strbuf_getwholeline could be implemented > mostly as a simple wrapper for getdelim. If I'm reading the current code > and the posix spec for getdelim correctly, something like this should do > it (though obviously not meant to be included as-is): I think it's close to what we want. strbuf_grow calls xrealloc, which will call try_to_free_routine() and possibly die() for us. So we would probably want to check errno on failure and respond similarly if we get ENOMEM. Your patch performs even faster than my fgets version (about 8%). I wonder if it is even worth doing the getc_unlocked dance at all. It would potentially speed up the fallback code, but my hope that would be that most systems would simply use the getdelim() version. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-07 19:04 ` Jeff King @ 2015-04-07 22:43 ` Rasmus Villemoes 2015-04-08 0:17 ` Jeff King 0 siblings, 1 reply; 44+ messages in thread From: Rasmus Villemoes @ 2015-04-07 22:43 UTC (permalink / raw) To: Jeff King; +Cc: git On Tue, Apr 07 2015, Jeff King <peff@peff.net> wrote: > On Tue, Apr 07, 2015 at 03:48:33PM +0200, Rasmus Villemoes wrote: > >> Implementation-wise, I think strbuf_getwholeline could be implemented >> mostly as a simple wrapper for getdelim. If I'm reading the current code >> and the posix spec for getdelim correctly, something like this should do >> it (though obviously not meant to be included as-is): > > I think it's close to what we want. strbuf_grow calls xrealloc, which > will call try_to_free_routine() and possibly die() for us. So we would > probably want to check errno on failure and respond similarly if we get > ENOMEM. Hm, I'm afraid it's not that simple. It seems that data may be lost from the stream if getdelim encounters ENOMEM: Looking at the glibc implementation (libio/iogetdelim.c), if reallocating the user buffer fails, -1 is returned (presumably with errno==ENOMEM set by realloc), and there's no way of knowing how many bytes were already copied to the buffer (cur_len). For regular files, I suppose one could do a ftell/fseek dance. For other cases, I don't see a reliable way to retry upon ENOMEM. Related thread on the posix mailing list: http://thread.gmane.org/gmane.comp.standards.posix.austin.general/10091 > I wonder if it is even worth doing the getc_unlocked dance at all. It > would potentially speed up the fallback code, but my hope that would be > that most systems would simply use the getdelim() version. Well, it's not really an intrusive patch, and 42% speedup is rather significant. And, of course, the above ENOMEM issue may mean that getdelim isn't usable after all. Rasmus ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 3/6] strbuf_getwholeline: use getc_unlocked 2015-04-07 22:43 ` Rasmus Villemoes @ 2015-04-08 0:17 ` Jeff King 0 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-08 0:17 UTC (permalink / raw) To: Rasmus Villemoes; +Cc: git On Wed, Apr 08, 2015 at 12:43:09AM +0200, Rasmus Villemoes wrote: > Hm, I'm afraid it's not that simple. It seems that data may be lost from > the stream if getdelim encounters ENOMEM: Looking at the glibc > implementation (libio/iogetdelim.c), if reallocating the user buffer > fails, -1 is returned (presumably with errno==ENOMEM set by realloc), and > there's no way of knowing how many bytes were already copied to the > buffer (cur_len). > > For regular files, I suppose one could do a ftell/fseek dance. For > other cases, I don't see a reliable way to retry upon ENOMEM. Ah, right, that makes sense. I think it may be OK to just `die()` on ENOMEM here. The "try to free" routine is basically about unmapping pack memory, in case it is clogging up the address space (it is mmap'd, so it shouldn't cause real RAM pressure; the OS can just drop the pages from the cache if it wants). But getdelim() calls are not likely to be big allocations in the first place (so they are not likely to fail at all, and if they do, we really are horribly out of memory). I'd also question whether release_pack_memory is doing anything in practice on 64-bit machines. We have quite a bit of address space to work with in that case. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King ` (2 preceding siblings ...) 2015-04-05 1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King @ 2015-04-05 1:11 ` Jeff King 2015-04-06 2:13 ` Eric Sunshine 2015-04-05 1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King ` (3 subsequent siblings) 7 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2015-04-05 1:11 UTC (permalink / raw) To: git We have to call strbuf_grow anytime we are going to add data to a strbuf. In most cases, it's a noop (since we grow the buffer aggressively), and the cost of the function call and size check is dwarfed by the actual buffer operation. For a tight loop of single-character additions, though, this overhead is noticeable. Furthermore, the single-character case is much easier to check; since the "extra" parameter is 1, we can do it without worrying about overflow. This patch adds a simple inline function for checking single-character growth. For the growth case, it just calls into the regular strbuf_grow(). This is redundant, as strbuf_grow will check again whether we need to grow. But it keeps our inline code simple, and most calls will not need to grow, so it's OK to treat this as a rare "slow path". We apply the new function to strbuf_getwholeline. Running "git rev-parse refs/heads/does-not-exist" on a repo with an extremely large (1.6GB) packed-refs file went from (best-of-3): real 0m10.953s user 0m10.384s sys 0m0.580s to: real 0m8.910s user 0m8.452s sys 0m0.468s for a wall-clock speedup of 18%. Signed-off-by: Jeff King <peff@peff.net> --- strbuf.c | 2 +- strbuf.h | 9 +++++++++ 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index af2bad4..2facd5f 100644 --- a/strbuf.c +++ b/strbuf.c @@ -445,7 +445,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) strbuf_reset(sb); flockfile(fp); while ((ch = getc_unlocked(fp)) != EOF) { - strbuf_grow(sb, 1); + strbuf_grow_ch(sb); sb->buf[sb->len++] = ch; if (ch == term) break; diff --git a/strbuf.h b/strbuf.h index 1883494..ef41151 100644 --- a/strbuf.h +++ b/strbuf.h @@ -137,6 +137,15 @@ static inline size_t strbuf_avail(const struct strbuf *sb) */ extern void strbuf_grow(struct strbuf *, size_t); +/* + * An optimized version of strbuf_grow() for a single character. + */ +static inline void strbuf_grow_ch(struct strbuf *sb) +{ + if (!sb->alloc || sb->alloc - 1 <= sb->len) + strbuf_grow(sb, 1); +} + /** * Set the length of the buffer to a given value. This function does *not* * allocate new memory, so you should not perform a `strbuf_setlen()` to a -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow 2015-04-05 1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King @ 2015-04-06 2:13 ` Eric Sunshine 2015-04-06 5:05 ` Jeff King 0 siblings, 1 reply; 44+ messages in thread From: Eric Sunshine @ 2015-04-06 2:13 UTC (permalink / raw) To: Jeff King; +Cc: Git List On Sat, Apr 4, 2015 at 9:11 PM, Jeff King <peff@peff.net> wrote: > We have to call strbuf_grow anytime we are going to add data > to a strbuf. In most cases, it's a noop (since we grow the > buffer aggressively), and the cost of the function call and > size check is dwarfed by the actual buffer operation. > > For a tight loop of single-character additions, though, this > overhead is noticeable. Furthermore, the single-character > case is much easier to check; since the "extra" parameter is > 1, we can do it without worrying about overflow. > > This patch adds a simple inline function for checking > single-character growth. For the growth case, it just calls > into the regular strbuf_grow(). This is redundant, as > strbuf_grow will check again whether we need to grow. But it > keeps our inline code simple, and most calls will not need > to grow, so it's OK to treat this as a rare "slow path". > > We apply the new function to strbuf_getwholeline. [...] > > Signed-off-by: Jeff King <peff@peff.net> > --- > diff --git a/strbuf.c b/strbuf.c > index af2bad4..2facd5f 100644 > --- a/strbuf.c > +++ b/strbuf.c > @@ -445,7 +445,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) > strbuf_reset(sb); > flockfile(fp); > while ((ch = getc_unlocked(fp)) != EOF) { > - strbuf_grow(sb, 1); > + strbuf_grow_ch(sb); strbuf_grow_ch() seems overly special-case. What about instead taking advantage of inline strbuf_avail() to do something like this? if (!strbuf_avail()) strbuf_grow(sb, 1); (Minor tangent: The 1 is still slightly magical and potentially confusing for someone who doesn't know that the buffer is grown aggressively, so changing it to a larger number might make it more obvious to the casual reader that the buffer is in fact not being grown on every iteration.) > sb->buf[sb->len++] = ch; > if (ch == term) > break; > diff --git a/strbuf.h b/strbuf.h > index 1883494..ef41151 100644 > --- a/strbuf.h > +++ b/strbuf.h > @@ -137,6 +137,15 @@ static inline size_t strbuf_avail(const struct strbuf *sb) > */ > extern void strbuf_grow(struct strbuf *, size_t); > > +/* > + * An optimized version of strbuf_grow() for a single character. > + */ > +static inline void strbuf_grow_ch(struct strbuf *sb) > +{ > + if (!sb->alloc || sb->alloc - 1 <= sb->len) > + strbuf_grow(sb, 1); > +} > + > /** > * Set the length of the buffer to a given value. This function does *not* > * allocate new memory, so you should not perform a `strbuf_setlen()` to a > -- > 2.4.0.rc0.363.gf9f328b ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow 2015-04-06 2:13 ` Eric Sunshine @ 2015-04-06 5:05 ` Jeff King 0 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-06 5:05 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List On Sun, Apr 05, 2015 at 10:13:21PM -0400, Eric Sunshine wrote: > > - strbuf_grow(sb, 1); > > + strbuf_grow_ch(sb); > > strbuf_grow_ch() seems overly special-case. What about instead taking > advantage of inline strbuf_avail() to do something like this? > > if (!strbuf_avail()) > strbuf_grow(sb, 1); Thanks, I somehow missed that function (despite it being a few line above the one I added!). I agree that strbuf_avail is a much better generic interface, and it turns out to be just as fast (actually, a tiny bit faster in my tests). I'll use that in the re-roll. > (Minor tangent: The 1 is still slightly magical and potentially > confusing for someone who doesn't know that the buffer is grown > aggressively, so changing it to a larger number might make it more > obvious to the casual reader that the buffer is in fact not being > grown on every iteration.) I agree this is slightly confusing (and I had to double-check how strbuf_grow worked while writing this series). OTOH, this is not so much about the "1" here as about how strbufs work. We care about the amortized asymptotic cost. strbuf_add() has the same issue; we add more bytes in each chunk, but we would still want to make sure that there is a sub-linear relationship between the number of adds and the number of allocations). -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 5/6] t1430: add another refs-escape test 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King ` (3 preceding siblings ...) 2015-04-05 1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King @ 2015-04-05 1:11 ` Jeff King 2015-04-05 1:15 ` [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call Jeff King ` (2 subsequent siblings) 7 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-05 1:11 UTC (permalink / raw) To: git In t1430, we check whether deleting the branch "../../foo" will delete ".git/foo". However, this is not that interesting a test; the precious file ".git/foo" does not look like a ref, so even if we did not notice the "escape" from the "refs/" hierarchy, we would fail for that reason (i.e., if you turned refname_is_safe into a noop, the test still passes). Let's add an additional test for the same thing, but with a file that actually looks like a ref. That will make sure we are exercising the refname_is_safe code. While we're at it, let's also make the code work a little harder by adding some extra paths and some empty path components. Signed-off-by: Jeff King <peff@peff.net> --- t/t1430-bad-ref-name.sh | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh index 468e856..16d0b8b 100755 --- a/t/t1430-bad-ref-name.sh +++ b/t/t1430-bad-ref-name.sh @@ -68,6 +68,14 @@ test_expect_success 'branch -D cannot delete non-ref in .git dir' ' test_cmp expect .git/my-private-file ' +test_expect_success 'branch -D cannot delete ref in .git dir' ' + git rev-parse HEAD >.git/my-private-file && + git rev-parse HEAD >expect && + git branch foo/legit && + test_must_fail git branch -D foo////./././../../../my-private-file && + test_cmp expect .git/my-private-file +' + test_expect_success 'branch -D cannot delete absolute path' ' git branch -f extra && test_must_fail git branch -D "$(pwd)/.git/refs/heads/extra" && -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King ` (4 preceding siblings ...) 2015-04-05 1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King @ 2015-04-05 1:15 ` Jeff King 2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King 7 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-05 1:15 UTC (permalink / raw) To: git Usually refs are not allowed to contain a ".." component. However, since d0f810f (refs.c: allow listing and deleting badly named refs, 2014-09-03), we relax these rules in some cases in order to help users examine and get rid of badly-named refs. However, we do still check that these refs cannot "escape" the refs hierarchy (e.g., "refs/../foo"). This check is implemented by calling normalize_path_copy, which requires us to allocate a new buffer to hold the result. But we don't care about the result; we only care whether the "up" components outnumbered the "down". We can therefore implement this check ourselves without requiring any extra allocations. With this patch, running "git rev-parse refs/heads/does-not-exist" on a repo with large (1.6GB) packed-refs file went from: real 0m8.910s user 0m8.452s sys 0m0.468s to: real 0m8.529s user 0m8.044s sys 0m0.492s for a wall-clock speedup of 4%. Signed-off-by: Jeff King <peff@peff.net> --- This was a lot less than I was hoping for, especially considering that going from d0f810f^ to d0f810f is more like a 15% slowdown (or in absolute numbers, ~1.1s versus only 400ms here). What's doubly confusing is that I think we were running check_ref_format before d0f810f, which does way more than the check we're doing here. So we should have ended up faster than either. So this is certainly _an_ improvement, but I think there may be more going on. cache.h | 7 +++++++ path.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ refs.c | 16 ++-------------- 3 files changed, 73 insertions(+), 14 deletions(-) diff --git a/cache.h b/cache.h index 3d3244b..c2b3df4 100644 --- a/cache.h +++ b/cache.h @@ -836,6 +836,13 @@ char *strip_path_suffix(const char *path, const char *suffix); int daemon_avoid_alias(const char *path); extern int is_ntfs_dotgit(const char *name); +/** + * Returns 1 if the path "escapes" its root (e.g., "foo/../../bar") and 0 + * otherwise. Note that this is purely textual, and does not care about + * symlinks or other aspects of the filesystem. + */ +int check_path_escape(const char *path); + /* object replacement */ #define LOOKUP_REPLACE_OBJECT 1 extern void *read_sha1_file_extended(const unsigned char *sha1, enum object_type *type, unsigned long *size, unsigned flag); diff --git a/path.c b/path.c index e608993..1127cbd 100644 --- a/path.c +++ b/path.c @@ -703,6 +703,70 @@ int normalize_path_copy(char *dst, const char *src) } /* + * We want to detect a path that "escapes" its root. The general strategy + * is to parse components left to right, keeping track of our depth, + * which is increased by non-empty components and decreased by ".." + * components. + */ +int check_path_escape(const char *path) +{ + int depth = 0; + + while (*path) { + char ch = *path++; + + /* + * We always start our loop at the beginning of a path component. So + * we can skip past any dir separators. This handles leading + * "/", as well as any internal "////". + */ + if (is_dir_sep(ch)) + continue; + + /* + * If we start with a dot, we care about the four cases + * (similar to normalize_path_copy above): + * + * (1) "." - does not affect depth; we are done + * (2) "./" - does not affect depth; skip + * (3) ".." - check depth and finish + * (4) "../" - drop depth, check, and keep looking + */ + if (ch == '.') { + ch = *path++; + + if (!ch) + return 1; /* case (1) */ + if (is_dir_sep(ch)) + continue; /* case (2) */ + if (ch == '.') { + ch = *path++; + if (!ch) + return depth > 0; /* case (3) */ + if (is_dir_sep(ch)) { + /* case (4) */ + if (--depth < 0) + return 0; + continue; + } + /* otherwise, "..foo"; fall through */ + } + /* otherwise ".foo"; fall through */ + } + + /* + * We have a real component; inrement the depth and eat the + * rest of the component + */ + depth++; + while (*path && !is_dir_sep(*path)) + path++; + } + + return 1; +} + +/* * path = Canonical absolute path * prefixes = string_list containing normalized, absolute paths without * trailing slashes (except for the root directory, which is denoted by "/"). diff --git a/refs.c b/refs.c index 47e4e53..daca499 100644 --- a/refs.c +++ b/refs.c @@ -312,20 +312,8 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry) */ static int refname_is_safe(const char *refname) { - if (starts_with(refname, "refs/")) { - char *buf; - int result; - - buf = xmalloc(strlen(refname) + 1); - /* - * Does the refname try to escape refs/? - * For example: refs/foo/../bar is safe but refs/foo/../../bar - * is not. - */ - result = !normalize_path_copy(buf, refname + strlen("refs/")); - free(buf); - return result; - } + if (skip_prefix(refname, "refs/", &refname)) + return check_path_escape(refname); while (*refname) { if (!isupper(*refname) && *refname != '_') return 0; -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 0/6] address packed-refs speed regressions 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King ` (5 preceding siblings ...) 2015-04-05 1:15 ` [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call Jeff King @ 2015-04-05 13:41 ` René Scharfe 2015-04-05 18:52 ` Jeff King 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King 7 siblings, 1 reply; 44+ messages in thread From: René Scharfe @ 2015-04-05 13:41 UTC (permalink / raw) To: Jeff King; +Cc: git, Michael Haggerty Am 05.04.2015 um 03:06 schrieb Jeff King: > As I've mentioned before, I have some repositories with rather large > numbers of refs. The worst one has ~13 million refs, for a 1.6GB > packed-refs file. So I was saddened by this: > > $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1 > real 0m6.840s > user 0m6.404s > sys 0m0.440s > > $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1 > real 0m19.432s > user 0m18.996s > sys 0m0.456s > > The command isn't important; what I'm really measuring is loading the > packed-refs file. And yes, of course this repository is absolutely > ridiculous. But the slowdowns here are linear with the number of refs. > So _every_ git command got a little bit slower, even in less crazy > repositories. We just didn't notice it as much. > > Here are the numbers after this series: > > real 0m8.539s > user 0m8.052s > sys 0m0.496s > > Much better, but I'm frustrated that they are still 20% slower than the > original. > > The main culprits seem to be d0f810f (which introduced some extra > expensive code for each ref) and my 10c497a, which switched from fgets() > to strbuf_getwholeline. It turns out that strbuf_getwholeline is really > slow. 10c497a changed read_packed_refs(), which reads *all* packed refs. Each is checked for validity. That sounds expensive if the goal is just to look up a single (non-existing) ref. Would it help to defer any checks until a ref is actually accessed? Can a binary search be used instead of reading the whole file? I wonder if pluggable reference backends could help here. Storing refs in a database table indexed by refname should simplify things. Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping the packed refs file? What numbers do you get with the following patch? --- refs.c | 36 ++++++++++++++++++++++++++++-------- 1 file changed, 28 insertions(+), 8 deletions(-) diff --git a/refs.c b/refs.c index 47e4e53..144255f 100644 --- a/refs.c +++ b/refs.c @@ -1153,16 +1153,35 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1) * compatibility with older clients, but we do not require it * (i.e., "peeled" is a no-op if "fully-peeled" is set). */ -static void read_packed_refs(FILE *f, struct ref_dir *dir) +static void read_packed_refs(int fd, struct ref_dir *dir) { struct ref_entry *last = NULL; struct strbuf line = STRBUF_INIT; enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE; + struct stat st; + void *map; + size_t mapsz, len; + const char *p; + + fstat(fd, &st); + mapsz = xsize_t(st.st_size); + if (!mapsz) + return; + map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0); - while (strbuf_getwholeline(&line, f, '\n') != EOF) { + for (p = map, len = mapsz; len; ) { unsigned char sha1[20]; const char *refname; const char *traits; + const char *nl; + size_t linelen; + + nl = memchr(p, '\n', len); + linelen = nl ? nl - p + 1 : len; + strbuf_reset(&line); + strbuf_add(&line, p, linelen); + p += linelen; + len -= linelen; if (skip_prefix(line.buf, "# pack-refs with:", &traits)) { if (strstr(traits, " fully-peeled ")) @@ -1204,6 +1223,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir) } strbuf_release(&line); + munmap(map, mapsz); } /* @@ -1224,16 +1244,16 @@ static struct packed_ref_cache *get_packed_ref_cache(struct ref_cache *refs) clear_packed_ref_cache(refs); if (!refs->packed) { - FILE *f; + int fd; refs->packed = xcalloc(1, sizeof(*refs->packed)); acquire_packed_ref_cache(refs->packed); refs->packed->root = create_dir_entry(refs, "", 0, 0); - f = fopen(packed_refs_file, "r"); - if (f) { - stat_validity_update(&refs->packed->validity, fileno(f)); - read_packed_refs(f, get_ref_dir(refs->packed->root)); - fclose(f); + fd = open(packed_refs_file, O_RDONLY); + if (fd >= 0) { + stat_validity_update(&refs->packed->validity, fd); + read_packed_refs(fd, get_ref_dir(refs->packed->root)); + close(fd); } } return refs->packed; -- 2.3.5 ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 0/6] address packed-refs speed regressions 2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe @ 2015-04-05 18:52 ` Jeff King 2015-04-05 18:59 ` Jeff King 2015-04-05 22:39 ` René Scharfe 0 siblings, 2 replies; 44+ messages in thread From: Jeff King @ 2015-04-05 18:52 UTC (permalink / raw) To: René Scharfe; +Cc: git, Michael Haggerty On Sun, Apr 05, 2015 at 03:41:39PM +0200, René Scharfe wrote: > > The main culprits seem to be d0f810f (which introduced some extra > > expensive code for each ref) and my 10c497a, which switched from fgets() > > to strbuf_getwholeline. It turns out that strbuf_getwholeline is really > > slow. > > 10c497a changed read_packed_refs(), which reads *all* packed refs. > Each is checked for validity. That sounds expensive if the goal is > just to look up a single (non-existing) ref. > > Would it help to defer any checks until a ref is actually accessed? > Can a binary search be used instead of reading the whole file? Yes, but addressing that is much more invasive. Right now we parse all of the packed-refs file into an in-memory cache, and then do single lookups from that cache. Doing an mmap() and a binary search is way faster (and costs less memory) for doing individual lookups. It relies on the list being sorted. This is generally true, but not something we currently rely on (however, it would be easy to add a "sorted" flag to top of the file and have the readers fall back when the flag is missing). I've played with a patch to do this (it's not entirely trivial, because you jump into the middle of a line, and then have to walk backwards to find the start of the record). For traversals, it's more complicated. Obviously if you are traversing all refs, you have to read the whole thing anyway. If you are traversing a subset of the refs, you can binary-search the start of the subset, and then walk forward. But that's where it gets tricky with the current code. The ref_cache code expects to fill in from outer to inner. So if you have "refs/foo", you should also have filled in all of "refs/" (but not necessarily "refs/bar"). This matches the way we traverse loose ref directories; we opendir "refs/", find out that it has "foo" and "bar", and the descend into "foo", and so forth. But reading a subset of the packed-ref file is "inside out". You fill in all of "refs/foo", but you have no idea what else is in "refs/". So going in that direction would involve some surgery to the ref_cache code. It might even involve throwing it out entirely (i.e., just mmap the packed-refs file and look through it directly, without any kind of in-memory cache; we don't tend to do more than one ref-iteration per program anyway, so I'm not sure the caching is buying us much anyway). My big concern there would be that there are a lot of subtle race issues between packed and loose refs, and the current state is the result of a lot of tweaking. I'd be worried that a heavy rewrite there would risk introducing subtle and rare corruptions. Plus it would be a lot of work, which leads me to... > I wonder if pluggable reference backends could help here. Storing refs > in a database table indexed by refname should simplify things. ...this. I think that effort might be better spent on a ref storage format that's more efficient, simpler (with respect to subtle races and such), and could provide other features (e.g., transactional atomicity). The big plus side of packed-refs improvements is that they "just work" without worrying about compatibility issues. But ref storage is local, so I'm not sure how big a deal that is in practice. > Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping > the packed refs file? What numbers do you get with the following patch? It's about 9% faster than my series + the fgets optimization I posted (or about 25% than using getc). Which is certainly nice, but I was really hoping to just make strbuf_getline faster for all callers, rather than introducing special code for one call-site. Certainly we could generalize the technique (i.e., a struct with the mmap data), but then I feel we are somewhat reinventing stdio. Which is maybe a good thing, because stdio has a lot of rough edges (as seen here), but it does feel a bit like NIH syndrome. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 0/6] address packed-refs speed regressions 2015-04-05 18:52 ` Jeff King @ 2015-04-05 18:59 ` Jeff King 2015-04-05 23:04 ` René Scharfe 2015-04-05 22:39 ` René Scharfe 1 sibling, 1 reply; 44+ messages in thread From: Jeff King @ 2015-04-05 18:59 UTC (permalink / raw) To: René Scharfe; +Cc: git, Michael Haggerty On Sun, Apr 05, 2015 at 02:52:59PM -0400, Jeff King wrote: > Right now we parse all of the packed-refs file into an in-memory cache, > and then do single lookups from that cache. Doing an mmap() and a binary > search is way faster (and costs less memory) for doing individual > lookups. It relies on the list being sorted. This is generally true, but > not something we currently rely on (however, it would be easy to add a > "sorted" flag to top of the file and have the readers fall back when the > flag is missing). I've played with a patch to do this (it's not entirely > trivial, because you jump into the middle of a line, and then have to > walk backwards to find the start of the record). > > For traversals, it's more complicated. Obviously if you are traversing > all refs, you have to read the whole thing anyway. If you are traversing > a subset of the refs, you can binary-search the start of the subset, and > then walk forward. But that's where it gets tricky with the current > code. In case you are curious, here is my proof-of-concept for the packed-refs binary search. You'll note that it's a separate program, and not integrated into refs.c. I wrote this last August, and after trying to integrate it into refs.c, I found the ref_cache problems I described, and I haven't touched it since. I also seem to have saved the patch for stuffing it into refs.c, but I am not sure if it even compiles (I wrote only "horrible wip" in the commit message ;) ). -- >8 -- Subject: [PATCH] add git-quick-list This is a proof of concept for binary-searching the packed-refs file in order to traverse an ordered subset of it. Note that it _only_ reads the packed-refs file currently. To really compare to for-each-ref, it would need to also walk the loose ref area for its prefix. On a mostly-packed repository that shouldn't make a big speed difference, though. And of course we don't _really_ want a separate command here at all. This should be part of refs.c, and everyone who calls for_each_ref should benefit from it. Still, the numbers are promising. Here's are comparisons against for-each-ref on torvalds/linux, which has a 218M packed-refs file: $ time git for-each-ref \ --format='%(objectname) %(refname)' \ refs/remotes/2325298/ | wc -c 44139 real 0m1.649s user 0m1.332s sys 0m0.304s $ time ~peff/git-quick-list refs/remotes/2325298/ | wc -c 44139 real 0m0.012s user 0m0.004s sys 0m0.004s --- Makefile | 1 + quick-list.c | 174 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 175 insertions(+) create mode 100644 quick-list.c diff --git a/Makefile b/Makefile index 2457065..aa32598 100644 --- a/Makefile +++ b/Makefile @@ -541,6 +541,7 @@ PROGRAM_OBJS += shell.o PROGRAM_OBJS += show-index.o PROGRAM_OBJS += upload-pack.o PROGRAM_OBJS += remote-testsvn.o +PROGRAM_OBJS += quick-list.o # Binary suffix, set to .exe for Windows builds X = diff --git a/quick-list.c b/quick-list.c new file mode 100644 index 0000000..e423f1f --- /dev/null +++ b/quick-list.c @@ -0,0 +1,174 @@ +#include "cache.h" +#include "refs.h" + +struct packed_refs_iterator { + const char *start; + const char *end; + + const char *cur; + const char *ref; + const char *eol; + const char *next; +}; + +static void iterator_init(struct packed_refs_iterator *pos, + const char *buf, size_t len) +{ + pos->start = buf; + pos->end = buf + len; + + /* skip past header line */ + if (pos->start < pos->end && *pos->start == '#') { + while (pos->start < pos->end && *pos->start != '\n') + pos->start++; + if (pos->start < pos->end) + pos->start++; + } +} + +static int iterator_cmp(const char *key, struct packed_refs_iterator *pos) +{ + const char *ref = pos->ref; + for (; *key && ref < pos->eol; key++, ref++) + if (*key != *ref) + return (unsigned char)*key - (unsigned char)*ref; + return ref == pos->eol ? *key ? 1 : 0 : -1; +} + +static const char *find_eol(const char *p, const char *end) +{ + p = memchr(p, '\n', end - p); + return p ? p : end; +} + +static void parse_line(struct packed_refs_iterator *pos, const char *p) +{ + pos->cur = p; + if (pos->end - p < 41) + die("truncated packed-refs file"); + p += 41; + + pos->ref = p; + pos->eol = p = find_eol(p, pos->end); + + /* skip newline, and then past any peel records */ + if (p < pos->end) + p++; + while (p < pos->end && *p == '^') { + p = find_eol(p, pos->end); + if (p < pos->end) + p++; + } + pos->next = p; +} + +static void iterator_next(struct packed_refs_iterator *pos) +{ + if (pos->next < pos->end) + parse_line(pos, pos->next); + else + pos->cur = NULL; +} + +static void iterator_start(struct packed_refs_iterator *pos, const char *prefix) +{ + const char *lo = pos->start, *hi = pos->end; + + while (lo < hi) { + const char *mi = lo + ((hi - lo) / 2); + int cmp; + + /* + * We landed somewhere on a line. Walk back to find + * the start of the line. + */ + while (mi > lo && *(mi-1) != '\n') + mi--; + + /* + * We may have hit a peel-line. In that case, try + * to walk back to the actual ref line (and skip as + * many peel lines as we find, for future-proofing). + */ + while (*mi == '^') { + if (mi == lo) + die("peel line without a record before it?"); + mi--; + if (mi == lo) + die("peel line with bare newline before it?"); + mi--; + while (mi > lo && *(mi-1) != '\n') + mi--; + } + + /* Now we should be at a real ref line. */ + parse_line(pos, mi); + cmp = iterator_cmp(prefix, pos); + if (!cmp) + return; + else if (cmp < 0) + hi = pos->cur; + else + lo = pos->next; + } + + if (hi < pos->end) + parse_line(pos, hi); + else + pos->cur = NULL; +} + +static void quick_list(const char *prefix, each_ref_fn fn, void *data) +{ + int fd = open(git_path("packed-refs"), O_RDONLY); + struct stat st; + const char *buf = NULL; + size_t len; + struct packed_refs_iterator pos; + + if (fd < 0) + goto out; + if (fstat(fd, &st) < 0) + goto out; + len = xsize_t(st.st_size); + buf = xmmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0); + if (!buf) + goto out; + + iterator_init(&pos, buf, len); + for (iterator_start(&pos, prefix); + pos.cur && starts_with(pos.ref, prefix); + iterator_next(&pos)) { + unsigned char sha1[20]; + char *refname; + + if (get_sha1_hex(pos.cur, sha1) < 0) + die("packed-refs contained invalid sha1"); + refname = xmemdupz(pos.ref, pos.eol - pos.ref); + fn(refname, sha1, 0, data); + free(refname); + } + +out: + close(fd); + if (buf) + munmap((void *)buf, len); +} + +static int show_ref(const char *refname, const unsigned char *sha1, + int flags, void *data) +{ + printf("%s %s\n", sha1_to_hex(sha1), refname); + return 0; +} + +int main(int argc, char **argv) +{ + if (argc != 2) + usage("git quick-list <prefix>"); + + setup_git_directory(); + quick_list(argv[1], show_ref, NULL); + + return 0; +} -- 2.4.0.rc0.363.gf9f328b ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 0/6] address packed-refs speed regressions 2015-04-05 18:59 ` Jeff King @ 2015-04-05 23:04 ` René Scharfe 0 siblings, 0 replies; 44+ messages in thread From: René Scharfe @ 2015-04-05 23:04 UTC (permalink / raw) To: Jeff King; +Cc: git, Michael Haggerty Am 05.04.2015 um 20:59 schrieb Jeff King: > Still, the numbers are promising. Here's are comparisons > against for-each-ref on torvalds/linux, which has a 218M > packed-refs file: > > $ time git for-each-ref \ > --format='%(objectname) %(refname)' \ > refs/remotes/2325298/ | > wc -c > 44139 > > real 0m1.649s > user 0m1.332s > sys 0m0.304s > > $ time ~peff/git-quick-list refs/remotes/2325298/ | wc -c > 44139 > > real 0m0.012s > user 0m0.004s > sys 0m0.004s Sweet numbers. :-P I'm not familiar with refs.c, but its sheer size alone suggests that it won't be easy to integrate this prototype code there. :-/ René ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 0/6] address packed-refs speed regressions 2015-04-05 18:52 ` Jeff King 2015-04-05 18:59 ` Jeff King @ 2015-04-05 22:39 ` René Scharfe 2015-04-06 4:49 ` Jeff King 1 sibling, 1 reply; 44+ messages in thread From: René Scharfe @ 2015-04-05 22:39 UTC (permalink / raw) To: Jeff King; +Cc: git, Michael Haggerty Am 05.04.2015 um 20:52 schrieb Jeff King: > On Sun, Apr 05, 2015 at 03:41:39PM +0200, René Scharfe wrote: >> I wonder if pluggable reference backends could help here. Storing refs >> in a database table indexed by refname should simplify things. > > ...this. I think that effort might be better spent on a ref storage > format that's more efficient, simpler (with respect to subtle races and > such), and could provide other features (e.g., transactional atomicity). Such as a DBMS? :-) Leaving storage details to SQLite or whatever sounds attractive to me because I'm lazy. > The big plus side of packed-refs improvements is that they "just work" > without worrying about compatibility issues. But ref storage is local, > so I'm not sure how big a deal that is in practice. Adding a dependency is a big step, admittedly, so native improvements might be a better fit. There's a chance that we'd run into issues already solved by specialized database engines long ago, though. >> Short-term, can we avoid the getc()/strbuf_grow() dance e.g. by mapping >> the packed refs file? What numbers do you get with the following patch? > > It's about 9% faster than my series + the fgets optimization I posted > (or about 25% than using getc). Which is certainly nice, but I was > really hoping to just make strbuf_getline faster for all callers, rather > than introducing special code for one call-site. Certainly we could > generalize the technique (i.e., a struct with the mmap data), but then I > feel we are somewhat reinventing stdio. Which is maybe a good thing, > because stdio has a lot of rough edges (as seen here), but it does feel > a bit like NIH syndrome. Forgot to say: I like your changes. But if strbuf_getline can only be made fast enough beyond that by duplicating stdio buffering then I feel it's better to take a different way. E.g. dropping the requirement to handle NUL chars and basing it on fgets as Junio suggested in his reply to patch 3 sounds good. In any case, the packed refs file seems special enough to receive special treatment. Using mmap would make the most sense if we could also avoid copying lines to a strbuf for parsing, though. René ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 0/6] address packed-refs speed regressions 2015-04-05 22:39 ` René Scharfe @ 2015-04-06 4:49 ` Jeff King 0 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-06 4:49 UTC (permalink / raw) To: René Scharfe; +Cc: git, Michael Haggerty On Mon, Apr 06, 2015 at 12:39:15AM +0200, René Scharfe wrote: > >...this. I think that effort might be better spent on a ref storage > >format that's more efficient, simpler (with respect to subtle races and > >such), and could provide other features (e.g., transactional atomicity). > > Such as a DBMS? :-) Leaving storage details to SQLite or whatever sounds > attractive to me because I'm lazy. Exactly. Though I think some folks were worried about the extra dependency (e.g., I think SQLite is hard for JGit, because there's no pure-java implementation, which makes Eclipse unhappy). With pluggable backends we can make something like a SQLite backend optional. I.e., use it if you want the benefits and can accept the portability downsides. But that also risks fracturing the community, and people on the "old" format being left behind. > Forgot to say: I like your changes. But if strbuf_getline can only be made > fast enough beyond that by duplicating stdio buffering then I feel it's > better to take a different way. E.g. dropping the requirement to handle NUL > chars and basing it on fgets as Junio suggested in his reply to patch 3 > sounds good. Yeah, though we probably need to either audit the callers, or provide a flag for each caller to turn on the speed-over-NULs behavior. I'll look into that, but it may not be this week, as I'll be traveling starting tomorrow. > In any case, the packed refs file seems special enough to receive special > treatment. Using mmap would make the most sense if we could also avoid > copying lines to a strbuf for parsing, though. I had a similar thought. Below is hacky patch, on top of your mmap patch, that does this. It does shave off another 300ms (around 5%). I think we may be getting into a useless area of micro-optimizing here, though. The results are noticeable on this ridiculous repository, but probably not so much on real ones. The low-hanging fruit (e.g., dropping time in half by using getc_unlocked) seems to provide the most bang for the buck. --- diff --git a/refs.c b/refs.c index 144255f..708b49b 100644 --- a/refs.c +++ b/refs.c @@ -334,27 +334,40 @@ static int refname_is_safe(const char *refname) return 1; } -static struct ref_entry *create_ref_entry(const char *refname, - const unsigned char *sha1, int flag, - int check_name) +static struct ref_entry *create_ref_entry_len(const char *refname, size_t len, + const unsigned char *sha1, int flag, + int check_name) { - int len; struct ref_entry *ref; + /* + * allocate before checking, since the check functions require + * a NUL-terminated refname. And since we die() anyway if + * the check fails, the overhead of the extra malloc is negligible + */ + ref = xmalloc(sizeof(struct ref_entry) + len + 1); + hashcpy(ref->u.value.sha1, sha1); + hashclr(ref->u.value.peeled); + memcpy(ref->name, refname, len); + ref->name[len] = '\0'; + ref->flag = flag; + if (check_name && check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) die("Reference has invalid format: '%s'", refname); if (!check_name && !refname_is_safe(refname)) die("Reference has invalid name: '%s'", refname); - len = strlen(refname) + 1; - ref = xmalloc(sizeof(struct ref_entry) + len); - hashcpy(ref->u.value.sha1, sha1); - hashclr(ref->u.value.peeled); - memcpy(ref->name, refname, len); - ref->flag = flag; return ref; } +static struct ref_entry *create_ref_entry(const char *refname, + const unsigned char *sha1, int flag, + int check_name) +{ + return create_ref_entry_len(refname, strlen(refname), sha1, flag, + check_name); +} + static void clear_ref_dir(struct ref_dir *dir); static void free_ref_entry(struct ref_entry *entry) @@ -1095,7 +1108,9 @@ static const char PACKED_REFS_HEADER[] = * Return a pointer to the refname within the line (null-terminated), * or NULL if there was a problem. */ -static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1) +static const char *parse_ref_line(const char *line, int len, + unsigned char *sha1, + size_t *refname_len) { const char *ref; @@ -1107,22 +1122,22 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1) * +1 (space in between hex and name) * +1 (newline at the end of the line) */ - if (line->len <= 42) + if (len <= 42) return NULL; - if (get_sha1_hex(line->buf, sha1) < 0) + if (get_sha1_hex(line, sha1) < 0) return NULL; - if (!isspace(line->buf[40])) + if (!isspace(line[40])) return NULL; - ref = line->buf + 41; + ref = line + 41; if (isspace(*ref)) return NULL; - if (line->buf[line->len - 1] != '\n') + if (line[len - 1] != '\n') return NULL; - line->buf[--line->len] = 0; + *refname_len = len - (ref - line) - 1; return ref; } @@ -1156,7 +1171,6 @@ static const char *parse_ref_line(struct strbuf *line, unsigned char *sha1) static void read_packed_refs(int fd, struct ref_dir *dir) { struct ref_entry *last = NULL; - struct strbuf line = STRBUF_INIT; enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE; struct stat st; void *map; @@ -1172,18 +1186,20 @@ static void read_packed_refs(int fd, struct ref_dir *dir) for (p = map, len = mapsz; len; ) { unsigned char sha1[20]; const char *refname; + size_t refname_len; const char *traits; const char *nl; + const char *line; size_t linelen; nl = memchr(p, '\n', len); + line = p; linelen = nl ? nl - p + 1 : len; - strbuf_reset(&line); - strbuf_add(&line, p, linelen); p += linelen; len -= linelen; - if (skip_prefix(line.buf, "# pack-refs with:", &traits)) { + /* XXX these should take care not to look past linelen */ + if (skip_prefix(line, "# pack-refs with:", &traits)) { if (strstr(traits, " fully-peeled ")) peeled = PEELED_FULLY; else if (strstr(traits, " peeled ")) @@ -1192,7 +1208,7 @@ static void read_packed_refs(int fd, struct ref_dir *dir) continue; } - refname = parse_ref_line(&line, sha1); + refname = parse_ref_line(line, linelen, sha1, &refname_len); if (refname) { int flag = REF_ISPACKED; @@ -1200,7 +1216,7 @@ static void read_packed_refs(int fd, struct ref_dir *dir) hashclr(sha1); flag |= REF_BAD_NAME | REF_ISBROKEN; } - last = create_ref_entry(refname, sha1, flag, 0); + last = create_ref_entry_len(refname, len, sha1, flag, 0); if (peeled == PEELED_FULLY || (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/"))) last->flag |= REF_KNOWS_PEELED; @@ -1208,10 +1224,10 @@ static void read_packed_refs(int fd, struct ref_dir *dir) continue; } if (last && - line.buf[0] == '^' && - line.len == PEELED_LINE_LENGTH && - line.buf[PEELED_LINE_LENGTH - 1] == '\n' && - !get_sha1_hex(line.buf + 1, sha1)) { + line[0] == '^' && + linelen == PEELED_LINE_LENGTH && + line[PEELED_LINE_LENGTH - 1] == '\n' && + !get_sha1_hex(line + 1, sha1)) { hashcpy(last->u.value.peeled, sha1); /* * Regardless of what the file header said, @@ -1222,7 +1238,6 @@ static void read_packed_refs(int fd, struct ref_dir *dir) } } - strbuf_release(&line); munmap(map, mapsz); } ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v2 0/9] address packed-refs speed regressions 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King ` (6 preceding siblings ...) 2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe @ 2015-04-16 8:47 ` Jeff King 2015-04-16 8:48 ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King ` (9 more replies) 7 siblings, 10 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 8:47 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine On Sat, Apr 04, 2015 at 09:06:11PM -0400, Jeff King wrote: > As I've mentioned before, I have some repositories with rather large > numbers of refs. The worst one has ~13 million refs, for a 1.6GB > packed-refs file. So I was saddened by this: > > $ time git.v2.0.0 rev-parse refs/heads/foo >/dev/null 2>&1 > real 0m6.840s > user 0m6.404s > sys 0m0.440s > > $ time git.v2.4.0-rc1 rev-parse refs/heads/foo >/dev/null 2>&1 > real 0m19.432s > user 0m18.996s > sys 0m0.456s Here's a re-roll incorporating feedback from the list. Thanks everybody for your comments. Last time the final number was ~8.5s, which was disappointingly slower than v2.0.0. In this iteration, my final numbers are: real 0m5.703s user 0m5.276s sys 0m0.432s which is quite pleasing. The big changes that resulted in this additional speedup are: 1. Use getdelim() when it is available. This is much faster than even a getc_unlocked() loop. 2. The slowdown from d0f810f was from adding in refname_is_safe calls. But what I didn't notice before is that we run them in _addition_ to check_refname_format, rather than instead of it. So in the common case of a sanely-formatted refname, we can skip the call, rather than writing a lot of code to micro-optimize it. It was also mentioned in a nearby thread that the config code could benefit from some of the same micro-optimizations. It can't make use of getdelim(), as it really does want to do character-by-character parsing. But it can still use getc_unlocked() and the strbuf_avail() trick, which speeds up config reading by 47%. Those patches are included here. [1/9]: strbuf_getwholeline: use getc macro [2/9]: git-compat-util: add fallbacks for unlocked stdio [3/9]: strbuf_getwholeline: use getc_unlocked [4/9]: config: use getc_unlocked when reading from file [5/9]: strbuf_addch: avoid calling strbuf_grow [6/9]: strbuf_getwholeline: avoid calling strbuf_grow [7/9]: strbuf_getwholeline: use getdelim if it is available [8/9]: read_packed_refs: avoid double-checking sane refs [9/9]: t1430: add another refs-escape test -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 1/9] strbuf_getwholeline: use getc macro 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King @ 2015-04-16 8:48 ` Jeff King 2015-04-16 8:48 ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King ` (8 subsequent siblings) 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 8:48 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine strbuf_getwholeline calls fgetc in a tight loop. Using the getc form, which can be implemented as a macro, should be faster (and we do not care about it evaluating our argument twice, as we just have a plain variable). On my glibc system, running "git rev-parse refs/heads/does-not-exist" on a file with an extremely large (1.6GB) packed-refs file went from (best of 3 runs): real 0m19.383s user 0m18.876s sys 0m0.528s to: real 0m18.900s user 0m18.472s sys 0m0.448s for a wall-clock speedup of 2.5%. Signed-off-by: Jeff King <peff@peff.net> --- strbuf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index 88cafd4..14f337d 100644 --- a/strbuf.c +++ b/strbuf.c @@ -443,7 +443,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) return EOF; strbuf_reset(sb); - while ((ch = fgetc(fp)) != EOF) { + while ((ch = getc(fp)) != EOF) { strbuf_grow(sb, 1); sb->buf[sb->len++] = ch; if (ch == term) -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King 2015-04-16 8:48 ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King @ 2015-04-16 8:48 ` Jeff King 2015-04-16 8:49 ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King ` (7 subsequent siblings) 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 8:48 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine POSIX.1-2001 specifies some functions for optimizing the locking out of tight getc() loops. Not all systems are POSIX, though, and even not all POSIX systems are required to implement these functions. We can check for the feature-test macro to see if they are available, and if not, provide a noop implementation. There's no Makefile knob here, because we should just detect this automatically. If there are very bizarre systems, we may need to add one, but it's not clear yet in which direction: 1. If a system defines _POSIX_THREAD_SAFE_FUNCTIONS but these functions are missing or broken, we would want a knob to manually turn them off. 2. If a system has these functions but does not define _POSIX_THREAD_SAFE_FUNCTIONS, we would want a knob to manually turn them on. We can add such a knob when we find a real-world system that matches this. Signed-off-by: Jeff King <peff@peff.net> --- git-compat-util.h | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/git-compat-util.h b/git-compat-util.h index bc8fc8c..685a0a4 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -883,4 +883,10 @@ struct tm *git_gmtime_r(const time_t *, struct tm *); # define SHELL_PATH "/bin/sh" #endif +#ifndef _POSIX_THREAD_SAFE_FUNCTIONS +#define flockfile(fh) +#define funlockfile(fh) +#define getc_unlocked(fh) getc(fh) +#endif + #endif -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 3/9] strbuf_getwholeline: use getc_unlocked 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King 2015-04-16 8:48 ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King 2015-04-16 8:48 ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King @ 2015-04-16 8:49 ` Jeff King 2015-04-16 8:51 ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King ` (6 subsequent siblings) 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 8:49 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine strbuf_getwholeline calls getc in a tight loop. On modern libc implementations, the stdio code locks the handle for every operation, which means we are paying a significant overhead. We can get around this by locking the handle for the whole loop and using the unlocked variant. Running "git rev-parse refs/heads/does-not-exist" on a repo with an extremely large (1.6GB) packed-refs file went from: real 0m18.900s user 0m18.472s sys 0m0.448s to: real 0m10.953s user 0m10.384s sys 0m0.580s for a wall-clock speedup of 42%. All times are best-of-3, and done on a glibc 2.19 system. Note that we call into strbuf_grow while holding the lock. It's possible for that function to call other stdio functions (e.g., printing to stderr when dying due to malloc error); however, the POSIX.1-2001 definition of flockfile makes it clear that the locks are per-handle, so we are fine unless somebody else tries to read from our same handle. This doesn't ever happen in the current code, and is unlikely to be added in the future (we would have to do something exotic like add a die_routine that tried to read from stdin). Signed-off-by: Jeff King <peff@peff.net> --- strbuf.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index 14f337d..af2bad4 100644 --- a/strbuf.c +++ b/strbuf.c @@ -443,12 +443,14 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) return EOF; strbuf_reset(sb); - while ((ch = getc(fp)) != EOF) { + flockfile(fp); + while ((ch = getc_unlocked(fp)) != EOF) { strbuf_grow(sb, 1); sb->buf[sb->len++] = ch; if (ch == term) break; } + funlockfile(fp); if (ch == EOF && sb->len == 0) return EOF; -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 4/9] config: use getc_unlocked when reading from file 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King ` (2 preceding siblings ...) 2015-04-16 8:49 ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King @ 2015-04-16 8:51 ` Jeff King 2015-04-16 8:53 ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King ` (5 subsequent siblings) 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 8:51 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine We read config files character-by-character from a stdio handle using fgetc(). This incurs significant locking overhead, even though we know that only one thread can possibly access the handle. We can speed this up by taking the lock ourselves, and then using getc_unlocked to read each character. On a silly pathological case: perl -le ' print "[core]"; print "key$_ = value$_" for (1..1000000) ' >input git config -f input core.key1 this dropped the time to run git-config from: real 0m0.263s user 0m0.260s sys 0m0.000s to: real 0m0.159s user 0m0.152s sys 0m0.004s for a savings of 39%. Most config files are not this big, but the savings should be proportional to the size of the file (i.e., we always save 39%, just of a much smaller number). Signed-off-by: Jeff King <peff@peff.net> --- config.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/config.c b/config.c index 66c0a51..8b297fc 100644 --- a/config.c +++ b/config.c @@ -49,7 +49,7 @@ static struct config_set the_config_set; static int config_file_fgetc(struct config_source *conf) { - return fgetc(conf->u.file); + return getc_unlocked(conf->u.file); } static int config_file_ungetc(int c, struct config_source *conf) @@ -1088,7 +1088,9 @@ int git_config_from_file(config_fn_t fn, const char *filename, void *data) f = fopen(filename, "r"); if (f) { + flockfile(f); ret = do_config_from_file(fn, filename, filename, f, data); + funlockfile(f); fclose(f); } return ret; -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King ` (3 preceding siblings ...) 2015-04-16 8:51 ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King @ 2015-04-16 8:53 ` Jeff King 2015-04-16 8:58 ` [PATCH 6/9] strbuf_getwholeline: " Jeff King ` (4 subsequent siblings) 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 8:53 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine We mark strbuf_addch as inline, because we expect it may be called from a tight loop. However, the first thing it does is call the non-inline strbuf_grow(), which can handle arbitrary-sized growth. Since we know that we only need a single character, we can use the inline strbuf_avail() to quickly check whether we need to grow at all. Our check is redundant when we do call strbuf_grow(), but that's OK. The common case is that we avoid calling it at all, and we have made that case faster. On a silly pathological case: perl -le ' print "[core]"; print "key$_ = value$_" for (1..1000000) ' >input git config -f input core.key1 this dropped the time to run git-config from: real 0m0.159s user 0m0.152s sys 0m0.004s to: real 0m0.140s user 0m0.136s sys 0m0.004s for a savings of 12%. Signed-off-by: Jeff King <peff@peff.net> --- I doubt anybody will really notice this in practice with config files, and for the most part we do not have tight loops of strbuf_addch elsewhere. But it is such an easy optimization, I'd rather do it now while we're thinking about it. strbuf.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/strbuf.h b/strbuf.h index 1883494..01c5c63 100644 --- a/strbuf.h +++ b/strbuf.h @@ -205,7 +205,8 @@ extern int strbuf_cmp(const struct strbuf *, const struct strbuf *); */ static inline void strbuf_addch(struct strbuf *sb, int c) { - strbuf_grow(sb, 1); + if (!strbuf_avail(sb)) + strbuf_grow(sb, 1); sb->buf[sb->len++] = c; sb->buf[sb->len] = '\0'; } -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 6/9] strbuf_getwholeline: avoid calling strbuf_grow 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King ` (4 preceding siblings ...) 2015-04-16 8:53 ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King @ 2015-04-16 8:58 ` Jeff King 2015-04-16 9:01 ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King ` (3 subsequent siblings) 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 8:58 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine As with the recent speedup to strbuf_addch, we can avoid calling strbuf_grow() in a tight loop of single-character adds by instead checking strbuf_avail. Note that we would instead call strbuf_addch directly here, but it does more work than necessary: it will NUL-terminate the result for each character read. Instead, in this loop we read the characters one by one and then add the terminator manually at the end. Running "git rev-parse refs/heads/does-not-exist" on a repo with an extremely large (1.6GB) packed-refs file went from (best-of-5): real 0m10.948s user 0m10.548s sys 0m0.412s to: real 0m8.601s user 0m8.084s sys 0m0.524s for a wall-clock speedup of 21%. Helped-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Jeff King <peff@peff.net> --- Our "don't write a NUL for each character" optimization is only possible because we're intimate with the strbuf details here. I thought about making a strbuf_addch_unsafe interface to let other callers do this, too. But the only other caller that would use it is the config reader, and I measured only a 3% speedup there. Which I don't think is worth the extra API complexity. Whereas here it does make a big difference. Switching to strbuf_addch knocks us back up into the 9.5s range. I think the difference is that our lines are much longer than the tokens we're parsing in the config file. So the percentage of wasted NUL writes is much higher here. strbuf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/strbuf.c b/strbuf.c index af2bad4..921619e 100644 --- a/strbuf.c +++ b/strbuf.c @@ -445,7 +445,8 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) strbuf_reset(sb); flockfile(fp); while ((ch = getc_unlocked(fp)) != EOF) { - strbuf_grow(sb, 1); + if (!strbuf_avail(sb)) + strbuf_grow(sb, 1); sb->buf[sb->len++] = ch; if (ch == term) break; -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King ` (5 preceding siblings ...) 2015-04-16 8:58 ` [PATCH 6/9] strbuf_getwholeline: " Jeff King @ 2015-04-16 9:01 ` Jeff King 2015-04-17 10:16 ` Eric Sunshine 2015-04-16 9:03 ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King ` (2 subsequent siblings) 9 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2015-04-16 9:01 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine We spend a lot of time in strbuf_getwholeline in a tight loop reading characters from a stdio handle into a buffer. The libc getdelim() function can do this for us with less overhead. It's in POSIX.1-2008, and was a GNU extension before that. Therefore we can't rely on it, but can fall back to the existing getc loop when it is not available. The HAVE_GETDELIM knob is turned on automatically for Linux, where we have glibc. We don't need to set any new feature-test macros, because we already define _GNU_SOURCE. Other systems that implement getdelim may need to other macros (probably _POSIX_C_SOURCE >= 200809L), but we can address that along with setting the Makefile knob after testing the feature on those systems. Running "git rev-parse refs/heads/does-not-exist" on a repo with an extremely large (1.6GB) packed-refs file went from (best-of-5): real 0m8.601s user 0m8.084s sys 0m0.524s to: real 0m6.768s user 0m6.340s sys 0m0.432s for a wall-clock speedup of 21%. Based on a patch from Rasmus Villemoes <rv@rasmusvillemoes.dk>. Signed-off-by: Jeff King <peff@peff.net> --- If somebody has a FreeBSD or OS X system to test on, I'd love to see what is needed to compile with HAVE_GETDELIM there. And to confirm that the performance is much better. Sharing my 1.6GB packed-refs file would be hard, but you should be able to generate something large and ridiculous. I'll leave that as an exercise to the reader. Makefile | 6 ++++++ config.mak.uname | 1 + strbuf.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 49 insertions(+) diff --git a/Makefile b/Makefile index 5f3987f..36655d5 100644 --- a/Makefile +++ b/Makefile @@ -359,6 +359,8 @@ all:: # compiler is detected to support it. # # Define HAVE_BSD_SYSCTL if your platform has a BSD-compatible sysctl function. +# +# Define HAVE_GETDELIM if your system has the getdelim() function. GIT-VERSION-FILE: FORCE @$(SHELL_PATH) ./GIT-VERSION-GEN @@ -1437,6 +1439,10 @@ ifdef HAVE_BSD_SYSCTL BASIC_CFLAGS += -DHAVE_BSD_SYSCTL endif +ifdef HAVE_GETDELIM + BASIC_CFLAGS += -DHAVE_GETDELIM +endif + ifeq ($(TCLTK_PATH),) NO_TCLTK = NoThanks endif diff --git a/config.mak.uname b/config.mak.uname index f4e77cb..d26665f 100644 --- a/config.mak.uname +++ b/config.mak.uname @@ -36,6 +36,7 @@ ifeq ($(uname_S),Linux) HAVE_DEV_TTY = YesPlease HAVE_CLOCK_GETTIME = YesPlease HAVE_CLOCK_MONOTONIC = YesPlease + HAVE_GETDELIM = YesPlease endif ifeq ($(uname_S),GNU/kFreeBSD) HAVE_ALLOCA_H = YesPlease diff --git a/strbuf.c b/strbuf.c index 921619e..0d4f4e5 100644 --- a/strbuf.c +++ b/strbuf.c @@ -435,6 +435,47 @@ int strbuf_getcwd(struct strbuf *sb) return -1; } +#ifdef HAVE_GETDELIM +int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) +{ + ssize_t r; + + if (feof(fp)) + return EOF; + + strbuf_reset(sb); + + /* Translate slopbuf to NULL, as we cannot call realloc on it */ + if (!sb->alloc) + sb->buf = NULL; + r = getdelim(&sb->buf, &sb->alloc, term, fp); + + if (r > 0) { + sb->len = r; + return 0; + } + assert(r == -1); + + /* + * Normally we would have called xrealloc, which will try to free + * memory and recover. But we have no way to tell getdelim() to do so. + * Worse, we cannot try to recover ENOMEM ourselves, because we have + * no idea how many bytes were read by getdelim. + * + * Dying here is reasonable. It mirrors what xrealloc would do on + * catastrophic memory failure. We skip the opportunity to free pack + * memory and retry, but that's unlikely to help for a malloc small + * enough to hold a single line of input, anyway. + */ + if (errno == ENOMEM) + die("Out of memory, getdelim failed"); + + /* Restore slopbuf that we moved out of the way before */ + if (!sb->buf) + strbuf_init(sb, 0); + return EOF; +} +#else int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) { int ch; @@ -458,6 +499,7 @@ int strbuf_getwholeline(struct strbuf *sb, FILE *fp, int term) sb->buf[sb->len] = '\0'; return 0; } +#endif int strbuf_getline(struct strbuf *sb, FILE *fp, int term) { -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available 2015-04-16 9:01 ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King @ 2015-04-17 10:16 ` Eric Sunshine 2015-04-21 23:09 ` Jeff King 2015-04-22 18:00 ` Johannes Schindelin 0 siblings, 2 replies; 44+ messages in thread From: Eric Sunshine @ 2015-04-17 10:16 UTC (permalink / raw) To: Jeff King; +Cc: Git List, René Scharfe, Rasmus Villemoes On Thu, Apr 16, 2015 at 5:01 AM, Jeff King <peff@peff.net> wrote: > We spend a lot of time in strbuf_getwholeline in a tight > loop reading characters from a stdio handle into a buffer. > The libc getdelim() function can do this for us with less > overhead. It's in POSIX.1-2008, and was a GNU extension > before that. Therefore we can't rely on it, but can fall > back to the existing getc loop when it is not available. > > The HAVE_GETDELIM knob is turned on automatically for Linux, > where we have glibc. We don't need to set any new > feature-test macros, because we already define _GNU_SOURCE. > Other systems that implement getdelim may need to other > macros (probably _POSIX_C_SOURCE >= 200809L), but we can > address that along with setting the Makefile knob after > testing the feature on those systems. > [...] > > Based on a patch from Rasmus Villemoes <rv@rasmusvillemoes.dk>. > > Signed-off-by: Jeff King <peff@peff.net> > --- > If somebody has a FreeBSD or OS X system to test on, I'd > love to see what is needed to compile with HAVE_GETDELIM > there. Modern Mac OS X, 10.10.x Yosemite, has getdelim() and git builds fine with HAVE_GETDELIM. I also tested on old Snow Leopard 10.5.8 from 2009. It does not have getdelim(). Unfortunately, I haven't been able to determine when getdelim() was introduced on the Mac OS X, thus have been unable to craft a simple rule for config.mak.uname. > And to confirm that the performance is much better. > Sharing my 1.6GB packed-refs file would be hard, but you > should be able to generate something large and ridiculous. > I'll leave that as an exercise to the reader. > > diff --git a/Makefile b/Makefile > index 5f3987f..36655d5 100644 > --- a/Makefile > +++ b/Makefile > @@ -359,6 +359,8 @@ all:: > # compiler is detected to support it. > # > # Define HAVE_BSD_SYSCTL if your platform has a BSD-compatible sysctl function. > +# > +# Define HAVE_GETDELIM if your system has the getdelim() function. > > GIT-VERSION-FILE: FORCE > @$(SHELL_PATH) ./GIT-VERSION-GEN > @@ -1437,6 +1439,10 @@ ifdef HAVE_BSD_SYSCTL > BASIC_CFLAGS += -DHAVE_BSD_SYSCTL > endif > > +ifdef HAVE_GETDELIM > + BASIC_CFLAGS += -DHAVE_GETDELIM > +endif > + > ifeq ($(TCLTK_PATH),) > NO_TCLTK = NoThanks > endif > diff --git a/config.mak.uname b/config.mak.uname > index f4e77cb..d26665f 100644 > --- a/config.mak.uname > +++ b/config.mak.uname > @@ -36,6 +36,7 @@ ifeq ($(uname_S),Linux) > HAVE_DEV_TTY = YesPlease > HAVE_CLOCK_GETTIME = YesPlease > HAVE_CLOCK_MONOTONIC = YesPlease > + HAVE_GETDELIM = YesPlease > endif > ifeq ($(uname_S),GNU/kFreeBSD) > HAVE_ALLOCA_H = YesPlease ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available 2015-04-17 10:16 ` Eric Sunshine @ 2015-04-21 23:09 ` Jeff King 2015-05-08 23:56 ` Eric Sunshine 2015-04-22 18:00 ` Johannes Schindelin 1 sibling, 1 reply; 44+ messages in thread From: Jeff King @ 2015-04-21 23:09 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List, René Scharfe, Rasmus Villemoes On Fri, Apr 17, 2015 at 06:16:48AM -0400, Eric Sunshine wrote: > > If somebody has a FreeBSD or OS X system to test on, I'd > > love to see what is needed to compile with HAVE_GETDELIM > > there. > > Modern Mac OS X, 10.10.x Yosemite, has getdelim() and git builds fine > with HAVE_GETDELIM. I also tested on old Snow Leopard 10.5.8 from > 2009. It does not have getdelim(). Unfortunately, I haven't been able > to determine when getdelim() was introduced on the Mac OS X, thus have > been unable to craft a simple rule for config.mak.uname. Thanks for looking into it. Since there haven't been any other takers in the meantime, do you want to prepare a patch that checks $(uname_R) for 10.10.x? That's likely more conservative than is necessary, but we can loosen it later if somebody on 10.9.x shows up with test results. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available 2015-04-21 23:09 ` Jeff King @ 2015-05-08 23:56 ` Eric Sunshine 2015-05-09 1:09 ` Jeff King 0 siblings, 1 reply; 44+ messages in thread From: Eric Sunshine @ 2015-05-08 23:56 UTC (permalink / raw) To: Jeff King; +Cc: Git List, René Scharfe, Rasmus Villemoes On Tue, Apr 21, 2015 at 7:09 PM, Jeff King <peff@peff.net> wrote: > On Fri, Apr 17, 2015 at 06:16:48AM -0400, Eric Sunshine wrote: >> > If somebody has a FreeBSD or OS X system to test on, I'd >> > love to see what is needed to compile with HAVE_GETDELIM >> > there. >> >> Modern Mac OS X, 10.10.x Yosemite, has getdelim() and git builds fine >> with HAVE_GETDELIM. I also tested on old Snow Leopard 10.5.8 from >> 2009. It does not have getdelim(). Unfortunately, I haven't been able >> to determine when getdelim() was introduced on the Mac OS X, thus have >> been unable to craft a simple rule for config.mak.uname. > > Thanks for looking into it. Since there haven't been any other takers in > the meantime, do you want to prepare a patch that checks $(uname_R) for > 10.10.x? That's likely more conservative than is necessary, but we can > loosen it later if somebody on 10.9.x shows up with test results. I spent some time downloading old Xcode releases and poking through the packages. Xcode 3.2.x seems to be the last in the Xcode 3 series, and none of the Xcode 3.2.x versions I examined carried getdelim(). The first package in which I found getdelim() was Xcode 4.1. (Unfortunately, Apple doesn't seem to make Xcode 4.0 available for download anymore or it's only available to paying developers, so I couldn't check it.) According to Wikipedia[1], Xcode 4.1 was released the same day as Lion (OS X 10.7 [2]), but was also available to paying developers for Snow Leopard (OS X 10.6). Consequently, I think it's safe to say that getdelim() is available for Lion (10.7) and later. If we don't mind being a bit less conservative, then we might assume that it also is available for Snow Leopard (10.6), which it definitely supported, but perhaps that's too risky, since not everyone would have been a paid subscriber. Alternately, we could make the test more dynamic and accurate by grepping stdio.h for 'getdelim' or just by trying a test compile, though that's probably too expensive. [1]: http://en.wikipedia.org/wiki/Xcode [2]: http://en.wikipedia.org/wiki/OS_X ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available 2015-05-08 23:56 ` Eric Sunshine @ 2015-05-09 1:09 ` Jeff King 2015-06-02 18:22 ` Eric Sunshine 0 siblings, 1 reply; 44+ messages in thread From: Jeff King @ 2015-05-09 1:09 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List, René Scharfe, Rasmus Villemoes On Fri, May 08, 2015 at 07:56:28PM -0400, Eric Sunshine wrote: > I spent some time downloading old Xcode releases and poking through > the packages. Xcode 3.2.x seems to be the last in the Xcode 3 series, > and none of the Xcode 3.2.x versions I examined carried getdelim(). > The first package in which I found getdelim() was Xcode 4.1. > (Unfortunately, Apple doesn't seem to make Xcode 4.0 available for > download anymore or it's only available to paying developers, so I > couldn't check it.) According to Wikipedia[1], Xcode 4.1 was released > the same day as Lion (OS X 10.7 [2]), but was also available to paying > developers for Snow Leopard (OS X 10.6). > > Consequently, I think it's safe to say that getdelim() is available > for Lion (10.7) and later. If we don't mind being a bit less > conservative, then we might assume that it also is available for Snow > Leopard (10.6), which it definitely supported, but perhaps that's too > risky, since not everyone would have been a paid subscriber. Thanks for digging. I'd argue for the conservative choice, simply because this is a pure optimization. The old code should work just fine, and people have been living with it for years. I doubt it will affect many people either way, though. Lion is 4 years old, and most OS X people seem to upgrade fairly regularly. It is not like long-term server systems where we are supporting Solaris 7. :) Want to roll a patch? > Alternately, we could make the test more dynamic and accurate by > grepping stdio.h for 'getdelim' or just by trying a test compile, > though that's probably too expensive. The natural place would be in configure.ac, and that is orthogonal to the default Darwin setting, I think. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available 2015-05-09 1:09 ` Jeff King @ 2015-06-02 18:22 ` Eric Sunshine 0 siblings, 0 replies; 44+ messages in thread From: Eric Sunshine @ 2015-06-02 18:22 UTC (permalink / raw) To: Jeff King; +Cc: Git List, René Scharfe, Rasmus Villemoes On Fri, May 8, 2015 at 9:09 PM, Jeff King <peff@peff.net> wrote: > On Fri, May 08, 2015 at 07:56:28PM -0400, Eric Sunshine wrote: >> I spent some time downloading old Xcode releases and poking through >> the packages. Xcode 3.2.x seems to be the last in the Xcode 3 series, >> and none of the Xcode 3.2.x versions I examined carried getdelim(). >> The first package in which I found getdelim() was Xcode 4.1. >> (Unfortunately, Apple doesn't seem to make Xcode 4.0 available for >> download anymore or it's only available to paying developers, so I >> couldn't check it.) According to Wikipedia[1], Xcode 4.1 was released >> the same day as Lion (OS X 10.7 [2]), but was also available to paying >> developers for Snow Leopard (OS X 10.6). >> >> Consequently, I think it's safe to say that getdelim() is available >> for Lion (10.7) and later. If we don't mind being a bit less >> conservative, then we might assume that it also is available for Snow >> Leopard (10.6), which it definitely supported, but perhaps that's too >> risky, since not everyone would have been a paid subscriber. > > Thanks for digging. I'd argue for the conservative choice, simply > because this is a pure optimization. The old code should work just fine, > and people have been living with it for years. > > I doubt it will affect many people either way, though. Lion is 4 years > old, and most OS X people seem to upgrade fairly regularly. It is not > like long-term server systems where we are supporting Solaris 7. :) > > Want to roll a patch? After a long, long delay, here it is...[1] >> Alternately, we could make the test more dynamic and accurate by >> grepping stdio.h for 'getdelim' or just by trying a test compile, >> though that's probably too expensive. > > The natural place would be in configure.ac, and that is orthogonal to > the default Darwin setting, I think. I added that too[1]. [1]: http://thread.gmane.org/gmane.comp.version-control.git/270576 ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available 2015-04-17 10:16 ` Eric Sunshine 2015-04-21 23:09 ` Jeff King @ 2015-04-22 18:00 ` Johannes Schindelin 2015-04-22 18:06 ` Jeff King 1 sibling, 1 reply; 44+ messages in thread From: Johannes Schindelin @ 2015-04-22 18:00 UTC (permalink / raw) To: Eric Sunshine; +Cc: Jeff King, Git List, René Scharfe, Rasmus Villemoes Hi, On 2015-04-17 12:16, Eric Sunshine wrote: > On Thu, Apr 16, 2015 at 5:01 AM, Jeff King <peff@peff.net> wrote: >> We spend a lot of time in strbuf_getwholeline in a tight >> loop reading characters from a stdio handle into a buffer. >> The libc getdelim() function can do this for us with less >> overhead. Just for the record: Git for Windows cannot lean on `getdelim()`, as it is not available on Windows. Do not let that stop you; if it turns out to impact performance, we will just have to come up with our own implementation of that function. Ciao, Dscho ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available 2015-04-22 18:00 ` Johannes Schindelin @ 2015-04-22 18:06 ` Jeff King 0 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-22 18:06 UTC (permalink / raw) To: Johannes Schindelin Cc: Eric Sunshine, Git List, René Scharfe, Rasmus Villemoes On Wed, Apr 22, 2015 at 08:00:55PM +0200, Johannes Schindelin wrote: > On 2015-04-17 12:16, Eric Sunshine wrote: > > On Thu, Apr 16, 2015 at 5:01 AM, Jeff King <peff@peff.net> wrote: > >> We spend a lot of time in strbuf_getwholeline in a tight > >> loop reading characters from a stdio handle into a buffer. > >> The libc getdelim() function can do this for us with less > >> overhead. > > Just for the record: Git for Windows cannot lean on `getdelim()`, as > it is not available on Windows. Do not let that stop you; if it turns > out to impact performance, we will just have to come up with our own > implementation of that function. Hopefully the earlier patch in the series to avoid locking will help on Windows. After the end of the series, it isn't used anymore on Linux, but I kept it in exactly for those less-fortunate systems. If you can find a Windows equivalent that does the same thing as getdelim, it should be pretty easy to drop it into an alternate strbuf_getwholeline implementation (or just provide a compat "getdelim" if it is close enough to have the same interface). -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 8/9] read_packed_refs: avoid double-checking sane refs 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King ` (6 preceding siblings ...) 2015-04-16 9:01 ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King @ 2015-04-16 9:03 ` Jeff King 2015-04-16 9:04 ` [PATCH 9/9] t1430: add another refs-escape test Jeff King 2015-04-16 9:25 ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 9:03 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine Prior to d0f810f (refs.c: allow listing and deleting badly named refs, 2014-09-03), read_packed_refs would barf on any malformed refnames by virtue of calling create_ref_entry with the "check" parameter set to 1. That commit loosened our reading so that we call check_refname_format ourselves and just set a REF_BAD_NAME flag. We then call create_ref_entry with the check parameter set to 0. That function learned to do an extra safety check even when the check parameter is 0, so that we don't load any dangerous refnames (like "../../../etc/passwd"). This is implemented by calling refname_is_safe() in create_ref_entry(). However, we can observe that refname_is_safe() can only be true if check_refname_format() also failed. So in the common case of a sanely named ref, we perform _both_ checks, even though we know that the latter will never trigger. This has a noticeable performance impact when the packed-refs file is large. Let's drop the refname_is_safe check from create_ref_entry(), and make it the responsibility of the caller. Of the three callers that pass a check parameter of "0", two will have just called check_refname_format(), and can check the refname-safety only when it fails. The third case, pack_if_possible_fn, is copying from an existing ref entry, which must have previously passed our safety check. With this patch, running "git rev-parse refs/heads/does-not-exist" on a repo with a large (1.6GB) packed-refs file went from: real 0m6.768s user 0m6.340s sys 0m0.432s to: real 0m5.703s user 0m5.276s sys 0m0.432s for a wall-clock speedup of 15%. Signed-off-by: Jeff King <peff@peff.net> --- refs.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/refs.c b/refs.c index 47e4e53..f36ea75 100644 --- a/refs.c +++ b/refs.c @@ -344,8 +344,6 @@ static struct ref_entry *create_ref_entry(const char *refname, if (check_name && check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) die("Reference has invalid format: '%s'", refname); - if (!check_name && !refname_is_safe(refname)) - die("Reference has invalid name: '%s'", refname); len = strlen(refname) + 1; ref = xmalloc(sizeof(struct ref_entry) + len); hashcpy(ref->u.value.sha1, sha1); @@ -1178,6 +1176,8 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir) int flag = REF_ISPACKED; if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(refname)) + die("packed refname is dangerous: %s", refname); hashclr(sha1); flag |= REF_BAD_NAME | REF_ISBROKEN; } @@ -1323,6 +1323,8 @@ static void read_loose_refs(const char *dirname, struct ref_dir *dir) } if (check_refname_format(refname.buf, REFNAME_ALLOW_ONELEVEL)) { + if (!refname_is_safe(refname.buf)) + die("loose refname is dangerous: %s", refname.buf); hashclr(sha1); flag |= REF_BAD_NAME | REF_ISBROKEN; } -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH 9/9] t1430: add another refs-escape test 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King ` (7 preceding siblings ...) 2015-04-16 9:03 ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King @ 2015-04-16 9:04 ` Jeff King 2015-04-16 9:25 ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 9:04 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine In t1430, we check whether deleting the branch "../../foo" will delete ".git/foo". However, this is not that interesting a test; the precious file ".git/foo" does not look like a ref, so even if we did not notice the "escape" from the "refs/" hierarchy, we would fail for that reason (i.e., if you turned refname_is_safe into a noop, the test still passes). Let's add an additional test for the same thing, but with a file that actually looks like a ref. That will make sure we are exercising the refname_is_safe code. While we're at it, let's also make the code work a little harder by adding some extra paths and some empty path components. Signed-off-by: Jeff King <peff@peff.net> --- This was originally included to exercise refname_is_safe(), because in the v1 series I refactored it (here I just avoid calling it entirely). So it's not as important in v2. But AFAICT, we do not exercise refname_is_safe() at all in the test suite without this patch, so it's probably a good thing to do regardless. t/t1430-bad-ref-name.sh | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/t/t1430-bad-ref-name.sh b/t/t1430-bad-ref-name.sh index 468e856..16d0b8b 100755 --- a/t/t1430-bad-ref-name.sh +++ b/t/t1430-bad-ref-name.sh @@ -68,6 +68,14 @@ test_expect_success 'branch -D cannot delete non-ref in .git dir' ' test_cmp expect .git/my-private-file ' +test_expect_success 'branch -D cannot delete ref in .git dir' ' + git rev-parse HEAD >.git/my-private-file && + git rev-parse HEAD >expect && + git branch foo/legit && + test_must_fail git branch -D foo////./././../../../my-private-file && + test_cmp expect .git/my-private-file +' + test_expect_success 'branch -D cannot delete absolute path' ' git branch -f extra && test_must_fail git branch -D "$(pwd)/.git/refs/heads/extra" && -- 2.4.0.rc2.384.g7297a4a ^ permalink raw reply related [flat|nested] 44+ messages in thread
* Re: [PATCH v2 0/9] address packed-refs speed regressions 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King ` (8 preceding siblings ...) 2015-04-16 9:04 ` [PATCH 9/9] t1430: add another refs-escape test Jeff King @ 2015-04-16 9:25 ` Jeff King 9 siblings, 0 replies; 44+ messages in thread From: Jeff King @ 2015-04-16 9:25 UTC (permalink / raw) To: git; +Cc: René Scharfe, Rasmus Villemoes, Eric Sunshine On Thu, Apr 16, 2015 at 04:47:34AM -0400, Jeff King wrote: > Here's a re-roll incorporating feedback from the list. Thanks everybody > for your comments. Last time the final number was ~8.5s, which was > disappointingly slower than v2.0.0. In this iteration, my final numbers > are: > > real 0m5.703s > user 0m5.276s > sys 0m0.432s > > which is quite pleasing. I forgot to mention what I _didn't_ include. We discussed using mmap instead of stdio. Applying René's mmap patch drops my best-of-five here to 5.114s. Which is nice, but it is a bit more invasive (and does not help other callers of strbuf_getline). If I further apply my really nasty patch to avoid the strbuf entirely (i.e., we parse straight out of the mmap), I can get it down to 4.835s. I don't know if the complexity is worth it or not. Ultimately, I think the best route to making packed-refs faster is to drop the whole ref_cache structure and just iterate directly over the mmap data. It would use less RAM, and it opens the possibility of binary-searching to look at only a subset of the entries. That's a _lot_ more invasive, though. -Peff ^ permalink raw reply [flat|nested] 44+ messages in thread
end of thread, other threads:[~2015-06-02 18:22 UTC | newest] Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-04-05 1:06 [PATCH 0/6] address packed-refs speed regressions Jeff King 2015-04-05 1:07 ` [PATCH 1/6] strbuf_getwholeline: use getc macro Jeff King 2015-04-05 1:08 ` [PATCH 2/6] git-compat-util: add fallbacks for unlocked stdio Jeff King 2015-04-05 1:11 ` [PATCH 3/6] strbuf_getwholeline: use getc_unlocked Jeff King 2015-04-05 4:56 ` Jeff King 2015-04-05 5:27 ` Jeff King 2015-04-05 5:35 ` Jeff King 2015-04-05 20:49 ` Junio C Hamano 2015-04-05 14:36 ` Duy Nguyen 2015-04-05 18:24 ` Jeff King 2015-04-05 20:09 ` Junio C Hamano 2015-04-07 13:48 ` Rasmus Villemoes 2015-04-07 19:04 ` Jeff King 2015-04-07 22:43 ` Rasmus Villemoes 2015-04-08 0:17 ` Jeff King 2015-04-05 1:11 ` [PATCH 4/6] strbuf: add an optimized 1-character strbuf_grow Jeff King 2015-04-06 2:13 ` Eric Sunshine 2015-04-06 5:05 ` Jeff King 2015-04-05 1:11 ` [PATCH 5/6] t1430: add another refs-escape test Jeff King 2015-04-05 1:15 ` [PATCH 6/6] refname_is_safe: avoid expensive normalize_path_copy call Jeff King 2015-04-05 13:41 ` [PATCH 0/6] address packed-refs speed regressions René Scharfe 2015-04-05 18:52 ` Jeff King 2015-04-05 18:59 ` Jeff King 2015-04-05 23:04 ` René Scharfe 2015-04-05 22:39 ` René Scharfe 2015-04-06 4:49 ` Jeff King 2015-04-16 8:47 ` [PATCH v2 0/9] " Jeff King 2015-04-16 8:48 ` [PATCH 1/9] strbuf_getwholeline: use getc macro Jeff King 2015-04-16 8:48 ` [PATCH 2/9] git-compat-util: add fallbacks for unlocked stdio Jeff King 2015-04-16 8:49 ` [PATCH 3/9] strbuf_getwholeline: use getc_unlocked Jeff King 2015-04-16 8:51 ` [PATCH 4/9] config: use getc_unlocked when reading from file Jeff King 2015-04-16 8:53 ` [PATCH 5/9] strbuf_addch: avoid calling strbuf_grow Jeff King 2015-04-16 8:58 ` [PATCH 6/9] strbuf_getwholeline: " Jeff King 2015-04-16 9:01 ` [PATCH 7/9] strbuf_getwholeline: use getdelim if it is available Jeff King 2015-04-17 10:16 ` Eric Sunshine 2015-04-21 23:09 ` Jeff King 2015-05-08 23:56 ` Eric Sunshine 2015-05-09 1:09 ` Jeff King 2015-06-02 18:22 ` Eric Sunshine 2015-04-22 18:00 ` Johannes Schindelin 2015-04-22 18:06 ` Jeff King 2015-04-16 9:03 ` [PATCH 8/9] read_packed_refs: avoid double-checking sane refs Jeff King 2015-04-16 9:04 ` [PATCH 9/9] t1430: add another refs-escape test Jeff King 2015-04-16 9:25 ` [PATCH v2 0/9] address packed-refs speed regressions Jeff King
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.