* [PU PATCH] Fix git fetch for very large ref counts @ 2007-02-13 1:21 Julian Phillips 2007-02-13 1:21 ` [PATCH] Allow fetch--tool to read from stdin Julian Phillips 2007-02-13 3:18 ` [PU PATCH] Fix git fetch for very large ref counts Junio C Hamano 0 siblings, 2 replies; 8+ messages in thread From: Julian Phillips @ 2007-02-13 1:21 UTC (permalink / raw) To: git The updated git fetch in pu is vastly improved on repositories with very large numbers of refs. The time taken for a no-op fetch over ~9000 refs drops from ~48m to ~0.5m. However, before git fetch will actually run on a repository with ~9000 refs the calling interface between fetch and fetch--tool needs to be changed. The existing version passes the entire reflist on the command line, which means that it is subject to the maxiumum environment size passed to a child process by execve. The following patches add a stdin based interface to fetch--tool allowing the ~9000 refs to be passed without exceeding the environment limit. -- Julian ^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH] Allow fetch--tool to read from stdin 2007-02-13 1:21 [PU PATCH] Fix git fetch for very large ref counts Julian Phillips @ 2007-02-13 1:21 ` Julian Phillips 2007-02-13 1:21 ` [PATCH] Use stdin reflist passing in parse-remote Julian Phillips 2007-02-13 3:18 ` [PU PATCH] Fix git fetch for very large ref counts Junio C Hamano 1 sibling, 1 reply; 8+ messages in thread From: Julian Phillips @ 2007-02-13 1:21 UTC (permalink / raw) To: git; +Cc: Julian Phillips If the reflist is "-" then read the reflist data from stdin instead, this will allow the passing of more than 128K of reflist data - which won't fit in the environment passed by execve. Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk> --- builtin-fetch--tool.c | 24 ++++++++++++++++++++++-- 1 files changed, 22 insertions(+), 2 deletions(-) diff --git a/builtin-fetch--tool.c b/builtin-fetch--tool.c index 3090ffe..619ceb0 100644 --- a/builtin-fetch--tool.c +++ b/builtin-fetch--tool.c @@ -2,6 +2,20 @@ #include "refs.h" #include "commit.h" +static char *get_stdin(void) +{ +#define CHUNK_SIZE (1048576) + char *data = xmalloc(CHUNK_SIZE); + int offset = 0, read = 0; + read = xread(0, data, CHUNK_SIZE); + while (read == CHUNK_SIZE) { + offset += CHUNK_SIZE; + data = xrealloc(data, offset + CHUNK_SIZE); + read = xread(0, data + offset, CHUNK_SIZE); + } + return data; +} + static void show_new(char *type, unsigned char *sha1_new) { fprintf(stderr, " %s: %s\n", type, @@ -463,12 +477,18 @@ int cmd_fetch__tool(int argc, const char **argv, const char *prefix) if (!strcmp("parse-reflist", argv[1])) { if (argc != 3) return error("parse-reflist takes 1 arg"); - return parse_reflist(argv[2]); + const char *reflist = argv[2]; + if (!strcmp(reflist, "-")) + reflist = get_stdin(); + return parse_reflist(reflist); } if (!strcmp("expand-refs-wildcard", argv[1])) { if (argc < 4) return error("expand-refs-wildcard takes at least 2 args"); - return expand_refs_wildcard(argv[2], argc - 3, argv + 3); + const char *reflist = argv[2]; + if (!strcmp(reflist, "-")) + reflist = get_stdin(); + return expand_refs_wildcard(reflist, argc - 3, argv + 3); } return error("Unknown subcommand: %s", argv[1]); -- 1.4.4.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] Use stdin reflist passing in parse-remote 2007-02-13 1:21 ` [PATCH] Allow fetch--tool to read from stdin Julian Phillips @ 2007-02-13 1:21 ` Julian Phillips 2007-02-13 1:21 ` [PATCH] Use stdin reflist passing in git-fetch.sh Julian Phillips 0 siblings, 1 reply; 8+ messages in thread From: Julian Phillips @ 2007-02-13 1:21 UTC (permalink / raw) To: git; +Cc: Julian Phillips Use the new stdin reflist passing mechanism for the call to fetch--tool expand-refs-wildcard, allowing passing of more than ~128K of reflist data. Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk> --- git-parse-remote.sh | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/git-parse-remote.sh b/git-parse-remote.sh index 9b19a21..185eb54 100755 --- a/git-parse-remote.sh +++ b/git-parse-remote.sh @@ -81,7 +81,7 @@ get_remote_default_refs_for_push () { # is to help prevent randomly "globbed" ref from being chosen as # a merge candidate expand_refs_wildcard () { - git fetch--tool expand-refs-wildcard "$ls_remote_result" "$@" + echo "$ls_remote_result" | git fetch--tool expand-refs-wildcard "-" "$@" } # Subroutine to canonicalize remote:local notation. -- 1.4.4.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] Use stdin reflist passing in git-fetch.sh 2007-02-13 1:21 ` [PATCH] Use stdin reflist passing in parse-remote Julian Phillips @ 2007-02-13 1:21 ` Julian Phillips 2007-02-13 2:31 ` Linus Torvalds 0 siblings, 1 reply; 8+ messages in thread From: Julian Phillips @ 2007-02-13 1:21 UTC (permalink / raw) To: git; +Cc: Julian Phillips Use the new stdin reflist passing mechanism for the call to fetch--tool parse-reflist, allowing passing of more than ~128K of reflist data. Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk> --- git-fetch.sh | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/git-fetch.sh b/git-fetch.sh index 6a8b196..5a1b4e7 100755 --- a/git-fetch.sh +++ b/git-fetch.sh @@ -169,7 +169,7 @@ fi fetch_native () { - eval=$(git-fetch--tool parse-reflist "$1") + eval=$(echo "$1" | git-fetch--tool parse-reflist "-") eval "$eval" ( : subshell because we muck with IFS -- 1.4.4.4 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCH] Use stdin reflist passing in git-fetch.sh 2007-02-13 1:21 ` [PATCH] Use stdin reflist passing in git-fetch.sh Julian Phillips @ 2007-02-13 2:31 ` Linus Torvalds 0 siblings, 0 replies; 8+ messages in thread From: Linus Torvalds @ 2007-02-13 2:31 UTC (permalink / raw) To: Julian Phillips; +Cc: git, Julian Phillips On Tue, 13 Feb 2007, Julian Phillips wrote: > > Use the new stdin reflist passing mechanism for the call to > fetch--tool parse-reflist, allowing passing of more than ~128K > of reflist data. This pretty much seems to assume that "echo" is a shell built-in. Otherwise, the echo "$1" part will still fall afoul of any exec limits. Maybe that's aperfectly fine assumption. It's certainly true for bash (and probably _any_ shells that do any built-ins at all - echo is pretty damn basic ;) Linus ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PU PATCH] Fix git fetch for very large ref counts 2007-02-13 1:21 [PU PATCH] Fix git fetch for very large ref counts Julian Phillips 2007-02-13 1:21 ` [PATCH] Allow fetch--tool to read from stdin Julian Phillips @ 2007-02-13 3:18 ` Junio C Hamano 2007-02-13 10:39 ` Julian Phillips 1 sibling, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2007-02-13 3:18 UTC (permalink / raw) To: Julian Phillips; +Cc: git "Julian Phillips" <jp3@quantumfyre.co.uk> writes: > The updated git fetch in pu is vastly improved on repositories with very > large numbers of refs. The time taken for a no-op fetch over ~9000 refs > drops from ~48m to ~0.5m. > > However, before git fetch will actually run on a repository with ~9000 > refs the calling interface between fetch and fetch--tool needs to be > changed. The existing version passes the entire reflist on the command > line, which means that it is subject to the maxiumum environment size > passed to a child process by execve. > > The following patches add a stdin based interface to fetch--tool allowing > the ~9000 refs to be passed without exceeding the environment limit. Thanks. But the ones in 'pu' were done primarily as demonstration of where the bottlenecks are, and not meant for real-world consumption. I think the final shaving of 0.5m down to a few seconds needs to move the ls_remote_result string currently kept as a shell variable to a list of strings represented in a git-fetch largely rewritten in C, and at that point the interface from outside fetch--tool to throw 9000 refs at it would be an internal function call and the code you fixed along with new function you introduced would probably need to be discarded. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PU PATCH] Fix git fetch for very large ref counts 2007-02-13 3:18 ` [PU PATCH] Fix git fetch for very large ref counts Junio C Hamano @ 2007-02-13 10:39 ` Julian Phillips 2007-02-13 17:58 ` Junio C Hamano 0 siblings, 1 reply; 8+ messages in thread From: Julian Phillips @ 2007-02-13 10:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Mon, 12 Feb 2007, Junio C Hamano wrote: > "Julian Phillips" <jp3@quantumfyre.co.uk> writes: > >> The updated git fetch in pu is vastly improved on repositories with very >> large numbers of refs. The time taken for a no-op fetch over ~9000 refs >> drops from ~48m to ~0.5m. >> >> However, before git fetch will actually run on a repository with ~9000 >> refs the calling interface between fetch and fetch--tool needs to be >> changed. The existing version passes the entire reflist on the command >> line, which means that it is subject to the maxiumum environment size >> passed to a child process by execve. >> >> The following patches add a stdin based interface to fetch--tool allowing >> the ~9000 refs to be passed without exceeding the environment limit. > > Thanks. > > But the ones in 'pu' were done primarily as demonstration of > where the bottlenecks are, and not meant for real-world > consumption. I think the final shaving of 0.5m down to a few > seconds needs to move the ls_remote_result string currently kept > as a shell variable to a list of strings represented in a > git-fetch largely rewritten in C, and at that point the > interface from outside fetch--tool to throw 9000 refs at it > would be an internal function call and the code you fixed along > with new function you introduced would probably need to be > discarded. And there I was thinking 0.5m was fast ... given how long I've been reading this list I really should have know better. ;) I only really made the changes so I could try your improvements to fetch out - if they aren't necessary because you're making it even faster then I really don't have much cause to complain. -- Julian --- To be a kind of moral Unix, he touched the hem of Nature's shift. -- Shelley ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PU PATCH] Fix git fetch for very large ref counts 2007-02-13 10:39 ` Julian Phillips @ 2007-02-13 17:58 ` Junio C Hamano 0 siblings, 0 replies; 8+ messages in thread From: Junio C Hamano @ 2007-02-13 17:58 UTC (permalink / raw) To: Julian Phillips; +Cc: git Julian Phillips <julian@quantumfyre.co.uk> writes: > And there I was thinking 0.5m was fast ... given how long I've been > reading this list I really should have know better. ;) Well, the only thing we need to do in fetching between two repositories that are already identical should be to compare ls-remote output from both ends and immediately declare victory, and it is unacceptable for such an obvious no-op to take 30 seconds. At this point it really is the shell loop that is killing us. > I only really made the changes so I could try your improvements to > fetch out - if they aren't necessary because you're making it even > faster then I really don't have much cause to complain. I've applied your fixes to jc/fetch topic and merged it back to 'pu'. Thanks. ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-02-13 17:58 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2007-02-13 1:21 [PU PATCH] Fix git fetch for very large ref counts Julian Phillips 2007-02-13 1:21 ` [PATCH] Allow fetch--tool to read from stdin Julian Phillips 2007-02-13 1:21 ` [PATCH] Use stdin reflist passing in parse-remote Julian Phillips 2007-02-13 1:21 ` [PATCH] Use stdin reflist passing in git-fetch.sh Julian Phillips 2007-02-13 2:31 ` Linus Torvalds 2007-02-13 3:18 ` [PU PATCH] Fix git fetch for very large ref counts Junio C Hamano 2007-02-13 10:39 ` Julian Phillips 2007-02-13 17:58 ` Junio C Hamano
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.