All of lore.kernel.org
 help / color / mirror / Atom feed
* [PU PATCH] Fix git fetch for very large ref counts
@ 2007-02-13  1:21 Julian Phillips
  2007-02-13  1:21 ` [PATCH] Allow fetch--tool to read from stdin Julian Phillips
  2007-02-13  3:18 ` [PU PATCH] Fix git fetch for very large ref counts Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Julian Phillips @ 2007-02-13  1:21 UTC (permalink / raw)
  To: git

The updated git fetch in pu is vastly improved on repositories with very
large numbers of refs.  The time taken for a no-op fetch over ~9000 refs
drops from ~48m to ~0.5m.

However, before git fetch will actually run on a repository with ~9000
refs the calling interface between fetch and fetch--tool needs to be
changed.  The existing version passes the entire reflist on the command
line, which means that it is subject to the maxiumum environment size
passed to a child process by execve.

The following patches add a stdin based interface to fetch--tool allowing
the ~9000 refs to be passed without exceeding the environment limit.

--
Julian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] Allow fetch--tool to read from stdin
  2007-02-13  1:21 [PU PATCH] Fix git fetch for very large ref counts Julian Phillips
@ 2007-02-13  1:21 ` Julian Phillips
  2007-02-13  1:21   ` [PATCH] Use stdin reflist passing in parse-remote Julian Phillips
  2007-02-13  3:18 ` [PU PATCH] Fix git fetch for very large ref counts Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Julian Phillips @ 2007-02-13  1:21 UTC (permalink / raw)
  To: git; +Cc: Julian Phillips

If the reflist is "-" then read the reflist data from stdin instead,
this will allow the passing of more than 128K of reflist data - which
won't fit in the environment passed by execve.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 builtin-fetch--tool.c |   24 ++++++++++++++++++++++--
 1 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/builtin-fetch--tool.c b/builtin-fetch--tool.c
index 3090ffe..619ceb0 100644
--- a/builtin-fetch--tool.c
+++ b/builtin-fetch--tool.c
@@ -2,6 +2,20 @@
 #include "refs.h"
 #include "commit.h"
 
+static char *get_stdin(void)
+{
+#define CHUNK_SIZE (1048576)
+	char *data = xmalloc(CHUNK_SIZE);
+	int offset = 0, read = 0;
+	read = xread(0, data, CHUNK_SIZE);
+	while (read == CHUNK_SIZE) {
+		offset += CHUNK_SIZE;
+		data = xrealloc(data, offset + CHUNK_SIZE);
+		read = xread(0, data + offset, CHUNK_SIZE);
+	}
+	return data;
+}
+
 static void show_new(char *type, unsigned char *sha1_new)
 {
 	fprintf(stderr, "  %s: %s\n", type,
@@ -463,12 +477,18 @@ int cmd_fetch__tool(int argc, const char **argv, const char *prefix)
 	if (!strcmp("parse-reflist", argv[1])) {
 		if (argc != 3)
 			return error("parse-reflist takes 1 arg");
-		return parse_reflist(argv[2]);
+		const char *reflist = argv[2];
+		if (!strcmp(reflist, "-"))
+			reflist = get_stdin();
+		return parse_reflist(reflist);
 	}
 	if (!strcmp("expand-refs-wildcard", argv[1])) {
 		if (argc < 4)
 			return error("expand-refs-wildcard takes at least 2 args");
-		return expand_refs_wildcard(argv[2], argc - 3, argv + 3);
+		const char *reflist = argv[2];
+		if (!strcmp(reflist, "-"))
+			reflist = get_stdin();
+		return expand_refs_wildcard(reflist, argc - 3, argv + 3);
 	}
 
 	return error("Unknown subcommand: %s", argv[1]);
-- 
1.4.4.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] Use stdin reflist passing in parse-remote
  2007-02-13  1:21 ` [PATCH] Allow fetch--tool to read from stdin Julian Phillips
@ 2007-02-13  1:21   ` Julian Phillips
  2007-02-13  1:21     ` [PATCH] Use stdin reflist passing in git-fetch.sh Julian Phillips
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Phillips @ 2007-02-13  1:21 UTC (permalink / raw)
  To: git; +Cc: Julian Phillips

Use the new stdin reflist passing mechanism for the call to
fetch--tool expand-refs-wildcard, allowing passing of more
than ~128K of reflist data.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 git-parse-remote.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-parse-remote.sh b/git-parse-remote.sh
index 9b19a21..185eb54 100755
--- a/git-parse-remote.sh
+++ b/git-parse-remote.sh
@@ -81,7 +81,7 @@ get_remote_default_refs_for_push () {
 # is to help prevent randomly "globbed" ref from being chosen as
 # a merge candidate
 expand_refs_wildcard () {
-	git fetch--tool expand-refs-wildcard "$ls_remote_result" "$@"
+	echo "$ls_remote_result" | git fetch--tool expand-refs-wildcard "-" "$@"
 }
 
 # Subroutine to canonicalize remote:local notation.
-- 
1.4.4.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] Use stdin reflist passing in git-fetch.sh
  2007-02-13  1:21   ` [PATCH] Use stdin reflist passing in parse-remote Julian Phillips
@ 2007-02-13  1:21     ` Julian Phillips
  2007-02-13  2:31       ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Phillips @ 2007-02-13  1:21 UTC (permalink / raw)
  To: git; +Cc: Julian Phillips

Use the new stdin reflist passing mechanism for the call to
fetch--tool parse-reflist, allowing passing of more than ~128K
of reflist data.

Signed-off-by: Julian Phillips <julian@quantumfyre.co.uk>
---
 git-fetch.sh |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/git-fetch.sh b/git-fetch.sh
index 6a8b196..5a1b4e7 100755
--- a/git-fetch.sh
+++ b/git-fetch.sh
@@ -169,7 +169,7 @@ fi
 
 fetch_native () {
 
-  eval=$(git-fetch--tool parse-reflist "$1")
+  eval=$(echo "$1" | git-fetch--tool parse-reflist "-")
   eval "$eval"
 
     ( : subshell because we muck with IFS
-- 
1.4.4.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] Use stdin reflist passing in git-fetch.sh
  2007-02-13  1:21     ` [PATCH] Use stdin reflist passing in git-fetch.sh Julian Phillips
@ 2007-02-13  2:31       ` Linus Torvalds
  0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2007-02-13  2:31 UTC (permalink / raw)
  To: Julian Phillips; +Cc: git, Julian Phillips



On Tue, 13 Feb 2007, Julian Phillips wrote:
>
> Use the new stdin reflist passing mechanism for the call to
> fetch--tool parse-reflist, allowing passing of more than ~128K
> of reflist data.

This pretty much seems to assume that "echo" is a shell built-in.

Otherwise, the

	echo "$1"

part will still fall afoul of any exec limits.

Maybe that's aperfectly fine assumption. It's certainly true for bash (and 
probably _any_ shells that do any built-ins at all - echo is pretty damn 
basic ;)

		Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PU PATCH] Fix git fetch for very large ref counts
  2007-02-13  1:21 [PU PATCH] Fix git fetch for very large ref counts Julian Phillips
  2007-02-13  1:21 ` [PATCH] Allow fetch--tool to read from stdin Julian Phillips
@ 2007-02-13  3:18 ` Junio C Hamano
  2007-02-13 10:39   ` Julian Phillips
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2007-02-13  3:18 UTC (permalink / raw)
  To: Julian Phillips; +Cc: git

"Julian Phillips" <jp3@quantumfyre.co.uk> writes:

> The updated git fetch in pu is vastly improved on repositories with very
> large numbers of refs.  The time taken for a no-op fetch over ~9000 refs
> drops from ~48m to ~0.5m.
>
> However, before git fetch will actually run on a repository with ~9000
> refs the calling interface between fetch and fetch--tool needs to be
> changed.  The existing version passes the entire reflist on the command
> line, which means that it is subject to the maxiumum environment size
> passed to a child process by execve.
>
> The following patches add a stdin based interface to fetch--tool allowing
> the ~9000 refs to be passed without exceeding the environment limit.

Thanks.

But the ones in 'pu' were done primarily as demonstration of
where the bottlenecks are, and not meant for real-world
consumption.  I think the final shaving of 0.5m down to a few
seconds needs to move the ls_remote_result string currently kept
as a shell variable to a list of strings represented in a
git-fetch largely rewritten in C, and at that point the
interface from outside fetch--tool to throw 9000 refs at it
would be an internal function call and the code you fixed along
with new function you introduced would probably need to be
discarded.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PU PATCH] Fix git fetch for very large ref counts
  2007-02-13  3:18 ` [PU PATCH] Fix git fetch for very large ref counts Junio C Hamano
@ 2007-02-13 10:39   ` Julian Phillips
  2007-02-13 17:58     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Julian Phillips @ 2007-02-13 10:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Mon, 12 Feb 2007, Junio C Hamano wrote:

> "Julian Phillips" <jp3@quantumfyre.co.uk> writes:
>
>> The updated git fetch in pu is vastly improved on repositories with very
>> large numbers of refs.  The time taken for a no-op fetch over ~9000 refs
>> drops from ~48m to ~0.5m.
>>
>> However, before git fetch will actually run on a repository with ~9000
>> refs the calling interface between fetch and fetch--tool needs to be
>> changed.  The existing version passes the entire reflist on the command
>> line, which means that it is subject to the maxiumum environment size
>> passed to a child process by execve.
>>
>> The following patches add a stdin based interface to fetch--tool allowing
>> the ~9000 refs to be passed without exceeding the environment limit.
>
> Thanks.
>
> But the ones in 'pu' were done primarily as demonstration of
> where the bottlenecks are, and not meant for real-world
> consumption.  I think the final shaving of 0.5m down to a few
> seconds needs to move the ls_remote_result string currently kept
> as a shell variable to a list of strings represented in a
> git-fetch largely rewritten in C, and at that point the
> interface from outside fetch--tool to throw 9000 refs at it
> would be an internal function call and the code you fixed along
> with new function you introduced would probably need to be
> discarded.

And there I was thinking 0.5m was fast ... given how long I've been 
reading this list I really should have know better. ;)

I only really made the changes so I could try your improvements to fetch 
out - if they aren't necessary because you're making it even faster then I 
really don't have much cause to complain.

-- 
Julian

  ---
To be a kind of moral Unix, he touched the hem of Nature's shift.
 		-- Shelley

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PU PATCH] Fix git fetch for very large ref counts
  2007-02-13 10:39   ` Julian Phillips
@ 2007-02-13 17:58     ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2007-02-13 17:58 UTC (permalink / raw)
  To: Julian Phillips; +Cc: git

Julian Phillips <julian@quantumfyre.co.uk> writes:

> And there I was thinking 0.5m was fast ... given how long I've been
> reading this list I really should have know better. ;)

Well, the only thing we need to do in fetching between two
repositories that are already identical should be to compare
ls-remote output from both ends and immediately declare victory,
and it is unacceptable for such an obvious no-op to take 30
seconds.  At this point it really is the shell loop that is
killing us.

> I only really made the changes so I could try your improvements to
> fetch out - if they aren't necessary because you're making it even
> faster then I really don't have much cause to complain.

I've applied your fixes to jc/fetch topic and merged it back to
'pu'.  Thanks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-02-13 17:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-02-13  1:21 [PU PATCH] Fix git fetch for very large ref counts Julian Phillips
2007-02-13  1:21 ` [PATCH] Allow fetch--tool to read from stdin Julian Phillips
2007-02-13  1:21   ` [PATCH] Use stdin reflist passing in parse-remote Julian Phillips
2007-02-13  1:21     ` [PATCH] Use stdin reflist passing in git-fetch.sh Julian Phillips
2007-02-13  2:31       ` Linus Torvalds
2007-02-13  3:18 ` [PU PATCH] Fix git fetch for very large ref counts Junio C Hamano
2007-02-13 10:39   ` Julian Phillips
2007-02-13 17:58     ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.