All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFH] send-pack: fix pipeline.
@ 2006-12-29 10:37 Junio C Hamano
  2006-12-29 20:13 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2006-12-29 10:37 UTC (permalink / raw)
  To: git; +Cc: Andy Whitcroft

send-pack builds a pipeline that runs "rev-list | pack-objects"
and sends the output from pack-objects to the other side, while
feeding the input side of that pipe from itself.  However, the
file descriptor that is given to this pipeline (so that it can
be dup2(2)'ed into file descriptor 1 of pack-objects) is closed
by the caller before the complex fork+exec dance!  Worse yet,
the caller already dup2's it to 1, so the child process did not
even have to.

I do not understand how this code could possibly have been
working, but it somehow was working by accident.

Merging the sliding mmap() code reveals this problem, presumably
because it keeps one extra file descriptor open for a packfile
and changes the way file descriptors are allocated.  I am too
tired to diagnose the problem now, but this seems to be a
sensible fix.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 * With this patch (and another one I sent out a fix already),
   it appears that the send-pack problem I was having with
   sp/mmap topic seems to have disappeared.  But that is no way
   a proof that everything is peachy now.

   Somebody less tired than myself should really audit the
   pipeline built by send-pack.

 send-pack.c |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/send-pack.c b/send-pack.c
index cc884f3..54de96e 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -58,7 +58,7 @@ static void exec_rev_list(struct ref *refs)
 /*
  * Run "rev-list --stdin | pack-objects" pipe.
  */
-static void rev_list(int fd, struct ref *refs)
+static void rev_list(struct ref *refs)
 {
 	int pipe_fd[2];
 	pid_t pack_objects_pid;
@@ -71,10 +71,8 @@ static void rev_list(int fd, struct ref *refs)
 		 * and writes to the original fd
 		 */
 		dup2(pipe_fd[0], 0);
-		dup2(fd, 1);
 		close(pipe_fd[0]);
 		close(pipe_fd[1]);
-		close(fd);
 		exec_pack_objects();
 		die("pack-objects setup failed");
 	}
@@ -85,7 +83,6 @@ static void rev_list(int fd, struct ref *refs)
 	dup2(pipe_fd[1], 1);
 	close(pipe_fd[0]);
 	close(pipe_fd[1]);
-	close(fd);
 	exec_rev_list(refs);
 }
 
@@ -111,7 +108,7 @@ static void rev_list_generate(int fd, struct ref *refs)
 		close(pipe_fd[0]);
 		close(pipe_fd[1]);
 		close(fd);
-		rev_list(fd, refs);
+		rev_list(refs);
 		die("rev-list setup failed");
 	}
 	if (rev_list_generate_pid < 0)

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFH] send-pack: fix pipeline.
  2006-12-29 10:37 [PATCH/RFH] send-pack: fix pipeline Junio C Hamano
@ 2006-12-29 20:13 ` Junio C Hamano
  2006-12-29 21:20   ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2006-12-29 20:13 UTC (permalink / raw)
  To: git; +Cc: Andy Whitcroft, Linus Torvalds

I really need a sanity checking on this one.  I think I got the
botched pipeline fixed with the patch I am replying to, but I do
not understand the waitpid() business.  Care to enlighten me?

-- >8 --

 Documentation/technical/send-pack-pipeline.txt |  112 ++++++++++++++++++++++++
 1 files changed, 112 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/technical/send-pack-pipeline.txt

diff --git a/Documentation/technical/send-pack-pipeline.txt b/Documentation/technical/send-pack-pipeline.txt
new file mode 100644
index 0000000..bd32aff
--- /dev/null
+++ b/Documentation/technical/send-pack-pipeline.txt
@@ -0,0 +1,112 @@
+git-send-pack
+=============
+
+Overall operation
+-----------------
+
+. Connects to the remote side and invokes git-receive-pack.
+
+. Learns what refs the remote has and what commit they point at.
+  Matches them to the refspecs we are pushing.
+
+. Checks if there are non-fast-forwards.  Unlike fetch-pack,
+  the repository send-pack runs in is supposed to be a superset
+  of the recipient in fast-forward cases, so there is no need
+  for want/have exchanges, and fast-forward check can be done
+  locally.  Tell the result to the other end.
+
+. Calls pack_objects() which generates a packfile and sends it
+  over to the other end.
+
+. If the remote side is new enough (v1.1.0 or later), wait for
+  the unpack and hook status from the other end.
+
+. Exit with appropriate error codes.
+
+
+Pack_objects pipeline
+---------------------
+
+This function gets one file descriptor (`out`) which is either a
+socket (over the network) or a pipe (local).  What's written to
+this fd goes to git-receive-pack to be unpacked.
+
+    send-pack ---> fd ---> receive-pack
+
+It somehow forks once, but does not wait for it.  I am not sure
+why.
+
+The forked child calls rev_list_generate() with that file
+descriptor (while the parent closes `out` -- the child will be
+the one that writes the packfile to the other end).
+
+    send-pack
+       |
+       rev-list-generate ---> fd ---> receive-pack
+
+
+Then rev-list-generate forks after creates a pipe; the child
+will become a pipeline "rev-list --stdin | pack-objects", which
+is the rev_list() function, while the parent feeds that pipeline
+the list of refs.
+
+    send-pack
+       |
+       rev-list-generate ---> fd ---> receive-pack
+          | ^ (pipe)
+	  v |
+         rev-list
+
+The child process, before calling rev-list, rearranges the file
+descriptors:
+
+. what it reads from rev-list-generate via pipe becomes the
+  stdin; this is to feed the upstream of the pipeline which will
+  be git-rev-list process.
+
+. what it writes to its stdout goes to the fd connected to
+  receive-pack.
+
+On the other hand, the parent process, before starting to feed
+the child pipeline, closes the reading side of the pipe and fd
+to receive-pack.
+
+    send-pack
+       |
+       rev-list-generate
+          |
+	  v [0]
+         rev-list [1] ---> receive-pack
+
+The parent then writes to the pipe and later closes it.  There
+is a commented out waitpid to wait for the rev-list side before
+it exits, I again do not understand why.
+
+The rev-list function further sets up a pipe and forks to run
+git-rev-list piped to git-pack-objects.  The child side, before
+exec'ing git-pack-objects, rearranges the file descriptors:
+
+. what it reads from the pipe becomes the stdin; this gets the
+  list of objects from the git-rev-list process.
+
+. its stdout is already connected to receive-pack, so what it
+  generates goes there.
+
+The parent process arranges its file descriptors before exec'ing
+git-rev-list:
+
+. its stdout is sent to the pipe to feed git-pack-objects.
+
+. its stdin is already connected to rev-list-generate and will
+  read the set of refs from it.
+
+
+    send-pack
+       |
+       rev-list-generate
+          |
+	  v [0]
+	  git-rev-list [1] ---> [0] git-pack-objects [1] ---> receive-pack
+
+
+

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFH] send-pack: fix pipeline.
  2006-12-29 20:13 ` Junio C Hamano
@ 2006-12-29 21:20   ` Linus Torvalds
  2006-12-29 23:53     ` Junio C Hamano
  2006-12-31  9:30     ` [PATCH/RFH] send-pack: fix pipeline Junio C Hamano
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2006-12-29 21:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Andy Whitcroft



On Fri, 29 Dec 2006, Junio C Hamano wrote:
>
> I really need a sanity checking on this one.  I think I got the
> botched pipeline fixed with the patch I am replying to, but I do
> not understand the waitpid() business.  Care to enlighten me?

I think it was a beginning of a half-hearted attempt to check the exit 
status of the rev-list in case something went wrong.

Which we simply don't do, so if git-rev-list ends up with some problem 
(due to a corrupt git repo or something), it will just send a partial 
pack.

For some reason I thought we had fixed that by just generating the object 
list internally, but I guess we don't do that. That's just stupid. We 
should make "send-pack.c" use

	list-heads | git pack-objects --revs

	list-heads | git-rev-list --stdin | git-pack-objects

because as it is now, I think send-pack is more fragile than it needs to 
be.

Or maybe I'm just confused.

		Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFH] send-pack: fix pipeline.
  2006-12-29 21:20   ` Linus Torvalds
@ 2006-12-29 23:53     ` Junio C Hamano
  2007-01-02 14:06       ` Andy Whitcroft
  2006-12-31  9:30     ` [PATCH/RFH] send-pack: fix pipeline Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2006-12-29 23:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Andy Whitcroft

Linus Torvalds <torvalds@osdl.org> writes:

> On Fri, 29 Dec 2006, Junio C Hamano wrote:
>>
>> I really need a sanity checking on this one.  I think I got the
>> botched pipeline fixed with the patch I am replying to, but I do
>> not understand the waitpid() business.  Care to enlighten me?
>
> I think it was a beginning of a half-hearted attempt to check the exit 
> status of the rev-list in case something went wrong.
>
> Which we simply don't do, so if git-rev-list ends up with some problem 
> (due to a corrupt git repo or something), it will just send a partial 
> pack.
>
> For some reason I thought we had fixed that by just generating the object 
> list internally, but I guess we don't do that. That's just stupid. We 
> should make "send-pack.c" use
>
> 	list-heads | git pack-objects --revs
>
> 	list-heads | git-rev-list --stdin | git-pack-objects
>
> because as it is now, I think send-pack is more fragile than it needs to 
> be.
>
> Or maybe I'm just confused.

Dont' worry, you are no more confused than I am ;-).

"I thought we've done the 'pack-objects --revs' for the
upload-pack side but haven't done so on the send-pack side." was
what I initially wrote, but apparently we haven't.  On the other
hand, I think upload-pack gets error termination from rev-list
right.

It seems that repack is the only thing that uses the internal
rev-list.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFH] send-pack: fix pipeline.
  2006-12-29 21:20   ` Linus Torvalds
  2006-12-29 23:53     ` Junio C Hamano
@ 2006-12-31  9:30     ` Junio C Hamano
  2006-12-31 19:56       ` Linus Torvalds
  1 sibling, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2006-12-31  9:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> For some reason I thought we had fixed that by just generating the object 
> list internally, but I guess we don't do that. That's just stupid.

Thanks.  How about this?

-- >8 --
[PATCH] send-pack: tell pack-objects to use its internal rev-list.

This means one less process in the pipeline to worry about, and
removes about 1/8 of the code.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---
 send-pack.c |  139 ++++++++++++++++++-----------------------------------------
 1 files changed, 42 insertions(+), 97 deletions(-)

diff --git a/send-pack.c b/send-pack.c
index 29cf736..eaa6efb 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -14,100 +14,49 @@ static int send_all;
 static int force_update;
 static int use_thin_pack;
 
-static void exec_pack_objects(void)
-{
-	static const char *args[] = {
-		"pack-objects",
-		"--all-progress",
-		"--stdout",
-		NULL
-	};
-	execv_git_cmd(args);
-	die("git-pack-objects exec failed (%s)", strerror(errno));
-}
-
-static void exec_rev_list(struct ref *refs)
-{
-	static const char *args[4];
-	int i = 0;
-
-	args[i++] = "rev-list";	/* 0 */
-	if (use_thin_pack)	/* 1 */
-		args[i++] = "--objects-edge";
-	else
-		args[i++] = "--objects";
-
-	args[i++] = "--stdin";
-
-	args[i] = NULL;
-	execv_git_cmd(args);
-	die("git-rev-list exec failed (%s)", strerror(errno));
-}
-
 /*
- * Run "rev-list --stdin | pack-objects" pipe.
- */
-static void rev_list(struct ref *refs)
-{
-	int pipe_fd[2];
-	pid_t pack_objects_pid;
-
-	if (pipe(pipe_fd) < 0)
-		die("rev-list setup: pipe failed");
-	pack_objects_pid = fork();
-	if (!pack_objects_pid) {
-		/* The child becomes pack-objects; reads from pipe
-		 * and writes to the original fd
-		 */
-		dup2(pipe_fd[0], 0);
-		close(pipe_fd[0]);
-		close(pipe_fd[1]);
-		exec_pack_objects();
-		die("pack-objects setup failed");
-	}
-	if (pack_objects_pid < 0)
-		die("pack-objects fork failed");
-
-	/* We become rev-list --stdin; output goes to pipe. */
-	dup2(pipe_fd[1], 1);
-	close(pipe_fd[0]);
-	close(pipe_fd[1]);
-	exec_rev_list(refs);
-}
-
-/*
- * Create "rev-list --stdin | pack-objects" pipe and feed
- * the refs into the pipeline.
+ * Make a pack stream and spit it out into file descriptor fd
  */
-static void rev_list_generate(int fd, struct ref *refs)
+static int pack_objects(int fd, struct ref *refs)
 {
 	int pipe_fd[2];
-	pid_t rev_list_generate_pid;
+	pid_t pid;
 
 	if (pipe(pipe_fd) < 0)
-		die("rev-list-generate setup: pipe failed");
-	rev_list_generate_pid = fork();
-	if (!rev_list_generate_pid) {
-		/* The child becomes the "rev-list | pack-objects"
-		 * pipeline.  It takes input from us, and its output
-		 * goes to fd.
+		return error("send-pack: pipe failed");
+	pid = fork();
+	if (!pid) {
+		/*
+		 * The child becomes pack-objects --revs; we feed
+		 * the revision parameters to it via its stdin and
+		 * let its stdout go back to the other end.
 		 */
+		static const char *args[] = {
+			"pack-objects",
+			"--all-progress",
+			"--revs",
+			"--stdout",
+			NULL,
+			NULL,
+		};
+		if (use_thin_pack)
+			args[4] = "--thin";
 		dup2(pipe_fd[0], 0);
 		dup2(fd, 1);
 		close(pipe_fd[0]);
 		close(pipe_fd[1]);
 		close(fd);
-		rev_list(refs);
-		die("rev-list setup failed");
+		execv_git_cmd(args);
+		die("git-pack-objects exec failed (%s)", strerror(errno));
 	}
-	if (rev_list_generate_pid < 0)
-		die("rev-list-generate fork failed");
 
-	/* We feed the rev parameters to them.  We do not write into
-	 * fd nor read from the pipe.
+	/*
+	 * We feed the pack-objects we just spawned with revision
+	 * parameters by writing to the pipe.
 	 */
 	close(pipe_fd[0]);
 	close(fd);
+
 	while (refs) {
 		char buf[42];
 
@@ -126,28 +75,24 @@ static void rev_list_generate(int fd, struct ref *refs)
 		refs = refs->next;
 	}
 	close(pipe_fd[1]);
-	// waitpid(rev_list_generate_pid);
-	exit(0);
-}
 
-/*
- * Make a pack stream and spit it out into file descriptor fd
- */
-static void pack_objects(int fd, struct ref *refs)
-{
-	pid_t rev_list_pid;
+	for (;;) {
+		int status, code;
+		pid_t waiting = waitpid(pid, &status, 0);
 
-	rev_list_pid = fork();
-	if (!rev_list_pid) {
-		rev_list_generate(fd, refs);
-		die("rev-list setup failed");
+		if (waiting < 0) {
+			if (errno == EINTR)
+				continue;
+			return error("waitpid failed (%s)", strerror(errno));
+		}
+		if ((waiting != pid) || WIFSIGNALED(status) ||
+		    !WIFEXITED(status))
+			return error("pack-objects died with strange error");
+		code = WEXITSTATUS(status);
+		if (code)
+			return -code;
+		return 0;
 	}
-	if (rev_list_pid < 0)
-		die("rev-list fork failed");
-	/*
-	 * We don't wait for the rev-list pipeline in the parent:
-	 * we end up waiting for the other end instead
-	 */
 }
 
 static void unmark_and_free(struct commit_list *list, unsigned int mark)
@@ -379,7 +324,7 @@ static int send_pack(int in, int out, int nr_refspec, char **refspec)
 
 	packet_flush(out);
 	if (new_refs)
-		pack_objects(out, remote_refs);
+		ret = pack_objects(out, remote_refs);
 	close(out);
 
 	if (expect_status_report) {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFH] send-pack: fix pipeline.
  2006-12-31  9:30     ` [PATCH/RFH] send-pack: fix pipeline Junio C Hamano
@ 2006-12-31 19:56       ` Linus Torvalds
  0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2006-12-31 19:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sun, 31 Dec 2006, Junio C Hamano wrote:
> 
> Thanks.  How about this?

Looks good to me. It should hopefully speed things up too, since now we 
don't need to first walk the whole object list and then _re-walk_ it when 
packing to get the size.

		Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH/RFH] send-pack: fix pipeline.
  2006-12-29 23:53     ` Junio C Hamano
@ 2007-01-02 14:06       ` Andy Whitcroft
  2007-01-02 14:12         ` [PATCH] send pack check for failure to send revisions list Andy Whitcroft
  0 siblings, 1 reply; 8+ messages in thread
From: Andy Whitcroft @ 2007-01-02 14:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> 
>> On Fri, 29 Dec 2006, Junio C Hamano wrote:
>>> I really need a sanity checking on this one.  I think I got the
>>> botched pipeline fixed with the patch I am replying to, but I do
>>> not understand the waitpid() business.  Care to enlighten me?
>> I think it was a beginning of a half-hearted attempt to check the exit 
>> status of the rev-list in case something went wrong.
>>
>> Which we simply don't do, so if git-rev-list ends up with some problem 
>> (due to a corrupt git repo or something), it will just send a partial 
>> pack.
>>
>> For some reason I thought we had fixed that by just generating the object 
>> list internally, but I guess we don't do that. That's just stupid. We 
>> should make "send-pack.c" use
>>
>> 	list-heads | git pack-objects --revs
>>
>> 	list-heads | git-rev-list --stdin | git-pack-objects
>>
>> because as it is now, I think send-pack is more fragile than it needs to 
>> be.
>>
>> Or maybe I'm just confused.
> 
> Dont' worry, you are no more confused than I am ;-).
> 
> "I thought we've done the 'pack-objects --revs' for the
> upload-pack side but haven't done so on the send-pack side." was
> what I initially wrote, but apparently we haven't.  On the other
> hand, I think upload-pack gets error termination from rev-list
> right.
> 
> It seems that repack is the only thing that uses the internal
> rev-list.

>From what I can see in next/pu (by the time I stopped stuffing food and
booze into myself and remembered how to turn on the computer) you have
ripped all this code out and started using the builtin rev-list
functions.  So what I can see in there now looks sane, and seems to work
in some limited testing here.

Reading the code does highlight a weakness in the face of incomplete
writes in the ref list send, which has always been in there.  Now we may
never see these on Linux, but as we do not know what OS is under us and
the relevant standards say they can occur we should cope me thinks.

I have just been testing a patch for that which I will post in follow up
to this post.

-apw

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] send pack check for failure to send revisions list
  2007-01-02 14:06       ` Andy Whitcroft
@ 2007-01-02 14:12         ` Andy Whitcroft
  0 siblings, 0 replies; 8+ messages in thread
From: Andy Whitcroft @ 2007-01-02 14:12 UTC (permalink / raw)
  To: git

When passing the revisions list to pack-objects we do not check for
errors nor short writes.  Introduce a new write_in_full which will
handle short writes and report errors to the caller.  Use this to
short cut the send on failure, allowing us to wait for and report
the child in case the failure is its fault.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
---
diff --git a/send-pack.c b/send-pack.c
index eaa6efb..c195d08 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -65,12 +65,16 @@ static int pack_objects(int fd, struct ref *refs)
 			memcpy(buf + 1, sha1_to_hex(refs->old_sha1), 40);
 			buf[0] = '^';
 			buf[41] = '\n';
-			write(pipe_fd[1], buf, 42);
+			if (!write_in_full(pipe_fd[1], buf, 42,
+						"send-pack: send refs"))
+				break;
 		}
 		if (!is_null_sha1(refs->new_sha1)) {
 			memcpy(buf, sha1_to_hex(refs->new_sha1), 40);
 			buf[40] = '\n';
-			write(pipe_fd[1], buf, 41);
+			if (!write_in_full(pipe_fd[1], buf, 41,
+						"send-pack: send refs"))
+				break;
 		}
 		refs = refs->next;
 	}
diff --git a/write_or_die.c b/write_or_die.c
index 8cf6486..6db1d31 100644
--- a/write_or_die.c
+++ b/write_or_die.c
@@ -59,3 +59,26 @@ int write_or_whine(int fd, const void *buf, size_t count, const char *msg)
 
 	return 1;
 }
+
+int write_in_full(int fd, const void *buf, size_t count, const char *msg)
+{
+	const char *p = buf;
+	ssize_t written;
+
+	while (count > 0) {
+		written = xwrite(fd, p, count);
+		if (written == 0) {
+			fprintf(stderr, "%s: disk full?\n", msg);
+			return 0;
+		}
+		else if (written < 0) {
+			fprintf(stderr, "%s: write error (%s)\n",
+				msg, strerror(errno));
+			return 0;
+		}
+		count -= written;
+		p += written;
+	}
+
+	return 1;
+}

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-01-02 14:44 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-12-29 10:37 [PATCH/RFH] send-pack: fix pipeline Junio C Hamano
2006-12-29 20:13 ` Junio C Hamano
2006-12-29 21:20   ` Linus Torvalds
2006-12-29 23:53     ` Junio C Hamano
2007-01-02 14:06       ` Andy Whitcroft
2007-01-02 14:12         ` [PATCH] send pack check for failure to send revisions list Andy Whitcroft
2006-12-31  9:30     ` [PATCH/RFH] send-pack: fix pipeline Junio C Hamano
2006-12-31 19:56       ` Linus Torvalds

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.