All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC] remote-helpers: set up a pipe to read fast-import response
@ 2012-04-11 14:32 Jonathan Nieder
  2012-04-11 16:10 ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-11 14:32 UTC (permalink / raw)
  To: git; +Cc: Dmitry Ivankov, Jeff King, Sverre Rabbelier

From: Dmitry Ivankov <divanorama@gmail.com>
Date: Tue, 21 Jun 2011 20:52:34 +0600

Remote helpers may want to issue "ls" or "cat-blob" requests to
fast-import.  For example, the svn remote helper from the
svn-dump-fast-import project uses that capability.

The response should be fed back to remote helper; let it go to the
remote helper's standard input (command stream).

After listing the branches to import:

	'import' SP 'refs/heads/foo' LF
	'import' SP 'refs/heads/bar' LF
	'import' SP 'refs/heads/baz' LF
	LF

the transport-helper does not send any commands to the remote helper
until the remote helper sends the "done" command to fast-import, so
this does not result in interleaved output from the transport-helper
and fast-import.

Signed-off-by: Dmitry Ivankov <divanorama@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Hi,

Something like this patch was sitting in my tree gathering moss but I
quite like it so I thought I should send it out for feedback.

I can't seem to find any copy of this patch in the list archives.
Maybe I received it by private email.  Dmitry, do you remember better?

Can also be found at

  git://repo.or.cz/git/jrn.git topics/di/remote-helper-blob-access

and will be part of the svn-fe-pu branch there in the next push.

It is possible (likely, even) that this doesn't work on Windows since
file descriptors are not inherited by default, but it won't cause harm
until someone tries to use "ls" or "cat-blob" from a remote helper.
If someone wants to write tests for it, that would be very useful.
Otherwise, we can deal with it once the remote helper hits the tree.

Changes since v2:

 - new description
 - documented in remote-helpers manpage
 - don't waste so much space for fastimport->argv.  (I restrained
   myself from using malloc in place of calloc since that change
   probably belongs in a separate patch.)

Thoughts?  Bugs?  Improvements?

Thanks,
Jonathan

 Documentation/git-remote-helpers.txt |    5 ++++-
 transport-helper.c                   |    6 +++++-
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-remote-helpers.txt b/Documentation/git-remote-helpers.txt
index 3a23477c..0cae14c7 100644
--- a/Documentation/git-remote-helpers.txt
+++ b/Documentation/git-remote-helpers.txt
@@ -133,7 +133,10 @@ Supported if the helper has the "push" capability.
 
 'import' <name>::
 	Produces a fast-import stream which imports the current value
-	of the named ref. It may additionally import other refs as
+	of the named ref. Standard input for the helper is connected
+	to the fast-import backend's cat-blob stream so the helper
+	can read responses from "ls" and "cat-blob" requests.
+	The helper may additionally import other refs as
 	needed to construct the history efficiently. The script writes
 	to a helper-specific private namespace. The value of the named
 	ref should be written to a location in this namespace derived
diff --git a/transport-helper.c b/transport-helper.c
index acfc88e3..4c547b9a 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -349,12 +349,16 @@ static int fetch_with_fetch(struct transport *transport,
 
 static int get_importer(struct transport *transport, struct child_process *fastimport)
 {
+	char buf[32];
+	struct helper_data *data = transport->data;
 	struct child_process *helper = get_helper(transport);
 	memset(fastimport, 0, sizeof(*fastimport));
 	fastimport->in = helper->out;
-	fastimport->argv = xcalloc(5, sizeof(*fastimport->argv));
+	fastimport->argv = xcalloc(4, sizeof(*fastimport->argv));
 	fastimport->argv[0] = "fast-import";
 	fastimport->argv[1] = "--quiet";
+	snprintf(buf, 32, "--cat-blob-fd=%d", data->helper->in);
+	fastimport->argv[2] = buf;
 
 	fastimport->git_cmd = 1;
 	return start_command(fastimport);
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC] remote-helpers: set up a pipe to read fast-import response
  2012-04-11 14:32 [PATCH/RFC] remote-helpers: set up a pipe to read fast-import response Jonathan Nieder
@ 2012-04-11 16:10 ` Junio C Hamano
  2012-04-11 17:17   ` [PATCH/RFC] fast-import doc: deadlock avoidance in bidirectional mode Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-04-11 16:10 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Dmitry Ivankov, Jeff King, Sverre Rabbelier

Jonathan Nieder <jrnieder@gmail.com> writes:

> Something like this patch was sitting in my tree gathering moss but I
> quite like it so I thought I should send it out for feedback.
>
> I can't seem to find any copy of this patch in the list archives.
> Maybe I received it by private email.  Dmitry, do you remember better?
>
> Can also be found at
>
>   git://repo.or.cz/git/jrn.git topics/di/remote-helper-blob-access
>
> and will be part of the svn-fe-pu branch there in the next push.

Does this essentially connect the frontend and fast-import via
bidirectional pipes?  How is the flow control and deadlock avoidance
supposed to happen (I guess the side that wants to give a command to ask
for information needs to wait until it reads everything from the other
side to drain the pipe before doing so), and isn't it something that need
to be documented?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH/RFC] fast-import doc: deadlock avoidance in bidirectional mode
  2012-04-11 16:10 ` Junio C Hamano
@ 2012-04-11 17:17   ` Jonathan Nieder
  2012-04-11 17:32     ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-11 17:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Dmitry Ivankov, Jeff King, Sverre Rabbelier

If fast-import's command pipe and the frontend's cat-blob/ls response
pipe are both filled, there can be a deadlock.  Luckily all existing
frontends consume any pending cat-blob/ls responses completely before
writing the next command.

Document the requirements so future frontend authors and users can be
spared from the problem, too.  It is not always easy to catch that
kind of bug by testing.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Junio C Hamano wrote:

> Does this essentially connect the frontend and fast-import via
> bidirectional pipes?  How is the flow control and deadlock avoidance
> supposed to happen

fast-import never asks the frontend for information, which makes life a
little simpler.

Typically the interaction works just as you described:

 1. Frontend sends "ls" or "cat-blob" request and flushes it, and then
    blocks waiting for the response.

 2. Once fast-import catches up with pending commands, it sends its
    response to the "cat-blob response" pipe.

    (If the pipe does not have enough room, it fills the pipe and then
    blocks until some more room is available.  If the reader has
    closed the pipe, it assumes the import was botched and just exits.)

 3. Only once the frontend has received its response, it moves on and
    starts to send new commands.

A frontend can be more clever than that and use slack to queue some
extra commands, as long as the author understands:

 i.  fast-import is not guaranteed to make any progress in consuming
     its input until the "cat-blob response" pipe has been drained.

 ii. the command pipe is not guaranteed to be able to hold extra
     commands when fast-import is not consuming its input.  Probably
     512 bytes (_POSIX_PIPE_BUF) will fit but I don't know how
     portable that is to other fast-import backends so let's say the
     limit is 1 byte.

In other words, in practice it's best not to rely on the extra space
at all.

How about this?

 Documentation/git-fast-import.txt |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index ec6ef311..0ea649f4 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -942,6 +942,12 @@ This command can be used anywhere in the stream that comments are
 accepted.  In particular, the `cat-blob` command can be used in the
 middle of a commit but not in the middle of a `data` command.
 
+While in some cases the 'cat-blob' result will fit in the pipe buffer,
+allowing fast-import to continue processing additional commands, this
+is not guaranteed.  Frontends must consume the cat-blob response
+completely before performing any writes to fast-import that might
+block.
+
 `ls`
 ~~~~
 Prints information about the object at a path to a file descriptor
@@ -975,7 +981,12 @@ Reading from a named tree::
 
 See `filemodify` above for a detailed description of `<path>`.
 
-Output uses the same format as `git ls-tree <tree> {litdd} <path>`:
+While in some cases the 'ls' response will fit in the pipe buffer,
+allowing fast-import to continue processing additional commands, this
+is not guaranteed.  Frontends must consume the ls response completely
+before performing any writes to fast-import that might block.
+
+The 'ls' response uses the same format as `git ls-tree <tree> {litdd} <path>`:
 
 ====
 	<mode> SP ('blob' | 'tree' | 'commit') SP <dataref> HT <path> LF
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC] fast-import doc: deadlock avoidance in bidirectional mode
  2012-04-11 17:17   ` [PATCH/RFC] fast-import doc: deadlock avoidance in bidirectional mode Jonathan Nieder
@ 2012-04-11 17:32     ` Junio C Hamano
  2012-04-11 21:25       ` [PATCH/RFC v2] fast-import doc: cat-blob and ls responses need to be consumed quickly Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-04-11 17:32 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Dmitry Ivankov, Jeff King, Sverre Rabbelier

Jonathan Nieder <jrnieder@gmail.com> writes:

> How about this?

I find all the explanation above this line (elided) understandable and
sensible.

The expected use of --cat-blob-fd is to loop it back to the frontend, and
that is obvious to _us_, but there is nothing in the existing text (or the
text added by this patch) that suggests that fact to the first time
reader.  "Frontends must consume" may not "click" without that knowledge.

>  Documentation/git-fast-import.txt |   13 ++++++++++++-
>  1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
> index ec6ef311..0ea649f4 100644
> --- a/Documentation/git-fast-import.txt
> +++ b/Documentation/git-fast-import.txt
> @@ -942,6 +942,12 @@ This command can be used anywhere in the stream that comments are
>  accepted.  In particular, the `cat-blob` command can be used in the
>  middle of a commit but not in the middle of a `data` command.
>  
> +While in some cases the 'cat-blob' result will fit in the pipe buffer,
> +allowing fast-import to continue processing additional commands, this
> +is not guaranteed.  Frontends must consume the cat-blob response
> +completely before performing any writes to fast-import that might
> +block.
> +
>  `ls`
>  ~~~~
>  Prints information about the object at a path to a file descriptor
> @@ -975,7 +981,12 @@ Reading from a named tree::
>  
>  See `filemodify` above for a detailed description of `<path>`.
>  
> -Output uses the same format as `git ls-tree <tree> {litdd} <path>`:
> +While in some cases the 'ls' response will fit in the pipe buffer,
> +allowing fast-import to continue processing additional commands, this
> +is not guaranteed.  Frontends must consume the ls response completely
> +before performing any writes to fast-import that might block.
> +
> +The 'ls' response uses the same format as `git ls-tree <tree> {litdd} <path>`:
>  
>  ====
>  	<mode> SP ('blob' | 'tree' | 'commit') SP <dataref> HT <path> LF

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH/RFC v2] fast-import doc: cat-blob and ls responses need to be consumed quickly
  2012-04-11 17:32     ` Junio C Hamano
@ 2012-04-11 21:25       ` Jonathan Nieder
  2012-04-11 21:32         ` Junio C Hamano
  0 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-11 21:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Dmitry Ivankov, Jeff King, Sverre Rabbelier

If fast-import's command pipe and the frontend's cat-blob/ls response
pipe are both filled, there can be a deadlock.  Luckily all existing
frontends consume any pending cat-blob/ls responses completely before
writing the next command.

Document the requirements so future frontend authors and users can be
spared from the problem, too.  It is not always easy to catch that
kind of bug by testing.

To set the scene, add some words of explanation to help the novice
understand that "cat-blob" and "ls" output are meant for consumption
by the frontend.

Reported-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Junio C Hamano wrote:

> The expected use of --cat-blob-fd is to loop it back to the frontend, and
> that is obvious to _us_, but there is nothing in the existing text (or the
> text added by this patch) that suggests that fact to the first time
> reader.  "Frontends must consume" may not "click" without that knowledge.

True.  Here's a rough attempt (again against "master").

 Documentation/git-fast-import.txt |   44 +++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index ec6ef311..7b220c28 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -98,9 +98,10 @@ OPTIONS
 	options.
 
 --cat-blob-fd=<fd>::
-	Specify the file descriptor that will be written to
-	when the `cat-blob` command is encountered in the stream.
-	The default behaviour is to write to `stdout`.
+	Write responses to `cat-blob` and `ls` queries to the
+	file descriptor <fd> instead of `stdout`.  Allows `progress`
+	output intended for the end-user to be separated from other
+	output.
 
 --done::
 	Require a `done` command at the end of the stream.
@@ -942,6 +943,9 @@ This command can be used anywhere in the stream that comments are
 accepted.  In particular, the `cat-blob` command can be used in the
 middle of a commit but not in the middle of a `data` command.
 
+See ``Responses To Commands'' below for details about how to read
+this output safely.
+
 `ls`
 ~~~~
 Prints information about the object at a path to a file descriptor
@@ -991,6 +995,9 @@ instead report
 	missing SP <path> LF
 ====
 
+See ``Responses To Commands'' below for details about how to read
+this output safely.
+
 `feature`
 ~~~~~~~~~
 Require that fast-import supports the specified feature, or abort if
@@ -1079,6 +1086,35 @@ If the `--done` command line option or `feature done` command is
 in use, the `done` command is mandatory and marks the end of the
 stream.
 
+Responses To Commands
+---------------------
+New objects written by fast-import are not available immediately.
+Most fast-import commands have no visible effect until the next
+checkpoint (or completion).  The frontend can send commands to
+fill fast-import's input pipe without worrying about how quickly
+they will take effect, which improves performance by simplifying
+scheduling.
+
+For some frontends, though, it is useful to be able to read back
+data from the current repository as it is being updated (for
+example when the source material describes objects in terms of
+patches to be applied to previously imported objects).  This can
+be accomplished by connecting the frontend and fast-import via
+bidirectional pipes:
+
+====
+	mkfifo fast-import-output
+	frontend <fast-import-output |
+	git fast-import >fast-import-output
+====
+
+A frontend set up this way can use `progress`, `ls`, and `cat-blob`
+commands to read information from the import in progress.
+
+To avoid deadlock, such frontends must completely consume any
+pending output from `progress`, `ls`, and `cat-blob` before
+performing writes to fast-import that might block.
+
 Crash Reports
 -------------
 If fast-import is supplied invalid input it will terminate with a
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC v2] fast-import doc: cat-blob and ls responses need to be consumed quickly
  2012-04-11 21:25       ` [PATCH/RFC v2] fast-import doc: cat-blob and ls responses need to be consumed quickly Jonathan Nieder
@ 2012-04-11 21:32         ` Junio C Hamano
  2012-04-11 21:46           ` Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Junio C Hamano @ 2012-04-11 21:32 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Dmitry Ivankov, Jeff King, Sverre Rabbelier

Jonathan Nieder <jrnieder@gmail.com> writes:

> True.  Here's a rough attempt (again against "master").

Looks good.  Do you want me to queue it, or would it be of not much use
outside the context of the other patch that actually does create the
loop-back?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH/RFC v2] fast-import doc: cat-blob and ls responses need to be consumed quickly
  2012-04-11 21:32         ` Junio C Hamano
@ 2012-04-11 21:46           ` Jonathan Nieder
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2012-04-11 21:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Dmitry Ivankov, Jeff King, Sverre Rabbelier

Junio C Hamano wrote:

> Looks good.  Do you want me to queue it, or would it be of not much use
> outside the context of the other patch that actually does create the
> loop-back?

Thanks.  It's useful for git versions v1.7.9.5~1^2~2 ("fast-import:
add 'ls' command") and newer.  I've pushed a copy of the patch to

  git://repo.or.cz/git/jrn.git fast-import

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-04-11 21:47 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-11 14:32 [PATCH/RFC] remote-helpers: set up a pipe to read fast-import response Jonathan Nieder
2012-04-11 16:10 ` Junio C Hamano
2012-04-11 17:17   ` [PATCH/RFC] fast-import doc: deadlock avoidance in bidirectional mode Jonathan Nieder
2012-04-11 17:32     ` Junio C Hamano
2012-04-11 21:25       ` [PATCH/RFC v2] fast-import doc: cat-blob and ls responses need to be consumed quickly Jonathan Nieder
2012-04-11 21:32         ` Junio C Hamano
2012-04-11 21:46           ` Jonathan Nieder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.