All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lars Schneider <larsxschneider@gmail.com>
To: "Jakub Narębski" <jnareb@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Torsten Bögershausen" <tboegi@web.de>,
	"Martin-Louis Bright" <mlbright@gmail.com>,
	"Eric Wong" <e@80x24.org>, "Jeff King" <peff@peff.net>
Subject: Re: Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option)
Date: Fri, 5 Aug 2016 12:32:27 +0200	[thread overview]
Message-ID: <FE7B04D2-C017-42AF-BD30-6160D251028C@gmail.com> (raw)
In-Reply-To: <607c07fe-5b6f-fd67-13e1-705020c267ee@gmail.com>


> On 03 Aug 2016, at 20:30, Jakub Narębski <jnareb@gmail.com> wrote:
> 
> The ultimate goal is to be able to run filter drivers faster for both `clean`
> and `smudge` operations.  This is done by starting filter driver once per
> git command invocation, instead of once per file being processed.  Git needs
> to pass actual contents of files to filter driver, and get its output.
> 
> We want the protocol between Git and filter driver process to be extensible,
> so that new features can be added without modifying protocol.
> 
> 
> 1. CONFIGURATION
> 
> As I wrote, there are different ways of configuring new-type filter driver:
> 
> ...
> 
> # Using a single variable for new filter type, and decide on which phase
>   (which operation) is supported by filter driver during the handshake
>   *(current approach)*
> 
>   	[filter "protocol"]
>   		process = rot13-filtes.pl
> 
>   PROS: per-file and per-command filters possible with precedence rule;
>         extensible to other types of drivers: textconv, diff, etc.
>         only one invocation for commands which use both clean and smudge
>   CONS: need single driver to be responsible for both clean and smudge;
>         need to run driver to know that it does not support given
>           operation (workaround exists)
> 
> 
> 2. HANDSHAKE (INITIALIZATION)
> 
> Next, there is deciding on and designing the handshake between Git (between
> Git command) and the filter driver process.  With the `filter.<driver>.process`
> solution the driver needs to tell which operations among (for now) "clean"
> and "smudge" it does support.  Plus it provides a way to extend protocol,
> adding new features, like support for streaming, cleaning from file or
> smudging to file, providing size upfront, perhaps even progress report.
> 
> Current handshake consist of filter driver printing a signature, version
> number and capabilities, in that order.  Git checks that it is well formed
> and matches expectations, and notes which of "clean" and "smudge" operations
> are supported by the filter.
> 
> There is no interaction from the Git side in the handshake, for example to
> set options and expectations common to all files being filtered.  Take
> one possible extension of protocol: supporting streaming.  The filter
> driver needs to know whether it needs to read all the input, or whether
> it can start printing output while input is incoming (e.g. to reduce
> memory consumption)... though we may simply decide it to be next version
> of the protocol.
> 
> On the other hand if the handshake began with Git sending some initializer
> info to the filter driver, we probably could detect one-shot filter
> misconfigured as process-filter.

OK, I'll look into this.


> Note that we need some way of deciding where handshake ends, either by
> specifying number of entries (currently: three lines / pkt-line packets),
> or providing some terminator ("smart" transport protocol uses flush packet
> for this).
> 
> ...

Would you be OK with Peff's suggestion?

  version=2
  clean=true
  smudge=true
  0000

http://public-inbox.org/git/20160803224619.bwtbvmslhuicx2qi%40sigill.intra.peff.net/



> 3. SENDING CONTENTS (FILE TO BE FILTERED AND FILTER OUTPUT)
> 
> Next thing to design is decision how to send contents to be filtered
> to the filter driver process, and how to get filtered output from the
> filter driver process.
> 
> ...
> 
> # Send/receive data file by file, using some kind of chunking,
>   with a end-of-file marker.  The solution used by Git is
>   pkt-line, with flush packet used to signal end of file.
> 
>   This is protocol used by the current implementation.
> 
>   PROS:
>   - no need to know size upfront, so easier streaming support
>   - you can signal error that happened during output, after
>     some data were sent, as well as error known upfront
>   - tracing support for free (GIT_TRACE_PACKET)
>   CONS:
>   - filter driver program slightly more difficult to implement
>   - some negligible amount of overhead
> 
> If we want in the end to implement streaming, then the last solution
> is the way to go.
> 
> 
> 4. PER-FILE HANDSHAKE - SENDING FILE TO FILTER
> 
> Let's assume that for simplicity we want to implement (for now) only
> the synchronous (non-streaming) case, where we send whole contents
> of a file to filter driver process, and *then* read filter driver
> output.
> ...
> 
> If we are using pkt-line, then the convention is that text lines
> are terminated using LF ("\n") character.  This needs to be stated
> explicitly in the documentation for filter.<driver>.process writers.
> 
>    git> packet:  [operation] clean size=67\n
> 
> We could denote that it is operation name, but it is obvious from
> position in the stream, thus not really needed.

I would prefer not to mix command and size in one packet as it
makes parsing a little more difficult.


> ...
> 
> The Git would sent contents of the file to be filtered, using
> as many pack lines as needed (note: large file support needs
> to be tested, at least as expensive test).  Flush packet is
> used to signal the end of the file.
> 
>    git> packets:  <file contents>
>    git> flush packet

If expensive tests are enabled the test suite will process data
larger then max pkt size.


> 5. FILTER DRIVER PROCESS RESPONSE
> 
> First filter should, in my opinion, reply that it received the
> request (or the command, in the case of streaming supported).
> Also, in this response it can provide further information to
> Git process.
> 
>    git< packet: [received]  ok size=67\n

I think this would be different for real streaming and the current
non-streaming... therefore it would complicate the protocol?!
I wonder if it is truly necessary.


> This response could be used to refuse to filter specific file
> upfront (for example if the file is not present in the artifactory
> for git-LFS solutions).
> 
>   git< packet: [rejected]  reject\n

Reject is already supported in v4.


> We can even provide the reasoning to Git (maybe in the future
> extension)... or filter driver can print the explanation to the
> standard error (but then, no --quiet / --verbose support).
> 
>   git< packet: [rejected]  reject with-message\n
>   git< packet: [message]   File not found on server\n
>   git< flush packet

I think Git shouldn't care about these details. If the filter
needs to tell something then it should use stderr. 


> Another response, which I think should be standarized, or at
> least described in the documentation, is filter driver refusing
> to filter further (e.g. git-LFS and network is down), to be not
> restarted by Git.
> 
>   git< packet: [quit]      quit msg=Server error\n
> 
> or
> 
>   git< packet: [quit]      quit Server error\n
> 
> or
> 
>   git< packet: [quit]      quit with-message\n
>   git< packet: [message]   Server error\n
>   git< flush packet
> 
> Maybe this is over-engineering, but I don't think so.

Interesting idea! I will look into this for v5!


> Next comes the output from the filter driver (filtered contents),
> using possibly multiple pkt-lines, ending with a flush packet:
> 
>    git< packets:  <filtered contents>
>    git< flush packet
> 
> Note that empty file would consist of zero pack lines of contents,
> and one flush packet.
> 
> Finally, to allow handling of [resumable] errors that occurred
> during sending file contents, especially for the future streaming
> filters case, we want to confirm that we send whole file
> successfully.
> 
>    git< packet: [status]   success\n
> 
> If there was an error during process, making data receives so far
> invalid, filter driver should tell about it
> 
>    git< packet: [status]   fail\n
> 
> or
> 
>    git< packet: [status]   reject\n
> 
> This may happen for example for UCS-2 <-> UTF-8 filter when invalid
> byte sequence is encountered.  This may happen for git-LFS if the
> server fails during fetch, and spare / slave server doesn't have
> a file.

Correct!


> We may want to quit filtering at this point, and not to send another
> file.
> 
>   git< packet: [status]    quit\n

I don't get this one. Git would restart the filter as soon as it finds
another file that needs to be filtered, right?


Thanks a lot for this write up!

- Lars

  reply	other threads:[~2016-08-05 10:32 UTC|newest]

Thread overview: 120+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20160727000605.49982-1-larsxschneider%40gmail.com/>
2016-07-29 23:37 ` [PATCH v3 00/10] Git filter protocol larsxschneider
2016-07-29 23:37   ` [PATCH v3 01/10] pkt-line: extract set_packet_header() larsxschneider
2016-07-30 10:30     ` Jakub Narębski
2016-08-01 11:33       ` Lars Schneider
2016-08-03 20:05         ` Jakub Narębski
2016-08-05 11:52           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 02/10] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-07-30 10:49     ` Jakub Narębski
2016-08-01 12:00       ` Lars Schneider
2016-08-03 20:12         ` Jakub Narębski
2016-08-05 12:02           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 03/10] pkt-line: add packet_flush_gentle() larsxschneider
2016-07-30 12:04     ` Jakub Narębski
2016-08-01 12:28       ` Lars Schneider
2016-07-31 20:36     ` Torstem Bögershausen
2016-07-31 21:45       ` Lars Schneider
2016-08-02 19:56         ` Torsten Bögershausen
2016-08-05  9:59           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 04/10] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-07-30 12:29     ` Jakub Narębski
2016-08-01 12:18       ` Lars Schneider
2016-08-03 20:15         ` Jakub Narębski
2016-07-29 23:37   ` [PATCH v3 05/10] pack-protocol: fix maximum pkt-line size larsxschneider
2016-07-30 13:58     ` Jakub Narębski
2016-08-01 12:23       ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 06/10] run-command: add clean_on_exit_handler larsxschneider
2016-07-30  9:50     ` Johannes Sixt
2016-08-01 11:14       ` Lars Schneider
2016-08-02  5:53         ` Johannes Sixt
2016-08-02  7:41           ` Lars Schneider
2016-07-29 23:37   ` [PATCH v3 07/10] convert: quote filter names in error messages larsxschneider
2016-07-29 23:37   ` [PATCH v3 08/10] convert: modernize tests larsxschneider
2016-07-29 23:38   ` [PATCH v3 09/10] convert: generate large test files only once larsxschneider
2016-07-29 23:38   ` [PATCH v3 10/10] convert: add filter.<driver>.process option larsxschneider
2016-07-30 22:05     ` Jakub Narębski
2016-07-31  9:42       ` Jakub Narębski
2016-07-31 19:49         ` Lars Schneider
2016-07-31 22:59           ` Jakub Narębski
2016-08-01 13:32       ` Lars Schneider
2016-08-03 18:30         ` Designing the filter process protocol (was: Re: [PATCH v3 10/10] convert: add filter.<driver>.process option) Jakub Narębski
2016-08-05 10:32           ` Lars Schneider [this message]
2016-08-06 18:24           ` Lars Schneider
2016-08-03 22:47         ` [PATCH v3 10/10] convert: add filter.<driver>.process option Jakub Narębski
2016-07-31 22:19     ` Jakub Narębski
2016-08-01 17:55       ` Lars Schneider
2016-08-04  0:42         ` Jakub Narębski
2016-08-03 13:10       ` Lars Schneider
2016-08-04 10:18         ` Jakub Narębski
2016-08-05 13:20           ` Lars Schneider
2016-08-03 16:42   ` [PATCH v4 00/12] Git filter protocol larsxschneider
2016-08-03 16:42     ` [PATCH v4 01/12] pkt-line: extract set_packet_header() larsxschneider
2016-08-03 20:18       ` Junio C Hamano
2016-08-03 21:12         ` Jeff King
2016-08-03 21:27           ` Jeff King
2016-08-04 16:14           ` Junio C Hamano
2016-08-05 14:55             ` Lars Schneider
2016-08-05 16:31               ` Junio C Hamano
2016-08-05 17:31             ` Lars Schneider
2016-08-05 17:41               ` Junio C Hamano
2016-08-03 21:56         ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 02/12] pkt-line: add direct_packet_write() and direct_packet_write_data() larsxschneider
2016-08-03 16:42     ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() larsxschneider
2016-08-03 21:39       ` Jeff King
2016-08-03 22:56         ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-03 22:56           ` [PATCH 1/7] trace: handle NULL argument in trace_disable() Jeff King
2016-08-03 22:58           ` [PATCH 2/7] trace: stop using write_or_whine_pipe() Jeff King
2016-08-03 22:58           ` [PATCH 3/7] trace: use warning() for printing trace errors Jeff King
2016-08-04 20:41             ` Junio C Hamano
2016-08-04 21:21               ` Jeff King
2016-08-04 21:28                 ` Junio C Hamano
2016-08-05  7:56                   ` Jeff King
2016-08-05  7:59                   ` Christian Couder
2016-08-05 18:41                     ` Junio C Hamano
2016-08-03 23:00           ` [PATCH 4/7] trace: cosmetic fixes for error messages Jeff King
2016-08-04 20:42             ` Junio C Hamano
2016-08-05  8:00               ` Jeff King
2016-08-03 23:00           ` [PATCH 5/7] trace: correct variable name in write() error message Jeff King
2016-08-03 23:01           ` [PATCH 6/7] trace: disable key after write error Jeff King
2016-08-04 20:45             ` Junio C Hamano
2016-08-04 21:22               ` Jeff King
2016-08-05  7:58                 ` Jeff King
2016-08-03 23:01           ` [PATCH 7/7] write_or_die: drop write_or_whine_pipe() Jeff King
2016-08-03 23:04           ` [PATCH 0/7] minor trace fixes and cosmetic improvements Jeff King
2016-08-04 16:16         ` [PATCH v4 03/12] pkt-line: add packet_flush_gentle() Junio C Hamano
2016-08-03 16:42     ` [PATCH v4 04/12] pkt-line: call packet_trace() only if a packet is actually send larsxschneider
2016-08-03 16:42     ` [PATCH v4 05/12] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-08-03 16:42     ` [PATCH v4 06/12] pack-protocol: fix maximum pkt-line size larsxschneider
2016-08-03 16:42     ` [PATCH v4 07/12] run-command: add clean_on_exit_handler larsxschneider
2016-08-03 21:24       ` Jeff King
2016-08-03 22:15         ` Lars Schneider
2016-08-03 22:53           ` Jeff King
2016-08-03 23:09             ` Lars Schneider
2016-08-03 23:15               ` Jeff King
2016-08-05 13:08                 ` Lars Schneider
2016-08-05 21:19                   ` Torsten Bögershausen
2016-08-05 21:50                     ` Lars Schneider
2016-08-03 16:42     ` [PATCH v4 08/12] convert: quote filter names in error messages larsxschneider
2016-08-03 16:42     ` [PATCH v4 09/12] convert: modernize tests larsxschneider
2016-08-03 16:42     ` [PATCH v4 10/12] convert: generate large test files only once larsxschneider
2016-08-03 16:42     ` [PATCH v4 11/12] convert: add filter.<driver>.process option larsxschneider
2016-08-03 17:45       ` Junio C Hamano
2016-08-03 21:48         ` Lars Schneider
2016-08-03 22:46           ` Jeff King
2016-08-05 12:53             ` Lars Schneider
2016-08-03 20:29       ` Junio C Hamano
2016-08-03 21:37         ` Lars Schneider
2016-08-03 21:43           ` Junio C Hamano
2016-08-03 22:01             ` Lars Schneider
2016-08-05 21:34       ` Torsten Bögershausen
2016-08-05 21:49         ` Lars Schneider
2016-08-05 22:06         ` Junio C Hamano
2016-08-05 22:27           ` Jeff King
2016-08-06 11:55             ` Lars Schneider
2016-08-06 12:14               ` Jeff King
2016-08-06 18:19                 ` Lars Schneider
2016-08-08 15:02                   ` Jeff King
2016-08-08 16:21                     ` Lars Schneider
2016-08-08 16:26                       ` Jeff King
2016-08-06 20:40           ` Torsten Bögershausen
2016-08-03 16:42     ` [PATCH v4 12/12] convert: add filter.<driver>.process shutdown command option larsxschneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=FE7B04D2-C017-42AF-BD30-6160D251028C@gmail.com \
    --to=larsxschneider@gmail.com \
    --cc=e@80x24.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    --cc=mlbright@gmail.com \
    --cc=peff@peff.net \
    --cc=tboegi@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.