All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 00/10] Git filter protocol
@ 2016-09-08 18:21 larsxschneider
  2016-09-08 18:21 ` [PATCH v7 01/10] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
                   ` (10 more replies)
  0 siblings, 11 replies; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

The goal of this series is to avoid launching a new clean/smudge filter
process for each file that is filtered.

A short summary about v1 to v5 can be found here:
https://git.github.io/rev_news/2016/08/17/edition-18/

This series is also published on web:
https://github.com/larsxschneider/git/pull/11

Thanks a lot to
  Stefan, Junio, Jakub (Narebski and Keller), Peff, and Torsten
for very helpful reviews,
Lars



## Major changes since v6

* series rebased to Git v2.10.0
* simplified and improved multi-packet test
* v2 filter takes precedence over v1 filter
* rename "error-all" to "abort"
* patch 07/13 "pack-protocol: fix maximum pkt-line size" submitted separately:
  http://public-inbox.org/git/20160829175509.69050-1-larsxschneider@gmail.com/
* removed "10/13 convert: generate large test files only once"
* patch 13/13 "read-cache: make sure file handles are not inherited by child
  processes" moved to an own series:
  http://public-inbox.org/git/20160905211111.72956-1-larsxschneider@gmail.com/



## All changes since v6

### Stefan

* http://public-inbox.org/git/CAGZ79kaJn-yf28+Ls2JyqSs6Jt-Oj2JW-sAXQn-Oe5xaxSyjMQ@mail.gmail.com/
    * submit patch "pack-protocol: fix maximum pkt-line size" separately:
      http://public-inbox.org/git/20160829175509.69050-1-larsxschneider@gmail.com/

* http://public-inbox.org/git/CAGZ79kbEoYVM_+MWkajT2pHN1OKZXATe5d+Dv_uSJ7dyPGxUeg@mail.gmail.com/
    * trace failures in new *_gently() pkt-line functions


### Junio

* http://public-inbox.org/git/xmqq8tvkle6o.fsf@gitster.mtv.corp.google.com/
    * share code between packet_write_fmt() and packet_write_fmt_gently()

* http://public-inbox.org/git/xmqq4m68ldrg.fsf@gitster.mtv.corp.google.com/
    * remove "Second, it will..." from "pkt-line: add packet_write_gently()"
      commit message
    * use separate buffer in write_packetized_from_fd()
    * fix write order in packet_write_gently()

* http://public-inbox.org/git/xmqqzio0jxh7.fsf@gitster.mtv.corp.google.com/
    * rename read/write flush terminated packet streams functions
    * remove meaningless "bytes_to_write > ..." check
    * reuse packet_read() function in read_packetized_to_buf()

* http://public-inbox.org/git/xmqqk2f3fgbd.fsf@gitster.mtv.corp.google.com/
    * revert test_config replacement in test setup

* http://public-inbox.org/git/xmqq37lncvj6.fsf@gitster.mtv.corp.google.com/
    * improve multi-packet test

* http://public-inbox.org/git/xmqqzinv6wtg.fsf@gitster.mtv.corp.google.com/
    * give v2 filter precedence over v1 filter

* http://public-inbox.org/git/xmqqmvjt3nht.fsf@gitster.mtv.corp.google.com/
    * explain that "error" behavior mimics v1 filter
    * explain that Git closes the pipe on exit to allow the filter a graceful
      exit


### Jakub

* http://public-inbox.org/git/7fbd02a1-ad62-7391-df2a-835aae8a62a7@gmail.com/
    * rename "error-all" to "abort"
    * add a simple Perl v2 filter example in contrib



## Interdiff (v6..v7)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index df059e9..ac000ea 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -470,20 +470,11 @@ packet:          git< status=error\n
 packet:          git< 0000
 ------------------------

-In case the filter cannot or does not want to process the content
-as well as any future content for the lifetime of the Git process,
-it is expected to respond with an "error-all" status. Depending on
-the `filter.<driver>.required` flag Git will interpret that as error
-but it will not stop or restart the filter process.
-------------------------
-packet:          git< status=error-all\n
-packet:          git< 0000
-------------------------
-
 If the filter experiences an error during processing, then it can
-send the status "error". Depending on the `filter.<driver>.required`
-flag Git will interpret that as error but it will not stop or restart
-the filter process.
+send the status "error" after the content was (partially or
+completely) sent. Depending on the `filter.<driver>.required` flag
+Git will interpret that as error but it will not stop or restart the
+filter process.
 ------------------------
 packet:          git< status=success\n
 packet:          git< 0000
@@ -495,21 +486,38 @@ packet:          git< 0000

 If the filter dies during the communication or does not adhere to
 the protocol then Git will stop the filter process and restart it
-with the next file that needs to be processed.
+with the next file that needs to be processed. Depending on the
+`filter.<driver>.required` flag Git will interpret that as error.
+
+The error handling for all cases above mimic the behavior of
+the `filter.<driver>.clean` / `filter.<driver>.smudge` error
+handling.
+
+In case the filter cannot or does not want to process the content
+as well as any future content for the lifetime of the Git process,
+it is expected to respond with an "abort" status. Depending on
+the `filter.<driver>.required` flag Git will interpret that as error
+for the content as well as any future content for the lifetime of the
+Git process but it will not stop or restart the filter process.
+------------------------
+packet:          git< status=abort\n
+packet:          git< 0000
+------------------------

 After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. When the Git process
-terminates, it will send a kill signal to the filter in that stage.
+the next "key=value" list containing a command. Git will close
+the command pipe on exit. The filter is expected to detect EOF
+and exit gracefully on its own.

 A long running filter demo implementation can be found in
-`t/t0021/rot13-filter.pl` located in the Git core repository.
-If you develop your own long running filter process then the
-`GIT_TRACE_PACKET` environment variables can be very helpful
-for debugging (see linkgit:git[1]).
-
-If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
-is configured then these commands always take precedence over
-a configured `filter.<driver>.process` command.
+`contrib/long-running-filter/example.pl` located in the Git
+core repository. If you develop your own long running filter
+process then the `GIT_TRACE_PACKET` environment variables can be
+very helpful for debugging (see linkgit:git[1]).
+
+If a `filter.<driver>.process` command is configured then it
+always takes precedence over a configured `filter.<driver>.clean`
+or `filter.<driver>.smudge` command.

 Please note that you cannot use an existing `filter.<driver>.clean`
 or `filter.<driver>.smudge` command with `filter.<driver>.process`
diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
new file mode 100755
index 0000000..279fbfb
--- /dev/null
+++ b/contrib/long-running-filter/example.pl
@@ -0,0 +1,111 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git filter protocol version 2
+# See Documentation/gitattributes.txt, section "Filter Protocol"
+#
+
+use strict;
+use warnings;
+
+my $MAX_PACKET_CONTENT_SIZE = 65516;
+
+sub packet_read {
+    my $buffer;
+    my $bytes_read = read STDIN, $buffer, 4;
+    if ( $bytes_read == 0 ) {
+
+        # EOF - Git stopped talking to us!
+        exit();
+    }
+    elsif ( $bytes_read != 4 ) {
+        die "invalid packet size '$bytes_read' field";
+    }
+    my $pkt_size = hex($buffer);
+    if ( $pkt_size == 0 ) {
+        return ( 1, "" );
+    }
+    elsif ( $pkt_size > 4 ) {
+        my $content_size = $pkt_size - 4;
+        $bytes_read = read STDIN, $buffer, $content_size;
+        if ( $bytes_read != $content_size ) {
+            die "invalid packet ($content_size expected; $bytes_read read)";
+        }
+        return ( 0, $buffer );
+    }
+    else {
+        die "invalid packet size";
+    }
+}
+
+sub packet_write {
+    my ($packet) = @_;
+    print STDOUT sprintf( "%04x", length($packet) + 4 );
+    print STDOUT $packet;
+    STDOUT->flush();
+}
+
+sub packet_flush {
+    print STDOUT sprintf( "%04x", 0 );
+    STDOUT->flush();
+}
+
+( packet_read() eq ( 0, "git-filter-client" ) ) || die "bad initialization";
+( packet_read() eq ( 0, "version=2" ) )         || die "bad version";
+( packet_read() eq ( 1, "" ) )                  || die "bad version end";
+
+packet_write("git-filter-server\n");
+packet_write("version=2\n");
+
+( packet_read() eq ( 0, "clean=true" ) )  || die "bad capability";
+( packet_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_read() eq ( 1, "" ) )            || die "bad capability end";
+
+packet_write( "clean=true\n" );
+packet_write( "smudge=true\n" );
+packet_flush();
+
+while (1) {
+    my ($command) = packet_read() =~ /^command=([^=]+)\n$/;
+    my ($pathname) = packet_read() =~ /^pathname=([^=]+)\n$/;
+
+    packet_read();
+
+    my $input = "";
+    {
+        binmode(STDIN);
+        my $buffer;
+        my $done = 0;
+        while ( !$done ) {
+            ( $done, $buffer ) = packet_read();
+            $input .= $buffer;
+        }
+    }
+
+    my $output;
+    if ( $command eq "clean" ) {
+        ### Perform clean here ###
+        $output = $input;
+    }
+    elsif ( $command eq "smudge" ) {
+        ### Perform smudge here ###
+        $output = $input;
+    }
+    else {
+        die "bad command '$command'";
+    }
+
+    packet_write("status=success\n");
+    packet_flush();
+    while ( length($output) > 0 ) {
+        my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+        packet_write($packet);
+        if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+            $output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+        }
+        else {
+            $output = "";
+        }
+    }
+    packet_flush(); # flush content!
+    packet_flush(); # empty list!
+}
diff --git a/convert.c b/convert.c
index 0269a49..0ed48ed 100644
--- a/convert.c
+++ b/convert.c
@@ -723,9 +723,9 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
    goto done;

  if (fd >= 0)
-   err = packet_write_stream_with_flush_from_fd(fd, process->in);
+   err = write_packetized_from_fd(fd, process->in);
  else
-   err = packet_write_stream_with_flush_from_buf(src, len, process->in);
+   err = write_packetized_from_buf(src, len, process->in);
  if (err)
    goto done;

@@ -734,7 +734,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
  if (err)
    goto done;

- err = packet_read_till_flush(process->out, &nbuf) < 0;
+ err = read_packetized_to_buf(process->out, &nbuf) < 0;
  if (err)
    goto done;

@@ -747,7 +747,7 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
  if (err || errno == EPIPE) {
    if (!strcmp(filter_status.buf, "error")) {
      /* The filter signaled a problem with the file. */
-   } else if (!strcmp(filter_status.buf, "error-all")) {
+   } else if (!strcmp(filter_status.buf, "abort")) {
      /*
       * The filter signaled a permanent problem. Don't try to filter
       * files with the same command for the lifetime of the current
@@ -790,9 +790,9 @@ static int apply_filter(const char *path, const char *src, size_t len,
  if (!dst)
    return 1;

- if ((CAP_CLEAN & wanted_capability) && drv->clean)
+ if (!drv->process && (CAP_CLEAN & wanted_capability) && drv->clean)
    cmd = drv->clean;
- else if ((CAP_SMUDGE & wanted_capability) && drv->smudge)
+ else if (!drv->process && (CAP_SMUDGE & wanted_capability) && drv->smudge)
    cmd = drv->smudge;

  if (cmd && *cmd)
diff --git a/pkt-line.c b/pkt-line.c
index 3033aa3..5001a07 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -3,7 +3,6 @@
 #include "run-command.h"

 char packet_buffer[LARGE_PACKET_MAX];
-static char packet_write_buffer[LARGE_PACKET_MAX];
 static const char *packet_trace_prefix = "git";
 static struct trace_key trace_packet = TRACE_KEY_INIT(PACKET);
 static struct trace_key trace_pack = TRACE_KEY_INIT(PACKFILE);
@@ -95,7 +94,10 @@ void packet_flush(int fd)
 int packet_flush_gently(int fd)
 {
  packet_trace("0000", 4, 1);
- return (write_in_full(fd, "0000", 4) == 4 ? 0 : -1);
+ if (write_in_full(fd, "0000", 4) == 4)
+   return 0;
+ error("flush packet write failed");
+ return -1;
 }

 void packet_buf_flush(struct strbuf *buf)
@@ -132,39 +134,70 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
  packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }

+static int packet_write_fmt_1(int fd, int gently,
+                              const char *fmt, va_list args)
+{
+ struct strbuf buf = STRBUF_INIT;
+ size_t count;
+
+ format_packet(&buf, fmt, args);
+ count = write_in_full(fd, buf.buf, buf.len);
+ if (count == buf.len)
+   return 0;
+
+ if (!gently) {
+   if (errno == EPIPE) {
+     if (in_async())
+       async_exit(141);
+
+     signal(SIGPIPE, SIG_DFL);
+     raise(SIGPIPE);
+     /* Should never happen, but just in case... */
+     exit(141);
+   }
+   die_errno("packet write error");
+ }
+ error("packet write failed");
+ return -1;
+}
+
 void packet_write_fmt(int fd, const char *fmt, ...)
 {
- static struct strbuf buf = STRBUF_INIT;
  va_list args;

- strbuf_reset(&buf);
  va_start(args, fmt);
- format_packet(&buf, fmt, args);
+ packet_write_fmt_1(fd, 0, fmt, args);
  va_end(args);
- write_or_die(fd, buf.buf, buf.len);
 }

 int packet_write_fmt_gently(int fd, const char *fmt, ...)
 {
- static struct strbuf buf = STRBUF_INIT;
+ int status;
  va_list args;

- strbuf_reset(&buf);
  va_start(args, fmt);
- format_packet(&buf, fmt, args);
+ status = packet_write_fmt_1(fd, 1, fmt, args);
  va_end(args);
- return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
+ return status;
 }

 int packet_write_gently(const int fd_out, const char *buf, size_t size)
 {
- if (size > sizeof(packet_write_buffer) - 4)
+ static char packet_write_buffer[LARGE_PACKET_MAX];
+
+ if (size > sizeof(packet_write_buffer) - 4) {
+   error("packet write failed");
    return -1;
+ }
  packet_trace(buf, size, 1);
- memmove(packet_write_buffer + 4, buf, size);
  size += 4;
  set_packet_header(packet_write_buffer, size);
- return (write_in_full(fd_out, packet_write_buffer, size) == size ? 0 : -1);
+ memcpy(packet_write_buffer + 4, buf, size - 4);
+ if (write_in_full(fd_out, packet_write_buffer, size) == size)
+   return 0;
+
+ error("packet write failed");
+ return -1;
 }

 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
@@ -176,35 +209,35 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
  va_end(args);
 }

-int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out)
+int write_packetized_from_fd(int fd_in, int fd_out)
 {
+ static char buf[PKTLINE_DATA_MAXLEN];
  int err = 0;
  ssize_t bytes_to_write;

  while (!err) {
-   bytes_to_write = xread(fd_in, packet_write_buffer, sizeof(packet_write_buffer) - 4);
+   bytes_to_write = xread(fd_in, buf, sizeof(buf));
    if (bytes_to_write < 0)
      return COPY_READ_ERROR;
    if (bytes_to_write == 0)
      break;
-   if (bytes_to_write > sizeof(packet_write_buffer) - 4)
-     return COPY_WRITE_ERROR;
-   err = packet_write_gently(fd_out, packet_write_buffer, bytes_to_write);
+   err = packet_write_gently(fd_out, buf, bytes_to_write);
  }
  if (!err)
    err = packet_flush_gently(fd_out);
  return err;
 }

-int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, int fd_out)
+int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
 {
+ static char buf[PKTLINE_DATA_MAXLEN];
  int err = 0;
  size_t bytes_written = 0;
  size_t bytes_to_write;

  while (!err) {
-   if ((len - bytes_written) > sizeof(packet_write_buffer) - 4)
-     bytes_to_write = sizeof(packet_write_buffer) - 4;
+   if ((len - bytes_written) > sizeof(buf))
+     bytes_to_write = sizeof(buf);
    else
      bytes_to_write = len - bytes_written;
    if (bytes_to_write == 0)
@@ -327,52 +360,29 @@ char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
  return packet_read_line_generic(-1, src, src_len, dst_len);
 }

-ssize_t packet_read_till_flush(int fd_in, struct strbuf *sb_out)
+ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out)
 {
- int len, ret;
+ int paket_len;
  int options = PACKET_READ_GENTLE_ON_EOF;
- char linelen[4];

  size_t oldlen = sb_out->len;
  size_t oldalloc = sb_out->alloc;

  for (;;) {
-   /* Read packet header */
-   ret = get_packet_data(fd_in, NULL, NULL, linelen, 4, options);
-   if (ret < 0)
-     goto done;
-   len = packet_length(linelen);
-   if (len < 0)
-     die("protocol error: bad line length character: %.4s", linelen);
-   if (!len) {
-     /* Found a flush packet - Done! */
-     packet_trace("0000", 4, 0);
+   strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
+   paket_len = packet_read(fd_in, NULL, NULL,
+     sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, options);
+   if (paket_len <= 0)
      break;
-   }
-   len -= 4;
-
-   /* Read packet content */
-   strbuf_grow(sb_out, len);
-   ret = get_packet_data(fd_in, NULL, NULL, sb_out->buf + sb_out->len, len, options);
-   if (ret < 0)
-     goto done;
-
-   if (ret != len) {
-     error("protocol error: incomplete read (expected %d, got %d)", len, ret);
-     goto done;
-   }
-
-   packet_trace(sb_out->buf + sb_out->len, len, 0);
-   sb_out->len += len;
+   sb_out->len += paket_len;
  }

-done:
- if (ret < 0) {
+ if (paket_len < 0) {
    if (oldalloc == 0)
      strbuf_release(sb_out);
    else
      strbuf_setlen(sb_out, oldlen);
-   return ret;  /* unexpected EOF */
+   return paket_len;
  }
  return sb_out->len - oldlen;
 }
diff --git a/pkt-line.h b/pkt-line.h
index 6a3b7cf..3d873f3 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,8 +25,8 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
-int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out);
-int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, int fd_out);
+int write_packetized_from_fd(int fd_in, int fd_out);
+int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);

 /*
  * Read a packetized line into the buffer, which must be at least size bytes
@@ -82,7 +82,7 @@ char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 /*
  * Reads a stream of variable sized packets until a flush packet is detected.
  */
-ssize_t packet_read_till_flush(int fd_in, struct strbuf *sb_out);
+ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out);

 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
diff --git a/read-cache.c b/read-cache.c
index 02f74d3..491e52d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -156,7 +156,7 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st)
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
 {
  int match = -1;
- int fd = open(ce->name, O_RDONLY | O_CLOEXEC);
+ int fd = open(ce->name, O_RDONLY);

  if (fd >= 0) {
    unsigned char sha1[20];
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 94b0bf3..1c98ac3 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -4,15 +4,6 @@ test_description='blob conversion via gitattributes'

 . ./test-lib.sh

-if test_have_prereq EXPENSIVE
-then
- T0021_LARGE_FILE_SIZE=2048
- T0021_LARGISH_FILE_SIZE=100
-else
- T0021_LARGE_FILE_SIZE=30
- T0021_LARGISH_FILE_SIZE=2
-fi
-
 cat <<EOF >rot13.sh
 #!$SHELL_PATH
 tr \
@@ -22,8 +13,8 @@ EOF
 chmod +x rot13.sh

 test_expect_success setup '
- test_config filter.rot13.smudge ./rot13.sh &&
- test_config filter.rot13.clean ./rot13.sh &&
+ git config filter.rot13.smudge ./rot13.sh &&
+ git config filter.rot13.clean ./rot13.sh &&

  {
      echo "*.t filter=rot13"
@@ -43,26 +34,7 @@ test_expect_success setup '
  git checkout -- test test.t test.i &&

  echo "content-test2" >test2.o &&
- echo "content-test3-subdir" >test3-subdir.o &&
-
- mkdir generated-test-data &&
- for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
- do
-   RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
-   ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"
-   # Generate 1MB of empty data and 100 bytes of random characters
-   # printf "$(test-genrandom start $i)"
-   printf "%1048576d" 1 >>generated-test-data/large.file &&
-   printf "$RANDOM_STRING" >>generated-test-data/large.file &&
-   printf "%1048576d" 1 >>generated-test-data/large.file.rot13 &&
-   printf "$ROT_RANDOM_STRING" >>generated-test-data/large.file.rot13 &&
-
-   if test $i = $T0021_LARGISH_FILE_SIZE
-   then
-     cat generated-test-data/large.file >generated-test-data/largish.file &&
-     cat generated-test-data/large.file.rot13 >generated-test-data/largish.file.rot13
-   fi
- done
+ echo "content-test3-subdir" >test3-subdir.o
 '

 script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
@@ -230,9 +202,9 @@ test_expect_success 'required filter clean failure' '
 test_expect_success 'filtering large input to small output should use little memory' '
  test_config filter.devnull.clean "cat >/dev/null" &&
  test_config filter.devnull.required true &&
- cp generated-test-data/large.file large.file &&
- echo "large.file filter=devnull" >.gitattributes &&
- GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add large.file
+ for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
+ echo "30MB filter=devnull" >.gitattributes &&
+ GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
 '

 test_expect_success 'filter that does not read is fine' '
@@ -245,15 +217,15 @@ test_expect_success 'filter that does not read is fine' '
  test_cmp expect actual
 '

-test_expect_success 'filter large file' '
+test_expect_success EXPENSIVE 'filter large file' '
  test_config filter.largefile.smudge cat &&
  test_config filter.largefile.clean cat &&
- echo "large.file filter=largefile" >.gitattributes &&
- cp generated-test-data/large.file large.file &&
- git add large.file 2>err &&
+ for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
+ echo "2GB filter=largefile" >.gitattributes &&
+ git add 2GB 2>err &&
  test_must_be_empty err &&
- rm -f large.file &&
- git checkout -- large.file 2>err &&
+ rm -f 2GB &&
+ git checkout -- 2GB 2>err &&
  test_must_be_empty err
 '

@@ -374,22 +346,24 @@ test_expect_success PERL 'required process filter should filter data' '
    check_filter \
      git add . \
        <<-\EOF &&
-         1 IN: clean test.r 57 [OK] -- OUT: 57 [OK]
-         1 IN: clean test2.r 14 [OK] -- OUT: 14 [OK]
-         1 IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK]
-         1 IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
-         1 start
+         1 IN: clean test.r 57 [OK] -- OUT: 57 . [OK]
+         1 IN: clean test2.r 14 [OK] -- OUT: 14 . [OK]
+         1 IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
+         1 IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 . [OK]
+         1 START
+         1 STOP
          1 wrote filter header
        EOF

    check_filter_count_clean \
      git commit . -m "test commit" \
        <<-\EOF &&
-         x IN: clean test.r 57 [OK] -- OUT: 57 [OK]
-         x IN: clean test2.r 14 [OK] -- OUT: 14 [OK]
-         x IN: clean test4-empty.r 0 [OK] -- OUT: 0 [OK]
-         x IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
-         1 start
+         x IN: clean test.r 57 [OK] -- OUT: 57 . [OK]
+         x IN: clean test2.r 14 [OK] -- OUT: 14 . [OK]
+         x IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
+         x IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 . [OK]
+         1 START
+         1 STOP
          1 wrote filter header
        EOF

@@ -398,28 +372,31 @@ test_expect_success PERL 'required process filter should filter data' '
    check_filter_ignore_clean \
      git checkout . \
        <<-\EOF &&
-         start
+         START
          wrote filter header
-         IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
-         IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+         IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK]
+         IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 . [OK]
+         STOP
        EOF

    check_filter_ignore_clean \
      git checkout empty \
        <<-\EOF &&
-         start
+         START
          wrote filter header
+         STOP
        EOF

    check_filter_ignore_clean \
      git checkout master \
        <<-\EOF &&
-         start
+         START
          wrote filter header
-         IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
-         IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
-         IN: smudge test4-empty.r 0 [OK] -- OUT: 0 [OK]
-         IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 [OK]
+         IN: smudge test.r 57 [OK] -- OUT: 57 . [OK]
+         IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK]
+         IN: smudge test4-empty.r 0 [OK] -- OUT: 0  [OK]
+         IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 . [OK]
+         STOP
        EOF

    check_rot13 ../test.o test.r &&
@@ -428,57 +405,8 @@ test_expect_success PERL 'required process filter should filter data' '
  )
 '

-test_expect_success PERL 'required process filter should filter smudge data and one-shot filter should clean' '
+test_expect_success PERL 'required process filter should clean only and take precedence' '
  test_config_global filter.protocol.clean ./../rot13.sh &&
- test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl smudge" &&
- test_config_global filter.protocol.required true &&
- rm -rf repo &&
- mkdir repo &&
- (
-   cd repo &&
-   git init &&
-
-   echo "*.r filter=protocol" >.gitattributes &&
-   git add . &&
-   git commit . -m "test commit" &&
-   git branch empty &&
-
-   cp ../test.o test.r &&
-   cp ../test2.o test2.r &&
-
-   check_filter_no_call \
-     git add . &&
-
-   check_filter_no_call \
-     git commit . -m "test commit" &&
-
-   rm -f test?.r testsubdir/test3-subdir.r &&
-
-   check_filter_ignore_clean \
-     git checkout . \
-       <<-\EOF &&
-         start
-         wrote filter header
-         IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
-       EOF
-
-   git checkout empty &&
-
-   check_filter_ignore_clean \
-     git checkout master\
-       <<-\EOF &&
-         start
-         wrote filter header
-         IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
-         IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
-       EOF
-
-   check_rot13 ../test.o test.r &&
-   check_rot13 ../test2.o test2.r
- )
-'
-
-test_expect_success PERL 'required process filter should clean only' '
  test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean" &&
  test_config_global filter.protocol.required true &&
  rm -rf repo &&
@@ -497,42 +425,74 @@ test_expect_success PERL 'required process filter should clean only' '
    check_filter \
      git add . \
        <<-\EOF &&
-         1 IN: clean test.r 57 [OK] -- OUT: 57 [OK]
-         1 start
+         1 IN: clean test.r 57 [OK] -- OUT: 57 . [OK]
+         1 START
+         1 STOP
          1 wrote filter header
        EOF

    check_filter_count_clean \
      git commit . -m "test commit" \
        <<-\EOF
-         x IN: clean test.r 57 [OK] -- OUT: 57 [OK]
-         1 start
+         x IN: clean test.r 57 [OK] -- OUT: 57 . [OK]
+         1 START
+         1 STOP
          1 wrote filter header
        EOF
  )
 '

-test_expect_success PERL 'required process filter should process binary files larger LARGE_PACKET_MAX' '
+generate_test_data () {
+ LEN=$1
+ NAME=$2
+ test-genrandom end $LEN |
+   perl -pe "s/./chr((ord($&) % 26) + 97)/sge" >../$NAME.file &&
+ cp ../$NAME.file . &&
+ ./../rot13.sh <../$NAME.file >../$NAME.file.rot13
+}
+
+test_expect_success PERL 'required process filter should process multiple packets' '
  test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
  test_config_global filter.protocol.required true &&
+
  rm -rf repo &&
  mkdir repo &&
  (
    cd repo &&
    git init &&

-   echo "*.file filter=protocol" >.gitattributes &&
-   cat ../generated-test-data/largish.file.rot13 >large.rot13 &&
-   cat ../generated-test-data/largish.file >large.file &&
-   cat large.file >large.original &&
+   # Generate data that requires 3 packets
+   PKTLINE_DATA_MAXLEN=65516 &&
+
+   generate_test_data $(($PKTLINE_DATA_MAXLEN        )) 1pkt_1__ &&
+   generate_test_data $(($PKTLINE_DATA_MAXLEN     + 1)) 2pkt_1+1 &&
+   generate_test_data $(($PKTLINE_DATA_MAXLEN * 2 - 1)) 2pkt_2-1 &&
+   generate_test_data $(($PKTLINE_DATA_MAXLEN * 2    )) 2pkt_2__ &&
+   generate_test_data $(($PKTLINE_DATA_MAXLEN * 2 + 1)) 3pkt_2+1 &&

-   git add large.file .gitattributes &&
+   echo "*.file filter=protocol" >.gitattributes &&
+   check_filter \
+     git add *.file .gitattributes \
+       <<-\EOF &&
+         1 IN: clean 1pkt_1__.file 65516 [OK] -- OUT: 65516 . [OK]
+         1 IN: clean 2pkt_1+1.file 65517 [OK] -- OUT: 65517 .. [OK]
+         1 IN: clean 2pkt_2-1.file 131031 [OK] -- OUT: 131031 .. [OK]
+         1 IN: clean 2pkt_2__.file 131032 [OK] -- OUT: 131032 .. [OK]
+         1 IN: clean 3pkt_2+1.file 131033 [OK] -- OUT: 131033 ... [OK]
+         1 START
+         1 STOP
+         1 wrote filter header
+       EOF
    git commit . -m "test commit" &&

-   rm -f large.file &&
-   git checkout -- large.file &&
-   git cat-file blob :large.file >actual &&
-   test_cmp large.rot13 actual
+   rm -f *.file &&
+   git checkout -- *.file &&
+
+   for f in *.file
+   do
+     git cat-file blob :$f >actual &&
+     test_cmp ../$f.rot13 actual
+   done
  )
 '

@@ -577,13 +537,14 @@ test_expect_success PERL 'process filter should restart after unexpected write f
    check_filter_ignore_clean \
      git checkout . \
        <<-\EOF &&
-         start
+         START
          wrote filter header
          IN: smudge smudge-write-fail.r 22 [OK] -- OUT: 22 [WRITE FAIL]
-         start
+         START
          wrote filter header
-         IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
-         IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+         IN: smudge test.r 57 [OK] -- OUT: 57 . [OK]
+         IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK]
+         STOP
        EOF

    check_rot13 ../test.o test.r &&
@@ -617,11 +578,12 @@ test_expect_success PERL 'process filter should not restart in case of an error'
    check_filter_ignore_clean \
      git checkout . \
        <<-\EOF &&
-         start
+         START
          wrote filter header
          IN: smudge error.r 25 [OK] -- OUT: 0 [ERROR]
-         IN: smudge test.r 57 [OK] -- OUT: 57 [OK]
-         IN: smudge test2.r 14 [OK] -- OUT: 14 [OK]
+         IN: smudge test.r 57 [OK] -- OUT: 57 . [OK]
+         IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK]
+         STOP
        EOF

    check_rot13 ../test.o test.r &&
@@ -642,8 +604,8 @@ test_expect_success PERL 'process filter should be able to signal an error for a

    cp ../test.o test.r &&
    cp ../test2.o test2.r &&
-   echo "error this blob and all future blobs" >error-all.o &&
-   cp error-all.o error-all.r &&
+   echo "error this blob and all future blobs" >abort.o &&
+   cp abort.o abort.r &&
    git add . &&
    git commit . -m "test commit" &&
    rm -f *.r &&
@@ -651,14 +613,15 @@ test_expect_success PERL 'process filter should be able to signal an error for a
    check_filter_ignore_clean \
      git checkout . \
        <<-\EOF &&
-         start
+         START
          wrote filter header
-         IN: smudge error-all.r 37 [OK] -- OUT: 0 [ERROR-ALL]
+         IN: smudge abort.r 37 [OK] -- OUT: 0 [ABORT]
+         STOP
        EOF

    test_cmp ../test.o test.r &&
    test_cmp ../test2.o test2.r &&
-   test_cmp error-all.o error-all.r
+   test_cmp abort.o abort.r
  )
 '

diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 0e1181d..8e27877 100755
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -14,7 +14,7 @@
 # (3) If data with the pathname "error.r" is processed with any
 #     operation then the filter signals that it cannot or does not want
 #     to process the file.
-# (4) If data with the pathname "error-all.r" is processed with any
+# (4) If data with the pathname "abort.r" is processed with any
 #     operation then the filter signals that it cannot or does not want
 #     to process the file and any file after that is processed with the
 #     same command.
@@ -26,6 +26,8 @@ use warnings;
 my $MAX_PACKET_CONTENT_SIZE = 65516;
 my @capabilities            = @ARGV;

+open my $debug, ">>", "rot13-filter.log";
+
 sub rot13 {
     my ($str) = @_;
     $str =~ y/A-Za-z/N-ZA-Mn-za-m/;
@@ -38,6 +40,7 @@ sub packet_read {
     if ( $bytes_read == 0 ) {

         # EOF - Git stopped talking to us!
+        print $debug "STOP\n";
         exit();
     }
     elsif ( $bytes_read != 4 ) {
@@ -72,8 +75,7 @@ sub packet_flush {
     STDOUT->flush();
 }

-open my $debug, ">>", "rot13-filter.log";
-print $debug "start\n";
+print $debug "START\n";
 $debug->flush();

 ( packet_read() eq ( 0, "git-filter-client" ) ) || die "bad initialization";
@@ -120,7 +122,7 @@ while (1) {
     }

     my $output;
-    if ( $pathname eq "error.r" or $pathname eq "error-all.r" ) {
+    if ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
         $output = "";
     }
     elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
@@ -142,10 +144,10 @@ while (1) {
         packet_write("status=error\n");
         packet_flush();
     }
-    elsif ( $pathname eq "error-all.r" ) {
-        print $debug "[ERROR-ALL]\n";
+    elsif ( $pathname eq "abort.r" ) {
+        print $debug "[ABORT]\n";
         $debug->flush();
-        packet_write("status=error-all\n");
+        packet_write("status=abort\n");
         packet_flush();
     }
     else {
@@ -161,6 +163,7 @@ while (1) {
         while ( length($output) > 0 ) {
             my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
             packet_write($packet);
+            print $debug ".";
             if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
                 $output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
             }
@@ -169,7 +172,7 @@ while (1) {
             }
         }
         packet_flush();
-        print $debug "[OK]\n";
+        print $debug " [OK]\n";
         $debug->flush();
         packet_flush();
     }




Lars Schneider (10):
  pkt-line: rename packet_write() to packet_write_fmt()
  pkt-line: extract set_packet_header()
  pkt-line: add packet_write_fmt_gently()
  pkt-line: add packet_flush_gently()
  pkt-line: add packet_write_gently()
  pkt-line: add functions to read/write flush terminated packet streams
  convert: quote filter names in error messages
  convert: modernize tests
  convert: make apply_filter() adhere to standard Git error handling
  convert: add filter.<driver>.process option

 Documentation/gitattributes.txt        | 154 +++++++++++-
 builtin/archive.c                      |   4 +-
 builtin/receive-pack.c                 |   4 +-
 builtin/remote-ext.c                   |   4 +-
 builtin/upload-archive.c               |   4 +-
 connect.c                              |   2 +-
 contrib/long-running-filter/example.pl | 111 +++++++++
 convert.c                              | 373 +++++++++++++++++++++++++----
 daemon.c                               |   2 +-
 http-backend.c                         |   2 +-
 pkt-line.c                             | 160 ++++++++++++-
 pkt-line.h                             |  12 +-
 shallow.c                              |   2 +-
 t/t0021-conversion.sh                  | 423 ++++++++++++++++++++++++++++++---
 t/t0021/rot13-filter.pl                | 179 ++++++++++++++
 unpack-trees.c                         |   1 +
 upload-pack.c                          |  30 +--
 17 files changed, 1356 insertions(+), 111 deletions(-)
 create mode 100755 contrib/long-running-filter/example.pl
 create mode 100755 t/t0021/rot13-filter.pl

--
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 01/10] pkt-line: rename packet_write() to packet_write_fmt()
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-08 18:21 ` [PATCH v7 02/10] pkt-line: extract set_packet_header() larsxschneider
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write() should be called packet_write_fmt() as the string
parameter can be formatted.

Suggested-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 builtin/archive.c        |  4 ++--
 builtin/receive-pack.c   |  4 ++--
 builtin/remote-ext.c     |  4 ++--
 builtin/upload-archive.c |  4 ++--
 connect.c                |  2 +-
 daemon.c                 |  2 +-
 http-backend.c           |  2 +-
 pkt-line.c               |  2 +-
 pkt-line.h               |  2 +-
 shallow.c                |  2 +-
 upload-pack.c            | 30 +++++++++++++++---------------
 11 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index a1e3b94..49f4914 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -47,10 +47,10 @@ static int run_remote_archiver(int argc, const char **argv,
 	if (name_hint) {
 		const char *format = archive_format_from_filename(name_hint);
 		if (format)
-			packet_write(fd[1], "argument --format=%s\n", format);
+			packet_write_fmt(fd[1], "argument --format=%s\n", format);
 	}
 	for (i = 1; i < argc; i++)
-		packet_write(fd[1], "argument %s\n", argv[i]);
+		packet_write_fmt(fd[1], "argument %s\n", argv[i]);
 	packet_flush(fd[1]);
 
 	buf = packet_read_line(fd[0], NULL);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 011db00..1ce7682 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -218,7 +218,7 @@ static int receive_pack_config(const char *var, const char *value, void *cb)
 static void show_ref(const char *path, const unsigned char *sha1)
 {
 	if (sent_capabilities) {
-		packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
+		packet_write_fmt(1, "%s %s\n", sha1_to_hex(sha1), path);
 	} else {
 		struct strbuf cap = STRBUF_INIT;
 
@@ -233,7 +233,7 @@ static void show_ref(const char *path, const unsigned char *sha1)
 		if (advertise_push_options)
 			strbuf_addstr(&cap, " push-options");
 		strbuf_addf(&cap, " agent=%s", git_user_agent_sanitized());
-		packet_write(1, "%s %s%c%s\n",
+		packet_write_fmt(1, "%s %s%c%s\n",
 			     sha1_to_hex(sha1), path, 0, cap.buf);
 		strbuf_release(&cap);
 		sent_capabilities = 1;
diff --git a/builtin/remote-ext.c b/builtin/remote-ext.c
index 88eb8f9..11b48bf 100644
--- a/builtin/remote-ext.c
+++ b/builtin/remote-ext.c
@@ -128,9 +128,9 @@ static void send_git_request(int stdin_fd, const char *serv, const char *repo,
 	const char *vhost)
 {
 	if (!vhost)
-		packet_write(stdin_fd, "%s %s%c", serv, repo, 0);
+		packet_write_fmt(stdin_fd, "%s %s%c", serv, repo, 0);
 	else
-		packet_write(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
+		packet_write_fmt(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
 			     vhost, 0);
 }
 
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 2caedf1..dc872f6 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -88,11 +88,11 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 	writer.git_cmd = 1;
 	if (start_command(&writer)) {
 		int err = errno;
-		packet_write(1, "NACK unable to spawn subprocess\n");
+		packet_write_fmt(1, "NACK unable to spawn subprocess\n");
 		die("upload-archive: %s", strerror(err));
 	}
 
-	packet_write(1, "ACK\n");
+	packet_write_fmt(1, "ACK\n");
 	packet_flush(1);
 
 	while (1) {
diff --git a/connect.c b/connect.c
index 722dc3f..5330d9c 100644
--- a/connect.c
+++ b/connect.c
@@ -730,7 +730,7 @@ struct child_process *git_connect(int fd[2], const char *url,
 		 * Note: Do not add any other headers here!  Doing so
 		 * will cause older git-daemon servers to crash.
 		 */
-		packet_write(fd[1],
+		packet_write_fmt(fd[1],
 			     "%s %s%chost=%s%c",
 			     prog, path, 0,
 			     target_host, 0);
diff --git a/daemon.c b/daemon.c
index 425aad0..afce1b9 100644
--- a/daemon.c
+++ b/daemon.c
@@ -281,7 +281,7 @@ static int daemon_error(const char *dir, const char *msg)
 {
 	if (!informative_errors)
 		msg = "access denied or repository not exported";
-	packet_write(1, "ERR %s: %s", msg, dir);
+	packet_write_fmt(1, "ERR %s: %s", msg, dir);
 	return -1;
 }
 
diff --git a/http-backend.c b/http-backend.c
index adc8c8c..eef0a36 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -464,7 +464,7 @@ static void get_info_refs(struct strbuf *hdr, char *arg)
 		hdr_str(hdr, content_type, buf.buf);
 		end_headers(hdr);
 
-		packet_write(1, "# service=git-%s\n", svc->name);
+		packet_write_fmt(1, "# service=git-%s\n", svc->name);
 		packet_flush(1);
 
 		argv[0] = svc->name;
diff --git a/pkt-line.c b/pkt-line.c
index 62fdb37..0a9b61c 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -118,7 +118,7 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 	packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
-void packet_write(int fd, const char *fmt, ...)
+void packet_write_fmt(int fd, const char *fmt, ...)
 {
 	static struct strbuf buf = STRBUF_INIT;
 	va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 3cb9d91..1902fb3 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -20,7 +20,7 @@
  * side can't, we stay with pure read/write interfaces.
  */
 void packet_flush(int fd);
-void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
diff --git a/shallow.c b/shallow.c
index 54e2db7..d666e24 100644
--- a/shallow.c
+++ b/shallow.c
@@ -260,7 +260,7 @@ static int advertise_shallow_grafts_cb(const struct commit_graft *graft, void *c
 {
 	int fd = *(int *)cb;
 	if (graft->nr_parent == -1)
-		packet_write(fd, "shallow %s\n", oid_to_hex(&graft->oid));
+		packet_write_fmt(fd, "shallow %s\n", oid_to_hex(&graft->oid));
 	return 0;
 }
 
diff --git a/upload-pack.c b/upload-pack.c
index ca7f941..cd47de6 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -393,13 +393,13 @@ static int get_common_commits(void)
 			if (multi_ack == 2 && got_common
 			    && !got_other && ok_to_give_up()) {
 				sent_ready = 1;
-				packet_write(1, "ACK %s ready\n", last_hex);
+				packet_write_fmt(1, "ACK %s ready\n", last_hex);
 			}
 			if (have_obj.nr == 0 || multi_ack)
-				packet_write(1, "NAK\n");
+				packet_write_fmt(1, "NAK\n");
 
 			if (no_done && sent_ready) {
-				packet_write(1, "ACK %s\n", last_hex);
+				packet_write_fmt(1, "ACK %s\n", last_hex);
 				return 0;
 			}
 			if (stateless_rpc)
@@ -416,20 +416,20 @@ static int get_common_commits(void)
 					const char *hex = sha1_to_hex(sha1);
 					if (multi_ack == 2) {
 						sent_ready = 1;
-						packet_write(1, "ACK %s ready\n", hex);
+						packet_write_fmt(1, "ACK %s ready\n", hex);
 					} else
-						packet_write(1, "ACK %s continue\n", hex);
+						packet_write_fmt(1, "ACK %s continue\n", hex);
 				}
 				break;
 			default:
 				got_common = 1;
 				memcpy(last_hex, sha1_to_hex(sha1), 41);
 				if (multi_ack == 2)
-					packet_write(1, "ACK %s common\n", last_hex);
+					packet_write_fmt(1, "ACK %s common\n", last_hex);
 				else if (multi_ack)
-					packet_write(1, "ACK %s continue\n", last_hex);
+					packet_write_fmt(1, "ACK %s continue\n", last_hex);
 				else if (have_obj.nr == 1)
-					packet_write(1, "ACK %s\n", last_hex);
+					packet_write_fmt(1, "ACK %s\n", last_hex);
 				break;
 			}
 			continue;
@@ -437,10 +437,10 @@ static int get_common_commits(void)
 		if (!strcmp(line, "done")) {
 			if (have_obj.nr > 0) {
 				if (multi_ack)
-					packet_write(1, "ACK %s\n", last_hex);
+					packet_write_fmt(1, "ACK %s\n", last_hex);
 				return 0;
 			}
-			packet_write(1, "NAK\n");
+			packet_write_fmt(1, "NAK\n");
 			return -1;
 		}
 		die("git upload-pack: expected SHA1 list, got '%s'", line);
@@ -650,7 +650,7 @@ static void receive_needs(void)
 		while (result) {
 			struct object *object = &result->item->object;
 			if (!(object->flags & (CLIENT_SHALLOW|NOT_SHALLOW))) {
-				packet_write(1, "shallow %s",
+				packet_write_fmt(1, "shallow %s",
 						oid_to_hex(&object->oid));
 				register_shallow(object->oid.hash);
 				shallow_nr++;
@@ -662,7 +662,7 @@ static void receive_needs(void)
 			struct object *object = shallows.objects[i].item;
 			if (object->flags & NOT_SHALLOW) {
 				struct commit_list *parents;
-				packet_write(1, "unshallow %s",
+				packet_write_fmt(1, "unshallow %s",
 					oid_to_hex(&object->oid));
 				object->flags &= ~CLIENT_SHALLOW;
 				/* make sure the real parents are parsed */
@@ -741,7 +741,7 @@ static int send_ref(const char *refname, const struct object_id *oid,
 		struct strbuf symref_info = STRBUF_INIT;
 
 		format_symref_info(&symref_info, cb_data);
-		packet_write(1, "%s %s%c%s%s%s%s%s agent=%s\n",
+		packet_write_fmt(1, "%s %s%c%s%s%s%s%s agent=%s\n",
 			     oid_to_hex(oid), refname_nons,
 			     0, capabilities,
 			     (allow_unadvertised_object_request & ALLOW_TIP_SHA1) ?
@@ -753,11 +753,11 @@ static int send_ref(const char *refname, const struct object_id *oid,
 			     git_user_agent_sanitized());
 		strbuf_release(&symref_info);
 	} else {
-		packet_write(1, "%s %s\n", oid_to_hex(oid), refname_nons);
+		packet_write_fmt(1, "%s %s\n", oid_to_hex(oid), refname_nons);
 	}
 	capabilities = NULL;
 	if (!peel_ref(refname, peeled.hash))
-		packet_write(1, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);
+		packet_write_fmt(1, "%s %s^{}\n", oid_to_hex(&peeled), refname_nons);
 	return 0;
 }
 
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 02/10] pkt-line: extract set_packet_header()
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
  2016-09-08 18:21 ` [PATCH v7 01/10] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-08 18:21 ` [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently() larsxschneider
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

set_packet_header() converts an integer to a 4 byte hex string. Make
this function locally available so that other pkt-line functions can
use it.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 0a9b61c..e8adc0f 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -97,10 +97,20 @@ void packet_buf_flush(struct strbuf *buf)
 	strbuf_add(buf, "0000", 4);
 }
 
-#define hex(a) (hexchar[(a) & 15])
-static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+static void set_packet_header(char *buf, const int size)
 {
 	static char hexchar[] = "0123456789abcdef";
+
+	#define hex(a) (hexchar[(a) & 15])
+	buf[0] = hex(size >> 12);
+	buf[1] = hex(size >> 8);
+	buf[2] = hex(size >> 4);
+	buf[3] = hex(size);
+	#undef hex
+}
+
+static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+{
 	size_t orig_len, n;
 
 	orig_len = out->len;
@@ -111,10 +121,7 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 	if (n > LARGE_PACKET_MAX)
 		die("protocol error: impossibly long line");
 
-	out->buf[orig_len + 0] = hex(n >> 12);
-	out->buf[orig_len + 1] = hex(n >> 8);
-	out->buf[orig_len + 2] = hex(n >> 4);
-	out->buf[orig_len + 3] = hex(n);
+	set_packet_header(&out->buf[orig_len], n);
 	packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
  2016-09-08 18:21 ` [PATCH v7 01/10] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
  2016-09-08 18:21 ` [PATCH v7 02/10] pkt-line: extract set_packet_header() larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-08 21:18   ` Stefan Beller
  2016-09-08 18:21 ` [PATCH v7 04/10] pkt-line: add packet_flush_gently() larsxschneider
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write_fmt() would die in case of a write error even though for
some callers an error would be acceptable. Add packet_write_fmt_gently()
which writes a formatted pkt-line and returns `0` for success and `-1`
for an error.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 43 +++++++++++++++++++++++++++++++++++++++----
 pkt-line.h |  1 +
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index e8adc0f..3824d05 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -125,16 +125,51 @@ static void format_packet(struct strbuf *out, const char *fmt, va_list args)
 	packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
+static int packet_write_fmt_1(int fd, int gently,
+                              const char *fmt, va_list args)
+{
+	struct strbuf buf = STRBUF_INIT;
+	size_t count;
+
+	format_packet(&buf, fmt, args);
+	count = write_in_full(fd, buf.buf, buf.len);
+	if (count == buf.len)
+		return 0;
+
+	if (!gently) {
+		if (errno == EPIPE) {
+			if (in_async())
+				async_exit(141);
+
+			signal(SIGPIPE, SIG_DFL);
+			raise(SIGPIPE);
+			/* Should never happen, but just in case... */
+			exit(141);
+		}
+		die_errno("packet write error");
+	}
+	error("packet write failed");
+	return -1;
+}
+
 void packet_write_fmt(int fd, const char *fmt, ...)
 {
-	static struct strbuf buf = STRBUF_INIT;
 	va_list args;
 
-	strbuf_reset(&buf);
 	va_start(args, fmt);
-	format_packet(&buf, fmt, args);
+	packet_write_fmt_1(fd, 0, fmt, args);
+	va_end(args);
+}
+
+int packet_write_fmt_gently(int fd, const char *fmt, ...)
+{
+	int status;
+	va_list args;
+
+	va_start(args, fmt);
+	status = packet_write_fmt_1(fd, 1, fmt, args);
 	va_end(args);
-	write_or_die(fd, buf.buf, buf.len);
+	return status;
 }
 
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
diff --git a/pkt-line.h b/pkt-line.h
index 1902fb3..3caea77 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 04/10] pkt-line: add packet_flush_gently()
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
                   ` (2 preceding siblings ...)
  2016-09-08 18:21 ` [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently() larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-12 23:30   ` Junio C Hamano
  2016-09-08 18:21 ` [PATCH v7 05/10] pkt-line: add packet_write_gently() larsxschneider
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_flush() would die in case of a write error even though for some
callers an error would be acceptable. Add packet_flush_gently() which
writes a pkt-line flush packet and returns `0` for success and `-1` for
failure.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 9 +++++++++
 pkt-line.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 3824d05..37345ca 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,15 @@ void packet_flush(int fd)
 	write_or_die(fd, "0000", 4);
 }
 
+int packet_flush_gently(int fd)
+{
+	packet_trace("0000", 4, 1);
+	if (write_in_full(fd, "0000", 4) == 4)
+		return 0;
+	error("flush packet write failed");
+	return -1;
+}
+
 void packet_buf_flush(struct strbuf *buf)
 {
 	packet_trace("0000", 4, 1);
diff --git a/pkt-line.h b/pkt-line.h
index 3caea77..3fa0899 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 
 /*
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 05/10] pkt-line: add packet_write_gently()
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
                   ` (3 preceding siblings ...)
  2016-09-08 18:21 ` [PATCH v7 04/10] pkt-line: add packet_flush_gently() larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-08 21:24   ` Stefan Beller
  2016-09-12 23:31   ` Junio C Hamano
  2016-09-08 18:21 ` [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

packet_write_fmt_gently() uses format_packet() which lets the caller
only send string data via "%s". That means it cannot be used for
arbitrary data that may contain NULs.

Add packet_write_gently() which writes arbitrary data and returns `0`
for success and `-1` for an error. This function is used by other
pkt-line functions in a subsequent patch.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 37345ca..1d3d725 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -181,6 +181,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
 	return status;
 }
 
+int packet_write_gently(const int fd_out, const char *buf, size_t size)
+{
+	static char packet_write_buffer[LARGE_PACKET_MAX];
+
+	if (size > sizeof(packet_write_buffer) - 4) {
+		error("packet write failed");
+		return -1;
+	}
+	packet_trace(buf, size, 1);
+	size += 4;
+	set_packet_header(packet_write_buffer, size);
+	memcpy(packet_write_buffer + 4, buf, size - 4);
+	if (write_in_full(fd_out, packet_write_buffer, size) == size)
+		return 0;
+
+	error("packet write failed");
+	return -1;
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
 	va_list args;
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
                   ` (4 preceding siblings ...)
  2016-09-08 18:21 ` [PATCH v7 05/10] pkt-line: add packet_write_gently() larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-08 21:49   ` Stefan Beller
  2016-09-08 18:21 ` [PATCH v7 07/10] convert: quote filter names in error messages larsxschneider
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

write_packetized_from_fd() and write_packetized_from_buf() write a
stream of packets. All content packets use the maximal packet size
except for the last one. After the last content packet a `flush` control
packet is written.

read_packetized_to_buf() reads arbitrary sized packets until it detects
a `flush` packet.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 pkt-line.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 pkt-line.h |  7 +++++++
 2 files changed, 75 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 1d3d725..5001a07 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -209,6 +209,47 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 	va_end(args);
 }
 
+int write_packetized_from_fd(int fd_in, int fd_out)
+{
+	static char buf[PKTLINE_DATA_MAXLEN];
+	int err = 0;
+	ssize_t bytes_to_write;
+
+	while (!err) {
+		bytes_to_write = xread(fd_in, buf, sizeof(buf));
+		if (bytes_to_write < 0)
+			return COPY_READ_ERROR;
+		if (bytes_to_write == 0)
+			break;
+		err = packet_write_gently(fd_out, buf, bytes_to_write);
+	}
+	if (!err)
+		err = packet_flush_gently(fd_out);
+	return err;
+}
+
+int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
+{
+	static char buf[PKTLINE_DATA_MAXLEN];
+	int err = 0;
+	size_t bytes_written = 0;
+	size_t bytes_to_write;
+
+	while (!err) {
+		if ((len - bytes_written) > sizeof(buf))
+			bytes_to_write = sizeof(buf);
+		else
+			bytes_to_write = len - bytes_written;
+		if (bytes_to_write == 0)
+			break;
+		err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
+		bytes_written += bytes_to_write;
+	}
+	if (!err)
+		err = packet_flush_gently(fd_out);
+	return err;
+}
+
 static int get_packet_data(int fd, char **src_buf, size_t *src_size,
 			   void *dst, unsigned size, int options)
 {
@@ -318,3 +359,30 @@ char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
 {
 	return packet_read_line_generic(-1, src, src_len, dst_len);
 }
+
+ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out)
+{
+	int paket_len;
+	int options = PACKET_READ_GENTLE_ON_EOF;
+
+	size_t oldlen = sb_out->len;
+	size_t oldalloc = sb_out->alloc;
+
+	for (;;) {
+		strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
+		paket_len = packet_read(fd_in, NULL, NULL,
+			sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, options);
+		if (paket_len <= 0)
+			break;
+		sb_out->len += paket_len;
+	}
+
+	if (paket_len < 0) {
+		if (oldalloc == 0)
+			strbuf_release(sb_out);
+		else
+			strbuf_setlen(sb_out, oldlen);
+		return paket_len;
+	}
+	return sb_out->len - oldlen;
+}
diff --git a/pkt-line.h b/pkt-line.h
index 3fa0899..6df8449 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,8 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
+int write_packetized_from_fd(int fd_in, int fd_out);
+int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
@@ -77,6 +79,11 @@ char *packet_read_line(int fd, int *size);
  */
 char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 
+/*
+ * Reads a stream of variable sized packets until a flush packet is detected.
+ */
+ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 extern char packet_buffer[LARGE_PACKET_MAX];
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 07/10] convert: quote filter names in error messages
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
                   ` (5 preceding siblings ...)
  2016-09-08 18:21 ` [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-08 18:21 ` [PATCH v7 08/10] convert: modernize tests larsxschneider
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to
read in error messages. Quote them to improve the readability.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 convert.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index 077f5e6..986c239 100644
--- a/convert.c
+++ b/convert.c
@@ -412,7 +412,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	child_process.out = out;
 
 	if (start_command(&child_process))
-		return error("cannot fork to run external filter %s", params->cmd);
+		return error("cannot fork to run external filter '%s'", params->cmd);
 
 	sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -430,13 +430,13 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	if (close(child_process.in))
 		write_err = 1;
 	if (write_err)
-		error("cannot feed the input to external filter %s", params->cmd);
+		error("cannot feed the input to external filter '%s'", params->cmd);
 
 	sigchain_pop(SIGPIPE);
 
 	status = finish_command(&child_process);
 	if (status)
-		error("external filter %s failed %d", params->cmd, status);
+		error("external filter '%s' failed %d", params->cmd, status);
 
 	strbuf_release(&cmd);
 	return (write_err || status);
@@ -477,15 +477,15 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 		return 0;	/* error was already reported */
 
 	if (strbuf_read(&nbuf, async.out, len) < 0) {
-		error("read from external filter %s failed", cmd);
+		error("read from external filter '%s' failed", cmd);
 		ret = 0;
 	}
 	if (close(async.out)) {
-		error("read from external filter %s failed", cmd);
+		error("read from external filter '%s' failed", cmd);
 		ret = 0;
 	}
 	if (finish_async(&async)) {
-		error("external filter %s failed", cmd);
+		error("external filter '%s' failed", cmd);
 		ret = 0;
 	}
 
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 08/10] convert: modernize tests
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
                   ` (6 preceding siblings ...)
  2016-09-08 18:21 ` [PATCH v7 07/10] convert: quote filter names in error messages larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-08 22:05   ` Stefan Beller
  2016-09-08 18:21 ` [PATCH v7 09/10] convert: make apply_filter() adhere to standard Git error handling larsxschneider
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Use `test_config` to set the config, check that files are empty with
`test_must_be_empty`, compare files with `test_cmp`, and remove spaces
after ">" and "<".

Please note that the "rot13" filter configured in "setup" keeps using
`git config` instead of `test_config` because subsequent tests might
depend on it.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 t/t0021-conversion.sh | 58 +++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e799e59..dc50938 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -38,8 +38,8 @@ script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
 
 test_expect_success check '
 
-	cmp test.o test &&
-	cmp test.o test.t &&
+	test_cmp test.o test &&
+	test_cmp test.o test.t &&
 
 	# ident should be stripped in the repository
 	git diff --raw --exit-code :test :test.i &&
@@ -47,10 +47,10 @@ test_expect_success check '
 	embedded=$(sed -ne "$script" test.i) &&
 	test "z$id" = "z$embedded" &&
 
-	git cat-file blob :test.t > test.r &&
+	git cat-file blob :test.t >test.r &&
 
-	./rot13.sh < test.o > test.t &&
-	cmp test.r test.t
+	./rot13.sh <test.o >test.t &&
+	test_cmp test.r test.t
 '
 
 # If an expanded ident ever gets into the repository, we want to make sure that
@@ -130,7 +130,7 @@ test_expect_success 'filter shell-escaped filenames' '
 
 	# delete the files and check them out again, using a smudge filter
 	# that will count the args and echo the command-line back to us
-	git config filter.argc.smudge "sh ./argc.sh %f" &&
+	test_config filter.argc.smudge "sh ./argc.sh %f" &&
 	rm "$normal" "$special" &&
 	git checkout -- "$normal" "$special" &&
 
@@ -141,7 +141,7 @@ test_expect_success 'filter shell-escaped filenames' '
 	test_cmp expect "$special" &&
 
 	# do the same thing, but with more args in the filter expression
-	git config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
+	test_config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
 	rm "$normal" "$special" &&
 	git checkout -- "$normal" "$special" &&
 
@@ -154,9 +154,9 @@ test_expect_success 'filter shell-escaped filenames' '
 '
 
 test_expect_success 'required filter should filter data' '
-	git config filter.required.smudge ./rot13.sh &&
-	git config filter.required.clean ./rot13.sh &&
-	git config filter.required.required true &&
+	test_config filter.required.smudge ./rot13.sh &&
+	test_config filter.required.clean ./rot13.sh &&
+	test_config filter.required.required true &&
 
 	echo "*.r filter=required" >.gitattributes &&
 
@@ -165,17 +165,17 @@ test_expect_success 'required filter should filter data' '
 
 	rm -f test.r &&
 	git checkout -- test.r &&
-	cmp test.o test.r &&
+	test_cmp test.o test.r &&
 
 	./rot13.sh <test.o >expected &&
 	git cat-file blob :test.r >actual &&
-	cmp expected actual
+	test_cmp expected actual
 '
 
 test_expect_success 'required filter smudge failure' '
-	git config filter.failsmudge.smudge false &&
-	git config filter.failsmudge.clean cat &&
-	git config filter.failsmudge.required true &&
+	test_config filter.failsmudge.smudge false &&
+	test_config filter.failsmudge.clean cat &&
+	test_config filter.failsmudge.required true &&
 
 	echo "*.fs filter=failsmudge" >.gitattributes &&
 
@@ -186,9 +186,9 @@ test_expect_success 'required filter smudge failure' '
 '
 
 test_expect_success 'required filter clean failure' '
-	git config filter.failclean.smudge cat &&
-	git config filter.failclean.clean false &&
-	git config filter.failclean.required true &&
+	test_config filter.failclean.smudge cat &&
+	test_config filter.failclean.clean false &&
+	test_config filter.failclean.required true &&
 
 	echo "*.fc filter=failclean" >.gitattributes &&
 
@@ -197,8 +197,8 @@ test_expect_success 'required filter clean failure' '
 '
 
 test_expect_success 'filtering large input to small output should use little memory' '
-	git config filter.devnull.clean "cat >/dev/null" &&
-	git config filter.devnull.required true &&
+	test_config filter.devnull.clean "cat >/dev/null" &&
+	test_config filter.devnull.required true &&
 	for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
 	echo "30MB filter=devnull" >.gitattributes &&
 	GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
@@ -207,7 +207,7 @@ test_expect_success 'filtering large input to small output should use little mem
 test_expect_success 'filter that does not read is fine' '
 	test-genrandom foo $((128 * 1024 + 1)) >big &&
 	echo "big filter=epipe" >.gitattributes &&
-	git config filter.epipe.clean "echo xyzzy" &&
+	test_config filter.epipe.clean "echo xyzzy" &&
 	git add big &&
 	git cat-file blob :big >actual &&
 	echo xyzzy >expect &&
@@ -215,20 +215,20 @@ test_expect_success 'filter that does not read is fine' '
 '
 
 test_expect_success EXPENSIVE 'filter large file' '
-	git config filter.largefile.smudge cat &&
-	git config filter.largefile.clean cat &&
+	test_config filter.largefile.smudge cat &&
+	test_config filter.largefile.clean cat &&
 	for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
 	echo "2GB filter=largefile" >.gitattributes &&
 	git add 2GB 2>err &&
-	! test -s err &&
+	test_must_be_empty err &&
 	rm -f 2GB &&
 	git checkout -- 2GB 2>err &&
-	! test -s err
+	test_must_be_empty err
 '
 
 test_expect_success "filter: clean empty file" '
-	git config filter.in-repo-header.clean  "echo cleaned && cat" &&
-	git config filter.in-repo-header.smudge "sed 1d" &&
+	test_config filter.in-repo-header.clean  "echo cleaned && cat" &&
+	test_config filter.in-repo-header.smudge "sed 1d" &&
 
 	echo "empty-in-worktree    filter=in-repo-header" >>.gitattributes &&
 	>empty-in-worktree &&
@@ -240,8 +240,8 @@ test_expect_success "filter: clean empty file" '
 '
 
 test_expect_success "filter: smudge empty file" '
-	git config filter.empty-in-repo.clean "cat >/dev/null" &&
-	git config filter.empty-in-repo.smudge "echo smudged && cat" &&
+	test_config filter.empty-in-repo.clean "cat >/dev/null" &&
+	test_config filter.empty-in-repo.smudge "echo smudged && cat" &&
 
 	echo "empty-in-repo filter=empty-in-repo" >>.gitattributes &&
 	echo dead data walking >empty-in-repo &&
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 09/10] convert: make apply_filter() adhere to standard Git error handling
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
                   ` (7 preceding siblings ...)
  2016-09-08 18:21 ` [PATCH v7 08/10] convert: modernize tests larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-08 18:21 ` [PATCH v7 10/10] convert: add filter.<driver>.process option larsxschneider
  2016-09-13 16:00 ` [PATCH v7 00/10] Git filter protocol Junio C Hamano
  10 siblings, 0 replies; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

apply_filter() returns a boolean that tells the caller if it
"did convert or did not convert". The variable `ret` was used throughout
the function to track errors whereas `1` denoted success and `0`
failure. This is unusual for the Git source where `0` denotes success.

Rename the variable and flip its value to make the function easier
readable for Git developers.

Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 convert.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index 986c239..a068df7 100644
--- a/convert.c
+++ b/convert.c
@@ -451,7 +451,7 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	 *
 	 * (child --> cmd) --> us
 	 */
-	int ret = 1;
+	int err = 0;
 	struct strbuf nbuf = STRBUF_INIT;
 	struct async async;
 	struct filter_params params;
@@ -478,22 +478,22 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 
 	if (strbuf_read(&nbuf, async.out, len) < 0) {
 		error("read from external filter '%s' failed", cmd);
-		ret = 0;
+		err = -1;
 	}
 	if (close(async.out)) {
 		error("read from external filter '%s' failed", cmd);
-		ret = 0;
+		err = -1;
 	}
 	if (finish_async(&async)) {
 		error("external filter '%s' failed", cmd);
-		ret = 0;
+		err = -1;
 	}
 
-	if (ret) {
+	if (!err) {
 		strbuf_swap(dst, &nbuf);
 	}
 	strbuf_release(&nbuf);
-	return ret;
+	return !err;
 }
 
 static struct convert_driver {
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
                   ` (8 preceding siblings ...)
  2016-09-08 18:21 ` [PATCH v7 09/10] convert: make apply_filter() adhere to standard Git error handling larsxschneider
@ 2016-09-08 18:21 ` larsxschneider
  2016-09-10  6:29   ` Torsten Bögershausen
                     ` (2 more replies)
  2016-09-13 16:00 ` [PATCH v7 00/10] Git filter protocol Junio C Hamano
  10 siblings, 3 replies; 45+ messages in thread
From: larsxschneider @ 2016-09-08 18:21 UTC (permalink / raw)
  To: git
  Cc: peff, gitster, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller, Lars Schneider

From: Lars Schneider <larsxschneider@gmail.com>

Git's clean/smudge mechanism invokes an external filter process for
every single blob that is affected by a filter. If Git filters a lot of
blobs then the startup time of the external filter processes can become
a significant part of the overall Git execution time.

In a preliminary performance test this developer used a clean/smudge
filter written in golang to filter 12,000 files. This process took 364s
with the existing filter mechanism and 5s with the new mechanism. See
details here: https://github.com/github/git-lfs/pull/1382

This patch adds the `filter.<driver>.process` string option which, if
used, keeps the external filter process running and processes all blobs
with the packet format (pkt-line) based protocol over standard input and
standard output. The full protocol is explained in detail in
`Documentation/gitattributes.txt`.

A few key decisions:

* The long running filter process is referred to as filter protocol
  version 2 because the existing single shot filter invocation is
  considered version 1.
* Git sends a welcome message and expects a response right after the
  external filter process has started. This ensures that Git will not
  hang if a version 1 filter is incorrectly used with the
  filter.<driver>.process option for version 2 filters. In addition,
  Git can detect this kind of error and warn the user.
* The status of a filter operation (e.g. "success" or "error) is set
  before the actual response and (if necessary!) re-set after the
  response. The advantage of this two step status response is that if
  the filter detects an error early, then the filter can communicate
  this and Git does not even need to create structures to read the
  response.
* All status responses are pkt-line lists terminated with a flush
  packet. This allows us to send other status fields with the same
  protocol in the future.

Helped-by: Martin-Louis Bright <mlbright@gmail.com>
Reviewed-by: Jakub Narebski <jnareb@gmail.com>
Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
---
 Documentation/gitattributes.txt        | 154 +++++++++++++-
 contrib/long-running-filter/example.pl | 111 ++++++++++
 convert.c                              | 349 ++++++++++++++++++++++++++++---
 pkt-line.h                             |   1 +
 t/t0021-conversion.sh                  | 365 ++++++++++++++++++++++++++++++++-
 t/t0021/rot13-filter.pl                | 179 ++++++++++++++++
 unpack-trees.c                         |   1 +
 7 files changed, 1129 insertions(+), 31 deletions(-)
 create mode 100755 contrib/long-running-filter/example.pl
 create mode 100755 t/t0021/rot13-filter.pl

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 7aff940..ac000ea 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -293,7 +293,13 @@ checkout, when the `smudge` command is specified, the command is
 fed the blob object from its standard input, and its standard
 output is used to update the worktree file.  Similarly, the
 `clean` command is used to convert the contents of worktree file
-upon checkin.
+upon checkin. By default these commands process only a single
+blob and terminate.  If a long running `process` filter is used
+in place of `clean` and/or `smudge` filters, then Git can process
+all blobs with a single filter command invocation for the entire
+life of a single Git command, for example `git add --all`.  See
+section below for the description of the protocol used to
+communicate with a `process` filter.
 
 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -373,6 +379,152 @@ not exist, or may have different contents. So, smudge and clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+Long Running Filter Process
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If the filter command (a string value) is defined via
+`filter.<driver>.process` then Git can process all blobs with a
+single filter invocation for the entire life of a single Git
+command. This is achieved by using the following packet format
+(pkt-line, see technical/protocol-common.txt) based protocol over
+standard input and standard output.
+
+Git starts the filter when it encounters the first file
+that needs to be cleaned or smudged. After the filter started
+Git sends a welcome message ("git-filter-client"), a list of
+supported protocol version numbers, and a flush packet. Git expects
+to read a welcome response message ("git-filter-server") and exactly
+one protocol version number from the previously sent list. All further
+communication will be based on the selected version. The remaining
+protocol description below documents "version=2". Please note that
+"version=42" in the example below does not exist and is only there
+to illustrate how the protocol would look like with more than one
+version.
+
+After the version negotiation Git sends a list of supported capabilities
+and a flush packet. Git expects to read a list of desired capabilities,
+which must be a subset of the supported capabilities list, and a flush
+packet as response:
+------------------------
+packet:          git> git-filter-client
+packet:          git> version=2
+packet:          git> version=42
+packet:          git> 0000
+packet:          git< git-filter-server
+packet:          git< version=2
+packet:          git> clean=true
+packet:          git> smudge=true
+packet:          git> not-yet-invented=true
+packet:          git> 0000
+packet:          git< clean=true
+packet:          git< smudge=true
+packet:          git< 0000
+------------------------
+Supported filter capabilities in version 2 are "clean" and
+"smudge".
+
+Afterwards Git sends a list of "key=value" pairs terminated with
+a flush packet. The list will contain at least the filter command
+(based on the supported capabilities) and the pathname of the file
+to filter relative to the repository root. Right after these packets
+Git sends the content split in zero or more pkt-line packets and a
+flush packet to terminate content.
+------------------------
+packet:          git> command=smudge\n
+packet:          git> pathname=path/testfile.dat\n
+packet:          git> 0000
+packet:          git> CONTENT
+packet:          git> 0000
+------------------------
+
+The filter is expected to respond with a list of "key=value" pairs
+terminated with a flush packet. If the filter does not experience
+problems then the list must contain a "success" status. Right after
+these packets the filter is expected to send the content in zero
+or more pkt-line packets and a flush packet at the end. Finally, a
+second list of "key=value" pairs terminated with a flush packet
+is expected. The filter can change the status in the second list.
+------------------------
+packet:          git< status=success\n
+packet:          git< 0000
+packet:          git< SMUDGED_CONTENT
+packet:          git< 0000
+packet:          git< 0000  # empty list!
+------------------------
+
+If the result content is empty then the filter is expected to respond
+with a success status and an empty list.
+------------------------
+packet:          git< status=success\n
+packet:          git< 0000
+packet:          git< 0000  # empty content!
+packet:          git< 0000  # empty list!
+------------------------
+
+In case the filter cannot or does not want to process the content,
+it is expected to respond with an "error" status. Depending on the
+`filter.<driver>.required` flag Git will interpret that as error
+but it will not stop or restart the filter process.
+------------------------
+packet:          git< status=error\n
+packet:          git< 0000
+------------------------
+
+If the filter experiences an error during processing, then it can
+send the status "error" after the content was (partially or
+completely) sent. Depending on the `filter.<driver>.required` flag
+Git will interpret that as error but it will not stop or restart the
+filter process.
+------------------------
+packet:          git< status=success\n
+packet:          git< 0000
+packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
+packet:          git< 0000
+packet:          git< status=error\n
+packet:          git< 0000
+------------------------
+
+If the filter dies during the communication or does not adhere to
+the protocol then Git will stop the filter process and restart it
+with the next file that needs to be processed. Depending on the
+`filter.<driver>.required` flag Git will interpret that as error.
+
+The error handling for all cases above mimic the behavior of
+the `filter.<driver>.clean` / `filter.<driver>.smudge` error
+handling.
+
+In case the filter cannot or does not want to process the content
+as well as any future content for the lifetime of the Git process,
+it is expected to respond with an "abort" status. Depending on
+the `filter.<driver>.required` flag Git will interpret that as error
+for the content as well as any future content for the lifetime of the
+Git process but it will not stop or restart the filter process.
+------------------------
+packet:          git< status=abort\n
+packet:          git< 0000
+------------------------
+
+After the filter has processed a blob it is expected to wait for
+the next "key=value" list containing a command. Git will close
+the command pipe on exit. The filter is expected to detect EOF
+and exit gracefully on its own.
+
+A long running filter demo implementation can be found in
+`contrib/long-running-filter/example.pl` located in the Git 
+core repository. If you develop your own long running filter 
+process then the `GIT_TRACE_PACKET` environment variables can be
+very helpful for debugging (see linkgit:git[1]).
+
+If a `filter.<driver>.process` command is configured then it
+always takes precedence over a configured `filter.<driver>.clean`
+or `filter.<driver>.smudge` command.
+
+Please note that you cannot use an existing `filter.<driver>.clean`
+or `filter.<driver>.smudge` command with `filter.<driver>.process`
+because the former two use a different inter process communication
+protocol than the latter one.
+
+
 Interaction between checkin/checkout attributes
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
new file mode 100755
index 0000000..279fbfb
--- /dev/null
+++ b/contrib/long-running-filter/example.pl
@@ -0,0 +1,111 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git filter protocol version 2
+# See Documentation/gitattributes.txt, section "Filter Protocol"
+#
+
+use strict;
+use warnings;
+
+my $MAX_PACKET_CONTENT_SIZE = 65516;
+
+sub packet_read {
+    my $buffer;
+    my $bytes_read = read STDIN, $buffer, 4;
+    if ( $bytes_read == 0 ) {
+
+        # EOF - Git stopped talking to us!
+        exit();
+    }
+    elsif ( $bytes_read != 4 ) {
+        die "invalid packet size '$bytes_read' field";
+    }
+    my $pkt_size = hex($buffer);
+    if ( $pkt_size == 0 ) {
+        return ( 1, "" );
+    }
+    elsif ( $pkt_size > 4 ) {
+        my $content_size = $pkt_size - 4;
+        $bytes_read = read STDIN, $buffer, $content_size;
+        if ( $bytes_read != $content_size ) {
+            die "invalid packet ($content_size expected; $bytes_read read)";
+        }
+        return ( 0, $buffer );
+    }
+    else {
+        die "invalid packet size";
+    }
+}
+
+sub packet_write {
+    my ($packet) = @_;
+    print STDOUT sprintf( "%04x", length($packet) + 4 );
+    print STDOUT $packet;
+    STDOUT->flush();
+}
+
+sub packet_flush {
+    print STDOUT sprintf( "%04x", 0 );
+    STDOUT->flush();
+}
+
+( packet_read() eq ( 0, "git-filter-client" ) ) || die "bad initialization";
+( packet_read() eq ( 0, "version=2" ) )         || die "bad version";
+( packet_read() eq ( 1, "" ) )                  || die "bad version end";
+
+packet_write("git-filter-server\n");
+packet_write("version=2\n");
+
+( packet_read() eq ( 0, "clean=true" ) )  || die "bad capability";
+( packet_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_read() eq ( 1, "" ) )            || die "bad capability end";
+
+packet_write( "clean=true\n" );
+packet_write( "smudge=true\n" );
+packet_flush();
+
+while (1) {
+    my ($command) = packet_read() =~ /^command=([^=]+)\n$/;
+    my ($pathname) = packet_read() =~ /^pathname=([^=]+)\n$/;
+
+    packet_read();
+
+    my $input = "";
+    {
+        binmode(STDIN);
+        my $buffer;
+        my $done = 0;
+        while ( !$done ) {
+            ( $done, $buffer ) = packet_read();
+            $input .= $buffer;
+        }
+    }
+
+    my $output;
+    if ( $command eq "clean" ) {
+        ### Perform clean here ###
+        $output = $input;
+    }
+    elsif ( $command eq "smudge" ) {
+        ### Perform smudge here ###
+        $output = $input;
+    }
+    else {
+        die "bad command '$command'";
+    }
+
+    packet_write("status=success\n");
+    packet_flush();
+    while ( length($output) > 0 ) {
+        my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+        packet_write($packet);
+        if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+            $output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+        }
+        else {
+            $output = "";
+        }
+    }
+    packet_flush(); # flush content!
+    packet_flush(); # empty list!
+}
diff --git a/convert.c b/convert.c
index a068df7..0ed48ed 100644
--- a/convert.c
+++ b/convert.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 #include "quote.h"
 #include "sigchain.h"
+#include "pkt-line.h"
 
 /*
  * convert.c - convert a file when checking it out and checking it in.
@@ -442,7 +443,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
 	return (write_err || status);
 }
 
-static int apply_filter(const char *path, const char *src, size_t len, int fd,
+static int apply_single_file_filter(const char *path, const char *src, size_t len, int fd,
                         struct strbuf *dst, const char *cmd)
 {
 	/*
@@ -456,12 +457,6 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	struct async async;
 	struct filter_params params;
 
-	if (!cmd || !*cmd)
-		return 0;
-
-	if (!dst)
-		return 1;
-
 	memset(&async, 0, sizeof(async));
 	async.proc = filter_buffer_or_fd;
 	async.data = &params;
@@ -496,14 +491,318 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
 	return !err;
 }
 
+#define CAP_CLEAN    (1u<<0)
+#define CAP_SMUDGE   (1u<<1)
+
+struct cmd2process {
+	struct hashmap_entry ent; /* must be the first member! */
+	int supported_capabilities;
+	const char *cmd;
+	struct child_process process;
+};
+
+static int cmd_process_map_initialized;
+static struct hashmap cmd_process_map;
+
+static int cmd2process_cmp(const struct cmd2process *e1,
+                           const struct cmd2process *e2,
+                           const void *unused)
+{
+	return strcmp(e1->cmd, e2->cmd);
+}
+
+static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
+{
+	struct cmd2process key;
+	hashmap_entry_init(&key, strhash(cmd));
+	key.cmd = cmd;
+	return hashmap_get(hashmap, &key, NULL);
+}
+
+static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
+{
+	if (!entry)
+		return;
+	sigchain_push(SIGPIPE, SIG_IGN);
+	/*
+	 * We kill the filter most likely because an error happened already.
+	 * That's why we are not interested in any error code here.
+	 */
+	close(entry->process.in);
+	close(entry->process.out);
+	sigchain_pop(SIGPIPE);
+	finish_command(&entry->process);
+	hashmap_remove(hashmap, entry, NULL);
+	free(entry);
+}
+
+static int packet_write_list(int fd, const char *line, ...)
+{
+	va_list args;
+	int err;
+	va_start(args, line);
+	for (;;)
+	{
+		if (!line)
+			break;
+		if (strlen(line) > PKTLINE_DATA_MAXLEN)
+			return -1;
+		err = packet_write_fmt_gently(fd, "%s", line);
+		if (err)
+			return err;
+		line = va_arg(args, const char*);
+	}
+	va_end(args);
+	return packet_flush_gently(fd);
+}
+
+static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
+{
+	int err;
+	struct cmd2process *entry;
+	struct child_process *process;
+	const char *argv[] = { cmd, NULL };
+	struct string_list cap_list = STRING_LIST_INIT_NODUP;
+	char *cap_buf;
+	const char *cap_name;
+
+	entry = xmalloc(sizeof(*entry));
+	hashmap_entry_init(entry, strhash(cmd));
+	entry->cmd = cmd;
+	entry->supported_capabilities = 0;
+	process = &entry->process;
+
+	child_process_init(process);
+	process->argv = argv;
+	process->use_shell = 1;
+	process->in = -1;
+	process->out = -1;
+
+	if (start_command(process)) {
+		error("cannot fork to run external filter '%s'", cmd);
+		kill_multi_file_filter(hashmap, entry);
+		return NULL;
+	}
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
+	if (err)
+		goto done;
+
+	err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
+	if (err) {
+		error("external filter '%s' does not support long running filter protocol", cmd);
+		goto done;
+	}
+	err = strcmp(packet_read_line(process->out, NULL), "version=2");
+	if (err)
+		goto done;
+
+	err = packet_write_list(process->in, "clean=true", "smudge=true", NULL);
+
+	for (;;)
+	{
+		cap_buf = packet_read_line(process->out, NULL);
+		if (!cap_buf)
+			break;
+		string_list_split_in_place(&cap_list, cap_buf, '=', 1);
+
+		if (cap_list.nr != 2 || strcmp(cap_list.items[1].string, "true"))
+			continue;
+
+		cap_name = cap_list.items[0].string;
+		if (!strcmp(cap_name, "clean")) {
+			entry->supported_capabilities |= CAP_CLEAN;
+		} else if (!strcmp(cap_name, "smudge")) {
+			entry->supported_capabilities |= CAP_SMUDGE;
+		} else {
+			warning(
+				"external filter '%s' requested unsupported filter capability '%s'",
+				cmd, cap_name
+			);
+		}
+
+		string_list_clear(&cap_list, 0);
+	}
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		error("initialization for external filter '%s' failed", cmd);
+		kill_multi_file_filter(hashmap, entry);
+		return NULL;
+	}
+
+	hashmap_add(hashmap, entry);
+	return entry;
+}
+
+static void read_multi_file_filter_values(int fd, struct strbuf *status) {
+	struct strbuf **pair;
+	char *line;
+	for (;;) {
+		line = packet_read_line(fd, NULL);
+		if (!line)
+			break;
+		pair = strbuf_split_str(line, '=', 2);
+		if (pair[0] && pair[0]->len && pair[1]) {
+			if (!strcmp(pair[0]->buf, "status=")) {
+				strbuf_reset(status);
+				strbuf_addbuf(status, pair[1]);
+			}
+		}
+	}
+}
+
+static int apply_multi_file_filter(const char *path, const char *src, size_t len,
+                                   int fd, struct strbuf *dst, const char *cmd,
+                                   const int wanted_capability)
+{
+	int err;
+	struct cmd2process *entry;
+	struct child_process *process;
+	struct stat file_stat;
+	struct strbuf nbuf = STRBUF_INIT;
+	struct strbuf filter_status = STRBUF_INIT;
+	char *filter_type;
+
+	if (!cmd_process_map_initialized) {
+		cmd_process_map_initialized = 1;
+		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
+		entry = NULL;
+	} else {
+		entry = find_multi_file_filter_entry(&cmd_process_map, cmd);
+	}
+
+	fflush(NULL);
+
+	if (!entry) {
+		entry = start_multi_file_filter(&cmd_process_map, cmd);
+		if (!entry)
+			return 0;
+	}
+	process = &entry->process;
+
+	if (!(wanted_capability & entry->supported_capabilities))
+		return 0;
+
+	if (CAP_CLEAN & wanted_capability)
+		filter_type = "clean";
+	else if (CAP_SMUDGE & wanted_capability)
+		filter_type = "smudge";
+	else
+		die("unexpected filter type");
+
+	if (fd >= 0 && !src) {
+		if (fstat(fd, &file_stat) == -1)
+			return 0;
+		len = xsize_t(file_stat.st_size);
+	}
+
+	sigchain_push(SIGPIPE, SIG_IGN);
+
+
+	err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN);
+	if (err)
+		goto done;
+	err = packet_write_fmt_gently(process->in, "command=%s\n", filter_type);
+	if (err)
+		goto done;
+
+	err = (strlen(path) > PKTLINE_DATA_MAXLEN);
+	if (err)
+		goto done;
+	err = packet_write_fmt_gently(process->in, "pathname=%s\n", path);
+	if (err)
+		goto done;
+
+	err = packet_flush_gently(process->in);
+	if (err)
+		goto done;
+
+	if (fd >= 0)
+		err = write_packetized_from_fd(fd, process->in);
+	else
+		err = write_packetized_from_buf(src, len, process->in);
+	if (err)
+		goto done;
+
+	read_multi_file_filter_values(process->out, &filter_status);
+	err = strcmp(filter_status.buf, "success");
+	if (err)
+		goto done;
+
+	err = read_packetized_to_buf(process->out, &nbuf) < 0;
+	if (err)
+		goto done;
+
+	read_multi_file_filter_values(process->out, &filter_status);
+	err = strcmp(filter_status.buf, "success");
+
+done:
+	sigchain_pop(SIGPIPE);
+
+	if (err || errno == EPIPE) {
+		if (!strcmp(filter_status.buf, "error")) {
+			/* The filter signaled a problem with the file. */
+		} else if (!strcmp(filter_status.buf, "abort")) {
+			/*
+			 * The filter signaled a permanent problem. Don't try to filter
+			 * files with the same command for the lifetime of the current
+			 * Git process.
+			 */
+			 entry->supported_capabilities &= ~wanted_capability;
+		} else {
+			/*
+			 * Something went wrong with the protocol filter.
+			 * Force shutdown and restart if another blob requires filtering!
+			 */
+			error("external filter '%s' failed", cmd);
+			kill_multi_file_filter(&cmd_process_map, entry);
+		}
+	} else {
+		strbuf_swap(dst, &nbuf);
+	}
+	strbuf_release(&nbuf);
+	return !err;
+}
+
 static struct convert_driver {
 	const char *name;
 	struct convert_driver *next;
 	const char *smudge;
 	const char *clean;
+	const char *process;
 	int required;
 } *user_convert, **user_convert_tail;
 
+static int apply_filter(const char *path, const char *src, size_t len,
+                        int fd, struct strbuf *dst, struct convert_driver *drv,
+                        const int wanted_capability)
+{
+	const char* cmd = NULL;
+
+	if (!drv)
+		return 0;
+
+	if (!dst)
+		return 1;
+
+	if (!drv->process && (CAP_CLEAN & wanted_capability) && drv->clean)
+		cmd = drv->clean;
+	else if (!drv->process && (CAP_SMUDGE & wanted_capability) && drv->smudge)
+		cmd = drv->smudge;
+
+	if (cmd && *cmd)
+		return apply_single_file_filter(path, src, len, fd, dst, cmd);
+	else if (drv->process && *drv->process)
+		return apply_multi_file_filter(path, src, len, fd, dst, drv->process, wanted_capability);
+
+	return 0;
+}
+
 static int read_convert_config(const char *var, const char *value, void *cb)
 {
 	const char *key, *name;
@@ -541,6 +840,9 @@ static int read_convert_config(const char *var, const char *value, void *cb)
 	if (!strcmp("clean", key))
 		return git_config_string(&drv->clean, var, value);
 
+	if (!strcmp("process", key))
+		return git_config_string(&drv->process, var, value);
+
 	if (!strcmp("required", key)) {
 		drv->required = git_config_bool(var, value);
 		return 0;
@@ -842,7 +1144,7 @@ int would_convert_to_git_filter_fd(const char *path)
 	if (!ca.drv->required)
 		return 0;
 
-	return apply_filter(path, NULL, 0, -1, NULL, ca.drv->clean);
+	return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN);
 }
 
 const char *get_convert_attr_ascii(const char *path)
@@ -875,18 +1177,12 @@ int convert_to_git(const char *path, const char *src, size_t len,
                    struct strbuf *dst, enum safe_crlf checksafe)
 {
 	int ret = 0;
-	const char *filter = NULL;
-	int required = 0;
 	struct conv_attrs ca;
 
 	convert_attrs(&ca, path);
-	if (ca.drv) {
-		filter = ca.drv->clean;
-		required = ca.drv->required;
-	}
 
-	ret |= apply_filter(path, src, len, -1, dst, filter);
-	if (!ret && required)
+	ret |= apply_filter(path, src, len, -1, dst, ca.drv, CAP_CLEAN);
+	if (!ret && ca.drv && ca.drv->required)
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	if (ret && dst) {
@@ -908,9 +1204,9 @@ void convert_to_git_filter_fd(const char *path, int fd, struct strbuf *dst,
 	convert_attrs(&ca, path);
 
 	assert(ca.drv);
-	assert(ca.drv->clean);
+	assert(ca.drv->clean || ca.drv->process);
 
-	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv->clean))
+	if (!apply_filter(path, NULL, 0, fd, dst, ca.drv, CAP_CLEAN))
 		die("%s: clean filter '%s' failed", path, ca.drv->name);
 
 	crlf_to_git(path, dst->buf, dst->len, dst, ca.crlf_action, checksafe);
@@ -922,15 +1218,9 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 					    int normalizing)
 {
 	int ret = 0, ret_filter = 0;
-	const char *filter = NULL;
-	int required = 0;
 	struct conv_attrs ca;
 
 	convert_attrs(&ca, path);
-	if (ca.drv) {
-		filter = ca.drv->smudge;
-		required = ca.drv->required;
-	}
 
 	ret |= ident_to_worktree(path, src, len, dst, ca.ident);
 	if (ret) {
@@ -939,9 +1229,10 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 	}
 	/*
 	 * CRLF conversion can be skipped if normalizing, unless there
-	 * is a smudge filter.  The filter might expect CRLFs.
+	 * is a smudge or process filter (even if the process filter doesn't
+	 * support smudge).  The filters might expect CRLFs.
 	 */
-	if (filter || !normalizing) {
+	if ((ca.drv && (ca.drv->smudge || ca.drv->process)) || !normalizing) {
 		ret |= crlf_to_worktree(path, src, len, dst, ca.crlf_action);
 		if (ret) {
 			src = dst->buf;
@@ -949,8 +1240,8 @@ static int convert_to_working_tree_internal(const char *path, const char *src,
 		}
 	}
 
-	ret_filter = apply_filter(path, src, len, -1, dst, filter);
-	if (!ret_filter && required)
+	ret_filter = apply_filter(path, src, len, -1, dst, ca.drv, CAP_SMUDGE);
+	if (!ret_filter && ca.drv && ca.drv->required)
 		die("%s: smudge filter %s failed", path, ca.drv->name);
 
 	return ret | ret_filter;
@@ -1402,7 +1693,7 @@ struct stream_filter *get_stream_filter(const char *path, const unsigned char *s
 	struct stream_filter *filter = NULL;
 
 	convert_attrs(&ca, path);
-	if (ca.drv && (ca.drv->smudge || ca.drv->clean))
+	if (ca.drv && (ca.drv->process || ca.drv->smudge || ca.drv->clean))
 		return NULL;
 
 	if (ca.crlf_action == CRLF_AUTO || ca.crlf_action == CRLF_AUTO_CRLF)
diff --git a/pkt-line.h b/pkt-line.h
index 6df8449..3d873f3 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -86,6 +86,7 @@ ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out);
 
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
+#define PKTLINE_DATA_MAXLEN (LARGE_PACKET_MAX - 4)
 extern char packet_buffer[LARGE_PACKET_MAX];
 
 #endif
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index dc50938..1c98ac3 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -31,7 +31,10 @@ test_expect_success setup '
 	cat test >test.i &&
 	git add test test.t test.i &&
 	rm -f test test.t test.i &&
-	git checkout -- test test.t test.i
+	git checkout -- test test.t test.i &&
+
+	echo "content-test2" >test2.o &&
+	echo "content-test3-subdir" >test3-subdir.o
 '
 
 script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
@@ -279,4 +282,364 @@ test_expect_success 'diff does not reuse worktree files that need cleaning' '
 	test_line_count = 0 count
 '
 
+check_filter () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	cat >expected.log &&
+	sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_count_clean () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	cat >expected.log &&
+	sort rot13-filter.log | uniq -c | sed "s/^[ ]*//" |
+		sed "s/^\([0-9]\) IN: clean/x IN: clean/" >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_ignore_clean () {
+	rm -f rot13-filter.log actual.log &&
+	"$@" &&
+	cat >expected.log &&
+	grep -v "IN: clean" rot13-filter.log >actual.log &&
+	test_cmp expected.log actual.log
+}
+
+check_filter_no_call () {
+	rm -f rot13-filter.log &&
+	"$@" 2> git_stderr.log &&
+	test_must_be_empty git_stderr.log &&
+	test_must_be_empty rot13-filter.log
+}
+
+check_rot13 () {
+	test_cmp $1 $2 &&
+	./../rot13.sh <$1 >expected &&
+	git cat-file blob :$2 >actual &&
+	test_cmp expected actual
+}
+
+test_expect_success PERL 'required process filter should filter data' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit" &&
+		git branch empty &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+		mkdir testsubdir &&
+		cp ../test3-subdir.o testsubdir/test3-subdir.r &&
+		>test4-empty.r &&
+
+		check_filter \
+			git add . \
+				<<-\EOF &&
+					1 IN: clean test.r 57 [OK] -- OUT: 57 . [OK]
+					1 IN: clean test2.r 14 [OK] -- OUT: 14 . [OK]
+					1 IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
+					1 IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 . [OK]
+					1 START
+					1 STOP
+					1 wrote filter header
+				EOF
+
+		check_filter_count_clean \
+			git commit . -m "test commit" \
+				<<-\EOF &&
+					x IN: clean test.r 57 [OK] -- OUT: 57 . [OK]
+					x IN: clean test2.r 14 [OK] -- OUT: 14 . [OK]
+					x IN: clean test4-empty.r 0 [OK] -- OUT: 0  [OK]
+					x IN: clean testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 . [OK]
+					1 START
+					1 STOP
+					1 wrote filter header
+				EOF
+
+		rm -f test?.r testsubdir/test3-subdir.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					START
+					wrote filter header
+					IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK]
+					IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 . [OK]
+					STOP
+				EOF
+
+		check_filter_ignore_clean \
+			git checkout empty \
+				<<-\EOF &&
+					START
+					wrote filter header
+					STOP
+				EOF
+
+		check_filter_ignore_clean \
+			git checkout master \
+				<<-\EOF &&
+					START
+					wrote filter header
+					IN: smudge test.r 57 [OK] -- OUT: 57 . [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK]
+					IN: smudge test4-empty.r 0 [OK] -- OUT: 0  [OK]
+					IN: smudge testsubdir/test3-subdir.r 21 [OK] -- OUT: 21 . [OK]
+					STOP
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r &&
+		check_rot13 ../test3-subdir.o testsubdir/test3-subdir.r
+	)
+'
+
+test_expect_success PERL 'required process filter should clean only and take precedence' '
+	test_config_global filter.protocol.clean ./../rot13.sh &&
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+		git add . &&
+		git commit . -m "test commit" &&
+		git branch empty &&
+
+		cp ../test.o test.r &&
+
+		check_filter \
+			git add . \
+				<<-\EOF &&
+					1 IN: clean test.r 57 [OK] -- OUT: 57 . [OK]
+					1 START
+					1 STOP
+					1 wrote filter header
+				EOF
+
+		check_filter_count_clean \
+			git commit . -m "test commit" \
+				<<-\EOF
+					x IN: clean test.r 57 [OK] -- OUT: 57 . [OK]
+					1 START
+					1 STOP
+					1 wrote filter header
+				EOF
+	)
+'
+
+generate_test_data () {
+	LEN=$1
+	NAME=$2
+	test-genrandom end $LEN |
+		perl -pe "s/./chr((ord($&) % 26) + 97)/sge" >../$NAME.file &&
+	cp ../$NAME.file . &&
+	./../rot13.sh <../$NAME.file >../$NAME.file.rot13
+}
+
+test_expect_success PERL 'required process filter should process multiple packets' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		# Generate data that requires 3 packets
+		PKTLINE_DATA_MAXLEN=65516 &&
+
+		generate_test_data $(($PKTLINE_DATA_MAXLEN        )) 1pkt_1__ &&
+		generate_test_data $(($PKTLINE_DATA_MAXLEN     + 1)) 2pkt_1+1 &&
+		generate_test_data $(($PKTLINE_DATA_MAXLEN * 2 - 1)) 2pkt_2-1 &&
+		generate_test_data $(($PKTLINE_DATA_MAXLEN * 2    )) 2pkt_2__ &&
+		generate_test_data $(($PKTLINE_DATA_MAXLEN * 2 + 1)) 3pkt_2+1 &&
+
+		echo "*.file filter=protocol" >.gitattributes &&
+		check_filter \
+			git add *.file .gitattributes \
+				<<-\EOF &&
+					1 IN: clean 1pkt_1__.file 65516 [OK] -- OUT: 65516 . [OK]
+					1 IN: clean 2pkt_1+1.file 65517 [OK] -- OUT: 65517 .. [OK]
+					1 IN: clean 2pkt_2-1.file 131031 [OK] -- OUT: 131031 .. [OK]
+					1 IN: clean 2pkt_2__.file 131032 [OK] -- OUT: 131032 .. [OK]
+					1 IN: clean 3pkt_2+1.file 131033 [OK] -- OUT: 131033 ... [OK]
+					1 START
+					1 STOP
+					1 wrote filter header
+				EOF
+		git commit . -m "test commit" &&
+
+		rm -f *.file &&
+		git checkout -- *.file &&
+
+		for f in *.file
+		do
+			git cat-file blob :$f >actual &&
+			test_cmp ../$f.rot13 actual
+		done
+	)
+'
+
+test_expect_success PERL 'required process filter should with clean error should fail' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		echo "this is going to fail" >clean-write-fail.r &&
+		echo "content-test3-subdir" >test3.r &&
+
+		# Note: There are three clean paths in convert.c we just test one here.
+		test_must_fail git add .
+	)
+'
+
+test_expect_success PERL 'process filter should restart after unexpected write failure' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+		echo "this is going to fail" >smudge-write-fail.o &&
+		cat smudge-write-fail.o >smudge-write-fail.r &&
+		git add . &&
+		git commit . -m "test commit" &&
+		rm -f *.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					START
+					wrote filter header
+					IN: smudge smudge-write-fail.r 22 [OK] -- OUT: 22 [WRITE FAIL]
+					START
+					wrote filter header
+					IN: smudge test.r 57 [OK] -- OUT: 57 . [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK]
+					STOP
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r &&
+
+		! test_cmp smudge-write-fail.o smudge-write-fail.r && # Smudge failed!
+		./../rot13.sh <smudge-write-fail.o >expected &&
+		git cat-file blob :smudge-write-fail.r >actual &&
+		test_cmp expected actual							  # Clean worked!
+	)
+'
+
+test_expect_success PERL 'process filter should not restart in case of an error' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+		echo "this will cause an error" >error.o &&
+		cp error.o error.r &&
+		git add . &&
+		git commit . -m "test commit" &&
+		rm -f *.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					START
+					wrote filter header
+					IN: smudge error.r 25 [OK] -- OUT: 0 [ERROR]
+					IN: smudge test.r 57 [OK] -- OUT: 57 . [OK]
+					IN: smudge test2.r 14 [OK] -- OUT: 14 . [OK]
+					STOP
+				EOF
+
+		check_rot13 ../test.o test.r &&
+		check_rot13 ../test2.o test2.r &&
+		test_cmp error.o error.r
+	)
+'
+
+test_expect_success PERL 'process filter should be able to signal an error for all future files' '
+	test_config_global filter.protocol.process "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		cp ../test2.o test2.r &&
+		echo "error this blob and all future blobs" >abort.o &&
+		cp abort.o abort.r &&
+		git add . &&
+		git commit . -m "test commit" &&
+		rm -f *.r &&
+
+		check_filter_ignore_clean \
+			git checkout . \
+				<<-\EOF &&
+					START
+					wrote filter header
+					IN: smudge abort.r 37 [OK] -- OUT: 0 [ABORT]
+					STOP
+				EOF
+
+		test_cmp ../test.o test.r &&
+		test_cmp ../test2.o test2.r &&
+		test_cmp abort.o abort.r
+	)
+'
+
+test_expect_success PERL 'invalid process filter must fail (and not hang!)' '
+	test_config_global filter.protocol.process cat &&
+	test_config_global filter.protocol.required true &&
+	rm -rf repo &&
+	mkdir repo &&
+	(
+		cd repo &&
+		git init &&
+
+		echo "*.r filter=protocol" >.gitattributes &&
+
+		cp ../test.o test.r &&
+		test_must_fail git add . 2> git_stderr.log &&
+		grep "not support long running filter protocol" git_stderr.log
+	)
+'
+
 test_done
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
new file mode 100755
index 0000000..8e27877
--- /dev/null
+++ b/t/t0021/rot13-filter.pl
@@ -0,0 +1,179 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git filter protocol version 2
+# See Documentation/gitattributes.txt, section "Filter Protocol"
+#
+# The script takes the list of supported protocol capabilities as
+# arguments ("clean", "smudge", etc).
+#
+# This implementation supports special test cases:
+# (1) If data with the pathname "clean-write-fail.r" is processed with
+#     a "clean" operation then the write operation will die.
+# (2) If data with the pathname "smudge-write-fail.r" is processed with
+#     a "smudge" operation then the write operation will die.
+# (3) If data with the pathname "error.r" is processed with any
+#     operation then the filter signals that it cannot or does not want
+#     to process the file.
+# (4) If data with the pathname "abort.r" is processed with any
+#     operation then the filter signals that it cannot or does not want
+#     to process the file and any file after that is processed with the
+#     same command.
+#
+
+use strict;
+use warnings;
+
+my $MAX_PACKET_CONTENT_SIZE = 65516;
+my @capabilities            = @ARGV;
+
+open my $debug, ">>", "rot13-filter.log";
+
+sub rot13 {
+    my ($str) = @_;
+    $str =~ y/A-Za-z/N-ZA-Mn-za-m/;
+    return $str;
+}
+
+sub packet_read {
+    my $buffer;
+    my $bytes_read = read STDIN, $buffer, 4;
+    if ( $bytes_read == 0 ) {
+
+        # EOF - Git stopped talking to us!
+        print $debug "STOP\n";
+        exit();
+    }
+    elsif ( $bytes_read != 4 ) {
+        die "invalid packet size '$bytes_read' field";
+    }
+    my $pkt_size = hex($buffer);
+    if ( $pkt_size == 0 ) {
+        return ( 1, "" );
+    }
+    elsif ( $pkt_size > 4 ) {
+        my $content_size = $pkt_size - 4;
+        $bytes_read = read STDIN, $buffer, $content_size;
+        if ( $bytes_read != $content_size ) {
+            die "invalid packet ($content_size expected; $bytes_read read)";
+        }
+        return ( 0, $buffer );
+    }
+    else {
+        die "invalid packet size";
+    }
+}
+
+sub packet_write {
+    my ($packet) = @_;
+    print STDOUT sprintf( "%04x", length($packet) + 4 );
+    print STDOUT $packet;
+    STDOUT->flush();
+}
+
+sub packet_flush {
+    print STDOUT sprintf( "%04x", 0 );
+    STDOUT->flush();
+}
+
+print $debug "START\n";
+$debug->flush();
+
+( packet_read() eq ( 0, "git-filter-client" ) ) || die "bad initialization";
+( packet_read() eq ( 0, "version=2" ) )         || die "bad version";
+( packet_read() eq ( 1, "" ) )                  || die "bad version end";
+
+packet_write("git-filter-server\n");
+packet_write("version=2\n");
+
+( packet_read() eq ( 0, "clean=true" ) )  || die "bad capability";
+( packet_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
+( packet_read() eq ( 1, "" ) )            || die "bad capability end";
+
+foreach (@capabilities) {
+    packet_write( $_ . "=true\n" );
+}
+packet_flush();
+print $debug "wrote filter header\n";
+$debug->flush();
+
+while (1) {
+    my ($command) = packet_read() =~ /^command=([^=]+)\n$/;
+    print $debug "IN: $command";
+    $debug->flush();
+
+    my ($pathname) = packet_read() =~ /^pathname=([^=]+)\n$/;
+    print $debug " $pathname";
+    $debug->flush();
+
+    # Flush
+    packet_read();
+
+    my $input = "";
+    {
+        binmode(STDIN);
+        my $buffer;
+        my $done = 0;
+        while ( !$done ) {
+            ( $done, $buffer ) = packet_read();
+            $input .= $buffer;
+        }
+        print $debug " " . length($input) . " [OK] -- ";
+        $debug->flush();
+    }
+
+    my $output;
+    if ( $pathname eq "error.r" or $pathname eq "abort.r" ) {
+        $output = "";
+    }
+    elsif ( $command eq "clean" and grep( /^clean$/, @capabilities ) ) {
+        $output = rot13($input);
+    }
+    elsif ( $command eq "smudge" and grep( /^smudge$/, @capabilities ) ) {
+        $output = rot13($input);
+    }
+    else {
+        die "bad command '$command'";
+    }
+
+    print $debug "OUT: " . length($output) . " ";
+    $debug->flush();
+
+    if ( $pathname eq "error.r" ) {
+        print $debug "[ERROR]\n";
+        $debug->flush();
+        packet_write("status=error\n");
+        packet_flush();
+    }
+    elsif ( $pathname eq "abort.r" ) {
+        print $debug "[ABORT]\n";
+        $debug->flush();
+        packet_write("status=abort\n");
+        packet_flush();
+    }
+    else {
+        packet_write("status=success\n");
+        packet_flush();
+
+        if ( $pathname eq "${command}-write-fail.r" ) {
+            print $debug "[WRITE FAIL]\n";
+            $debug->flush();
+            die "${command} write error";
+        }
+
+        while ( length($output) > 0 ) {
+            my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
+            packet_write($packet);
+            print $debug ".";
+            if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
+                $output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
+            }
+            else {
+                $output = "";
+            }
+        }
+        packet_flush();
+        print $debug " [OK]\n";
+        $debug->flush();
+        packet_flush();
+    }
+}
diff --git a/unpack-trees.c b/unpack-trees.c
index 11c37fb..f6798f8 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -10,6 +10,7 @@
 #include "attr.h"
 #include "split-index.h"
 #include "dir.h"
+#include "convert.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
-- 
2.10.0


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()
  2016-09-08 18:21 ` [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently() larsxschneider
@ 2016-09-08 21:18   ` Stefan Beller
  2016-09-11 11:36     ` Lars Schneider
  0 siblings, 1 reply; 45+ messages in thread
From: Stefan Beller @ 2016-09-08 21:18 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller

On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschneider@gmail.com> wrote:

> +static int packet_write_fmt_1(int fd, int gently,
> +                              const char *fmt, va_list args)
> +{
> +       struct strbuf buf = STRBUF_INIT;
> +       size_t count;
> +
> +       format_packet(&buf, fmt, args);
> +       count = write_in_full(fd, buf.buf, buf.len);
> +       if (count == buf.len)
> +               return 0;
> +
> +       if (!gently) {

    call check_pipe from write_or_die here instead of
    reproducing that function?

> +               if (errno == EPIPE) {
> +                       if (in_async())
> +                               async_exit(141);
> +
> +                       signal(SIGPIPE, SIG_DFL);
> +                       raise(SIGPIPE);
> +                       /* Should never happen, but just in case... */
> +                       exit(141);
> +               }
> +               die_errno("packet write error");
> +       }
> +       error("packet write failed");
> +       return -1;

I think the more idiomatic way is to

    return error(...);

as error always return -1.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 05/10] pkt-line: add packet_write_gently()
  2016-09-08 18:21 ` [PATCH v7 05/10] pkt-line: add packet_write_gently() larsxschneider
@ 2016-09-08 21:24   ` Stefan Beller
  2016-09-11 11:44     ` Lars Schneider
  2016-09-12 23:31   ` Junio C Hamano
  1 sibling, 1 reply; 45+ messages in thread
From: Stefan Beller @ 2016-09-08 21:24 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller

On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschneider@gmail.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> packet_write_fmt_gently() uses format_packet() which lets the caller
> only send string data via "%s". That means it cannot be used for
> arbitrary data that may contain NULs.

Makes sense.

>
> Add packet_write_gently() which writes arbitrary data and returns `0`
> for success and `-1` for an error.

I think documenting the return code is better done in either the header file
or in a commend preceding the implementation instead of the commit message?

Maybe just a generic comment for *_gently is good enough, maybe even no
comment. So the commit is fine, too. I dunno.


> This function is used by other
> pkt-line functions in a subsequent patch.

That's what I figured. Do we also need to mention that in the preceding patch
for packet_flush_gently ?

>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---
>  pkt-line.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/pkt-line.c b/pkt-line.c
> index 37345ca..1d3d725 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -181,6 +181,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
>         return status;
>  }
>
> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
> +{
> +       static char packet_write_buffer[LARGE_PACKET_MAX];
> +
> +       if (size > sizeof(packet_write_buffer) - 4) {
> +               error("packet write failed");
> +               return -1;
> +       }
> +       packet_trace(buf, size, 1);
> +       size += 4;
> +       set_packet_header(packet_write_buffer, size);
> +       memcpy(packet_write_buffer + 4, buf, size - 4);
> +       if (write_in_full(fd_out, packet_write_buffer, size) == size)
> +               return 0;
> +
> +       error("packet write failed");
> +       return -1;
> +}
> +
>  void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
>  {
>         va_list args;
> --
> 2.10.0
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams
  2016-09-08 18:21 ` [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
@ 2016-09-08 21:49   ` Stefan Beller
  2016-09-11 12:33     ` Lars Schneider
  2016-09-11 21:42     ` Junio C Hamano
  0 siblings, 2 replies; 45+ messages in thread
From: Stefan Beller @ 2016-09-08 21:49 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller

On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschneider@gmail.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> write_packetized_from_fd() and write_packetized_from_buf() write a
> stream of packets. All content packets use the maximal packet size
> except for the last one. After the last content packet a `flush` control
> packet is written.

I presume we need both write_* things in a later patch; can you clarify why
we need both of them?


> +       if (paket_len < 0) {
> +               if (oldalloc == 0)
> +                       strbuf_release(sb_out);

So if old alloc is 0, we release it, which is documented as
/**
 * Release a string buffer and the memory it used. You should not use the
 * string buffer after using this function, unless you initialize it again.
 */

> +               else
> +                       strbuf_setlen(sb_out, oldlen);

Otherwise we just set the length back, such that it looks like before.

So as a caller the strbuf is in a different state in case of error
depending whether
the strbuf already had some data in it. I think it would be better if
we only did
`strbuf_setlen(sb_out, oldlen);` here, such that the caller can
strbuf_release it
unconditionally.

Or to make things more confusing, you could use strbuf_reset in case of 0,
as that is a strbuf_setlen internally. ;)

> @@ -77,6 +79,11 @@ char *packet_read_line(int fd, int *size);
>   */
>  char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
>
> +/*
> + * Reads a stream of variable sized packets until a flush packet is detected.

Strictly speaking we read until a packet of size 0 appears, but as per
the implementation
of packet_read we cannot distinguish between "0000" and "0004", i.e.
an empty non-flush
packet. So I think we're fine both in the implementation as well as
the documentation here.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 08/10] convert: modernize tests
  2016-09-08 18:21 ` [PATCH v7 08/10] convert: modernize tests larsxschneider
@ 2016-09-08 22:05   ` Stefan Beller
  2016-09-11 12:34     ` Lars Schneider
  0 siblings, 1 reply; 45+ messages in thread
From: Stefan Beller @ 2016-09-08 22:05 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller

On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschneider@gmail.com> wrote:
> From: Lars Schneider <larsxschneider@gmail.com>
>
> Use `test_config` to set the config, check that files are empty with
> `test_must_be_empty`, compare files with `test_cmp`, and remove spaces
> after ">" and "<".
>
> Please note that the "rot13" filter configured in "setup" keeps using
> `git config` instead of `test_config` because subsequent tests might
> depend on it.
>
> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
> ---

Makes sense & Reviewed-by "Stefan Beller <sbeller@google.com>"

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-08 18:21 ` [PATCH v7 10/10] convert: add filter.<driver>.process option larsxschneider
@ 2016-09-10  6:29   ` Torsten Bögershausen
  2016-09-12  9:49     ` Lars Schneider
  2016-09-10 16:40   ` Torsten Bögershausen
  2016-09-13 15:22   ` Junio C Hamano
  2 siblings, 1 reply; 45+ messages in thread
From: Torsten Bögershausen @ 2016-09-10  6:29 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, peff, gitster, sbeller, Johannes.Schindelin, jnareb,
	mlbright, jacob.keller

On Thu, Sep 08, 2016 at 08:21:32PM +0200, larsxschneider@gmail.com wrote:
[]
> +packet:          git> git-filter-client
> +packet:          git> version=2
> +packet:          git> version=42
> +packet:          git> 0000
> +packet:          git< git-filter-server
> +packet:          git< version=2
> +packet:          git> clean=true
> +packet:          git> smudge=true
> +packet:          git> not-yet-invented=true
> +packet:          git> 0000
> +packet:          git< clean=true
> +packet:          git< smudge=true
> +packet:          git< 0000

It's probalby only me who has difficulties to distinguish
'>' from '<'.

packet:          git> git-filter-client
packet:          git> version=2
packet:          git> version=42
packet:          git> 0000
packet:       filter> git-filter-server
packet:       filter> version=2

(Otherwise the dialoge description is nice)

> +------------------------
> +Supported filter capabilities in version 2 are "clean" and
> +"smudge".
> +
> +Afterwards Git sends a list of "key=value" pairs terminated with
> +a flush packet. The list will contain at least the filter command
> +(based on the supported capabilities) and the pathname of the file
> +to filter relative to the repository root. Right after these packets
> +Git sends the content split in zero or more pkt-line packets and a
> +flush packet to terminate content.
> +------------------------
> +packet:          git> command=smudge\n
> +packet:          git> pathname=path/testfile.dat\n

How do we send pathnames the have '\n' ?
Not really recommended, but allowed.
And here I am a little bit lost, is each of the lines packed into
a pkt-line ?
command=smudge is packet as pkt-line and pathname= is packed into
another one ? (The we don't need the '\n' at all)
Or go both lines into one pkt-line (thats what I think), then
we don't need the '\n' afther the pathname.
And in this case the pathname is always the last element, and a '\n'
may occur in the pathname, since we know the length of the packet
we know how long the pathname must be.



> +packet:          git> 0000
> +packet:          git> CONTENT
> +packet:          git> 0000
> +------------------------
> +


> +The filter is expected to respond with a list of "key=value" pairs
> +terminated with a flush packet. If the filter does not experience
> +problems then the list must contain a "success" status. Right after
> +these packets the filter is expected to send the content in zero
> +or more pkt-line packets and a flush packet at the end. Finally, a
> +second list of "key=value" pairs terminated with a flush packet
> +is expected. The filter can change the status in the second list.
> +------------------------
> +packet:          git< status=success\n
> +packet:          git< 0000
> +packet:          git< SMUDGED_CONTENT
> +packet:          git< 0000
> +packet:          git< 0000  # empty list!
> +------------------------
> +
> +If the result content is empty then the filter is expected to respond
> +with a success status and an empty list.
> +------------------------
> +packet:          git< status=success\n
> +packet:          git< 0000
> +packet:          git< 0000  # empty content!
> +packet:          git< 0000  # empty list!
> +------------------------
> +
> +In case the filter cannot or does not want to process the content,

Does not want ? 
I can see something like "I read through the file, there is nothing
to do. So Git, I don't send anything back, you know where the file is.

> +it is expected to respond with an "error" status. Depending on the
> +`filter.<driver>.required` flag Git will interpret that as error
> +but it will not stop or restart the filter process.
> +------------------------
> +packet:          git< status=error\n
> +packet:          git< 0000
> +------------------------
> +
> +If the filter experiences an error during processing, then it can
> +send the status "error" after the content was (partially or
> +completely) sent. Depending on the `filter.<driver>.required` flag
> +Git will interpret that as error but it will not stop or restart the
> +filter process.
> +------------------------
> +packet:          git< status=success\n
> +packet:          git< 0000
> +packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
> +packet:          git< 0000
> +packet:          git< status=error\n
> +packet:          git< 0000
> +------------------------
> +
> +If the filter dies during the communication or does not adhere to
> +the protocol then Git will stop the filter process and restart it

My personal comment:
When a filter is mis-behaving, Git should say so loud and clear, and
die(). 
The filter process can be left running, so that it can be debugged.

Here I stopped the review for a moment, 
I still think that Git shouldn't kill anything, because we loose
the ability to debug these processes.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-08 18:21 ` [PATCH v7 10/10] convert: add filter.<driver>.process option larsxschneider
  2016-09-10  6:29   ` Torsten Bögershausen
@ 2016-09-10 16:40   ` Torsten Bögershausen
  2016-09-13 22:04     ` Lars Schneider
  2016-09-13 15:22   ` Junio C Hamano
  2 siblings, 1 reply; 45+ messages in thread
From: Torsten Bögershausen @ 2016-09-10 16:40 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, peff, gitster, sbeller, Johannes.Schindelin, jnareb,
	mlbright, jacob.keller

[]

One general question up here, more comments inline.
The current order for a clean-filter is like this, I removed the error handling:

int convert_to_git()
{
	ret |= apply_filter(path, src, len, -1, dst, filter);
	if (ret && dst) {
		src = dst->buf;
		len = dst->len;
	}
	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
	return ret | ident_to_git(path, src, len, dst, ca.ident);
}

The first step is the clean filter, the CRLF-LF conversion (if needed),
then ident.
The current implementation streams the whole file content to the filter,
(STDIN of the filter) and reads back STDOUT from the filter into a STRBUF.
This is to use the UNIX-like STDIN--STDOUT method for writing a filter.

However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd()
offer a sort of short-cut:
The filter reads from the file directly, and the output of the filter is
read into a STRBUF.

It looks as if the multi-filter approach can use this in a similar way:
Give the pathname to the filter, the filter opens the file for reading
and stream the result via the pkt-line protocol into Git.
This needs some more changes, and may be very well go into a separate patch
series. (and should).

What I am asking for:
When a multi-filter is used, the content is handled to the filter via pkt-line,
and the result is given to Git via pkt-line ?
Nothing wrong with it, I just wonder, if it should be mentioned somewhere.

> diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
> new file mode 100755
> index 0000000..279fbfb
> --- /dev/null
> +++ b/contrib/long-running-filter/example.pl
> @@ -0,0 +1,111 @@
> +#!/usr/bin/perl
> +#
> +# Example implementation for the Git filter protocol version 2
> +# See Documentation/gitattributes.txt, section "Filter Protocol"
> +#
> +
> +use strict;
> +use warnings;
> +
> +my $MAX_PACKET_CONTENT_SIZE = 65516;
> +
> +sub packet_read {
> +    my $buffer;
> +    my $bytes_read = read STDIN, $buffer, 4;
> +    if ( $bytes_read == 0 ) {
> +
> +        # EOF - Git stopped talking to us!
> +        exit();
> +    }
> +    elsif ( $bytes_read != 4 ) {
> +        die "invalid packet size '$bytes_read' field";
> +    }

This is half-kosher, I would say,
(And I really. really would like to see an implementation in C ;-)

A read function may look like this:

   ret = read(0, &buffer, 4);
   if (ret < 0) {
     /* Error, nothing we can do */
     exit(1);
   } else if (ret == 0) {
     /* EOF  */
     exit(0);
   } else if (ret < 4) {
     /* 
      * Go and read more, until we have 4 bytes or EOF or Error */
   } else {
     /* Good case, see below */
   }

> +    my $pkt_size = hex($buffer);
> +    if ( $pkt_size == 0 ) {
> +        return ( 1, "" );
> +    }
> +    elsif ( $pkt_size > 4 ) {
> +        my $content_size = $pkt_size - 4;
> +        $bytes_read = read STDIN, $buffer, $content_size;
> +        if ( $bytes_read != $content_size ) {
> +            die "invalid packet ($content_size expected; $bytes_read read)";
> +        }
> +        return ( 0, $buffer );
> +    }
> +    else {
> +        die "invalid packet size";
> +    }
> +}
> +
> +sub packet_write {
> +    my ($packet) = @_;
> +    print STDOUT sprintf( "%04x", length($packet) + 4 );
> +    print STDOUT $packet;
> +    STDOUT->flush();
> +}
> +
> +sub packet_flush {
> +    print STDOUT sprintf( "%04x", 0 );
> +    STDOUT->flush();
> +}
> +
> +( packet_read() eq ( 0, "git-filter-client" ) ) || die "bad initialization";
> +( packet_read() eq ( 0, "version=2" ) )         || die "bad version";
> +( packet_read() eq ( 1, "" ) )                  || die "bad version end";
> +
> +packet_write("git-filter-server\n");
> +packet_write("version=2\n");
> +
> +( packet_read() eq ( 0, "clean=true" ) )  || die "bad capability";
> +( packet_read() eq ( 0, "smudge=true" ) ) || die "bad capability";
> +( packet_read() eq ( 1, "" ) )            || die "bad capability end";
> +
> +packet_write( "clean=true\n" );
> +packet_write( "smudge=true\n" );
> +packet_flush();
> +
> +while (1) {
> +    my ($command) = packet_read() =~ /^command=([^=]+)\n$/;
> +    my ($pathname) = packet_read() =~ /^pathname=([^=]+)\n$/;
> +
> +    packet_read();
> +
> +    my $input = "";
> +    {
> +        binmode(STDIN);
> +        my $buffer;
> +        my $done = 0;
> +        while ( !$done ) {
> +            ( $done, $buffer ) = packet_read();
> +            $input .= $buffer;
> +        }
> +    }
> +
> +    my $output;
> +    if ( $command eq "clean" ) {
> +        ### Perform clean here ###
> +        $output = $input;
> +    }
> +    elsif ( $command eq "smudge" ) {
> +        ### Perform smudge here ###
> +        $output = $input;
> +    }
> +    else {
> +        die "bad command '$command'";
> +    }
> +
> +    packet_write("status=success\n");
> +    packet_flush();
> +    while ( length($output) > 0 ) {
> +        my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
> +        packet_write($packet);
> +        if ( length($output) > $MAX_PACKET_CONTENT_SIZE ) {
> +            $output = substr( $output, $MAX_PACKET_CONTENT_SIZE );
> +        }
> +        else {
> +            $output = "";
> +        }
> +    }
> +    packet_flush(); # flush content!
> +    packet_flush(); # empty list!
> +}
> diff --git a/convert.c b/convert.c
> index a068df7..0ed48ed 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -3,6 +3,7 @@
>  #include "run-command.h"
>  #include "quote.h"
>  #include "sigchain.h"
> +#include "pkt-line.h"
>  
>  /*
>   * convert.c - convert a file when checking it out and checking it in.
> @@ -442,7 +443,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
>  	return (write_err || status);
>  }
>  
> -static int apply_filter(const char *path, const char *src, size_t len, int fd,
> +static int apply_single_file_filter(const char *path, const char *src, size_t len, int fd,
>                          struct strbuf *dst, const char *cmd)
>  {
>  	/*
> @@ -456,12 +457,6 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
>  	struct async async;
>  	struct filter_params params;
>  
> -	if (!cmd || !*cmd)
> -		return 0;
> -
> -	if (!dst)
> -		return 1;
> -
>  	memset(&async, 0, sizeof(async));
>  	async.proc = filter_buffer_or_fd;
>  	async.data = &params;
> @@ -496,14 +491,318 @@ static int apply_filter(const char *path, const char *src, size_t len, int fd,
>  	return !err;
>  }
>  
> +#define CAP_CLEAN    (1u<<0)
> +#define CAP_SMUDGE   (1u<<1)

Is CAP_ too generic, and GIT_FILTER_CAP (or so) less calling for trouble ? 

> +
> +struct cmd2process {
> +	struct hashmap_entry ent; /* must be the first member! */
> +	int supported_capabilities;
> +	const char *cmd;
> +	struct child_process process;
> +};
> +
> +static int cmd_process_map_initialized;
> +static struct hashmap cmd_process_map;
> +
> +static int cmd2process_cmp(const struct cmd2process *e1,
> +                           const struct cmd2process *e2,
> +                           const void *unused)
> +{
> +	return strcmp(e1->cmd, e2->cmd);
> +}
> +
> +static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap, const char *cmd)
> +{
> +	struct cmd2process key;
> +	hashmap_entry_init(&key, strhash(cmd));
> +	key.cmd = cmd;
> +	return hashmap_get(hashmap, &key, NULL);
> +}
> +
> +static void kill_multi_file_filter(struct hashmap *hashmap, struct cmd2process *entry)
> +{
> +	if (!entry)
> +		return;
> +	sigchain_push(SIGPIPE, SIG_IGN);
> +	/*
> +	 * We kill the filter most likely because an error happened already.
> +	 * That's why we are not interested in any error code here.
> +	 */
> +	close(entry->process.in);
> +	close(entry->process.out);
> +	sigchain_pop(SIGPIPE);
> +	finish_command(&entry->process);
> +	hashmap_remove(hashmap, entry, NULL);
> +	free(entry);
> +}
> +
> +static int packet_write_list(int fd, const char *line, ...)
> +{
> +	va_list args;
> +	int err;
> +	va_start(args, line);
> +	for (;;)
> +	{
> +		if (!line)
> +			break;
> +		if (strlen(line) > PKTLINE_DATA_MAXLEN)
> +			return -1;
> +		err = packet_write_fmt_gently(fd, "%s", line);
> +		if (err)
> +			return err;
> +		line = va_arg(args, const char*);
> +	}
> +	va_end(args);
> +	return packet_flush_gently(fd);
> +}
> +
> +static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
> +{
> +	int err;
> +	struct cmd2process *entry;
> +	struct child_process *process;
> +	const char *argv[] = { cmd, NULL };
> +	struct string_list cap_list = STRING_LIST_INIT_NODUP;
> +	char *cap_buf;
> +	const char *cap_name;
> +
> +	entry = xmalloc(sizeof(*entry));
> +	hashmap_entry_init(entry, strhash(cmd));
> +	entry->cmd = cmd;
> +	entry->supported_capabilities = 0;
> +	process = &entry->process;
> +
> +	child_process_init(process);
> +	process->argv = argv;
> +	process->use_shell = 1;
> +	process->in = -1;
> +	process->out = -1;
> +
> +	if (start_command(process)) {
> +		error("cannot fork to run external filter '%s'", cmd);
> +		kill_multi_file_filter(hashmap, entry);
> +		return NULL;
> +	}
> +
> +	sigchain_push(SIGPIPE, SIG_IGN);
> +
> +	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
> +	if (err)
> +		goto done;
> +
> +	err = strcmp(packet_read_line(process->out, NULL), "git-filter-server");
> +	if (err) {
> +		error("external filter '%s' does not support long running filter protocol", cmd);
> +		goto done;
> +	}
> +	err = strcmp(packet_read_line(process->out, NULL), "version=2");
> +	if (err)
> +		goto done;
> +
> +	err = packet_write_list(process->in, "clean=true", "smudge=true", NULL);
> +
> +	for (;;)
> +	{
> +		cap_buf = packet_read_line(process->out, NULL);
> +		if (!cap_buf)
> +			break;
> +		string_list_split_in_place(&cap_list, cap_buf, '=', 1);
> +
> +		if (cap_list.nr != 2 || strcmp(cap_list.items[1].string, "true"))
> +			continue;
> +
> +		cap_name = cap_list.items[0].string;
> +		if (!strcmp(cap_name, "clean")) {
> +			entry->supported_capabilities |= CAP_CLEAN;
> +		} else if (!strcmp(cap_name, "smudge")) {
> +			entry->supported_capabilities |= CAP_SMUDGE;
> +		} else {
> +			warning(
> +				"external filter '%s' requested unsupported filter capability '%s'",
> +				cmd, cap_name
> +			);
> +		}
> +
> +		string_list_clear(&cap_list, 0);
> +	}
> +
> +done:
> +	sigchain_pop(SIGPIPE);
> +
> +	if (err || errno == EPIPE) {
> +		error("initialization for external filter '%s' failed", cmd);
> +		kill_multi_file_filter(hashmap, entry);
> +		return NULL;
> +	}
> +
> +	hashmap_add(hashmap, entry);
> +	return entry;
> +}
> +
> +static void read_multi_file_filter_values(int fd, struct strbuf *status) {
> +	struct strbuf **pair;
> +	char *line;
> +	for (;;) {
> +		line = packet_read_line(fd, NULL);
> +		if (!line)
> +			break;
> +		pair = strbuf_split_str(line, '=', 2);
> +		if (pair[0] && pair[0]->len && pair[1]) {
> +			if (!strcmp(pair[0]->buf, "status=")) {
> +				strbuf_reset(status);
> +				strbuf_addbuf(status, pair[1]);
> +			}
> +		}
> +	}
> +}
> +
> +static int apply_multi_file_filter(const char *path, const char *src, size_t len,
> +                                   int fd, struct strbuf *dst, const char *cmd,
> +                                   const int wanted_capability)
> +{
> +	int err;
> +	struct cmd2process *entry;
> +	struct child_process *process;
> +	struct stat file_stat;
> +	struct strbuf nbuf = STRBUF_INIT;
> +	struct strbuf filter_status = STRBUF_INIT;
> +	char *filter_type;
> +
> +	if (!cmd_process_map_initialized) {
> +		cmd_process_map_initialized = 1;
> +		hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);
> +		entry = NULL;
> +	} else {
> +		entry = find_multi_file_filter_entry(&cmd_process_map, cmd);
> +	}
> +
> +	fflush(NULL);
> +
> +	if (!entry) {
> +		entry = start_multi_file_filter(&cmd_process_map, cmd);
> +		if (!entry)
> +			return 0;
> +	}
> +	process = &entry->process;
> +
> +	if (!(wanted_capability & entry->supported_capabilities))
> +		return 0;
> +
> +	if (CAP_CLEAN & wanted_capability)
> +		filter_type = "clean";
> +	else if (CAP_SMUDGE & wanted_capability)
> +		filter_type = "smudge";
> +	else
> +		die("unexpected filter type");
> +
> +	if (fd >= 0 && !src) {
> +		if (fstat(fd, &file_stat) == -1)
> +			return 0;
> +		len = xsize_t(file_stat.st_size);
> +	}
> +
> +	sigchain_push(SIGPIPE, SIG_IGN);
> +
> +
> +	err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN);

Extra () needed ?
More () in the code...

No more comments today (except that the kill is overkill)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()
  2016-09-08 21:18   ` Stefan Beller
@ 2016-09-11 11:36     ` Lars Schneider
  2016-09-11 16:01       ` Stefan Beller
  0 siblings, 1 reply; 45+ messages in thread
From: Lars Schneider @ 2016-09-11 11:36 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller


> On 08 Sep 2016, at 23:18, Stefan Beller <sbeller@google.com> wrote:
> 
> On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschneider@gmail.com> wrote:
> 
>> +static int packet_write_fmt_1(int fd, int gently,
>> +                              const char *fmt, va_list args)
>> +{
>> +       struct strbuf buf = STRBUF_INIT;
>> +       size_t count;
>> +
>> +       format_packet(&buf, fmt, args);
>> +       count = write_in_full(fd, buf.buf, buf.len);
>> +       if (count == buf.len)
>> +               return 0;
>> +
>> +       if (!gently) {
> 
>    call check_pipe from write_or_die here instead of
>    reproducing that function?

Yes, might be better. I wasn't sure because the check_pipe is
not public.

Where would you declare check_pipe? In cache.h?
Maybe it would be more suitable to move check_pipe to 
run-command.h/c?


>> +               if (errno == EPIPE) {
>> +                       if (in_async())
>> +                               async_exit(141);
>> +
>> +                       signal(SIGPIPE, SIG_DFL);
>> +                       raise(SIGPIPE);
>> +                       /* Should never happen, but just in case... */
>> +                       exit(141);
>> +               }
>> +               die_errno("packet write error");
>> +       }
>> +       error("packet write failed");
>> +       return -1;
> 
> I think the more idiomatic way is to
> 
>    return error(...);
> 
> as error always return -1.

Of course!!


Thank you,
Lars


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 05/10] pkt-line: add packet_write_gently()
  2016-09-08 21:24   ` Stefan Beller
@ 2016-09-11 11:44     ` Lars Schneider
  0 siblings, 0 replies; 45+ messages in thread
From: Lars Schneider @ 2016-09-11 11:44 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller


> On 08 Sep 2016, at 23:24, Stefan Beller <sbeller@google.com> wrote:
> 
> On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschneider@gmail.com> wrote:
> 
>> 
>> Add packet_write_gently() which writes arbitrary data and returns `0`
>> for success and `-1` for an error.
> 
> I think documenting the return code is better done in either the header file
> or in a commend preceding the implementation instead of the commit message?
> 
> Maybe just a generic comment for *_gently is good enough, maybe even no
> comment. So the commit is fine, too. I dunno.

I agree that this is too verbose as this function follows the standard
Git return value conventions (AFAIK). I'll remove this from all commit
messages.


>> This function is used by other
>> pkt-line functions in a subsequent patch.
> 
> That's what I figured. Do we also need to mention that in the preceding patch
> for packet_flush_gently ?

I'll add this note to all commit messages that introduce new functions which
are used later.


Thanks,
Lars

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams
  2016-09-08 21:49   ` Stefan Beller
@ 2016-09-11 12:33     ` Lars Schneider
  2016-09-11 16:03       ` Stefan Beller
  2016-09-11 21:42     ` Junio C Hamano
  1 sibling, 1 reply; 45+ messages in thread
From: Lars Schneider @ 2016-09-11 12:33 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller


> On 08 Sep 2016, at 23:49, Stefan Beller <sbeller@google.com> wrote:
> 
> On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschneider@gmail.com> wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> write_packetized_from_fd() and write_packetized_from_buf() write a
>> stream of packets. All content packets use the maximal packet size
>> except for the last one. After the last content packet a `flush` control
>> packet is written.
> 
> I presume we need both write_* things in a later patch; can you clarify why
> we need both of them?

Since 9035d7 Git streams from fd to required filters and from buf to
non-required filters. The Git filter protocol v2 makes use of all that,
too.

https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4


>> +       if (paket_len < 0) {
>> +               if (oldalloc == 0)
>> +                       strbuf_release(sb_out);
> 
> So if old alloc is 0, we release it, which is documented as
> /**
> * Release a string buffer and the memory it used. You should not use the
> * string buffer after using this function, unless you initialize it again.
> */
> 
>> +               else
>> +                       strbuf_setlen(sb_out, oldlen);
> 
> Otherwise we just set the length back, such that it looks like before.
> 
> So as a caller the strbuf is in a different state in case of error
> depending whether
> the strbuf already had some data in it. I think it would be better if
> we only did
> `strbuf_setlen(sb_out, oldlen);` here, such that the caller can
> strbuf_release it
> unconditionally.

I tried to mimic the behavior of strbuf_read() [1]. The error handling
was introduced in 2fc647 [2] to ease error handling:

"This allows for easier error handling, as callers only need to call
strbuf_release() if A) the command succeeded or B) if they would have had
to do so anyway because they added something to the strbuf themselves."

Does this convince you to keep the proposed error handling? If yes, then
I would add a comment to the function to document that behavior explicitly!

[1] https://github.com/git/git/blob/cda1bbd474805e653dda8a71d4ea3790e2a66cbb/strbuf.c#L377-L383
[2] https://github.com/git/git/commit/2fc647004ac7016128372a85db8245581e493812


> Or to make things more confusing, you could use strbuf_reset in case of 0,
> as that is a strbuf_setlen internally. ;)


Thanks,
Lars

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 08/10] convert: modernize tests
  2016-09-08 22:05   ` Stefan Beller
@ 2016-09-11 12:34     ` Lars Schneider
  0 siblings, 0 replies; 45+ messages in thread
From: Lars Schneider @ 2016-09-11 12:34 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller


> On 09 Sep 2016, at 00:05, Stefan Beller <sbeller@google.com> wrote:
> 
> On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschneider@gmail.com> wrote:
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> Use `test_config` to set the config, check that files are empty with
>> `test_must_be_empty`, compare files with `test_cmp`, and remove spaces
>> after ">" and "<".
>> 
>> Please note that the "rot13" filter configured in "setup" keeps using
>> `git config` instead of `test_config` because subsequent tests might
>> depend on it.
>> 
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
> 
> Makes sense & Reviewed-by "Stefan Beller <sbeller@google.com>"

Thank you,
Lars


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()
  2016-09-11 11:36     ` Lars Schneider
@ 2016-09-11 16:01       ` Stefan Beller
  2016-09-12  9:22         ` Lars Schneider
  0 siblings, 1 reply; 45+ messages in thread
From: Stefan Beller @ 2016-09-11 16:01 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller

On Sun, Sep 11, 2016 at 4:36 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:

>>
>>    call check_pipe from write_or_die here instead of
>>    reproducing that function?
>
> Yes, might be better. I wasn't sure because the check_pipe is
> not public.
>
> Where would you declare check_pipe? In cache.h?

IIRC, once upon a time the community decided to not
clutter cache.h any more as it is like a dirty kitchen sink,
piling up all unrelated things, but on the other hand that
would be handy.

> Maybe it would be more suitable to move check_pipe to
> run-command.h/c?

That's certainly possible.
I don't have a strong opinion, where the code actually
resides, but I do have a strong-ish opinion on code
duplication. ;)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams
  2016-09-11 12:33     ` Lars Schneider
@ 2016-09-11 16:03       ` Stefan Beller
  0 siblings, 0 replies; 45+ messages in thread
From: Stefan Beller @ 2016-09-11 16:03 UTC (permalink / raw)
  To: Lars Schneider
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller

On Sun, Sep 11, 2016 at 5:33 AM, Lars Schneider
<larsxschneider@gmail.com> wrote:

> Does this convince you to keep the proposed error handling? If yes, then
> I would add a comment to the function to document that behavior explicitly!

oops. I should read the docs more carefully.

Thanks for pointing out.
Then I'd be happy with the patch as is.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams
  2016-09-08 21:49   ` Stefan Beller
  2016-09-11 12:33     ` Lars Schneider
@ 2016-09-11 21:42     ` Junio C Hamano
  1 sibling, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-09-11 21:42 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Lars Schneider, git, Jeff King, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller

Stefan Beller <sbeller@google.com> writes:

> So as a caller the strbuf is in a different state in case of error
> depending whether
> the strbuf already had some data in it. I think it would be better if
> we only did
> `strbuf_setlen(sb_out, oldlen);` here, such that the caller can
> strbuf_release it
> unconditionally.

If the caller _knows_ that the strbuf is no longer needed, it can
unconditionally call strbuf_release() on it, whether its buffer was
already released or only its length was set to 0, no?

The callee is merely trying to be nice by resetting the strbuf to a
state close to the original in the error return codepath, I would
think.  It may be debatable if such a niceness is needed, but it is
a different matter that does not relate to the burden imposed on the
caller.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()
  2016-09-11 16:01       ` Stefan Beller
@ 2016-09-12  9:22         ` Lars Schneider
  0 siblings, 0 replies; 45+ messages in thread
From: Lars Schneider @ 2016-09-12  9:22 UTC (permalink / raw)
  To: Stefan Beller
  Cc: git, Jeff King, Junio C Hamano, Johannes Schindelin,
	Jakub Narębski, Martin-Louis Bright,
	Torsten Bögershausen, Jacob Keller


> On 11 Sep 2016, at 18:01, Stefan Beller <sbeller@google.com> wrote:
> 
> On Sun, Sep 11, 2016 at 4:36 AM, Lars Schneider
> <larsxschneider@gmail.com> wrote:
> 
>>> 
>>>   call check_pipe from write_or_die here instead of
>>>   reproducing that function?
>> [...]
> 
>> Maybe it would be more suitable to move check_pipe to
>> run-command.h/c?
> 
> That's certainly possible.
> I don't have a strong opinion, where the code actually
> resides, but I do have a strong-ish opinion on code
> duplication. ;)

OK, then I will move check_pipe() to run-command.

Thanks,
Lars


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-10  6:29   ` Torsten Bögershausen
@ 2016-09-12  9:49     ` Lars Schneider
  2016-09-13 14:44       ` Torsten Bögershausen
  0 siblings, 1 reply; 45+ messages in thread
From: Lars Schneider @ 2016-09-12  9:49 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Git Mailing List, peff, gitster, sbeller, Johannes.Schindelin,
	jnareb, mlbright, jacob.keller


> On 10 Sep 2016, at 08:29, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> On Thu, Sep 08, 2016 at 08:21:32PM +0200, larsxschneider@gmail.com wrote:
> []
>> +packet:          git> git-filter-client
>> +packet:          git> version=2
>> +packet:          git> version=42
>> +packet:          git> 0000
>> +packet:          git< git-filter-server
>> +packet:          git< version=2
>> +packet:          git> clean=true
>> +packet:          git> smudge=true
>> +packet:          git> not-yet-invented=true
>> +packet:          git> 0000
>> +packet:          git< clean=true
>> +packet:          git< smudge=true
>> +packet:          git< 0000
> 
> It's probalby only me who has difficulties to distinguish
> '>' from '<'.

I see what you mean. However, this format is used with "GIT_TRACE_PACKET"
as well and therefore I would prefer to keep it.


> packet:          git> git-filter-client
> packet:          git> version=2
> packet:          git> version=42
> packet:          git> 0000
> packet:       filter> git-filter-server
> packet:       filter> version=2
> 
> (Otherwise the dialoge description is nice)

Thanks!


>> +------------------------
>> +Supported filter capabilities in version 2 are "clean" and
>> +"smudge".
>> +
>> +Afterwards Git sends a list of "key=value" pairs terminated with
>> +a flush packet. The list will contain at least the filter command
>> +(based on the supported capabilities) and the pathname of the file
>> +to filter relative to the repository root. Right after these packets
>> +Git sends the content split in zero or more pkt-line packets and a
>> +flush packet to terminate content.
>> +------------------------
>> +packet:          git> command=smudge\n
>> +packet:          git> pathname=path/testfile.dat\n
> 
> How do we send pathnames the have '\n' ?
> Not really recommended, but allowed.
> And here I am a little bit lost, is each of the lines packed into
> a pkt-line ?
> command=smudge is packet as pkt-line and pathname= is packed into
> another one ? (The we don't need the '\n' at all)

Every line is a dedicated packet. That's why '\n' in a path name would
not be a problem as the receiver is expected to read the entire packet
when parsing the value (and the receiver knows the packet length, too).

The '\n' at the end is required by the pkt-line format:
"A non-binary line SHOULD BE terminated by an LF..."
(see protocol-common.txt)


> Or go both lines into one pkt-line (thats what I think), then
> we don't need the '\n' afther the pathname.

No (see above).


> And in this case the pathname is always the last element, and a '\n'
> may occur in the pathname, since we know the length of the packet
> we know how long the pathname must be.
> 
> 
> [...]
>> 
>> +In case the filter cannot or does not want to process the content,
> 
> Does not want ? 
> I can see something like "I read through the file, there is nothing
> to do. So Git, I don't send anything back, you know where the file is.

That's right. Isn't that covered with "does not want"?


>> +it is expected to respond with an "error" status. Depending on the
>> +`filter.<driver>.required` flag Git will interpret that as error
>> +but it will not stop or restart the filter process.
>> +------------------------
>> +packet:          git< status=error\n
>> +packet:          git< 0000
>> +------------------------
>> +
>> +If the filter experiences an error during processing, then it can
>> +send the status "error" after the content was (partially or
>> +completely) sent. Depending on the `filter.<driver>.required` flag
>> +Git will interpret that as error but it will not stop or restart the
>> +filter process.
>> +------------------------
>> +packet:          git< status=success\n
>> +packet:          git< 0000
>> +packet:          git< HALF_WRITTEN_ERRONEOUS_CONTENT
>> +packet:          git< 0000
>> +packet:          git< status=error\n
>> +packet:          git< 0000
>> +------------------------
>> +
>> +If the filter dies during the communication or does not adhere to
>> +the protocol then Git will stop the filter process and restart it
> 
> My personal comment:
> When a filter is mis-behaving, Git should say so loud and clear, and
> die(). 
> The filter process can be left running, so that it can be debugged.

In the current implementation Git would die already if the filter is 
"required". In this particular case we *could* die, too. However, 
I would prefer to keep it as is because I think the users of a 
"non-required" filter do not expect Git to die if there is *any* 
problem with the filter.


> Here I stopped the review for a moment, 
> I still think that Git shouldn't kill anything, because we loose
> the ability to debug these processes.

Based on my experience debugging the filter in this state is hard
anyways. I think a user would rather try to reproduce the problem
and run Git with the "GIT_TRACE_PACKET" flag enabled for debugging.


Thanks,
Lars

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()
  2016-09-08 18:21 ` [PATCH v7 04/10] pkt-line: add packet_flush_gently() larsxschneider
@ 2016-09-12 23:30   ` Junio C Hamano
  2016-09-13 22:12     ` Lars Schneider
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-09-12 23:30 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, peff, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller

larsxschneider@gmail.com writes:

> From: Lars Schneider <larsxschneider@gmail.com>
>
> packet_flush() would die in case of a write error even though for some
> callers an error would be acceptable. Add packet_flush_gently() which
> writes a pkt-line flush packet and returns `0` for success and `-1` for
> failure.
> ...
> +int packet_flush_gently(int fd)
> +{
> +	packet_trace("0000", 4, 1);
> +	if (write_in_full(fd, "0000", 4) == 4)
> +		return 0;
> +	error("flush packet write failed");
> +	return -1;

It is more idiomatic to do

	return error(...);

but more importantly, does the caller even want an error message
unconditionally printed here?

I suspect that it is a strong sign that the caller wants to be in
control of when and what error message is produced; otherwise it
wouldn't be calling the _gently() variant, no?

Of course, if you have written callers to this function in later
patches in this series, they would be responsible for reporting (or
choosing not to report) this failure, but I think making this
function silent is a better course in the longer term.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 05/10] pkt-line: add packet_write_gently()
  2016-09-08 18:21 ` [PATCH v7 05/10] pkt-line: add packet_write_gently() larsxschneider
  2016-09-08 21:24   ` Stefan Beller
@ 2016-09-12 23:31   ` Junio C Hamano
  1 sibling, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-09-12 23:31 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, peff, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller

larsxschneider@gmail.com writes:

> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
> +{
> +	static char packet_write_buffer[LARGE_PACKET_MAX];
> +
> +	if (size > sizeof(packet_write_buffer) - 4) {
> +		error("packet write failed");
> +		return -1;
> +	}
> +	packet_trace(buf, size, 1);
> +	size += 4;
> +	set_packet_header(packet_write_buffer, size);
> +	memcpy(packet_write_buffer + 4, buf, size - 4);
> +	if (write_in_full(fd_out, packet_write_buffer, size) == size)
> +		return 0;
> +
> +	error("packet write failed");
> +	return -1;
> +}

The same comment as 4/10 applies here.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-12  9:49     ` Lars Schneider
@ 2016-09-13 14:44       ` Torsten Bögershausen
  2016-09-13 16:42         ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Torsten Bögershausen @ 2016-09-13 14:44 UTC (permalink / raw)
  To: Lars Schneider, Torsten Bögershausen
  Cc: Git Mailing List, peff, gitster, sbeller, Johannes.Schindelin,
	jnareb, mlbright, jacob.keller

On 12.09.16 11:49, Lars Schneider wrote:

>> How do we send pathnames the have '\n' ?
>> Not really recommended, but allowed.
>> And here I am a little bit lost, is each of the lines packed into
>> a pkt-line ?
>> command=smudge is packet as pkt-line and pathname= is packed into
>> another one ? (The we don't need the '\n' at all)
> 
> Every line is a dedicated packet. That's why '\n' in a path name would
> not be a problem as the receiver is expected to read the entire packet
> when parsing the value (and the receiver knows the packet length, too).
> 
> The '\n' at the end is required by the pkt-line format:
> "A non-binary line SHOULD BE terminated by an LF..."
> (see protocol-common.txt)

That is only the half part of the story:
A non-binary line SHOULD BE terminated by an LF, which if present
MUST be included in the total length. Receivers MUST treat pkt-lines
with non-binary data the same whether or not they contain the trailing
LF (stripping the LF if present, and not complaining when it is
missing).


How do we treat pathnames ?
They can have each byte value except '\0'.
What should a receiver do, which reads a string like "ABC\n\n" ?
Is it "ABC\n" or "ABC\n\n" ?

I would really consider to treat pathnames as binary, and not add a trailing '\n',
are there other opinions ?
 






^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-08 18:21 ` [PATCH v7 10/10] convert: add filter.<driver>.process option larsxschneider
  2016-09-10  6:29   ` Torsten Bögershausen
  2016-09-10 16:40   ` Torsten Bögershausen
@ 2016-09-13 15:22   ` Junio C Hamano
  2016-09-15 20:16     ` Lars Schneider
  2 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-09-13 15:22 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, peff, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller

larsxschneider@gmail.com writes:

> diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
> ...
> +sub packet_read {
> +    my $buffer;
> +    my $bytes_read = read STDIN, $buffer, 4;
> +    if ( $bytes_read == 0 ) {
> +
> +        # EOF - Git stopped talking to us!
> +        exit();
> +...
> +packet_write( "clean=true\n" );
> +packet_write( "smudge=true\n" );
> +packet_flush();
> +
> +while (1) {

These extra SP around the contents inside () pair look unfamiliar
and somewhat strange to me, but as long as they are consistently
done (and I think you are mostly being consistent), it is OK.

> +#define CAP_CLEAN    (1u<<0)
> +#define CAP_SMUDGE   (1u<<1)

As these are meant to be usable together, i.e. bits in a single flag
word, they are of type "unsigned int", which makes perfect sense.

Make sure your variables and fields that store them are of the same
type.  I think I saw "int' used to pass them in at least one place.

> +static int apply_filter(const char *path, const char *src, size_t len,
> +                        int fd, struct strbuf *dst, struct convert_driver *drv,
> +                        const int wanted_capability)
> +{
> +	const char* cmd = NULL;

"const char *cmd = NULL;" of course.

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 11c37fb..f6798f8 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -10,6 +10,7 @@
>  #include "attr.h"
>  #include "split-index.h"
>  #include "dir.h"
> +#include "convert.h"
>  
>  /*
>   * Error messages expected by scripts out of plumbing commands such as

Why?  The resulting file seems to compile without this addition.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 00/10] Git filter protocol
  2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
                   ` (9 preceding siblings ...)
  2016-09-08 18:21 ` [PATCH v7 10/10] convert: add filter.<driver>.process option larsxschneider
@ 2016-09-13 16:00 ` Junio C Hamano
  10 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-09-13 16:00 UTC (permalink / raw)
  To: larsxschneider
  Cc: git, peff, sbeller, Johannes.Schindelin, jnareb, mlbright,
	tboegi, jacob.keller

larsxschneider@gmail.com writes:

> From: Lars Schneider <larsxschneider@gmail.com>
>
> The goal of this series is to avoid launching a new clean/smudge filter
> process for each file that is filtered.

I tentatively queued this in 'pu' because I wanted to see these
changes in context and also wanted to know if there are overlaps and
conflicts with other topics in flight.  As your earlier steps
renamed packet_write() to packet_write_fmt() but didn't add a new
packet_write() that has different semantics, it was pleasantly easy
to make sure there is no new caller added in the meantime (it did
conflict with Duy's shallow-deepen topic and the resolution had to
touch outside the <<< conflicted === regions >>>, but it was
otherwise trivial).

Some details of the protocol (I think Torsten has already pointed
out that not all paths can be representable with the current
incarnation; there may be other minor nits like that) may still need
to be worked out, but I think the series at this point gets the
basic code structure right and such additional fixes would not
change things too drastically (IOW, I think we are very close to
being 'next'-ready).

Thanks.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-13 14:44       ` Torsten Bögershausen
@ 2016-09-13 16:42         ` Junio C Hamano
  2016-09-15 17:23           ` Lars Schneider
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-09-13 16:42 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Lars Schneider, Git Mailing List, peff, sbeller,
	Johannes.Schindelin, jnareb, mlbright, jacob.keller

Torsten Bögershausen <tboegi@web.de> writes:

> I would really consider to treat pathnames as binary, and not add a trailing '\n',
> are there other opinions ?

It would be the most consistent if the same format as
write_name_quoted() is used for this, I would think.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-10 16:40   ` Torsten Bögershausen
@ 2016-09-13 22:04     ` Lars Schneider
  0 siblings, 0 replies; 45+ messages in thread
From: Lars Schneider @ 2016-09-13 22:04 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Git Mailing List, Jeff King, Junio C Hamano, Stefan Beller,
	Johannes.Schindelin, jnareb, mlbright, jacob.keller


> On 10 Sep 2016, at 17:40, Torsten Bögershausen <tboegi@web.de> wrote:
> 
> []
> 
> One general question up here, more comments inline.
> The current order for a clean-filter is like this, I removed the error handling:
> 
> int convert_to_git()
> {
> 	ret |= apply_filter(path, src, len, -1, dst, filter);
> 	if (ret && dst) {
> 		src = dst->buf;
> 		len = dst->len;
> 	}
> 	ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
> 	return ret | ident_to_git(path, src, len, dst, ca.ident);
> }
> 
> The first step is the clean filter, the CRLF-LF conversion (if needed),
> then ident.
> The current implementation streams the whole file content to the filter,
> (STDIN of the filter) and reads back STDOUT from the filter into a STRBUF.
> This is to use the UNIX-like STDIN--STDOUT method for writing a filter.
> 
> However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd()
> offer a sort of short-cut:
> The filter reads from the file directly, and the output of the filter is
> read into a STRBUF.

Are you sure? As far as I understand the code the filter does not read from 
the file in any case today.  The functions would_convert_to_git_filter_fd() and 
convert_to_git_filter_fd() just avoid avoid mapping the file in Git. The content 
is still streamed via pipes:
https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4


> It looks as if the multi-filter approach can use this in a similar way:
> Give the pathname to the filter, the filter opens the file for reading
> and stream the result via the pkt-line protocol into Git.
> This needs some more changes, and may be very well go into a separate patch
> series. (and should).
> 
> What I am asking for:
> When a multi-filter is used, the content is handled to the filter via pkt-line,
> and the result is given to Git via pkt-line ?
> Nothing wrong with it, I just wonder, if it should be mentioned somewhere.

That is most certainly a good idea and the main reason I added "capabilities"
to the protocol. Joey Hess worked on this topic (not merged, yet) and I would 
like to make this available to the long-running filter protocol as soon as the
feature is available:
http://public-inbox.org/git/1468277112-9909-1-git-send-email-joeyh@joeyh.name/


>> +sub packet_read {
>> +    my $buffer;
>> +    my $bytes_read = read STDIN, $buffer, 4;
>> +    if ( $bytes_read == 0 ) {
>> +
>> +        # EOF - Git stopped talking to us!
>> +        exit();
>> +    }
>> +    elsif ( $bytes_read != 4 ) {
>> +        die "invalid packet size '$bytes_read' field";
>> +    }
> 
> This is half-kosher, I would say,
> (And I really. really would like to see an implementation in C ;-)

Would you be willing to contribute a patch? :-)


> A read function may look like this:
> 
>   ret = read(0, &buffer, 4);
>   if (ret < 0) {
>     /* Error, nothing we can do */
>     exit(1);
>   } else if (ret == 0) {
>     /* EOF  */
>     exit(0);
>   } else if (ret < 4) {
>     /* 
>      * Go and read more, until we have 4 bytes or EOF or Error */
>   } else {
>     /* Good case, see below */
>   }

I see. However, my intention was to provide an absolute minimal
example to teach a reader how the protocol works. I consider
all proper error handling an exercise for the reader ;-)


>> +#define CAP_CLEAN    (1u<<0)
>> +#define CAP_SMUDGE   (1u<<1)
> 
> Is CAP_ too generic, and GIT_FILTER_CAP (or so) less calling for trouble ?

I had something like that but Junio suggested these names in V4:
http://public-inbox.org/git/xmqq8twd8uld.fsf@gitster.mtv.corp.google.com/


>> +
>> +	err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN);
> 
> Extra () needed ?
> More () in the code...

I thought it might improve readability, but I will remove them
if you think this would be more consistent with existing Git code.


Thanks,
Lars

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()
  2016-09-12 23:30   ` Junio C Hamano
@ 2016-09-13 22:12     ` Lars Schneider
  2016-09-13 22:44       ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Lars Schneider @ 2016-09-13 22:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, peff, sbeller, Johannes.Schindelin, jnareb,
	mlbright, tboegi, jacob.keller


> On 13 Sep 2016, at 00:30, Junio C Hamano <gitster@pobox.com> wrote:
> 
> larsxschneider@gmail.com writes:
> 
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> packet_flush() would die in case of a write error even though for some
>> callers an error would be acceptable. Add packet_flush_gently() which
>> writes a pkt-line flush packet and returns `0` for success and `-1` for
>> failure.
>> ...
>> +int packet_flush_gently(int fd)
>> +{
>> +	packet_trace("0000", 4, 1);
>> +	if (write_in_full(fd, "0000", 4) == 4)
>> +		return 0;
>> +	error("flush packet write failed");
>> +	return -1;
> 
> It is more idiomatic to do
> 
> 	return error(...);
> 
> but more importantly, does the caller even want an error message
> unconditionally printed here?
> 
> I suspect that it is a strong sign that the caller wants to be in
> control of when and what error message is produced; otherwise it
> wouldn't be calling the _gently() variant, no?

Agreed!

Thanks,
Lars

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()
  2016-09-13 22:12     ` Lars Schneider
@ 2016-09-13 22:44       ` Junio C Hamano
  2016-09-15 16:42         ` Lars Schneider
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-09-13 22:44 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Git Mailing List, peff, sbeller, Johannes.Schindelin, jnareb,
	mlbright, tboegi, jacob.keller

Lars Schneider <larsxschneider@gmail.com> writes:

>> On 13 Sep 2016, at 00:30, Junio C Hamano <gitster@pobox.com> wrote:
>> 
>> larsxschneider@gmail.com writes:
>> 
>>> From: Lars Schneider <larsxschneider@gmail.com>
>>> 
>>> packet_flush() would die in case of a write error even though for some
>>> callers an error would be acceptable. Add packet_flush_gently() which
>>> writes a pkt-line flush packet and returns `0` for success and `-1` for
>>> failure.
>>> ...
>>> +int packet_flush_gently(int fd)
>>> +{
>>> +	packet_trace("0000", 4, 1);
>>> +	if (write_in_full(fd, "0000", 4) == 4)
>>> +		return 0;
>>> +	error("flush packet write failed");
>>> +	return -1;
>> 
>> It is more idiomatic to do
>> 
>> 	return error(...);
>> 
>> but more importantly, does the caller even want an error message
>> unconditionally printed here?
>> 
>> I suspect that it is a strong sign that the caller wants to be in
>> control of when and what error message is produced; otherwise it
>> wouldn't be calling the _gently() variant, no?
>
> Agreed!

I am also OK with the current form, too.  Those who need to enhance
it to packet_flush_gently(int fd, int quiet) can come later.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()
  2016-09-13 22:44       ` Junio C Hamano
@ 2016-09-15 16:42         ` Lars Schneider
  2016-09-15 19:44           ` Jeff King
  0 siblings, 1 reply; 45+ messages in thread
From: Lars Schneider @ 2016-09-15 16:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git Mailing List, peff, sbeller, Johannes.Schindelin, jnareb,
	mlbright, tboegi, jacob.keller


> On 13 Sep 2016, at 23:44, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>>> On 13 Sep 2016, at 00:30, Junio C Hamano <gitster@pobox.com> wrote:
>>> 
>>> larsxschneider@gmail.com writes:
>>> 
>>>> From: Lars Schneider <larsxschneider@gmail.com>
>>>> 
>>>> packet_flush() would die in case of a write error even though for some
>>>> callers an error would be acceptable. Add packet_flush_gently() which
>>>> writes a pkt-line flush packet and returns `0` for success and `-1` for
>>>> failure.
>>>> ...
>>>> +int packet_flush_gently(int fd)
>>>> +{
>>>> +	packet_trace("0000", 4, 1);
>>>> +	if (write_in_full(fd, "0000", 4) == 4)
>>>> +		return 0;
>>>> +	error("flush packet write failed");
>>>> +	return -1;
>>> 
>>> It is more idiomatic to do
>>> 
>>> 	return error(...);
>>> 
>>> but more importantly, does the caller even want an error message
>>> unconditionally printed here?
>>> 
>>> I suspect that it is a strong sign that the caller wants to be in
>>> control of when and what error message is produced; otherwise it
>>> wouldn't be calling the _gently() variant, no?
>> 
>> Agreed!
> 
> I am also OK with the current form, too.  Those who need to enhance
> it to packet_flush_gently(int fd, int quiet) can come later.

"caller wants to be in control [...] otherwise it wouldn't be calling 
the _gently() variant" convinced me. I would like to change it like
this:

	trace_printf_key(&trace_packet, "flush packet write failed");
	return -1;

Objections?

Thanks,
Lars

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-13 16:42         ` Junio C Hamano
@ 2016-09-15 17:23           ` Lars Schneider
  2016-09-15 20:04             ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Lars Schneider @ 2016-09-15 17:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Torsten Bögershausen, Git Mailing List, peff, sbeller,
	Johannes.Schindelin, jnareb, mlbright, jacob.keller


> On 13 Sep 2016, at 17:42, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Torsten Bögershausen <tboegi@web.de> writes:
> 
>> I would really consider to treat pathnames as binary, and not add a trailing '\n',
>> are there other opinions ?
> 
> It would be the most consistent if the same format as
> write_name_quoted() is used for this, I would think.

Is that the solution you had in mind?

	quote_c_style(path, &quoted_path, NULL, 0);
	err = packet_write_fmt_gently(process->in, "pathname=%s\n", quoted_path.buf);

Wouldn't that complicate the pathname parsing on the filter side?
Can't we just define in our filter protocol documentation that our 
"pathname" packet _always_ has a trailing "\n"? That would mean the 
receiver would know a packet "pathname=ABC\n\n" encodes the path
"ABC\n" [1].

Thanks,
Lars


[1] Following Torsten's example in 
http://public-inbox.org/git/96554f6d-988d-e0b8-7936-8d0f29a7564f@web.de )


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()
  2016-09-15 16:42         ` Lars Schneider
@ 2016-09-15 19:44           ` Jeff King
  2016-09-15 20:19             ` Lars Schneider
  0 siblings, 1 reply; 45+ messages in thread
From: Jeff King @ 2016-09-15 19:44 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Junio C Hamano, Git Mailing List, sbeller, Johannes.Schindelin,
	jnareb, mlbright, tboegi, jacob.keller

On Thu, Sep 15, 2016 at 05:42:58PM +0100, Lars Schneider wrote:

> >>>> +int packet_flush_gently(int fd)
> >>>> +{
> >>>> +	packet_trace("0000", 4, 1);
> >>>> +	if (write_in_full(fd, "0000", 4) == 4)
> >>>> +		return 0;
> >>>> +	error("flush packet write failed");
> >>>> +	return -1;
> [...]
> >>> I suspect that it is a strong sign that the caller wants to be in
> >>> control of when and what error message is produced; otherwise it
> >>> wouldn't be calling the _gently() variant, no?
> >> 
> >> Agreed!
> > 
> > I am also OK with the current form, too.  Those who need to enhance
> > it to packet_flush_gently(int fd, int quiet) can come later.
> 
> "caller wants to be in control [...] otherwise it wouldn't be calling 
> the _gently() variant" convinced me. I would like to change it like
> this:
> 
> 	trace_printf_key(&trace_packet, "flush packet write failed");
> 	return -1;
> 
> Objections?

I'm not sure that a trace makes sense, because it means that 99% of the
time we are silent. AFAICT, the question is not "sometimes the user
needs to see an error and sometimes not, and they should decide before
starting the program". It is "sometimes the caller will report the error
to the user as appropriate, and sometimes we need to do so". And only
the calling code knows which is which.

So the "right" pattern is either:

  1. Return -1 and the caller is responsible for telling the user.

or

  2. Return -1 and stuff the error into an error strbuf, so it can be
     passed up the call chain easily (and callers do not have to come up
     with their own wording).

But if all current callers would just call error() themselves anyway,
then it's OK to punt on this and let somebody else handle it later if
they add a new caller who wants different behavior (and that is what
Junio was saying above, I think).

-Peff

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-15 17:23           ` Lars Schneider
@ 2016-09-15 20:04             ` Junio C Hamano
  2016-09-29  6:33               ` Torsten Bögershausen
  0 siblings, 1 reply; 45+ messages in thread
From: Junio C Hamano @ 2016-09-15 20:04 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Torsten Bögershausen, Git Mailing List, peff, sbeller,
	Johannes.Schindelin, jnareb, mlbright, jacob.keller

Lars Schneider <larsxschneider@gmail.com> writes:

> Wouldn't that complicate the pathname parsing on the filter side?
> Can't we just define in our filter protocol documentation that our 
> "pathname" packet _always_ has a trailing "\n"? That would mean the 
> receiver would know a packet "pathname=ABC\n\n" encodes the path
> "ABC\n" [1].

That's fine, too.  If you declare that pathname over the protocol is
a binary thing, you can also define that the packet does not have
the terminating \n, i.e. the example encodes the path "ABC\n\n",
which is also OK ;-)

As long as the rule is clearly documented, easy for filter
implementors to follow it, and hard for them to get it wrong, I'd be
perfectly happy.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-13 15:22   ` Junio C Hamano
@ 2016-09-15 20:16     ` Lars Schneider
  2016-09-15 20:24       ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Lars Schneider @ 2016-09-15 20:16 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: GIT Mailing-list, peff, sbeller, Johannes.Schindelin, jnareb,
	mlbright, tboegi, jacob.keller


> On 13 Sep 2016, at 17:22, Junio C Hamano <gitster@pobox.com> wrote:
> 
> larsxschneider@gmail.com writes:
> 
>> diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
>> ...
>> +sub packet_read {
>> +    my $buffer;
>> +    my $bytes_read = read STDIN, $buffer, 4;
>> +    if ( $bytes_read == 0 ) {
>> +
>> +        # EOF - Git stopped talking to us!
>> +        exit();
>> +...
>> +packet_write( "clean=true\n" );
>> +packet_write( "smudge=true\n" );
>> +packet_flush();
>> +
>> +while (1) {
> 
> These extra SP around the contents inside () pair look unfamiliar
> and somewhat strange to me, but as long as they are consistently
> done (and I think you are mostly being consistent), it is OK.

Ups. I forgot to run PerlTidy here. I run PerlTidy with the flag 
"-pbp" (= Perl Best Practices). This seems to add no extra SP for
functions with one parameter (e.g. `foo("bar")`) and extra SP
for functions with multiple parameter (e.g. `foo( "bar", 1 )`).
Is this still OK?

Does anyone have a "Git PerlTidy configuration"?


> 
>> +#define CAP_CLEAN    (1u<<0)
>> +#define CAP_SMUDGE   (1u<<1)
> 
> As these are meant to be usable together, i.e. bits in a single flag
> word, they are of type "unsigned int", which makes perfect sense.
> 
> Make sure your variables and fields that store them are of the same
> type.  I think I saw "int' used to pass them in at least one place.

Fixed!


>> +static int apply_filter(const char *path, const char *src, size_t len,
>> +                        int fd, struct strbuf *dst, struct convert_driver *drv,
>> +                        const int wanted_capability)
>> +{
>> +	const char* cmd = NULL;
> 
> "const char *cmd = NULL;" of course.

Fixed!


>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 11c37fb..f6798f8 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -10,6 +10,7 @@
>> #include "attr.h"
>> #include "split-index.h"
>> #include "dir.h"
>> +#include "convert.h"
>> 
>> /*
>>  * Error messages expected by scripts out of plumbing commands such as
> 
> Why?  The resulting file seems to compile without this addition.

Of course. That shouldn't have been part of this commit.


Thank you,
Lars





^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()
  2016-09-15 19:44           ` Jeff King
@ 2016-09-15 20:19             ` Lars Schneider
  2016-09-15 20:33               ` Junio C Hamano
  0 siblings, 1 reply; 45+ messages in thread
From: Lars Schneider @ 2016-09-15 20:19 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Git Mailing List, sbeller, Johannes.Schindelin,
	jnareb, mlbright, tboegi, jacob.keller


> On 15 Sep 2016, at 21:44, Jeff King <peff@peff.net> wrote:
> 
> On Thu, Sep 15, 2016 at 05:42:58PM +0100, Lars Schneider wrote:
> 
>>>>>> +int packet_flush_gently(int fd)
>>>>>> +{
>>>>>> +	packet_trace("0000", 4, 1);
>>>>>> +	if (write_in_full(fd, "0000", 4) == 4)
>>>>>> +		return 0;
>>>>>> +	error("flush packet write failed");
>>>>>> +	return -1;
>> [...]
>>>>> I suspect that it is a strong sign that the caller wants to be in
>>>>> control of when and what error message is produced; otherwise it
>>>>> wouldn't be calling the _gently() variant, no?
>>>> 
>>>> Agreed!
>>> 
>>> I am also OK with the current form, too.  Those who need to enhance
>>> it to packet_flush_gently(int fd, int quiet) can come later.
>> 
>> "caller wants to be in control [...] otherwise it wouldn't be calling 
>> the _gently() variant" convinced me. I would like to change it like
>> this:
>> 
>> 	trace_printf_key(&trace_packet, "flush packet write failed");
>> 	return -1;
>> 
>> Objections?
> 
> I'm not sure that a trace makes sense, because it means that 99% of the
> time we are silent. AFAICT, the question is not "sometimes the user
> needs to see an error and sometimes not, and they should decide before
> starting the program". It is "sometimes the caller will report the error
> to the user as appropriate, and sometimes we need to do so". And only
> the calling code knows which is which.
> 
> So the "right" pattern is either:
> 
>  1. Return -1 and the caller is responsible for telling the user.
> 
> or
> 
>  2. Return -1 and stuff the error into an error strbuf, so it can be
>     passed up the call chain easily (and callers do not have to come up
>     with their own wording).
> 
> But if all current callers would just call error() themselves anyway,
> then it's OK to punt on this and let somebody else handle it later if
> they add a new caller who wants different behavior (and that is what
> Junio was saying above, I think).

OK. I'll go with 1. then.

Thanks,
Lars

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-15 20:16     ` Lars Schneider
@ 2016-09-15 20:24       ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-09-15 20:24 UTC (permalink / raw)
  To: Lars Schneider
  Cc: GIT Mailing-list, peff, sbeller, Johannes.Schindelin, jnareb,
	mlbright, tboegi, jacob.keller

Lars Schneider <larsxschneider@gmail.com> writes:

>> On 13 Sep 2016, at 17:22, Junio C Hamano <gitster@pobox.com> wrote:
>> 
>> larsxschneider@gmail.com writes:
>> 
>>> diff --git a/contrib/long-running-filter/example.pl b/contrib/long-running-filter/example.pl
>>> ...
>>> +packet_write( "clean=true\n" );
>>> +packet_write( "smudge=true\n" );
>>
>> These extra SP around the contents inside () pair look unfamiliar
>> and somewhat strange to me, but as long as they are consistently
>> done (and I think you are mostly being consistent), it is OK.
>
> Ups. I forgot to run PerlTidy here. I run PerlTidy with the flag 
> "-pbp" (= Perl Best Practices). This seems to add no extra SP for
> functions with one parameter (e.g. `foo("bar")`) and extra SP
> for functions with multiple parameter (e.g. `foo( "bar", 1 )`).
> Is this still OK?

Your choice.  I already said I do not care too much either way as
long as you are consistent.

If you prefer PerlTidy's PBP output over what you wrote, and if you
are resending the patch anyway, then why not? ;-)

> Does anyone have a "Git PerlTidy configuration"?

Not me.

Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()
  2016-09-15 20:19             ` Lars Schneider
@ 2016-09-15 20:33               ` Junio C Hamano
  0 siblings, 0 replies; 45+ messages in thread
From: Junio C Hamano @ 2016-09-15 20:33 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Jeff King, Git Mailing List, sbeller, Johannes.Schindelin,
	jnareb, mlbright, tboegi, jacob.keller

Lars Schneider <larsxschneider@gmail.com> writes:

>> So the "right" pattern is either:
>> 
>>  1. Return -1 and the caller is responsible for telling the user.
>> 

... which is valid only if there aren't different kinds of errors
that all return -1; with "return error(...)" with different
messages, the users can tell what kind of error they got (while the
caller may just do the same abort-procedure no matter what kind of
error it got), but if all of them are replaced with "return -1", the
caller cannot produce different error messages to tell the users.

>>  2. Return -1 and stuff the error into an error strbuf, so it can be
>>     passed up the call chain easily (and callers do not have to come up
>>     with their own wording).

... and this would become one of the viable options (the other is to
define your own error code so that the caller can tell what error it
got).

>> But if all current callers would just call error() themselves anyway,
>> then it's OK to punt on this and let somebody else handle it later if
>> they add a new caller who wants different behavior (and that is what
>> Junio was saying above, I think).

Yes.  Just keeping it noisy until somebody wants a quiet-and-gentle
version is probably the best course to take.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-15 20:04             ` Junio C Hamano
@ 2016-09-29  6:33               ` Torsten Bögershausen
  2016-09-29  9:37                 ` Jakub Narębski
  0 siblings, 1 reply; 45+ messages in thread
From: Torsten Bögershausen @ 2016-09-29  6:33 UTC (permalink / raw)
  To: Junio C Hamano, Lars Schneider
  Cc: Torsten Bögershausen, Git Mailing List, peff, sbeller,
	Johannes.Schindelin, jnareb, mlbright, jacob.keller

On 15.09.16 22:04, Junio C Hamano wrote:
> Lars Schneider <larsxschneider@gmail.com> writes:
> 
>> Wouldn't that complicate the pathname parsing on the filter side?
>> Can't we just define in our filter protocol documentation that our 
>> "pathname" packet _always_ has a trailing "\n"? That would mean the 
>> receiver would know a packet "pathname=ABC\n\n" encodes the path
>> "ABC\n" [1].
> 
> That's fine, too.  If you declare that pathname over the protocol is
> a binary thing, you can also define that the packet does not have
> the terminating \n, i.e. the example encodes the path "ABC\n\n",
> which is also OK ;-)
> 
> As long as the rule is clearly documented, easy for filter
> implementors to follow it, and hard for them to get it wrong, I'd be
> perfectly happy.
>

(Sorry for the late reply)

In V8 the additional "\n" is clearly documented.

On the long run,
I would suggest to be more clear what BINARY is:

--- a/Documentation/technical/protocol-common.txt
+++ b/Documentation/technical/protocol-common.txt
@@ -61,6 +61,9 @@ the length's hexadecimal representation.
 A pkt-line MAY contain binary data, so implementors MUST ensure
 pkt-line parsing/formatting routines are 8-bit clean.
 
+Each pkt-line that may contain ASCII control characters should
+be treated as binary.
+


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH v7 10/10] convert: add filter.<driver>.process option
  2016-09-29  6:33               ` Torsten Bögershausen
@ 2016-09-29  9:37                 ` Jakub Narębski
  0 siblings, 0 replies; 45+ messages in thread
From: Jakub Narębski @ 2016-09-29  9:37 UTC (permalink / raw)
  To: Torsten Bögershausen, Junio C Hamano, Lars Schneider
  Cc: Git Mailing List, Jeff King, Stefan Beller, Johannes Schindelin,
	Martin-Louis Bright, Jacob Keller

W dniu 29.09.2016 o 08:33, Torsten Bögershausen pisze:
> On 15.09.16 22:04, Junio C Hamano wrote:
>> Lars Schneider <larsxschneider@gmail.com> writes:
>>
>>> Wouldn't that complicate the pathname parsing on the filter side?
>>> Can't we just define in our filter protocol documentation that our 
>>> "pathname" packet _always_ has a trailing "\n"? That would mean the 
>>> receiver would know a packet "pathname=ABC\n\n" encodes the path
>>> "ABC\n" [1].
>>
>> That's fine, too.  If you declare that pathname over the protocol is
>> a binary thing, you can also define that the packet does not have
>> the terminating \n, i.e. the example encodes the path "ABC\n\n",
>> which is also OK ;-)
>>
>> As long as the rule is clearly documented, easy for filter
>> implementors to follow it, and hard for them to get it wrong, I'd be
>> perfectly happy.
>>
> 
> (Sorry for the late reply)
> 
> In V8 the additional "\n" is clearly documented.
> 
> On the long run,
> I would suggest to be more clear what BINARY is:
> 
> --- a/Documentation/technical/protocol-common.txt
> +++ b/Documentation/technical/protocol-common.txt
> @@ -61,6 +61,9 @@ the length's hexadecimal representation.
>  A pkt-line MAY contain binary data, so implementors MUST ensure
>  pkt-line parsing/formatting routines are 8-bit clean.
>  
> +Each pkt-line that may contain ASCII control characters should
> +be treated as binary.
> +

Well, it is not as clear cut with pathnames.  Sane pathnames should
not contain control characters, even if they are outside US-ASCII,
assuming sane filesystem pathnames charset (like UTF-8).

One thing pathname cannot include is NUL ("\0") character.

So in most cases they are ASCII, but might not be.  Not that 
pkt-line text packets are binary-unsafe... I think the trailing
"\n" is here for easier debugging.

http://www.dwheeler.com/essays/filenames-in-shell.html
http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html

-- 
Jakub Narębski

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2016-09-29  9:38 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-08 18:21 [PATCH v7 00/10] Git filter protocol larsxschneider
2016-09-08 18:21 ` [PATCH v7 01/10] pkt-line: rename packet_write() to packet_write_fmt() larsxschneider
2016-09-08 18:21 ` [PATCH v7 02/10] pkt-line: extract set_packet_header() larsxschneider
2016-09-08 18:21 ` [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently() larsxschneider
2016-09-08 21:18   ` Stefan Beller
2016-09-11 11:36     ` Lars Schneider
2016-09-11 16:01       ` Stefan Beller
2016-09-12  9:22         ` Lars Schneider
2016-09-08 18:21 ` [PATCH v7 04/10] pkt-line: add packet_flush_gently() larsxschneider
2016-09-12 23:30   ` Junio C Hamano
2016-09-13 22:12     ` Lars Schneider
2016-09-13 22:44       ` Junio C Hamano
2016-09-15 16:42         ` Lars Schneider
2016-09-15 19:44           ` Jeff King
2016-09-15 20:19             ` Lars Schneider
2016-09-15 20:33               ` Junio C Hamano
2016-09-08 18:21 ` [PATCH v7 05/10] pkt-line: add packet_write_gently() larsxschneider
2016-09-08 21:24   ` Stefan Beller
2016-09-11 11:44     ` Lars Schneider
2016-09-12 23:31   ` Junio C Hamano
2016-09-08 18:21 ` [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams larsxschneider
2016-09-08 21:49   ` Stefan Beller
2016-09-11 12:33     ` Lars Schneider
2016-09-11 16:03       ` Stefan Beller
2016-09-11 21:42     ` Junio C Hamano
2016-09-08 18:21 ` [PATCH v7 07/10] convert: quote filter names in error messages larsxschneider
2016-09-08 18:21 ` [PATCH v7 08/10] convert: modernize tests larsxschneider
2016-09-08 22:05   ` Stefan Beller
2016-09-11 12:34     ` Lars Schneider
2016-09-08 18:21 ` [PATCH v7 09/10] convert: make apply_filter() adhere to standard Git error handling larsxschneider
2016-09-08 18:21 ` [PATCH v7 10/10] convert: add filter.<driver>.process option larsxschneider
2016-09-10  6:29   ` Torsten Bögershausen
2016-09-12  9:49     ` Lars Schneider
2016-09-13 14:44       ` Torsten Bögershausen
2016-09-13 16:42         ` Junio C Hamano
2016-09-15 17:23           ` Lars Schneider
2016-09-15 20:04             ` Junio C Hamano
2016-09-29  6:33               ` Torsten Bögershausen
2016-09-29  9:37                 ` Jakub Narębski
2016-09-10 16:40   ` Torsten Bögershausen
2016-09-13 22:04     ` Lars Schneider
2016-09-13 15:22   ` Junio C Hamano
2016-09-15 20:16     ` Lars Schneider
2016-09-15 20:24       ` Junio C Hamano
2016-09-13 16:00 ` [PATCH v7 00/10] Git filter protocol Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.