All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] port upload-archive to Windows
@ 2011-07-18 18:08 Erik Faye-Lund
  2011-07-18 18:08 ` [PATCH v2 1/4] compat/win32/sys/poll.c: upgrade from upstream Erik Faye-Lund
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: Erik Faye-Lund @ 2011-07-18 18:08 UTC (permalink / raw)
  To: git; +Cc: peff, j6t

Here's the updated version of this series. Changes since last time:
 - Patch 3/4 is new, makes enter_repo take const-input
 - Patch 4/4 (formerly 3/3): added a call to enter_repo to make sure
   the path-argument is taken into account.

Erik Faye-Lund (4):
  compat/win32/sys/poll.c: upgrade from upstream
  mingw: fix compilation of poll-emulation
  enter_repo: do not modify input
  upload-archive: use start_command instead of fork

 builtin/upload-archive.c |   65 +++++++++++++--------------------------------
 cache.h                  |    2 +-
 compat/mingw.h           |    2 -
 compat/win32/sys/poll.c  |   17 ++++++++---
 daemon.c                 |    4 +-
 path.c                   |   30 ++++++++++-----------
 6 files changed, 48 insertions(+), 72 deletions(-)

-- 
1.7.6.135.g378e9


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 1/4] compat/win32/sys/poll.c: upgrade from upstream
  2011-07-18 18:08 [PATCH v2 0/4] port upload-archive to Windows Erik Faye-Lund
@ 2011-07-18 18:08 ` Erik Faye-Lund
  2011-07-18 18:08 ` [PATCH v2 2/4] mingw: fix compilation of poll-emulation Erik Faye-Lund
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 33+ messages in thread
From: Erik Faye-Lund @ 2011-07-18 18:08 UTC (permalink / raw)
  To: git; +Cc: peff, j6t

poll.c is updated from revision adc3a5b in
git://git.savannah.gnu.org/gnulib.git

The changes are applied with --whitespace=fix to reduce noise.

poll.h is not upgraded, because the most recent version now
contains template-stuff that breaks compilation for us.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
---
 compat/win32/sys/poll.c |   19 +++++++++++++------
 1 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/compat/win32/sys/poll.c b/compat/win32/sys/poll.c
index 708a6c9..403eaa7 100644
--- a/compat/win32/sys/poll.c
+++ b/compat/win32/sys/poll.c
@@ -1,7 +1,7 @@
 /* Emulation for poll(2)
    Contributed by Paolo Bonzini.
 
-   Copyright 2001-2003, 2006-2010 Free Software Foundation, Inc.
+   Copyright 2001-2003, 2006-2011 Free Software Foundation, Inc.
 
    This file is part of gnulib.
 
@@ -27,7 +27,10 @@
 #include <malloc.h>
 
 #include <sys/types.h>
-#include "poll.h"
+
+/* Specification.  */
+#include <poll.h>
+
 #include <errno.h>
 #include <limits.h>
 #include <assert.h>
@@ -314,10 +317,7 @@ compute_revents (int fd, int sought, fd_set *rfds, fd_set *wfds, fd_set *efds)
 #endif /* !MinGW */
 
 int
-poll (pfd, nfd, timeout)
-     struct pollfd *pfd;
-     nfds_t nfd;
-     int timeout;
+poll (struct pollfd *pfd, nfds_t nfd, int timeout)
 {
 #ifndef WIN32_NATIVE
   fd_set rfds, wfds, efds;
@@ -454,6 +454,7 @@ poll (pfd, nfd, timeout)
   if (!hEvent)
     hEvent = CreateEvent (NULL, FALSE, FALSE, NULL);
 
+restart:
   handle_array[0] = hEvent;
   nhandles = 1;
   FD_ZERO (&rfds);
@@ -594,6 +595,12 @@ poll (pfd, nfd, timeout)
 	rc++;
     }
 
+  if (!rc && timeout == INFTIM)
+    {
+      SwitchToThread();
+      goto restart;
+    }
+
   return rc;
 #endif
 }
-- 
1.7.6.135.g378e9

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 2/4] mingw: fix compilation of poll-emulation
  2011-07-18 18:08 [PATCH v2 0/4] port upload-archive to Windows Erik Faye-Lund
  2011-07-18 18:08 ` [PATCH v2 1/4] compat/win32/sys/poll.c: upgrade from upstream Erik Faye-Lund
@ 2011-07-18 18:08 ` Erik Faye-Lund
  2011-07-18 18:08 ` [PATCH v2 3/4] enter_repo: do not modify input Erik Faye-Lund
  2011-07-18 18:08 ` [PATCH v2 4/4] upload-archive: use start_command instead of fork Erik Faye-Lund
  3 siblings, 0 replies; 33+ messages in thread
From: Erik Faye-Lund @ 2011-07-18 18:08 UTC (permalink / raw)
  To: git; +Cc: peff, j6t

gnulib has changed the inclusion of poll.h from double quotes
to single-quotes. But because compat/win32/sys/ isn't in our
include-path, this breaks compilation. Change it back the way
it was.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
---
 compat/win32/sys/poll.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/compat/win32/sys/poll.c b/compat/win32/sys/poll.c
index 403eaa7..225ddce 100644
--- a/compat/win32/sys/poll.c
+++ b/compat/win32/sys/poll.c
@@ -29,7 +29,7 @@
 #include <sys/types.h>
 
 /* Specification.  */
-#include <poll.h>
+#include "poll.h"
 
 #include <errno.h>
 #include <limits.h>
-- 
1.7.6.135.g378e9

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 3/4] enter_repo: do not modify input
  2011-07-18 18:08 [PATCH v2 0/4] port upload-archive to Windows Erik Faye-Lund
  2011-07-18 18:08 ` [PATCH v2 1/4] compat/win32/sys/poll.c: upgrade from upstream Erik Faye-Lund
  2011-07-18 18:08 ` [PATCH v2 2/4] mingw: fix compilation of poll-emulation Erik Faye-Lund
@ 2011-07-18 18:08 ` Erik Faye-Lund
  2011-07-18 18:08 ` [PATCH v2 4/4] upload-archive: use start_command instead of fork Erik Faye-Lund
  3 siblings, 0 replies; 33+ messages in thread
From: Erik Faye-Lund @ 2011-07-18 18:08 UTC (permalink / raw)
  To: git; +Cc: peff, j6t

entr_repo(..., 0) currently modifies the input to strip away
trailing slashes. This means that we some times need to copy the
input to keep the original.

Change it to unconditionally copy it into the used_path buffer so
we can safely use the input without having to copy it.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
---
 cache.h  |    2 +-
 daemon.c |    4 ++--
 path.c   |   30 ++++++++++++++----------------
 3 files changed, 17 insertions(+), 19 deletions(-)

diff --git a/cache.h b/cache.h
index bc9e5eb..02386f1 100644
--- a/cache.h
+++ b/cache.h
@@ -716,7 +716,7 @@ int safe_create_leading_directories(char *path);
 int safe_create_leading_directories_const(const char *path);
 int mkdir_in_gitdir(const char *path);
 extern char *expand_user_path(const char *path);
-char *enter_repo(char *path, int strict);
+const char *enter_repo(const char *path, int strict);
 static inline int is_absolute_path(const char *path)
 {
 	return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
diff --git a/daemon.c b/daemon.c
index 4c8346d..9253192 100644
--- a/daemon.c
+++ b/daemon.c
@@ -108,11 +108,11 @@ static void NORETURN daemon_die(const char *err, va_list params)
 	exit(1);
 }
 
-static char *path_ok(char *directory)
+static const char *path_ok(char *directory)
 {
 	static char rpath[PATH_MAX];
 	static char interp_path[PATH_MAX];
-	char *path;
+	const char *path;
 	char *dir;
 
 	dir = directory;
diff --git a/path.c b/path.c
index 4d73cc9..2757d8b 100644
--- a/path.c
+++ b/path.c
@@ -283,7 +283,7 @@ return_null:
  * links.  User relative paths are also returned as they are given,
  * except DWIM suffixing.
  */
-char *enter_repo(char *path, int strict)
+const char *enter_repo(const char *path, int strict)
 {
 	static char used_path[PATH_MAX];
 	static char validated_path[PATH_MAX];
@@ -297,14 +297,15 @@ char *enter_repo(char *path, int strict)
 		};
 		int len = strlen(path);
 		int i;
-		while ((1 < len) && (path[len-1] == '/')) {
-			path[len-1] = 0;
+		while ((1 < len) && (path[len-1] == '/'))
 			len--;
-		}
+
 		if (PATH_MAX <= len)
 			return NULL;
-		if (path[0] == '~') {
-			char *newpath = expand_user_path(path);
+		strncpy(used_path, path, len);
+
+		if (used_path[0] == '~') {
+			char *newpath = expand_user_path(used_path);
 			if (!newpath || (PATH_MAX - 10 < strlen(newpath))) {
 				free(newpath);
 				return NULL;
@@ -316,24 +317,21 @@ char *enter_repo(char *path, int strict)
 			 * anyway.
 			 */
 			strcpy(used_path, newpath); free(newpath);
-			strcpy(validated_path, path);
-			path = used_path;
+			strcpy(validated_path, used_path);
 		}
 		else if (PATH_MAX - 10 < len)
 			return NULL;
-		else {
-			path = strcpy(used_path, path);
-			strcpy(validated_path, path);
-		}
-		len = strlen(path);
+		else
+			strcpy(validated_path, used_path);
+		len = strlen(used_path);
 		for (i = 0; suffix[i]; i++) {
-			strcpy(path + len, suffix[i]);
-			if (!access(path, F_OK)) {
+			strcpy(used_path + len, suffix[i]);
+			if (!access(used_path, F_OK)) {
 				strcat(validated_path, suffix[i]);
 				break;
 			}
 		}
-		if (!suffix[i] || chdir(path))
+		if (!suffix[i] || chdir(used_path))
 			return NULL;
 		path = validated_path;
 	}
-- 
1.7.6.135.g378e9

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-07-18 18:08 [PATCH v2 0/4] port upload-archive to Windows Erik Faye-Lund
                   ` (2 preceding siblings ...)
  2011-07-18 18:08 ` [PATCH v2 3/4] enter_repo: do not modify input Erik Faye-Lund
@ 2011-07-18 18:08 ` Erik Faye-Lund
  2011-07-19 21:09   ` Junio C Hamano
  3 siblings, 1 reply; 33+ messages in thread
From: Erik Faye-Lund @ 2011-07-18 18:08 UTC (permalink / raw)
  To: git; +Cc: peff, j6t

The POSIX-function fork is not supported on Windows. Use our
start_command API instead.

As this is the last call-site that depends on the fork-stub in
compat/mingw.h, remove that as well.

Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
---
 builtin/upload-archive.c |   65 +++++++++++++--------------------------------
 compat/mingw.h           |    2 -
 2 files changed, 19 insertions(+), 48 deletions(-)

diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 73f788e..0c192b5 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -6,6 +6,7 @@
 #include "archive.h"
 #include "pkt-line.h"
 #include "sideband.h"
+#include "run-command.h"
 
 static const char upload_archive_usage[] =
 	"git upload-archive <repo>";
@@ -18,28 +19,16 @@ static const char lostchild[] =
 
 #define MAX_ARGS (64)
 
-static int run_upload_archive(int argc, const char **argv, const char *prefix)
+static void prepare_argv(const char **sent_argv, const char **argv)
 {
-	const char *sent_argv[MAX_ARGS];
 	const char *arg_cmd = "argument ";
 	char *p, buf[4096];
 	int sent_argc;
 	int len;
 
-	if (argc != 2)
-		usage(upload_archive_usage);
-
-	if (strlen(argv[1]) + 1 > sizeof(buf))
-		die("insanely long repository name");
-
-	strcpy(buf, argv[1]); /* enter-repo smudges its argument */
-
-	if (!enter_repo(buf, 0))
-		die("'%s' does not appear to be a git repository", buf);
-
 	/* put received options in sent_argv[] */
 	sent_argc = 1;
-	sent_argv[0] = "git-upload-archive";
+	sent_argv[0] = "archive";
 	for (p = buf;;) {
 		/* This will die if not enough free space in buf */
 		len = packet_read_line(0, p, (buf + sizeof buf) - p);
@@ -62,9 +51,6 @@ static int run_upload_archive(int argc, const char **argv, const char *prefix)
 		*p++ = 0;
 	}
 	sent_argv[sent_argc] = NULL;
-
-	/* parse all options sent by the client */
-	return write_archive(sent_argc, sent_argv, prefix, 0);
 }
 
 __attribute__((format (printf, 1, 2)))
@@ -96,38 +82,25 @@ static ssize_t process_input(int child_fd, int band)
 
 int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 {
-	pid_t writer;
-	int fd1[2], fd2[2];
-	/*
-	 * Set up sideband subprocess.
-	 *
-	 * We (parent) monitor and read from child, sending its fd#1 and fd#2
-	 * multiplexed out to our fd#1.  If the child dies, we tell the other
-	 * end over channel #3.
-	 */
-	if (pipe(fd1) < 0 || pipe(fd2) < 0) {
-		int err = errno;
-		packet_write(1, "NACK pipe failed on the remote side\n");
-		die("upload-archive: %s", strerror(err));
-	}
-	writer = fork();
-	if (writer < 0) {
+	const char *sent_argv[MAX_ARGS];
+	struct child_process cld = { sent_argv };
+	cld.out = cld.err = -1;
+	cld.git_cmd = 1;
+
+	if (argc != 2)
+		usage(upload_archive_usage);
+
+	if (!enter_repo(argv[1], 0))
+		die("'%s' does not appear to be a git repository", argv[1]);
+
+	prepare_argv(sent_argv, argv);
+	if (start_command(&cld)) {
 		int err = errno;
 		packet_write(1, "NACK fork failed on the remote side\n");
 		die("upload-archive: %s", strerror(err));
 	}
-	if (!writer) {
-		/* child - connect fd#1 and fd#2 to the pipe */
-		dup2(fd1[1], 1);
-		dup2(fd2[1], 2);
-		close(fd1[1]); close(fd2[1]);
-		close(fd1[0]); close(fd2[0]); /* we do not read from pipe */
-
-		exit(run_upload_archive(argc, argv, prefix));
-	}
 
 	/* parent - read from child, multiplex and send out to fd#1 */
-	close(fd1[1]); close(fd2[1]); /* we do not write to pipe */
 	packet_write(1, "ACK\n");
 	packet_flush(1);
 
@@ -135,9 +108,9 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 		struct pollfd pfd[2];
 		int status;
 
-		pfd[0].fd = fd1[0];
+		pfd[0].fd = cld.out;
 		pfd[0].events = POLLIN;
-		pfd[1].fd = fd2[0];
+		pfd[1].fd = cld.err;
 		pfd[1].events = POLLIN;
 		if (poll(pfd, 2, -1) < 0) {
 			if (errno != EINTR) {
@@ -156,7 +129,7 @@ int cmd_upload_archive(int argc, const char **argv, const char *prefix)
 			if (process_input(pfd[0].fd, 1))
 				continue;
 
-		if (waitpid(writer, &status, 0) < 0)
+		if (waitpid(cld.pid, &status, 0) < 0)
 			error_clnt("%s", lostchild);
 		else if (!WIFEXITED(status) || WEXITSTATUS(status) > 0)
 			error_clnt("%s", deadchild);
diff --git a/compat/mingw.h b/compat/mingw.h
index ce9dd98..9cb6eb9 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -85,8 +85,6 @@ static inline int symlink(const char *oldpath, const char *newpath)
 { errno = ENOSYS; return -1; }
 static inline int fchmod(int fildes, mode_t mode)
 { errno = ENOSYS; return -1; }
-static inline pid_t fork(void)
-{ errno = ENOSYS; return -1; }
 static inline unsigned int alarm(unsigned int seconds)
 { return 0; }
 static inline int fsync(int fd)
-- 
1.7.6.135.g378e9

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-07-18 18:08 ` [PATCH v2 4/4] upload-archive: use start_command instead of fork Erik Faye-Lund
@ 2011-07-19 21:09   ` Junio C Hamano
  2011-07-28  8:32     ` Erik Faye-Lund
  0 siblings, 1 reply; 33+ messages in thread
From: Junio C Hamano @ 2011-07-19 21:09 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: git, peff, j6t

Erik Faye-Lund <kusmabite@gmail.com> writes:

> @@ -18,28 +19,16 @@ static const char lostchild[] =
>  
>  #define MAX_ARGS (64)
>  
> -static int run_upload_archive(int argc, const char **argv, const char *prefix)
> +static void prepare_argv(const char **sent_argv, const char **argv)
>  {
> -	const char *sent_argv[MAX_ARGS];
>  	const char *arg_cmd = "argument ";
>  	char *p, buf[4096];
>  	int sent_argc;
>  	int len;
>  
> -	if (argc != 2)
> -		usage(upload_archive_usage);
> -
> -	if (strlen(argv[1]) + 1 > sizeof(buf))
> -		die("insanely long repository name");
> -
> -	strcpy(buf, argv[1]); /* enter-repo smudges its argument */
> -
> -	if (!enter_repo(buf, 0))
> -		die("'%s' does not appear to be a git repository", buf);
> -
>  	/* put received options in sent_argv[] */
>  	sent_argc = 1;
> -	sent_argv[0] = "git-upload-archive";
> +	sent_argv[0] = "archive";
>  	for (p = buf;;) {
>  		/* This will die if not enough free space in buf */
>  		len = packet_read_line(0, p, (buf + sizeof buf) - p);
> @@ -62,9 +51,6 @@ static int run_upload_archive(int argc, const char **argv, const char *prefix)
>  		*p++ = 0;
>  	}
>  	sent_argv[sent_argc] = NULL;
> -
> -	/* parse all options sent by the client */
> -	return write_archive(sent_argc, sent_argv, prefix, 0);
>  }

Hmm, how well would this change work with recent archive-filter work by
Peff, especially with the "when remote, disable some features" bits?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-07-19 21:09   ` Junio C Hamano
@ 2011-07-28  8:32     ` Erik Faye-Lund
  2011-07-28 16:08       ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Erik Faye-Lund @ 2011-07-28  8:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, j6t

Sorry for the late reply. Things have been a bit hectic here in Oslo
over the last week.

On Tue, Jul 19, 2011 at 11:09 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Erik Faye-Lund <kusmabite@gmail.com> writes:
>
>> @@ -18,28 +19,16 @@ static const char lostchild[] =
>>
>>  #define MAX_ARGS (64)
>>
>> -static int run_upload_archive(int argc, const char **argv, const char *prefix)
>> +static void prepare_argv(const char **sent_argv, const char **argv)
>>  {
>> -     const char *sent_argv[MAX_ARGS];
>>       const char *arg_cmd = "argument ";
>>       char *p, buf[4096];
>>       int sent_argc;
>>       int len;
>>
>> -     if (argc != 2)
>> -             usage(upload_archive_usage);
>> -
>> -     if (strlen(argv[1]) + 1 > sizeof(buf))
>> -             die("insanely long repository name");
>> -
>> -     strcpy(buf, argv[1]); /* enter-repo smudges its argument */
>> -
>> -     if (!enter_repo(buf, 0))
>> -             die("'%s' does not appear to be a git repository", buf);
>> -
>>       /* put received options in sent_argv[] */
>>       sent_argc = 1;
>> -     sent_argv[0] = "git-upload-archive";
>> +     sent_argv[0] = "archive";
>>       for (p = buf;;) {
>>               /* This will die if not enough free space in buf */
>>               len = packet_read_line(0, p, (buf + sizeof buf) - p);
>> @@ -62,9 +51,6 @@ static int run_upload_archive(int argc, const char **argv, const char *prefix)
>>               *p++ = 0;
>>       }
>>       sent_argv[sent_argc] = NULL;
>> -
>> -     /* parse all options sent by the client */
>> -     return write_archive(sent_argc, sent_argv, prefix, 0);
>>  }
>
> Hmm, how well would this change work with recent archive-filter work by
> Peff, especially with the "when remote, disable some features" bits?
>

Thanks for the notice. And no, not at all :(

OK then. I'll probably have to rewrite this to use start_async +
dup-bonanza instead of start_command...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-07-28  8:32     ` Erik Faye-Lund
@ 2011-07-28 16:08       ` Jeff King
  2011-07-28 16:47         ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-07-28 16:08 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Junio C Hamano, git, j6t

On Thu, Jul 28, 2011 at 10:32:54AM +0200, Erik Faye-Lund wrote:

> > Hmm, how well would this change work with recent archive-filter work by
> > Peff, especially with the "when remote, disable some features" bits?
> >
> 
> Thanks for the notice. And no, not at all :(
> 
> OK then. I'll probably have to rewrite this to use start_async +
> dup-bonanza instead of start_command...

What's the issue? In my final version, I didn't think I changed
run_upload_archive much at all. I would think there would be a trivial
text conflict at the end of run_upload_archive, but the semantics should
otherwise be the same.

Let me know if you need help untangling it.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-07-28 16:08       ` Jeff King
@ 2011-07-28 16:47         ` Jeff King
  2011-07-28 17:02           ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-07-28 16:47 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Junio C Hamano, git, j6t

On Thu, Jul 28, 2011 at 10:08:45AM -0600, Jeff King wrote:

> > Thanks for the notice. And no, not at all :(
> > 
> > OK then. I'll probably have to rewrite this to use start_async +
> > dup-bonanza instead of start_command...
> 
> What's the issue? In my final version, I didn't think I changed
> run_upload_archive much at all. I would think there would be a trivial
> text conflict at the end of run_upload_archive, but the semantics should
> otherwise be the same.

Ah, nevermind. I see now; the command-line option interface to "git
archive" is not as rich as what you can pass via write_archive.

I think you can get around it by adding an option to "git archive" to
indicate that we are filling a remote request, which is the only
extra information that my series adds.

I do think the start_async approach is a little more natural, though. If
the dup() bits get too ugly, it would not be that hard to make the
archive code write to a specific descriptor.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-07-28 16:47         ` Jeff King
@ 2011-07-28 17:02           ` Jeff King
  2011-08-01 14:45             ` Erik Faye-Lund
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-07-28 17:02 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Junio C Hamano, git, j6t

On Thu, Jul 28, 2011 at 10:47:58AM -0600, Jeff King wrote:

> Ah, nevermind. I see now; the command-line option interface to "git
> archive" is not as rich as what you can pass via write_archive.
> 
> I think you can get around it by adding an option to "git archive" to
> indicate that we are filling a remote request, which is the only
> extra information that my series adds.

And that patch would look something like the one below. You can also now
drop the "remote" parameter to write_archive entirely, but I didn't do
so here. Applied on top of the merge of your series into next, this
passes t5000.

-Peff

---
diff --git a/builtin/archive.c b/builtin/archive.c
index 883c009..6668340 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -85,6 +85,7 @@ int cmd_archive(int argc, const char **argv, const char *prefix)
 	const char *exec = "git-upload-archive";
 	const char *output = NULL;
 	const char *remote = NULL;
+	int is_remote = 0;
 	struct option local_opts[] = {
 		OPT_STRING('o', "output", &output, "file",
 			"write the archive to this file"),
@@ -92,6 +93,9 @@ int cmd_archive(int argc, const char **argv, const char *prefix)
 			"retrieve the archive from remote repository <repo>"),
 		OPT_STRING(0, "exec", &exec, "cmd",
 			"path to the remote git-upload-archive command"),
+		{ OPTION_BOOLEAN, 0, "remote-request", &is_remote, NULL,
+			"indicate we are serving a remote request",
+			PARSE_OPT_NOARG | PARSE_OPT_HIDDEN },
 		OPT_END()
 	};
 
@@ -106,5 +110,5 @@ int cmd_archive(int argc, const char **argv, const char *prefix)
 
 	setvbuf(stderr, NULL, _IOLBF, BUFSIZ);
 
-	return write_archive(argc, argv, prefix, 1, output, 0);
+	return write_archive(argc, argv, prefix, 1, output, is_remote);
 }
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 0c192b5..c57e8bd 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -27,8 +27,9 @@ static void prepare_argv(const char **sent_argv, const char **argv)
 	int len;
 
 	/* put received options in sent_argv[] */
-	sent_argc = 1;
+	sent_argc = 2;
 	sent_argv[0] = "archive";
+	sent_argv[1] = "--remote-request";
 	for (p = buf;;) {
 		/* This will die if not enough free space in buf */
 		len = packet_read_line(0, p, (buf + sizeof buf) - p);

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-07-28 17:02           ` Jeff King
@ 2011-08-01 14:45             ` Erik Faye-Lund
  2011-08-01 17:46               ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Erik Faye-Lund @ 2011-08-01 14:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, j6t

On Thu, Jul 28, 2011 at 7:02 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Jul 28, 2011 at 10:47:58AM -0600, Jeff King wrote:
>
>> Ah, nevermind. I see now; the command-line option interface to "git
>> archive" is not as rich as what you can pass via write_archive.
>>
>> I think you can get around it by adding an option to "git archive" to
>> indicate that we are filling a remote request, which is the only
>> extra information that my series adds.
>
> And that patch would look something like the one below.

This is the most straight forward way of doing this, thanks. I'll post
a new version with this squashed in soon.

What's the proper way of attributing you for the work?

> You can also now
> drop the "remote" parameter to write_archive entirely, but I didn't do
> so here.

I'm not entirely sure how you propose we do this... do you mean by
hoisting the relevant logic from write_archive up to cmd_archive or
something?

> Applied on top of the merge of your series into next, this
> passes t5000.

Unfortunately, it doesn't pass on Windows, but this doesn't seem to be
directly related to the remote-stuff: Here's the tests that fail:


$ make t5000-tar-tree
*** t5000-tar-tree.sh ***
ok 1 - populate workdir
<snip>
ok 53 - infer tgz from .tar.gz filename
not ok - 54 extract tgz file
#
#               $GUNZIP -c <j.tgz >j.tar &&
#               test_cmp b.tar j.tar
#
not ok - 55 remote tar.gz is allowed by default
#
#               git archive --remote=. --format=tar.gz HEAD >remote.tar.gz &&
#               test_cmp j.tgz remote.tar.gz
#
ok 56 - remote tar.gz can be disabled
# failed 2 among 56 test(s)
1..56
make: *** [t5000-tar-tree.sh] Error 1

It seems "git archive --format=tgz HEAD" is broken on Windows:
$ git archive --format=tgz HEAD > j.tgz
$ gunzip -c j.tgz > j.tar && echo ok

gzip: j.tgz: invalid compressed data--format violated

But if I perform the compression manually, the archive is fine:
$ git archive --format=tar HEAD | gzip -cn > j.tgz
$ gunzip -c j.tgz > j.tar && echo ok
ok

So, the big question is, are all tar-filters broken on Windows? It
seems not; the tests for the foo-filter works fine... any clues? Could
it somehow be newline-related, perhaps?

>
> -Peff
>
> ---
> diff --git a/builtin/archive.c b/builtin/archive.c
> index 883c009..6668340 100644
> --- a/builtin/archive.c
> +++ b/builtin/archive.c
> @@ -85,6 +85,7 @@ int cmd_archive(int argc, const char **argv, const char *prefix)
>        const char *exec = "git-upload-archive";
>        const char *output = NULL;
>        const char *remote = NULL;
> +       int is_remote = 0;
>        struct option local_opts[] = {
>                OPT_STRING('o', "output", &output, "file",
>                        "write the archive to this file"),
> @@ -92,6 +93,9 @@ int cmd_archive(int argc, const char **argv, const char *prefix)
>                        "retrieve the archive from remote repository <repo>"),
>                OPT_STRING(0, "exec", &exec, "cmd",
>                        "path to the remote git-upload-archive command"),
> +               { OPTION_BOOLEAN, 0, "remote-request", &is_remote, NULL,
> +                       "indicate we are serving a remote request",
> +                       PARSE_OPT_NOARG | PARSE_OPT_HIDDEN },
>                OPT_END()
>        };
>
> @@ -106,5 +110,5 @@ int cmd_archive(int argc, const char **argv, const char *prefix)
>
>        setvbuf(stderr, NULL, _IOLBF, BUFSIZ);
>
> -       return write_archive(argc, argv, prefix, 1, output, 0);
> +       return write_archive(argc, argv, prefix, 1, output, is_remote);
>  }
> diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
> index 0c192b5..c57e8bd 100644
> --- a/builtin/upload-archive.c
> +++ b/builtin/upload-archive.c
> @@ -27,8 +27,9 @@ static void prepare_argv(const char **sent_argv, const char **argv)
>        int len;
>
>        /* put received options in sent_argv[] */
> -       sent_argc = 1;
> +       sent_argc = 2;
>        sent_argv[0] = "archive";
> +       sent_argv[1] = "--remote-request";
>        for (p = buf;;) {
>                /* This will die if not enough free space in buf */
>                len = packet_read_line(0, p, (buf + sizeof buf) - p);
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-01 14:45             ` Erik Faye-Lund
@ 2011-08-01 17:46               ` Jeff King
  2011-08-01 18:02                 ` Erik Faye-Lund
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-08-01 17:46 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Junio C Hamano, git, j6t

On Mon, Aug 01, 2011 at 04:45:03PM +0200, Erik Faye-Lund wrote:

> > And that patch would look something like the one below.
> 
> This is the most straight forward way of doing this, thanks. I'll post
> a new version with this squashed in soon.
> 
> What's the proper way of attributing you for the work?

If you're squashing, just make a note in the text, or add a Helped-by
header at the end.

> > You can also now
> > drop the "remote" parameter to write_archive entirely, but I didn't do
> > so here.
> 
> I'm not entirely sure how you propose we do this... do you mean by
> hoisting the relevant logic from write_archive up to cmd_archive or
> something?

Sorry, I wrote that comment, then totally changed how my patch was
implemented and failed to update the comment. :)

The arguments to git-archive actually get parsed in two places:

  1. cmd_archive passes them to decide on meta stuff like remote vs
     local

  2. write_archive has a parse_archive_args function to actually parse
     the arguments that both upload-archive and archive share

My patch puts the new argument in (1). We could put it in (2) and do
away with passing the flag all the way down the call stack. But it seems
conceptually cleaner to me to have it in (1) (and the diff is much
shorter, too).

> $ make t5000-tar-tree
> *** t5000-tar-tree.sh ***
> ok 1 - populate workdir
> <snip>
> ok 53 - infer tgz from .tar.gz filename
> not ok - 54 extract tgz file
> #
> #               $GUNZIP -c <j.tgz >j.tar &&
> #               test_cmp b.tar j.tar
> #
> not ok - 55 remote tar.gz is allowed by default
> #
> #               git archive --remote=. --format=tar.gz HEAD >remote.tar.gz &&
> #               test_cmp j.tgz remote.tar.gz
> #
> ok 56 - remote tar.gz can be disabled
> # failed 2 among 56 test(s)
> 1..56
> make: *** [t5000-tar-tree.sh] Error 1
> 
> It seems "git archive --format=tgz HEAD" is broken on Windows:
> $ git archive --format=tgz HEAD > j.tgz
> $ gunzip -c j.tgz > j.tar && echo ok
> 
> gzip: j.tgz: invalid compressed data--format violated
> 
> But if I perform the compression manually, the archive is fine:
> $ git archive --format=tar HEAD | gzip -cn > j.tgz
> $ gunzip -c j.tgz > j.tar && echo ok
> ok

Weird. What does j.tgz end up looking like? Is it empty, or does it have
bogus data in it? Does gzip actually get invoked at all? Try running
with GIT_TRACE=1. I don't suppose you guys have something like strace,
which might be helpful.

> So, the big question is, are all tar-filters broken on Windows? It
> seems not; the tests for the foo-filter works fine... any clues? Could
> it somehow be newline-related, perhaps?

I'm not sure that newlines would make a difference. We use straight
write() everywhere in the archive code, which should be binary safe.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-01 17:46               ` Jeff King
@ 2011-08-01 18:02                 ` Erik Faye-Lund
  2011-08-01 18:25                   ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: Erik Faye-Lund @ 2011-08-01 18:02 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, j6t

On Mon, Aug 1, 2011 at 7:46 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Aug 01, 2011 at 04:45:03PM +0200, Erik Faye-Lund wrote:
>
>> > And that patch would look something like the one below.
>>
>> This is the most straight forward way of doing this, thanks. I'll post
>> a new version with this squashed in soon.
>>
>> What's the proper way of attributing you for the work?
>
> If you're squashing, just make a note in the text, or add a Helped-by
> header at the end.
>

OK, that's basically what I did already ;)

>> > You can also now
>> > drop the "remote" parameter to write_archive entirely, but I didn't do
>> > so here.
>>
>> I'm not entirely sure how you propose we do this... do you mean by
>> hoisting the relevant logic from write_archive up to cmd_archive or
>> something?
>
> Sorry, I wrote that comment, then totally changed how my patch was
> implemented and failed to update the comment. :)
>
> The arguments to git-archive actually get parsed in two places:
>
>  1. cmd_archive passes them to decide on meta stuff like remote vs
>     local
>
>  2. write_archive has a parse_archive_args function to actually parse
>     the arguments that both upload-archive and archive share
>
> My patch puts the new argument in (1). We could put it in (2) and do
> away with passing the flag all the way down the call stack. But it seems
> conceptually cleaner to me to have it in (1) (and the diff is much
> shorter, too).
>

Makes sense. Thanks for the explanation, I agree.

>> $ make t5000-tar-tree
>> *** t5000-tar-tree.sh ***
>> ok 1 - populate workdir
>> <snip>
>> ok 53 - infer tgz from .tar.gz filename
>> not ok - 54 extract tgz file
>> #
>> #               $GUNZIP -c <j.tgz >j.tar &&
>> #               test_cmp b.tar j.tar
>> #
>> not ok - 55 remote tar.gz is allowed by default
>> #
>> #               git archive --remote=. --format=tar.gz HEAD >remote.tar.gz &&
>> #               test_cmp j.tgz remote.tar.gz
>> #
>> ok 56 - remote tar.gz can be disabled
>> # failed 2 among 56 test(s)
>> 1..56
>> make: *** [t5000-tar-tree.sh] Error 1
>>
>> It seems "git archive --format=tgz HEAD" is broken on Windows:
>> $ git archive --format=tgz HEAD > j.tgz
>> $ gunzip -c j.tgz > j.tar && echo ok
>>
>> gzip: j.tgz: invalid compressed data--format violated
>>
>> But if I perform the compression manually, the archive is fine:
>> $ git archive --format=tar HEAD | gzip -cn > j.tgz
>> $ gunzip -c j.tgz > j.tar && echo ok
>> ok
>
> Weird. What does j.tgz end up looking like? Is it empty, or does it have
> bogus data in it? Does gzip actually get invoked at all? Try running
> with GIT_TRACE=1. I don't suppose you guys have something like strace,
> which might be helpful.
>

It does have data, and gzip does get invoked:
$ gunzip -c j.tgz | wc -c

gzip: j.tgz: invalid compressed data--format violated
 131072

So it seems there are around 130k of valid data before it barfs.

Hmm, but when I try the same after re-running the test, I get a
different amount of valid data (491520 bytes this time)... Is this a
timing-related issue?

Yeah, strace would have been useful, but we don't have that on
Windows. There are probably other alternatives, but I'm not aware...
perhaps procmon can be of help?

>> So, the big question is, are all tar-filters broken on Windows? It
>> seems not; the tests for the foo-filter works fine... any clues? Could
>> it somehow be newline-related, perhaps?
>
> I'm not sure that newlines would make a difference. We use straight
> write() everywhere in the archive code, which should be binary safe.
>

OK, so that hunch was a bad one ;)

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-01 18:02                 ` Erik Faye-Lund
@ 2011-08-01 18:25                   ` Jeff King
  2011-08-01 20:48                     ` René Scharfe
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-08-01 18:25 UTC (permalink / raw)
  To: Erik Faye-Lund; +Cc: Junio C Hamano, git, j6t

On Mon, Aug 01, 2011 at 08:02:22PM +0200, Erik Faye-Lund wrote:

> > Weird. What does j.tgz end up looking like? Is it empty, or does it have
> > bogus data in it? Does gzip actually get invoked at all? Try running
> > with GIT_TRACE=1. I don't suppose you guys have something like strace,
> > which might be helpful.
> >
> 
> It does have data, and gzip does get invoked:
> $ gunzip -c j.tgz | wc -c
> 
> gzip: j.tgz: invalid compressed data--format violated
>  131072
> 
> So it seems there are around 130k of valid data before it barfs.
> 
> Hmm, but when I try the same after re-running the test, I get a
> different amount of valid data (491520 bytes this time)... Is this a
> timing-related issue?

Hmm. Non-deterministic output is not good. Could be timing, or we could
be touching memory we're not supposed to. I just ran t5000 through
valgrind, and nothing turned up. And of course I can't replicate the
test failure on Linux. So I assume it really is Windows-specific.

That makes me suspect some difference in how the run-command API works
on the two platforms, since that is the code here that will be vastly
difference. I notice the start_command code uses dup() on Windows
instead of dup2() (I guess you guys don't have dup2?). I wonder if there
could be some issue with another descriptor accidentally pointing to the
same spot. But that's just a wild guess.

I think at this point, I'd probably start stepping through the archive
code with gdb.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-01 18:25                   ` Jeff King
@ 2011-08-01 20:48                     ` René Scharfe
  2011-08-01 21:20                       ` Johannes Sixt
  0 siblings, 1 reply; 33+ messages in thread
From: René Scharfe @ 2011-08-01 20:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Erik Faye-Lund, Junio C Hamano, git, j6t

Am 01.08.2011 20:25, schrieb Jeff King:
> On Mon, Aug 01, 2011 at 08:02:22PM +0200, Erik Faye-Lund wrote:
> 
>>> Weird. What does j.tgz end up looking like? Is it empty, or does it have
>>> bogus data in it? Does gzip actually get invoked at all? Try running
>>> with GIT_TRACE=1. I don't suppose you guys have something like strace,
>>> which might be helpful.
>>>
>>
>> It does have data, and gzip does get invoked:
>> $ gunzip -c j.tgz | wc -c
>>
>> gzip: j.tgz: invalid compressed data--format violated
>>  131072
>>
>> So it seems there are around 130k of valid data before it barfs.
>>
>> Hmm, but when I try the same after re-running the test, I get a
>> different amount of valid data (491520 bytes this time)... Is this a
>> timing-related issue?
> 
> Hmm. Non-deterministic output is not good. Could be timing, or we could
> be touching memory we're not supposed to. I just ran t5000 through
> valgrind, and nothing turned up. And of course I can't replicate the
> test failure on Linux. So I assume it really is Windows-specific.
> 
> That makes me suspect some difference in how the run-command API works
> on the two platforms, since that is the code here that will be vastly
> difference. I notice the start_command code uses dup() on Windows
> instead of dup2() (I guess you guys don't have dup2?). I wonder if there
> could be some issue with another descriptor accidentally pointing to the
> same spot. But that's just a wild guess.
> 
> I think at this point, I'd probably start stepping through the archive
> code with gdb.

Hrm, here is another data point, for what it's worth (Windows 7 x64,
MinGW):

	$ git archive v1.7.6 | gzip -cn | md5sum
	a0ca1c873a533a5fcd41d248fb325a5b *-

	$ git archive --format=tar.gz v1.7.6 | md5sum
	a0ca1c873a533a5fcd41d248fb325a5b *-

	$ git archive v1.7.6 | gzip -cn >a.tgz && md5sum <a.tgz
	a0ca1c873a533a5fcd41d248fb325a5b *-

	$ ls -l a.tgz
	-rw-r--r--    1 User     Administ  3361584 Aug  1 20:44 a.tgz

	$ git archive --format=tar.gz v1.7.6 >a.tgz && md5sum <a.tgz
	30886283af1aed05ae6a36fc5aeda077 *-

	$ ls -l a.tgz
	-rw-r--r--    1 User     Administ  3374383 Aug  1 20:45 a.tgz

	$ git archive --format=tar.gz v1.7.6 >a.tgz && md5sum <a.tgz
	30886283af1aed05ae6a36fc5aeda077 *-

	$ git archive --format=tar.gz v1.7.6 >a.tgz && md5sum <a.tgz
	30886283af1aed05ae6a36fc5aeda077 *-

	$ git archive v1.7.6 | gzip -cn >a.tgz && md5sum <a.tgz
	a0ca1c873a533a5fcd41d248fb325a5b *-

	$ git archive --format=tar.gz v1.7.6 | cat >a.tgz && md5sum <a.tgz
	a0ca1c873a533a5fcd41d248fb325a5b *-

	$ git archive -o a.tgz v1.7.6 && md5sum <a.tgz
	30886283af1aed05ae6a36fc5aeda077 *-

So git archive gives the right results when writing to a pipe, but
always the same wrong result when writing directly to a file.  With
--format=tar there is no difference:

	$ git archive v1.7.6 | md5sum
	fa364f38f9128df019dd8fd509c40a6d *-

	$ git archive v1.7.6 >a.tar && md5sum <a.tar
	fa364f38f9128df019dd8fd509c40a6d *-

René

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-01 20:48                     ` René Scharfe
@ 2011-08-01 21:20                       ` Johannes Sixt
  2011-08-01 21:42                         ` René Scharfe
  2011-08-01 21:52                         ` René Scharfe
  0 siblings, 2 replies; 33+ messages in thread
From: Johannes Sixt @ 2011-08-01 21:20 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Erik Faye-Lund, Junio C Hamano, git

Am 01.08.2011 22:48, schrieb René Scharfe:
> So git archive gives the right results when writing to a pipe, but
> always the same wrong result when writing directly to a file.

This could indeed be a CRLF issue. archive-tar.c runs gzip to let it
write to the original fd 1 (stdout). gzip is an MSYS program, and MSYS
is "clever" and sets up the channel in text mode (CRLF conversion) if it
is a regular file, but in binary mode if it is a pipe.

Without the gzip filter, git-archive writes to stdout itself. Since we
have set up all our channels in binary mode, we do not suffer from the
same problem for plain tar format.

So, I don't think we can do a lot about it, short of patching MSYS again...

-- Hannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-01 21:20                       ` Johannes Sixt
@ 2011-08-01 21:42                         ` René Scharfe
  2011-08-01 21:52                         ` René Scharfe
  1 sibling, 0 replies; 33+ messages in thread
From: René Scharfe @ 2011-08-01 21:42 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, Erik Faye-Lund, Junio C Hamano, git

Am 01.08.2011 23:20, schrieb Johannes Sixt:
> Am 01.08.2011 22:48, schrieb René Scharfe:
>> So git archive gives the right results when writing to a pipe, but
>> always the same wrong result when writing directly to a file.
> 
> This could indeed be a CRLF issue. archive-tar.c runs gzip to let it
> write to the original fd 1 (stdout). gzip is an MSYS program, and MSYS
> is "clever" and sets up the channel in text mode (CRLF conversion) if it
> is a regular file, but in binary mode if it is a pipe.
> 
> Without the gzip filter, git-archive writes to stdout itself. Since we
> have set up all our channels in binary mode, we do not suffer from the
> same problem for plain tar format.

That seems to be it indeed:

	$ git config tar.t.command 'tar tf -'
	$ git archive --format=t v1.7.6 >a.t
	$ git archive --format=t v1.7.6 | cat >b.t

	$ diff -b a.t b.t | wc -l
	      0

	$ md5sum <a.t
	8026ba48963ec4f849ae863f822bccbc *-

	$ md5sum <b.t
	7bcaa37bfcc47b1ae535369ae0399d90 *-

	$ sed 's/\r//' <a.t | md5sum
	7bcaa37bfcc47b1ae535369ae0399d90 *-

René

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-01 21:20                       ` Johannes Sixt
  2011-08-01 21:42                         ` René Scharfe
@ 2011-08-01 21:52                         ` René Scharfe
  2011-08-02  4:00                           ` Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: René Scharfe @ 2011-08-01 21:52 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, Erik Faye-Lund, Junio C Hamano, git

Am 01.08.2011 23:20, schrieb Johannes Sixt:
> Am 01.08.2011 22:48, schrieb René Scharfe:
>> So git archive gives the right results when writing to a pipe, but
>> always the same wrong result when writing directly to a file.
> 
> This could indeed be a CRLF issue. archive-tar.c runs gzip to let it
> write to the original fd 1 (stdout). gzip is an MSYS program, and MSYS
> is "clever" and sets up the channel in text mode (CRLF conversion) if it
> is a regular file, but in binary mode if it is a pipe.
> 
> Without the gzip filter, git-archive writes to stdout itself. Since we
> have set up all our channels in binary mode, we do not suffer from the
> same problem for plain tar format.
> 
> So, I don't think we can do a lot about it, short of patching MSYS again...

Or we could pipe the output through us, i.e. attach a builtin version of
cat at the output end of the called command.  Only on Windows, of
course.  Better ugly and limping then wrong, right?

René

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-01 21:52                         ` René Scharfe
@ 2011-08-02  4:00                           ` Jeff King
  2011-08-02 16:46                             ` René Scharfe
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-08-02  4:00 UTC (permalink / raw)
  To: René Scharfe; +Cc: Johannes Sixt, Erik Faye-Lund, Junio C Hamano, git

On Mon, Aug 01, 2011 at 11:52:43PM +0200, René Scharfe wrote:

> Am 01.08.2011 23:20, schrieb Johannes Sixt:
> > Am 01.08.2011 22:48, schrieb René Scharfe:
> >> So git archive gives the right results when writing to a pipe, but
> >> always the same wrong result when writing directly to a file.
> > 
> > This could indeed be a CRLF issue. archive-tar.c runs gzip to let it
> > write to the original fd 1 (stdout). gzip is an MSYS program, and MSYS
> > is "clever" and sets up the channel in text mode (CRLF conversion) if it
> > is a regular file, but in binary mode if it is a pipe.
> > 
> > Without the gzip filter, git-archive writes to stdout itself. Since we
> > have set up all our channels in binary mode, we do not suffer from the
> > same problem for plain tar format.
> > 
> > So, I don't think we can do a lot about it, short of patching MSYS again...
> 
> Or we could pipe the output through us, i.e. attach a builtin version of
> cat at the output end of the called command.  Only on Windows, of
> course.  Better ugly and limping then wrong, right?

Yeah, that would work. But I am confused. If what Johannes says is true,
isn't MSYS gzip totally broken for:

  # works
  echo foo | gzip -c | cat >foo.gz

  # broken; introduces CR
  echo foo | gzip -c >foo.gz

? (The "works" and "broken" there are my guesses; I don't have a Windows
box to test on). IOW, is it simply gzip that is broken, and any fix we
do is simply working around a bug in gzip? And therefore the right
solution is for MSYS people to fix gzip?

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-02  4:00                           ` Jeff King
@ 2011-08-02 16:46                             ` René Scharfe
  2011-08-02 18:13                               ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: René Scharfe @ 2011-08-02 16:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Erik Faye-Lund, Junio C Hamano, git

Am 02.08.2011 06:00, schrieb Jeff King:
> On Mon, Aug 01, 2011 at 11:52:43PM +0200, René Scharfe wrote:
> 
>> Am 01.08.2011 23:20, schrieb Johannes Sixt:
>>> Am 01.08.2011 22:48, schrieb René Scharfe:
>>>> So git archive gives the right results when writing to a pipe, but
>>>> always the same wrong result when writing directly to a file.
>>>
>>> This could indeed be a CRLF issue. archive-tar.c runs gzip to let it
>>> write to the original fd 1 (stdout). gzip is an MSYS program, and MSYS
>>> is "clever" and sets up the channel in text mode (CRLF conversion) if it
>>> is a regular file, but in binary mode if it is a pipe.
>>>
>>> Without the gzip filter, git-archive writes to stdout itself. Since we
>>> have set up all our channels in binary mode, we do not suffer from the
>>> same problem for plain tar format.
>>>
>>> So, I don't think we can do a lot about it, short of patching MSYS again...
>>
>> Or we could pipe the output through us, i.e. attach a builtin version of
>> cat at the output end of the called command.  Only on Windows, of
>> course.  Better ugly and limping then wrong, right?
> 
> Yeah, that would work. But I am confused. If what Johannes says is true,
> isn't MSYS gzip totally broken for:
> 
>   # works
>   echo foo | gzip -c | cat >foo.gz
> 
>   # broken; introduces CR
>   echo foo | gzip -c >foo.gz
> 
> ? (The "works" and "broken" there are my guesses; I don't have a Windows
> box to test on). IOW, is it simply gzip that is broken, and any fix we
> do is simply working around a bug in gzip? And therefore the right
> solution is for MSYS people to fix gzip?

"foo" may be too short to trigger the issue as the small resulting gz
file has a low probability of containing LFs.

The output of gzip is not simply always mangled, though (taken from my
earlier email, all three are working):

	$ git archive v1.7.6 | gzip -cn | md5sum
	a0ca1c873a533a5fcd41d248fb325a5b *-

	$ git archive --format=tar.gz v1.7.6 | md5sum
	a0ca1c873a533a5fcd41d248fb325a5b *-

	$ git archive v1.7.6 | gzip -cn >a.tgz && md5sum <a.tgz
	a0ca1c873a533a5fcd41d248fb325a5b *-

It's only broken if we call it from git archive:

	$ git archive --format=tar.gz v1.7.6 >a.tgz && md5sum <a.tgz
	30886283af1aed05ae6a36fc5aeda077 *-

	$ git archive -o a.tgz v1.7.6 && md5sum <a.tgz
	30886283af1aed05ae6a36fc5aeda077 *-

But not if we stuff the result into a pipe instead of a file:

	$ git archive --format=tar.gz v1.7.6 | cat >a.tgz && md5sum <a.tgz
	a0ca1c873a533a5fcd41d248fb325a5b *-

It _is_ confusing.

René

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-02 16:46                             ` René Scharfe
@ 2011-08-02 18:13                               ` Jeff King
  2011-08-02 23:37                                 ` Johannes Sixt
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-08-02 18:13 UTC (permalink / raw)
  To: René Scharfe; +Cc: Johannes Sixt, Erik Faye-Lund, Junio C Hamano, git

On Tue, Aug 02, 2011 at 06:46:20PM +0200, René Scharfe wrote:

> The output of gzip is not simply always mangled, though (taken from my
> earlier email, all three are working):
> 
> 	$ git archive v1.7.6 | gzip -cn | md5sum
> 	a0ca1c873a533a5fcd41d248fb325a5b *-
> 
> 	$ git archive --format=tar.gz v1.7.6 | md5sum
> 	a0ca1c873a533a5fcd41d248fb325a5b *-
> 
> 	$ git archive v1.7.6 | gzip -cn >a.tgz && md5sum <a.tgz
> 	a0ca1c873a533a5fcd41d248fb325a5b *-
> 
> It's only broken if we call it from git archive:
> 
> 	$ git archive --format=tar.gz v1.7.6 >a.tgz && md5sum <a.tgz
> 	30886283af1aed05ae6a36fc5aeda077 *-
> 
> 	$ git archive -o a.tgz v1.7.6 && md5sum <a.tgz
> 	30886283af1aed05ae6a36fc5aeda077 *-
> 
> But not if we stuff the result into a pipe instead of a file:
> 
> 	$ git archive --format=tar.gz v1.7.6 | cat >a.tgz && md5sum <a.tgz
> 	a0ca1c873a533a5fcd41d248fb325a5b *-
> 
> It _is_ confusing.

Hmm. So it's not _just_ the pipe vs file thing. What's different about
calling it from the shell, versus the way we call it from git-archive?

IOW, the pipe trick you mentioned is one way to fix it; but it may be
more elegant to do whatever it is the shell is doing to get those
results (unless the shell is secretly putting a "cat" into the pipeline
:) ).

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-02 18:13                               ` Jeff King
@ 2011-08-02 23:37                                 ` Johannes Sixt
  2011-08-03  5:49                                   ` Jeff King
  2011-08-06  9:40                                   ` René Scharfe
  0 siblings, 2 replies; 33+ messages in thread
From: Johannes Sixt @ 2011-08-02 23:37 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, Erik Faye-Lund, Junio C Hamano, git

Am 02.08.2011 20:13, schrieb Jeff King:
> Hmm. So it's not _just_ the pipe vs file thing. What's different about
> calling it from the shell, versus the way we call it from git-archive?

When the parent process of an MSYS process is itself an MSYS process,
such as bash, then the child does not do its own
binary-mode-vs.-text-mode detection, but just uses whatever it is told
by the parent. This is achieved by MSYS's fork emulation.

But if the parent is a regular Windows program, such as git(-archive),
then the autodection happens and file descriptors pointing to files are
put into text mode.

-- Hannes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-02 23:37                                 ` Johannes Sixt
@ 2011-08-03  5:49                                   ` Jeff King
  2011-08-06  9:40                                   ` René Scharfe
  1 sibling, 0 replies; 33+ messages in thread
From: Jeff King @ 2011-08-03  5:49 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: René Scharfe, Erik Faye-Lund, Junio C Hamano, git

On Wed, Aug 03, 2011 at 01:37:57AM +0200, Johannes Sixt wrote:

> Am 02.08.2011 20:13, schrieb Jeff King:
> > Hmm. So it's not _just_ the pipe vs file thing. What's different about
> > calling it from the shell, versus the way we call it from git-archive?
> 
> When the parent process of an MSYS process is itself an MSYS process,
> such as bash, then the child does not do its own
> binary-mode-vs.-text-mode detection, but just uses whatever it is told
> by the parent. This is achieved by MSYS's fork emulation.
> 
> But if the parent is a regular Windows program, such as git(-archive),
> then the autodection happens and file descriptors pointing to files are
> put into text mode.

Yuck. Well, I guess that's not really an option, then. The pipe trick
sounds like the sanest option (it would even be trivial to switch the
default value to "gzip -cn | cat" on Windows, but I assume "cat" has the
same problem).

Thanks for the explanation.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-02 23:37                                 ` Johannes Sixt
  2011-08-03  5:49                                   ` Jeff King
@ 2011-08-06  9:40                                   ` René Scharfe
  2011-08-07 20:02                                     ` Johannes Sixt
  1 sibling, 1 reply; 33+ messages in thread
From: René Scharfe @ 2011-08-06  9:40 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, Erik Faye-Lund, Junio C Hamano, git

Am 03.08.2011 01:37, schrieb Johannes Sixt:
> Am 02.08.2011 20:13, schrieb Jeff King:
>> Hmm. So it's not _just_ the pipe vs file thing. What's different about
>> calling it from the shell, versus the way we call it from git-archive?
> 
> When the parent process of an MSYS process is itself an MSYS process,
> such as bash, then the child does not do its own
> binary-mode-vs.-text-mode detection, but just uses whatever it is told
> by the parent. This is achieved by MSYS's fork emulation.
> 
> But if the parent is a regular Windows program, such as git(-archive),
> then the autodection happens and file descriptors pointing to files are
> put into text mode.

So here's an ugly patch to implement an internal passthrough filter to
avoid newline conversions.  It makes the tar filter command (gzip etc.)
write to a pipe instead of directly to a file.

The patch does this unconditionally, which is a waste on all unaffected
systems, of course.  We could #ifdef it out, but is there perhaps a nice
way to integrate this functionality into run_command/finish_command?

René

---
 archive-tar.c |   31 +++++++++++++++++++++++++++++++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 20af005..22d52c5 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -314,6 +314,25 @@ static int write_tar_archive(const struct archiver *ar,
 	return err;
 }
 
+static int cat_fn(int in, int out, void *data)
+{
+	for (;;) {
+		char buf[16 * 1024];
+		ssize_t got = xread(in, buf, sizeof(buf));
+		if (got == 0)
+			break;
+		if (got < 0) {
+			if (errno == EPIPE)
+				break;
+			die_errno("read errror");
+		}
+		write_or_die(out, buf, got);
+	}
+	close(in);
+	close(out);
+	return 0;
+}
+
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args)
 {
@@ -321,6 +340,14 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	struct child_process filter;
 	const char *argv[2];
 	int r;
+	struct async cat;
+
+	memset(&cat, 0, sizeof(cat));
+	cat.proc = cat_fn;
+	cat.in = -1;
+	cat.out = dup(1);
+	if (start_async(&cat))
+		die("unable to start passthrough filter");
 
 	if (!ar->data)
 		die("BUG: tar-filter archiver called with no filter defined");
@@ -335,6 +362,7 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	filter.argv = argv;
 	filter.use_shell = 1;
 	filter.in = -1;
+	filter.out = cat.in;
 
 	if (start_command(&filter) < 0)
 		die_errno("unable to start '%s' filter", argv[0]);
@@ -349,6 +377,9 @@ static int write_tar_filter_archive(const struct archiver *ar,
 	if (finish_command(&filter) != 0)
 		die("'%s' filter reported error", argv[0]);
 
+	if (finish_async(&cat))
+		die("passthrough filter reported error");
+
 	strbuf_release(&cmd);
 	return r;
 }

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-06  9:40                                   ` René Scharfe
@ 2011-08-07 20:02                                     ` Johannes Sixt
  2011-08-07 21:06                                       ` Jeff King
  2011-08-08 17:10                                       ` René Scharfe
  0 siblings, 2 replies; 33+ messages in thread
From: Johannes Sixt @ 2011-08-07 20:02 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Erik Faye-Lund, Junio C Hamano, git

Am 06.08.2011 11:40, schrieb René Scharfe:
> So here's an ugly patch to implement an internal passthrough filter to
> avoid newline conversions.  It makes the tar filter command (gzip etc.)
> write to a pipe instead of directly to a file.

*If* we are working around the CRLF conversion issue in git itself,
wouldn't it be much simpler to just:

diff --git a/archive-tar.c b/archive-tar.c
index 20af005..da3d98a 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -364,9 +364,9 @@ void init_tar_archiver(void)
 	int i;
 	register_archiver(&tar_archiver);

-	tar_filter_config("tar.tgz.command", "gzip -cn", NULL);
+	tar_filter_config("tar.tgz.command", "gzip -cn | cat", NULL);
 	tar_filter_config("tar.tgz.remote", "true", NULL);
-	tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL);
+	tar_filter_config("tar.tar.gz.command", "gzip -cn | cat", NULL);
 	tar_filter_config("tar.tar.gz.remote", "true", NULL);
 	git_config(git_tar_config, NULL);
 	for (i = 0; i < nr_tar_filters; i++) {

(provided that 'cat' magically does not suffer from the same problem,
and I do think that it does not.)

Anyway, I think it would be better to address the problem on the msysgit
side. Perhaps by providing a gzip of a different vintage (e.g. a
self-compiled one) that does not suffer from the CRLF conversion issue.

-- Hannes

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-07 20:02                                     ` Johannes Sixt
@ 2011-08-07 21:06                                       ` Jeff King
  2011-08-08 17:10                                       ` René Scharfe
  1 sibling, 0 replies; 33+ messages in thread
From: Jeff King @ 2011-08-07 21:06 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: René Scharfe, Erik Faye-Lund, Junio C Hamano, git

On Sun, Aug 07, 2011 at 10:02:03PM +0200, Johannes Sixt wrote:

> Am 06.08.2011 11:40, schrieb René Scharfe:
> > So here's an ugly patch to implement an internal passthrough filter to
> > avoid newline conversions.  It makes the tar filter command (gzip etc.)
> > write to a pipe instead of directly to a file.
> 
> *If* we are working around the CRLF conversion issue in git itself,
> wouldn't it be much simpler to just:
> 
> diff --git a/archive-tar.c b/archive-tar.c
> index 20af005..da3d98a 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -364,9 +364,9 @@ void init_tar_archiver(void)
>  	int i;
>  	register_archiver(&tar_archiver);
> 
> -	tar_filter_config("tar.tgz.command", "gzip -cn", NULL);
> +	tar_filter_config("tar.tgz.command", "gzip -cn | cat", NULL);
>  	tar_filter_config("tar.tgz.remote", "true", NULL);
> -	tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL);
> +	tar_filter_config("tar.tar.gz.command", "gzip -cn | cat", NULL);
>  	tar_filter_config("tar.tar.gz.remote", "true", NULL);
>  	git_config(git_tar_config, NULL);
>  	for (i = 0; i < nr_tar_filters; i++) {
> 
> (provided that 'cat' magically does not suffer from the same problem,
> and I do think that it does not.)

I like that much better, but assumed cat was broken. It might be better
still to have a GZIP_FILTER variable in the Makefile. That would let
msysgit do this, but would also let people with a funny path to gzip
define it at build time (they could also just reconfigure it, of course,
but it is nice to work out of the box on odd platforms).

> Anyway, I think it would be better to address the problem on the msysgit
> side. Perhaps by providing a gzip of a different vintage (e.g. a
> self-compiled one) that does not suffer from the CRLF conversion issue.

Yeah, that would make me happy, too. :)

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-07 20:02                                     ` Johannes Sixt
  2011-08-07 21:06                                       ` Jeff King
@ 2011-08-08 17:10                                       ` René Scharfe
  2011-08-09  5:02                                         ` Jeff King
  1 sibling, 1 reply; 33+ messages in thread
From: René Scharfe @ 2011-08-08 17:10 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, Erik Faye-Lund, Junio C Hamano, git

Am 07.08.2011 22:02, schrieb Johannes Sixt:
> Am 06.08.2011 11:40, schrieb René Scharfe:
>> So here's an ugly patch to implement an internal passthrough filter to
>> avoid newline conversions.  It makes the tar filter command (gzip etc.)
>> write to a pipe instead of directly to a file.
> 
> *If* we are working around the CRLF conversion issue in git itself,
> wouldn't it be much simpler to just:
> 
> diff --git a/archive-tar.c b/archive-tar.c
> index 20af005..da3d98a 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -364,9 +364,9 @@ void init_tar_archiver(void)
>  	int i;
>  	register_archiver(&tar_archiver);
> 
> -	tar_filter_config("tar.tgz.command", "gzip -cn", NULL);
> +	tar_filter_config("tar.tgz.command", "gzip -cn | cat", NULL);
>  	tar_filter_config("tar.tgz.remote", "true", NULL);
> -	tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL);
> +	tar_filter_config("tar.tar.gz.command", "gzip -cn | cat", NULL);
>  	tar_filter_config("tar.tar.gz.remote", "true", NULL);
>  	git_config(git_tar_config, NULL);
>  	for (i = 0; i < nr_tar_filters; i++) {
> 
> (provided that 'cat' magically does not suffer from the same problem,
> and I do think that it does not.)

The external cat can indeed be used.  We'd need to do that for user
supplied commands as well, though, like this (ugh):

diff --git a/archive-tar.c b/archive-tar.c
index 20af005..eaa9a1c 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -326,6 +326,9 @@ static int write_tar_filter_archive(const struct archiver *ar,
 		die("BUG: tar-filter archiver called with no filter defined");
 
 	strbuf_addstr(&cmd, ar->data);
+#ifdef WIN32
+	strbuf_addstr(&cmd, " | cat");
+#endif
 	if (args->compression_level >= 0)
 		strbuf_addf(&cmd, " -%d", args->compression_level);
 

> Anyway, I think it would be better to address the problem on the msysgit
> side. Perhaps by providing a gzip of a different vintage (e.g. a
> self-compiled one) that does not suffer from the CRLF conversion issue.

Users would probably also need sanitized versions of bzip2 and xz, and
perhaps more.

If MSYS could be asked to refrain from switching file descriptors to text
mode, e.g. by setting an environment variable, we could solve the issue
in a generic way instead.

René

^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-08 17:10                                       ` René Scharfe
@ 2011-08-09  5:02                                         ` Jeff King
  2011-08-09 10:25                                           ` René Scharfe
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-08-09  5:02 UTC (permalink / raw)
  To: René Scharfe; +Cc: Johannes Sixt, Erik Faye-Lund, Junio C Hamano, git

On Mon, Aug 08, 2011 at 07:10:01PM +0200, René Scharfe wrote:

> > -	tar_filter_config("tar.tgz.command", "gzip -cn", NULL);
> > +	tar_filter_config("tar.tgz.command", "gzip -cn | cat", NULL);
> >  	tar_filter_config("tar.tgz.remote", "true", NULL);
> > -	tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL);
> > +	tar_filter_config("tar.tar.gz.command", "gzip -cn | cat", NULL);
> >  	tar_filter_config("tar.tar.gz.remote", "true", NULL);
> >  	git_config(git_tar_config, NULL);
> >  	for (i = 0; i < nr_tar_filters; i++) {
> > 
> > (provided that 'cat' magically does not suffer from the same problem,
> > and I do think that it does not.)
> 
> The external cat can indeed be used.  We'd need to do that for user
> supplied commands as well, though, like this (ugh):
> 
> diff --git a/archive-tar.c b/archive-tar.c
> index 20af005..eaa9a1c 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -326,6 +326,9 @@ static int write_tar_filter_archive(const struct archiver *ar,
>  		die("BUG: tar-filter archiver called with no filter defined");
>  
>  	strbuf_addstr(&cmd, ar->data);
> +#ifdef WIN32
> +	strbuf_addstr(&cmd, " | cat");
> +#endif
>  	if (args->compression_level >= 0)
>  		strbuf_addf(&cmd, " -%d", args->compression_level);

Do we need to? It seems to me that defaulting to "gzip -cn | cat" is not
"we are on Windows, a platform that needs a special workaround in git",
but rather "this gzip is horribly broken, but at build-time you can
set a gzip that works".

So if the user wants to specify some broken filter, it is up to them to
add "| cat" if their filter merits it.

But that is somewhat a matter of perception, and it won't make a user on
Windows who does "git config archive.bz2 bzip2 -c" any happier when they
are told it is their responsibility to deal with it.

BTW, as nice as this "gzip -cn | cat" idea is, I think it needs to be
wrapped in a shell script. With the code above, we will generate "gzip
-cn | cat -9". So we really need:

  $ cat `which gzip`
  #!/bin/sh
  gzip.real -cn "$@" | cat

and then no hacks need to go into git at all. The fix is about providing
a sane gzip, not fixing git.

BTW, from what Johannes said, the issue is about a non-msys program
calling an msys one. Does that mean that having git run:

  sh -c 'gzip -cn'

would work? If so, then could the solution be as simple as turning off
the "don't bother with the shell" optimization that run-command uses?
Something like "gzip -cn" gets split by git and run via spawn now
(because it has no metacharacters). But we could easily make it always
run through the shell.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-09  5:02                                         ` Jeff King
@ 2011-08-09 10:25                                           ` René Scharfe
  2011-08-09 20:05                                             ` Jeff King
  0 siblings, 1 reply; 33+ messages in thread
From: René Scharfe @ 2011-08-09 10:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Erik Faye-Lund, Junio C Hamano, git

Am 09.08.2011 07:02, schrieb Jeff King:
> On Mon, Aug 08, 2011 at 07:10:01PM +0200, René Scharfe wrote:
> 
>>> -	tar_filter_config("tar.tgz.command", "gzip -cn", NULL);
>>> +	tar_filter_config("tar.tgz.command", "gzip -cn | cat", NULL);
>>>  	tar_filter_config("tar.tgz.remote", "true", NULL);
>>> -	tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL);
>>> +	tar_filter_config("tar.tar.gz.command", "gzip -cn | cat", NULL);
>>>  	tar_filter_config("tar.tar.gz.remote", "true", NULL);
>>>  	git_config(git_tar_config, NULL);
>>>  	for (i = 0; i < nr_tar_filters; i++) {
>>>
>>> (provided that 'cat' magically does not suffer from the same problem,
>>> and I do think that it does not.)
>>
>> The external cat can indeed be used.  We'd need to do that for user
>> supplied commands as well, though, like this (ugh):
>>
>> diff --git a/archive-tar.c b/archive-tar.c
>> index 20af005..eaa9a1c 100644
>> --- a/archive-tar.c
>> +++ b/archive-tar.c
>> @@ -326,6 +326,9 @@ static int write_tar_filter_archive(const struct archiver *ar,
>>  		die("BUG: tar-filter archiver called with no filter defined");
>>  
>>  	strbuf_addstr(&cmd, ar->data);
>> +#ifdef WIN32
>> +	strbuf_addstr(&cmd, " | cat");
>> +#endif
>>  	if (args->compression_level >= 0)
>>  		strbuf_addf(&cmd, " -%d", args->compression_level);
> 
> Do we need to? It seems to me that defaulting to "gzip -cn | cat" is not
> "we are on Windows, a platform that needs a special workaround in git",
> but rather "this gzip is horribly broken, but at build-time you can
> set a gzip that works".

It's an MSYS platform problem IMHO, and we can work around it.  That
extra cat is nothing to be proud of, sure.  And msysgit avoiding the
issue by distibuting a gzip that never converts line endings would solve
it without any runtime overhead.

> So if the user wants to specify some broken filter, it is up to them to
> add "| cat" if their filter merits it.

Well, I'd rather take this responsibility from the user, who might
struggle a while before finding out what is going on.  However...

> But that is somewhat a matter of perception, and it won't make a user on
> Windows who does "git config archive.bz2 bzip2 -c" any happier when they
> are told it is their responsibility to deal with it.

If all packers shipped with msysgit are usable then we're probably good.

> BTW, as nice as this "gzip -cn | cat" idea is, I think it needs to be
> wrapped in a shell script. With the code above, we will generate "gzip
> -cn | cat -9".

Yes, the three added lines in the patch above would have to be moved
down two lines, after the compression level is added.  D'oh!

> So we really need:
> 
>   $ cat `which gzip`
>   #!/bin/sh
>   gzip.real -cn "$@" | cat
>
> and then no hacks need to go into git at all. The fix is about providing
> a sane gzip, not fixing git.

OK, that's one way to do it; another would be let gzip (and bzip2 etc.)
do whatever cat does to avoid end of line conversions.  And yet another
is to take them from http://unxutils.sourceforge.net/.

> BTW, from what Johannes said, the issue is about a non-msys program
> calling an msys one. Does that mean that having git run:
> 
>   sh -c 'gzip -cn'
> 
> would work? If so, then could the solution be as simple as turning off
> the "don't bother with the shell" optimization that run-command uses?
> Something like "gzip -cn" gets split by git and run via spawn now
> (because it has no metacharacters). But we could easily make it always
> run through the shell.

Just checked -- it doesn't work.  I assume that's because the shell is
also an MSYS program.

René

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-09 10:25                                           ` René Scharfe
@ 2011-08-09 20:05                                             ` Jeff King
  2011-09-29 19:54                                               ` Erik Faye-Lund
  0 siblings, 1 reply; 33+ messages in thread
From: Jeff King @ 2011-08-09 20:05 UTC (permalink / raw)
  To: René Scharfe; +Cc: Johannes Sixt, Erik Faye-Lund, Junio C Hamano, git

On Tue, Aug 09, 2011 at 12:25:42PM +0200, René Scharfe wrote:

> > BTW, as nice as this "gzip -cn | cat" idea is, I think it needs to be
> > wrapped in a shell script. With the code above, we will generate "gzip
> > -cn | cat -9".
> 
> Yes, the three added lines in the patch above would have to be moved
> down two lines, after the compression level is added.  D'oh!

Also, is adding "| cat" also sufficient for arbitrary shell code (i.e.,
whatever the user hands us via the config)? I couldn't think of an
example that wouldn't work.

> OK, that's one way to do it; another would be let gzip (and bzip2 etc.)
> do whatever cat does to avoid end of line conversions.  And yet another
> is to take them from http://unxutils.sourceforge.net/.

Yeah, I like all of those solutions better than hacking an extra pipe
into git. I don't know enough to say how painful they are in practice,
though.

> > BTW, from what Johannes said, the issue is about a non-msys program
> > calling an msys one. Does that mean that having git run:
> > 
> >   sh -c 'gzip -cn'
> > 
> > would work? If so, then could the solution be as simple as turning off
> > the "don't bother with the shell" optimization that run-command uses?
> > Something like "gzip -cn" gets split by git and run via spawn now
> > (because it has no metacharacters). But we could easily make it always
> > run through the shell.
> 
> Just checked -- it doesn't work.  I assume that's because the shell is
> also an MSYS program.

Too bad. I guess the first msys program to be run sets the mode of the
output descriptor, and then everybody else inherits that. If there were
a helper or shell builtin to switch to binary mode, we could put it at
the start of the shell pipeline.

If you write a C program that does:

  setmode(1, O_BINARY);

and call it like:

  ./setmode-helper; gzip -cn

does that work? I suspect not, as from my brief reading of the msys gzip
source, it actually calls setmode() itself.

-Peff

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-08-09 20:05                                             ` Jeff King
@ 2011-09-29 19:54                                               ` Erik Faye-Lund
  2011-09-29 20:18                                                 ` René Scharfe
  0 siblings, 1 reply; 33+ messages in thread
From: Erik Faye-Lund @ 2011-09-29 19:54 UTC (permalink / raw)
  To: Jeff King, msysGit; +Cc: René Scharfe, Johannes Sixt, Junio C Hamano, git

OK, this is quite late, but I've got some news on the subject.

On Tue, Aug 9, 2011 at 10:05 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Aug 09, 2011 at 12:25:42PM +0200, René Scharfe wrote:
>
>> > BTW, as nice as this "gzip -cn | cat" idea is, I think it needs to be
>> > wrapped in a shell script. With the code above, we will generate "gzip
>> > -cn | cat -9".
>>
>> Yes, the three added lines in the patch above would have to be moved
>> down two lines, after the compression level is added.  D'oh!
>
> Also, is adding "| cat" also sufficient for arbitrary shell code (i.e.,
> whatever the user hands us via the config)? I couldn't think of an
> example that wouldn't work.
>
>> OK, that's one way to do it; another would be let gzip (and bzip2 etc.)
>> do whatever cat does to avoid end of line conversions.  And yet another
>> is to take them from http://unxutils.sourceforge.net/.
>
> Yeah, I like all of those solutions better than hacking an extra pipe
> into git. I don't know enough to say how painful they are in practice,
> though.
>

It turns out, if I compile gzip myself from source the test pass just
fine. So the problem seems indeed to be MSYS vs non-MSYS applications.

So the way forward is probably to just change to a self-compiled gzip
in msysGit (and Git for Windows by proxy).

I'll post the patch to build a native-gzip to the msysGit mailing list
(I only got 1.2.4 to compile on Windows, but it's the same version as
we have in msysGit so it's probably fine), and post the latest version
of this series soon...

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-09-29 19:54                                               ` Erik Faye-Lund
@ 2011-09-29 20:18                                                 ` René Scharfe
  2011-09-29 20:20                                                   ` Erik Faye-Lund
  0 siblings, 1 reply; 33+ messages in thread
From: René Scharfe @ 2011-09-29 20:18 UTC (permalink / raw)
  To: kusmabite; +Cc: Jeff King, msysGit, Johannes Sixt, Junio C Hamano, git

Am 29.09.2011 21:54, schrieb Erik Faye-Lund:
> So the way forward is probably to just change to a self-compiled gzip
> in msysGit (and Git for Windows by proxy).
> 
> I'll post the patch to build a native-gzip to the msysGit mailing list
> (I only got 1.2.4 to compile on Windows, but it's the same version as
> we have in msysGit so it's probably fine), and post the latest version
> of this series soon...

Good to hear that.

By the way, are you aware of http://unxutils.sourceforge.net/ (includes
a native gzip 1.2.4 binary)?

Or http://zlib.net/pigz/ (parallel gzip, no binaries)?

René

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 4/4] upload-archive: use start_command instead of fork
  2011-09-29 20:18                                                 ` René Scharfe
@ 2011-09-29 20:20                                                   ` Erik Faye-Lund
  0 siblings, 0 replies; 33+ messages in thread
From: Erik Faye-Lund @ 2011-09-29 20:20 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, msysGit, Johannes Sixt, Junio C Hamano, git

On Thu, Sep 29, 2011 at 10:18 PM, René Scharfe
<rene.scharfe@lsrfire.ath.cx> wrote:
> Am 29.09.2011 21:54, schrieb Erik Faye-Lund:
>> So the way forward is probably to just change to a self-compiled gzip
>> in msysGit (and Git for Windows by proxy).
>>
>> I'll post the patch to build a native-gzip to the msysGit mailing list
>> (I only got 1.2.4 to compile on Windows, but it's the same version as
>> we have in msysGit so it's probably fine), and post the latest version
>> of this series soon...
>
> Good to hear that.
>
> By the way, are you aware of http://unxutils.sourceforge.net/ (includes
> a native gzip 1.2.4 binary)?

Yes, but we prefer to build our binaries ourselves in the msysGit project.

> Or http://zlib.net/pigz/ (parallel gzip, no binaries)?

I'm aware of pigz, but changing the gzip implementation isn't really
the scope of this change IMO.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2011-09-29 20:21 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-18 18:08 [PATCH v2 0/4] port upload-archive to Windows Erik Faye-Lund
2011-07-18 18:08 ` [PATCH v2 1/4] compat/win32/sys/poll.c: upgrade from upstream Erik Faye-Lund
2011-07-18 18:08 ` [PATCH v2 2/4] mingw: fix compilation of poll-emulation Erik Faye-Lund
2011-07-18 18:08 ` [PATCH v2 3/4] enter_repo: do not modify input Erik Faye-Lund
2011-07-18 18:08 ` [PATCH v2 4/4] upload-archive: use start_command instead of fork Erik Faye-Lund
2011-07-19 21:09   ` Junio C Hamano
2011-07-28  8:32     ` Erik Faye-Lund
2011-07-28 16:08       ` Jeff King
2011-07-28 16:47         ` Jeff King
2011-07-28 17:02           ` Jeff King
2011-08-01 14:45             ` Erik Faye-Lund
2011-08-01 17:46               ` Jeff King
2011-08-01 18:02                 ` Erik Faye-Lund
2011-08-01 18:25                   ` Jeff King
2011-08-01 20:48                     ` René Scharfe
2011-08-01 21:20                       ` Johannes Sixt
2011-08-01 21:42                         ` René Scharfe
2011-08-01 21:52                         ` René Scharfe
2011-08-02  4:00                           ` Jeff King
2011-08-02 16:46                             ` René Scharfe
2011-08-02 18:13                               ` Jeff King
2011-08-02 23:37                                 ` Johannes Sixt
2011-08-03  5:49                                   ` Jeff King
2011-08-06  9:40                                   ` René Scharfe
2011-08-07 20:02                                     ` Johannes Sixt
2011-08-07 21:06                                       ` Jeff King
2011-08-08 17:10                                       ` René Scharfe
2011-08-09  5:02                                         ` Jeff King
2011-08-09 10:25                                           ` René Scharfe
2011-08-09 20:05                                             ` Jeff King
2011-09-29 19:54                                               ` Erik Faye-Lund
2011-09-29 20:18                                                 ` René Scharfe
2011-09-29 20:20                                                   ` Erik Faye-Lund

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.