All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] friendlier handling of overflows in archive-tar
@ 2016-06-16  4:35 Jeff King
  2016-06-16  4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
                   ` (3 more replies)
  0 siblings, 4 replies; 61+ messages in thread
From: Jeff King @ 2016-06-16  4:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

The ustar format has some fixed-length numeric fields, and it's possible
to generate a git tree that can't be represented (namely file size and
mtime). Since f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we detect and die() in these cases. But we can
actually do the friendly (and POSIX-approved) thing, and add extended
pax headers to represent the correct values.

  [1/2]: archive-tar: write extended headers for file sizes >= 8GB
  [2/2]: archive-tar: write extended headers for far-future mtime

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-16  4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King
@ 2016-06-16  4:37 ` Jeff King
  2016-06-20 22:54   ` René Scharfe
  2016-06-21 19:44   ` Robin H. Johnson
  2016-06-16  4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 61+ messages in thread
From: Jeff King @ 2016-06-16  4:37 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

The ustar format has a fixed-length field for the size of
each file entry which is supposed to contain up to 11 bytes
of octal-formatted data plus a NUL or space terminator.

These means that the largest size we can represent is
077777777777, or 1 byte short of 8GB. The correct solution
for a larger file, according to POSIX.1-2001, is to add an
extended pax header, similar to how we handle long
filenames. This patch does that, and writes zero for the
size field in the ustar header (the last bit is not
mentioned by POSIX, but it matches how GNU tar behaves with
--format=pax).

This should be a strict improvement over the current
behavior, which is to die in xsnprintf with a "BUG".
However, there's some interesting history here.

Prior to f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we silently overflowed the "size"
field. The extra bytes ended up in the "mtime" field of the
header, which was then immediately written itself,
overwriting our extra bytes. What that means depends on how
many bytes we wrote.

If the size was 64GB or greater, then we actually overflowed
digits into the mtime field, meaning our value was was
effectively right-shifted by those lost octal digits. And
this patch is again a strict improvement over that.

But if the size was between 8GB and 64GB, then our 12-byte
field held all of the actual digits, and only our NUL
terminator overflowed. According to POSIX, there should be a
NUL or space at the end of the field. However, GNU tar seems
to be lenient here, and will correctly parse a size up 64GB
(minus one) from the field. So sizes in this range might
have just worked, depending on the implementation reading
the tarfile.

This patch is mostly still an improvement there, as the 8GB
limit is specifically mentioned in POSIX as the correct
limit. But it's possible that it could be a regression
(versus the pre-f2f0267 state) if all of the following are
true:

  1. You have a file between 8GB and 64GB.

  2. Your tar implementation _doesn't_ know about pax
     extended headers.

  3. Your tar implementation _does_ parse 12-byte sizes from
     the ustar header without a delimiter.

It's probably not worth worrying about such an obscure set
of conditions, but I'm documenting it here just in case.

There's no test included here. I did confirm that this works
(using GNU tar) with:

  $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge
  $ git add huge
  $ git commit -q -m foo
  $ git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null
  -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge

Pre-f2f0267, this would yield a bogus size of 8GB, and
post-f2f0267, git-archive simply dies.

Unfortunately, it's quite an expensive test to run. For one
thing, unless your filesystem supports files with holes, it
takes 64GB of disk space (you might think piping straight to
`hash-object --stdin` would be better, but it's not; that
tries to buffer all 64GB in RAM!). Furthermore, hashing and
compressing the object takes several minutes of CPU time.

We could ship just the resulting compressed object data as a
loose object, but even that takes 64MB. So sadly, this code
path remains untested in the test suite.

Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index cb99df2..7340b64 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
 	strbuf_addch(sb, '\n');
 }
 
+/*
+ * Like strbuf_append_ext_header, but for numeric values.
+ */
+static void strbuf_append_ext_header_uint(struct strbuf *sb,
+					  const char *keyword,
+					  uintmax_t value)
+{
+	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
+	int len;
+
+	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
+	strbuf_append_ext_header(sb, keyword, buf, len);
+}
+
 static unsigned int ustar_header_chksum(const struct ustar_header *header)
 {
 	const unsigned char *p = (const unsigned char *)header;
@@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
 	return i;
 }
 
+static inline unsigned long ustar_size(uintmax_t size)
+{
+	if (size < 077777777777UL)
+		return size;
+	else
+		return 0;
+}
+
 static void prepare_header(struct archiver_args *args,
 			   struct ustar_header *header,
 			   unsigned int mode, unsigned long size)
 {
 	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
-	xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0);
+	xsnprintf(header->size, sizeof(header->size), "%011lo",
+		  S_ISREG(mode) ? ustar_size(size) : 0);
 	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
 
 	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
@@ -267,6 +290,9 @@ static int write_tar_entry(struct archiver_args *args,
 			memcpy(header.linkname, buffer, size);
 	}
 
+	if (ustar_size(size) != size)
+		strbuf_append_ext_header_uint(&ext_header, "size", size);
+
 	prepare_header(args, &header, mode, size);
 
 	if (ext_header.len > 0) {
-- 
2.9.0.150.g8bd4cf6

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 2/2] archive-tar: write extended headers for far-future mtime
  2016-06-16  4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King
  2016-06-16  4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
@ 2016-06-16  4:37 ` Jeff King
  2016-06-20 22:54   ` René Scharfe
  2016-06-16 17:55 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
  2016-06-21 16:16 ` Jeff King
  3 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-16  4:37 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

The ustar format represents timestamps as seconds since the
epoch, but only has room to store 11 octal digits.  To
express anything larger, we need to use an extended header.
This is exactly the same case we fixed for the size field in
the previous commit, and the solution here follows the same
pattern.

This is even mentioned as an issue in f2f0267 (archive-tar:
use xsnprintf for trivial formatting, 2015-09-24), but since
it only affected things far in the future, it wasn't worth
dealing with. But note that my calculations claiming
thousands of years were off there; because our xsnprintf
produces a NUL byte, we only have until the year 2242 to
fix this.

Given that this is just around the corner (geologically
speaking), and because the fix for "size" makes doing this
on top trivial, let's just make it work.

Note that we don't (and never did) handle negative
timestamps (i.e., before 1970). This would probably not be
too hard to support in the same way, but since git does not
support negative timestamps at all, I didn't bother here.

Unlike the last patch for "size", this one is easy to test
efficiently (the magic date is octal 01000000000001):

  $ echo content >file
  $ git add file
  $ GIT_COMMITTER_DATE='@68719476737 +0000' \
    git commit -q -m 'tempori parendum'
  $ git archive HEAD | tar tvf -
  -rw-rw-r-- root/root         8 4147-08-20 03:32 file

With the current tip of master, this dies in xsnprintf (and
prior to f2f0267, it overflowed into the checksum field, and
the resulting tarfile claimed to be from the year 2242).

However, I decided not to include this in the test suite,
because I suspect it will create portability headaches,
including:

  1. The exact format of the system tar's "t" output.

  2. System tars that cannot handle pax headers.

  3. System tars tha cannot handle dates that far in the
     future.

  4. If we replace "tar t" with "tar x", then filesystems
     that cannot handle dates that far in the future.

In other words, we do not really have a reliable tar reader
for these corner cases, so the best we could do is a
byte-for-byte comparison of the output.

Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index 7340b64..749722f 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -185,6 +185,14 @@ static inline unsigned long ustar_size(uintmax_t size)
 		return 0;
 }
 
+static inline unsigned long ustar_mtime(time_t mtime)
+{
+	if (mtime < 077777777777)
+		return mtime;
+	else
+		return 0;
+}
+
 static void prepare_header(struct archiver_args *args,
 			   struct ustar_header *header,
 			   unsigned int mode, unsigned long size)
@@ -192,7 +200,8 @@ static void prepare_header(struct archiver_args *args,
 	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
 	xsnprintf(header->size, sizeof(header->size), "%011lo",
 		  S_ISREG(mode) ? ustar_size(size) : 0);
-	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
+	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo",
+		  ustar_mtime(args->time));
 
 	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
 	xsnprintf(header->gid, sizeof(header->gid), "%07o", 0);
@@ -292,6 +301,8 @@ static int write_tar_entry(struct archiver_args *args,
 
 	if (ustar_size(size) != size)
 		strbuf_append_ext_header_uint(&ext_header, "size", size);
+	if (ustar_mtime(args->time) != args->time)
+		strbuf_append_ext_header_uint(&ext_header, "mtime", args->time);
 
 	prepare_header(args, &header, mode, size);
 
-- 
2.9.0.150.g8bd4cf6

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 0/2] friendlier handling of overflows in archive-tar
  2016-06-16  4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King
  2016-06-16  4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
  2016-06-16  4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King
@ 2016-06-16 17:55 ` Junio C Hamano
  2016-06-21 16:16 ` Jeff King
  3 siblings, 0 replies; 61+ messages in thread
From: Junio C Hamano @ 2016-06-16 17:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

Both patches looked good to me. Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-16  4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
@ 2016-06-20 22:54   ` René Scharfe
  2016-06-21 15:59     ` Jeff King
  2016-06-21 19:44   ` Robin H. Johnson
  1 sibling, 1 reply; 61+ messages in thread
From: René Scharfe @ 2016-06-20 22:54 UTC (permalink / raw)
  To: Jeff King, git

Am 16.06.2016 um 06:37 schrieb Jeff King:
> The ustar format has a fixed-length field for the size of
> each file entry which is supposed to contain up to 11 bytes
> of octal-formatted data plus a NUL or space terminator.
>
> These means that the largest size we can represent is
> 077777777777, or 1 byte short of 8GB. The correct solution
> for a larger file, according to POSIX.1-2001, is to add an
> extended pax header, similar to how we handle long
> filenames. This patch does that, and writes zero for the
> size field in the ustar header (the last bit is not
> mentioned by POSIX, but it matches how GNU tar behaves with
> --format=pax).
>
> This should be a strict improvement over the current
> behavior, which is to die in xsnprintf with a "BUG".
> However, there's some interesting history here.
>
> Prior to f2f0267 (archive-tar: use xsnprintf for trivial
> formatting, 2015-09-24), we silently overflowed the "size"
> field. The extra bytes ended up in the "mtime" field of the
> header, which was then immediately written itself,
> overwriting our extra bytes. What that means depends on how
> many bytes we wrote.
>
> If the size was 64GB or greater, then we actually overflowed
> digits into the mtime field, meaning our value was was
> effectively right-shifted by those lost octal digits. And
> this patch is again a strict improvement over that.
>
> But if the size was between 8GB and 64GB, then our 12-byte
> field held all of the actual digits, and only our NUL
> terminator overflowed. According to POSIX, there should be a
> NUL or space at the end of the field. However, GNU tar seems
> to be lenient here, and will correctly parse a size up 64GB
> (minus one) from the field. So sizes in this range might
> have just worked, depending on the implementation reading
> the tarfile.
>
> This patch is mostly still an improvement there, as the 8GB
> limit is specifically mentioned in POSIX as the correct
> limit. But it's possible that it could be a regression
> (versus the pre-f2f0267 state) if all of the following are
> true:
>
>    1. You have a file between 8GB and 64GB.
>
>    2. Your tar implementation _doesn't_ know about pax
>       extended headers.
>
>    3. Your tar implementation _does_ parse 12-byte sizes from
>       the ustar header without a delimiter.
>
> It's probably not worth worrying about such an obscure set
> of conditions, but I'm documenting it here just in case.
>
> There's no test included here. I did confirm that this works
> (using GNU tar) with:
>
>    $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge
>    $ git add huge
>    $ git commit -q -m foo
>    $ git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null
>    -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge
>
> Pre-f2f0267, this would yield a bogus size of 8GB, and
> post-f2f0267, git-archive simply dies.
>
> Unfortunately, it's quite an expensive test to run. For one
> thing, unless your filesystem supports files with holes, it
> takes 64GB of disk space (you might think piping straight to
> `hash-object --stdin` would be better, but it's not; that
> tries to buffer all 64GB in RAM!). Furthermore, hashing and
> compressing the object takes several minutes of CPU time.
>
> We could ship just the resulting compressed object data as a
> loose object, but even that takes 64MB. So sadly, this code
> path remains untested in the test suite.

If we could set the limit to a lower value than 8GB for testing then we 
could at least check if the extended header is written, e.g. if 
ustar_size() could be convinced to return 0 every time using a hidden 
command line parameter or an environment variable or something better.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>   archive-tar.c | 28 +++++++++++++++++++++++++++-
>   1 file changed, 27 insertions(+), 1 deletion(-)
>
> diff --git a/archive-tar.c b/archive-tar.c
> index cb99df2..7340b64 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
>   	strbuf_addch(sb, '\n');
>   }
>
> +/*
> + * Like strbuf_append_ext_header, but for numeric values.
> + */
> +static void strbuf_append_ext_header_uint(struct strbuf *sb,
> +					  const char *keyword,
> +					  uintmax_t value)
> +{
> +	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
> +	int len;
> +
> +	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
> +	strbuf_append_ext_header(sb, keyword, buf, len);
> +}
> +
>   static unsigned int ustar_header_chksum(const struct ustar_header *header)
>   {
>   	const unsigned char *p = (const unsigned char *)header;
> @@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
>   	return i;
>   }
>
> +static inline unsigned long ustar_size(uintmax_t size)
> +{
> +	if (size < 077777777777UL)

Shouldn't that be less-or-equal?

> +		return size;
> +	else
> +		return 0;
> +}
> +
>   static void prepare_header(struct archiver_args *args,
>   			   struct ustar_header *header,
>   			   unsigned int mode, unsigned long size)
>   {
>   	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
> -	xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0);
> +	xsnprintf(header->size, sizeof(header->size), "%011lo",
> +		  S_ISREG(mode) ? ustar_size(size) : 0);
>   	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
>
>   	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
> @@ -267,6 +290,9 @@ static int write_tar_entry(struct archiver_args *args,
>   			memcpy(header.linkname, buffer, size);
>   	}
>
> +	if (ustar_size(size) != size)
> +		strbuf_append_ext_header_uint(&ext_header, "size", size);

It needs "S_ISREG(mode) && " as well, no?  In practice it probably 
doesn't matter (until someone stores a 8GB long symlink target), but the 
size field should only be set for regular files.

> +
>   	prepare_header(args, &header, mode, size);
>
>   	if (ext_header.len > 0) {
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime
  2016-06-16  4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King
@ 2016-06-20 22:54   ` René Scharfe
  2016-06-22  5:46     ` René Scharfe
  0 siblings, 1 reply; 61+ messages in thread
From: René Scharfe @ 2016-06-20 22:54 UTC (permalink / raw)
  To: Jeff King, git

Am 16.06.2016 um 06:37 schrieb Jeff King:
> The ustar format represents timestamps as seconds since the
> epoch, but only has room to store 11 octal digits.  To
> express anything larger, we need to use an extended header.
> This is exactly the same case we fixed for the size field in
> the previous commit, and the solution here follows the same
> pattern.
>
> This is even mentioned as an issue in f2f0267 (archive-tar:
> use xsnprintf for trivial formatting, 2015-09-24), but since
> it only affected things far in the future, it wasn't worth
> dealing with. But note that my calculations claiming
> thousands of years were off there; because our xsnprintf
> produces a NUL byte, we only have until the year 2242 to
> fix this.
>
> Given that this is just around the corner (geologically
> speaking), and because the fix for "size" makes doing this
> on top trivial, let's just make it work.
>
> Note that we don't (and never did) handle negative
> timestamps (i.e., before 1970). This would probably not be
> too hard to support in the same way, but since git does not
> support negative timestamps at all, I didn't bother here.
>
> Unlike the last patch for "size", this one is easy to test
> efficiently (the magic date is octal 01000000000001):
>
>    $ echo content >file
>    $ git add file
>    $ GIT_COMMITTER_DATE='@68719476737 +0000' \
>      git commit -q -m 'tempori parendum'
>    $ git archive HEAD | tar tvf -
>    -rw-rw-r-- root/root         8 4147-08-20 03:32 file
>
> With the current tip of master, this dies in xsnprintf (and
> prior to f2f0267, it overflowed into the checksum field, and
> the resulting tarfile claimed to be from the year 2242).
>
> However, I decided not to include this in the test suite,
> because I suspect it will create portability headaches,
> including:
>
>    1. The exact format of the system tar's "t" output.

Probably not worth it.

>    2. System tars that cannot handle pax headers.

In t5000 there is a simple interpreter for path headers for systems 
whose tar doesn't handle them.  Adding one for mtime headers may be 
feasible.

It's just a bit complicated to link a pax header file to the file it 
applies to when it doesn't also contain a path header.  But we know that 
the mtime of all entries in tar files created by git archive are is the 
same, so we could simply read one header and then adjust the mtime of 
all files accordingly.

>    3. System tars tha cannot handle dates that far in the
>       future.
>
>    4. If we replace "tar t" with "tar x", then filesystems
>       that cannot handle dates that far in the future.

We can test that by supplying a test archive and set a prerequisite if 
tar (possibly with our header interpreter) and the filesystem can cope 
with that.

> In other words, we do not really have a reliable tar reader
> for these corner cases, so the best we could do is a
> byte-for-byte comparison of the output.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>   archive-tar.c | 13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/archive-tar.c b/archive-tar.c
> index 7340b64..749722f 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -185,6 +185,14 @@ static inline unsigned long ustar_size(uintmax_t size)
>   		return 0;
>   }
>
> +static inline unsigned long ustar_mtime(time_t mtime)
> +{
> +	if (mtime < 077777777777)

That should be less-or-equal, right?

> +		return mtime;
> +	else
> +		return 0;
> +}
> +
>   static void prepare_header(struct archiver_args *args,
>   			   struct ustar_header *header,
>   			   unsigned int mode, unsigned long size)
> @@ -192,7 +200,8 @@ static void prepare_header(struct archiver_args *args,
>   	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
>   	xsnprintf(header->size, sizeof(header->size), "%011lo",
>   		  S_ISREG(mode) ? ustar_size(size) : 0);
> -	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
> +	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo",
> +		  ustar_mtime(args->time));
>
>   	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
>   	xsnprintf(header->gid, sizeof(header->gid), "%07o", 0);
> @@ -292,6 +301,8 @@ static int write_tar_entry(struct archiver_args *args,
>
>   	if (ustar_size(size) != size)
>   		strbuf_append_ext_header_uint(&ext_header, "size", size);
> +	if (ustar_mtime(args->time) != args->time)
> +		strbuf_append_ext_header_uint(&ext_header, "mtime", args->time);
>
>   	prepare_header(args, &header, mode, size);
>
>

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-20 22:54   ` René Scharfe
@ 2016-06-21 15:59     ` Jeff King
  2016-06-21 16:02       ` Jeff King
                         ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Jeff King @ 2016-06-21 15:59 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote:

> > Unfortunately, it's quite an expensive test to run. For one
> > thing, unless your filesystem supports files with holes, it
> > takes 64GB of disk space (you might think piping straight to
> > `hash-object --stdin` would be better, but it's not; that
> > tries to buffer all 64GB in RAM!). Furthermore, hashing and
> > compressing the object takes several minutes of CPU time.
> > 
> > We could ship just the resulting compressed object data as a
> > loose object, but even that takes 64MB. So sadly, this code
> > path remains untested in the test suite.
> 
> If we could set the limit to a lower value than 8GB for testing then we
> could at least check if the extended header is written, e.g. if ustar_size()
> could be convinced to return 0 every time using a hidden command line
> parameter or an environment variable or something better.

Yes, we could do that, though I think it loses most of the value of the
test. We can check that if we hit an arbitrary value we generate the pax
header, but I think what we _really_ care about is: did we generate an
output that somebody else's tar implementation can handle.

And for the smaller-than-64GB case, GNU tar happily handles our existing
output (though I suspect other tars might fail at "only" 8GB).

> > +static inline unsigned long ustar_size(uintmax_t size)
> > +{
> > +	if (size < 077777777777UL)
> 
> Shouldn't that be less-or-equal?

Yeah, you're right (and for the one in the next patch, too).

> > +	if (ustar_size(size) != size)
> > +		strbuf_append_ext_header_uint(&ext_header, "size", size);
> 
> It needs "S_ISREG(mode) && " as well, no?  In practice it probably doesn't
> matter (until someone stores a 8GB long symlink target), but the size field
> should only be set for regular files.

Thanks for noticing that. I remembered wondering that when I was early
in debugging/diagnosing, but forgot to follow up on it. I agree it's
unlikely in practice, but we should have consistent checks (I think it
would actually make sense to move the ISREG check inside ustar_size, and
then we can apply it consistently here and when generating the header;
my goal with ustar_size() was to avoid having the same logic in multiple
places).

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 15:59     ` Jeff King
@ 2016-06-21 16:02       ` Jeff King
  2016-06-21 20:42       ` René Scharfe
  2016-06-21 20:54       ` René Scharfe
  2 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-21 16:02 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Tue, Jun 21, 2016 at 11:59:20AM -0400, Jeff King wrote:

> > > +	if (ustar_size(size) != size)
> > > +		strbuf_append_ext_header_uint(&ext_header, "size", size);
> > 
> > It needs "S_ISREG(mode) && " as well, no?  In practice it probably doesn't
> > matter (until someone stores a 8GB long symlink target), but the size field
> > should only be set for regular files.
> 
> Thanks for noticing that. I remembered wondering that when I was early
> in debugging/diagnosing, but forgot to follow up on it. I agree it's
> unlikely in practice, but we should have consistent checks (I think it
> would actually make sense to move the ISREG check inside ustar_size, and
> then we can apply it consistently here and when generating the header;
> my goal with ustar_size() was to avoid having the same logic in multiple
> places).

Actually, scratch that. It makes things awkward because it would be hard
to tell when ustar_size() returned zero because it's a file with a big
size, and thus needs a pax header, versus that it was not a file, and
therefore must _not_ have a pax header.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 0/2] friendlier handling of overflows in archive-tar
  2016-06-16  4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King
                   ` (2 preceding siblings ...)
  2016-06-16 17:55 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
@ 2016-06-21 16:16 ` Jeff King
  2016-06-21 16:16   ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
                     ` (3 more replies)
  3 siblings, 4 replies; 61+ messages in thread
From: Jeff King @ 2016-06-21 16:16 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe

On Thu, Jun 16, 2016 at 12:35:23AM -0400, Jeff King wrote:

> The ustar format has some fixed-length numeric fields, and it's possible
> to generate a git tree that can't be represented (namely file size and
> mtime). Since f2f0267 (archive-tar: use xsnprintf for trivial
> formatting, 2015-09-24), we detect and die() in these cases. But we can
> actually do the friendly (and POSIX-approved) thing, and add extended
> pax headers to represent the correct values.
> 
>   [1/2]: archive-tar: write extended headers for file sizes >= 8GB
>   [2/2]: archive-tar: write extended headers for far-future mtime

And here's a v2 that addresses the smaller comments from René. I punted
on doing something fancy with tests. I'm not opposed to it, but I'm also
not convinced it's all that useful. Either way, I think it can come on
top if we want it.

Junio, this is the jk/big-and-old-archive-tar topic.

The interdiff is:

diff --git a/archive-tar.c b/archive-tar.c
index c7b85fd..ed562d4 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -179,7 +179,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
 
 static inline unsigned long ustar_size(uintmax_t size)
 {
-	if (size < 077777777777UL)
+	if (size <= 077777777777UL)
 		return size;
 	else
 		return 0;
@@ -187,7 +187,7 @@ static inline unsigned long ustar_size(uintmax_t size)
 
 static inline unsigned long ustar_mtime(time_t mtime)
 {
-	if (mtime < 077777777777UL)
+	if (mtime <= 077777777777UL)
 		return mtime;
 	else
 		return 0;
@@ -299,7 +299,7 @@ static int write_tar_entry(struct archiver_args *args,
 			memcpy(header.linkname, buffer, size);
 	}
 
-	if (ustar_size(size) != size)
+	if (S_ISREG(mode) && ustar_size(size) != size)
 		strbuf_append_ext_header_uint(&ext_header, "size", size);
 	if (ustar_mtime(args->time) != args->time)
 		strbuf_append_ext_header_uint(&ext_header, "mtime", args->time);


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 16:16 ` Jeff King
@ 2016-06-21 16:16   ` Jeff King
  2016-06-21 16:17   ` [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime Jeff King
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-21 16:16 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe

The ustar format has a fixed-length field for the size of
each file entry which is supposed to contain up to 11 bytes
of octal-formatted data plus a NUL or space terminator.

These means that the largest size we can represent is
077777777777, or 1 byte short of 8GB. The correct solution
for a larger file, according to POSIX.1-2001, is to add an
extended pax header, similar to how we handle long
filenames. This patch does that, and writes zero for the
size field in the ustar header (the last bit is not
mentioned by POSIX, but it matches how GNU tar behaves with
--format=pax).

This should be a strict improvement over the current
behavior, which is to die in xsnprintf with a "BUG".
However, there's some interesting history here.

Prior to f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we silently overflowed the "size"
field. The extra bytes ended up in the "mtime" field of the
header, which was then immediately written itself,
overwriting our extra bytes. What that means depends on how
many bytes we wrote.

If the size was 64GB or greater, then we actually overflowed
digits into the mtime field, meaning our value was was
effectively right-shifted by those lost octal digits. And
this patch is again a strict improvement over that.

But if the size was between 8GB and 64GB, then our 12-byte
field held all of the actual digits, and only our NUL
terminator overflowed. According to POSIX, there should be a
NUL or space at the end of the field. However, GNU tar seems
to be lenient here, and will correctly parse a size up 64GB
(minus one) from the field. So sizes in this range might
have just worked, depending on the implementation reading
the tarfile.

This patch is mostly still an improvement there, as the 8GB
limit is specifically mentioned in POSIX as the correct
limit. But it's possible that it could be a regression
(versus the pre-f2f0267 state) if all of the following are
true:

  1. You have a file between 8GB and 64GB.

  2. Your tar implementation _doesn't_ know about pax
     extended headers.

  3. Your tar implementation _does_ parse 12-byte sizes from
     the ustar header without a delimiter.

It's probably not worth worrying about such an obscure set
of conditions, but I'm documenting it here just in case.

There's no test included here. I did confirm that this works
(using GNU tar) with:

  $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge
  $ git add huge
  $ git commit -q -m foo
  $ git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null
  -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge

Pre-f2f0267, this would yield a bogus size of 8GB, and
post-f2f0267, git-archive simply dies.

Unfortunately, it's quite an expensive test to run. For one
thing, unless your filesystem supports files with holes, it
takes 64GB of disk space (you might think piping straight to
`hash-object --stdin` would be better, but it's not; that
tries to buffer all 64GB in RAM!). Furthermore, hashing and
compressing the object takes several minutes of CPU time.

We could ship just the resulting compressed object data as a
loose object, but even that takes 64MB. So sadly, this code
path remains untested in the test suite.

Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index cb99df2..a9caf13 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
 	strbuf_addch(sb, '\n');
 }
 
+/*
+ * Like strbuf_append_ext_header, but for numeric values.
+ */
+static void strbuf_append_ext_header_uint(struct strbuf *sb,
+					  const char *keyword,
+					  uintmax_t value)
+{
+	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
+	int len;
+
+	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
+	strbuf_append_ext_header(sb, keyword, buf, len);
+}
+
 static unsigned int ustar_header_chksum(const struct ustar_header *header)
 {
 	const unsigned char *p = (const unsigned char *)header;
@@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
 	return i;
 }
 
+static inline unsigned long ustar_size(uintmax_t size)
+{
+	if (size <= 077777777777UL)
+		return size;
+	else
+		return 0;
+}
+
 static void prepare_header(struct archiver_args *args,
 			   struct ustar_header *header,
 			   unsigned int mode, unsigned long size)
 {
 	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
-	xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0);
+	xsnprintf(header->size, sizeof(header->size), "%011lo",
+		  S_ISREG(mode) ? ustar_size(size) : 0);
 	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
 
 	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
@@ -267,6 +290,9 @@ static int write_tar_entry(struct archiver_args *args,
 			memcpy(header.linkname, buffer, size);
 	}
 
+	if (S_ISREG(mode) && ustar_size(size) != size)
+		strbuf_append_ext_header_uint(&ext_header, "size", size);
+
 	prepare_header(args, &header, mode, size);
 
 	if (ext_header.len > 0) {
-- 
2.9.0.204.g1499a7b


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime
  2016-06-21 16:16 ` Jeff King
  2016-06-21 16:16   ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
@ 2016-06-21 16:17   ` Jeff King
  2016-06-21 18:43   ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
  2016-06-23 23:15   ` [PATCH v3] " Jeff King
  3 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-21 16:17 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe

The ustar format represents timestamps as seconds since the
epoch, but only has room to store 11 octal digits.  To
express anything larger, we need to use an extended header.
This is exactly the same case we fixed for the size field in
the previous commit, and the solution here follows the same
pattern.

This is even mentioned as an issue in f2f0267 (archive-tar:
use xsnprintf for trivial formatting, 2015-09-24), but since
it only affected things far in the future, it wasn't worth
dealing with. But note that my calculations claiming
thousands of years were off there; because our xsnprintf
produces a NUL byte, we only have until the year 2242 to
fix this.

Given that this is just around the corner (geologically
speaking), and because the fix for "size" makes doing this
on top trivial, let's just make it work.

Note that we don't (and never did) handle negative
timestamps (i.e., before 1970). This would probably not be
too hard to support in the same way, but since git does not
support negative timestamps at all, I didn't bother here.

Unlike the last patch for "size", this one is easy to test
efficiently (the magic date is octal 01000000000001):

  $ echo content >file
  $ git add file
  $ GIT_COMMITTER_DATE='@68719476737 +0000' \
    git commit -q -m 'tempori parendum'
  $ git archive HEAD | tar tvf -
  -rw-rw-r-- root/root         8 4147-08-20 03:32 file

With the current tip of master, this dies in xsnprintf (and
prior to f2f0267, it overflowed into the checksum field, and
the resulting tarfile claimed to be from the year 2242).

However, I decided not to include this in the test suite,
because I suspect it will create portability headaches,
including:

  1. The exact format of the system tar's "t" output.

  2. System tars that cannot handle pax headers.

  3. System tars tha cannot handle dates that far in the
     future.

  4. If we replace "tar t" with "tar x", then filesystems
     that cannot handle dates that far in the future.

In other words, we do not really have a reliable tar reader
for these corner cases, so the best we could do is a
byte-for-byte comparison of the output.

Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index a9caf13..ed562d4 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -185,6 +185,14 @@ static inline unsigned long ustar_size(uintmax_t size)
 		return 0;
 }
 
+static inline unsigned long ustar_mtime(time_t mtime)
+{
+	if (mtime <= 077777777777UL)
+		return mtime;
+	else
+		return 0;
+}
+
 static void prepare_header(struct archiver_args *args,
 			   struct ustar_header *header,
 			   unsigned int mode, unsigned long size)
@@ -192,7 +200,8 @@ static void prepare_header(struct archiver_args *args,
 	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
 	xsnprintf(header->size, sizeof(header->size), "%011lo",
 		  S_ISREG(mode) ? ustar_size(size) : 0);
-	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time);
+	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo",
+		  ustar_mtime(args->time));
 
 	xsnprintf(header->uid, sizeof(header->uid), "%07o", 0);
 	xsnprintf(header->gid, sizeof(header->gid), "%07o", 0);
@@ -292,6 +301,8 @@ static int write_tar_entry(struct archiver_args *args,
 
 	if (S_ISREG(mode) && ustar_size(size) != size)
 		strbuf_append_ext_header_uint(&ext_header, "size", size);
+	if (ustar_mtime(args->time) != args->time)
+		strbuf_append_ext_header_uint(&ext_header, "mtime", args->time);
 
 	prepare_header(args, &header, mode, size);
 
-- 
2.9.0.204.g1499a7b

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 0/2] friendlier handling of overflows in archive-tar
  2016-06-21 16:16 ` Jeff King
  2016-06-21 16:16   ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
  2016-06-21 16:17   ` [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime Jeff King
@ 2016-06-21 18:43   ` Junio C Hamano
  2016-06-23 23:15   ` [PATCH v3] " Jeff King
  3 siblings, 0 replies; 61+ messages in thread
From: Junio C Hamano @ 2016-06-21 18:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

Jeff King <peff@peff.net> writes:

> Junio, this is the jk/big-and-old-archive-tar topic.
>
> The interdiff is:
> ...
> diff --git a/archive-tar.c b/archive-tar.c
> index c7b85fd..ed562d4 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -179,7 +179,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
>  
>  static inline unsigned long ustar_size(uintmax_t size)
>  {
> -	if (size < 077777777777UL)
> +	if (size <= 077777777777UL)
>  		return size;
>  	else
>  		return 0;
> @@ -187,7 +187,7 @@ static inline unsigned long ustar_size(uintmax_t size)
>  
>  static inline unsigned long ustar_mtime(time_t mtime)
>  {
> -	if (mtime < 077777777777UL)
> +	if (mtime <= 077777777777UL)
>  		return mtime;
>  	else
>  		return 0;
> @@ -299,7 +299,7 @@ static int write_tar_entry(struct archiver_args *args,
>  			memcpy(header.linkname, buffer, size);
>  	}
>  
> -	if (ustar_size(size) != size)
> +	if (S_ISREG(mode) && ustar_size(size) != size)
>  		strbuf_append_ext_header_uint(&ext_header, "size", size);
>  	if (ustar_mtime(args->time) != args->time)
>  		strbuf_append_ext_header_uint(&ext_header, "mtime", args->time);

Thanks.

By the way, I realized the naming mistake and the topic branch is
now named s/old/future/.  Size being big is one condition that needs
this fix, so is timestamp being from far future, not "old".

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-16  4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
  2016-06-20 22:54   ` René Scharfe
@ 2016-06-21 19:44   ` Robin H. Johnson
  2016-06-21 20:57     ` Jeff King
  1 sibling, 1 reply; 61+ messages in thread
From: Robin H. Johnson @ 2016-06-21 19:44 UTC (permalink / raw)
  To: Jeff King, Git Mailing List; +Cc: René Scharfe

On Thu, Jun 16, 2016 at 12:37:33AM -0400,  Jeff King wrote:
> We could ship just the resulting compressed object data as a
> loose object, but even that takes 64MB. So sadly, this code
> path remains untested in the test suite.
Additionally compress the object data, and insert it for the purpose of
testing? It's still an expensive test time-wise, but repeated
gzip compression on zeros does still help; to that end, here's the
pieces to add a testcase while only being <9KiB.

t5005-archive-hugefile.sh:
...
mkdir t
zcat t5005-barerepo-64gb-obj.tar.gz.gz | tar xzf - -C t
GIT_DIR=t git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null
...

Test repo attached, it's only 8KiB.

Test repo can be recreated with:
truncate -s 64g bigfile
git add bigfile # takes 10 mins
git commit -q -m foo # takes another 10 minutes
tar cf - -C .git . |gzip -9 --no-name |gzip -9f --no-name >barerepo.tar.gz.gz


-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 15:59     ` Jeff King
  2016-06-21 16:02       ` Jeff King
@ 2016-06-21 20:42       ` René Scharfe
  2016-06-21 20:57         ` René Scharfe
  2016-06-21 21:02         ` Jeff King
  2016-06-21 20:54       ` René Scharfe
  2 siblings, 2 replies; 61+ messages in thread
From: René Scharfe @ 2016-06-21 20:42 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 21.06.2016 um 17:59 schrieb Jeff King:
> On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote:
> 
>>> Unfortunately, it's quite an expensive test to run. For one
>>> thing, unless your filesystem supports files with holes, it
>>> takes 64GB of disk space (you might think piping straight to
>>> `hash-object --stdin` would be better, but it's not; that
>>> tries to buffer all 64GB in RAM!). Furthermore, hashing and
>>> compressing the object takes several minutes of CPU time.
>>>
>>> We could ship just the resulting compressed object data as a
>>> loose object, but even that takes 64MB. So sadly, this code
>>> path remains untested in the test suite.
>>
>> If we could set the limit to a lower value than 8GB for testing then we
>> could at least check if the extended header is written, e.g. if ustar_size()
>> could be convinced to return 0 every time using a hidden command line
>> parameter or an environment variable or something better.
> 
> Yes, we could do that, though I think it loses most of the value of the
> test. We can check that if we hit an arbitrary value we generate the pax
> header, but I think what we _really_ care about is: did we generate an
> output that somebody else's tar implementation can handle.

I agree with the last point, but don't see how that diminishes the
value of such a test.  If we provide file sizes only through extended
headers (the normal header field being set to 0) and we can extract
files with correct sizes then tar must have interpreted those header
as intended, right?

(Or it just guessed the sizes by searching for the next header magic,
but such a fallback won't be accurate for files ending with NUL
characters due to NUL-padding, so we just have to add such a file.)

René


-- >8 --
Subject: archive-tar: test creation of pax extended size headers

---
The value 120 is magic; we need it to pass the tests.  That's
because prepare_header() is used for building extended header
records as well and we don't create extended headers for extended
headers (not sure if that would work anyway), so they simply
vanish when they're over the limit as their size field is set to
zero.

 archive-tar.c       | 7 ++++++-
 t/t5000-tar-tree.sh | 7 +++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index f53e61c..fbbc4cc 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -14,6 +14,7 @@ static char block[BLOCKSIZE];
 static unsigned long offset;
 
 static int tar_umask = 002;
+static unsigned long ustar_size_max = 077777777777UL;
 
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args);
@@ -179,7 +180,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
 
 static inline unsigned long ustar_size(uintmax_t size)
 {
-	if (size <= 077777777777UL)
+	if (size <= ustar_size_max)
 		return size;
 	else
 		return 0;
@@ -412,6 +413,10 @@ static int git_tar_config(const char *var, const char *value, void *cb)
 		}
 		return 0;
 	}
+	if (!strcmp(var, "tar.ustarsizemax")) {
+		ustar_size_max = git_config_ulong(var, value);
+		return 0;
+	}
 
 	return tar_filter_config(var, value, cb);
 }
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 4b68bba..03bb4c7 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -102,6 +102,7 @@ test_expect_success \
      echo long filename >a/four$hundred &&
      mkdir a/bin &&
      test-genrandom "frotz" 500000 >a/bin/sh &&
+     printf "\0\0\0" >>a/bin/sh &&
      printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 &&
      printf "A not substituted O" >a/substfile2 &&
      if test_have_prereq SYMLINKS; then
@@ -157,6 +158,12 @@ test_expect_success 'git-archive --prefix=olde-' '
 
 check_tar with_olde-prefix olde-
 
+test_expect_success !TAR_NEEDS_PAX_FALLBACK 'pax extended size headers' '
+	git -c tar.ustarsizemax=120 archive HEAD >extended_size_header.tar
+'
+
+check_tar extended_size_header
+
 test_expect_success 'git archive on large files' '
     test_config core.bigfilethreshold 1 &&
     git archive HEAD >b3.tar &&
-- 
2.9.0


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 15:59     ` Jeff King
  2016-06-21 16:02       ` Jeff King
  2016-06-21 20:42       ` René Scharfe
@ 2016-06-21 20:54       ` René Scharfe
  2 siblings, 0 replies; 61+ messages in thread
From: René Scharfe @ 2016-06-21 20:54 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 21.06.2016 um 17:59 schrieb Jeff King:
> On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote:
>
>>> Unfortunately, it's quite an expensive test to run. For one
>>> thing, unless your filesystem supports files with holes, it
>>> takes 64GB of disk space (you might think piping straight to
>>> `hash-object --stdin` would be better, but it's not; that
>>> tries to buffer all 64GB in RAM!). Furthermore, hashing and
>>> compressing the object takes several minutes of CPU time.
>>>
>>> We could ship just the resulting compressed object data as a
>>> loose object, but even that takes 64MB. So sadly, this code
>>> path remains untested in the test suite.
>>
>> If we could set the limit to a lower value than 8GB for testing then we
>> could at least check if the extended header is written, e.g. if ustar_size()
>> could be convinced to return 0 every time using a hidden command line
>> parameter or an environment variable or something better.
>
> Yes, we could do that, though I think it loses most of the value of the
> test. We can check that if we hit an arbitrary value we generate the pax
> header, but I think what we _really_ care about is: did we generate an
> output that somebody else's tar implementation can handle.

I agree with the last point, but don't see how that diminishes the
value of such a test.  If we provide file sizes only through extended
headers (the normal header field being set to 0) and we can extract
files with correct sizes then tar must have interpreted those header
as intended, right?

(Or it just guessed the sizes by searching for the next header magic,
but such a fallback won't be accurate for files ending with NUL
characters due to NUL-padding, so we just have to add such a file.)

René


-- >8 --
Subject: archive-tar: test creation of pax extended size headers

---
The value 120 is magic; we need it to pass the tests.  That's
because prepare_header() is used for building extended header
records as well and we don't create extended headers for extended
headers (not sure if that would work anyway), so they simply
vanish when they're over the limit as their size field is set to
zero.

 archive-tar.c       | 7 ++++++-
 t/t5000-tar-tree.sh | 7 +++++++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/archive-tar.c b/archive-tar.c
index f53e61c..fbbc4cc 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -14,6 +14,7 @@ static char block[BLOCKSIZE];
 static unsigned long offset;
 
 static int tar_umask = 002;
+static unsigned long ustar_size_max = 077777777777UL;
 
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args);
@@ -179,7 +180,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen)
 
 static inline unsigned long ustar_size(uintmax_t size)
 {
-	if (size <= 077777777777UL)
+	if (size <= ustar_size_max)
 		return size;
 	else
 		return 0;
@@ -412,6 +413,10 @@ static int git_tar_config(const char *var, const char *value, void *cb)
 		}
 		return 0;
 	}
+	if (!strcmp(var, "tar.ustarsizemax")) {
+		ustar_size_max = git_config_ulong(var, value);
+		return 0;
+	}
 
 	return tar_filter_config(var, value, cb);
 }
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 4b68bba..03bb4c7 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -102,6 +102,7 @@ test_expect_success \
      echo long filename >a/four$hundred &&
      mkdir a/bin &&
      test-genrandom "frotz" 500000 >a/bin/sh &&
+     printf "\0\0\0" >>a/bin/sh &&
      printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 &&
      printf "A not substituted O" >a/substfile2 &&
      if test_have_prereq SYMLINKS; then
@@ -157,6 +158,12 @@ test_expect_success 'git-archive --prefix=olde-' '
 
 check_tar with_olde-prefix olde-
 
+test_expect_success !TAR_NEEDS_PAX_FALLBACK 'pax extended size headers' '
+	git -c tar.ustarsizemax=120 archive HEAD >extended_size_header.tar
+'
+
+check_tar extended_size_header
+
 test_expect_success 'git archive on large files' '
     test_config core.bigfilethreshold 1 &&
     git archive HEAD >b3.tar &&
-- 
2.9.0


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 19:44   ` Robin H. Johnson
@ 2016-06-21 20:57     ` Jeff King
  0 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-21 20:57 UTC (permalink / raw)
  To: Robin H. Johnson; +Cc: Git Mailing List, René Scharfe

On Tue, Jun 21, 2016 at 07:44:55PM +0000, Robin H. Johnson wrote:

> On Thu, Jun 16, 2016 at 12:37:33AM -0400,  Jeff King wrote:
> > We could ship just the resulting compressed object data as a
> > loose object, but even that takes 64MB. So sadly, this code
> > path remains untested in the test suite.
> Additionally compress the object data, and insert it for the purpose of
> testing? It's still an expensive test time-wise, but repeated
> gzip compression on zeros does still help; to that end, here's the
> pieces to add a testcase while only being <9KiB.

Interesting idea. With bzip2 it actually drops to about 600 bytes (I
suspect we could get it even smaller by writing a custom program to
generate it, but it's diminishing returns).

I think there are some other portability issues, though (like does the
receiving tar actually support 64GB sizes).

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 20:42       ` René Scharfe
@ 2016-06-21 20:57         ` René Scharfe
  2016-06-21 21:04           ` Jeff King
  2016-06-21 21:02         ` Jeff King
  1 sibling, 1 reply; 61+ messages in thread
From: René Scharfe @ 2016-06-21 20:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 21.06.2016 um 22:42 schrieb René Scharfe:
> The value 120 is magic; we need it to pass the tests.  That's
> because prepare_header() is used for building extended header
> records as well and we don't create extended headers for extended
> headers (not sure if that would work anyway), so they simply
> vanish when they're over the limit as their size field is set to
> zero.

So how about something like this to make sure extended headers are
only written for regular files and not for other extended headers?
---
 archive-tar.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index ed562d4..f53e61c 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -199,7 +199,7 @@ static void prepare_header(struct archiver_args *args,
 {
 	xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777);
 	xsnprintf(header->size, sizeof(header->size), "%011lo",
-		  S_ISREG(mode) ? ustar_size(size) : 0);
+		  S_ISREG(mode) ? size : 0);
 	xsnprintf(header->mtime, sizeof(header->mtime), "%011lo",
 		  ustar_mtime(args->time));
 
@@ -240,7 +240,7 @@ static int write_tar_entry(struct archiver_args *args,
 	struct ustar_header header;
 	struct strbuf ext_header = STRBUF_INIT;
 	unsigned int old_mode = mode;
-	unsigned long size;
+	unsigned long size, size_in_header;
 	void *buffer;
 	int err = 0;
 
@@ -299,12 +299,14 @@ static int write_tar_entry(struct archiver_args *args,
 			memcpy(header.linkname, buffer, size);
 	}
 
-	if (S_ISREG(mode) && ustar_size(size) != size)
+	size_in_header = S_ISREG(mode) ? ustar_size(size) : size;
+	if (size_in_header != size)
 		strbuf_append_ext_header_uint(&ext_header, "size", size);
+
 	if (ustar_mtime(args->time) != args->time)
 		strbuf_append_ext_header_uint(&ext_header, "mtime", args->time);
 
-	prepare_header(args, &header, mode, size);
+	prepare_header(args, &header, mode, size_in_header);
 
 	if (ext_header.len > 0) {
 		err = write_extended_header(args, sha1, ext_header.buf,
-- 
2.9.0



^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 20:42       ` René Scharfe
  2016-06-21 20:57         ` René Scharfe
@ 2016-06-21 21:02         ` Jeff King
  2016-06-22  5:46           ` René Scharfe
  1 sibling, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-21 21:02 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Tue, Jun 21, 2016 at 10:42:52PM +0200, René Scharfe wrote:

> >> If we could set the limit to a lower value than 8GB for testing then we
> >> could at least check if the extended header is written, e.g. if ustar_size()
> >> could be convinced to return 0 every time using a hidden command line
> >> parameter or an environment variable or something better.
> > 
> > Yes, we could do that, though I think it loses most of the value of the
> > test. We can check that if we hit an arbitrary value we generate the pax
> > header, but I think what we _really_ care about is: did we generate an
> > output that somebody else's tar implementation can handle.
> 
> I agree with the last point, but don't see how that diminishes the
> value of such a test.  If we provide file sizes only through extended
> headers (the normal header field being set to 0) and we can extract
> files with correct sizes then tar must have interpreted those header
> as intended, right?

The diminished value is:

  1. This is a situation that doesn't actually happen in real life.

  2. Now we're carrying extra code inside git only for the sake of
     testing (which can have its own bugs, etc).

Still, it may be better than nothing.

> -- >8 --
> Subject: archive-tar: test creation of pax extended size headers
> 
> ---
> The value 120 is magic; we need it to pass the tests.  That's
> because prepare_header() is used for building extended header
> records as well and we don't create extended headers for extended
> headers (not sure if that would work anyway), so they simply
> vanish when they're over the limit as their size field is set to
> zero.

Right, so this is sort of what I meant in (2). Now we have a
tar.ustarsizemax setting shipped in git that is totally broken if you
set it to "1".

I can live with it as a tradeoff, but it is definitely a negative IMHO.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 20:57         ` René Scharfe
@ 2016-06-21 21:04           ` Jeff King
  2016-06-22  5:46             ` René Scharfe
  0 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-21 21:04 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Tue, Jun 21, 2016 at 10:57:43PM +0200, René Scharfe wrote:

> Am 21.06.2016 um 22:42 schrieb René Scharfe:
> > The value 120 is magic; we need it to pass the tests.  That's
> > because prepare_header() is used for building extended header
> > records as well and we don't create extended headers for extended
> > headers (not sure if that would work anyway), so they simply
> > vanish when they're over the limit as their size field is set to
> > zero.
> 
> So how about something like this to make sure extended headers are
> only written for regular files and not for other extended headers?

This is quite similar to what I wrote originally, but I moved to the
ustar_size() format to better match the mtime code (which needs
something like that, because we pass around args->time).

I think you could drop ustar_size() completely here and just put the
"if" into write_tar_entry().

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 21:02         ` Jeff King
@ 2016-06-22  5:46           ` René Scharfe
  2016-06-23 19:21             ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: René Scharfe @ 2016-06-22  5:46 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 21.06.2016 um 23:02 schrieb Jeff King:
> On Tue, Jun 21, 2016 at 10:42:52PM +0200, René Scharfe wrote:
>
>>>> If we could set the limit to a lower value than 8GB for testing then we
>>>> could at least check if the extended header is written, e.g. if ustar_size()
>>>> could be convinced to return 0 every time using a hidden command line
>>>> parameter or an environment variable or something better.
>>>
>>> Yes, we could do that, though I think it loses most of the value of the
>>> test. We can check that if we hit an arbitrary value we generate the pax
>>> header, but I think what we _really_ care about is: did we generate an
>>> output that somebody else's tar implementation can handle.
>>
>> I agree with the last point, but don't see how that diminishes the
>> value of such a test.  If we provide file sizes only through extended
>> headers (the normal header field being set to 0) and we can extract
>> files with correct sizes then tar must have interpreted those header
>> as intended, right?
>
> The diminished value is:
>
>    1. This is a situation that doesn't actually happen in real life.
>
>    2. Now we're carrying extra code inside git only for the sake of
>       testing (which can have its own bugs, etc).
>
> Still, it may be better than nothing.
>
>> -- >8 --
>> Subject: archive-tar: test creation of pax extended size headers
>>
>> ---
>> The value 120 is magic; we need it to pass the tests.  That's
>> because prepare_header() is used for building extended header
>> records as well and we don't create extended headers for extended
>> headers (not sure if that would work anyway), so they simply
>> vanish when they're over the limit as their size field is set to
>> zero.
>
> Right, so this is sort of what I meant in (2). Now we have a
> tar.ustarsizemax setting shipped in git that is totally broken if you
> set it to "1".
>
> I can live with it as a tradeoff, but it is definitely a negative IMHO.

Yes, it's only useful as a debug flag, but the fact that it breaks 
highlights a (admittedly mostly theoretical) shortcoming: The code 
doesn't handle extended headers that are over the size limit as nicely 
as it could.  So the test was already worth it, even if won't land in 
master. :)

René


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-21 21:04           ` Jeff King
@ 2016-06-22  5:46             ` René Scharfe
  0 siblings, 0 replies; 61+ messages in thread
From: René Scharfe @ 2016-06-22  5:46 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 21.06.2016 um 23:04 schrieb Jeff King:
> On Tue, Jun 21, 2016 at 10:57:43PM +0200, René Scharfe wrote:
> 
>> Am 21.06.2016 um 22:42 schrieb René Scharfe:
>>> The value 120 is magic; we need it to pass the tests.  That's
>>> because prepare_header() is used for building extended header
>>> records as well and we don't create extended headers for extended
>>> headers (not sure if that would work anyway), so they simply
>>> vanish when they're over the limit as their size field is set to
>>> zero.
>>
>> So how about something like this to make sure extended headers are
>> only written for regular files and not for other extended headers?
> 
> This is quite similar to what I wrote originally, but I moved to the
> ustar_size() format to better match the mtime code (which needs
> something like that, because we pass around args->time).
> 
> I think you could drop ustar_size() completely here and just put the
> "if" into write_tar_entry().

Which would look like this:

---
 archive-tar.c | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index cb99df2..274bdfa 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
 	strbuf_addch(sb, '\n');
 }
 
+/*
+ * Like strbuf_append_ext_header, but for numeric values.
+ */
+static void strbuf_append_ext_header_uint(struct strbuf *sb,
+					  const char *keyword,
+					  uintmax_t value)
+{
+	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
+	int len;
+
+	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
+	strbuf_append_ext_header(sb, keyword, buf, len);
+}
+
 static unsigned int ustar_header_chksum(const struct ustar_header *header)
 {
 	const unsigned char *p = (const unsigned char *)header;
@@ -208,7 +222,7 @@ static int write_tar_entry(struct archiver_args *args,
 	struct ustar_header header;
 	struct strbuf ext_header = STRBUF_INIT;
 	unsigned int old_mode = mode;
-	unsigned long size;
+	unsigned long size, size_in_header;
 	void *buffer;
 	int err = 0;
 
@@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args,
 			memcpy(header.linkname, buffer, size);
 	}
 
-	prepare_header(args, &header, mode, size);
+	size_in_header = size;
+	if (S_ISREG(mode) && size > 077777777777UL) {
+		size_in_header = 0;
+		strbuf_append_ext_header_uint(&ext_header, "size", size);
+	}
+
+	prepare_header(args, &header, mode, size_in_header);
 
 	if (ext_header.len > 0) {
 		err = write_extended_header(args, sha1, ext_header.buf,
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime
  2016-06-20 22:54   ` René Scharfe
@ 2016-06-22  5:46     ` René Scharfe
  2016-06-23 19:22       ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: René Scharfe @ 2016-06-22  5:46 UTC (permalink / raw)
  To: Jeff King, git

Am 21.06.2016 um 00:54 schrieb René Scharfe:
> Am 16.06.2016 um 06:37 schrieb Jeff King:
>>    2. System tars that cannot handle pax headers.
>
> In t5000 there is a simple interpreter for path headers for systems
> whose tar doesn't handle them.  Adding one for mtime headers may be
> feasible.
>
> It's just a bit complicated to link a pax header file to the file it
> applies to when it doesn't also contain a path header.  But we know that
> the mtime of all entries in tar files created by git archive are is the
> same, so we could simply read one header and then adjust the mtime of
> all files accordingly.

This brings me to the idea of using a single global pax header for mtime 
instead of one per entry.  It reduces the size of the archive and allows 
for slightly easier testing -- it just fits better since we know that 
all our mtimes are the same.

René


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-22  5:46           ` René Scharfe
@ 2016-06-23 19:21             ` Jeff King
  0 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-23 19:21 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Wed, Jun 22, 2016 at 07:46:13AM +0200, René Scharfe wrote:

> Yes, it's only useful as a debug flag, but the fact that it breaks
> highlights a (admittedly mostly theoretical) shortcoming: The code doesn't
> handle extended headers that are over the size limit as nicely as it could.
> So the test was already worth it, even if won't land in master. :)

Kind of. It was impossible to trigger in the original (and we still
don't actually handle it in the revised version; we just die in
xsnprintf).

But still, I'll go with the simpler thing we've discussed here. The
symmetry with ustar_mtime isn't worth it, and doubly so if we just write
a single pax header.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime
  2016-06-22  5:46     ` René Scharfe
@ 2016-06-23 19:22       ` Jeff King
  2016-06-23 21:38         ` René Scharfe
  0 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-23 19:22 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Wed, Jun 22, 2016 at 07:46:25AM +0200, René Scharfe wrote:

> Am 21.06.2016 um 00:54 schrieb René Scharfe:
> > Am 16.06.2016 um 06:37 schrieb Jeff King:
> > >    2. System tars that cannot handle pax headers.
> > 
> > In t5000 there is a simple interpreter for path headers for systems
> > whose tar doesn't handle them.  Adding one for mtime headers may be
> > feasible.
> > 
> > It's just a bit complicated to link a pax header file to the file it
> > applies to when it doesn't also contain a path header.  But we know that
> > the mtime of all entries in tar files created by git archive are is the
> > same, so we could simply read one header and then adjust the mtime of
> > all files accordingly.
> 
> This brings me to the idea of using a single global pax header for mtime
> instead of one per entry.  It reduces the size of the archive and allows for
> slightly easier testing -- it just fits better since we know that all our
> mtimes are the same.

Yeah, I had a similar thought while writing it, but wasn't quite sure
how that was supposed to be formatted. I modeled my output after what
GNU tar writes, but of course they are expecting a different mtime for
each file.

I'll look into how global pax headers should look.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime
  2016-06-23 19:22       ` Jeff King
@ 2016-06-23 21:38         ` René Scharfe
  2016-06-23 21:39           ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: René Scharfe @ 2016-06-23 21:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Am 23.06.2016 um 21:22 schrieb Jeff King:
> On Wed, Jun 22, 2016 at 07:46:25AM +0200, René Scharfe wrote:
>
>> Am 21.06.2016 um 00:54 schrieb René Scharfe:
>>> Am 16.06.2016 um 06:37 schrieb Jeff King:
>>>>     2. System tars that cannot handle pax headers.
>>>
>>> In t5000 there is a simple interpreter for path headers for systems
>>> whose tar doesn't handle them.  Adding one for mtime headers may be
>>> feasible.
>>>
>>> It's just a bit complicated to link a pax header file to the file it
>>> applies to when it doesn't also contain a path header.  But we know that
>>> the mtime of all entries in tar files created by git archive are is the
>>> same, so we could simply read one header and then adjust the mtime of
>>> all files accordingly.
>>
>> This brings me to the idea of using a single global pax header for mtime
>> instead of one per entry.  It reduces the size of the archive and allows for
>> slightly easier testing -- it just fits better since we know that all our
>> mtimes are the same.
>
> Yeah, I had a similar thought while writing it, but wasn't quite sure
> how that was supposed to be formatted. I modeled my output after what
> GNU tar writes, but of course they are expecting a different mtime for
> each file.
>
> I'll look into how global pax headers should look.

Moving the strbuf_append_ext_header() call into 
write_global_extended_header() should be enough.  The changes to the 
test script are a bit more interesting, though.  Perhaps I can come up 
with something over the weekend.

René

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime
  2016-06-23 21:38         ` René Scharfe
@ 2016-06-23 21:39           ` Jeff King
  0 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-23 21:39 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Thu, Jun 23, 2016 at 11:38:19PM +0200, René Scharfe wrote:

> > Yeah, I had a similar thought while writing it, but wasn't quite sure
> > how that was supposed to be formatted. I modeled my output after what
> > GNU tar writes, but of course they are expecting a different mtime for
> > each file.
> > 
> > I'll look into how global pax headers should look.
> 
> Moving the strbuf_append_ext_header() call into
> write_global_extended_header() should be enough.  The changes to the test
> script are a bit more interesting, though.  Perhaps I can come up with
> something over the weekend.

Yeah, I have the code itself working now. I'm just slogging through
writing tests that are efficient and portable. I hope to have something
out soon.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v3] friendlier handling of overflows in archive-tar
  2016-06-21 16:16 ` Jeff King
                     ` (2 preceding siblings ...)
  2016-06-21 18:43   ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
@ 2016-06-23 23:15   ` Jeff King
  2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
                       ` (3 more replies)
  3 siblings, 4 replies; 61+ messages in thread
From: Jeff King @ 2016-06-23 23:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson

This is a replacement for the jk/big-and-future-archive-tar topic. It's
sufficiently rewritten that I won't bother with an interdiff, but rather
just describe the changes below. Thanks to René for talking through some
of the issues, and to Robin for suggesting the double-compression trick
for the tests.

The important changes here are:

  - the "size" fix is more localized to write_tar_entry(). This results
    in shorter code, and means that if one were to somehow write an
    extended header larger than 8GB, we would notice and die(), rather
    than just writing a "0" (though such a thing should not be
    possible).

  - the mtime is now written only in a single global header, rather than
    once per file. I checked POSIX and this is explicitly allowed, and
    of course I also checked that it works against GNU tar.

  - there are now tests! I tried hard to make them portable and
    efficient. See the first patch for details.

  - the final patch is just a cleanup I noticed in the area; it could be
    done independently, but I floated it to the end in case it's
    controversial.

The patches are:

  [1/4]: t5000: test tar files that overflow ustar headers
  [2/4]: archive-tar: write extended headers for file sizes >= 8GB
  [3/4]: archive-tar: write extended headers for far-future mtime
  [4/4]: archive-tar: drop return value

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-23 23:15   ` [PATCH v3] " Jeff King
@ 2016-06-23 23:20     ` Jeff King
  2016-06-23 23:31       ` Jeff King
                         ` (2 more replies)
  2016-06-23 23:21     ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King
                       ` (2 subsequent siblings)
  3 siblings, 3 replies; 61+ messages in thread
From: Jeff King @ 2016-06-23 23:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson

The ustar format only has room for 11 (or 12, depending on
some implementations) octal digits for the size and mtime of
each file. After this, we have to add pax extended headers
to specify the real data, and git does not yet know how to
do so.

Before fixing that, let's start off with some test
infrastructure, as designing portable and efficient tests
for this is non-trivial.

We want to use the system tar to check our output (because
what we really care about is interoperability), but we can't
rely on it:

  1. being able to read pax headers

  2. being able to handle huge sizes or mtimes

  3. supporting a "t" format we can parse

So as a prerequisite, we can feed the system tar a reference
tarball to make sure it can handle these features. The
reference tar here was created with:

  dd if=/dev/zero seek=64G bs=1 count=1 of=huge
  touch -d @68719476737 huge
  tar cf - --format=pax |
  head -c 2048

using GNU tar. Note that this is not a complete tarfile, but
it's enough to contain the headers we want to examine.

Likewise, we need to convince git that it has a 64GB blob to
output. Running "git add" on that 64GB file takes many
minutes of CPU, and even compressed, the result is 64MB. So
again, I pre-generated that loose object, and then used
bzip2 on the result, which shrinks it to a few hundred
bytes.  Unfortunately, we do still inflate it to 64MB on
disk briefly while the test is running.

The tests are split so that we test as much as we can even
with an uncooperative system tar. This actually catches the
current breakage (which is that we die("BUG") trying to
write the ustar header) on every system, and then on systems
where we can, we go farther and actually verify the result.

Helped-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Jeff King <peff@peff.net>
---
I'm still not excited about the 64MB write, just because it's awfully
heavyweight for such a trivial test. It runs pretty fast on my RAM disk,
but maybe not on other people's system.

I considered but didn't explore two other options:

  1. I couldn't convince zlib to write a smaller file (this is done with
     core.compression=9). But I'm not sure if that's inherent to the
     on-disk format, or simply the maximum size of a deflate block.

     So it's possible that one could hand-roll zlib data that says "I'm
     64GB" but is only a few bytes long.

  2. We don't ever want to see the whole 64GB, of course; we want to
     stream it out and only care about the header (as an aside, this
     makes a wonderful test that we are hitting the streaming code path,
     as it's unlikely to work without it :) ).

     So another option would be to include a truncated file that claims
     to be 64GB, and has only the first 256kb or something worth of data
     (which should deflate down to almost nothing).

     git-fsck wouldn't work, of course, but we don't need to run it.
     Other bits of git might complain, but our plan is for git to get
     SIGPIPE before hitting that point anyway.

     So that seems pretty easy, but it is potentially flaky.

 t/t5000-tar-tree.sh                                |  73 +++++++++++++++++++++
 .../19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2   | Bin 0 -> 578 bytes
 t/t5000/huge-and-future.tar                        | Bin 0 -> 2048 bytes
 3 files changed, 73 insertions(+)
 create mode 100644 t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2
 create mode 100644 t/t5000/huge-and-future.tar

diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 4b68bba..e7c9271 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -319,4 +319,77 @@ test_expect_success 'catch non-matching pathspec' '
 	test_must_fail git archive -v HEAD -- "*.abc" >/dev/null
 '
 
+# See if our system tar can handle a tar file with huge sizes and dates far in
+# the future, and that we can actually parse its output.
+#
+# The reference file was generated by GNU tar, and the magic time and size are
+# both octal 01000000000001, which overflows normal ustar fields.
+#
+# When parsing, we'll pull out only the year from the date; that
+# avoids any question of timezones impacting the result. The output
+# of tar_info is expected to be "<size> <year>", both in decimal. It ignores
+# the return value of tar. We have to do this, because our reference file is
+# only a partial (the whole thing would be 64GB!).
+tar_info () {
+	"$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
+}
+test_lazy_prereq TAR_HUGE '
+	echo "68719476737 4147" >expect &&
+	tar_info "$TEST_DIRECTORY"/t5000/huge-and-future.tar >actual &&
+	test_cmp expect actual
+'
+
+# Likewise, we need bunzip for the 64GB git object.
+test_lazy_prereq BUNZIP '
+	bunzip2 --version
+'
+
+test_expect_success BUNZIP 'set up repository with huge blob' '
+	obj_d=19 &&
+	obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a &&
+	obj=${obj_d}${obj_f} &&
+	mkdir -p .git/objects/$obj_d &&
+	bunzip2 -c \
+		<"$TEST_DIRECTORY"/t5000/$obj.bz2 \
+		>.git/objects/$obj_d/$obj_f &&
+	rm -f .git/index &&
+	git update-index --add --cacheinfo 100644,$obj,huge &&
+	git commit -m huge
+'
+
+# We expect git to die with SIGPIPE here (otherwise we
+# would generate the whole 64GB).
+test_expect_failure BUNZIP 'generate tar with huge size' '
+	{
+		git archive HEAD
+		echo $? >exit-code
+	} | head -c 4096 >huge.tar &&
+	echo 141 >expect &&
+	test_cmp expect exit-code
+'
+
+test_expect_failure BUNZIP,TAR_HUGE 'system tar can read our huge size' '
+	echo 68719476737 >expect &&
+	tar_info huge.tar | cut -d" " -f1 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'set up repository with far-future commit' '
+	rm -f .git/index &&
+	echo content >file &&
+	git add file &&
+	GIT_COMMITTER_DATE="@68719476737 +0000" \
+		git commit -m "tempori parendum"
+'
+
+test_expect_failure 'generate tar with future mtime' '
+	git archive HEAD >future.tar
+'
+
+test_expect_failure TAR_HUGE 'tar can encode dates far in future' '
+	echo 4147 >expect &&
+	tar_info future.tar | cut -d" " -f2 >actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2 b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2
new file mode 100644
index 0000000000000000000000000000000000000000..9619fd3c5f6f345a40aba1b91807bb4d937fc51c
GIT binary patch
literal 578
zcmV-I0=@l0T4*^jL0KkKS$mI#Bmk}@egE{{00qbxVL(9uNQgj6FakgbL{P{8umA%P
z1cU%-zyaKJ3Zno3&;S`U000^T007Vc88iR@8UZy`L7)Mk0iXZ?8a+UzR8!JvwH}iK
zXvkv}!$8#Dpwk&jhdD}y=}Ly#N{6=5N`@-9@zf#~Y}ml6A5xVKIEqv?6sUZ+ic~-0
z3bD}&yn;flel6M|SGh`t%>^nMd9)O$Wu&Noa%<D9r9(fbH_W9&qsPxi@k)lGl@5I+
zDj0prR6BE&sCksAbrh&!tffP{F-nI-prpRlLJ+$#Qc)DvQlXBLp@B+<1u7d*3ZhIX
zg<RA^t&a>SgnP<VG8CwK>ira`Ve7_Hm1V`0kgE@zR6?%0tYir&brh(3dz7ebDNy(^
zL0KkKS?}&rK>-rd{r~jXFbzl$fdD`NU<g16aRNX9Kv2K{umFG%1O@;pzyY`b3;;A}
zXu>jRG-;qPzyn5xj3Xw4Mw$Zz)lx%14F-mTL6atf)Gz?ip`!@NpwXs)%#?y?q!Xf$
zP8vZs=>+{8*(#7u93>!3K5|fsT$F-(VMr#rK{}}f`lJ&zAez-6nY%u7(h0VZPg+4e
zX$0a(CY`i`W~3AAPBemQq!ZGRP7dlpJ?RA1nvhKtf@|4GCaOU@$`MOSl~R?f3Ia)6
zDFpYV6Llb)sRYUuLMe6QNGF9Ln9>Q3G=gtZ37;(>om7H;BohijH-{QkNGI(epUDKq
QkWcY<BvXY60gn}ckTCG&H~;_u

literal 0
HcmV?d00001

diff --git a/t/t5000/huge-and-future.tar b/t/t5000/huge-and-future.tar
new file mode 100644
index 0000000000000000000000000000000000000000..63155e185585c7589b0db1d6da199bfa27f517c7
GIT binary patch
literal 2048
zcmeHH%L;=q5X{-H$QS6YCRsi7-eZ3uw6XOd6dxe{`X*^rs7fzVia3{DW=WVGG6|!T
z?v6%ZOjU;{l%nX?UJY9lV4;Lyu3CInz(g<_!2ppkX1rTd#L``D-RR0nTAFX1kAc_4
z!yHsfm<dvpP!J<8o1&bMdO{|^&z|%z2c<+6LM8=4Dk<2wb(>gk^{~&l;zIw<Ka%wM
q@2eX*^nb#uM^s?*|C3Di`M;YypV2;0-{yXeagpKN-s}$iu>((@uRaR^

literal 0
HcmV?d00001

-- 
2.9.0.217.g096ca68


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-23 23:15   ` [PATCH v3] " Jeff King
  2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
@ 2016-06-23 23:21     ` Jeff King
  2016-06-24 19:01       ` Junio C Hamano
  2016-06-23 23:21     ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King
  2016-06-23 23:21     ` [PATCH v3 4/4] archive-tar: drop return value Jeff King
  3 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-23 23:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson

The ustar format has a fixed-length field for the size of
each file entry which is supposed to contain up to 11 bytes
of octal-formatted data plus a NUL or space terminator.

These means that the largest size we can represent is
077777777777, or 1 byte short of 8GB. The correct solution
for a larger file, according to POSIX.1-2001, is to add an
extended pax header, similar to how we handle long
filenames. This patch does that, and writes zero for the
size field in the ustar header (the last bit is not
mentioned by POSIX, but it matches how GNU tar behaves with
--format=pax).

This should be a strict improvement over the current
behavior, which is to die in xsnprintf with a "BUG".
However, there's some interesting history here.

Prior to f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we silently overflowed the "size"
field. The extra bytes ended up in the "mtime" field of the
header, which was then immediately written itself,
overwriting our extra bytes. What that means depends on how
many bytes we wrote.

If the size was 64GB or greater, then we actually overflowed
digits into the mtime field, meaning our value was was
effectively right-shifted by those lost octal digits. And
this patch is again a strict improvement over that.

But if the size was between 8GB and 64GB, then our 12-byte
field held all of the actual digits, and only our NUL
terminator overflowed. According to POSIX, there should be a
NUL or space at the end of the field. However, GNU tar seems
to be lenient here, and will correctly parse a size up 64GB
(minus one) from the field. So sizes in this range might
have just worked, depending on the implementation reading
the tarfile.

This patch is mostly still an improvement there, as the 8GB
limit is specifically mentioned in POSIX as the correct
limit. But it's possible that it could be a regression
(versus the pre-f2f0267 state) if all of the following are
true:

  1. You have a file between 8GB and 64GB.

  2. Your tar implementation _doesn't_ know about pax
     extended headers.

  3. Your tar implementation _does_ parse 12-byte sizes from
     the ustar header without a delimiter.

It's probably not worth worrying about such an obscure set
of conditions, but I'm documenting it here just in case.

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c       | 24 ++++++++++++++++++++++--
 t/t5000-tar-tree.sh |  4 ++--
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index cb99df2..274bdfa 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
 	strbuf_addch(sb, '\n');
 }
 
+/*
+ * Like strbuf_append_ext_header, but for numeric values.
+ */
+static void strbuf_append_ext_header_uint(struct strbuf *sb,
+					  const char *keyword,
+					  uintmax_t value)
+{
+	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
+	int len;
+
+	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
+	strbuf_append_ext_header(sb, keyword, buf, len);
+}
+
 static unsigned int ustar_header_chksum(const struct ustar_header *header)
 {
 	const unsigned char *p = (const unsigned char *)header;
@@ -208,7 +222,7 @@ static int write_tar_entry(struct archiver_args *args,
 	struct ustar_header header;
 	struct strbuf ext_header = STRBUF_INIT;
 	unsigned int old_mode = mode;
-	unsigned long size;
+	unsigned long size, size_in_header;
 	void *buffer;
 	int err = 0;
 
@@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args,
 			memcpy(header.linkname, buffer, size);
 	}
 
-	prepare_header(args, &header, mode, size);
+	size_in_header = size;
+	if (S_ISREG(mode) && size > 077777777777UL) {
+		size_in_header = 0;
+		strbuf_append_ext_header_uint(&ext_header, "size", size);
+	}
+
+	prepare_header(args, &header, mode, size_in_header);
 
 	if (ext_header.len > 0) {
 		err = write_extended_header(args, sha1, ext_header.buf,
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index e7c9271..79dbc88 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -359,7 +359,7 @@ test_expect_success BUNZIP 'set up repository with huge blob' '
 
 # We expect git to die with SIGPIPE here (otherwise we
 # would generate the whole 64GB).
-test_expect_failure BUNZIP 'generate tar with huge size' '
+test_expect_success BUNZIP 'generate tar with huge size' '
 	{
 		git archive HEAD
 		echo $? >exit-code
@@ -368,7 +368,7 @@ test_expect_failure BUNZIP 'generate tar with huge size' '
 	test_cmp expect exit-code
 '
 
-test_expect_failure BUNZIP,TAR_HUGE 'system tar can read our huge size' '
+test_expect_success BUNZIP,TAR_HUGE 'system tar can read our huge size' '
 	echo 68719476737 >expect &&
 	tar_info huge.tar | cut -d" " -f1 >actual &&
 	test_cmp expect actual
-- 
2.9.0.217.g096ca68


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime
  2016-06-23 23:15   ` [PATCH v3] " Jeff King
  2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
  2016-06-23 23:21     ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King
@ 2016-06-23 23:21     ` Jeff King
  2016-06-24 19:06       ` Junio C Hamano
  2016-06-23 23:21     ` [PATCH v3 4/4] archive-tar: drop return value Jeff King
  3 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-23 23:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson

The ustar format represents timestamps as seconds since the
epoch, but only has room to store 11 octal digits.  To
express anything larger, we need to use an extended header.
This is exactly the same case we fixed for the size field in
the previous commit, and the solution here follows the same
pattern.

This is even mentioned as an issue in f2f0267 (archive-tar:
use xsnprintf for trivial formatting, 2015-09-24), but since
it only affected things far in the future, it wasn't deemed
worth dealing with. But note that my calculations claiming
thousands of years were off there; because our xsnprintf
produces a NUL byte, we only have until the year 2242 to fix
this.

Given that this is just around the corner (geologically
speaking, anyway), and because it's easy to fix, let's just
make it work. Unlike the previous fix for "size", where we
had to write an individual extended header for each file, we
can write one global header (since we have only one mtime
for the whole archive).

There's a slight bit of trickiness there. We may already be
writing a global header with a "comment" field for the
commit sha1. So we need to write our new field into the same
header. To do this, we push the decision of whether to write
such a header down into write_global_extended_header(),
which will now assemble the header as it sees fit, and will
return early if we have nothing to write (in practice, we'll
only have a large mtime if it comes from a commit, but this
makes it also work if you set your system clock ahead such
that time() returns a huge value).

Note that we don't (and never did) handle negative
timestamps (i.e., before 1970). This would probably not be
too hard to support in the same way, but since git does not
support negative timestamps at all, I didn't bother here.

After writing the extended header, we munge the timestamp in
the ustar headers to the maximum-allowable size. This is
wrong, but it's the least-wrong thing we can provide to a
tar implementation that doesn't understand pax headers (it's
also what GNU tar does).

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c       | 16 +++++++++++++---
 t/t5000-tar-tree.sh |  4 ++--
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 274bdfa..0bb164c 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -317,7 +317,18 @@ static int write_global_extended_header(struct archiver_args *args)
 	unsigned int mode;
 	int err = 0;
 
-	strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40);
+	if (sha1)
+		strbuf_append_ext_header(&ext_header, "comment",
+					 sha1_to_hex(sha1), 40);
+	if (args->time > 077777777777UL) {
+		strbuf_append_ext_header_uint(&ext_header, "mtime",
+					      args->time);
+		args->time = 077777777777UL;
+	}
+
+	if (!ext_header.len)
+		return 0;
+
 	memset(&header, 0, sizeof(header));
 	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
 	mode = 0100666;
@@ -402,8 +413,7 @@ static int write_tar_archive(const struct archiver *ar,
 {
 	int err = 0;
 
-	if (args->commit_sha1)
-		err = write_global_extended_header(args);
+	err = write_global_extended_header(args);
 	if (!err)
 		err = write_archive_entries(args, write_tar_entry);
 	if (!err)
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 79dbc88..07e0bdc 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -382,11 +382,11 @@ test_expect_success 'set up repository with far-future commit' '
 		git commit -m "tempori parendum"
 '
 
-test_expect_failure 'generate tar with future mtime' '
+test_expect_success 'generate tar with future mtime' '
 	git archive HEAD >future.tar
 '
 
-test_expect_failure TAR_HUGE 'tar can encode dates far in future' '
+test_expect_success TAR_HUGE 'tar can encode dates far in future' '
 	echo 4147 >expect &&
 	tar_info future.tar | cut -d" " -f2 >actual &&
 	test_cmp expect actual
-- 
2.9.0.217.g096ca68


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH v3 4/4] archive-tar: drop return value
  2016-06-23 23:15   ` [PATCH v3] " Jeff King
                       ` (2 preceding siblings ...)
  2016-06-23 23:21     ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King
@ 2016-06-23 23:21     ` Jeff King
  2016-06-24 11:49       ` Remi Galan Alfonso
  3 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-23 23:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson

We never do any error checks, and so never return anything
but "0". Let's just drop this to simplify the code.

Signed-off-by: Jeff King <peff@peff.net>
---
I wasn't sure if this was perhaps kept as an interface decision, in case
the function grew errors later on. If so, it can still drop the "err"
variable internally. :)

 archive-tar.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 0bb164c..903b74d 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -309,13 +309,12 @@ static int write_tar_entry(struct archiver_args *args,
 	return err;
 }
 
-static int write_global_extended_header(struct archiver_args *args)
+static void write_global_extended_header(struct archiver_args *args)
 {
 	const unsigned char *sha1 = args->commit_sha1;
 	struct strbuf ext_header = STRBUF_INIT;
 	struct ustar_header header;
 	unsigned int mode;
-	int err = 0;
 
 	if (sha1)
 		strbuf_append_ext_header(&ext_header, "comment",
@@ -327,7 +326,7 @@ static int write_global_extended_header(struct archiver_args *args)
 	}
 
 	if (!ext_header.len)
-		return 0;
+		return;
 
 	memset(&header, 0, sizeof(header));
 	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
@@ -337,7 +336,6 @@ static int write_global_extended_header(struct archiver_args *args)
 	write_blocked(&header, sizeof(header));
 	write_blocked(ext_header.buf, ext_header.len);
 	strbuf_release(&ext_header);
-	return err;
 }
 
 static struct archiver **tar_filters;
@@ -413,7 +411,7 @@ static int write_tar_archive(const struct archiver *ar,
 {
 	int err = 0;
 
-	err = write_global_extended_header(args);
+	write_global_extended_header(args);
 	if (!err)
 		err = write_archive_entries(args, write_tar_entry);
 	if (!err)
-- 
2.9.0.217.g096ca68

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
@ 2016-06-23 23:31       ` Jeff King
  2016-06-24 16:38       ` Johannes Sixt
  2016-06-24 18:56       ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano
  2 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-23 23:31 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson

On Thu, Jun 23, 2016 at 07:20:44PM -0400, Jeff King wrote:

> I'm still not excited about the 64MB write, just because it's awfully
> heavyweight for such a trivial test. It runs pretty fast on my RAM disk,
> but maybe not on other people's system.
> 
> I considered but didn't explore two other options:
> 
>   1. I couldn't convince zlib to write a smaller file (this is done with
>      core.compression=9). But I'm not sure if that's inherent to the
>      on-disk format, or simply the maximum size of a deflate block.
> 
>      So it's possible that one could hand-roll zlib data that says "I'm
>      64GB" but is only a few bytes long.
> 
>   2. We don't ever want to see the whole 64GB, of course; we want to
>      stream it out and only care about the header (as an aside, this
>      makes a wonderful test that we are hitting the streaming code path,
>      as it's unlikely to work without it :) ).
> 
>      So another option would be to include a truncated file that claims
>      to be 64GB, and has only the first 256kb or something worth of data
>      (which should deflate down to almost nothing).
> 
>      git-fsck wouldn't work, of course, but we don't need to run it.
>      Other bits of git might complain, but our plan is for git to get
>      SIGPIPE before hitting that point anyway.
> 
>      So that seems pretty easy, but it is potentially flaky.

Writing that convinced me that (2) is actually quite a sane way to go.
The patch is below, which seems to work.

I arbitrarily picked the first 2048 bytes of the loose object. That's
1/32768 of the original. If we assume the compression ratio is stable
through the file (and it should be; the file is all zeroes), that should
generate 2MB of data should we need it (way more than we feed to our
"head -c" invocation).

This patch is on top of the whole series just to illustrate it. Doing it
for real will involve squashing it into the first patch (and adjusting
the commit message), and then handling the minor rebase conflicts. I'll
hold off on a re-roll until I get any comments on v3.

-Peff

---
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 07e0bdc..e542938 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -339,19 +339,12 @@ test_lazy_prereq TAR_HUGE '
 	test_cmp expect actual
 '
 
-# Likewise, we need bunzip for the 64GB git object.
-test_lazy_prereq BUNZIP '
-	bunzip2 --version
-'
-
-test_expect_success BUNZIP 'set up repository with huge blob' '
+test_expect_success 'set up repository with huge blob' '
 	obj_d=19 &&
 	obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a &&
 	obj=${obj_d}${obj_f} &&
 	mkdir -p .git/objects/$obj_d &&
-	bunzip2 -c \
-		<"$TEST_DIRECTORY"/t5000/$obj.bz2 \
-		>.git/objects/$obj_d/$obj_f &&
+	cp "$TEST_DIRECTORY"/t5000/$obj .git/objects/$obj_d/$obj_f &&
 	rm -f .git/index &&
 	git update-index --add --cacheinfo 100644,$obj,huge &&
 	git commit -m huge
@@ -359,7 +352,7 @@ test_expect_success BUNZIP 'set up repository with huge blob' '
 
 # We expect git to die with SIGPIPE here (otherwise we
 # would generate the whole 64GB).
-test_expect_success BUNZIP 'generate tar with huge size' '
+test_expect_success 'generate tar with huge size' '
 	{
 		git archive HEAD
 		echo $? >exit-code
@@ -368,7 +361,7 @@ test_expect_success BUNZIP 'generate tar with huge size' '
 	test_cmp expect exit-code
 '
 
-test_expect_success BUNZIP,TAR_HUGE 'system tar can read our huge size' '
+test_expect_success TAR_HUGE 'system tar can read our huge size' '
 	echo 68719476737 >expect &&
 	tar_info huge.tar | cut -d" " -f1 >actual &&
 	test_cmp expect actual
diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a
new file mode 100644
index 0000000000000000000000000000000000000000..5cbe9ec312bfd7b7e0398ca281e9d42848743704
GIT binary patch
literal 2048
zcmb=p_2!@<FM|RPgTa!7MhCMEn`g89mv_5$t@dj2a|><dbTfugFd71*Aut*OqaiRF
L0;3@?%t8PFWtt9Z

literal 0
HcmV?d00001

diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2 b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2
deleted file mode 100644
index 9619fd3c5f6f345a40aba1b91807bb4d937fc51c..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001

literal 578
zcmV-I0=@l0T4*^jL0KkKS$mI#Bmk}@egE{{00qbxVL(9uNQgj6FakgbL{P{8umA%P
z1cU%-zyaKJ3Zno3&;S`U000^T007Vc88iR@8UZy`L7)Mk0iXZ?8a+UzR8!JvwH}iK
zXvkv}!$8#Dpwk&jhdD}y=}Ly#N{6=5N`@-9@zf#~Y}ml6A5xVKIEqv?6sUZ+ic~-0
z3bD}&yn;flel6M|SGh`t%>^nMd9)O$Wu&Noa%<D9r9(fbH_W9&qsPxi@k)lGl@5I+
zDj0prR6BE&sCksAbrh&!tffP{F-nI-prpRlLJ+$#Qc)DvQlXBLp@B+<1u7d*3ZhIX
zg<RA^t&a>SgnP<VG8CwK>ira`Ve7_Hm1V`0kgE@zR6?%0tYir&brh(3dz7ebDNy(^
zL0KkKS?}&rK>-rd{r~jXFbzl$fdD`NU<g16aRNX9Kv2K{umFG%1O@;pzyY`b3;;A}
zXu>jRG-;qPzyn5xj3Xw4Mw$Zz)lx%14F-mTL6atf)Gz?ip`!@NpwXs)%#?y?q!Xf$
zP8vZs=>+{8*(#7u93>!3K5|fsT$F-(VMr#rK{}}f`lJ&zAez-6nY%u7(h0VZPg+4e
zX$0a(CY`i`W~3AAPBemQq!ZGRP7dlpJ?RA1nvhKtf@|4GCaOU@$`MOSl~R?f3Ia)6
zDFpYV6Llb)sRYUuLMe6QNGF9Ln9>Q3G=gtZ37;(>om7H;BohijH-{QkNGI(epUDKq
QkWcY<BvXY60gn}ckTCG&H~;_u


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 4/4] archive-tar: drop return value
  2016-06-23 23:21     ` [PATCH v3 4/4] archive-tar: drop return value Jeff King
@ 2016-06-24 11:49       ` Remi Galan Alfonso
  2016-06-24 13:13         ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: Remi Galan Alfonso @ 2016-06-24 11:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

Hi Peff,

Jeff King <peff@peff.net> writes:
> @@ -413,7 +411,7 @@ static int write_tar_archive(const struct archiver *ar,
>  {
>          int err = 0;
>  
> -        err = write_global_extended_header(args);
> +        write_global_extended_header(args);
>          if (!err)
>                  err = write_archive_entries(args, write_tar_entry);

If we drop the error code from 'write_global_extended_header' then the
above 'if (!err)' becomes useless (always evaluates to 'true' since
'err' is set to '0').

>          if (!err)

Thanks,
Rémi

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 4/4] archive-tar: drop return value
  2016-06-24 11:49       ` Remi Galan Alfonso
@ 2016-06-24 13:13         ` Jeff King
  2016-06-24 19:10           ` Junio C Hamano
  0 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-24 13:13 UTC (permalink / raw)
  To: Remi Galan Alfonso
  Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 01:49:24PM +0200, Remi Galan Alfonso wrote:

> Hi Peff,
> 
> Jeff King <peff@peff.net> writes:
> > @@ -413,7 +411,7 @@ static int write_tar_archive(const struct archiver *ar,
> >  {
> >          int err = 0;
> >  
> > -        err = write_global_extended_header(args);
> > +        write_global_extended_header(args);
> >          if (!err)
> >                  err = write_archive_entries(args, write_tar_entry);
> 
> If we drop the error code from 'write_global_extended_header' then the
> above 'if (!err)' becomes useless (always evaluates to 'true' since
> 'err' is set to '0').

Thanks, you're right.

I wondered if we could drop "err" entirely, but write_archive_entries()
does indeed have some error code paths (everybody uses write_or_die, but
we return an error for things like unknown file types).

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
  2016-06-23 23:31       ` Jeff King
@ 2016-06-24 16:38       ` Johannes Sixt
  2016-06-24 16:46         ` Jeff King
  2016-06-24 18:56       ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano
  2 siblings, 1 reply; 61+ messages in thread
From: Johannes Sixt @ 2016-06-24 16:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

Am 24.06.2016 um 01:20 schrieb Jeff King:
> +# We expect git to die with SIGPIPE here (otherwise we
> +# would generate the whole 64GB).
> +test_expect_failure BUNZIP 'generate tar with huge size' '
> +	{
> +		git archive HEAD
> +		echo $? >exit-code
> +	} | head -c 4096 >huge.tar &&
> +	echo 141 >expect &&
> +	test_cmp expect exit-code

It's going to be 269 with ksh, and who-knows-what on Windows (due to 
lack of SIGPIPE - I haven't tested this, yet).

-- Hannes


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 16:38       ` Johannes Sixt
@ 2016-06-24 16:46         ` Jeff King
  2016-06-24 17:05           ` Johannes Sixt
  0 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-24 16:46 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 06:38:55PM +0200, Johannes Sixt wrote:

> Am 24.06.2016 um 01:20 schrieb Jeff King:
> > +# We expect git to die with SIGPIPE here (otherwise we
> > +# would generate the whole 64GB).
> > +test_expect_failure BUNZIP 'generate tar with huge size' '
> > +	{
> > +		git archive HEAD
> > +		echo $? >exit-code
> > +	} | head -c 4096 >huge.tar &&
> > +	echo 141 >expect &&
> > +	test_cmp expect exit-code
> 
> It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of
> SIGPIPE - I haven't tested this, yet).

Thanks, I meant to ask about that. We do a workaround in t0005, but we
_don't_ do it in the new sigpipe handling for test_must_fail. Is the
latter just broken, too?

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 16:46         ` Jeff King
@ 2016-06-24 17:05           ` Johannes Sixt
  2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: Johannes Sixt @ 2016-06-24 17:05 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

Am 24.06.2016 um 18:46 schrieb Jeff King:
> On Fri, Jun 24, 2016 at 06:38:55PM +0200, Johannes Sixt wrote:
>> It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of
>> SIGPIPE - I haven't tested this, yet).
>
> Thanks, I meant to ask about that. We do a workaround in t0005, but we
> _don't_ do it in the new sigpipe handling for test_must_fail. Is the
> latter just broken, too?

That's well possible. It is not prepared to see ksh's exit codes for 
signals.

-- Hannes


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
  2016-06-23 23:31       ` Jeff King
  2016-06-24 16:38       ` Johannes Sixt
@ 2016-06-24 18:56       ` Junio C Hamano
  2016-06-24 19:07         ` Jeff King
  2 siblings, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2016-06-24 18:56 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson

Jeff King <peff@peff.net> writes:

> The ustar format only has room for 11 (or 12, depending on
> some implementations) octal digits for the size and mtime of
> each file. After this, we have to add pax extended headers
> to specify the real data, and git does not yet know how to
> do so.

I am not a native speaker but "After" above made me hiccup.  I think
I am correct to understand that it means "after passing this limit",
aka "to represent files bigger or newer than these", but still it
felt somewhat strange.

> So as a prerequisite, we can feed the system tar a reference
> tarball to make sure it can handle these features. The
> reference tar here was created with:
>
>   dd if=/dev/zero seek=64G bs=1 count=1 of=huge
>   touch -d @68719476737 huge
>   tar cf - --format=pax |
>   head -c 2048
>
> using GNU tar. Note that this is not a complete tarfile, but
> it's enough to contain the headers we want to examine.

Cute.  I didn't remember they had @<seconds-since-epoch> format,
even though I must have seen what they do while working on 2c733fb2
(parse_date(): '@' prefix forces git-timestamp, 2012-02-02).

> +# See if our system tar can handle a tar file with huge sizes and dates far in
> +# the future, and that we can actually parse its output.
> +#
> +# The reference file was generated by GNU tar, and the magic time and size are
> +# both octal 01000000000001, which overflows normal ustar fields.
> +#
> +# When parsing, we'll pull out only the year from the date; that
> +# avoids any question of timezones impacting the result. 

... as long as the month-day part is not close to the year boundary.
So this explanation is insuffucient to convince the reader that
"that avoids any question" is correct, without saying that it is in
August of year 4147.

> +tar_info () {
> +	"$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
> +}

A blank after the shell function to make it easier to see the
boundary.

Seeing an awk piped into cut always makes me want to suggest a
single sed/awk/perl invocation.

> +# We expect git to die with SIGPIPE here (otherwise we
> +# would generate the whole 64GB).
> +test_expect_failure BUNZIP 'generate tar with huge size' '
> +	{
> +		git archive HEAD
> +		echo $? >exit-code
> +	} | head -c 4096 >huge.tar &&
> +	echo 141 >expect &&
> +	test_cmp expect exit-code
> +'

"head -c" is GNU-ism, isn't it?

"dd bs=1 count=4096" is hopefully more portable.

ksh signal death you already know about.  I wonder if we want to
expose something like list_contains as a friend of test_cmp.

	list_contains 141,269 $(cat exit-code)

Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-23 23:21     ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King
@ 2016-06-24 19:01       ` Junio C Hamano
  2016-06-24 19:10         ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2016-06-24 19:01 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson

Jeff King <peff@peff.net> writes:

> If the size was 64GB or greater, then we actually overflowed
> digits into the mtime field, meaning our value was was

was was

> effectively right-shifted by those lost octal digits. And
> this patch is again a strict improvement over that.

> diff --git a/archive-tar.c b/archive-tar.c
> index cb99df2..274bdfa 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
>  	strbuf_addch(sb, '\n');
>  }
>  
> +/*
> + * Like strbuf_append_ext_header, but for numeric values.
> + */
> +static void strbuf_append_ext_header_uint(struct strbuf *sb,
> +					  const char *keyword,
> +					  uintmax_t value)
> +{
> +	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
> +	int len;
> +
> +	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
> +	strbuf_append_ext_header(sb, keyword, buf, len);
> +}
> +
>  static unsigned int ustar_header_chksum(const struct ustar_header *header)
>  {
>  	const unsigned char *p = (const unsigned char *)header;
> @@ -208,7 +222,7 @@ static int write_tar_entry(struct archiver_args *args,
>  	struct ustar_header header;
>  	struct strbuf ext_header = STRBUF_INIT;
>  	unsigned int old_mode = mode;
> -	unsigned long size;
> +	unsigned long size, size_in_header;
>  	void *buffer;
>  	int err = 0;
>  
> @@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args,
>  			memcpy(header.linkname, buffer, size);
>  	}
>  
> -	prepare_header(args, &header, mode, size);
> +	size_in_header = size;
> +	if (S_ISREG(mode) && size > 077777777777UL) {

Want a symbolic constant with a comment that says why you have
eleven 7's?

> +		size_in_header = 0;
> +		strbuf_append_ext_header_uint(&ext_header, "size", size);
> +	}
> +
> +	prepare_header(args, &header, mode, size_in_header);

The change itself makes sense.

Thanks.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime
  2016-06-23 23:21     ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King
@ 2016-06-24 19:06       ` Junio C Hamano
  2016-06-24 19:16         ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2016-06-24 19:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson

Jeff King <peff@peff.net> writes:

> There's a slight bit of trickiness there. We may already be
> ...
> After writing the extended header, we munge the timestamp in
> the ustar headers to the maximum-allowable size. This is
> wrong, but it's the least-wrong thing we can provide to a
> tar implementation that doesn't understand pax headers (it's
> also what GNU tar does).

The above looks very sensible design, and its implementation is
surprisingly compact.  Very nicely done.

> Helped-by: René Scharfe <l.s.r@web.de>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  archive-tar.c       | 16 +++++++++++++---
>  t/t5000-tar-tree.sh |  4 ++--
>  2 files changed, 15 insertions(+), 5 deletions(-)
>
> diff --git a/archive-tar.c b/archive-tar.c
> index 274bdfa..0bb164c 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -317,7 +317,18 @@ static int write_global_extended_header(struct archiver_args *args)
>  	unsigned int mode;
>  	int err = 0;
>  
> -	strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40);
> +	if (sha1)
> +		strbuf_append_ext_header(&ext_header, "comment",
> +					 sha1_to_hex(sha1), 40);
> +	if (args->time > 077777777777UL) {
> +		strbuf_append_ext_header_uint(&ext_header, "mtime",
> +					      args->time);
> +		args->time = 077777777777UL;
> +	}
> +
> +	if (!ext_header.len)
> +		return 0;

Another symbolic constant to explain this, e.g. TAR_TIME_LIMIT, may
want to exist.

Thanks.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 18:56       ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano
@ 2016-06-24 19:07         ` Jeff King
  2016-06-24 19:44           ` Junio C Hamano
                             ` (2 more replies)
  0 siblings, 3 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The ustar format only has room for 11 (or 12, depending on
> > some implementations) octal digits for the size and mtime of
> > each file. After this, we have to add pax extended headers
> > to specify the real data, and git does not yet know how to
> > do so.
> 
> I am not a native speaker but "After" above made me hiccup.  I think
> I am correct to understand that it means "after passing this limit",
> aka "to represent files bigger or newer than these", but still it
> felt somewhat strange.

Yeah, I agree that it reads badly. I'm not sure what I was thinking.
I'll tweak it in the re-roll.

> > +# See if our system tar can handle a tar file with huge sizes and dates far in
> > +# the future, and that we can actually parse its output.
> > +#
> > +# The reference file was generated by GNU tar, and the magic time and size are
> > +# both octal 01000000000001, which overflows normal ustar fields.
> > +#
> > +# When parsing, we'll pull out only the year from the date; that
> > +# avoids any question of timezones impacting the result. 
> 
> ... as long as the month-day part is not close to the year boundary.
> So this explanation is insuffucient to convince the reader that
> "that avoids any question" is correct, without saying that it is in
> August of year 4147.

I thought that part didn't need to be said, but I can say it
(technically we can include the month, too, but I don't think that level
of accuracy is really important for these tests).

> > +tar_info () {
> > +	"$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
> > +}
> 
> A blank after the shell function to make it easier to see the
> boundary.

I was intentionally trying to couple it with prereq below, as the
comment describes both of them.

> Seeing an awk piped into cut always makes me want to suggest a
> single sed/awk/perl invocation.

I want the auto-splitting of awk, but then to auto-split the result
using a different delimiter. Is there a not-painful way to do that in
awk?

I could certainly come up with a regex to do it in sed, but I wanted to
keep the parsing as liberal and generic as possible.

Certainly I could do it in perl, but I had the general impression that
we prefer to keep the dependency on perl to a minimum. Maybe it doesn't
matter.

> > +# We expect git to die with SIGPIPE here (otherwise we
> > +# would generate the whole 64GB).
> > +test_expect_failure BUNZIP 'generate tar with huge size' '
> > +	{
> > +		git archive HEAD
> > +		echo $? >exit-code
> > +	} | head -c 4096 >huge.tar &&
> > +	echo 141 >expect &&
> > +	test_cmp expect exit-code
> > +'
> 
> "head -c" is GNU-ism, isn't it?

You're right; for some reason I thought it was in POSIX.

We do have a couple instances of it, but they are all in the valgrind
setup code (which I guess most people don't ever run).

> "dd bs=1 count=4096" is hopefully more portable.

Hmm. I always wonder whether dd is actually very portable, but we do use
it already, at least.

Perhaps the perl monstrosity in t9300 could be replaced with that, too.

> ksh signal death you already know about.  I wonder if we want to
> expose something like list_contains as a friend of test_cmp.
> 
> 	list_contains 141,269 $(cat exit-code)

I think we would want something more like:

  test_signal_match 13 $(cat exit-code)

Each call site should not have to know about every signal convention
(and in your example, the magic "3" of Windows is left out).

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-24 19:01       ` Junio C Hamano
@ 2016-06-24 19:10         ` Jeff King
  2016-06-24 19:45           ` Junio C Hamano
  0 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 12:01:16PM -0700, Junio C Hamano wrote:

> > @@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args,
> >  			memcpy(header.linkname, buffer, size);
> >  	}
> >  
> > -	prepare_header(args, &header, mode, size);
> > +	size_in_header = size;
> > +	if (S_ISREG(mode) && size > 077777777777UL) {
> 
> Want a symbolic constant with a comment that says why you have
> eleven 7's?

I tried instead to make sure we only mention it once to avoid a symbolic
constant (even though the same constant appears in the next patch, too,
it would be a mistake to give them the same name; they just happen to be
the same size).

So if anything, I would put a comment here, explaining that ustar cannot
handle anything larger than this, and POSIX mandates it (but I didn't
because the commit message already goes into much more detail).

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 4/4] archive-tar: drop return value
  2016-06-24 13:13         ` Jeff King
@ 2016-06-24 19:10           ` Junio C Hamano
  0 siblings, 0 replies; 61+ messages in thread
From: Junio C Hamano @ 2016-06-24 19:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Remi Galan Alfonso, git, René Scharfe, Robin H. Johnson

Jeff King <peff@peff.net> writes:

> On Fri, Jun 24, 2016 at 01:49:24PM +0200, Remi Galan Alfonso wrote:
>
>> Hi Peff,
>> 
>> Jeff King <peff@peff.net> writes:
>> > @@ -413,7 +411,7 @@ static int write_tar_archive(const struct archiver *ar,
>> >  {
>> >          int err = 0;
>> >  
>> > -        err = write_global_extended_header(args);
>> > +        write_global_extended_header(args);
>> >          if (!err)
>> >                  err = write_archive_entries(args, write_tar_entry);
>> 
>> If we drop the error code from 'write_global_extended_header' then the
>> above 'if (!err)' becomes useless (always evaluates to 'true' since
>> 'err' is set to '0').
>
> Thanks, you're right.
>
> I wondered if we could drop "err" entirely, but write_archive_entries()
> does indeed have some error code paths (everybody uses write_or_die, but
> we return an error for things like unknown file types).

You could if you did this ;-)

Which I do not necessarily think is a good idea.

This may be easier to read

	write_global_extended_header(args)
        err = write_archive_entries(args, write_tar_entry);
        if (!err)
		write_trailer();
	return err;

though.

 archive-tar.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 903b74d..eed2c96 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -412,11 +412,9 @@ static int write_tar_archive(const struct archiver *ar,
 	int err = 0;
 
 	write_global_extended_header(args);
-	if (!err)
-		err = write_archive_entries(args, write_tar_entry);
-	if (!err)
-		write_trailer();
-	return err;
+
+	return (write_archive_entries(args, write_tar_entry) ||
+		write_trailer());
 }
 
 static int write_tar_filter_archive(const struct archiver *ar,

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime
  2016-06-24 19:06       ` Junio C Hamano
@ 2016-06-24 19:16         ` Jeff King
  0 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 12:06:59PM -0700, Junio C Hamano wrote:

> > +	if (args->time > 077777777777UL) {
> > +		strbuf_append_ext_header_uint(&ext_header, "mtime",
> > +					      args->time);
> > +		args->time = 077777777777UL;
> > +	}
> > +
> > +	if (!ext_header.len)
> > +		return 0;
> 
> Another symbolic constant to explain this, e.g. TAR_TIME_LIMIT, may
> want to exist.

This one at least appears twice. I think one of the reasons I am
slightly resistant to a symbolic constant is that it tempts people to
think that it's OK to change it. It's not. These values are mandated by
POSIX, and must match the size of the ustar header field.

So the least-repetitive thing would be to define it as:

  (1UL << (1 + (3 * (sizeof(ustar_header.mtime) - 1)))) - 1

That's pretty horrible to read, but if wrapped in a symbolic constant,
at least people would think twice before touching it. ;)

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 0/4] portable signal-checking in tests
  2016-06-24 17:05           ` Johannes Sixt
@ 2016-06-24 19:39             ` Jeff King
  2016-06-24 19:43               ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King
                                 ` (4 more replies)
  0 siblings, 5 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:39 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 07:05:14PM +0200, Johannes Sixt wrote:

> Am 24.06.2016 um 18:46 schrieb Jeff King:
> > On Fri, Jun 24, 2016 at 06:38:55PM +0200, Johannes Sixt wrote:
> > > It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of
> > > SIGPIPE - I haven't tested this, yet).
> > 
> > Thanks, I meant to ask about that. We do a workaround in t0005, but we
> > _don't_ do it in the new sigpipe handling for test_must_fail. Is the
> > latter just broken, too?
> 
> That's well possible. It is not prepared to see ksh's exit codes for
> signals.

I'm actually not convinced that old versions of ksh are viable for
running the test suite. mksh seems to use POSIX semantics, and I cannot
get through t0005 with ksh93, as it parses nested parentheses wrong. But
maybe there are other ksh variants that use the funny exit codes, but
are otherwise not too buggy.

I'd be more concerned with Windows. The SIGPIPE tests in t0005 are
already marked !MINGW, but other checks elsewhere are not. I know there
is no SIGPIPE on Windows, so it may be that some cases happen to work
because we end up in write_or_die(), which converts EPIPE into a 141
exit.

Anyway. Here's a series that I think makes things better, and it is not
too painful to do.

  [1/4]: tests: factor portable signal check out of t0005
  [2/4]: t0005: use test_match_signal as appropriate
  [3/4]: test_must_fail: use test_match_signal
  [4/4]: t/lib-git-daemon: use test_match_signal

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 1/4] tests: factor portable signal check out of t0005
  2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
@ 2016-06-24 19:43               ` Jeff King
  2016-06-24 20:52                 ` Johannes Sixt
  2016-06-24 19:44               ` [PATCH 2/4] t0005: use test_match_signal as appropriate Jeff King
                                 ` (3 subsequent siblings)
  4 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:43 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

In POSIX shells, a program which exits due to a signal
generally has an exit code of 128 plus the signal number.
However, some platforms do other things. ksh uses 256 plus
the signal number, and on Windows, all signals are just "3".

We've accounted for that in t0005, but not in other tests.
Let's pull out the logic so we can use it elsewhere.

It would be nice for debugging if this additionally printed
errors to stderr, like our other test_* helpers. But we're
going to need to use it in other places besides the innards
of a test_expect block. So let's leave it as generic as
possible.

Signed-off-by: Jeff King <peff@peff.net>
---
I didn't get into the weirdness of SIGPIPE on Windows here, but I think
this is probably a first step toward handling it better. E.g., it may be
that test_match_signal should respect 128 (or even any code) when we are
checking for SIGPIPE.

I also didn't bother with symbolic names. We could make:

  test_match_signal sigterm $?

work, but I didn't think it was worth the effort. While numbers for some
obscure signals do vary on platforms, sigpipe and sigterm are standard
enough to rely on.

 t/t0005-signals.sh      |  7 +------
 t/test-lib-functions.sh | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/t/t0005-signals.sh b/t/t0005-signals.sh
index e7f27eb..2d96265 100755
--- a/t/t0005-signals.sh
+++ b/t/t0005-signals.sh
@@ -11,12 +11,7 @@ EOF
 
 test_expect_success 'sigchain works' '
 	{ test-sigchain >actual; ret=$?; } &&
-	case "$ret" in
-	143) true ;; # POSIX w/ SIGTERM=15
-	271) true ;; # ksh w/ SIGTERM=15
-	  3) true ;; # Windows
-	  *) false ;;
-	esac &&
+	test_match_signal 15 "$ret" &&
 	test_cmp expect actual
 '
 
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 48884d5..51d3775 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -961,3 +961,21 @@ test_env () {
 		done
 	)
 }
+
+# Returns true if the numeric exit code in "$2" represents the expected signal
+# in "$1". Signals should be given numerically.
+test_match_signal () {
+	if test "$2" = "$((128 + $1))"
+	then
+		# POSIX
+		return 0
+	elif test "$2" = "$((256 + $1))"
+	then
+		# ksh
+		return 0
+	elif test "$2" = "3"; then
+		# Windows
+		return 0
+	fi
+	return 1
+}
-- 
2.9.0.215.gc5c4261


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 19:07         ` Jeff King
@ 2016-06-24 19:44           ` Junio C Hamano
  2016-06-24 20:58           ` Eric Sunshine
  2016-06-24 20:58           ` Jeff King
  2 siblings, 0 replies; 61+ messages in thread
From: Junio C Hamano @ 2016-06-24 19:44 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson

Jeff King <peff@peff.net> writes:

>> > +# When parsing, we'll pull out only the year from the date; that
>> > +# avoids any question of timezones impacting the result. 
>> 
>> ... as long as the month-day part is not close to the year boundary.
>> So this explanation is insuffucient to convince the reader that
>> "that avoids any question" is correct, without saying that it is in
>> August of year 4147.
>
> I thought that part didn't need to be said, but I can say it
> (technically we can include the month, too, but I don't think that level
> of accuracy is really important for these tests).

Oh, I wasn't suggesting to include the month in the comparison.  But
to understand why it is safe from TZ jitters to test only year, the
reader needs to know (or do the math herself) that the timestamp is
away from the year boundary, so mentioning August in the justifying
comment is needed.

>> Seeing an awk piped into cut always makes me want to suggest a
>> single sed/awk/perl invocation.
>
> I want the auto-splitting of awk, but then to auto-split the result
> using a different delimiter. Is there a not-painful way to do that in
> awk?
>
> I could certainly come up with a regex to do it in sed, but I wanted to
> keep the parsing as liberal and generic as possible.
>
> Certainly I could do it in perl, but I had the general impression that
> we prefer to keep the dependency on perl to a minimum. Maybe it doesn't
> matter.

Heh.  It was merely "makes me want to suggest", not "I suggest".  If
I were doing this myself, I would have done a single sed but it does
not matter.

> I think we would want something more like:
>
>   test_signal_match 13 $(cat exit-code)

I like that.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 2/4] t0005: use test_match_signal as appropriate
  2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
  2016-06-24 19:43               ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King
@ 2016-06-24 19:44               ` Jeff King
  2016-06-24 19:45               ` [PATCH 3/4] test_must_fail: use test_match_signal Jeff King
                                 ` (2 subsequent siblings)
  4 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:44 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

The first test already uses this more portable construct
(that was where it was factored from initially), but the
later tests do a raw comparison against 141 to look for
SIGPIPE, which can fail on some shells and platforms.

Signed-off-by: Jeff King <peff@peff.net>
---
Again, I couldn't test these. They're skipped on MINGW, and ksh93 barfs
before even executing the tests.

 t/t0005-signals.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t0005-signals.sh b/t/t0005-signals.sh
index 2d96265..c80c995 100755
--- a/t/t0005-signals.sh
+++ b/t/t0005-signals.sh
@@ -36,12 +36,12 @@ test_expect_success 'create blob' '
 
 test_expect_success !MINGW 'a constipated git dies with SIGPIPE' '
 	OUT=$( ((large_git; echo $? 1>&3) | :) 3>&1 ) &&
-	test "$OUT" -eq 141
+	test_match_signal 13 "$OUT"
 '
 
 test_expect_success !MINGW 'a constipated git dies with SIGPIPE even if parent ignores it' '
 	OUT=$( ((trap "" PIPE; large_git; echo $? 1>&3) | :) 3>&1 ) &&
-	test "$OUT" -eq 141
+	test_match_signal 13 "$OUT"
 '
 
 test_done
-- 
2.9.0.215.gc5c4261


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* [PATCH 3/4] test_must_fail: use test_match_signal
  2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
  2016-06-24 19:43               ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King
  2016-06-24 19:44               ` [PATCH 2/4] t0005: use test_match_signal as appropriate Jeff King
@ 2016-06-24 19:45               ` Jeff King
  2016-06-24 19:45               ` [PATCH 4/4] t/lib-git-daemon: " Jeff King
  2016-06-24 19:48               ` [PATCH 0/4] portable signal-checking in tests Jeff King
  4 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:45 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

In 8bf4bec (add "ok=sigpipe" to test_must_fail and use it to
fix flaky tests, 2015-11-27), test_must_fail learned to
recognize "141" as a sigpipe failure. However, testing for
a signal is more complicated than that; we should use
test_match_signal to implement more portable checking.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/test-lib-functions.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 51d3775..91d3b58 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -612,7 +612,7 @@ test_must_fail () {
 	then
 		echo >&2 "test_must_fail: command succeeded: $*"
 		return 1
-	elif test $exit_code -eq 141 && list_contains "$_test_ok" sigpipe
+	elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe
 	then
 		return 0
 	elif test $exit_code -gt 129 && test $exit_code -le 192
-- 
2.9.0.215.gc5c4261


^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-24 19:10         ` Jeff King
@ 2016-06-24 19:45           ` Junio C Hamano
  2016-06-24 19:46             ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2016-06-24 19:45 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson

Jeff King <peff@peff.net> writes:

> So if anything, I would put a comment here, explaining that ustar cannot
> handle anything larger than this, and POSIX mandates it (but I didn't
> because the commit message already goes into much more detail).

That sounds like a good thing to do.  Not everybody runs "blame" to
find the commit whose log message she must read before touching a
line.



^ permalink raw reply	[flat|nested] 61+ messages in thread

* [PATCH 4/4] t/lib-git-daemon: use test_match_signal
  2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
                                 ` (2 preceding siblings ...)
  2016-06-24 19:45               ` [PATCH 3/4] test_must_fail: use test_match_signal Jeff King
@ 2016-06-24 19:45               ` Jeff King
  2016-06-24 19:48               ` [PATCH 0/4] portable signal-checking in tests Jeff King
  4 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:45 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

When git-daemon exits, we expect it to be with the SIGTERM
we just sent it. If we see anything else, we'll complain.
But our check against exit code "143" is not portable. For
example:

  $ ksh93 t5570-git-daemon.sh
  [...]
  error: git daemon exited with status: 271

We can fix this by using test_match_signal.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/lib-git-daemon.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/t/lib-git-daemon.sh b/t/lib-git-daemon.sh
index 340534c..f9cbd47 100644
--- a/t/lib-git-daemon.sh
+++ b/t/lib-git-daemon.sh
@@ -82,8 +82,7 @@ stop_git_daemon() {
 	kill "$GIT_DAEMON_PID"
 	wait "$GIT_DAEMON_PID" >&3 2>&4
 	ret=$?
-	# expect exit with status 143 = 128+15 for signal TERM=15
-	if test $ret -ne 143
+	if test_match_signal 15 $?
 	then
 		error "git daemon exited with status: $ret"
 	fi
-- 
2.9.0.215.gc5c4261

^ permalink raw reply related	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-24 19:45           ` Junio C Hamano
@ 2016-06-24 19:46             ` Jeff King
  0 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 12:45:05PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > So if anything, I would put a comment here, explaining that ustar cannot
> > handle anything larger than this, and POSIX mandates it (but I didn't
> > because the commit message already goes into much more detail).
> 
> That sounds like a good thing to do.  Not everybody runs "blame" to
> find the commit whose log message she must read before touching a
> line.

Well, they should. My commit messages are works of literature. ;)

I'll add a comment.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 0/4] portable signal-checking in tests
  2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
                                 ` (3 preceding siblings ...)
  2016-06-24 19:45               ` [PATCH 4/4] t/lib-git-daemon: " Jeff King
@ 2016-06-24 19:48               ` Jeff King
  4 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 19:48 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 03:39:24PM -0400, Jeff King wrote:

> Anyway. Here's a series that I think makes things better, and it is not
> too painful to do.
> 
>   [1/4]: tests: factor portable signal check out of t0005
>   [2/4]: t0005: use test_match_signal as appropriate
>   [3/4]: test_must_fail: use test_match_signal
>   [4/4]: t/lib-git-daemon: use test_match_signal

Oh, and I meant to mention: this covers everything I found grepping for
"141" and "143" in the test suite (though you have to filter the results
a bit for false positives).

It doesn't fix the case newly added in the tar series that started this
discussion. I don't want to hold either topic hostage to the other, so
I'll prepare a patch to go on top.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/4] tests: factor portable signal check out of t0005
  2016-06-24 19:43               ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King
@ 2016-06-24 20:52                 ` Johannes Sixt
  2016-06-24 21:05                   ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: Johannes Sixt @ 2016-06-24 20:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

Am 24.06.2016 um 21:43 schrieb Jeff King:
> In POSIX shells, a program which exits due to a signal
> generally has an exit code of 128 plus the signal number.
> However, some platforms do other things. ksh uses 256 plus
> the signal number, and on Windows, all signals are just "3".

That's not true, see below.

> I didn't get into the weirdness of SIGPIPE on Windows here, but I think
> this is probably a first step toward handling it better. E.g., it may be
> that test_match_signal should respect 128 (or even any code) when we are
> checking for SIGPIPE.

The Windows behavior is most closely described as having signal(SIGPIPE, 
SIG_IGN) at the very beginning of the program.

> +# Returns true if the numeric exit code in "$2" represents the expected signal
> +# in "$1". Signals should be given numerically.
> +test_match_signal () {
> +	if test "$2" = "$((128 + $1))"
> +	then
> +		# POSIX
> +		return 0
> +	elif test "$2" = "$((256 + $1))"
> +	then
> +		# ksh
> +		return 0
> +	elif test "$2" = "3"; then
> +		# Windows

You meant well here, but this is close to pointless as a general check. 
We check for this exit code in t0005 because there program termination 
happens via raise(), which on Window just calls exit(3). This exit code 
is not an indication that something related to SIGPIPE (or any signal) 
happened.

IMO there is too much danger to trigger a false positive if exit code 3 
is treated special in this generality.

> +		return 0
> +	fi
> +	return 1
> +}

-- Hannes


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 19:07         ` Jeff King
  2016-06-24 19:44           ` Junio C Hamano
@ 2016-06-24 20:58           ` Eric Sunshine
  2016-06-24 21:09             ` Jeff King
  2016-06-24 20:58           ` Jeff King
  2 siblings, 1 reply; 61+ messages in thread
From: Eric Sunshine @ 2016-06-24 20:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git List, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 3:07 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>> > +tar_info () {
>> > +   "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
>> > +}
>
>> Seeing an awk piped into cut always makes me want to suggest a
>> single sed/awk/perl invocation.
>
> I want the auto-splitting of awk, but then to auto-split the result
> using a different delimiter. Is there a not-painful way to do that in
> awk?

The awk split() function is POSIX and accepts an optional separator argument.

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 19:07         ` Jeff King
  2016-06-24 19:44           ` Junio C Hamano
  2016-06-24 20:58           ` Eric Sunshine
@ 2016-06-24 20:58           ` Jeff King
  2016-06-24 22:41             ` Junio C Hamano
  2 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-24 20:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 03:07:44PM -0400, Jeff King wrote:

> > "dd bs=1 count=4096" is hopefully more portable.
> 
> Hmm. I always wonder whether dd is actually very portable, but we do use
> it already, at least.
> 
> Perhaps the perl monstrosity in t9300 could be replaced with that, too.

Hrm. So I wrote a patch for t9300 for this. But I wanted to flip the
order to:

  dd bs=4096 count=1

because otherwise, dd will call read() 4096 times, for 1 byte each.

But it's not safe to do that on a pipe. For example:

  {
	echo 1
	sleep 1
	echo 2
  } | dd bs=4 count=1

will copy only 2 bytes. So it's racily wrong, depending on how the
writer feeds the data to write().

The 1-byte reads do work (assuming blocking descriptors and that dd
restarts a read after a signal, which mine seems to). But yuck.

The difference in time between the two is measurable on my system, but
it's only a few milliseconds (for 4096 bytes). So maybe it's not worth
worrying about (though as a general technique, it does make me worry
that it's easy to get wrong in a way that will fail racily).

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/4] tests: factor portable signal check out of t0005
  2016-06-24 20:52                 ` Johannes Sixt
@ 2016-06-24 21:05                   ` Jeff King
  2016-06-24 21:32                     ` Johannes Sixt
  0 siblings, 1 reply; 61+ messages in thread
From: Jeff King @ 2016-06-24 21:05 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 10:52:32PM +0200, Johannes Sixt wrote:

> Am 24.06.2016 um 21:43 schrieb Jeff King:
> > In POSIX shells, a program which exits due to a signal
> > generally has an exit code of 128 plus the signal number.
> > However, some platforms do other things. ksh uses 256 plus
> > the signal number, and on Windows, all signals are just "3".
> 
> That's not true, see below.

I was worried about that. Git for Windows seems like a labyrinth of
bizarre special cases.

> > I didn't get into the weirdness of SIGPIPE on Windows here, but I think
> > this is probably a first step toward handling it better. E.g., it may be
> > that test_match_signal should respect 128 (or even any code) when we are
> > checking for SIGPIPE.
> 
> The Windows behavior is most closely described as having signal(SIGPIPE,
> SIG_IGN) at the very beginning of the program.

Right, but then we would get EPIPE. So what does git do in such cases?
I'd expect it generally to either hit the check_pipe() part of
write_or_die(), or to end up complaining via die() that the write didn't
go as expected.

> > +# Returns true if the numeric exit code in "$2" represents the expected signal
> > +# in "$1". Signals should be given numerically.
> > +test_match_signal () {
> > +	if test "$2" = "$((128 + $1))"
> > +	then
> > +		# POSIX
> > +		return 0
> > +	elif test "$2" = "$((256 + $1))"
> > +	then
> > +		# ksh
> > +		return 0
> > +	elif test "$2" = "3"; then
> > +		# Windows
> 
> You meant well here, but this is close to pointless as a general check. We
> check for this exit code in t0005 because there program termination happens
> via raise(), which on Window just calls exit(3). This exit code is not an
> indication that something related to SIGPIPE (or any signal) happened.
> 
> IMO there is too much danger to trigger a false positive if exit code 3 is
> treated special in this generality.

Yeah, I agree. But what _should_ it do? E.g., what happens to git-daemon
when it is killed via TERM?

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 20:58           ` Eric Sunshine
@ 2016-06-24 21:09             ` Jeff King
  0 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 21:09 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Junio C Hamano, Git List, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 04:58:58PM -0400, Eric Sunshine wrote:

> On Fri, Jun 24, 2016 at 3:07 PM, Jeff King <peff@peff.net> wrote:
> > On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote:
> >> Jeff King <peff@peff.net> writes:
> >> > +tar_info () {
> >> > +   "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1
> >> > +}
> >
> >> Seeing an awk piped into cut always makes me want to suggest a
> >> single sed/awk/perl invocation.
> >
> > I want the auto-splitting of awk, but then to auto-split the result
> > using a different delimiter. Is there a not-painful way to do that in
> > awk?
> 
> The awk split() function is POSIX and accepts an optional separator argument.

Thanks. I'm not that familiar with awk functions, simply because I came
of age after perl existed, and using perl tends to be more portable and
powerful (if you can assume it's available). But this is simple enough
that it should be OK.

Replacing it with:

        "$TAR" tvf "$1" |
        awk '{
                split($4, date, "-")
                print $3 " " date[1]
        }'

seems to work.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH 1/4] tests: factor portable signal check out of t0005
  2016-06-24 21:05                   ` Jeff King
@ 2016-06-24 21:32                     ` Johannes Sixt
  0 siblings, 0 replies; 61+ messages in thread
From: Johannes Sixt @ 2016-06-24 21:32 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson

Am 24.06.2016 um 23:05 schrieb Jeff King:
> On Fri, Jun 24, 2016 at 10:52:32PM +0200, Johannes Sixt wrote:
>> The Windows behavior is most closely described as having signal(SIGPIPE,
>> SIG_IGN) at the very beginning of the program.
>
> Right, but then we would get EPIPE. So what does git do in such cases?
> I'd expect it generally to either hit the check_pipe() part of
> write_or_die(), or to end up complaining via die() that the write didn't
> go as expected.

Ah, I have forgotten about this code path. Looks like it will trigger 
exactly the same raise() as test_sigchain. Then the check for exit code 
3 makes a bit more sense. But I'm sure we have code paths that do not go 
through checK_pipe(). Those would proceed through whatever error 
handling we have and most likely die().

>> IMO there is too much danger to trigger a false positive if exit code 3 is
>> treated special in this generality.
>
> Yeah, I agree. But what _should_ it do? E.g., what happens to git-daemon
> when it is killed via TERM?

Frankly, I don't know how bash's 'kill -TERM' and a Windows program 
interact. I've avoided this topic like the plague so far.

-- Hannes


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 20:58           ` Jeff King
@ 2016-06-24 22:41             ` Junio C Hamano
  2016-06-24 23:22               ` Jeff King
  0 siblings, 1 reply; 61+ messages in thread
From: Junio C Hamano @ 2016-06-24 22:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson

Jeff King <peff@peff.net> writes:

> The difference in time between the two is measurable on my system, but
> it's only a few milliseconds (for 4096 bytes). So maybe it's not worth
> worrying about (though as a general technique, it does make me worry
> that it's easy to get wrong in a way that will fail racily).

Yeah, GNU dd has iflag=fullblock, but if we assume GNU, we can
safely assume "head -c", so I do not think of a way to do this
portably enough.


^ permalink raw reply	[flat|nested] 61+ messages in thread

* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers
  2016-06-24 22:41             ` Junio C Hamano
@ 2016-06-24 23:22               ` Jeff King
  0 siblings, 0 replies; 61+ messages in thread
From: Jeff King @ 2016-06-24 23:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson

On Fri, Jun 24, 2016 at 03:41:47PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The difference in time between the two is measurable on my system, but
> > it's only a few milliseconds (for 4096 bytes). So maybe it's not worth
> > worrying about (though as a general technique, it does make me worry
> > that it's easy to get wrong in a way that will fail racily).
> 
> Yeah, GNU dd has iflag=fullblock, but if we assume GNU, we can
> safely assume "head -c", so I do not think of a way to do this
> portably enough.

Thinking on it more, "head -c" is _not_ what one would want in all
cases. It would work here, but not in t9300, for instance, where the
code is trying to read an exact number of bytes from a fifo. I don't
think "head" makes any promises about buffering and may read extra
bytes.

So I dunno. "dd" generally does make such promises, or perhaps the perl
sysread() solution in t9300 is not so bad.

-Peff

^ permalink raw reply	[flat|nested] 61+ messages in thread

end of thread, other threads:[~2016-06-24 23:22 UTC | newest]

Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-16  4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King
2016-06-16  4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-20 22:54   ` René Scharfe
2016-06-21 15:59     ` Jeff King
2016-06-21 16:02       ` Jeff King
2016-06-21 20:42       ` René Scharfe
2016-06-21 20:57         ` René Scharfe
2016-06-21 21:04           ` Jeff King
2016-06-22  5:46             ` René Scharfe
2016-06-21 21:02         ` Jeff King
2016-06-22  5:46           ` René Scharfe
2016-06-23 19:21             ` Jeff King
2016-06-21 20:54       ` René Scharfe
2016-06-21 19:44   ` Robin H. Johnson
2016-06-21 20:57     ` Jeff King
2016-06-16  4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-20 22:54   ` René Scharfe
2016-06-22  5:46     ` René Scharfe
2016-06-23 19:22       ` Jeff King
2016-06-23 21:38         ` René Scharfe
2016-06-23 21:39           ` Jeff King
2016-06-16 17:55 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
2016-06-21 16:16 ` Jeff King
2016-06-21 16:16   ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-21 16:17   ` [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-21 18:43   ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano
2016-06-23 23:15   ` [PATCH v3] " Jeff King
2016-06-23 23:20     ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King
2016-06-23 23:31       ` Jeff King
2016-06-24 16:38       ` Johannes Sixt
2016-06-24 16:46         ` Jeff King
2016-06-24 17:05           ` Johannes Sixt
2016-06-24 19:39             ` [PATCH 0/4] portable signal-checking in tests Jeff King
2016-06-24 19:43               ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King
2016-06-24 20:52                 ` Johannes Sixt
2016-06-24 21:05                   ` Jeff King
2016-06-24 21:32                     ` Johannes Sixt
2016-06-24 19:44               ` [PATCH 2/4] t0005: use test_match_signal as appropriate Jeff King
2016-06-24 19:45               ` [PATCH 3/4] test_must_fail: use test_match_signal Jeff King
2016-06-24 19:45               ` [PATCH 4/4] t/lib-git-daemon: " Jeff King
2016-06-24 19:48               ` [PATCH 0/4] portable signal-checking in tests Jeff King
2016-06-24 18:56       ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano
2016-06-24 19:07         ` Jeff King
2016-06-24 19:44           ` Junio C Hamano
2016-06-24 20:58           ` Eric Sunshine
2016-06-24 21:09             ` Jeff King
2016-06-24 20:58           ` Jeff King
2016-06-24 22:41             ` Junio C Hamano
2016-06-24 23:22               ` Jeff King
2016-06-23 23:21     ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-06-24 19:01       ` Junio C Hamano
2016-06-24 19:10         ` Jeff King
2016-06-24 19:45           ` Junio C Hamano
2016-06-24 19:46             ` Jeff King
2016-06-23 23:21     ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-24 19:06       ` Junio C Hamano
2016-06-24 19:16         ` Jeff King
2016-06-23 23:21     ` [PATCH v3 4/4] archive-tar: drop return value Jeff King
2016-06-24 11:49       ` Remi Galan Alfonso
2016-06-24 13:13         ` Jeff King
2016-06-24 19:10           ` Junio C Hamano

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.