All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v4 0/5] friendlier handling of overflows in archive-tar
@ 2016-06-30  9:06 Jeff King
  2016-06-30  9:07 ` [PATCH v4 1/5] t9300: factor out portable "head -c" replacement Jeff King
                   ` (5 more replies)
  0 siblings, 6 replies; 42+ messages in thread
From: Jeff King @ 2016-06-30  9:06 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Johannes Sixt, Junio C Hamano

This is a re-roll of the jk/big-and-future-archive-tar topic. It
addresses all but one of the review comments, and I hope should be
pretty polished.

The changes are:

  - the dependency on bunzip2 is dropped; instead, we just provide a
    partial object for the 64GB blob. See the first commit message for
    details.

  - the portable "head -c" replacement from t9300 has been factored out,
    and we use it in the new tests

  - symbolic constants for the giant octal numbers (with a comment
    warning that the values are set by posix)

  - the comments for tar_info() and the lazy-prereq were split so the
    two aren't mashed together

  - uses awk in tar_info() instead of "sed | cut"

  - extra simplification in the final commit, as suggested by review

  - typo and awkwardness fixes in the commit messages

The one thing that isn't fixed is the use of "141" to test for sigpipe
death. That should use test_match_signal, but that topic just got
re-rolled, too.

  [1/5]: t9300: factor out portable "head -c" replacement
  [2/5]: t5000: test tar files that overflow ustar headers
  [3/5]: archive-tar: write extended headers for file sizes >= 8GB
  [4/5]: archive-tar: write extended headers for far-future mtime
  [5/5]: archive-tar: drop return value

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v4 1/5] t9300: factor out portable "head -c" replacement
  2016-06-30  9:06 [PATCH v4 0/5] friendlier handling of overflows in archive-tar Jeff King
@ 2016-06-30  9:07 ` Jeff King
  2016-07-01  4:45   ` Eric Sunshine
  2016-07-01 17:23   ` Junio C Hamano
  2016-06-30  9:08 ` [PATCH v4 2/5] t5000: test tar files that overflow ustar headers Jeff King
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 42+ messages in thread
From: Jeff King @ 2016-06-30  9:07 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Johannes Sixt, Junio C Hamano

In shell scripts it is sometimes useful to be able to read
exactly N bytes from a pipe. Doing this portably turns out
to be surprisingly difficult.

We want a solution that:

  - is portable

  - never reads more than N bytes due to buffering (which
    would mean those bytes are not available to the next
    program to read from the same pipe)

  - handles partial reads by looping until N bytes or read
    (or we see EOF)

  - is resilient to stray signals giving us EINTR while
    trying to read (even though we don't send them, things
    like SIGWINCH could cause apparently-random failures)

Some possible solutions are:

  - "head -c" is not portable, and implementations may
    buffer (though GNU head does not)

  - "read -N" is a bash-ism, and thus not portable

  - "dd bs=$n count=1" does not handle partial reads. GNU dd
    has iflags=fullblock, but that is not portable

  - "dd bs=1 count=$n" fixes the partial read problem (all
    reads are 1-byte, so there can be no partial response).
    It does make a lot of write() calls, but for our tests
    that's unlikely to matter.  It's fairly portable. We
    already use it in our tests, and it's unlikely that
    implementations would screw up any of our criteria. The
    most unknown one would be signal handling.

  - perl can do a sysread() loop pretty easily. On my Linux
    system, at least, it seems to restart the read() call
    automatically. If that turns out not to be portable,
    though, it would be easy for us to handle it.

That makes the perl solution the least bad (because we
conveniently omitted "length of code" as a criterion).
It's also what t9300 is currently using, so we can just pull
the implementation from there.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t9300-fast-import.sh  | 23 +++--------------------
 t/test-lib-functions.sh | 14 ++++++++++++++
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 74d740d..2e0ba3e 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -7,23 +7,6 @@ test_description='test git fast-import utility'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/diff-lib.sh ;# test-lib chdir's into trash
 
-# Print $1 bytes from stdin to stdout.
-#
-# This could be written as "head -c $1", but IRIX "head" does not
-# support the -c option.
-head_c () {
-	perl -e '
-		my $len = $ARGV[1];
-		while ($len > 0) {
-			my $s;
-			my $nread = sysread(STDIN, $s, $len);
-			die "cannot read: $!" unless defined($nread);
-			print $s;
-			$len -= $nread;
-		}
-	' - "$1"
-}
-
 verify_packs () {
 	for p in .git/objects/pack/*.pack
 	do
@@ -2481,7 +2464,7 @@ test_expect_success PIPE 'R: copy using cat-file' '
 
 		read blob_id type size <&3 &&
 		echo "$blob_id $type $size" >response &&
-		head_c $size >blob <&3 &&
+		test_copy_bytes $size >blob <&3 &&
 		read newline <&3 &&
 
 		cat <<-EOF &&
@@ -2524,7 +2507,7 @@ test_expect_success PIPE 'R: print blob mid-commit' '
 		EOF
 
 		read blob_id type size <&3 &&
-		head_c $size >actual <&3 &&
+		test_copy_bytes $size >actual <&3 &&
 		read newline <&3 &&
 
 		echo
@@ -2559,7 +2542,7 @@ test_expect_success PIPE 'R: print staged blob within commit' '
 		echo "cat-blob $to_get" &&
 
 		read blob_id type size <&3 &&
-		head_c $size >actual <&3 &&
+		test_copy_bytes $size >actual <&3 &&
 		read newline <&3 &&
 
 		echo deleteall
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 48884d5..90856d6 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -961,3 +961,17 @@ test_env () {
 		done
 	)
 }
+
+# Read up to "$1" bytes (or to EOF) from stdin and write them to stdout.
+test_copy_bytes () {
+	perl -e '
+		my $len = $ARGV[1];
+		while ($len > 0) {
+			my $s;
+			my $nread = sysread(STDIN, $s, $len);
+			die "cannot read: $!" unless defined($nread);
+			print $s;
+			$len -= $nread;
+		}
+	' - "$1"
+}
-- 
2.9.0.317.g65b4e7c


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-06-30  9:06 [PATCH v4 0/5] friendlier handling of overflows in archive-tar Jeff King
  2016-06-30  9:07 ` [PATCH v4 1/5] t9300: factor out portable "head -c" replacement Jeff King
@ 2016-06-30  9:08 ` Jeff King
  2016-07-14 15:47   ` Johannes Schindelin
  2016-06-30  9:09 ` [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB Jeff King
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2016-06-30  9:08 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Johannes Sixt, Junio C Hamano

The ustar format only has room for 11 (or 12, depending on
some implementations) octal digits for the size and mtime of
each file. For values larger than this, we have to add pax
extended headers to specify the real data, and git does not
yet know how to do so.

Before fixing that, let's start off with some test
infrastructure, as designing portable and efficient tests
for this is non-trivial.

We want to use the system tar to check our output (because
what we really care about is interoperability), but we can't
rely on it:

  1. being able to read pax headers

  2. being able to handle huge sizes or mtimes

  3. supporting a "t" format we can parse

So as a prerequisite, we can feed the system tar a reference
tarball to make sure it can handle these features. The
reference tar here was created with:

  dd if=/dev/zero seek=64G bs=1 count=1 of=huge
  touch -d @68719476737 huge
  tar cf - --format=pax |
  head -c 2048

using GNU tar. Note that this is not a complete tarfile, but
it's enough to contain the headers we want to examine.

Likewise, we need to convince git that it has a 64GB blob to
output. Running "git add" on that 64GB file takes many
minutes of CPU, and even compressed, the result is 64MB. So
again, I pre-generated that loose object, and then took only
the first 2k of it. That should be enough to generate 2MB of
data before hitting an inflate error, which is plenty for us
to generate the tar header (and then die of SIGPIPE while
streaming the rest out).

The tests are split so that we test as much as we can even
with an uncooperative system tar. This actually catches the
current breakage (which is that we die("BUG") trying to
write the ustar header) on every system, and then on systems
where we can, we go farther and actually verify the result.

Helped-by: Robin H. Johnson <robbat2@gentoo.org>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5000-tar-tree.sh                              |  74 +++++++++++++++++++++++
 t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes
 t/t5000/huge-and-future.tar                      | Bin 0 -> 2048 bytes
 3 files changed, 74 insertions(+)
 create mode 100644 t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a
 create mode 100644 t/t5000/huge-and-future.tar

diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 4b68bba..950bdd3 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -319,4 +319,78 @@ test_expect_success 'catch non-matching pathspec' '
 	test_must_fail git archive -v HEAD -- "*.abc" >/dev/null
 '
 
+# Pull the size and date of each entry in a tarfile using the system tar.
+#
+# We'll pull out only the year from the date; that avoids any question of
+# timezones impacting the result (as long as we keep our test times away from a
+# year boundary; our reference times are all in August).
+#
+# The output of tar_info is expected to be "<size> <year>", both in decimal. It
+# ignores the return value of tar. We have to do this, because some of our test
+# input is only partial (the real data is 64GB in some cases).
+tar_info () {
+	"$TAR" tvf "$1" |
+	awk '{
+		split($4, date, "-")
+		print $3 " " date[1]
+	}'
+}
+
+# See if our system tar can handle a tar file with huge sizes and dates far in
+# the future, and that we can actually parse its output.
+#
+# The reference file was generated by GNU tar, and the magic time and size are
+# both octal 01000000000001, which overflows normal ustar fields.
+test_lazy_prereq TAR_HUGE '
+	echo "68719476737 4147" >expect &&
+	tar_info "$TEST_DIRECTORY"/t5000/huge-and-future.tar >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'set up repository with huge blob' '
+	obj_d=19 &&
+	obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a &&
+	obj=${obj_d}${obj_f} &&
+	mkdir -p .git/objects/$obj_d &&
+	cp "$TEST_DIRECTORY"/t5000/$obj .git/objects/$obj_d/$obj_f &&
+	rm -f .git/index &&
+	git update-index --add --cacheinfo 100644,$obj,huge &&
+	git commit -m huge
+'
+
+# We expect git to die with SIGPIPE here (otherwise we
+# would generate the whole 64GB).
+test_expect_failure 'generate tar with huge size' '
+	{
+		git archive HEAD
+		echo $? >exit-code
+	} | test_copy_bytes 4096 >huge.tar &&
+	echo 141 >expect &&
+	test_cmp expect exit-code
+'
+
+test_expect_failure TAR_HUGE 'system tar can read our huge size' '
+	echo 68719476737 >expect &&
+	tar_info huge.tar | cut -d" " -f1 >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'set up repository with far-future commit' '
+	rm -f .git/index &&
+	echo content >file &&
+	git add file &&
+	GIT_COMMITTER_DATE="@68719476737 +0000" \
+		git commit -m "tempori parendum"
+'
+
+test_expect_failure 'generate tar with future mtime' '
+	git archive HEAD >future.tar
+'
+
+test_expect_failure TAR_HUGE 'system tar can read our future mtime' '
+	echo 4147 >expect &&
+	tar_info future.tar | cut -d" " -f2 >actual &&
+	test_cmp expect actual
+'
+
 test_done
diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a
new file mode 100644
index 0000000000000000000000000000000000000000..5cbe9ec312bfd7b7e0398ca281e9d42848743704
GIT binary patch
literal 2048
zcmb=p_2!@<FM|RPgTa!7MhCMEn`g89mv_5$t@dj2a|><dbTfugFd71*Aut*OqaiRF
L0;3@?%t8PFWtt9Z

literal 0
HcmV?d00001

diff --git a/t/t5000/huge-and-future.tar b/t/t5000/huge-and-future.tar
new file mode 100644
index 0000000000000000000000000000000000000000..63155e185585c7589b0db1d6da199bfa27f517c7
GIT binary patch
literal 2048
zcmeHH%L;=q5X{-H$QS6YCRsi7-eZ3uw6XOd6dxe{`X*^rs7fzVia3{DW=WVGG6|!T
z?v6%ZOjU;{l%nX?UJY9lV4;Lyu3CInz(g<_!2ppkX1rTd#L``D-RR0nTAFX1kAc_4
z!yHsfm<dvpP!J<8o1&bMdO{|^&z|%z2c<+6LM8=4Dk<2wb(>gk^{~&l;zIw<Ka%wM
q@2eX*^nb#uM^s?*|C3Di`M;YypV2;0-{yXeagpKN-s}$iu>((@uRaR^

literal 0
HcmV?d00001

-- 
2.9.0.317.g65b4e7c


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-30  9:06 [PATCH v4 0/5] friendlier handling of overflows in archive-tar Jeff King
  2016-06-30  9:07 ` [PATCH v4 1/5] t9300: factor out portable "head -c" replacement Jeff King
  2016-06-30  9:08 ` [PATCH v4 2/5] t5000: test tar files that overflow ustar headers Jeff King
@ 2016-06-30  9:09 ` Jeff King
  2016-07-14 16:48   ` Johannes Sixt
  2016-06-30  9:09 ` [PATCH v4 4/5] archive-tar: write extended headers for far-future mtime Jeff King
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2016-06-30  9:09 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Johannes Sixt, Junio C Hamano

The ustar format has a fixed-length field for the size of
each file entry which is supposed to contain up to 11 bytes
of octal-formatted data plus a NUL or space terminator.

These means that the largest size we can represent is
077777777777, or 1 byte short of 8GB. The correct solution
for a larger file, according to POSIX.1-2001, is to add an
extended pax header, similar to how we handle long
filenames. This patch does that, and writes zero for the
size field in the ustar header (the last bit is not
mentioned by POSIX, but it matches how GNU tar behaves with
--format=pax).

This should be a strict improvement over the current
behavior, which is to die in xsnprintf with a "BUG".
However, there's some interesting history here.

Prior to f2f0267 (archive-tar: use xsnprintf for trivial
formatting, 2015-09-24), we silently overflowed the "size"
field. The extra bytes ended up in the "mtime" field of the
header, which was then immediately written itself,
overwriting our extra bytes. What that means depends on how
many bytes we wrote.

If the size was 64GB or greater, then we actually overflowed
digits into the mtime field, meaning our value was
effectively right-shifted by those lost octal digits. And
this patch is again a strict improvement over that.

But if the size was between 8GB and 64GB, then our 12-byte
field held all of the actual digits, and only our NUL
terminator overflowed. According to POSIX, there should be a
NUL or space at the end of the field. However, GNU tar seems
to be lenient here, and will correctly parse a size up 64GB
(minus one) from the field. So sizes in this range might
have just worked, depending on the implementation reading
the tarfile.

This patch is mostly still an improvement there, as the 8GB
limit is specifically mentioned in POSIX as the correct
limit. But it's possible that it could be a regression
(versus the pre-f2f0267 state) if all of the following are
true:

  1. You have a file between 8GB and 64GB.

  2. Your tar implementation _doesn't_ know about pax
     extended headers.

  3. Your tar implementation _does_ parse 12-byte sizes from
     the ustar header without a delimiter.

It's probably not worth worrying about such an obscure set
of conditions, but I'm documenting it here just in case.

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c       | 31 +++++++++++++++++++++++++++++--
 t/t5000-tar-tree.sh |  4 ++--
 2 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index cb99df2..57a1540 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -18,6 +18,13 @@ static int tar_umask = 002;
 static int write_tar_filter_archive(const struct archiver *ar,
 				    struct archiver_args *args);
 
+/*
+ * This is the max value that a ustar size header can specify, as it is fixed
+ * at 11 octal digits. POSIX specifies that we switch to extended headers at
+ * this size.
+ */
+#define USTAR_MAX_SIZE 077777777777UL
+
 /* writes out the whole block, but only if it is full */
 static void write_if_needed(void)
 {
@@ -137,6 +144,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword,
 	strbuf_addch(sb, '\n');
 }
 
+/*
+ * Like strbuf_append_ext_header, but for numeric values.
+ */
+static void strbuf_append_ext_header_uint(struct strbuf *sb,
+					  const char *keyword,
+					  uintmax_t value)
+{
+	char buf[40]; /* big enough for 2^128 in decimal, plus NUL */
+	int len;
+
+	len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value);
+	strbuf_append_ext_header(sb, keyword, buf, len);
+}
+
 static unsigned int ustar_header_chksum(const struct ustar_header *header)
 {
 	const unsigned char *p = (const unsigned char *)header;
@@ -208,7 +229,7 @@ static int write_tar_entry(struct archiver_args *args,
 	struct ustar_header header;
 	struct strbuf ext_header = STRBUF_INIT;
 	unsigned int old_mode = mode;
-	unsigned long size;
+	unsigned long size, size_in_header;
 	void *buffer;
 	int err = 0;
 
@@ -267,7 +288,13 @@ static int write_tar_entry(struct archiver_args *args,
 			memcpy(header.linkname, buffer, size);
 	}
 
-	prepare_header(args, &header, mode, size);
+	size_in_header = size;
+	if (S_ISREG(mode) && size > USTAR_MAX_SIZE) {
+		size_in_header = 0;
+		strbuf_append_ext_header_uint(&ext_header, "size", size);
+	}
+
+	prepare_header(args, &header, mode, size_in_header);
 
 	if (ext_header.len > 0) {
 		err = write_extended_header(args, sha1, ext_header.buf,
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 950bdd3..93c2d34 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -360,7 +360,7 @@ test_expect_success 'set up repository with huge blob' '
 
 # We expect git to die with SIGPIPE here (otherwise we
 # would generate the whole 64GB).
-test_expect_failure 'generate tar with huge size' '
+test_expect_success 'generate tar with huge size' '
 	{
 		git archive HEAD
 		echo $? >exit-code
@@ -369,7 +369,7 @@ test_expect_failure 'generate tar with huge size' '
 	test_cmp expect exit-code
 '
 
-test_expect_failure TAR_HUGE 'system tar can read our huge size' '
+test_expect_success TAR_HUGE 'system tar can read our huge size' '
 	echo 68719476737 >expect &&
 	tar_info huge.tar | cut -d" " -f1 >actual &&
 	test_cmp expect actual
-- 
2.9.0.317.g65b4e7c


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 4/5] archive-tar: write extended headers for far-future mtime
  2016-06-30  9:06 [PATCH v4 0/5] friendlier handling of overflows in archive-tar Jeff King
                   ` (2 preceding siblings ...)
  2016-06-30  9:09 ` [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB Jeff King
@ 2016-06-30  9:09 ` Jeff King
  2016-06-30  9:09 ` [PATCH v4 5/5] archive-tar: drop return value Jeff King
  2016-06-30  9:14 ` [PATCH v4 6/5] t5000: use test_match_signal Jeff King
  5 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2016-06-30  9:09 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Johannes Sixt, Junio C Hamano

The ustar format represents timestamps as seconds since the
epoch, but only has room to store 11 octal digits.  To
express anything larger, we need to use an extended header.
This is exactly the same case we fixed for the size field in
the previous commit, and the solution here follows the same
pattern.

This is even mentioned as an issue in f2f0267 (archive-tar:
use xsnprintf for trivial formatting, 2015-09-24), but since
it only affected things far in the future, it wasn't deemed
worth dealing with. But note that my calculations claiming
thousands of years were off there; because our xsnprintf
produces a NUL byte, we only have until the year 2242 to fix
this.

Given that this is just around the corner (geologically
speaking, anyway), and because it's easy to fix, let's just
make it work. Unlike the previous fix for "size", where we
had to write an individual extended header for each file, we
can write one global header (since we have only one mtime
for the whole archive).

There's a slight bit of trickiness there. We may already be
writing a global header with a "comment" field for the
commit sha1. So we need to write our new field into the same
header. To do this, we push the decision of whether to write
such a header down into write_global_extended_header(),
which will now assemble the header as it sees fit, and will
return early if we have nothing to write (in practice, we'll
only have a large mtime if it comes from a commit, but this
makes it also work if you set your system clock ahead such
that time() returns a huge value).

Note that we don't (and never did) handle negative
timestamps (i.e., before 1970). This would probably not be
too hard to support in the same way, but since git does not
support negative timestamps at all, I didn't bother here.

After writing the extended header, we munge the timestamp in
the ustar headers to the maximum-allowable size. This is
wrong, but it's the least-wrong thing we can provide to a
tar implementation that doesn't understand pax headers (it's
also what GNU tar does).

Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c       | 19 ++++++++++++++++---
 t/t5000-tar-tree.sh |  4 ++--
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 57a1540..d671bc3 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -22,8 +22,11 @@ static int write_tar_filter_archive(const struct archiver *ar,
  * This is the max value that a ustar size header can specify, as it is fixed
  * at 11 octal digits. POSIX specifies that we switch to extended headers at
  * this size.
+ *
+ * Likewise for the mtime (which happens to use a buffer of the same size).
  */
 #define USTAR_MAX_SIZE 077777777777UL
+#define USTAR_MAX_MTIME 077777777777UL
 
 /* writes out the whole block, but only if it is full */
 static void write_if_needed(void)
@@ -324,7 +327,18 @@ static int write_global_extended_header(struct archiver_args *args)
 	unsigned int mode;
 	int err = 0;
 
-	strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40);
+	if (sha1)
+		strbuf_append_ext_header(&ext_header, "comment",
+					 sha1_to_hex(sha1), 40);
+	if (args->time > USTAR_MAX_MTIME) {
+		strbuf_append_ext_header_uint(&ext_header, "mtime",
+					      args->time);
+		args->time = USTAR_MAX_MTIME;
+	}
+
+	if (!ext_header.len)
+		return 0;
+
 	memset(&header, 0, sizeof(header));
 	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
 	mode = 0100666;
@@ -409,8 +423,7 @@ static int write_tar_archive(const struct archiver *ar,
 {
 	int err = 0;
 
-	if (args->commit_sha1)
-		err = write_global_extended_header(args);
+	err = write_global_extended_header(args);
 	if (!err)
 		err = write_archive_entries(args, write_tar_entry);
 	if (!err)
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 93c2d34..96d208d 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -383,11 +383,11 @@ test_expect_success 'set up repository with far-future commit' '
 		git commit -m "tempori parendum"
 '
 
-test_expect_failure 'generate tar with future mtime' '
+test_expect_success 'generate tar with future mtime' '
 	git archive HEAD >future.tar
 '
 
-test_expect_failure TAR_HUGE 'system tar can read our future mtime' '
+test_expect_success TAR_HUGE 'system tar can read our future mtime' '
 	echo 4147 >expect &&
 	tar_info future.tar | cut -d" " -f2 >actual &&
 	test_cmp expect actual
-- 
2.9.0.317.g65b4e7c


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 5/5] archive-tar: drop return value
  2016-06-30  9:06 [PATCH v4 0/5] friendlier handling of overflows in archive-tar Jeff King
                   ` (3 preceding siblings ...)
  2016-06-30  9:09 ` [PATCH v4 4/5] archive-tar: write extended headers for far-future mtime Jeff King
@ 2016-06-30  9:09 ` Jeff King
  2016-06-30  9:14 ` [PATCH v4 6/5] t5000: use test_match_signal Jeff King
  5 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2016-06-30  9:09 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Johannes Sixt, Junio C Hamano

We never do any error checks, and so never return anything
but "0". Let's just drop this to simplify the code.

Signed-off-by: Jeff King <peff@peff.net>
---
 archive-tar.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index d671bc3..7ea4e90 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -319,13 +319,12 @@ static int write_tar_entry(struct archiver_args *args,
 	return err;
 }
 
-static int write_global_extended_header(struct archiver_args *args)
+static void write_global_extended_header(struct archiver_args *args)
 {
 	const unsigned char *sha1 = args->commit_sha1;
 	struct strbuf ext_header = STRBUF_INIT;
 	struct ustar_header header;
 	unsigned int mode;
-	int err = 0;
 
 	if (sha1)
 		strbuf_append_ext_header(&ext_header, "comment",
@@ -337,7 +336,7 @@ static int write_global_extended_header(struct archiver_args *args)
 	}
 
 	if (!ext_header.len)
-		return 0;
+		return;
 
 	memset(&header, 0, sizeof(header));
 	*header.typeflag = TYPEFLAG_GLOBAL_HEADER;
@@ -347,7 +346,6 @@ static int write_global_extended_header(struct archiver_args *args)
 	write_blocked(&header, sizeof(header));
 	write_blocked(ext_header.buf, ext_header.len);
 	strbuf_release(&ext_header);
-	return err;
 }
 
 static struct archiver **tar_filters;
@@ -423,9 +421,8 @@ static int write_tar_archive(const struct archiver *ar,
 {
 	int err = 0;
 
-	err = write_global_extended_header(args);
-	if (!err)
-		err = write_archive_entries(args, write_tar_entry);
+	write_global_extended_header(args);
+	err = write_archive_entries(args, write_tar_entry);
 	if (!err)
 		write_trailer();
 	return err;
-- 
2.9.0.317.g65b4e7c

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v4 6/5] t5000: use test_match_signal
  2016-06-30  9:06 [PATCH v4 0/5] friendlier handling of overflows in archive-tar Jeff King
                   ` (4 preceding siblings ...)
  2016-06-30  9:09 ` [PATCH v4 5/5] archive-tar: drop return value Jeff King
@ 2016-06-30  9:14 ` Jeff King
  5 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2016-06-30  9:14 UTC (permalink / raw)
  To: git; +Cc: René Scharfe, Johannes Sixt, Junio C Hamano

On Thu, Jun 30, 2016 at 05:06:14AM -0400, Jeff King wrote:

> The one thing that isn't fixed is the use of "141" to test for sigpipe
> death. That should use test_match_signal, but that topic just got
> re-rolled, too.

And here's what the patch for that looks like (which can be applied if
this topic and jk/test-match-signal are merged).

-- >8 --
Subject: t5000: use test_match_signal

We are testing for sigpipe death from git, and doing so
portably requires using our test helper.

Signed-off-by: Jeff King <peff@peff.net>
---

Of course this does not help Windows at all, because we removed the "3"
check from jk/test-match-signal. So this new test will probably need to
be dealt with by Windows folks, one way or another (either extending
test_match_signal, or just skipping the exit code check under the MINGW
prereq).

 t/t5000-tar-tree.sh | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 96d208d..6950d7d 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -365,8 +365,7 @@ test_expect_success 'generate tar with huge size' '
 		git archive HEAD
 		echo $? >exit-code
 	} | test_copy_bytes 4096 >huge.tar &&
-	echo 141 >expect &&
-	test_cmp expect exit-code
+	test_match_signal 13 "$(cat exit-code)"
 '
 
 test_expect_success TAR_HUGE 'system tar can read our huge size' '
-- 
2.9.0.317.g65b4e7c


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 1/5] t9300: factor out portable "head -c" replacement
  2016-06-30  9:07 ` [PATCH v4 1/5] t9300: factor out portable "head -c" replacement Jeff King
@ 2016-07-01  4:45   ` Eric Sunshine
  2016-07-01 17:23   ` Junio C Hamano
  1 sibling, 0 replies; 42+ messages in thread
From: Eric Sunshine @ 2016-07-01  4:45 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Johannes Sixt, Junio C Hamano

On Thursday, June 30, 2016, Jeff King <peff@peff.net> wrote:
> In shell scripts it is sometimes useful to be able to read
> exactly N bytes from a pipe. Doing this portably turns out
> to be surprisingly difficult.
>
> We want a solution that:
>
>   - is portable
>
>   - never reads more than N bytes due to buffering (which
>     would mean those bytes are not available to the next
>     program to read from the same pipe)
>
>   - handles partial reads by looping until N bytes or read

s/or/are/

>     (or we see EOF)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 1/5] t9300: factor out portable "head -c" replacement
  2016-06-30  9:07 ` [PATCH v4 1/5] t9300: factor out portable "head -c" replacement Jeff King
  2016-07-01  4:45   ` Eric Sunshine
@ 2016-07-01 17:23   ` Junio C Hamano
  2016-07-01 18:01     ` Jeff King
  1 sibling, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2016-07-01 17:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Johannes Sixt

Jeff King <peff@peff.net> writes:

> In shell scripts it is sometimes useful to be able to read
> exactly N bytes from a pipe. Doing this portably turns out
> to be surprisingly difficult.

I'd rotate the above by three words ;-).

    It is sometimes useful to be able to read
    exactly N bytes from a pipe. Doing this portably turns out
    to be surprisingly difficult
    in shell scripts.

>   - "dd bs=1 count=$n" fixes the partial read problem (all
>     reads are 1-byte, so there can be no partial response).
>     It does make a lot of write() calls, but for our tests
>     that's unlikely to matter.

It makes me wonder if it helps to use different ibs and obs if many
writes bother you, but because this patch moves us away from dd,
that is a moot point.

> That makes the perl solution the least bad (because we
> conveniently omitted "length of code" as a criterion).
> It's also what t9300 is currently using, so we can just pull
> the implementation from there.

;-).

The patch itself is good.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 1/5] t9300: factor out portable "head -c" replacement
  2016-07-01 17:23   ` Junio C Hamano
@ 2016-07-01 18:01     ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2016-07-01 18:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe, Johannes Sixt

On Fri, Jul 01, 2016 at 10:23:05AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > In shell scripts it is sometimes useful to be able to read
> > exactly N bytes from a pipe. Doing this portably turns out
> > to be surprisingly difficult.
> 
> I'd rotate the above by three words ;-).
> 
>     It is sometimes useful to be able to read
>     exactly N bytes from a pipe. Doing this portably turns out
>     to be surprisingly difficult
>     in shell scripts.

Yeah, I'd very much agree with that (feel free to mark it up as you
apply).

> >   - "dd bs=1 count=$n" fixes the partial read problem (all
> >     reads are 1-byte, so there can be no partial response).
> >     It does make a lot of write() calls, but for our tests
> >     that's unlikely to matter.
> 
> It makes me wonder if it helps to use different ibs and obs if many
> writes bother you, but because this patch moves us away from dd,
> that is a moot point.

Actually, I just mis-spoke (mis-wrote?). I meant to say that it makes a
lot of read() calls. (It probably also makes a lot of write() calls, but
that is beside the point).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-06-30  9:08 ` [PATCH v4 2/5] t5000: test tar files that overflow ustar headers Jeff King
@ 2016-07-14 15:47   ` Johannes Schindelin
  2016-07-14 16:45     ` Johannes Sixt
  2016-07-14 18:21     ` Jeff King
  0 siblings, 2 replies; 42+ messages in thread
From: Johannes Schindelin @ 2016-07-14 15:47 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Johannes Sixt, Junio C Hamano

Hi Peff,

sorry for the very, very late reply.

On Thu, 30 Jun 2016, Jeff King wrote:

> The ustar format only has room for 11 (or 12, depending on
> some implementations) octal digits for the size and mtime of
> each file. For values larger than this, we have to add pax
> extended headers to specify the real data, and git does not
> yet know how to do so.
>
> [...]
>  t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes

It appears that this blob cannot be read when sizeof(unsigned long) == 4.
This happens to break the t5000 test on Windows, where that comparison
holds true.

I am sure that I missed some other discussion about this issue... could
you point me to it?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 15:47   ` Johannes Schindelin
@ 2016-07-14 16:45     ` Johannes Sixt
  2016-07-14 17:08       ` Junio C Hamano
  2016-07-14 18:24       ` Jeff King
  2016-07-14 18:21     ` Jeff King
  1 sibling, 2 replies; 42+ messages in thread
From: Johannes Sixt @ 2016-07-14 16:45 UTC (permalink / raw)
  To: Johannes Schindelin, Jeff King; +Cc: git, René Scharfe, Junio C Hamano

Am 14.07.2016 um 17:47 schrieb Johannes Schindelin:
> On Thu, 30 Jun 2016, Jeff King wrote:
>> The ustar format only has room for 11 (or 12, depending on
>> some implementations) octal digits for the size and mtime of
>> each file. For values larger than this, we have to add pax
>> extended headers to specify the real data, and git does not
>> yet know how to do so.
>>
>> [...]
>>  t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes
>
> It appears that this blob cannot be read when sizeof(unsigned long) == 4.
> This happens to break the t5000 test on Windows, where that comparison
> holds true.

The problem occurs in parse_sha1_header_extended(), where the 
calculation of the size in the object header overflows our 32-bit 
unsigned long.

-- Hannes


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB
  2016-06-30  9:09 ` [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB Jeff King
@ 2016-07-14 16:48   ` Johannes Sixt
  2016-07-14 17:11     ` Junio C Hamano
  2016-07-15  2:59     ` Torsten Bögershausen
  0 siblings, 2 replies; 42+ messages in thread
From: Johannes Sixt @ 2016-07-14 16:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe, Junio C Hamano

Am 30.06.2016 um 11:09 schrieb Jeff King:
> +/*
> + * This is the max value that a ustar size header can specify, as it is fixed
> + * at 11 octal digits. POSIX specifies that we switch to extended headers at
> + * this size.
> + */
> +#define USTAR_MAX_SIZE 077777777777UL

This is too large by one bit for our 32-bit unsigned long on Windows:

archive-tar.c: In function 'write_tar_entry':
archive-tar.c:295: warning: integer constant is too large for 'unsigned 
long' type
archive-tar.c: In function 'write_global_extended_header':
archive-tar.c:332: warning: integer constant is too large for 'unsigned 
long' type
archive-tar.c:335: warning: integer constant is too large for 'unsigned 
long' type
archive-tar.c:335: warning: overflow in implicit constant conversion

-- Hannes


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 16:45     ` Johannes Sixt
@ 2016-07-14 17:08       ` Junio C Hamano
  2016-07-14 20:52         ` Johannes Sixt
  2016-07-14 18:24       ` Jeff King
  1 sibling, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 17:08 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Johannes Schindelin, Jeff King, git, René Scharfe

Johannes Sixt <j6t@kdbg.org> writes:

> Am 14.07.2016 um 17:47 schrieb Johannes Schindelin:
>> On Thu, 30 Jun 2016, Jeff King wrote:
>>> The ustar format only has room for 11 (or 12, depending on
>>> some implementations) octal digits for the size and mtime of
>>> each file. For values larger than this, we have to add pax
>>> extended headers to specify the real data, and git does not
>>> yet know how to do so.
>>>
>>> [...]
>>>  t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes
>>
>> It appears that this blob cannot be read when sizeof(unsigned long) == 4.
>> This happens to break the t5000 test on Windows, where that comparison
>> holds true.
>
> The problem occurs in parse_sha1_header_extended(), where the
> calculation of the size in the object header overflows our 32-bit
> unsigned long.

OK, then we'd want something that measures how big "unsigned long"
is, and use it as a lazy prerequisite 64BIT_LONG, tweaking the other
patch to t0006 the other Johannes sent yesterday.

Dscho?  I'll revert the merge of 'js/t0006-for-v2.9.2' out of
'next' so that we can replace it with your newer version, but it
needs to be massaged to lose the strong linkage with "time", as
it is no longer "is our time big enough", I would think.

Thanks.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB
  2016-07-14 16:48   ` Johannes Sixt
@ 2016-07-14 17:11     ` Junio C Hamano
  2016-07-14 18:16       ` Jeff King
  2016-07-15  2:59     ` Torsten Bögershausen
  1 sibling, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 17:11 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Jeff King, git, René Scharfe

Johannes Sixt <j6t@kdbg.org> writes:

> Am 30.06.2016 um 11:09 schrieb Jeff King:
>> +/*
>> + * This is the max value that a ustar size header can specify, as it is fixed
>> + * at 11 octal digits. POSIX specifies that we switch to extended headers at
>> + * this size.
>> + */
>> +#define USTAR_MAX_SIZE 077777777777UL
>
> This is too large by one bit for our 32-bit unsigned long on Windows:
>
> archive-tar.c: In function 'write_tar_entry':
> archive-tar.c:295: warning: integer constant is too large for
> unsigned long' type
> archive-tar.c: In function 'write_global_extended_header':
> archive-tar.c:332: warning: integer constant is too large for
> unsigned long' type
> archive-tar.c:335: warning: integer constant is too large for
> unsigned long' type
> archive-tar.c:335: warning: overflow in implicit constant conversion

Yikes.  I guess we need ULL here, and it cascade to all the
variables used to interact with this constant, but not everybody has
"long long", so....


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB
  2016-07-14 17:11     ` Junio C Hamano
@ 2016-07-14 18:16       ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2016-07-14 18:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, git, René Scharfe

On Thu, Jul 14, 2016 at 10:11:18AM -0700, Junio C Hamano wrote:

> Johannes Sixt <j6t@kdbg.org> writes:
> 
> > Am 30.06.2016 um 11:09 schrieb Jeff King:
> >> +/*
> >> + * This is the max value that a ustar size header can specify, as it is fixed
> >> + * at 11 octal digits. POSIX specifies that we switch to extended headers at
> >> + * this size.
> >> + */
> >> +#define USTAR_MAX_SIZE 077777777777UL
> >
> > This is too large by one bit for our 32-bit unsigned long on Windows:
> >
> > archive-tar.c: In function 'write_tar_entry':
> > archive-tar.c:295: warning: integer constant is too large for
> > unsigned long' type
> > archive-tar.c: In function 'write_global_extended_header':
> > archive-tar.c:332: warning: integer constant is too large for
> > unsigned long' type
> > archive-tar.c:335: warning: integer constant is too large for
> > unsigned long' type
> > archive-tar.c:335: warning: overflow in implicit constant conversion
> 
> Yikes.  I guess we need ULL here, and it cascade to all the
> variables used to interact with this constant, but not everybody has
> "long long", so....

On 32-bit systems, I suspect the tar limits are a non-issue. For real
filesystem operations, tar on such a system would use off_t. But we use
"unsigned long" internally for object sizes, so I imagine that objects
larger than 4G are simply impossible on such systems.

So one option would be to simply "#if"-out these checks on 32-bit
systems.

I think it would also be OK to just set USTAR_MAX_SIZE to ULONG_MAX on
such a system, too (which effectively eliminates the check, but keeps
the conditional bits contained).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 15:47   ` Johannes Schindelin
  2016-07-14 16:45     ` Johannes Sixt
@ 2016-07-14 18:21     ` Jeff King
  2016-07-14 20:00       ` Junio C Hamano
  1 sibling, 1 reply; 42+ messages in thread
From: Jeff King @ 2016-07-14 18:21 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, René Scharfe, Johannes Sixt, Junio C Hamano

On Thu, Jul 14, 2016 at 05:47:41PM +0200, Johannes Schindelin wrote:

> On Thu, 30 Jun 2016, Jeff King wrote:
> 
> > The ustar format only has room for 11 (or 12, depending on
> > some implementations) octal digits for the size and mtime of
> > each file. For values larger than this, we have to add pax
> > extended headers to specify the real data, and git does not
> > yet know how to do so.
> >
> > [...]
> >  t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes
> 
> It appears that this blob cannot be read when sizeof(unsigned long) == 4.
> This happens to break the t5000 test on Windows, where that comparison
> holds true.
> 
> I am sure that I missed some other discussion about this issue... could
> you point me to it?

There's tons of discussion in:

  http://thread.gmane.org/gmane.comp.version-control.git/297409

but frankly it is not worth your time to read it. These tests are about
overflowing the tar limits, which can only happen with times and sizes
greater than 32-bits. The right thing to do is to skip the tests
entirely on systems where sizeof(unsigned long) is less than 8 (the
actual value is 64GB+1, so technically a 37-bit system would work, but I
think it is OK for the test-skipping to be less specific).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 16:45     ` Johannes Sixt
  2016-07-14 17:08       ` Junio C Hamano
@ 2016-07-14 18:24       ` Jeff King
  1 sibling, 0 replies; 42+ messages in thread
From: Jeff King @ 2016-07-14 18:24 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Johannes Schindelin, git, René Scharfe, Junio C Hamano

On Thu, Jul 14, 2016 at 06:45:47PM +0200, Johannes Sixt wrote:

> Am 14.07.2016 um 17:47 schrieb Johannes Schindelin:
> > On Thu, 30 Jun 2016, Jeff King wrote:
> > > The ustar format only has room for 11 (or 12, depending on
> > > some implementations) octal digits for the size and mtime of
> > > each file. For values larger than this, we have to add pax
> > > extended headers to specify the real data, and git does not
> > > yet know how to do so.
> > > 
> > > [...]
> > >  t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes
> > 
> > It appears that this blob cannot be read when sizeof(unsigned long) == 4.
> > This happens to break the t5000 test on Windows, where that comparison
> > holds true.
> 
> The problem occurs in parse_sha1_header_extended(), where the calculation of
> the size in the object header overflows our 32-bit unsigned long.

Yep, unsurprising.

I do think git is wrong to use "unsigned long" there. It really ought to
be "size_t". My understanding is that 64-bit Windows is LLP64, which
means that you cannot currently have blobs greater than 4GB there, even
though it would work correctly with size_t.

Switching all of the things that look at blob sizes to size_t will be a
big job, though.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 18:21     ` Jeff King
@ 2016-07-14 20:00       ` Junio C Hamano
  2016-07-14 20:03         ` Junio C Hamano
                           ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 20:00 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, René Scharfe, Johannes Sixt

Jeff King <peff@peff.net> writes:

> On Thu, Jul 14, 2016 at 05:47:41PM +0200, Johannes Schindelin wrote:
>
>> On Thu, 30 Jun 2016, Jeff King wrote:
>> 
>> > The ustar format only has room for 11 (or 12, depending on
>> > some implementations) octal digits for the size and mtime of
>> > each file. For values larger than this, we have to add pax
>> > extended headers to specify the real data, and git does not
>> > yet know how to do so.
>> >
>> > [...]
>> >  t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes
>> 
>> It appears that this blob cannot be read when sizeof(unsigned long) == 4.
>> This happens to break the t5000 test on Windows, where that comparison
>> holds true.
>> 
>> I am sure that I missed some other discussion about this issue... could
>> you point me to it?
>
> There's tons of discussion in:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/297409
>
> but frankly it is not worth your time to read it. These tests are about
> overflowing the tar limits, which can only happen with times and sizes
> greater than 32-bits. The right thing to do is to skip the tests
> entirely on systems where sizeof(unsigned long) is less than 8 (the
> actual value is 64GB+1, so technically a 37-bit system would work, but I
> think it is OK for the test-skipping to be less specific).

OK, how about this on top of a replacement for js/t0006-for-v2.9.2
that I'll send out as a reply to this message?




 archive-tar.c       |  5 +++++
 t/t5000-tar-tree.sh | 10 +++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 7ea4e90..4d2832c 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -25,8 +25,13 @@ static int write_tar_filter_archive(const struct archiver *ar,
  *
  * Likewise for the mtime (which happens to use a buffer of the same size).
  */
+#if ULONG_MAX == 0x7FFFFFFF
+#define USTAR_MAX_SIZE ULONG_MAX
+#define USTAR_MAX_MTIME ULONG_MAX
+#else
 #define USTAR_MAX_SIZE 077777777777UL
 #define USTAR_MAX_MTIME 077777777777UL
+#endif
 
 /* writes out the whole block, but only if it is full */
 static void write_if_needed(void)
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 96d208d..9c97789 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -360,7 +360,7 @@ test_expect_success 'set up repository with huge blob' '
 
 # We expect git to die with SIGPIPE here (otherwise we
 # would generate the whole 64GB).
-test_expect_success 'generate tar with huge size' '
+test_expect_success 64BIT 'generate tar with huge size' '
 	{
 		git archive HEAD
 		echo $? >exit-code
@@ -369,13 +369,13 @@ test_expect_success 'generate tar with huge size' '
 	test_cmp expect exit-code
 '
 
-test_expect_success TAR_HUGE 'system tar can read our huge size' '
+test_expect_success TAR_HUGE,64BIT 'system tar can read our huge size' '
 	echo 68719476737 >expect &&
 	tar_info huge.tar | cut -d" " -f1 >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success 'set up repository with far-future commit' '
+test_expect_success 64BIT 'set up repository with far-future commit' '
 	rm -f .git/index &&
 	echo content >file &&
 	git add file &&
@@ -383,11 +383,11 @@ test_expect_success 'set up repository with far-future commit' '
 		git commit -m "tempori parendum"
 '
 
-test_expect_success 'generate tar with future mtime' '
+test_expect_success 64BIT 'generate tar with future mtime' '
 	git archive HEAD >future.tar
 '
 
-test_expect_success TAR_HUGE 'system tar can read our future mtime' '
+test_expect_success TAR_HUGE,64BIT 'system tar can read our future mtime' '
 	echo 4147 >expect &&
 	tar_info future.tar | cut -d" " -f2 >actual &&
 	test_cmp expect actual

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:00       ` Junio C Hamano
@ 2016-07-14 20:03         ` Junio C Hamano
  2016-07-14 20:14           ` Jeff King
  2016-07-14 20:09         ` Junio C Hamano
  2016-07-14 20:10         ` Jeff King
  2 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 20:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, René Scharfe, Johannes Sixt

Junio C Hamano <gitster@pobox.com> writes:

> OK, how about this on top of a replacement for js/t0006-for-v2.9.2
> that I'll send out as a reply to this message?

which is this one, which is largely from your $gmane/299310.

-- >8 --
From: Jeff King <peff@peff.net>
Date: Mon, 11 Jul 2016 19:54:18 -0400
Subject: [PATCH] t0006: skip "far in the future" test when unsigned long is
 not long enough

Git's source code refers to timestamps as unsigned longs.  On 32-bit
platforms, as well as on Windows, unsigned long is not large enough
to capture dates that are "absurdly far in the future".

While we can fix this issue properly by replacing unsigned long with
a larger type, we want to be a bit more conservative and just skip
those tests on the maint track.

Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 help.c          | 7 +++++++
 t/t0006-date.sh | 6 +++---
 t/test-lib.sh   | 9 +++++++++
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/help.c b/help.c
index 19328ea..0cea240 100644
--- a/help.c
+++ b/help.c
@@ -419,6 +419,13 @@ int cmd_version(int argc, const char **argv, const char *prefix)
 	 * with external projects that rely on the output of "git version".
 	 */
 	printf("git version %s\n", git_version_string);
+	while (*++argv) {
+		if (!strcmp(*argv, "--build-options")) {
+			printf("sizeof-unsigned-long: %d",
+			       (int)sizeof(unsigned long));
+			/* maybe also save and output GIT-BUILD_OPTIONS? */
+		}
+	}
 	return 0;
 }
 
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 04ce535..a0b8497 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -31,7 +31,7 @@ check_show () {
 	format=$1
 	time=$2
 	expect=$3
-	test_expect_${4:-success} "show date ($format:$time)" '
+	test_expect_success $4 "show date ($format:$time)" '
 		echo "$time -> $expect" >expect &&
 		test-date show:$format "$time" >actual &&
 		test_cmp expect actual
@@ -50,8 +50,8 @@ check_show iso-local "$TIME" '2016-06-15 14:13:20 +0000'
 
 # arbitrary time absurdly far in the future
 FUTURE="5758122296 -0400"
-check_show iso       "$FUTURE" "2152-06-19 18:24:56 -0400"
-check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000"
+check_show iso       "$FUTURE" "2152-06-19 18:24:56 -0400" 64BIT
+check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000" 64BIT
 
 check_parse() {
 	echo "$1 -> $2" >expect
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0055ebb..211f1a8 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1111,3 +1111,12 @@ run_with_limited_cmdline () {
 }
 
 test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true'
+
+build_option () {
+	git version --build-options |
+	sed -ne "s/^$1: //p"
+}
+
+test_lazy_prereq 64BIT '
+	test 8 -le "$(build_option sizeof-unsigned-long)"
+'
-- 
2.9.1-545-g8c0a069


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:00       ` Junio C Hamano
  2016-07-14 20:03         ` Junio C Hamano
@ 2016-07-14 20:09         ` Junio C Hamano
  2016-07-14 20:10         ` Jeff King
  2 siblings, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 20:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, René Scharfe, Johannes Sixt

Junio C Hamano <gitster@pobox.com> writes:

> OK, how about this on top of a replacement for js/t0006-for-v2.9.2
> that I'll send out as a reply to this message?

I won't repeat the patch body, but I had to write a log message, so 
here is what I tentatively committed (not queued yet).

Subject: archive-tar: huge offset and future timestamps would not work on 32-bit

As we are not yet moving everything to size_t but still using ulong
internally when talking about the size of object, platforms with
32-bit long will not be able to produce tar archive with 4GB+ file,
and cannot grok 077777777777UL as a constant.  Disable the extended
header feature and do not test it on them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:00       ` Junio C Hamano
  2016-07-14 20:03         ` Junio C Hamano
  2016-07-14 20:09         ` Junio C Hamano
@ 2016-07-14 20:10         ` Jeff King
  2016-07-14 20:22           ` Junio C Hamano
  2 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2016-07-14 20:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, René Scharfe, Johannes Sixt

On Thu, Jul 14, 2016 at 01:00:08PM -0700, Junio C Hamano wrote:

> > There's tons of discussion in:
> >
> >   http://thread.gmane.org/gmane.comp.version-control.git/297409
> >
> > but frankly it is not worth your time to read it. These tests are about
> > overflowing the tar limits, which can only happen with times and sizes
> > greater than 32-bits. The right thing to do is to skip the tests
> > entirely on systems where sizeof(unsigned long) is less than 8 (the
> > actual value is 64GB+1, so technically a 37-bit system would work, but I
> > think it is OK for the test-skipping to be less specific).
> 
> OK, how about this on top of a replacement for js/t0006-for-v2.9.2
> that I'll send out as a reply to this message?

Yeah, I think the patch here mostly makes sense. I tried to think what
could go wrong in this hunk:

> diff --git a/archive-tar.c b/archive-tar.c
> index 7ea4e90..4d2832c 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -25,8 +25,13 @@ static int write_tar_filter_archive(const struct archiver *ar,
>   *
>   * Likewise for the mtime (which happens to use a buffer of the same size).
>   */
> +#if ULONG_MAX == 0x7FFFFFFF
> +#define USTAR_MAX_SIZE ULONG_MAX
> +#define USTAR_MAX_MTIME ULONG_MAX
> +#else
>  #define USTAR_MAX_SIZE 077777777777UL
>  #define USTAR_MAX_MTIME 077777777777UL
> +#endif
>  
>  /* writes out the whole block, but only if it is full */
>  static void write_if_needed(void)

If for some reason we are wrong that objects cannot be larger than
ULONG_MAX (e.g., later on we convert everything to size_t, and 64-bit
LLP platforms handle large objects just fine), then we would prematurely
switch to extended headers on those platforms.

I think that's OK. This would just need cleaned up as part of the
conversion from "unsigned long" to "size_t" (the correct check would
then be against the max size_t).

Also, shouldn't it be checking against 0xFFFFFFFF?

An easier check would be "sizeof()", but I guess we can't use that in a
preprocessor directive.

> -test_expect_success TAR_HUGE 'system tar can read our huge size' '
> +test_expect_success TAR_HUGE,64BIT 'system tar can read our huge size' '

The 64BIT prereq is really "unsigned long is 64-bit". I wonder if we
should call it UL64 or something like that to make it more clear.

That makes it unnecessarily tied-in with the implementation, but it does
make it more clear what we care about; the distinction matters for
things like LP64 vs LLP64.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:03         ` Junio C Hamano
@ 2016-07-14 20:14           ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2016-07-14 20:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, René Scharfe, Johannes Sixt

On Thu, Jul 14, 2016 at 01:03:35PM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > OK, how about this on top of a replacement for js/t0006-for-v2.9.2
> > that I'll send out as a reply to this message?
> 
> which is this one, which is largely from your $gmane/299310.

Mostly OK, but two minor nits:

> diff --git a/help.c b/help.c
> index 19328ea..0cea240 100644
> --- a/help.c
> +++ b/help.c
> @@ -419,6 +419,13 @@ int cmd_version(int argc, const char **argv, const char *prefix)
>  	 * with external projects that rely on the output of "git version".
>  	 */
>  	printf("git version %s\n", git_version_string);
> +	while (*++argv) {
> +		if (!strcmp(*argv, "--build-options")) {
> +			printf("sizeof-unsigned-long: %d",
> +			       (int)sizeof(unsigned long));
> +			/* maybe also save and output GIT-BUILD_OPTIONS? */
> +		}

That comment was mostly for explaining my mindset in the discussion. If
we're going to keep it, maybe mark it with a TODO or XXX or something?

> +test_lazy_prereq 64BIT '
> +	test 8 -le "$(build_option sizeof-unsigned-long)"
> +'

As mentioned in my previous email, I wonder if this should be UL64 or
something.

As noted earlier in the thread, t0006 actually cares most directly about
LONG_MAX, but I think it's probably OK to assume it is directly related
to "unsigned long". Also, even if we got past that initial "strtol", I
suspect "unsigned long" _would_ come into play anyway, as that is our
internal representation.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:10         ` Jeff King
@ 2016-07-14 20:22           ` Junio C Hamano
  2016-07-14 20:27             ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 20:22 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, René Scharfe, Johannes Sixt

Jeff King <peff@peff.net> writes:

>> +#if ULONG_MAX == 0x7FFFFFFF
>> +#define USTAR_MAX_SIZE ULONG_MAX
>> +#define USTAR_MAX_MTIME ULONG_MAX
>> +#else
>>  #define USTAR_MAX_SIZE 077777777777UL
>>  #define USTAR_MAX_MTIME 077777777777UL
>> +#endif
>>  
>>  /* writes out the whole block, but only if it is full */
>>  static void write_if_needed(void)
>
> If for some reason we are wrong that objects cannot be larger than
> ULONG_MAX (e.g., later on we convert everything to size_t, and 64-bit
> LLP platforms handle large objects just fine), then we would prematurely
> switch to extended headers on those platforms.
>
> I think that's OK. This would just need cleaned up as part of the
> conversion from "unsigned long" to "size_t" (the correct check would
> then be against the max size_t).

> Also, shouldn't it be checking against 0xFFFFFFFF?

Correct.  Somehow I thought I was checking with LONG_MAX.  Will correct.

> An easier check would be "sizeof()", but I guess we can't use that in a
> preprocessor directive.

Yes, I tried it and failed ;-)

>> -test_expect_success TAR_HUGE 'system tar can read our huge size' '
>> +test_expect_success TAR_HUGE,64BIT 'system tar can read our huge size' '
>
> The 64BIT prereq is really "unsigned long is 64-bit". I wonder if we
> should call it UL64 or something like that to make it more clear.
>
> That makes it unnecessarily tied-in with the implementation, but it does
> make it more clear what we care about; the distinction matters for
> things like LP64 vs LLP64.

I do not think any platform is weird enough to have different sizes
for long and ulong, so I am not sure you need UL64.

But pointer size can legitimately be different, so it has a value to
differentiate between LP64 and LLP64, if we start doing things like
"does this platform have large virtual address space?"

LONG_IS_64BIT perhaps to be more readable?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:22           ` Junio C Hamano
@ 2016-07-14 20:27             ` Jeff King
  2016-07-14 20:34               ` Junio C Hamano
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2016-07-14 20:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, René Scharfe, Johannes Sixt

On Thu, Jul 14, 2016 at 01:22:01PM -0700, Junio C Hamano wrote:

> > Also, shouldn't it be checking against 0xFFFFFFFF?
> 
> Correct.  Somehow I thought I was checking with LONG_MAX.  Will correct.
> 
> > An easier check would be "sizeof()", but I guess we can't use that in a
> > preprocessor directive.
> 
> Yes, I tried it and failed ;-)

I also found it funny that you used "==" instead "<=", but I cannot
really think of a case where it would matter. If it is "<= 0xFFFFFFFF",
that basically covers 16-bit platforms. I really hope nobody is trying
to run git on such a platform. Doing "< 0xFFFFFFFFFFFFFFFF" to check
for "less than 64-bit" would make more sense, but would probably choke
on a 32-bit preprocessor.

So that everybody is either 32- or 64-bit these days, I think it doesn't
matter in practice.

> >> -test_expect_success TAR_HUGE 'system tar can read our huge size' '
> >> +test_expect_success TAR_HUGE,64BIT 'system tar can read our huge size' '
> >
> > The 64BIT prereq is really "unsigned long is 64-bit". I wonder if we
> > should call it UL64 or something like that to make it more clear.
> >
> > That makes it unnecessarily tied-in with the implementation, but it does
> > make it more clear what we care about; the distinction matters for
> > things like LP64 vs LLP64.
> 
> I do not think any platform is weird enough to have different sizes
> for long and ulong, so I am not sure you need UL64.
> 
> But pointer size can legitimately be different, so it has a value to
> differentiate between LP64 and LLP64, if we start doing things like
> "does this platform have large virtual address space?"
> 
> LONG_IS_64BIT perhaps to be more readable?

Yeah, that is what I was trying to get at, but you stated it much more
clearly. LONG_IS_64BIT is good. I wonder if the "git version
--build-options" should be "sizeof-long", too. It's shorter, and
indicates our assumption that we are talking about all longs, not just
unsigned ones.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:27             ` Jeff King
@ 2016-07-14 20:34               ` Junio C Hamano
  2016-07-14 20:43                 ` [PATCH v2 0/2] ulong may only be 32-bit wide Junio C Hamano
  2016-07-15 15:10                 ` [PATCH v4 2/5] t5000: test tar files that overflow ustar headers Johannes Schindelin
  0 siblings, 2 replies; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 20:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, René Scharfe, Johannes Sixt

Jeff King <peff@peff.net> writes:

> Yeah, that is what I was trying to get at, but you stated it much more
> clearly. LONG_IS_64BIT is good. I wonder if the "git version
> --build-options" should be "sizeof-long", too. It's shorter, and
> indicates our assumption that we are talking about all longs, not just
> unsigned ones.

OK, I'll do a final reroll and then wait for Europeans ;-)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 0/2] ulong may only be 32-bit wide
  2016-07-14 20:34               ` Junio C Hamano
@ 2016-07-14 20:43                 ` Junio C Hamano
  2016-07-14 20:43                   ` [PATCH v2 1/2] t0006: skip "far in the future" test when unsigned long is not long enough Junio C Hamano
  2016-07-14 20:43                   ` [PATCH v2 2/2] archive-tar: huge offset and future timestamps would not work on 32-bit Junio C Hamano
  2016-07-15 15:10                 ` [PATCH v4 2/5] t5000: test tar files that overflow ustar headers Johannes Schindelin
  1 sibling, 2 replies; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Johannes Schindelin, René Scharfe, Johannes Sixt

So here is the final reroll from me for now that targets 'maint'
(eventually).

Jeff King (1):
  t0006: skip "far in the future" test when unsigned long is not long
    enough

Junio C Hamano (1):
  archive-tar: huge offset and future timestamps would not work on
    32-bit

 archive-tar.c       |  5 +++++
 help.c              |  6 ++++++
 t/t0006-date.sh     |  6 +++---
 t/t5000-tar-tree.sh | 10 +++++-----
 t/test-lib.sh       |  9 +++++++++
 5 files changed, 28 insertions(+), 8 deletions(-)

-- 
2.9.1-545-g8c0a069


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 1/2] t0006: skip "far in the future" test when unsigned long is not long enough
  2016-07-14 20:43                 ` [PATCH v2 0/2] ulong may only be 32-bit wide Junio C Hamano
@ 2016-07-14 20:43                   ` Junio C Hamano
  2016-07-14 20:43                   ` [PATCH v2 2/2] archive-tar: huge offset and future timestamps would not work on 32-bit Junio C Hamano
  1 sibling, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Johannes Schindelin, René Scharfe, Johannes Sixt

From: Jeff King <peff@peff.net>

Git's source code refers to timestamps as unsigned longs.  On 32-bit
platforms, as well as on Windows, unsigned long is not large enough
to capture dates that are "absurdly far in the future".

While we can fix this issue properly by replacing unsigned long with
a larger type, we want to be a bit more conservative and just skip
those tests on the maint track.

Signed-off-by: Jeff King <peff@peff.net>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 help.c          | 6 ++++++
 t/t0006-date.sh | 6 +++---
 t/test-lib.sh   | 9 +++++++++
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/help.c b/help.c
index 19328ea..2ff3b5a 100644
--- a/help.c
+++ b/help.c
@@ -419,6 +419,12 @@ int cmd_version(int argc, const char **argv, const char *prefix)
 	 * with external projects that rely on the output of "git version".
 	 */
 	printf("git version %s\n", git_version_string);
+	while (*++argv) {
+		if (!strcmp(*argv, "--build-options")) {
+			printf("sizeof-long: %d\n", (int)sizeof(long));
+			/* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
+		}
+	}
 	return 0;
 }
 
diff --git a/t/t0006-date.sh b/t/t0006-date.sh
index 04ce535..4c8cf58 100755
--- a/t/t0006-date.sh
+++ b/t/t0006-date.sh
@@ -31,7 +31,7 @@ check_show () {
 	format=$1
 	time=$2
 	expect=$3
-	test_expect_${4:-success} "show date ($format:$time)" '
+	test_expect_success $4 "show date ($format:$time)" '
 		echo "$time -> $expect" >expect &&
 		test-date show:$format "$time" >actual &&
 		test_cmp expect actual
@@ -50,8 +50,8 @@ check_show iso-local "$TIME" '2016-06-15 14:13:20 +0000'
 
 # arbitrary time absurdly far in the future
 FUTURE="5758122296 -0400"
-check_show iso       "$FUTURE" "2152-06-19 18:24:56 -0400"
-check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000"
+check_show iso       "$FUTURE" "2152-06-19 18:24:56 -0400" LONG_IS_64BIT
+check_show iso-local "$FUTURE" "2152-06-19 22:24:56 +0000" LONG_IS_64BIT
 
 check_parse() {
 	echo "$1 -> $2" >expect
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 0055ebb..11201e9 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1111,3 +1111,12 @@ run_with_limited_cmdline () {
 }
 
 test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true'
+
+build_option () {
+	git version --build-options |
+	sed -ne "s/^$1: //p"
+}
+
+test_lazy_prereq LONG_IS_64BIT '
+	test 8 -le "$(build_option sizeof-long)"
+'
-- 
2.9.1-545-g8c0a069


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 2/2] archive-tar: huge offset and future timestamps would not work on 32-bit
  2016-07-14 20:43                 ` [PATCH v2 0/2] ulong may only be 32-bit wide Junio C Hamano
  2016-07-14 20:43                   ` [PATCH v2 1/2] t0006: skip "far in the future" test when unsigned long is not long enough Junio C Hamano
@ 2016-07-14 20:43                   ` Junio C Hamano
  2016-07-14 22:20                     ` Jeff King
  1 sibling, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 20:43 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Johannes Schindelin, René Scharfe, Johannes Sixt

As we are not yet moving everything to size_t but still using ulong
internally when talking about the size of object, platforms with
32-bit long will not be able to produce tar archive with 4GB+ file,
and cannot grok 077777777777UL as a constant.  Disable the extended
header feature and do not test it on them.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 archive-tar.c       |  5 +++++
 t/t5000-tar-tree.sh | 10 +++++-----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/archive-tar.c b/archive-tar.c
index 7ea4e90..5568240 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -25,8 +25,13 @@ static int write_tar_filter_archive(const struct archiver *ar,
  *
  * Likewise for the mtime (which happens to use a buffer of the same size).
  */
+#if ULONG_MAX == 0xFFFFFFFF
+#define USTAR_MAX_SIZE ULONG_MAX
+#define USTAR_MAX_MTIME ULONG_MAX
+#else
 #define USTAR_MAX_SIZE 077777777777UL
 #define USTAR_MAX_MTIME 077777777777UL
+#endif
 
 /* writes out the whole block, but only if it is full */
 static void write_if_needed(void)
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 96d208d..699355b 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -360,7 +360,7 @@ test_expect_success 'set up repository with huge blob' '
 
 # We expect git to die with SIGPIPE here (otherwise we
 # would generate the whole 64GB).
-test_expect_success 'generate tar with huge size' '
+test_expect_success LONG_IS_64BIT 'generate tar with huge size' '
 	{
 		git archive HEAD
 		echo $? >exit-code
@@ -369,13 +369,13 @@ test_expect_success 'generate tar with huge size' '
 	test_cmp expect exit-code
 '
 
-test_expect_success TAR_HUGE 'system tar can read our huge size' '
+test_expect_success TAR_HUGE,LONG_IS_64BIT 'system tar can read our huge size' '
 	echo 68719476737 >expect &&
 	tar_info huge.tar | cut -d" " -f1 >actual &&
 	test_cmp expect actual
 '
 
-test_expect_success 'set up repository with far-future commit' '
+test_expect_success LONG_IS_64BIT 'set up repository with far-future commit' '
 	rm -f .git/index &&
 	echo content >file &&
 	git add file &&
@@ -383,11 +383,11 @@ test_expect_success 'set up repository with far-future commit' '
 		git commit -m "tempori parendum"
 '
 
-test_expect_success 'generate tar with future mtime' '
+test_expect_success LONG_IS_64BIT 'generate tar with future mtime' '
 	git archive HEAD >future.tar
 '
 
-test_expect_success TAR_HUGE 'system tar can read our future mtime' '
+test_expect_success TAR_HUGE,LONG_IS_64BIT 'system tar can read our future mtime' '
 	echo 4147 >expect &&
 	tar_info future.tar | cut -d" " -f2 >actual &&
 	test_cmp expect actual
-- 
2.9.1-545-g8c0a069


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 17:08       ` Junio C Hamano
@ 2016-07-14 20:52         ` Johannes Sixt
  2016-07-14 21:32           ` Jeff King
  2016-07-14 22:26           ` Junio C Hamano
  0 siblings, 2 replies; 42+ messages in thread
From: Johannes Sixt @ 2016-07-14 20:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Jeff King, git, René Scharfe

Am 14.07.2016 um 19:08 schrieb Junio C Hamano:
> Johannes Sixt <j6t@kdbg.org> writes:
>
>> Am 14.07.2016 um 17:47 schrieb Johannes Schindelin:
>>> On Thu, 30 Jun 2016, Jeff King wrote:
>>>> The ustar format only has room for 11 (or 12, depending on
>>>> some implementations) octal digits for the size and mtime of
>>>> each file. For values larger than this, we have to add pax
>>>> extended headers to specify the real data, and git does not
>>>> yet know how to do so.
>>>>
>>>> [...]
>>>>  t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a | Bin 0 -> 2048 bytes
>>>
>>> It appears that this blob cannot be read when sizeof(unsigned long) == 4.
>>> This happens to break the t5000 test on Windows, where that comparison
>>> holds true.
>>
>> The problem occurs in parse_sha1_header_extended(), where the
>> calculation of the size in the object header overflows our 32-bit
>> unsigned long.
>
> OK, then we'd want something that measures how big "unsigned long"
> is, and use it as a lazy prerequisite 64BIT_LONG, tweaking the other
> patch to t0006 the other Johannes sent yesterday.
>
> Dscho?  I'll revert the merge of 'js/t0006-for-v2.9.2' out of
> 'next' so that we can replace it with your newer version, but it
> needs to be massaged to lose the strong linkage with "time", as
> it is no longer "is our time big enough", I would think.

My first thought was that this is not warranted because t0006 is about 
commit time stamps, but the huge-tar breakage is file sizes, and the 
cases should be treated differently.

But on second thought, under the hood, both boil down to the size of 
unsigned long in our implementation. It may make sense to tie both cases 
to the same prerequisite.

On third thought, however, I think the two requirements could diverge in 
the future. The file size case should depend on the size of size_t. The 
timestamp case may become dependent on the size of time_t if we decide 
to move timestamp handling away from unsigned long: in modern(!) 
Microsoft SDKs, time_t is 64 bits, but unsigned long is 32 bits, in both 
the 32-bit and 64-bit environments!

-- Hannes


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:52         ` Johannes Sixt
@ 2016-07-14 21:32           ` Jeff King
  2016-07-14 22:30             ` Junio C Hamano
  2016-07-14 22:26           ` Junio C Hamano
  1 sibling, 1 reply; 42+ messages in thread
From: Jeff King @ 2016-07-14 21:32 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Junio C Hamano, Johannes Schindelin, git, René Scharfe

On Thu, Jul 14, 2016 at 10:52:55PM +0200, Johannes Sixt wrote:

> > Dscho?  I'll revert the merge of 'js/t0006-for-v2.9.2' out of
> > 'next' so that we can replace it with your newer version, but it
> > needs to be massaged to lose the strong linkage with "time", as
> > it is no longer "is our time big enough", I would think.
> 
> My first thought was that this is not warranted because t0006 is about
> commit time stamps, but the huge-tar breakage is file sizes, and the cases
> should be treated differently.
> 
> But on second thought, under the hood, both boil down to the size of
> unsigned long in our implementation. It may make sense to tie both cases to
> the same prerequisite.
> 
> On third thought, however, I think the two requirements could diverge in the
> future. The file size case should depend on the size of size_t. The
> timestamp case may become dependent on the size of time_t if we decide to
> move timestamp handling away from unsigned long: in modern(!) Microsoft
> SDKs, time_t is 64 bits, but unsigned long is 32 bits, in both the 32-bit
> and 64-bit environments!

The tar tests actually cover both: big files and big timestamps.

I think as long as the prereq is labeled LONG_IS_64BIT, we can then
adjust the appropriate tests if and when they become runnable on more
systems.

If we move to time_t everywhere, I think we'll need an extra
TIME_T_IS_64BIT, but we can cross that bridge when we come to it.

Likewise I think we'll need SIZE_T_IS_64BIT eventually (for real 32-bit
systems; LLP64 systems like Windows will then be able to run the test).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/2] archive-tar: huge offset and future timestamps would not work on 32-bit
  2016-07-14 20:43                   ` [PATCH v2 2/2] archive-tar: huge offset and future timestamps would not work on 32-bit Junio C Hamano
@ 2016-07-14 22:20                     ` Jeff King
  2016-07-14 22:36                       ` Junio C Hamano
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2016-07-14 22:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin, René Scharfe, Johannes Sixt

On Thu, Jul 14, 2016 at 01:43:57PM -0700, Junio C Hamano wrote:

> As we are not yet moving everything to size_t but still using ulong
> internally when talking about the size of object, platforms with
> 32-bit long will not be able to produce tar archive with 4GB+ file,
> and cannot grok 077777777777UL as a constant.  Disable the extended
> header feature and do not test it on them.

I tried testing this in a VM with 32-bit Debian. It fixes the build
problems, but t5000 still fails.

I think you need to add the prereq to one more test:

diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 699355b..80b2387 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -347,7 +347,7 @@ test_lazy_prereq TAR_HUGE '
 	test_cmp expect actual
 '
 
-test_expect_success 'set up repository with huge blob' '
+test_expect_success LONG_IS_64BIT 'set up repository with huge blob' '
 	obj_d=19 &&
 	obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a &&
 	obj=${obj_d}${obj_f} &&

We shouldn't be accessing the blob in update-index, but I think "git
commit" does so for the diff (and then after seeing the size says
"whoops, that's binary", but even the size check fails on 32-bit
systems).

So another solution would be to use "commit -q" at the end of that test.
I don't think there's much point, though; it's just setting up a state
for other tests that need LONG_IS_64BIT.

As an aside, it is inadvertently testing that our diff code does not
bother to read the whole blob in such a case. Which maybe argues for
using "commit -q", just because that is not a thing we are intending to
test here.

-Peff

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:52         ` Johannes Sixt
  2016-07-14 21:32           ` Jeff King
@ 2016-07-14 22:26           ` Junio C Hamano
  1 sibling, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 22:26 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Johannes Schindelin, Jeff King, git, René Scharfe

Johannes Sixt <j6t@kdbg.org> writes:

> My first thought was that this is not warranted because t0006 is about
> commit time stamps, but the huge-tar breakage is file sizes, and the
> cases should be treated differently.
>
> But on second thought, under the hood, both boil down to the size of
> unsigned long in our implementation. It may make sense to tie both
> cases to the same prerequisite.
>
> On third thought, however, I think the two requirements could diverge
> in the future. The file size case should depend on the size of
> size_t. The timestamp case may become dependent on the size of time_t
> if we decide to move timestamp handling away from unsigned long: in
> modern(!) Microsoft SDKs, time_t is 64 bits, but unsigned long is 32
> bits, in both the 32-bit and 64-bit environments!

I had the same three toughts, but this being a 'maint' material
stopped me going too deep into them.  Right now, "long being 32-bit"
is the source of all of these issues, and we would solve them on the
development track (not necessarily during this cycle) by deciding on
more appropriate types.  Timestamps may become time_t, and object
sizes may become off_t, such changes will come separately, and each
of them would need to lift "unless long is 64-bit, skip this test"
limitation and swap it with something else.


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 21:32           ` Jeff King
@ 2016-07-14 22:30             ` Junio C Hamano
  2016-07-14 22:38               ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 22:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Sixt, Johannes Schindelin, git, René Scharfe

Jeff King <peff@peff.net> writes:

> If we move to time_t everywhere, I think we'll need an extra
> TIME_T_IS_64BIT, but we can cross that bridge when we come to it.
>
> Likewise I think we'll need SIZE_T_IS_64BIT eventually (for real 32-bit
> systems; LLP64 systems like Windows will then be able to run the test).

I guess I wrote essentially the same thing before refreshing my
Inbox.

I am a bit fuzzy between off_t and size_t; the former is for the
size of things you see on the filesystem, while the latter is for
you to give malloc(3).  I would have thought that off_t is the type
we would want at the end of the raw object header, denoting the size
of a blob object when deflated, which could be larger than the size
of a region of memory we can get from malloc(3), in which case we
would use the streaming interface.



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/2] archive-tar: huge offset and future timestamps would not work on 32-bit
  2016-07-14 22:20                     ` Jeff King
@ 2016-07-14 22:36                       ` Junio C Hamano
  2016-07-16  6:28                         ` Duy Nguyen
  0 siblings, 1 reply; 42+ messages in thread
From: Junio C Hamano @ 2016-07-14 22:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Johannes Schindelin, René Scharfe, Johannes Sixt

Jeff King <peff@peff.net> writes:

> On Thu, Jul 14, 2016 at 01:43:57PM -0700, Junio C Hamano wrote:
>
>> As we are not yet moving everything to size_t but still using ulong
>> internally when talking about the size of object, platforms with
>> 32-bit long will not be able to produce tar archive with 4GB+ file,
>> and cannot grok 077777777777UL as a constant.  Disable the extended
>> header feature and do not test it on them.
>
> I tried testing this in a VM with 32-bit Debian. It fixes the build
> problems, but t5000 still fails.

Thanks for testing.  I need to find some time hunting for (or
building) an environment to do that myself.

> I think you need to add the prereq to one more test:
>
> diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
> index 699355b..80b2387 100755
> --- a/t/t5000-tar-tree.sh
> +++ b/t/t5000-tar-tree.sh
> @@ -347,7 +347,7 @@ test_lazy_prereq TAR_HUGE '
>  	test_cmp expect actual
>  '
>  
> -test_expect_success 'set up repository with huge blob' '
> +test_expect_success LONG_IS_64BIT 'set up repository with huge blob' '
>  	obj_d=19 &&
>  	obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a &&
>  	obj=${obj_d}${obj_f} &&
>
> We shouldn't be accessing the blob in update-index, but I think "git
> commit" does so for the diff (and then after seeing the size says
> "whoops, that's binary", but even the size check fails on 32-bit
> systems).
>
> So another solution would be to use "commit -q" at the end of that test.
> I don't think there's much point, though; it's just setting up a state
> for other tests that need LONG_IS_64BIT.
>
> As an aside, it is inadvertently testing that our diff code does not
> bother to read the whole blob in such a case. Which maybe argues for
> using "commit -q", just because that is not a thing we are intending to
> test here.

Thanks.  Let's add a prereq there.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 22:30             ` Junio C Hamano
@ 2016-07-14 22:38               ` Jeff King
  2016-07-15 13:37                 ` Torsten Bögershausen
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff King @ 2016-07-14 22:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Johannes Schindelin, git, René Scharfe

On Thu, Jul 14, 2016 at 03:30:58PM -0700, Junio C Hamano wrote:

> > If we move to time_t everywhere, I think we'll need an extra
> > TIME_T_IS_64BIT, but we can cross that bridge when we come to it.
> >
> > Likewise I think we'll need SIZE_T_IS_64BIT eventually (for real 32-bit
> > systems; LLP64 systems like Windows will then be able to run the test).
> 
> I guess I wrote essentially the same thing before refreshing my
> Inbox.
> 
> I am a bit fuzzy between off_t and size_t; the former is for the
> size of things you see on the filesystem, while the latter is for
> you to give malloc(3).  I would have thought that off_t is the type
> we would want at the end of the raw object header, denoting the size
> of a blob object when deflated, which could be larger than the size
> of a region of memory we can get from malloc(3), in which case we
> would use the streaming interface.

Yeah, your understanding is right (s/deflated/inflated/). I agree that
off_t is probably a better size for blobs. Traditionally git assumed any
object could fit in memory. The streaming interface helps that somewhat,
but I think there are cases where we still must load a blob (e.g., if it
is stored as a delta). In theory that never happens because of
core.bigfilethreshold, but you may get a packfile from somebody with a
higher threshold than you.

I wouldn't be surprised if there are other cases that aren't smart
enough to use the streaming interface yet, but the solution there is to
make them smarter. :)

So off_t is probably better. We do need to be careful, though, when
allocating objects. E.g., this:

  off_t size;
  struct git_istream *stream;
  void *buf;

  stream = open_istream(sha1, &type, &size, NULL);
  buf = xmalloc(size);
  while (1) {
	/* read stream into buf */
  }

is a security hole when size_t is less than off_t (it gets truncated in
the call to xmalloc, which allocates too few bytes). This is a toy
example, obviously, but it's something to watch out for.

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB
  2016-07-14 16:48   ` Johannes Sixt
  2016-07-14 17:11     ` Junio C Hamano
@ 2016-07-15  2:59     ` Torsten Bögershausen
  1 sibling, 0 replies; 42+ messages in thread
From: Torsten Bögershausen @ 2016-07-15  2:59 UTC (permalink / raw)
  To: Johannes Sixt, Jeff King; +Cc: git, René Scharfe, Junio C Hamano



On 07/14/2016 06:48 PM, Johannes Sixt wrote:
> Am 30.06.2016 um 11:09 schrieb Jeff King:
>> +/*
>> + * This is the max value that a ustar size header can specify, as it
>> is fixed
>> + * at 11 octal digits. POSIX specifies that we switch to extended
>> headers at
>> + * this size.
>> + */
>> +#define USTAR_MAX_SIZE 077777777777UL
>
> This is too large by one bit for our 32-bit unsigned long on Windows:
>
> archive-tar.c: In function 'write_tar_entry':
> archive-tar.c:295: warning: integer constant is too large for 'unsigned
> long' type
> archive-tar.c: In function 'write_global_extended_header':
> archive-tar.c:332: warning: integer constant is too large for 'unsigned
> long' type
> archive-tar.c:335: warning: integer constant is too large for 'unsigned
> long' type
> archive-tar.c:335: warning: overflow in implicit constant conversion
>
> -- Hannes
Similar problem on 32 Bit Linux:
  In function ‘write_global_extended_header’:
archive-tar.c:29:25: warning: overflow in implicit constant conversion 
[-Woverflow]
  #define USTAR_MAX_MTIME 077777777777UL
                          ^
archive-tar.c:335:16: note: in expansion of macro ‘USTAR_MAX_MTIME’
    args->time = USTAR_MAX_MTIME;

----------------------------------
I want to volunteer to do more tests on 32 bit Linux ;-)
Does this fix it and look as a good thing to change ?


--- a/archive-tar.c
+++ b/archive-tar.c
@@ -332,7 +332,7 @@ static void write_global_extended_header(struct 
archiver_args *args)
         if (args->time > USTAR_MAX_MTIME) {
                 strbuf_append_ext_header_uint(&ext_header, "mtime",
                                               args->time);
-               args->time = USTAR_MAX_MTIME;
+               args->time = (time_t)USTAR_MAX_MTIME;
         }

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 22:38               ` Jeff King
@ 2016-07-15 13:37                 ` Torsten Bögershausen
  2016-07-15 13:46                   ` Jeff King
  0 siblings, 1 reply; 42+ messages in thread
From: Torsten Bögershausen @ 2016-07-15 13:37 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano
  Cc: Johannes Sixt, Johannes Schindelin, git, René Scharfe



On 07/15/2016 12:38 AM, Jeff King wrote:
> On Thu, Jul 14, 2016 at 03:30:58PM -0700, Junio C Hamano wrote:
>
>>> If we move to time_t everywhere, I think we'll need an extra
>>> TIME_T_IS_64BIT, but we can cross that bridge when we come to it.
>>>
>>> Likewise I think we'll need SIZE_T_IS_64BIT eventually (for real 32-bit
>>> systems; LLP64 systems like Windows will then be able to run the test).
>>
>> I guess I wrote essentially the same thing before refreshing my
>> Inbox.
>>
>> I am a bit fuzzy between off_t and size_t; the former is for the
>> size of things you see on the filesystem, while the latter is for
>> you to give malloc(3).  I would have thought that off_t is the type
>> we would want at the end of the raw object header, denoting the size
>> of a blob object when deflated, which could be larger than the size
>> of a region of memory we can get from malloc(3), in which case we
>> would use the streaming interface.
>
> Yeah, your understanding is right (s/deflated/inflated/). I agree that
> off_t is probably a better size for blobs. Traditionally git assumed any
> object could fit in memory. The streaming interface helps that somewhat,
> but I think there are cases where we still must load a blob (e.g., if it
> is stored as a delta). In theory that never happens because of
> core.bigfilethreshold, but you may get a packfile from somebody with a
> higher threshold than you.
>
> I wouldn't be surprised if there are other cases that aren't smart
> enough to use the streaming interface yet, but the solution there is to
> make them smarter. :)
>
> So off_t is probably better. We do need to be careful, though, when
> allocating objects. E.g., this:
>
>   off_t size;
>   struct git_istream *stream;
>   void *buf;
>
>   stream = open_istream(sha1, &type, &size, NULL);
>   buf = xmalloc(size);
>   while (1) {
> 	/* read stream into buf */
>   }
>
> is a security hole when size_t is less than off_t (it gets truncated in
> the call to xmalloc, which allocates too few bytes). This is a toy
> example, obviously, but it's something to watch out for.
>
> -Peff
That code is "illegal", it should be
  buf = xmalloc(xsize_t(size));

And the transition from off_t into size_t
should always got via xsize_t():

static inline size_t xsize_t(off_t len)
{
	if (len > (size_t) len)
		die("Cannot handle files this big");
	return (size_t)len;
}

There are some more things to be done, on the long run:
- convert "unsigned long" into either off_t of size_t in e.g. convert.c
- Use the streaming interface to analyze if blobs are binary
   (That is already on my list, the old "stream and early out"
   from the olc 10/10, gmane/$293010 or so can be reused)

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-15 13:37                 ` Torsten Bögershausen
@ 2016-07-15 13:46                   ` Jeff King
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff King @ 2016-07-15 13:46 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Junio C Hamano, Johannes Sixt, Johannes Schindelin, git,
	René Scharfe

On Fri, Jul 15, 2016 at 03:37:32PM +0200, Torsten Bögershausen wrote:

> > So off_t is probably better. We do need to be careful, though, when
> > allocating objects. E.g., this:
> > 
> >   off_t size;
> >   struct git_istream *stream;
> >   void *buf;
> > 
> >   stream = open_istream(sha1, &type, &size, NULL);
> >   buf = xmalloc(size);
> >   while (1) {
> > 	/* read stream into buf */
> >   }
> > 
> > is a security hole when size_t is less than off_t (it gets truncated in
> > the call to xmalloc, which allocates too few bytes). This is a toy
> > example, obviously, but it's something to watch out for.
> > 
> That code is "illegal", it should be
>  buf = xmalloc(xsize_t(size));

Sure, I agree. The point is that it is easy to forget the extra
wrapper/check, and we should be aware of it. I don't think the compiler
will warn you (probably some static analyzers would, though).

> - Use the streaming interface to analyze if blobs are binary
>   (That is already on my list, the old "stream and early out"
>   from the olc 10/10, gmane/$293010 or so can be reused)

You might be interested in

  https://github.com/peff/git/commit/2fb07bc91f3ac6162c3dd5667d8167fc0bec6d99

I don't remember if it produced good results or not (ISTR that the cost
of setting up the streaming sometimes overwhelmed any benefit).

-Peff

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-14 20:34               ` Junio C Hamano
  2016-07-14 20:43                 ` [PATCH v2 0/2] ulong may only be 32-bit wide Junio C Hamano
@ 2016-07-15 15:10                 ` Johannes Schindelin
  2016-07-15 16:49                   ` Junio C Hamano
  1 sibling, 1 reply; 42+ messages in thread
From: Johannes Schindelin @ 2016-07-15 15:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git, René Scharfe, Johannes Sixt

Hi Junio,

On Thu, 14 Jul 2016, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Yeah, that is what I was trying to get at, but you stated it much more
> > clearly. LONG_IS_64BIT is good. I wonder if the "git version
> > --build-options" should be "sizeof-long", too. It's shorter, and
> > indicates our assumption that we are talking about all longs, not just
> > unsigned ones.
> 
> OK, I'll do a final reroll and then wait for Europeans ;-)

Don't let me stand in the way of progress.

I would have preferred two separate prereqs, of course, for the same
reason we use time_t and off_t: I find it better to describe what the
prereq is about, and I do not mean implementation details here.

But that does not matter all that much. We will have to address the issues
anyway, and at that point that wart will be removed and we will all go on
with our lives.

So what are your plans with 2.9.2? I ask because I do not want to engineer
a 2.9.1 release just to see that 2.9.2 is out and having to spend the same
amount of time for another release ;-)

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v4 2/5] t5000: test tar files that overflow ustar headers
  2016-07-15 15:10                 ` [PATCH v4 2/5] t5000: test tar files that overflow ustar headers Johannes Schindelin
@ 2016-07-15 16:49                   ` Junio C Hamano
  0 siblings, 0 replies; 42+ messages in thread
From: Junio C Hamano @ 2016-07-15 16:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, git, René Scharfe, Johannes Sixt

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> So what are your plans with 2.9.2? I ask because I do not want to engineer
> a 2.9.1 release just to see that 2.9.2 is out and having to spend the same
> amount of time for another release ;-)

Essentially unchanged from what I said in $gmane/299446 a few days
ago.  Merge only the t0006 workaround with lazy-prereq changes to
'maint', so that we can skip tests that are unrunnable on platforms
without 64-bit long to avoid unnecessary test failures, then tag
that to v2.9.2 soon enough so that you do not have to do two
releases in a row (i.e. skipping v2.9.1 saying "Git for Windows
skipped that one because it was not quite right; this release fixes
the issue" in your v2.9.2 announcement).

I couldn't do that while that t0006 fix was in flux.  Now I
hopefully can.








^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 2/2] archive-tar: huge offset and future timestamps would not work on 32-bit
  2016-07-14 22:36                       ` Junio C Hamano
@ 2016-07-16  6:28                         ` Duy Nguyen
  0 siblings, 0 replies; 42+ messages in thread
From: Duy Nguyen @ 2016-07-16  6:28 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Git Mailing List, Johannes Schindelin,
	René Scharfe, Johannes Sixt

On Fri, Jul 15, 2016 at 12:36 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> On Thu, Jul 14, 2016 at 01:43:57PM -0700, Junio C Hamano wrote:
>>
>>> As we are not yet moving everything to size_t but still using ulong
>>> internally when talking about the size of object, platforms with
>>> 32-bit long will not be able to produce tar archive with 4GB+ file,
>>> and cannot grok 077777777777UL as a constant.  Disable the extended
>>> header feature and do not test it on them.
>>
>> I tried testing this in a VM with 32-bit Debian. It fixes the build
>> problems, but t5000 still fails.
>
> Thanks for testing.  I need to find some time hunting for (or
> building) an environment to do that myself.

If you are on a distro with multilib, building git with "CFLAGS=-m32
LDFLAGS=-m32" should do it. That's how i tested the 32-bit offset
truncation thing. I may pursue the docker thing soon. At least then
you only need to install docker and just build/test (zero docker
configuration, assuming all the relevant kernel options are enabled).
-- 
Duy

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2016-07-16  6:28 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-30  9:06 [PATCH v4 0/5] friendlier handling of overflows in archive-tar Jeff King
2016-06-30  9:07 ` [PATCH v4 1/5] t9300: factor out portable "head -c" replacement Jeff King
2016-07-01  4:45   ` Eric Sunshine
2016-07-01 17:23   ` Junio C Hamano
2016-07-01 18:01     ` Jeff King
2016-06-30  9:08 ` [PATCH v4 2/5] t5000: test tar files that overflow ustar headers Jeff King
2016-07-14 15:47   ` Johannes Schindelin
2016-07-14 16:45     ` Johannes Sixt
2016-07-14 17:08       ` Junio C Hamano
2016-07-14 20:52         ` Johannes Sixt
2016-07-14 21:32           ` Jeff King
2016-07-14 22:30             ` Junio C Hamano
2016-07-14 22:38               ` Jeff King
2016-07-15 13:37                 ` Torsten Bögershausen
2016-07-15 13:46                   ` Jeff King
2016-07-14 22:26           ` Junio C Hamano
2016-07-14 18:24       ` Jeff King
2016-07-14 18:21     ` Jeff King
2016-07-14 20:00       ` Junio C Hamano
2016-07-14 20:03         ` Junio C Hamano
2016-07-14 20:14           ` Jeff King
2016-07-14 20:09         ` Junio C Hamano
2016-07-14 20:10         ` Jeff King
2016-07-14 20:22           ` Junio C Hamano
2016-07-14 20:27             ` Jeff King
2016-07-14 20:34               ` Junio C Hamano
2016-07-14 20:43                 ` [PATCH v2 0/2] ulong may only be 32-bit wide Junio C Hamano
2016-07-14 20:43                   ` [PATCH v2 1/2] t0006: skip "far in the future" test when unsigned long is not long enough Junio C Hamano
2016-07-14 20:43                   ` [PATCH v2 2/2] archive-tar: huge offset and future timestamps would not work on 32-bit Junio C Hamano
2016-07-14 22:20                     ` Jeff King
2016-07-14 22:36                       ` Junio C Hamano
2016-07-16  6:28                         ` Duy Nguyen
2016-07-15 15:10                 ` [PATCH v4 2/5] t5000: test tar files that overflow ustar headers Johannes Schindelin
2016-07-15 16:49                   ` Junio C Hamano
2016-06-30  9:09 ` [PATCH v4 3/5] archive-tar: write extended headers for file sizes >= 8GB Jeff King
2016-07-14 16:48   ` Johannes Sixt
2016-07-14 17:11     ` Junio C Hamano
2016-07-14 18:16       ` Jeff King
2016-07-15  2:59     ` Torsten Bögershausen
2016-06-30  9:09 ` [PATCH v4 4/5] archive-tar: write extended headers for far-future mtime Jeff King
2016-06-30  9:09 ` [PATCH v4 5/5] archive-tar: drop return value Jeff King
2016-06-30  9:14 ` [PATCH v4 6/5] t5000: use test_match_signal Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.