* [PATCH 0/2] friendlier handling of overflows in archive-tar @ 2016-06-16 4:35 Jeff King 2016-06-16 4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King ` (3 more replies) 0 siblings, 4 replies; 61+ messages in thread From: Jeff King @ 2016-06-16 4:35 UTC (permalink / raw) To: git; +Cc: René Scharfe The ustar format has some fixed-length numeric fields, and it's possible to generate a git tree that can't be represented (namely file size and mtime). Since f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we detect and die() in these cases. But we can actually do the friendly (and POSIX-approved) thing, and add extended pax headers to represent the correct values. [1/2]: archive-tar: write extended headers for file sizes >= 8GB [2/2]: archive-tar: write extended headers for far-future mtime -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-16 4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King @ 2016-06-16 4:37 ` Jeff King 2016-06-20 22:54 ` René Scharfe 2016-06-21 19:44 ` Robin H. Johnson 2016-06-16 4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King ` (2 subsequent siblings) 3 siblings, 2 replies; 61+ messages in thread From: Jeff King @ 2016-06-16 4:37 UTC (permalink / raw) To: git; +Cc: René Scharfe The ustar format has a fixed-length field for the size of each file entry which is supposed to contain up to 11 bytes of octal-formatted data plus a NUL or space terminator. These means that the largest size we can represent is 077777777777, or 1 byte short of 8GB. The correct solution for a larger file, according to POSIX.1-2001, is to add an extended pax header, similar to how we handle long filenames. This patch does that, and writes zero for the size field in the ustar header (the last bit is not mentioned by POSIX, but it matches how GNU tar behaves with --format=pax). This should be a strict improvement over the current behavior, which is to die in xsnprintf with a "BUG". However, there's some interesting history here. Prior to f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we silently overflowed the "size" field. The extra bytes ended up in the "mtime" field of the header, which was then immediately written itself, overwriting our extra bytes. What that means depends on how many bytes we wrote. If the size was 64GB or greater, then we actually overflowed digits into the mtime field, meaning our value was was effectively right-shifted by those lost octal digits. And this patch is again a strict improvement over that. But if the size was between 8GB and 64GB, then our 12-byte field held all of the actual digits, and only our NUL terminator overflowed. According to POSIX, there should be a NUL or space at the end of the field. However, GNU tar seems to be lenient here, and will correctly parse a size up 64GB (minus one) from the field. So sizes in this range might have just worked, depending on the implementation reading the tarfile. This patch is mostly still an improvement there, as the 8GB limit is specifically mentioned in POSIX as the correct limit. But it's possible that it could be a regression (versus the pre-f2f0267 state) if all of the following are true: 1. You have a file between 8GB and 64GB. 2. Your tar implementation _doesn't_ know about pax extended headers. 3. Your tar implementation _does_ parse 12-byte sizes from the ustar header without a delimiter. It's probably not worth worrying about such an obscure set of conditions, but I'm documenting it here just in case. There's no test included here. I did confirm that this works (using GNU tar) with: $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge $ git add huge $ git commit -q -m foo $ git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge Pre-f2f0267, this would yield a bogus size of 8GB, and post-f2f0267, git-archive simply dies. Unfortunately, it's quite an expensive test to run. For one thing, unless your filesystem supports files with holes, it takes 64GB of disk space (you might think piping straight to `hash-object --stdin` would be better, but it's not; that tries to buffer all 64GB in RAM!). Furthermore, hashing and compressing the object takes several minutes of CPU time. We could ship just the resulting compressed object data as a loose object, but even that takes 64MB. So sadly, this code path remains untested in the test suite. Signed-off-by: Jeff King <peff@peff.net> --- archive-tar.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index cb99df2..7340b64 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, strbuf_addch(sb, '\n'); } +/* + * Like strbuf_append_ext_header, but for numeric values. + */ +static void strbuf_append_ext_header_uint(struct strbuf *sb, + const char *keyword, + uintmax_t value) +{ + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ + int len; + + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); + strbuf_append_ext_header(sb, keyword, buf, len); +} + static unsigned int ustar_header_chksum(const struct ustar_header *header) { const unsigned char *p = (const unsigned char *)header; @@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) return i; } +static inline unsigned long ustar_size(uintmax_t size) +{ + if (size < 077777777777UL) + return size; + else + return 0; +} + static void prepare_header(struct archiver_args *args, struct ustar_header *header, unsigned int mode, unsigned long size) { xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777); - xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0); + xsnprintf(header->size, sizeof(header->size), "%011lo", + S_ISREG(mode) ? ustar_size(size) : 0); xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); @@ -267,6 +290,9 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } + if (ustar_size(size) != size) + strbuf_append_ext_header_uint(&ext_header, "size", size); + prepare_header(args, &header, mode, size); if (ext_header.len > 0) { -- 2.9.0.150.g8bd4cf6 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-16 4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King @ 2016-06-20 22:54 ` René Scharfe 2016-06-21 15:59 ` Jeff King 2016-06-21 19:44 ` Robin H. Johnson 1 sibling, 1 reply; 61+ messages in thread From: René Scharfe @ 2016-06-20 22:54 UTC (permalink / raw) To: Jeff King, git Am 16.06.2016 um 06:37 schrieb Jeff King: > The ustar format has a fixed-length field for the size of > each file entry which is supposed to contain up to 11 bytes > of octal-formatted data plus a NUL or space terminator. > > These means that the largest size we can represent is > 077777777777, or 1 byte short of 8GB. The correct solution > for a larger file, according to POSIX.1-2001, is to add an > extended pax header, similar to how we handle long > filenames. This patch does that, and writes zero for the > size field in the ustar header (the last bit is not > mentioned by POSIX, but it matches how GNU tar behaves with > --format=pax). > > This should be a strict improvement over the current > behavior, which is to die in xsnprintf with a "BUG". > However, there's some interesting history here. > > Prior to f2f0267 (archive-tar: use xsnprintf for trivial > formatting, 2015-09-24), we silently overflowed the "size" > field. The extra bytes ended up in the "mtime" field of the > header, which was then immediately written itself, > overwriting our extra bytes. What that means depends on how > many bytes we wrote. > > If the size was 64GB or greater, then we actually overflowed > digits into the mtime field, meaning our value was was > effectively right-shifted by those lost octal digits. And > this patch is again a strict improvement over that. > > But if the size was between 8GB and 64GB, then our 12-byte > field held all of the actual digits, and only our NUL > terminator overflowed. According to POSIX, there should be a > NUL or space at the end of the field. However, GNU tar seems > to be lenient here, and will correctly parse a size up 64GB > (minus one) from the field. So sizes in this range might > have just worked, depending on the implementation reading > the tarfile. > > This patch is mostly still an improvement there, as the 8GB > limit is specifically mentioned in POSIX as the correct > limit. But it's possible that it could be a regression > (versus the pre-f2f0267 state) if all of the following are > true: > > 1. You have a file between 8GB and 64GB. > > 2. Your tar implementation _doesn't_ know about pax > extended headers. > > 3. Your tar implementation _does_ parse 12-byte sizes from > the ustar header without a delimiter. > > It's probably not worth worrying about such an obscure set > of conditions, but I'm documenting it here just in case. > > There's no test included here. I did confirm that this works > (using GNU tar) with: > > $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge > $ git add huge > $ git commit -q -m foo > $ git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null > -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge > > Pre-f2f0267, this would yield a bogus size of 8GB, and > post-f2f0267, git-archive simply dies. > > Unfortunately, it's quite an expensive test to run. For one > thing, unless your filesystem supports files with holes, it > takes 64GB of disk space (you might think piping straight to > `hash-object --stdin` would be better, but it's not; that > tries to buffer all 64GB in RAM!). Furthermore, hashing and > compressing the object takes several minutes of CPU time. > > We could ship just the resulting compressed object data as a > loose object, but even that takes 64MB. So sadly, this code > path remains untested in the test suite. If we could set the limit to a lower value than 8GB for testing then we could at least check if the extended header is written, e.g. if ustar_size() could be convinced to return 0 every time using a hidden command line parameter or an environment variable or something better. > > Signed-off-by: Jeff King <peff@peff.net> > --- > archive-tar.c | 28 +++++++++++++++++++++++++++- > 1 file changed, 27 insertions(+), 1 deletion(-) > > diff --git a/archive-tar.c b/archive-tar.c > index cb99df2..7340b64 100644 > --- a/archive-tar.c > +++ b/archive-tar.c > @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, > strbuf_addch(sb, '\n'); > } > > +/* > + * Like strbuf_append_ext_header, but for numeric values. > + */ > +static void strbuf_append_ext_header_uint(struct strbuf *sb, > + const char *keyword, > + uintmax_t value) > +{ > + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ > + int len; > + > + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); > + strbuf_append_ext_header(sb, keyword, buf, len); > +} > + > static unsigned int ustar_header_chksum(const struct ustar_header *header) > { > const unsigned char *p = (const unsigned char *)header; > @@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) > return i; > } > > +static inline unsigned long ustar_size(uintmax_t size) > +{ > + if (size < 077777777777UL) Shouldn't that be less-or-equal? > + return size; > + else > + return 0; > +} > + > static void prepare_header(struct archiver_args *args, > struct ustar_header *header, > unsigned int mode, unsigned long size) > { > xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777); > - xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0); > + xsnprintf(header->size, sizeof(header->size), "%011lo", > + S_ISREG(mode) ? ustar_size(size) : 0); > xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); > > xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); > @@ -267,6 +290,9 @@ static int write_tar_entry(struct archiver_args *args, > memcpy(header.linkname, buffer, size); > } > > + if (ustar_size(size) != size) > + strbuf_append_ext_header_uint(&ext_header, "size", size); It needs "S_ISREG(mode) && " as well, no? In practice it probably doesn't matter (until someone stores a 8GB long symlink target), but the size field should only be set for regular files. > + > prepare_header(args, &header, mode, size); > > if (ext_header.len > 0) { > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-20 22:54 ` René Scharfe @ 2016-06-21 15:59 ` Jeff King 2016-06-21 16:02 ` Jeff King ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Jeff King @ 2016-06-21 15:59 UTC (permalink / raw) To: René Scharfe; +Cc: git On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote: > > Unfortunately, it's quite an expensive test to run. For one > > thing, unless your filesystem supports files with holes, it > > takes 64GB of disk space (you might think piping straight to > > `hash-object --stdin` would be better, but it's not; that > > tries to buffer all 64GB in RAM!). Furthermore, hashing and > > compressing the object takes several minutes of CPU time. > > > > We could ship just the resulting compressed object data as a > > loose object, but even that takes 64MB. So sadly, this code > > path remains untested in the test suite. > > If we could set the limit to a lower value than 8GB for testing then we > could at least check if the extended header is written, e.g. if ustar_size() > could be convinced to return 0 every time using a hidden command line > parameter or an environment variable or something better. Yes, we could do that, though I think it loses most of the value of the test. We can check that if we hit an arbitrary value we generate the pax header, but I think what we _really_ care about is: did we generate an output that somebody else's tar implementation can handle. And for the smaller-than-64GB case, GNU tar happily handles our existing output (though I suspect other tars might fail at "only" 8GB). > > +static inline unsigned long ustar_size(uintmax_t size) > > +{ > > + if (size < 077777777777UL) > > Shouldn't that be less-or-equal? Yeah, you're right (and for the one in the next patch, too). > > + if (ustar_size(size) != size) > > + strbuf_append_ext_header_uint(&ext_header, "size", size); > > It needs "S_ISREG(mode) && " as well, no? In practice it probably doesn't > matter (until someone stores a 8GB long symlink target), but the size field > should only be set for regular files. Thanks for noticing that. I remembered wondering that when I was early in debugging/diagnosing, but forgot to follow up on it. I agree it's unlikely in practice, but we should have consistent checks (I think it would actually make sense to move the ISREG check inside ustar_size, and then we can apply it consistently here and when generating the header; my goal with ustar_size() was to avoid having the same logic in multiple places). -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 15:59 ` Jeff King @ 2016-06-21 16:02 ` Jeff King 2016-06-21 20:42 ` René Scharfe 2016-06-21 20:54 ` René Scharfe 2 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-21 16:02 UTC (permalink / raw) To: René Scharfe; +Cc: git On Tue, Jun 21, 2016 at 11:59:20AM -0400, Jeff King wrote: > > > + if (ustar_size(size) != size) > > > + strbuf_append_ext_header_uint(&ext_header, "size", size); > > > > It needs "S_ISREG(mode) && " as well, no? In practice it probably doesn't > > matter (until someone stores a 8GB long symlink target), but the size field > > should only be set for regular files. > > Thanks for noticing that. I remembered wondering that when I was early > in debugging/diagnosing, but forgot to follow up on it. I agree it's > unlikely in practice, but we should have consistent checks (I think it > would actually make sense to move the ISREG check inside ustar_size, and > then we can apply it consistently here and when generating the header; > my goal with ustar_size() was to avoid having the same logic in multiple > places). Actually, scratch that. It makes things awkward because it would be hard to tell when ustar_size() returned zero because it's a file with a big size, and thus needs a pax header, versus that it was not a file, and therefore must _not_ have a pax header. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 15:59 ` Jeff King 2016-06-21 16:02 ` Jeff King @ 2016-06-21 20:42 ` René Scharfe 2016-06-21 20:57 ` René Scharfe 2016-06-21 21:02 ` Jeff King 2016-06-21 20:54 ` René Scharfe 2 siblings, 2 replies; 61+ messages in thread From: René Scharfe @ 2016-06-21 20:42 UTC (permalink / raw) To: Jeff King; +Cc: git Am 21.06.2016 um 17:59 schrieb Jeff King: > On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote: > >>> Unfortunately, it's quite an expensive test to run. For one >>> thing, unless your filesystem supports files with holes, it >>> takes 64GB of disk space (you might think piping straight to >>> `hash-object --stdin` would be better, but it's not; that >>> tries to buffer all 64GB in RAM!). Furthermore, hashing and >>> compressing the object takes several minutes of CPU time. >>> >>> We could ship just the resulting compressed object data as a >>> loose object, but even that takes 64MB. So sadly, this code >>> path remains untested in the test suite. >> >> If we could set the limit to a lower value than 8GB for testing then we >> could at least check if the extended header is written, e.g. if ustar_size() >> could be convinced to return 0 every time using a hidden command line >> parameter or an environment variable or something better. > > Yes, we could do that, though I think it loses most of the value of the > test. We can check that if we hit an arbitrary value we generate the pax > header, but I think what we _really_ care about is: did we generate an > output that somebody else's tar implementation can handle. I agree with the last point, but don't see how that diminishes the value of such a test. If we provide file sizes only through extended headers (the normal header field being set to 0) and we can extract files with correct sizes then tar must have interpreted those header as intended, right? (Or it just guessed the sizes by searching for the next header magic, but such a fallback won't be accurate for files ending with NUL characters due to NUL-padding, so we just have to add such a file.) René -- >8 -- Subject: archive-tar: test creation of pax extended size headers --- The value 120 is magic; we need it to pass the tests. That's because prepare_header() is used for building extended header records as well and we don't create extended headers for extended headers (not sure if that would work anyway), so they simply vanish when they're over the limit as their size field is set to zero. archive-tar.c | 7 ++++++- t/t5000-tar-tree.sh | 7 +++++++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index f53e61c..fbbc4cc 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -14,6 +14,7 @@ static char block[BLOCKSIZE]; static unsigned long offset; static int tar_umask = 002; +static unsigned long ustar_size_max = 077777777777UL; static int write_tar_filter_archive(const struct archiver *ar, struct archiver_args *args); @@ -179,7 +180,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) static inline unsigned long ustar_size(uintmax_t size) { - if (size <= 077777777777UL) + if (size <= ustar_size_max) return size; else return 0; @@ -412,6 +413,10 @@ static int git_tar_config(const char *var, const char *value, void *cb) } return 0; } + if (!strcmp(var, "tar.ustarsizemax")) { + ustar_size_max = git_config_ulong(var, value); + return 0; + } return tar_filter_config(var, value, cb); } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 4b68bba..03bb4c7 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -102,6 +102,7 @@ test_expect_success \ echo long filename >a/four$hundred && mkdir a/bin && test-genrandom "frotz" 500000 >a/bin/sh && + printf "\0\0\0" >>a/bin/sh && printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 && printf "A not substituted O" >a/substfile2 && if test_have_prereq SYMLINKS; then @@ -157,6 +158,12 @@ test_expect_success 'git-archive --prefix=olde-' ' check_tar with_olde-prefix olde- +test_expect_success !TAR_NEEDS_PAX_FALLBACK 'pax extended size headers' ' + git -c tar.ustarsizemax=120 archive HEAD >extended_size_header.tar +' + +check_tar extended_size_header + test_expect_success 'git archive on large files' ' test_config core.bigfilethreshold 1 && git archive HEAD >b3.tar && -- 2.9.0 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 20:42 ` René Scharfe @ 2016-06-21 20:57 ` René Scharfe 2016-06-21 21:04 ` Jeff King 2016-06-21 21:02 ` Jeff King 1 sibling, 1 reply; 61+ messages in thread From: René Scharfe @ 2016-06-21 20:57 UTC (permalink / raw) To: Jeff King; +Cc: git Am 21.06.2016 um 22:42 schrieb René Scharfe: > The value 120 is magic; we need it to pass the tests. That's > because prepare_header() is used for building extended header > records as well and we don't create extended headers for extended > headers (not sure if that would work anyway), so they simply > vanish when they're over the limit as their size field is set to > zero. So how about something like this to make sure extended headers are only written for regular files and not for other extended headers? --- archive-tar.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index ed562d4..f53e61c 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -199,7 +199,7 @@ static void prepare_header(struct archiver_args *args, { xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777); xsnprintf(header->size, sizeof(header->size), "%011lo", - S_ISREG(mode) ? ustar_size(size) : 0); + S_ISREG(mode) ? size : 0); xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", ustar_mtime(args->time)); @@ -240,7 +240,7 @@ static int write_tar_entry(struct archiver_args *args, struct ustar_header header; struct strbuf ext_header = STRBUF_INIT; unsigned int old_mode = mode; - unsigned long size; + unsigned long size, size_in_header; void *buffer; int err = 0; @@ -299,12 +299,14 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } - if (S_ISREG(mode) && ustar_size(size) != size) + size_in_header = S_ISREG(mode) ? ustar_size(size) : size; + if (size_in_header != size) strbuf_append_ext_header_uint(&ext_header, "size", size); + if (ustar_mtime(args->time) != args->time) strbuf_append_ext_header_uint(&ext_header, "mtime", args->time); - prepare_header(args, &header, mode, size); + prepare_header(args, &header, mode, size_in_header); if (ext_header.len > 0) { err = write_extended_header(args, sha1, ext_header.buf, -- 2.9.0 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 20:57 ` René Scharfe @ 2016-06-21 21:04 ` Jeff King 2016-06-22 5:46 ` René Scharfe 0 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-21 21:04 UTC (permalink / raw) To: René Scharfe; +Cc: git On Tue, Jun 21, 2016 at 10:57:43PM +0200, René Scharfe wrote: > Am 21.06.2016 um 22:42 schrieb René Scharfe: > > The value 120 is magic; we need it to pass the tests. That's > > because prepare_header() is used for building extended header > > records as well and we don't create extended headers for extended > > headers (not sure if that would work anyway), so they simply > > vanish when they're over the limit as their size field is set to > > zero. > > So how about something like this to make sure extended headers are > only written for regular files and not for other extended headers? This is quite similar to what I wrote originally, but I moved to the ustar_size() format to better match the mtime code (which needs something like that, because we pass around args->time). I think you could drop ustar_size() completely here and just put the "if" into write_tar_entry(). -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 21:04 ` Jeff King @ 2016-06-22 5:46 ` René Scharfe 0 siblings, 0 replies; 61+ messages in thread From: René Scharfe @ 2016-06-22 5:46 UTC (permalink / raw) To: Jeff King; +Cc: git Am 21.06.2016 um 23:04 schrieb Jeff King: > On Tue, Jun 21, 2016 at 10:57:43PM +0200, René Scharfe wrote: > >> Am 21.06.2016 um 22:42 schrieb René Scharfe: >>> The value 120 is magic; we need it to pass the tests. That's >>> because prepare_header() is used for building extended header >>> records as well and we don't create extended headers for extended >>> headers (not sure if that would work anyway), so they simply >>> vanish when they're over the limit as their size field is set to >>> zero. >> >> So how about something like this to make sure extended headers are >> only written for regular files and not for other extended headers? > > This is quite similar to what I wrote originally, but I moved to the > ustar_size() format to better match the mtime code (which needs > something like that, because we pass around args->time). > > I think you could drop ustar_size() completely here and just put the > "if" into write_tar_entry(). Which would look like this: --- archive-tar.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index cb99df2..274bdfa 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, strbuf_addch(sb, '\n'); } +/* + * Like strbuf_append_ext_header, but for numeric values. + */ +static void strbuf_append_ext_header_uint(struct strbuf *sb, + const char *keyword, + uintmax_t value) +{ + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ + int len; + + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); + strbuf_append_ext_header(sb, keyword, buf, len); +} + static unsigned int ustar_header_chksum(const struct ustar_header *header) { const unsigned char *p = (const unsigned char *)header; @@ -208,7 +222,7 @@ static int write_tar_entry(struct archiver_args *args, struct ustar_header header; struct strbuf ext_header = STRBUF_INIT; unsigned int old_mode = mode; - unsigned long size; + unsigned long size, size_in_header; void *buffer; int err = 0; @@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } - prepare_header(args, &header, mode, size); + size_in_header = size; + if (S_ISREG(mode) && size > 077777777777UL) { + size_in_header = 0; + strbuf_append_ext_header_uint(&ext_header, "size", size); + } + + prepare_header(args, &header, mode, size_in_header); if (ext_header.len > 0) { err = write_extended_header(args, sha1, ext_header.buf, -- 2.9.0 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 20:42 ` René Scharfe 2016-06-21 20:57 ` René Scharfe @ 2016-06-21 21:02 ` Jeff King 2016-06-22 5:46 ` René Scharfe 1 sibling, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-21 21:02 UTC (permalink / raw) To: René Scharfe; +Cc: git On Tue, Jun 21, 2016 at 10:42:52PM +0200, René Scharfe wrote: > >> If we could set the limit to a lower value than 8GB for testing then we > >> could at least check if the extended header is written, e.g. if ustar_size() > >> could be convinced to return 0 every time using a hidden command line > >> parameter or an environment variable or something better. > > > > Yes, we could do that, though I think it loses most of the value of the > > test. We can check that if we hit an arbitrary value we generate the pax > > header, but I think what we _really_ care about is: did we generate an > > output that somebody else's tar implementation can handle. > > I agree with the last point, but don't see how that diminishes the > value of such a test. If we provide file sizes only through extended > headers (the normal header field being set to 0) and we can extract > files with correct sizes then tar must have interpreted those header > as intended, right? The diminished value is: 1. This is a situation that doesn't actually happen in real life. 2. Now we're carrying extra code inside git only for the sake of testing (which can have its own bugs, etc). Still, it may be better than nothing. > -- >8 -- > Subject: archive-tar: test creation of pax extended size headers > > --- > The value 120 is magic; we need it to pass the tests. That's > because prepare_header() is used for building extended header > records as well and we don't create extended headers for extended > headers (not sure if that would work anyway), so they simply > vanish when they're over the limit as their size field is set to > zero. Right, so this is sort of what I meant in (2). Now we have a tar.ustarsizemax setting shipped in git that is totally broken if you set it to "1". I can live with it as a tradeoff, but it is definitely a negative IMHO. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 21:02 ` Jeff King @ 2016-06-22 5:46 ` René Scharfe 2016-06-23 19:21 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: René Scharfe @ 2016-06-22 5:46 UTC (permalink / raw) To: Jeff King; +Cc: git Am 21.06.2016 um 23:02 schrieb Jeff King: > On Tue, Jun 21, 2016 at 10:42:52PM +0200, René Scharfe wrote: > >>>> If we could set the limit to a lower value than 8GB for testing then we >>>> could at least check if the extended header is written, e.g. if ustar_size() >>>> could be convinced to return 0 every time using a hidden command line >>>> parameter or an environment variable or something better. >>> >>> Yes, we could do that, though I think it loses most of the value of the >>> test. We can check that if we hit an arbitrary value we generate the pax >>> header, but I think what we _really_ care about is: did we generate an >>> output that somebody else's tar implementation can handle. >> >> I agree with the last point, but don't see how that diminishes the >> value of such a test. If we provide file sizes only through extended >> headers (the normal header field being set to 0) and we can extract >> files with correct sizes then tar must have interpreted those header >> as intended, right? > > The diminished value is: > > 1. This is a situation that doesn't actually happen in real life. > > 2. Now we're carrying extra code inside git only for the sake of > testing (which can have its own bugs, etc). > > Still, it may be better than nothing. > >> -- >8 -- >> Subject: archive-tar: test creation of pax extended size headers >> >> --- >> The value 120 is magic; we need it to pass the tests. That's >> because prepare_header() is used for building extended header >> records as well and we don't create extended headers for extended >> headers (not sure if that would work anyway), so they simply >> vanish when they're over the limit as their size field is set to >> zero. > > Right, so this is sort of what I meant in (2). Now we have a > tar.ustarsizemax setting shipped in git that is totally broken if you > set it to "1". > > I can live with it as a tradeoff, but it is definitely a negative IMHO. Yes, it's only useful as a debug flag, but the fact that it breaks highlights a (admittedly mostly theoretical) shortcoming: The code doesn't handle extended headers that are over the size limit as nicely as it could. So the test was already worth it, even if won't land in master. :) René ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-22 5:46 ` René Scharfe @ 2016-06-23 19:21 ` Jeff King 0 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-23 19:21 UTC (permalink / raw) To: René Scharfe; +Cc: git On Wed, Jun 22, 2016 at 07:46:13AM +0200, René Scharfe wrote: > Yes, it's only useful as a debug flag, but the fact that it breaks > highlights a (admittedly mostly theoretical) shortcoming: The code doesn't > handle extended headers that are over the size limit as nicely as it could. > So the test was already worth it, even if won't land in master. :) Kind of. It was impossible to trigger in the original (and we still don't actually handle it in the revised version; we just die in xsnprintf). But still, I'll go with the simpler thing we've discussed here. The symmetry with ustar_mtime isn't worth it, and doubly so if we just write a single pax header. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 15:59 ` Jeff King 2016-06-21 16:02 ` Jeff King 2016-06-21 20:42 ` René Scharfe @ 2016-06-21 20:54 ` René Scharfe 2 siblings, 0 replies; 61+ messages in thread From: René Scharfe @ 2016-06-21 20:54 UTC (permalink / raw) To: Jeff King; +Cc: git Am 21.06.2016 um 17:59 schrieb Jeff King: > On Tue, Jun 21, 2016 at 12:54:11AM +0200, René Scharfe wrote: > >>> Unfortunately, it's quite an expensive test to run. For one >>> thing, unless your filesystem supports files with holes, it >>> takes 64GB of disk space (you might think piping straight to >>> `hash-object --stdin` would be better, but it's not; that >>> tries to buffer all 64GB in RAM!). Furthermore, hashing and >>> compressing the object takes several minutes of CPU time. >>> >>> We could ship just the resulting compressed object data as a >>> loose object, but even that takes 64MB. So sadly, this code >>> path remains untested in the test suite. >> >> If we could set the limit to a lower value than 8GB for testing then we >> could at least check if the extended header is written, e.g. if ustar_size() >> could be convinced to return 0 every time using a hidden command line >> parameter or an environment variable or something better. > > Yes, we could do that, though I think it loses most of the value of the > test. We can check that if we hit an arbitrary value we generate the pax > header, but I think what we _really_ care about is: did we generate an > output that somebody else's tar implementation can handle. I agree with the last point, but don't see how that diminishes the value of such a test. If we provide file sizes only through extended headers (the normal header field being set to 0) and we can extract files with correct sizes then tar must have interpreted those header as intended, right? (Or it just guessed the sizes by searching for the next header magic, but such a fallback won't be accurate for files ending with NUL characters due to NUL-padding, so we just have to add such a file.) René -- >8 -- Subject: archive-tar: test creation of pax extended size headers --- The value 120 is magic; we need it to pass the tests. That's because prepare_header() is used for building extended header records as well and we don't create extended headers for extended headers (not sure if that would work anyway), so they simply vanish when they're over the limit as their size field is set to zero. archive-tar.c | 7 ++++++- t/t5000-tar-tree.sh | 7 +++++++ 2 files changed, 13 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index f53e61c..fbbc4cc 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -14,6 +14,7 @@ static char block[BLOCKSIZE]; static unsigned long offset; static int tar_umask = 002; +static unsigned long ustar_size_max = 077777777777UL; static int write_tar_filter_archive(const struct archiver *ar, struct archiver_args *args); @@ -179,7 +180,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) static inline unsigned long ustar_size(uintmax_t size) { - if (size <= 077777777777UL) + if (size <= ustar_size_max) return size; else return 0; @@ -412,6 +413,10 @@ static int git_tar_config(const char *var, const char *value, void *cb) } return 0; } + if (!strcmp(var, "tar.ustarsizemax")) { + ustar_size_max = git_config_ulong(var, value); + return 0; + } return tar_filter_config(var, value, cb); } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 4b68bba..03bb4c7 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -102,6 +102,7 @@ test_expect_success \ echo long filename >a/four$hundred && mkdir a/bin && test-genrandom "frotz" 500000 >a/bin/sh && + printf "\0\0\0" >>a/bin/sh && printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 && printf "A not substituted O" >a/substfile2 && if test_have_prereq SYMLINKS; then @@ -157,6 +158,12 @@ test_expect_success 'git-archive --prefix=olde-' ' check_tar with_olde-prefix olde- +test_expect_success !TAR_NEEDS_PAX_FALLBACK 'pax extended size headers' ' + git -c tar.ustarsizemax=120 archive HEAD >extended_size_header.tar +' + +check_tar extended_size_header + test_expect_success 'git archive on large files' ' test_config core.bigfilethreshold 1 && git archive HEAD >b3.tar && -- 2.9.0 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-16 4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King 2016-06-20 22:54 ` René Scharfe @ 2016-06-21 19:44 ` Robin H. Johnson 2016-06-21 20:57 ` Jeff King 1 sibling, 1 reply; 61+ messages in thread From: Robin H. Johnson @ 2016-06-21 19:44 UTC (permalink / raw) To: Jeff King, Git Mailing List; +Cc: René Scharfe On Thu, Jun 16, 2016 at 12:37:33AM -0400, Jeff King wrote: > We could ship just the resulting compressed object data as a > loose object, but even that takes 64MB. So sadly, this code > path remains untested in the test suite. Additionally compress the object data, and insert it for the purpose of testing? It's still an expensive test time-wise, but repeated gzip compression on zeros does still help; to that end, here's the pieces to add a testcase while only being <9KiB. t5005-archive-hugefile.sh: ... mkdir t zcat t5005-barerepo-64gb-obj.tar.gz.gz | tar xzf - -C t GIT_DIR=t git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null ... Test repo attached, it's only 8KiB. Test repo can be recreated with: truncate -s 64g bigfile git add bigfile # takes 10 mins git commit -q -m foo # takes another 10 minutes tar cf - -C .git . |gzip -9 --no-name |gzip -9f --no-name >barerepo.tar.gz.gz -- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer E-Mail : robbat2@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136 ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 19:44 ` Robin H. Johnson @ 2016-06-21 20:57 ` Jeff King 0 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-21 20:57 UTC (permalink / raw) To: Robin H. Johnson; +Cc: Git Mailing List, René Scharfe On Tue, Jun 21, 2016 at 07:44:55PM +0000, Robin H. Johnson wrote: > On Thu, Jun 16, 2016 at 12:37:33AM -0400, Jeff King wrote: > > We could ship just the resulting compressed object data as a > > loose object, but even that takes 64MB. So sadly, this code > > path remains untested in the test suite. > Additionally compress the object data, and insert it for the purpose of > testing? It's still an expensive test time-wise, but repeated > gzip compression on zeros does still help; to that end, here's the > pieces to add a testcase while only being <9KiB. Interesting idea. With bzip2 it actually drops to about 600 bytes (I suspect we could get it even smaller by writing a custom program to generate it, but it's diminishing returns). I think there are some other portability issues, though (like does the receiving tar actually support 64GB sizes). -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 2/2] archive-tar: write extended headers for far-future mtime 2016-06-16 4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King 2016-06-16 4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King @ 2016-06-16 4:37 ` Jeff King 2016-06-20 22:54 ` René Scharfe 2016-06-16 17:55 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano 2016-06-21 16:16 ` Jeff King 3 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-16 4:37 UTC (permalink / raw) To: git; +Cc: René Scharfe The ustar format represents timestamps as seconds since the epoch, but only has room to store 11 octal digits. To express anything larger, we need to use an extended header. This is exactly the same case we fixed for the size field in the previous commit, and the solution here follows the same pattern. This is even mentioned as an issue in f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), but since it only affected things far in the future, it wasn't worth dealing with. But note that my calculations claiming thousands of years were off there; because our xsnprintf produces a NUL byte, we only have until the year 2242 to fix this. Given that this is just around the corner (geologically speaking), and because the fix for "size" makes doing this on top trivial, let's just make it work. Note that we don't (and never did) handle negative timestamps (i.e., before 1970). This would probably not be too hard to support in the same way, but since git does not support negative timestamps at all, I didn't bother here. Unlike the last patch for "size", this one is easy to test efficiently (the magic date is octal 01000000000001): $ echo content >file $ git add file $ GIT_COMMITTER_DATE='@68719476737 +0000' \ git commit -q -m 'tempori parendum' $ git archive HEAD | tar tvf - -rw-rw-r-- root/root 8 4147-08-20 03:32 file With the current tip of master, this dies in xsnprintf (and prior to f2f0267, it overflowed into the checksum field, and the resulting tarfile claimed to be from the year 2242). However, I decided not to include this in the test suite, because I suspect it will create portability headaches, including: 1. The exact format of the system tar's "t" output. 2. System tars that cannot handle pax headers. 3. System tars tha cannot handle dates that far in the future. 4. If we replace "tar t" with "tar x", then filesystems that cannot handle dates that far in the future. In other words, we do not really have a reliable tar reader for these corner cases, so the best we could do is a byte-for-byte comparison of the output. Signed-off-by: Jeff King <peff@peff.net> --- archive-tar.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index 7340b64..749722f 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -185,6 +185,14 @@ static inline unsigned long ustar_size(uintmax_t size) return 0; } +static inline unsigned long ustar_mtime(time_t mtime) +{ + if (mtime < 077777777777) + return mtime; + else + return 0; +} + static void prepare_header(struct archiver_args *args, struct ustar_header *header, unsigned int mode, unsigned long size) @@ -192,7 +200,8 @@ static void prepare_header(struct archiver_args *args, xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777); xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? ustar_size(size) : 0); - xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); + xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", + ustar_mtime(args->time)); xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); xsnprintf(header->gid, sizeof(header->gid), "%07o", 0); @@ -292,6 +301,8 @@ static int write_tar_entry(struct archiver_args *args, if (ustar_size(size) != size) strbuf_append_ext_header_uint(&ext_header, "size", size); + if (ustar_mtime(args->time) != args->time) + strbuf_append_ext_header_uint(&ext_header, "mtime", args->time); prepare_header(args, &header, mode, size); -- 2.9.0.150.g8bd4cf6 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime 2016-06-16 4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King @ 2016-06-20 22:54 ` René Scharfe 2016-06-22 5:46 ` René Scharfe 0 siblings, 1 reply; 61+ messages in thread From: René Scharfe @ 2016-06-20 22:54 UTC (permalink / raw) To: Jeff King, git Am 16.06.2016 um 06:37 schrieb Jeff King: > The ustar format represents timestamps as seconds since the > epoch, but only has room to store 11 octal digits. To > express anything larger, we need to use an extended header. > This is exactly the same case we fixed for the size field in > the previous commit, and the solution here follows the same > pattern. > > This is even mentioned as an issue in f2f0267 (archive-tar: > use xsnprintf for trivial formatting, 2015-09-24), but since > it only affected things far in the future, it wasn't worth > dealing with. But note that my calculations claiming > thousands of years were off there; because our xsnprintf > produces a NUL byte, we only have until the year 2242 to > fix this. > > Given that this is just around the corner (geologically > speaking), and because the fix for "size" makes doing this > on top trivial, let's just make it work. > > Note that we don't (and never did) handle negative > timestamps (i.e., before 1970). This would probably not be > too hard to support in the same way, but since git does not > support negative timestamps at all, I didn't bother here. > > Unlike the last patch for "size", this one is easy to test > efficiently (the magic date is octal 01000000000001): > > $ echo content >file > $ git add file > $ GIT_COMMITTER_DATE='@68719476737 +0000' \ > git commit -q -m 'tempori parendum' > $ git archive HEAD | tar tvf - > -rw-rw-r-- root/root 8 4147-08-20 03:32 file > > With the current tip of master, this dies in xsnprintf (and > prior to f2f0267, it overflowed into the checksum field, and > the resulting tarfile claimed to be from the year 2242). > > However, I decided not to include this in the test suite, > because I suspect it will create portability headaches, > including: > > 1. The exact format of the system tar's "t" output. Probably not worth it. > 2. System tars that cannot handle pax headers. In t5000 there is a simple interpreter for path headers for systems whose tar doesn't handle them. Adding one for mtime headers may be feasible. It's just a bit complicated to link a pax header file to the file it applies to when it doesn't also contain a path header. But we know that the mtime of all entries in tar files created by git archive are is the same, so we could simply read one header and then adjust the mtime of all files accordingly. > 3. System tars tha cannot handle dates that far in the > future. > > 4. If we replace "tar t" with "tar x", then filesystems > that cannot handle dates that far in the future. We can test that by supplying a test archive and set a prerequisite if tar (possibly with our header interpreter) and the filesystem can cope with that. > In other words, we do not really have a reliable tar reader > for these corner cases, so the best we could do is a > byte-for-byte comparison of the output. > > Signed-off-by: Jeff King <peff@peff.net> > --- > archive-tar.c | 13 ++++++++++++- > 1 file changed, 12 insertions(+), 1 deletion(-) > > diff --git a/archive-tar.c b/archive-tar.c > index 7340b64..749722f 100644 > --- a/archive-tar.c > +++ b/archive-tar.c > @@ -185,6 +185,14 @@ static inline unsigned long ustar_size(uintmax_t size) > return 0; > } > > +static inline unsigned long ustar_mtime(time_t mtime) > +{ > + if (mtime < 077777777777) That should be less-or-equal, right? > + return mtime; > + else > + return 0; > +} > + > static void prepare_header(struct archiver_args *args, > struct ustar_header *header, > unsigned int mode, unsigned long size) > @@ -192,7 +200,8 @@ static void prepare_header(struct archiver_args *args, > xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777); > xsnprintf(header->size, sizeof(header->size), "%011lo", > S_ISREG(mode) ? ustar_size(size) : 0); > - xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); > + xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", > + ustar_mtime(args->time)); > > xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); > xsnprintf(header->gid, sizeof(header->gid), "%07o", 0); > @@ -292,6 +301,8 @@ static int write_tar_entry(struct archiver_args *args, > > if (ustar_size(size) != size) > strbuf_append_ext_header_uint(&ext_header, "size", size); > + if (ustar_mtime(args->time) != args->time) > + strbuf_append_ext_header_uint(&ext_header, "mtime", args->time); > > prepare_header(args, &header, mode, size); > > ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime 2016-06-20 22:54 ` René Scharfe @ 2016-06-22 5:46 ` René Scharfe 2016-06-23 19:22 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: René Scharfe @ 2016-06-22 5:46 UTC (permalink / raw) To: Jeff King, git Am 21.06.2016 um 00:54 schrieb René Scharfe: > Am 16.06.2016 um 06:37 schrieb Jeff King: >> 2. System tars that cannot handle pax headers. > > In t5000 there is a simple interpreter for path headers for systems > whose tar doesn't handle them. Adding one for mtime headers may be > feasible. > > It's just a bit complicated to link a pax header file to the file it > applies to when it doesn't also contain a path header. But we know that > the mtime of all entries in tar files created by git archive are is the > same, so we could simply read one header and then adjust the mtime of > all files accordingly. This brings me to the idea of using a single global pax header for mtime instead of one per entry. It reduces the size of the archive and allows for slightly easier testing -- it just fits better since we know that all our mtimes are the same. René ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime 2016-06-22 5:46 ` René Scharfe @ 2016-06-23 19:22 ` Jeff King 2016-06-23 21:38 ` René Scharfe 0 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-23 19:22 UTC (permalink / raw) To: René Scharfe; +Cc: git On Wed, Jun 22, 2016 at 07:46:25AM +0200, René Scharfe wrote: > Am 21.06.2016 um 00:54 schrieb René Scharfe: > > Am 16.06.2016 um 06:37 schrieb Jeff King: > > > 2. System tars that cannot handle pax headers. > > > > In t5000 there is a simple interpreter for path headers for systems > > whose tar doesn't handle them. Adding one for mtime headers may be > > feasible. > > > > It's just a bit complicated to link a pax header file to the file it > > applies to when it doesn't also contain a path header. But we know that > > the mtime of all entries in tar files created by git archive are is the > > same, so we could simply read one header and then adjust the mtime of > > all files accordingly. > > This brings me to the idea of using a single global pax header for mtime > instead of one per entry. It reduces the size of the archive and allows for > slightly easier testing -- it just fits better since we know that all our > mtimes are the same. Yeah, I had a similar thought while writing it, but wasn't quite sure how that was supposed to be formatted. I modeled my output after what GNU tar writes, but of course they are expecting a different mtime for each file. I'll look into how global pax headers should look. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime 2016-06-23 19:22 ` Jeff King @ 2016-06-23 21:38 ` René Scharfe 2016-06-23 21:39 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: René Scharfe @ 2016-06-23 21:38 UTC (permalink / raw) To: Jeff King; +Cc: git Am 23.06.2016 um 21:22 schrieb Jeff King: > On Wed, Jun 22, 2016 at 07:46:25AM +0200, René Scharfe wrote: > >> Am 21.06.2016 um 00:54 schrieb René Scharfe: >>> Am 16.06.2016 um 06:37 schrieb Jeff King: >>>> 2. System tars that cannot handle pax headers. >>> >>> In t5000 there is a simple interpreter for path headers for systems >>> whose tar doesn't handle them. Adding one for mtime headers may be >>> feasible. >>> >>> It's just a bit complicated to link a pax header file to the file it >>> applies to when it doesn't also contain a path header. But we know that >>> the mtime of all entries in tar files created by git archive are is the >>> same, so we could simply read one header and then adjust the mtime of >>> all files accordingly. >> >> This brings me to the idea of using a single global pax header for mtime >> instead of one per entry. It reduces the size of the archive and allows for >> slightly easier testing -- it just fits better since we know that all our >> mtimes are the same. > > Yeah, I had a similar thought while writing it, but wasn't quite sure > how that was supposed to be formatted. I modeled my output after what > GNU tar writes, but of course they are expecting a different mtime for > each file. > > I'll look into how global pax headers should look. Moving the strbuf_append_ext_header() call into write_global_extended_header() should be enough. The changes to the test script are a bit more interesting, though. Perhaps I can come up with something over the weekend. René ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 2/2] archive-tar: write extended headers for far-future mtime 2016-06-23 21:38 ` René Scharfe @ 2016-06-23 21:39 ` Jeff King 0 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-23 21:39 UTC (permalink / raw) To: René Scharfe; +Cc: git On Thu, Jun 23, 2016 at 11:38:19PM +0200, René Scharfe wrote: > > Yeah, I had a similar thought while writing it, but wasn't quite sure > > how that was supposed to be formatted. I modeled my output after what > > GNU tar writes, but of course they are expecting a different mtime for > > each file. > > > > I'll look into how global pax headers should look. > > Moving the strbuf_append_ext_header() call into > write_global_extended_header() should be enough. The changes to the test > script are a bit more interesting, though. Perhaps I can come up with > something over the weekend. Yeah, I have the code itself working now. I'm just slogging through writing tests that are efficient and portable. I hope to have something out soon. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 0/2] friendlier handling of overflows in archive-tar 2016-06-16 4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King 2016-06-16 4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King 2016-06-16 4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King @ 2016-06-16 17:55 ` Junio C Hamano 2016-06-21 16:16 ` Jeff King 3 siblings, 0 replies; 61+ messages in thread From: Junio C Hamano @ 2016-06-16 17:55 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe Both patches looked good to me. Thanks. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 0/2] friendlier handling of overflows in archive-tar 2016-06-16 4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King ` (2 preceding siblings ...) 2016-06-16 17:55 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano @ 2016-06-21 16:16 ` Jeff King 2016-06-21 16:16 ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King ` (3 more replies) 3 siblings, 4 replies; 61+ messages in thread From: Jeff King @ 2016-06-21 16:16 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe On Thu, Jun 16, 2016 at 12:35:23AM -0400, Jeff King wrote: > The ustar format has some fixed-length numeric fields, and it's possible > to generate a git tree that can't be represented (namely file size and > mtime). Since f2f0267 (archive-tar: use xsnprintf for trivial > formatting, 2015-09-24), we detect and die() in these cases. But we can > actually do the friendly (and POSIX-approved) thing, and add extended > pax headers to represent the correct values. > > [1/2]: archive-tar: write extended headers for file sizes >= 8GB > [2/2]: archive-tar: write extended headers for far-future mtime And here's a v2 that addresses the smaller comments from René. I punted on doing something fancy with tests. I'm not opposed to it, but I'm also not convinced it's all that useful. Either way, I think it can come on top if we want it. Junio, this is the jk/big-and-old-archive-tar topic. The interdiff is: diff --git a/archive-tar.c b/archive-tar.c index c7b85fd..ed562d4 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -179,7 +179,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) static inline unsigned long ustar_size(uintmax_t size) { - if (size < 077777777777UL) + if (size <= 077777777777UL) return size; else return 0; @@ -187,7 +187,7 @@ static inline unsigned long ustar_size(uintmax_t size) static inline unsigned long ustar_mtime(time_t mtime) { - if (mtime < 077777777777UL) + if (mtime <= 077777777777UL) return mtime; else return 0; @@ -299,7 +299,7 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } - if (ustar_size(size) != size) + if (S_ISREG(mode) && ustar_size(size) != size) strbuf_append_ext_header_uint(&ext_header, "size", size); if (ustar_mtime(args->time) != args->time) strbuf_append_ext_header_uint(&ext_header, "mtime", args->time); ^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB 2016-06-21 16:16 ` Jeff King @ 2016-06-21 16:16 ` Jeff King 2016-06-21 16:17 ` [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime Jeff King ` (2 subsequent siblings) 3 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-21 16:16 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe The ustar format has a fixed-length field for the size of each file entry which is supposed to contain up to 11 bytes of octal-formatted data plus a NUL or space terminator. These means that the largest size we can represent is 077777777777, or 1 byte short of 8GB. The correct solution for a larger file, according to POSIX.1-2001, is to add an extended pax header, similar to how we handle long filenames. This patch does that, and writes zero for the size field in the ustar header (the last bit is not mentioned by POSIX, but it matches how GNU tar behaves with --format=pax). This should be a strict improvement over the current behavior, which is to die in xsnprintf with a "BUG". However, there's some interesting history here. Prior to f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we silently overflowed the "size" field. The extra bytes ended up in the "mtime" field of the header, which was then immediately written itself, overwriting our extra bytes. What that means depends on how many bytes we wrote. If the size was 64GB or greater, then we actually overflowed digits into the mtime field, meaning our value was was effectively right-shifted by those lost octal digits. And this patch is again a strict improvement over that. But if the size was between 8GB and 64GB, then our 12-byte field held all of the actual digits, and only our NUL terminator overflowed. According to POSIX, there should be a NUL or space at the end of the field. However, GNU tar seems to be lenient here, and will correctly parse a size up 64GB (minus one) from the field. So sizes in this range might have just worked, depending on the implementation reading the tarfile. This patch is mostly still an improvement there, as the 8GB limit is specifically mentioned in POSIX as the correct limit. But it's possible that it could be a regression (versus the pre-f2f0267 state) if all of the following are true: 1. You have a file between 8GB and 64GB. 2. Your tar implementation _doesn't_ know about pax extended headers. 3. Your tar implementation _does_ parse 12-byte sizes from the ustar header without a delimiter. It's probably not worth worrying about such an obscure set of conditions, but I'm documenting it here just in case. There's no test included here. I did confirm that this works (using GNU tar) with: $ dd if=/dev/zero seek=64G bs=1 count=1 of=huge $ git add huge $ git commit -q -m foo $ git archive HEAD | head -c 10000 | tar tvf - 2>/dev/null -rw-rw-r-- root/root 68719476737 2016-06-15 21:07 huge Pre-f2f0267, this would yield a bogus size of 8GB, and post-f2f0267, git-archive simply dies. Unfortunately, it's quite an expensive test to run. For one thing, unless your filesystem supports files with holes, it takes 64GB of disk space (you might think piping straight to `hash-object --stdin` would be better, but it's not; that tries to buffer all 64GB in RAM!). Furthermore, hashing and compressing the object takes several minutes of CPU time. We could ship just the resulting compressed object data as a loose object, but even that takes 64MB. So sadly, this code path remains untested in the test suite. Signed-off-by: Jeff King <peff@peff.net> --- archive-tar.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index cb99df2..a9caf13 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, strbuf_addch(sb, '\n'); } +/* + * Like strbuf_append_ext_header, but for numeric values. + */ +static void strbuf_append_ext_header_uint(struct strbuf *sb, + const char *keyword, + uintmax_t value) +{ + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ + int len; + + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); + strbuf_append_ext_header(sb, keyword, buf, len); +} + static unsigned int ustar_header_chksum(const struct ustar_header *header) { const unsigned char *p = (const unsigned char *)header; @@ -163,12 +177,21 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) return i; } +static inline unsigned long ustar_size(uintmax_t size) +{ + if (size <= 077777777777UL) + return size; + else + return 0; +} + static void prepare_header(struct archiver_args *args, struct ustar_header *header, unsigned int mode, unsigned long size) { xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777); - xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? size : 0); + xsnprintf(header->size, sizeof(header->size), "%011lo", + S_ISREG(mode) ? ustar_size(size) : 0); xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); @@ -267,6 +290,9 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } + if (S_ISREG(mode) && ustar_size(size) != size) + strbuf_append_ext_header_uint(&ext_header, "size", size); + prepare_header(args, &header, mode, size); if (ext_header.len > 0) { -- 2.9.0.204.g1499a7b ^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime 2016-06-21 16:16 ` Jeff King 2016-06-21 16:16 ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King @ 2016-06-21 16:17 ` Jeff King 2016-06-21 18:43 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano 2016-06-23 23:15 ` [PATCH v3] " Jeff King 3 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-21 16:17 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe The ustar format represents timestamps as seconds since the epoch, but only has room to store 11 octal digits. To express anything larger, we need to use an extended header. This is exactly the same case we fixed for the size field in the previous commit, and the solution here follows the same pattern. This is even mentioned as an issue in f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), but since it only affected things far in the future, it wasn't worth dealing with. But note that my calculations claiming thousands of years were off there; because our xsnprintf produces a NUL byte, we only have until the year 2242 to fix this. Given that this is just around the corner (geologically speaking), and because the fix for "size" makes doing this on top trivial, let's just make it work. Note that we don't (and never did) handle negative timestamps (i.e., before 1970). This would probably not be too hard to support in the same way, but since git does not support negative timestamps at all, I didn't bother here. Unlike the last patch for "size", this one is easy to test efficiently (the magic date is octal 01000000000001): $ echo content >file $ git add file $ GIT_COMMITTER_DATE='@68719476737 +0000' \ git commit -q -m 'tempori parendum' $ git archive HEAD | tar tvf - -rw-rw-r-- root/root 8 4147-08-20 03:32 file With the current tip of master, this dies in xsnprintf (and prior to f2f0267, it overflowed into the checksum field, and the resulting tarfile claimed to be from the year 2242). However, I decided not to include this in the test suite, because I suspect it will create portability headaches, including: 1. The exact format of the system tar's "t" output. 2. System tars that cannot handle pax headers. 3. System tars tha cannot handle dates that far in the future. 4. If we replace "tar t" with "tar x", then filesystems that cannot handle dates that far in the future. In other words, we do not really have a reliable tar reader for these corner cases, so the best we could do is a byte-for-byte comparison of the output. Signed-off-by: Jeff King <peff@peff.net> --- archive-tar.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/archive-tar.c b/archive-tar.c index a9caf13..ed562d4 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -185,6 +185,14 @@ static inline unsigned long ustar_size(uintmax_t size) return 0; } +static inline unsigned long ustar_mtime(time_t mtime) +{ + if (mtime <= 077777777777UL) + return mtime; + else + return 0; +} + static void prepare_header(struct archiver_args *args, struct ustar_header *header, unsigned int mode, unsigned long size) @@ -192,7 +200,8 @@ static void prepare_header(struct archiver_args *args, xsnprintf(header->mode, sizeof(header->mode), "%07o", mode & 07777); xsnprintf(header->size, sizeof(header->size), "%011lo", S_ISREG(mode) ? ustar_size(size) : 0); - xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", (unsigned long) args->time); + xsnprintf(header->mtime, sizeof(header->mtime), "%011lo", + ustar_mtime(args->time)); xsnprintf(header->uid, sizeof(header->uid), "%07o", 0); xsnprintf(header->gid, sizeof(header->gid), "%07o", 0); @@ -292,6 +301,8 @@ static int write_tar_entry(struct archiver_args *args, if (S_ISREG(mode) && ustar_size(size) != size) strbuf_append_ext_header_uint(&ext_header, "size", size); + if (ustar_mtime(args->time) != args->time) + strbuf_append_ext_header_uint(&ext_header, "mtime", args->time); prepare_header(args, &header, mode, size); -- 2.9.0.204.g1499a7b ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 0/2] friendlier handling of overflows in archive-tar 2016-06-21 16:16 ` Jeff King 2016-06-21 16:16 ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King 2016-06-21 16:17 ` [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime Jeff King @ 2016-06-21 18:43 ` Junio C Hamano 2016-06-23 23:15 ` [PATCH v3] " Jeff King 3 siblings, 0 replies; 61+ messages in thread From: Junio C Hamano @ 2016-06-21 18:43 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe Jeff King <peff@peff.net> writes: > Junio, this is the jk/big-and-old-archive-tar topic. > > The interdiff is: > ... > diff --git a/archive-tar.c b/archive-tar.c > index c7b85fd..ed562d4 100644 > --- a/archive-tar.c > +++ b/archive-tar.c > @@ -179,7 +179,7 @@ static size_t get_path_prefix(const char *path, size_t pathlen, size_t maxlen) > > static inline unsigned long ustar_size(uintmax_t size) > { > - if (size < 077777777777UL) > + if (size <= 077777777777UL) > return size; > else > return 0; > @@ -187,7 +187,7 @@ static inline unsigned long ustar_size(uintmax_t size) > > static inline unsigned long ustar_mtime(time_t mtime) > { > - if (mtime < 077777777777UL) > + if (mtime <= 077777777777UL) > return mtime; > else > return 0; > @@ -299,7 +299,7 @@ static int write_tar_entry(struct archiver_args *args, > memcpy(header.linkname, buffer, size); > } > > - if (ustar_size(size) != size) > + if (S_ISREG(mode) && ustar_size(size) != size) > strbuf_append_ext_header_uint(&ext_header, "size", size); > if (ustar_mtime(args->time) != args->time) > strbuf_append_ext_header_uint(&ext_header, "mtime", args->time); Thanks. By the way, I realized the naming mistake and the topic branch is now named s/old/future/. Size being big is one condition that needs this fix, so is timestamp being from far future, not "old". ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v3] friendlier handling of overflows in archive-tar 2016-06-21 16:16 ` Jeff King ` (2 preceding siblings ...) 2016-06-21 18:43 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano @ 2016-06-23 23:15 ` Jeff King 2016-06-23 23:20 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King ` (3 more replies) 3 siblings, 4 replies; 61+ messages in thread From: Jeff King @ 2016-06-23 23:15 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson This is a replacement for the jk/big-and-future-archive-tar topic. It's sufficiently rewritten that I won't bother with an interdiff, but rather just describe the changes below. Thanks to René for talking through some of the issues, and to Robin for suggesting the double-compression trick for the tests. The important changes here are: - the "size" fix is more localized to write_tar_entry(). This results in shorter code, and means that if one were to somehow write an extended header larger than 8GB, we would notice and die(), rather than just writing a "0" (though such a thing should not be possible). - the mtime is now written only in a single global header, rather than once per file. I checked POSIX and this is explicitly allowed, and of course I also checked that it works against GNU tar. - there are now tests! I tried hard to make them portable and efficient. See the first patch for details. - the final patch is just a cleanup I noticed in the area; it could be done independently, but I floated it to the end in case it's controversial. The patches are: [1/4]: t5000: test tar files that overflow ustar headers [2/4]: archive-tar: write extended headers for file sizes >= 8GB [3/4]: archive-tar: write extended headers for far-future mtime [4/4]: archive-tar: drop return value -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-23 23:15 ` [PATCH v3] " Jeff King @ 2016-06-23 23:20 ` Jeff King 2016-06-23 23:31 ` Jeff King ` (2 more replies) 2016-06-23 23:21 ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King ` (2 subsequent siblings) 3 siblings, 3 replies; 61+ messages in thread From: Jeff King @ 2016-06-23 23:20 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson The ustar format only has room for 11 (or 12, depending on some implementations) octal digits for the size and mtime of each file. After this, we have to add pax extended headers to specify the real data, and git does not yet know how to do so. Before fixing that, let's start off with some test infrastructure, as designing portable and efficient tests for this is non-trivial. We want to use the system tar to check our output (because what we really care about is interoperability), but we can't rely on it: 1. being able to read pax headers 2. being able to handle huge sizes or mtimes 3. supporting a "t" format we can parse So as a prerequisite, we can feed the system tar a reference tarball to make sure it can handle these features. The reference tar here was created with: dd if=/dev/zero seek=64G bs=1 count=1 of=huge touch -d @68719476737 huge tar cf - --format=pax | head -c 2048 using GNU tar. Note that this is not a complete tarfile, but it's enough to contain the headers we want to examine. Likewise, we need to convince git that it has a 64GB blob to output. Running "git add" on that 64GB file takes many minutes of CPU, and even compressed, the result is 64MB. So again, I pre-generated that loose object, and then used bzip2 on the result, which shrinks it to a few hundred bytes. Unfortunately, we do still inflate it to 64MB on disk briefly while the test is running. The tests are split so that we test as much as we can even with an uncooperative system tar. This actually catches the current breakage (which is that we die("BUG") trying to write the ustar header) on every system, and then on systems where we can, we go farther and actually verify the result. Helped-by: Robin H. Johnson <robbat2@gentoo.org> Signed-off-by: Jeff King <peff@peff.net> --- I'm still not excited about the 64MB write, just because it's awfully heavyweight for such a trivial test. It runs pretty fast on my RAM disk, but maybe not on other people's system. I considered but didn't explore two other options: 1. I couldn't convince zlib to write a smaller file (this is done with core.compression=9). But I'm not sure if that's inherent to the on-disk format, or simply the maximum size of a deflate block. So it's possible that one could hand-roll zlib data that says "I'm 64GB" but is only a few bytes long. 2. We don't ever want to see the whole 64GB, of course; we want to stream it out and only care about the header (as an aside, this makes a wonderful test that we are hitting the streaming code path, as it's unlikely to work without it :) ). So another option would be to include a truncated file that claims to be 64GB, and has only the first 256kb or something worth of data (which should deflate down to almost nothing). git-fsck wouldn't work, of course, but we don't need to run it. Other bits of git might complain, but our plan is for git to get SIGPIPE before hitting that point anyway. So that seems pretty easy, but it is potentially flaky. t/t5000-tar-tree.sh | 73 +++++++++++++++++++++ .../19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2 | Bin 0 -> 578 bytes t/t5000/huge-and-future.tar | Bin 0 -> 2048 bytes 3 files changed, 73 insertions(+) create mode 100644 t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2 create mode 100644 t/t5000/huge-and-future.tar diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 4b68bba..e7c9271 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -319,4 +319,77 @@ test_expect_success 'catch non-matching pathspec' ' test_must_fail git archive -v HEAD -- "*.abc" >/dev/null ' +# See if our system tar can handle a tar file with huge sizes and dates far in +# the future, and that we can actually parse its output. +# +# The reference file was generated by GNU tar, and the magic time and size are +# both octal 01000000000001, which overflows normal ustar fields. +# +# When parsing, we'll pull out only the year from the date; that +# avoids any question of timezones impacting the result. The output +# of tar_info is expected to be "<size> <year>", both in decimal. It ignores +# the return value of tar. We have to do this, because our reference file is +# only a partial (the whole thing would be 64GB!). +tar_info () { + "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1 +} +test_lazy_prereq TAR_HUGE ' + echo "68719476737 4147" >expect && + tar_info "$TEST_DIRECTORY"/t5000/huge-and-future.tar >actual && + test_cmp expect actual +' + +# Likewise, we need bunzip for the 64GB git object. +test_lazy_prereq BUNZIP ' + bunzip2 --version +' + +test_expect_success BUNZIP 'set up repository with huge blob' ' + obj_d=19 && + obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a && + obj=${obj_d}${obj_f} && + mkdir -p .git/objects/$obj_d && + bunzip2 -c \ + <"$TEST_DIRECTORY"/t5000/$obj.bz2 \ + >.git/objects/$obj_d/$obj_f && + rm -f .git/index && + git update-index --add --cacheinfo 100644,$obj,huge && + git commit -m huge +' + +# We expect git to die with SIGPIPE here (otherwise we +# would generate the whole 64GB). +test_expect_failure BUNZIP 'generate tar with huge size' ' + { + git archive HEAD + echo $? >exit-code + } | head -c 4096 >huge.tar && + echo 141 >expect && + test_cmp expect exit-code +' + +test_expect_failure BUNZIP,TAR_HUGE 'system tar can read our huge size' ' + echo 68719476737 >expect && + tar_info huge.tar | cut -d" " -f1 >actual && + test_cmp expect actual +' + +test_expect_success 'set up repository with far-future commit' ' + rm -f .git/index && + echo content >file && + git add file && + GIT_COMMITTER_DATE="@68719476737 +0000" \ + git commit -m "tempori parendum" +' + +test_expect_failure 'generate tar with future mtime' ' + git archive HEAD >future.tar +' + +test_expect_failure TAR_HUGE 'tar can encode dates far in future' ' + echo 4147 >expect && + tar_info future.tar | cut -d" " -f2 >actual && + test_cmp expect actual +' + test_done diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2 b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2 new file mode 100644 index 0000000000000000000000000000000000000000..9619fd3c5f6f345a40aba1b91807bb4d937fc51c GIT binary patch literal 578 zcmV-I0=@l0T4*^jL0KkKS$mI#Bmk}@egE{{00qbxVL(9uNQgj6FakgbL{P{8umA%P z1cU%-zyaKJ3Zno3&;S`U000^T007Vc88iR@8UZy`L7)Mk0iXZ?8a+UzR8!JvwH}iK zXvkv}!$8#Dpwk&jhdD}y=}Ly#N{6=5N`@-9@zf#~Y}ml6A5xVKIEqv?6sUZ+ic~-0 z3bD}&yn;flel6M|SGh`t%>^nMd9)O$Wu&Noa%<D9r9(fbH_W9&qsPxi@k)lGl@5I+ zDj0prR6BE&sCksAbrh&!tffP{F-nI-prpRlLJ+$#Qc)DvQlXBLp@B+<1u7d*3ZhIX zg<RA^t&a>SgnP<VG8CwK>ira`Ve7_Hm1V`0kgE@zR6?%0tYir&brh(3dz7ebDNy(^ zL0KkKS?}&rK>-rd{r~jXFbzl$fdD`NU<g16aRNX9Kv2K{umFG%1O@;pzyY`b3;;A} zXu>jRG-;qPzyn5xj3Xw4Mw$Zz)lx%14F-mTL6atf)Gz?ip`!@NpwXs)%#?y?q!Xf$ zP8vZs=>+{8*(#7u93>!3K5|fsT$F-(VMr#rK{}}f`lJ&zAez-6nY%u7(h0VZPg+4e zX$0a(CY`i`W~3AAPBemQq!ZGRP7dlpJ?RA1nvhKtf@|4GCaOU@$`MOSl~R?f3Ia)6 zDFpYV6Llb)sRYUuLMe6QNGF9Ln9>Q3G=gtZ37;(>om7H;BohijH-{QkNGI(epUDKq QkWcY<BvXY60gn}ckTCG&H~;_u literal 0 HcmV?d00001 diff --git a/t/t5000/huge-and-future.tar b/t/t5000/huge-and-future.tar new file mode 100644 index 0000000000000000000000000000000000000000..63155e185585c7589b0db1d6da199bfa27f517c7 GIT binary patch literal 2048 zcmeHH%L;=q5X{-H$QS6YCRsi7-eZ3uw6XOd6dxe{`X*^rs7fzVia3{DW=WVGG6|!T z?v6%ZOjU;{l%nX?UJY9lV4;Lyu3CInz(g<_!2ppkX1rTd#L``D-RR0nTAFX1kAc_4 z!yHsfm<dvpP!J<8o1&bMdO{|^&z|%z2c<+6LM8=4Dk<2wb(>gk^{~&l;zIw<Ka%wM q@2eX*^nb#uM^s?*|C3Di`M;YypV2;0-{yXeagpKN-s}$iu>((@uRaR^ literal 0 HcmV?d00001 -- 2.9.0.217.g096ca68 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-23 23:20 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King @ 2016-06-23 23:31 ` Jeff King 2016-06-24 16:38 ` Johannes Sixt 2016-06-24 18:56 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano 2 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-23 23:31 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson On Thu, Jun 23, 2016 at 07:20:44PM -0400, Jeff King wrote: > I'm still not excited about the 64MB write, just because it's awfully > heavyweight for such a trivial test. It runs pretty fast on my RAM disk, > but maybe not on other people's system. > > I considered but didn't explore two other options: > > 1. I couldn't convince zlib to write a smaller file (this is done with > core.compression=9). But I'm not sure if that's inherent to the > on-disk format, or simply the maximum size of a deflate block. > > So it's possible that one could hand-roll zlib data that says "I'm > 64GB" but is only a few bytes long. > > 2. We don't ever want to see the whole 64GB, of course; we want to > stream it out and only care about the header (as an aside, this > makes a wonderful test that we are hitting the streaming code path, > as it's unlikely to work without it :) ). > > So another option would be to include a truncated file that claims > to be 64GB, and has only the first 256kb or something worth of data > (which should deflate down to almost nothing). > > git-fsck wouldn't work, of course, but we don't need to run it. > Other bits of git might complain, but our plan is for git to get > SIGPIPE before hitting that point anyway. > > So that seems pretty easy, but it is potentially flaky. Writing that convinced me that (2) is actually quite a sane way to go. The patch is below, which seems to work. I arbitrarily picked the first 2048 bytes of the loose object. That's 1/32768 of the original. If we assume the compression ratio is stable through the file (and it should be; the file is all zeroes), that should generate 2MB of data should we need it (way more than we feed to our "head -c" invocation). This patch is on top of the whole series just to illustrate it. Doing it for real will involve squashing it into the first patch (and adjusting the commit message), and then handling the minor rebase conflicts. I'll hold off on a re-roll until I get any comments on v3. -Peff --- diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 07e0bdc..e542938 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -339,19 +339,12 @@ test_lazy_prereq TAR_HUGE ' test_cmp expect actual ' -# Likewise, we need bunzip for the 64GB git object. -test_lazy_prereq BUNZIP ' - bunzip2 --version -' - -test_expect_success BUNZIP 'set up repository with huge blob' ' +test_expect_success 'set up repository with huge blob' ' obj_d=19 && obj_f=f9c8273ec45a8938e6999cb59b3ff66739902a && obj=${obj_d}${obj_f} && mkdir -p .git/objects/$obj_d && - bunzip2 -c \ - <"$TEST_DIRECTORY"/t5000/$obj.bz2 \ - >.git/objects/$obj_d/$obj_f && + cp "$TEST_DIRECTORY"/t5000/$obj .git/objects/$obj_d/$obj_f && rm -f .git/index && git update-index --add --cacheinfo 100644,$obj,huge && git commit -m huge @@ -359,7 +352,7 @@ test_expect_success BUNZIP 'set up repository with huge blob' ' # We expect git to die with SIGPIPE here (otherwise we # would generate the whole 64GB). -test_expect_success BUNZIP 'generate tar with huge size' ' +test_expect_success 'generate tar with huge size' ' { git archive HEAD echo $? >exit-code @@ -368,7 +361,7 @@ test_expect_success BUNZIP 'generate tar with huge size' ' test_cmp expect exit-code ' -test_expect_success BUNZIP,TAR_HUGE 'system tar can read our huge size' ' +test_expect_success TAR_HUGE 'system tar can read our huge size' ' echo 68719476737 >expect && tar_info huge.tar | cut -d" " -f1 >actual && test_cmp expect actual diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a new file mode 100644 index 0000000000000000000000000000000000000000..5cbe9ec312bfd7b7e0398ca281e9d42848743704 GIT binary patch literal 2048 zcmb=p_2!@<FM|RPgTa!7MhCMEn`g89mv_5$t@dj2a|><dbTfugFd71*Aut*OqaiRF L0;3@?%t8PFWtt9Z literal 0 HcmV?d00001 diff --git a/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2 b/t/t5000/19f9c8273ec45a8938e6999cb59b3ff66739902a.bz2 deleted file mode 100644 index 9619fd3c5f6f345a40aba1b91807bb4d937fc51c..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 578 zcmV-I0=@l0T4*^jL0KkKS$mI#Bmk}@egE{{00qbxVL(9uNQgj6FakgbL{P{8umA%P z1cU%-zyaKJ3Zno3&;S`U000^T007Vc88iR@8UZy`L7)Mk0iXZ?8a+UzR8!JvwH}iK zXvkv}!$8#Dpwk&jhdD}y=}Ly#N{6=5N`@-9@zf#~Y}ml6A5xVKIEqv?6sUZ+ic~-0 z3bD}&yn;flel6M|SGh`t%>^nMd9)O$Wu&Noa%<D9r9(fbH_W9&qsPxi@k)lGl@5I+ zDj0prR6BE&sCksAbrh&!tffP{F-nI-prpRlLJ+$#Qc)DvQlXBLp@B+<1u7d*3ZhIX zg<RA^t&a>SgnP<VG8CwK>ira`Ve7_Hm1V`0kgE@zR6?%0tYir&brh(3dz7ebDNy(^ zL0KkKS?}&rK>-rd{r~jXFbzl$fdD`NU<g16aRNX9Kv2K{umFG%1O@;pzyY`b3;;A} zXu>jRG-;qPzyn5xj3Xw4Mw$Zz)lx%14F-mTL6atf)Gz?ip`!@NpwXs)%#?y?q!Xf$ zP8vZs=>+{8*(#7u93>!3K5|fsT$F-(VMr#rK{}}f`lJ&zAez-6nY%u7(h0VZPg+4e zX$0a(CY`i`W~3AAPBemQq!ZGRP7dlpJ?RA1nvhKtf@|4GCaOU@$`MOSl~R?f3Ia)6 zDFpYV6Llb)sRYUuLMe6QNGF9Ln9>Q3G=gtZ37;(>om7H;BohijH-{QkNGI(epUDKq QkWcY<BvXY60gn}ckTCG&H~;_u ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-23 23:20 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King 2016-06-23 23:31 ` Jeff King @ 2016-06-24 16:38 ` Johannes Sixt 2016-06-24 16:46 ` Jeff King 2016-06-24 18:56 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano 2 siblings, 1 reply; 61+ messages in thread From: Johannes Sixt @ 2016-06-24 16:38 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson Am 24.06.2016 um 01:20 schrieb Jeff King: > +# We expect git to die with SIGPIPE here (otherwise we > +# would generate the whole 64GB). > +test_expect_failure BUNZIP 'generate tar with huge size' ' > + { > + git archive HEAD > + echo $? >exit-code > + } | head -c 4096 >huge.tar && > + echo 141 >expect && > + test_cmp expect exit-code It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of SIGPIPE - I haven't tested this, yet). -- Hannes ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 16:38 ` Johannes Sixt @ 2016-06-24 16:46 ` Jeff King 2016-06-24 17:05 ` Johannes Sixt 0 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-24 16:46 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 06:38:55PM +0200, Johannes Sixt wrote: > Am 24.06.2016 um 01:20 schrieb Jeff King: > > +# We expect git to die with SIGPIPE here (otherwise we > > +# would generate the whole 64GB). > > +test_expect_failure BUNZIP 'generate tar with huge size' ' > > + { > > + git archive HEAD > > + echo $? >exit-code > > + } | head -c 4096 >huge.tar && > > + echo 141 >expect && > > + test_cmp expect exit-code > > It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of > SIGPIPE - I haven't tested this, yet). Thanks, I meant to ask about that. We do a workaround in t0005, but we _don't_ do it in the new sigpipe handling for test_must_fail. Is the latter just broken, too? -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 16:46 ` Jeff King @ 2016-06-24 17:05 ` Johannes Sixt 2016-06-24 19:39 ` [PATCH 0/4] portable signal-checking in tests Jeff King 0 siblings, 1 reply; 61+ messages in thread From: Johannes Sixt @ 2016-06-24 17:05 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson Am 24.06.2016 um 18:46 schrieb Jeff King: > On Fri, Jun 24, 2016 at 06:38:55PM +0200, Johannes Sixt wrote: >> It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of >> SIGPIPE - I haven't tested this, yet). > > Thanks, I meant to ask about that. We do a workaround in t0005, but we > _don't_ do it in the new sigpipe handling for test_must_fail. Is the > latter just broken, too? That's well possible. It is not prepared to see ksh's exit codes for signals. -- Hannes ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 0/4] portable signal-checking in tests 2016-06-24 17:05 ` Johannes Sixt @ 2016-06-24 19:39 ` Jeff King 2016-06-24 19:43 ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King ` (4 more replies) 0 siblings, 5 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 19:39 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 07:05:14PM +0200, Johannes Sixt wrote: > Am 24.06.2016 um 18:46 schrieb Jeff King: > > On Fri, Jun 24, 2016 at 06:38:55PM +0200, Johannes Sixt wrote: > > > It's going to be 269 with ksh, and who-knows-what on Windows (due to lack of > > > SIGPIPE - I haven't tested this, yet). > > > > Thanks, I meant to ask about that. We do a workaround in t0005, but we > > _don't_ do it in the new sigpipe handling for test_must_fail. Is the > > latter just broken, too? > > That's well possible. It is not prepared to see ksh's exit codes for > signals. I'm actually not convinced that old versions of ksh are viable for running the test suite. mksh seems to use POSIX semantics, and I cannot get through t0005 with ksh93, as it parses nested parentheses wrong. But maybe there are other ksh variants that use the funny exit codes, but are otherwise not too buggy. I'd be more concerned with Windows. The SIGPIPE tests in t0005 are already marked !MINGW, but other checks elsewhere are not. I know there is no SIGPIPE on Windows, so it may be that some cases happen to work because we end up in write_or_die(), which converts EPIPE into a 141 exit. Anyway. Here's a series that I think makes things better, and it is not too painful to do. [1/4]: tests: factor portable signal check out of t0005 [2/4]: t0005: use test_match_signal as appropriate [3/4]: test_must_fail: use test_match_signal [4/4]: t/lib-git-daemon: use test_match_signal -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 1/4] tests: factor portable signal check out of t0005 2016-06-24 19:39 ` [PATCH 0/4] portable signal-checking in tests Jeff King @ 2016-06-24 19:43 ` Jeff King 2016-06-24 20:52 ` Johannes Sixt 2016-06-24 19:44 ` [PATCH 2/4] t0005: use test_match_signal as appropriate Jeff King ` (3 subsequent siblings) 4 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-24 19:43 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson In POSIX shells, a program which exits due to a signal generally has an exit code of 128 plus the signal number. However, some platforms do other things. ksh uses 256 plus the signal number, and on Windows, all signals are just "3". We've accounted for that in t0005, but not in other tests. Let's pull out the logic so we can use it elsewhere. It would be nice for debugging if this additionally printed errors to stderr, like our other test_* helpers. But we're going to need to use it in other places besides the innards of a test_expect block. So let's leave it as generic as possible. Signed-off-by: Jeff King <peff@peff.net> --- I didn't get into the weirdness of SIGPIPE on Windows here, but I think this is probably a first step toward handling it better. E.g., it may be that test_match_signal should respect 128 (or even any code) when we are checking for SIGPIPE. I also didn't bother with symbolic names. We could make: test_match_signal sigterm $? work, but I didn't think it was worth the effort. While numbers for some obscure signals do vary on platforms, sigpipe and sigterm are standard enough to rely on. t/t0005-signals.sh | 7 +------ t/test-lib-functions.sh | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+), 6 deletions(-) diff --git a/t/t0005-signals.sh b/t/t0005-signals.sh index e7f27eb..2d96265 100755 --- a/t/t0005-signals.sh +++ b/t/t0005-signals.sh @@ -11,12 +11,7 @@ EOF test_expect_success 'sigchain works' ' { test-sigchain >actual; ret=$?; } && - case "$ret" in - 143) true ;; # POSIX w/ SIGTERM=15 - 271) true ;; # ksh w/ SIGTERM=15 - 3) true ;; # Windows - *) false ;; - esac && + test_match_signal 15 "$ret" && test_cmp expect actual ' diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 48884d5..51d3775 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -961,3 +961,21 @@ test_env () { done ) } + +# Returns true if the numeric exit code in "$2" represents the expected signal +# in "$1". Signals should be given numerically. +test_match_signal () { + if test "$2" = "$((128 + $1))" + then + # POSIX + return 0 + elif test "$2" = "$((256 + $1))" + then + # ksh + return 0 + elif test "$2" = "3"; then + # Windows + return 0 + fi + return 1 +} -- 2.9.0.215.gc5c4261 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 1/4] tests: factor portable signal check out of t0005 2016-06-24 19:43 ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King @ 2016-06-24 20:52 ` Johannes Sixt 2016-06-24 21:05 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: Johannes Sixt @ 2016-06-24 20:52 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson Am 24.06.2016 um 21:43 schrieb Jeff King: > In POSIX shells, a program which exits due to a signal > generally has an exit code of 128 plus the signal number. > However, some platforms do other things. ksh uses 256 plus > the signal number, and on Windows, all signals are just "3". That's not true, see below. > I didn't get into the weirdness of SIGPIPE on Windows here, but I think > this is probably a first step toward handling it better. E.g., it may be > that test_match_signal should respect 128 (or even any code) when we are > checking for SIGPIPE. The Windows behavior is most closely described as having signal(SIGPIPE, SIG_IGN) at the very beginning of the program. > +# Returns true if the numeric exit code in "$2" represents the expected signal > +# in "$1". Signals should be given numerically. > +test_match_signal () { > + if test "$2" = "$((128 + $1))" > + then > + # POSIX > + return 0 > + elif test "$2" = "$((256 + $1))" > + then > + # ksh > + return 0 > + elif test "$2" = "3"; then > + # Windows You meant well here, but this is close to pointless as a general check. We check for this exit code in t0005 because there program termination happens via raise(), which on Window just calls exit(3). This exit code is not an indication that something related to SIGPIPE (or any signal) happened. IMO there is too much danger to trigger a false positive if exit code 3 is treated special in this generality. > + return 0 > + fi > + return 1 > +} -- Hannes ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/4] tests: factor portable signal check out of t0005 2016-06-24 20:52 ` Johannes Sixt @ 2016-06-24 21:05 ` Jeff King 2016-06-24 21:32 ` Johannes Sixt 0 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-24 21:05 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 10:52:32PM +0200, Johannes Sixt wrote: > Am 24.06.2016 um 21:43 schrieb Jeff King: > > In POSIX shells, a program which exits due to a signal > > generally has an exit code of 128 plus the signal number. > > However, some platforms do other things. ksh uses 256 plus > > the signal number, and on Windows, all signals are just "3". > > That's not true, see below. I was worried about that. Git for Windows seems like a labyrinth of bizarre special cases. > > I didn't get into the weirdness of SIGPIPE on Windows here, but I think > > this is probably a first step toward handling it better. E.g., it may be > > that test_match_signal should respect 128 (or even any code) when we are > > checking for SIGPIPE. > > The Windows behavior is most closely described as having signal(SIGPIPE, > SIG_IGN) at the very beginning of the program. Right, but then we would get EPIPE. So what does git do in such cases? I'd expect it generally to either hit the check_pipe() part of write_or_die(), or to end up complaining via die() that the write didn't go as expected. > > +# Returns true if the numeric exit code in "$2" represents the expected signal > > +# in "$1". Signals should be given numerically. > > +test_match_signal () { > > + if test "$2" = "$((128 + $1))" > > + then > > + # POSIX > > + return 0 > > + elif test "$2" = "$((256 + $1))" > > + then > > + # ksh > > + return 0 > > + elif test "$2" = "3"; then > > + # Windows > > You meant well here, but this is close to pointless as a general check. We > check for this exit code in t0005 because there program termination happens > via raise(), which on Window just calls exit(3). This exit code is not an > indication that something related to SIGPIPE (or any signal) happened. > > IMO there is too much danger to trigger a false positive if exit code 3 is > treated special in this generality. Yeah, I agree. But what _should_ it do? E.g., what happens to git-daemon when it is killed via TERM? -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH 1/4] tests: factor portable signal check out of t0005 2016-06-24 21:05 ` Jeff King @ 2016-06-24 21:32 ` Johannes Sixt 0 siblings, 0 replies; 61+ messages in thread From: Johannes Sixt @ 2016-06-24 21:32 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson Am 24.06.2016 um 23:05 schrieb Jeff King: > On Fri, Jun 24, 2016 at 10:52:32PM +0200, Johannes Sixt wrote: >> The Windows behavior is most closely described as having signal(SIGPIPE, >> SIG_IGN) at the very beginning of the program. > > Right, but then we would get EPIPE. So what does git do in such cases? > I'd expect it generally to either hit the check_pipe() part of > write_or_die(), or to end up complaining via die() that the write didn't > go as expected. Ah, I have forgotten about this code path. Looks like it will trigger exactly the same raise() as test_sigchain. Then the check for exit code 3 makes a bit more sense. But I'm sure we have code paths that do not go through checK_pipe(). Those would proceed through whatever error handling we have and most likely die(). >> IMO there is too much danger to trigger a false positive if exit code 3 is >> treated special in this generality. > > Yeah, I agree. But what _should_ it do? E.g., what happens to git-daemon > when it is killed via TERM? Frankly, I don't know how bash's 'kill -TERM' and a Windows program interact. I've avoided this topic like the plague so far. -- Hannes ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH 2/4] t0005: use test_match_signal as appropriate 2016-06-24 19:39 ` [PATCH 0/4] portable signal-checking in tests Jeff King 2016-06-24 19:43 ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King @ 2016-06-24 19:44 ` Jeff King 2016-06-24 19:45 ` [PATCH 3/4] test_must_fail: use test_match_signal Jeff King ` (2 subsequent siblings) 4 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 19:44 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson The first test already uses this more portable construct (that was where it was factored from initially), but the later tests do a raw comparison against 141 to look for SIGPIPE, which can fail on some shells and platforms. Signed-off-by: Jeff King <peff@peff.net> --- Again, I couldn't test these. They're skipped on MINGW, and ksh93 barfs before even executing the tests. t/t0005-signals.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/t/t0005-signals.sh b/t/t0005-signals.sh index 2d96265..c80c995 100755 --- a/t/t0005-signals.sh +++ b/t/t0005-signals.sh @@ -36,12 +36,12 @@ test_expect_success 'create blob' ' test_expect_success !MINGW 'a constipated git dies with SIGPIPE' ' OUT=$( ((large_git; echo $? 1>&3) | :) 3>&1 ) && - test "$OUT" -eq 141 + test_match_signal 13 "$OUT" ' test_expect_success !MINGW 'a constipated git dies with SIGPIPE even if parent ignores it' ' OUT=$( ((trap "" PIPE; large_git; echo $? 1>&3) | :) 3>&1 ) && - test "$OUT" -eq 141 + test_match_signal 13 "$OUT" ' test_done -- 2.9.0.215.gc5c4261 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 3/4] test_must_fail: use test_match_signal 2016-06-24 19:39 ` [PATCH 0/4] portable signal-checking in tests Jeff King 2016-06-24 19:43 ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King 2016-06-24 19:44 ` [PATCH 2/4] t0005: use test_match_signal as appropriate Jeff King @ 2016-06-24 19:45 ` Jeff King 2016-06-24 19:45 ` [PATCH 4/4] t/lib-git-daemon: " Jeff King 2016-06-24 19:48 ` [PATCH 0/4] portable signal-checking in tests Jeff King 4 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 19:45 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson In 8bf4bec (add "ok=sigpipe" to test_must_fail and use it to fix flaky tests, 2015-11-27), test_must_fail learned to recognize "141" as a sigpipe failure. However, testing for a signal is more complicated than that; we should use test_match_signal to implement more portable checking. Signed-off-by: Jeff King <peff@peff.net> --- t/test-lib-functions.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 51d3775..91d3b58 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -612,7 +612,7 @@ test_must_fail () { then echo >&2 "test_must_fail: command succeeded: $*" return 1 - elif test $exit_code -eq 141 && list_contains "$_test_ok" sigpipe + elif test_match_signal 13 $exit_code && list_contains "$_test_ok" sigpipe then return 0 elif test $exit_code -gt 129 && test $exit_code -le 192 -- 2.9.0.215.gc5c4261 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* [PATCH 4/4] t/lib-git-daemon: use test_match_signal 2016-06-24 19:39 ` [PATCH 0/4] portable signal-checking in tests Jeff King ` (2 preceding siblings ...) 2016-06-24 19:45 ` [PATCH 3/4] test_must_fail: use test_match_signal Jeff King @ 2016-06-24 19:45 ` Jeff King 2016-06-24 19:48 ` [PATCH 0/4] portable signal-checking in tests Jeff King 4 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 19:45 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson When git-daemon exits, we expect it to be with the SIGTERM we just sent it. If we see anything else, we'll complain. But our check against exit code "143" is not portable. For example: $ ksh93 t5570-git-daemon.sh [...] error: git daemon exited with status: 271 We can fix this by using test_match_signal. Signed-off-by: Jeff King <peff@peff.net> --- t/lib-git-daemon.sh | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/t/lib-git-daemon.sh b/t/lib-git-daemon.sh index 340534c..f9cbd47 100644 --- a/t/lib-git-daemon.sh +++ b/t/lib-git-daemon.sh @@ -82,8 +82,7 @@ stop_git_daemon() { kill "$GIT_DAEMON_PID" wait "$GIT_DAEMON_PID" >&3 2>&4 ret=$? - # expect exit with status 143 = 128+15 for signal TERM=15 - if test $ret -ne 143 + if test_match_signal 15 $? then error "git daemon exited with status: $ret" fi -- 2.9.0.215.gc5c4261 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH 0/4] portable signal-checking in tests 2016-06-24 19:39 ` [PATCH 0/4] portable signal-checking in tests Jeff King ` (3 preceding siblings ...) 2016-06-24 19:45 ` [PATCH 4/4] t/lib-git-daemon: " Jeff King @ 2016-06-24 19:48 ` Jeff King 4 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 19:48 UTC (permalink / raw) To: Johannes Sixt; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 03:39:24PM -0400, Jeff King wrote: > Anyway. Here's a series that I think makes things better, and it is not > too painful to do. > > [1/4]: tests: factor portable signal check out of t0005 > [2/4]: t0005: use test_match_signal as appropriate > [3/4]: test_must_fail: use test_match_signal > [4/4]: t/lib-git-daemon: use test_match_signal Oh, and I meant to mention: this covers everything I found grepping for "141" and "143" in the test suite (though you have to filter the results a bit for false positives). It doesn't fix the case newly added in the tar series that started this discussion. I don't want to hold either topic hostage to the other, so I'll prepare a patch to go on top. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-23 23:20 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King 2016-06-23 23:31 ` Jeff King 2016-06-24 16:38 ` Johannes Sixt @ 2016-06-24 18:56 ` Junio C Hamano 2016-06-24 19:07 ` Jeff King 2 siblings, 1 reply; 61+ messages in thread From: Junio C Hamano @ 2016-06-24 18:56 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson Jeff King <peff@peff.net> writes: > The ustar format only has room for 11 (or 12, depending on > some implementations) octal digits for the size and mtime of > each file. After this, we have to add pax extended headers > to specify the real data, and git does not yet know how to > do so. I am not a native speaker but "After" above made me hiccup. I think I am correct to understand that it means "after passing this limit", aka "to represent files bigger or newer than these", but still it felt somewhat strange. > So as a prerequisite, we can feed the system tar a reference > tarball to make sure it can handle these features. The > reference tar here was created with: > > dd if=/dev/zero seek=64G bs=1 count=1 of=huge > touch -d @68719476737 huge > tar cf - --format=pax | > head -c 2048 > > using GNU tar. Note that this is not a complete tarfile, but > it's enough to contain the headers we want to examine. Cute. I didn't remember they had @<seconds-since-epoch> format, even though I must have seen what they do while working on 2c733fb2 (parse_date(): '@' prefix forces git-timestamp, 2012-02-02). > +# See if our system tar can handle a tar file with huge sizes and dates far in > +# the future, and that we can actually parse its output. > +# > +# The reference file was generated by GNU tar, and the magic time and size are > +# both octal 01000000000001, which overflows normal ustar fields. > +# > +# When parsing, we'll pull out only the year from the date; that > +# avoids any question of timezones impacting the result. ... as long as the month-day part is not close to the year boundary. So this explanation is insuffucient to convince the reader that "that avoids any question" is correct, without saying that it is in August of year 4147. > +tar_info () { > + "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1 > +} A blank after the shell function to make it easier to see the boundary. Seeing an awk piped into cut always makes me want to suggest a single sed/awk/perl invocation. > +# We expect git to die with SIGPIPE here (otherwise we > +# would generate the whole 64GB). > +test_expect_failure BUNZIP 'generate tar with huge size' ' > + { > + git archive HEAD > + echo $? >exit-code > + } | head -c 4096 >huge.tar && > + echo 141 >expect && > + test_cmp expect exit-code > +' "head -c" is GNU-ism, isn't it? "dd bs=1 count=4096" is hopefully more portable. ksh signal death you already know about. I wonder if we want to expose something like list_contains as a friend of test_cmp. list_contains 141,269 $(cat exit-code) Thanks. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 18:56 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano @ 2016-06-24 19:07 ` Jeff King 2016-06-24 19:44 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 19:07 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > The ustar format only has room for 11 (or 12, depending on > > some implementations) octal digits for the size and mtime of > > each file. After this, we have to add pax extended headers > > to specify the real data, and git does not yet know how to > > do so. > > I am not a native speaker but "After" above made me hiccup. I think > I am correct to understand that it means "after passing this limit", > aka "to represent files bigger or newer than these", but still it > felt somewhat strange. Yeah, I agree that it reads badly. I'm not sure what I was thinking. I'll tweak it in the re-roll. > > +# See if our system tar can handle a tar file with huge sizes and dates far in > > +# the future, and that we can actually parse its output. > > +# > > +# The reference file was generated by GNU tar, and the magic time and size are > > +# both octal 01000000000001, which overflows normal ustar fields. > > +# > > +# When parsing, we'll pull out only the year from the date; that > > +# avoids any question of timezones impacting the result. > > ... as long as the month-day part is not close to the year boundary. > So this explanation is insuffucient to convince the reader that > "that avoids any question" is correct, without saying that it is in > August of year 4147. I thought that part didn't need to be said, but I can say it (technically we can include the month, too, but I don't think that level of accuracy is really important for these tests). > > +tar_info () { > > + "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1 > > +} > > A blank after the shell function to make it easier to see the > boundary. I was intentionally trying to couple it with prereq below, as the comment describes both of them. > Seeing an awk piped into cut always makes me want to suggest a > single sed/awk/perl invocation. I want the auto-splitting of awk, but then to auto-split the result using a different delimiter. Is there a not-painful way to do that in awk? I could certainly come up with a regex to do it in sed, but I wanted to keep the parsing as liberal and generic as possible. Certainly I could do it in perl, but I had the general impression that we prefer to keep the dependency on perl to a minimum. Maybe it doesn't matter. > > +# We expect git to die with SIGPIPE here (otherwise we > > +# would generate the whole 64GB). > > +test_expect_failure BUNZIP 'generate tar with huge size' ' > > + { > > + git archive HEAD > > + echo $? >exit-code > > + } | head -c 4096 >huge.tar && > > + echo 141 >expect && > > + test_cmp expect exit-code > > +' > > "head -c" is GNU-ism, isn't it? You're right; for some reason I thought it was in POSIX. We do have a couple instances of it, but they are all in the valgrind setup code (which I guess most people don't ever run). > "dd bs=1 count=4096" is hopefully more portable. Hmm. I always wonder whether dd is actually very portable, but we do use it already, at least. Perhaps the perl monstrosity in t9300 could be replaced with that, too. > ksh signal death you already know about. I wonder if we want to > expose something like list_contains as a friend of test_cmp. > > list_contains 141,269 $(cat exit-code) I think we would want something more like: test_signal_match 13 $(cat exit-code) Each call site should not have to know about every signal convention (and in your example, the magic "3" of Windows is left out). -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 19:07 ` Jeff King @ 2016-06-24 19:44 ` Junio C Hamano 2016-06-24 20:58 ` Eric Sunshine 2016-06-24 20:58 ` Jeff King 2 siblings, 0 replies; 61+ messages in thread From: Junio C Hamano @ 2016-06-24 19:44 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson Jeff King <peff@peff.net> writes: >> > +# When parsing, we'll pull out only the year from the date; that >> > +# avoids any question of timezones impacting the result. >> >> ... as long as the month-day part is not close to the year boundary. >> So this explanation is insuffucient to convince the reader that >> "that avoids any question" is correct, without saying that it is in >> August of year 4147. > > I thought that part didn't need to be said, but I can say it > (technically we can include the month, too, but I don't think that level > of accuracy is really important for these tests). Oh, I wasn't suggesting to include the month in the comparison. But to understand why it is safe from TZ jitters to test only year, the reader needs to know (or do the math herself) that the timestamp is away from the year boundary, so mentioning August in the justifying comment is needed. >> Seeing an awk piped into cut always makes me want to suggest a >> single sed/awk/perl invocation. > > I want the auto-splitting of awk, but then to auto-split the result > using a different delimiter. Is there a not-painful way to do that in > awk? > > I could certainly come up with a regex to do it in sed, but I wanted to > keep the parsing as liberal and generic as possible. > > Certainly I could do it in perl, but I had the general impression that > we prefer to keep the dependency on perl to a minimum. Maybe it doesn't > matter. Heh. It was merely "makes me want to suggest", not "I suggest". If I were doing this myself, I would have done a single sed but it does not matter. > I think we would want something more like: > > test_signal_match 13 $(cat exit-code) I like that. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 19:07 ` Jeff King 2016-06-24 19:44 ` Junio C Hamano @ 2016-06-24 20:58 ` Eric Sunshine 2016-06-24 21:09 ` Jeff King 2016-06-24 20:58 ` Jeff King 2 siblings, 1 reply; 61+ messages in thread From: Eric Sunshine @ 2016-06-24 20:58 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Git List, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 3:07 PM, Jeff King <peff@peff.net> wrote: > On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote: >> Jeff King <peff@peff.net> writes: >> > +tar_info () { >> > + "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1 >> > +} > >> Seeing an awk piped into cut always makes me want to suggest a >> single sed/awk/perl invocation. > > I want the auto-splitting of awk, but then to auto-split the result > using a different delimiter. Is there a not-painful way to do that in > awk? The awk split() function is POSIX and accepts an optional separator argument. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 20:58 ` Eric Sunshine @ 2016-06-24 21:09 ` Jeff King 0 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 21:09 UTC (permalink / raw) To: Eric Sunshine Cc: Junio C Hamano, Git List, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 04:58:58PM -0400, Eric Sunshine wrote: > On Fri, Jun 24, 2016 at 3:07 PM, Jeff King <peff@peff.net> wrote: > > On Fri, Jun 24, 2016 at 11:56:19AM -0700, Junio C Hamano wrote: > >> Jeff King <peff@peff.net> writes: > >> > +tar_info () { > >> > + "$TAR" tvf "$1" | awk '{print $3 " " $4}' | cut -d- -f1 > >> > +} > > > >> Seeing an awk piped into cut always makes me want to suggest a > >> single sed/awk/perl invocation. > > > > I want the auto-splitting of awk, but then to auto-split the result > > using a different delimiter. Is there a not-painful way to do that in > > awk? > > The awk split() function is POSIX and accepts an optional separator argument. Thanks. I'm not that familiar with awk functions, simply because I came of age after perl existed, and using perl tends to be more portable and powerful (if you can assume it's available). But this is simple enough that it should be OK. Replacing it with: "$TAR" tvf "$1" | awk '{ split($4, date, "-") print $3 " " date[1] }' seems to work. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 19:07 ` Jeff King 2016-06-24 19:44 ` Junio C Hamano 2016-06-24 20:58 ` Eric Sunshine @ 2016-06-24 20:58 ` Jeff King 2016-06-24 22:41 ` Junio C Hamano 2 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-24 20:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 03:07:44PM -0400, Jeff King wrote: > > "dd bs=1 count=4096" is hopefully more portable. > > Hmm. I always wonder whether dd is actually very portable, but we do use > it already, at least. > > Perhaps the perl monstrosity in t9300 could be replaced with that, too. Hrm. So I wrote a patch for t9300 for this. But I wanted to flip the order to: dd bs=4096 count=1 because otherwise, dd will call read() 4096 times, for 1 byte each. But it's not safe to do that on a pipe. For example: { echo 1 sleep 1 echo 2 } | dd bs=4 count=1 will copy only 2 bytes. So it's racily wrong, depending on how the writer feeds the data to write(). The 1-byte reads do work (assuming blocking descriptors and that dd restarts a read after a signal, which mine seems to). But yuck. The difference in time between the two is measurable on my system, but it's only a few milliseconds (for 4096 bytes). So maybe it's not worth worrying about (though as a general technique, it does make me worry that it's easy to get wrong in a way that will fail racily). -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 20:58 ` Jeff King @ 2016-06-24 22:41 ` Junio C Hamano 2016-06-24 23:22 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: Junio C Hamano @ 2016-06-24 22:41 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson Jeff King <peff@peff.net> writes: > The difference in time between the two is measurable on my system, but > it's only a few milliseconds (for 4096 bytes). So maybe it's not worth > worrying about (though as a general technique, it does make me worry > that it's easy to get wrong in a way that will fail racily). Yeah, GNU dd has iflag=fullblock, but if we assume GNU, we can safely assume "head -c", so I do not think of a way to do this portably enough. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 1/4] t5000: test tar files that overflow ustar headers 2016-06-24 22:41 ` Junio C Hamano @ 2016-06-24 23:22 ` Jeff King 0 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 23:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 03:41:47PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > The difference in time between the two is measurable on my system, but > > it's only a few milliseconds (for 4096 bytes). So maybe it's not worth > > worrying about (though as a general technique, it does make me worry > > that it's easy to get wrong in a way that will fail racily). > > Yeah, GNU dd has iflag=fullblock, but if we assume GNU, we can > safely assume "head -c", so I do not think of a way to do this > portably enough. Thinking on it more, "head -c" is _not_ what one would want in all cases. It would work here, but not in t9300, for instance, where the code is trying to read an exact number of bytes from a fifo. I don't think "head" makes any promises about buffering and may read extra bytes. So I dunno. "dd" generally does make such promises, or perhaps the perl sysread() solution in t9300 is not so bad. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB 2016-06-23 23:15 ` [PATCH v3] " Jeff King 2016-06-23 23:20 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King @ 2016-06-23 23:21 ` Jeff King 2016-06-24 19:01 ` Junio C Hamano 2016-06-23 23:21 ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King 2016-06-23 23:21 ` [PATCH v3 4/4] archive-tar: drop return value Jeff King 3 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-23 23:21 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson The ustar format has a fixed-length field for the size of each file entry which is supposed to contain up to 11 bytes of octal-formatted data plus a NUL or space terminator. These means that the largest size we can represent is 077777777777, or 1 byte short of 8GB. The correct solution for a larger file, according to POSIX.1-2001, is to add an extended pax header, similar to how we handle long filenames. This patch does that, and writes zero for the size field in the ustar header (the last bit is not mentioned by POSIX, but it matches how GNU tar behaves with --format=pax). This should be a strict improvement over the current behavior, which is to die in xsnprintf with a "BUG". However, there's some interesting history here. Prior to f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), we silently overflowed the "size" field. The extra bytes ended up in the "mtime" field of the header, which was then immediately written itself, overwriting our extra bytes. What that means depends on how many bytes we wrote. If the size was 64GB or greater, then we actually overflowed digits into the mtime field, meaning our value was was effectively right-shifted by those lost octal digits. And this patch is again a strict improvement over that. But if the size was between 8GB and 64GB, then our 12-byte field held all of the actual digits, and only our NUL terminator overflowed. According to POSIX, there should be a NUL or space at the end of the field. However, GNU tar seems to be lenient here, and will correctly parse a size up 64GB (minus one) from the field. So sizes in this range might have just worked, depending on the implementation reading the tarfile. This patch is mostly still an improvement there, as the 8GB limit is specifically mentioned in POSIX as the correct limit. But it's possible that it could be a regression (versus the pre-f2f0267 state) if all of the following are true: 1. You have a file between 8GB and 64GB. 2. Your tar implementation _doesn't_ know about pax extended headers. 3. Your tar implementation _does_ parse 12-byte sizes from the ustar header without a delimiter. It's probably not worth worrying about such an obscure set of conditions, but I'm documenting it here just in case. Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> --- archive-tar.c | 24 ++++++++++++++++++++++-- t/t5000-tar-tree.sh | 4 ++-- 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index cb99df2..274bdfa 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, strbuf_addch(sb, '\n'); } +/* + * Like strbuf_append_ext_header, but for numeric values. + */ +static void strbuf_append_ext_header_uint(struct strbuf *sb, + const char *keyword, + uintmax_t value) +{ + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ + int len; + + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); + strbuf_append_ext_header(sb, keyword, buf, len); +} + static unsigned int ustar_header_chksum(const struct ustar_header *header) { const unsigned char *p = (const unsigned char *)header; @@ -208,7 +222,7 @@ static int write_tar_entry(struct archiver_args *args, struct ustar_header header; struct strbuf ext_header = STRBUF_INIT; unsigned int old_mode = mode; - unsigned long size; + unsigned long size, size_in_header; void *buffer; int err = 0; @@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args, memcpy(header.linkname, buffer, size); } - prepare_header(args, &header, mode, size); + size_in_header = size; + if (S_ISREG(mode) && size > 077777777777UL) { + size_in_header = 0; + strbuf_append_ext_header_uint(&ext_header, "size", size); + } + + prepare_header(args, &header, mode, size_in_header); if (ext_header.len > 0) { err = write_extended_header(args, sha1, ext_header.buf, diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index e7c9271..79dbc88 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -359,7 +359,7 @@ test_expect_success BUNZIP 'set up repository with huge blob' ' # We expect git to die with SIGPIPE here (otherwise we # would generate the whole 64GB). -test_expect_failure BUNZIP 'generate tar with huge size' ' +test_expect_success BUNZIP 'generate tar with huge size' ' { git archive HEAD echo $? >exit-code @@ -368,7 +368,7 @@ test_expect_failure BUNZIP 'generate tar with huge size' ' test_cmp expect exit-code ' -test_expect_failure BUNZIP,TAR_HUGE 'system tar can read our huge size' ' +test_expect_success BUNZIP,TAR_HUGE 'system tar can read our huge size' ' echo 68719476737 >expect && tar_info huge.tar | cut -d" " -f1 >actual && test_cmp expect actual -- 2.9.0.217.g096ca68 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB 2016-06-23 23:21 ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King @ 2016-06-24 19:01 ` Junio C Hamano 2016-06-24 19:10 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: Junio C Hamano @ 2016-06-24 19:01 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson Jeff King <peff@peff.net> writes: > If the size was 64GB or greater, then we actually overflowed > digits into the mtime field, meaning our value was was was was > effectively right-shifted by those lost octal digits. And > this patch is again a strict improvement over that. > diff --git a/archive-tar.c b/archive-tar.c > index cb99df2..274bdfa 100644 > --- a/archive-tar.c > +++ b/archive-tar.c > @@ -137,6 +137,20 @@ static void strbuf_append_ext_header(struct strbuf *sb, const char *keyword, > strbuf_addch(sb, '\n'); > } > > +/* > + * Like strbuf_append_ext_header, but for numeric values. > + */ > +static void strbuf_append_ext_header_uint(struct strbuf *sb, > + const char *keyword, > + uintmax_t value) > +{ > + char buf[40]; /* big enough for 2^128 in decimal, plus NUL */ > + int len; > + > + len = xsnprintf(buf, sizeof(buf), "%"PRIuMAX, value); > + strbuf_append_ext_header(sb, keyword, buf, len); > +} > + > static unsigned int ustar_header_chksum(const struct ustar_header *header) > { > const unsigned char *p = (const unsigned char *)header; > @@ -208,7 +222,7 @@ static int write_tar_entry(struct archiver_args *args, > struct ustar_header header; > struct strbuf ext_header = STRBUF_INIT; > unsigned int old_mode = mode; > - unsigned long size; > + unsigned long size, size_in_header; > void *buffer; > int err = 0; > > @@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args, > memcpy(header.linkname, buffer, size); > } > > - prepare_header(args, &header, mode, size); > + size_in_header = size; > + if (S_ISREG(mode) && size > 077777777777UL) { Want a symbolic constant with a comment that says why you have eleven 7's? > + size_in_header = 0; > + strbuf_append_ext_header_uint(&ext_header, "size", size); > + } > + > + prepare_header(args, &header, mode, size_in_header); The change itself makes sense. Thanks. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB 2016-06-24 19:01 ` Junio C Hamano @ 2016-06-24 19:10 ` Jeff King 2016-06-24 19:45 ` Junio C Hamano 0 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-24 19:10 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 12:01:16PM -0700, Junio C Hamano wrote: > > @@ -267,7 +281,13 @@ static int write_tar_entry(struct archiver_args *args, > > memcpy(header.linkname, buffer, size); > > } > > > > - prepare_header(args, &header, mode, size); > > + size_in_header = size; > > + if (S_ISREG(mode) && size > 077777777777UL) { > > Want a symbolic constant with a comment that says why you have > eleven 7's? I tried instead to make sure we only mention it once to avoid a symbolic constant (even though the same constant appears in the next patch, too, it would be a mistake to give them the same name; they just happen to be the same size). So if anything, I would put a comment here, explaining that ustar cannot handle anything larger than this, and POSIX mandates it (but I didn't because the commit message already goes into much more detail). -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB 2016-06-24 19:10 ` Jeff King @ 2016-06-24 19:45 ` Junio C Hamano 2016-06-24 19:46 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: Junio C Hamano @ 2016-06-24 19:45 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson Jeff King <peff@peff.net> writes: > So if anything, I would put a comment here, explaining that ustar cannot > handle anything larger than this, and POSIX mandates it (but I didn't > because the commit message already goes into much more detail). That sounds like a good thing to do. Not everybody runs "blame" to find the commit whose log message she must read before touching a line. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB 2016-06-24 19:45 ` Junio C Hamano @ 2016-06-24 19:46 ` Jeff King 0 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 19:46 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 12:45:05PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > So if anything, I would put a comment here, explaining that ustar cannot > > handle anything larger than this, and POSIX mandates it (but I didn't > > because the commit message already goes into much more detail). > > That sounds like a good thing to do. Not everybody runs "blame" to > find the commit whose log message she must read before touching a > line. Well, they should. My commit messages are works of literature. ;) I'll add a comment. -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime 2016-06-23 23:15 ` [PATCH v3] " Jeff King 2016-06-23 23:20 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King 2016-06-23 23:21 ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King @ 2016-06-23 23:21 ` Jeff King 2016-06-24 19:06 ` Junio C Hamano 2016-06-23 23:21 ` [PATCH v3 4/4] archive-tar: drop return value Jeff King 3 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-23 23:21 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson The ustar format represents timestamps as seconds since the epoch, but only has room to store 11 octal digits. To express anything larger, we need to use an extended header. This is exactly the same case we fixed for the size field in the previous commit, and the solution here follows the same pattern. This is even mentioned as an issue in f2f0267 (archive-tar: use xsnprintf for trivial formatting, 2015-09-24), but since it only affected things far in the future, it wasn't deemed worth dealing with. But note that my calculations claiming thousands of years were off there; because our xsnprintf produces a NUL byte, we only have until the year 2242 to fix this. Given that this is just around the corner (geologically speaking, anyway), and because it's easy to fix, let's just make it work. Unlike the previous fix for "size", where we had to write an individual extended header for each file, we can write one global header (since we have only one mtime for the whole archive). There's a slight bit of trickiness there. We may already be writing a global header with a "comment" field for the commit sha1. So we need to write our new field into the same header. To do this, we push the decision of whether to write such a header down into write_global_extended_header(), which will now assemble the header as it sees fit, and will return early if we have nothing to write (in practice, we'll only have a large mtime if it comes from a commit, but this makes it also work if you set your system clock ahead such that time() returns a huge value). Note that we don't (and never did) handle negative timestamps (i.e., before 1970). This would probably not be too hard to support in the same way, but since git does not support negative timestamps at all, I didn't bother here. After writing the extended header, we munge the timestamp in the ustar headers to the maximum-allowable size. This is wrong, but it's the least-wrong thing we can provide to a tar implementation that doesn't understand pax headers (it's also what GNU tar does). Helped-by: René Scharfe <l.s.r@web.de> Signed-off-by: Jeff King <peff@peff.net> --- archive-tar.c | 16 +++++++++++++--- t/t5000-tar-tree.sh | 4 ++-- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index 274bdfa..0bb164c 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -317,7 +317,18 @@ static int write_global_extended_header(struct archiver_args *args) unsigned int mode; int err = 0; - strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40); + if (sha1) + strbuf_append_ext_header(&ext_header, "comment", + sha1_to_hex(sha1), 40); + if (args->time > 077777777777UL) { + strbuf_append_ext_header_uint(&ext_header, "mtime", + args->time); + args->time = 077777777777UL; + } + + if (!ext_header.len) + return 0; + memset(&header, 0, sizeof(header)); *header.typeflag = TYPEFLAG_GLOBAL_HEADER; mode = 0100666; @@ -402,8 +413,7 @@ static int write_tar_archive(const struct archiver *ar, { int err = 0; - if (args->commit_sha1) - err = write_global_extended_header(args); + err = write_global_extended_header(args); if (!err) err = write_archive_entries(args, write_tar_entry); if (!err) diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 79dbc88..07e0bdc 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -382,11 +382,11 @@ test_expect_success 'set up repository with far-future commit' ' git commit -m "tempori parendum" ' -test_expect_failure 'generate tar with future mtime' ' +test_expect_success 'generate tar with future mtime' ' git archive HEAD >future.tar ' -test_expect_failure TAR_HUGE 'tar can encode dates far in future' ' +test_expect_success TAR_HUGE 'tar can encode dates far in future' ' echo 4147 >expect && tar_info future.tar | cut -d" " -f2 >actual && test_cmp expect actual -- 2.9.0.217.g096ca68 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime 2016-06-23 23:21 ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King @ 2016-06-24 19:06 ` Junio C Hamano 2016-06-24 19:16 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: Junio C Hamano @ 2016-06-24 19:06 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe, Robin H. Johnson Jeff King <peff@peff.net> writes: > There's a slight bit of trickiness there. We may already be > ... > After writing the extended header, we munge the timestamp in > the ustar headers to the maximum-allowable size. This is > wrong, but it's the least-wrong thing we can provide to a > tar implementation that doesn't understand pax headers (it's > also what GNU tar does). The above looks very sensible design, and its implementation is surprisingly compact. Very nicely done. > Helped-by: René Scharfe <l.s.r@web.de> > Signed-off-by: Jeff King <peff@peff.net> > --- > archive-tar.c | 16 +++++++++++++--- > t/t5000-tar-tree.sh | 4 ++-- > 2 files changed, 15 insertions(+), 5 deletions(-) > > diff --git a/archive-tar.c b/archive-tar.c > index 274bdfa..0bb164c 100644 > --- a/archive-tar.c > +++ b/archive-tar.c > @@ -317,7 +317,18 @@ static int write_global_extended_header(struct archiver_args *args) > unsigned int mode; > int err = 0; > > - strbuf_append_ext_header(&ext_header, "comment", sha1_to_hex(sha1), 40); > + if (sha1) > + strbuf_append_ext_header(&ext_header, "comment", > + sha1_to_hex(sha1), 40); > + if (args->time > 077777777777UL) { > + strbuf_append_ext_header_uint(&ext_header, "mtime", > + args->time); > + args->time = 077777777777UL; > + } > + > + if (!ext_header.len) > + return 0; Another symbolic constant to explain this, e.g. TAR_TIME_LIMIT, may want to exist. Thanks. ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime 2016-06-24 19:06 ` Junio C Hamano @ 2016-06-24 19:16 ` Jeff King 0 siblings, 0 replies; 61+ messages in thread From: Jeff King @ 2016-06-24 19:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 12:06:59PM -0700, Junio C Hamano wrote: > > + if (args->time > 077777777777UL) { > > + strbuf_append_ext_header_uint(&ext_header, "mtime", > > + args->time); > > + args->time = 077777777777UL; > > + } > > + > > + if (!ext_header.len) > > + return 0; > > Another symbolic constant to explain this, e.g. TAR_TIME_LIMIT, may > want to exist. This one at least appears twice. I think one of the reasons I am slightly resistant to a symbolic constant is that it tempts people to think that it's OK to change it. It's not. These values are mandated by POSIX, and must match the size of the ustar header field. So the least-repetitive thing would be to define it as: (1UL << (1 + (3 * (sizeof(ustar_header.mtime) - 1)))) - 1 That's pretty horrible to read, but if wrapped in a symbolic constant, at least people would think twice before touching it. ;) -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* [PATCH v3 4/4] archive-tar: drop return value 2016-06-23 23:15 ` [PATCH v3] " Jeff King ` (2 preceding siblings ...) 2016-06-23 23:21 ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King @ 2016-06-23 23:21 ` Jeff King 2016-06-24 11:49 ` Remi Galan Alfonso 3 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-23 23:21 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, René Scharfe, Robin H. Johnson We never do any error checks, and so never return anything but "0". Let's just drop this to simplify the code. Signed-off-by: Jeff King <peff@peff.net> --- I wasn't sure if this was perhaps kept as an interface decision, in case the function grew errors later on. If so, it can still drop the "err" variable internally. :) archive-tar.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index 0bb164c..903b74d 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -309,13 +309,12 @@ static int write_tar_entry(struct archiver_args *args, return err; } -static int write_global_extended_header(struct archiver_args *args) +static void write_global_extended_header(struct archiver_args *args) { const unsigned char *sha1 = args->commit_sha1; struct strbuf ext_header = STRBUF_INIT; struct ustar_header header; unsigned int mode; - int err = 0; if (sha1) strbuf_append_ext_header(&ext_header, "comment", @@ -327,7 +326,7 @@ static int write_global_extended_header(struct archiver_args *args) } if (!ext_header.len) - return 0; + return; memset(&header, 0, sizeof(header)); *header.typeflag = TYPEFLAG_GLOBAL_HEADER; @@ -337,7 +336,6 @@ static int write_global_extended_header(struct archiver_args *args) write_blocked(&header, sizeof(header)); write_blocked(ext_header.buf, ext_header.len); strbuf_release(&ext_header); - return err; } static struct archiver **tar_filters; @@ -413,7 +411,7 @@ static int write_tar_archive(const struct archiver *ar, { int err = 0; - err = write_global_extended_header(args); + write_global_extended_header(args); if (!err) err = write_archive_entries(args, write_tar_entry); if (!err) -- 2.9.0.217.g096ca68 ^ permalink raw reply related [flat|nested] 61+ messages in thread
* Re: [PATCH v3 4/4] archive-tar: drop return value 2016-06-23 23:21 ` [PATCH v3 4/4] archive-tar: drop return value Jeff King @ 2016-06-24 11:49 ` Remi Galan Alfonso 2016-06-24 13:13 ` Jeff King 0 siblings, 1 reply; 61+ messages in thread From: Remi Galan Alfonso @ 2016-06-24 11:49 UTC (permalink / raw) To: Jeff King; +Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson Hi Peff, Jeff King <peff@peff.net> writes: > @@ -413,7 +411,7 @@ static int write_tar_archive(const struct archiver *ar, > { > int err = 0; > > - err = write_global_extended_header(args); > + write_global_extended_header(args); > if (!err) > err = write_archive_entries(args, write_tar_entry); If we drop the error code from 'write_global_extended_header' then the above 'if (!err)' becomes useless (always evaluates to 'true' since 'err' is set to '0'). > if (!err) Thanks, Rémi ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 4/4] archive-tar: drop return value 2016-06-24 11:49 ` Remi Galan Alfonso @ 2016-06-24 13:13 ` Jeff King 2016-06-24 19:10 ` Junio C Hamano 0 siblings, 1 reply; 61+ messages in thread From: Jeff King @ 2016-06-24 13:13 UTC (permalink / raw) To: Remi Galan Alfonso Cc: git, Junio C Hamano, René Scharfe, Robin H. Johnson On Fri, Jun 24, 2016 at 01:49:24PM +0200, Remi Galan Alfonso wrote: > Hi Peff, > > Jeff King <peff@peff.net> writes: > > @@ -413,7 +411,7 @@ static int write_tar_archive(const struct archiver *ar, > > { > > int err = 0; > > > > - err = write_global_extended_header(args); > > + write_global_extended_header(args); > > if (!err) > > err = write_archive_entries(args, write_tar_entry); > > If we drop the error code from 'write_global_extended_header' then the > above 'if (!err)' becomes useless (always evaluates to 'true' since > 'err' is set to '0'). Thanks, you're right. I wondered if we could drop "err" entirely, but write_archive_entries() does indeed have some error code paths (everybody uses write_or_die, but we return an error for things like unknown file types). -Peff ^ permalink raw reply [flat|nested] 61+ messages in thread
* Re: [PATCH v3 4/4] archive-tar: drop return value 2016-06-24 13:13 ` Jeff King @ 2016-06-24 19:10 ` Junio C Hamano 0 siblings, 0 replies; 61+ messages in thread From: Junio C Hamano @ 2016-06-24 19:10 UTC (permalink / raw) To: Jeff King; +Cc: Remi Galan Alfonso, git, René Scharfe, Robin H. Johnson Jeff King <peff@peff.net> writes: > On Fri, Jun 24, 2016 at 01:49:24PM +0200, Remi Galan Alfonso wrote: > >> Hi Peff, >> >> Jeff King <peff@peff.net> writes: >> > @@ -413,7 +411,7 @@ static int write_tar_archive(const struct archiver *ar, >> > { >> > int err = 0; >> > >> > - err = write_global_extended_header(args); >> > + write_global_extended_header(args); >> > if (!err) >> > err = write_archive_entries(args, write_tar_entry); >> >> If we drop the error code from 'write_global_extended_header' then the >> above 'if (!err)' becomes useless (always evaluates to 'true' since >> 'err' is set to '0'). > > Thanks, you're right. > > I wondered if we could drop "err" entirely, but write_archive_entries() > does indeed have some error code paths (everybody uses write_or_die, but > we return an error for things like unknown file types). You could if you did this ;-) Which I do not necessarily think is a good idea. This may be easier to read write_global_extended_header(args) err = write_archive_entries(args, write_tar_entry); if (!err) write_trailer(); return err; though. archive-tar.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index 903b74d..eed2c96 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -412,11 +412,9 @@ static int write_tar_archive(const struct archiver *ar, int err = 0; write_global_extended_header(args); - if (!err) - err = write_archive_entries(args, write_tar_entry); - if (!err) - write_trailer(); - return err; + + return (write_archive_entries(args, write_tar_entry) || + write_trailer()); } static int write_tar_filter_archive(const struct archiver *ar, ^ permalink raw reply related [flat|nested] 61+ messages in thread
end of thread, other threads:[~2016-06-24 23:22 UTC | newest] Thread overview: 61+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-06-16 4:35 [PATCH 0/2] friendlier handling of overflows in archive-tar Jeff King 2016-06-16 4:37 ` [PATCH 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King 2016-06-20 22:54 ` René Scharfe 2016-06-21 15:59 ` Jeff King 2016-06-21 16:02 ` Jeff King 2016-06-21 20:42 ` René Scharfe 2016-06-21 20:57 ` René Scharfe 2016-06-21 21:04 ` Jeff King 2016-06-22 5:46 ` René Scharfe 2016-06-21 21:02 ` Jeff King 2016-06-22 5:46 ` René Scharfe 2016-06-23 19:21 ` Jeff King 2016-06-21 20:54 ` René Scharfe 2016-06-21 19:44 ` Robin H. Johnson 2016-06-21 20:57 ` Jeff King 2016-06-16 4:37 ` [PATCH 2/2] archive-tar: write extended headers for far-future mtime Jeff King 2016-06-20 22:54 ` René Scharfe 2016-06-22 5:46 ` René Scharfe 2016-06-23 19:22 ` Jeff King 2016-06-23 21:38 ` René Scharfe 2016-06-23 21:39 ` Jeff King 2016-06-16 17:55 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano 2016-06-21 16:16 ` Jeff King 2016-06-21 16:16 ` [PATCH v2 1/2] archive-tar: write extended headers for file sizes >= 8GB Jeff King 2016-06-21 16:17 ` [PATCH v2 2/2] archive-tar: write extended headers for far-future mtime Jeff King 2016-06-21 18:43 ` [PATCH 0/2] friendlier handling of overflows in archive-tar Junio C Hamano 2016-06-23 23:15 ` [PATCH v3] " Jeff King 2016-06-23 23:20 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Jeff King 2016-06-23 23:31 ` Jeff King 2016-06-24 16:38 ` Johannes Sixt 2016-06-24 16:46 ` Jeff King 2016-06-24 17:05 ` Johannes Sixt 2016-06-24 19:39 ` [PATCH 0/4] portable signal-checking in tests Jeff King 2016-06-24 19:43 ` [PATCH 1/4] tests: factor portable signal check out of t0005 Jeff King 2016-06-24 20:52 ` Johannes Sixt 2016-06-24 21:05 ` Jeff King 2016-06-24 21:32 ` Johannes Sixt 2016-06-24 19:44 ` [PATCH 2/4] t0005: use test_match_signal as appropriate Jeff King 2016-06-24 19:45 ` [PATCH 3/4] test_must_fail: use test_match_signal Jeff King 2016-06-24 19:45 ` [PATCH 4/4] t/lib-git-daemon: " Jeff King 2016-06-24 19:48 ` [PATCH 0/4] portable signal-checking in tests Jeff King 2016-06-24 18:56 ` [PATCH v3 1/4] t5000: test tar files that overflow ustar headers Junio C Hamano 2016-06-24 19:07 ` Jeff King 2016-06-24 19:44 ` Junio C Hamano 2016-06-24 20:58 ` Eric Sunshine 2016-06-24 21:09 ` Jeff King 2016-06-24 20:58 ` Jeff King 2016-06-24 22:41 ` Junio C Hamano 2016-06-24 23:22 ` Jeff King 2016-06-23 23:21 ` [PATCH v3 2/4] archive-tar: write extended headers for file sizes >= 8GB Jeff King 2016-06-24 19:01 ` Junio C Hamano 2016-06-24 19:10 ` Jeff King 2016-06-24 19:45 ` Junio C Hamano 2016-06-24 19:46 ` Jeff King 2016-06-23 23:21 ` [PATCH v3 3/4] archive-tar: write extended headers for far-future mtime Jeff King 2016-06-24 19:06 ` Junio C Hamano 2016-06-24 19:16 ` Jeff King 2016-06-23 23:21 ` [PATCH v3 4/4] archive-tar: drop return value Jeff King 2016-06-24 11:49 ` Remi Galan Alfonso 2016-06-24 13:13 ` Jeff King 2016-06-24 19:10 ` Junio C Hamano
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.