* [PATCH 1/2] archive: factor out write phase of tar format @ 2011-06-14 18:17 Jeff King 2011-06-14 18:18 ` [PATCH 2/2] archive: support gzipped tar files Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-14 18:17 UTC (permalink / raw) To: git; +Cc: René Scharfe, git-dev The code to output the tar format for git-archive always assumes we are writing directly to stdout. Let's factor out that bit of code so that we can put an in-process gzip filter in place. Signed-off-by: Jeff King <peff@peff.net> --- archive-tar.c | 22 +++++++++++++++++----- 1 files changed, 17 insertions(+), 5 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index cee06ce..b1aea87 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -10,6 +10,7 @@ static char block[BLOCKSIZE]; static unsigned long offset; +static void (*output)(const char *buf, unsigned long size); static int tar_umask = 002; @@ -17,7 +18,7 @@ static int tar_umask = 002; static void write_if_needed(void) { if (offset == BLOCKSIZE) { - write_or_die(1, block, BLOCKSIZE); + output(block, BLOCKSIZE); offset = 0; } } @@ -42,7 +43,7 @@ static void write_blocked(const void *data, unsigned long size) write_if_needed(); } while (size >= BLOCKSIZE) { - write_or_die(1, buf, BLOCKSIZE); + output(buf, BLOCKSIZE); size -= BLOCKSIZE; buf += BLOCKSIZE; } @@ -66,10 +67,10 @@ static void write_trailer(void) { int tail = BLOCKSIZE - offset; memset(block + offset, 0, tail); - write_or_die(1, block, BLOCKSIZE); + output(block, BLOCKSIZE); if (tail < 2 * RECORDSIZE) { memset(block, 0, offset); - write_or_die(1, block, BLOCKSIZE); + output(block, BLOCKSIZE); } } @@ -234,7 +235,7 @@ static int git_tar_config(const char *var, const char *value, void *cb) return git_default_config(var, value, cb); } -int write_tar_archive(struct archiver_args *args) +static int write_tar_archive_internal(struct archiver_args *args) { int err = 0; @@ -248,3 +249,14 @@ int write_tar_archive(struct archiver_args *args) write_trailer(); return err; } + +static void output_write(const char *buf, unsigned long len) +{ + write_or_die(1, buf, len); +} + +int write_tar_archive(struct archiver_args *args) +{ + output = output_write; + return write_tar_archive_internal(args); +} -- 1.7.6.rc1.37.g6d4ed.dirty ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 2/2] archive: support gzipped tar files 2011-06-14 18:17 [PATCH 1/2] archive: factor out write phase of tar format Jeff King @ 2011-06-14 18:18 ` Jeff King 2011-06-14 19:25 ` J.H. ` (2 more replies) 0 siblings, 3 replies; 56+ messages in thread From: Jeff King @ 2011-06-14 18:18 UTC (permalink / raw) To: git; +Cc: René Scharfe, git-dev git-archive already supports the creation of tar files. For local cases, one can simply pipe the output to gzip, and having git-archive do the gzip is a minor convenience. However, when running git-archive against a remote site, having the remote side do the compression can save considerable bandwidth. Service providers could always wrap git-archive to provide that functionality, but this makes it much simpler. Creating gzipped archives is of course more expensive than regular tar archives; however, the amount of work should be comparable to that of creating a zip file, which is already possible. So there should be no new security implications with respect to creating load on a remote server. Signed-off-by: Jeff King <peff@peff.net> --- Documentation/git-archive.txt | 17 +++++++++++++++-- archive-tar.c | 27 +++++++++++++++++++++++++++ archive.c | 1 + archive.h | 1 + builtin/archive.c | 6 ++++++ t/t5000-tar-tree.sh | 26 ++++++++++++++++++++++++++ 6 files changed, 76 insertions(+), 2 deletions(-) diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt index 9c750e2..963bec4 100644 --- a/Documentation/git-archive.txt +++ b/Documentation/git-archive.txt @@ -34,10 +34,11 @@ OPTIONS ------- --format=<fmt>:: - Format of the resulting archive: 'tar' or 'zip'. If this option + Format of the resulting archive: 'tar', 'tgz', or 'zip'. If this option is not given, and the output file is specified, the format is inferred from the filename if possible (e.g. writing to "foo.zip" - makes the output to be in the zip format). Otherwise the output + creates the output in the zip format; "foo.tgz" or "foo.tar.gz" + creates the output in the tgz format). Otherwise the output format is `tar`. -l:: @@ -89,6 +90,12 @@ zip Highest and slowest compression level. You can specify any number from 1 to 9 to adjust compression speed and ratio. +tgz +~~~ +-9:: + Highest and slowest compression level. You can specify any + number from 1 to 9 to adjust compression speed and ratio. + CONFIGURATION ------------- @@ -133,6 +140,12 @@ git archive --format=tar --prefix=git-1.4.0/ v1.4.0 | gzip >git-1.4.0.tar.gz:: Create a compressed tarball for v1.4.0 release. +git archive --prefix=git-1.4.0/ -o git-1.4.0.tar.gz v1.4.0 + + Same as above, except that we use the internal gzip. Note that + the output format is inferred by the extension of the output + file. + git archive --format=tar --prefix=git-1.4.0/ v1.4.0{caret}\{tree\} | gzip >git-1.4.0.tar.gz:: Create a compressed tarball for v1.4.0 release, but without a diff --git a/archive-tar.c b/archive-tar.c index b1aea87..86c8aa9 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -260,3 +260,30 @@ int write_tar_archive(struct archiver_args *args) output = output_write; return write_tar_archive_internal(args); } + +static gzFile gz_file; +static void output_gz(const char *buf, unsigned long len) +{ + if (!gzwrite(gz_file, buf, len)) + die("unable to write compressed stream: %s", + gzerror(gz_file, NULL)); +} + +int write_tgz_archive(struct archiver_args *args) +{ + int r; + + gz_file = gzdopen(1, "w"); + if (!gz_file) + die_errno("unable to open compressed stream"); + gzsetparams(gz_file, args->compression_level, Z_DEFAULT_STRATEGY); + + output = output_gz; + r = write_tar_archive_internal(args); + if (r == 0) { + int zerr = gzclose(gz_file); + if (zerr != Z_OK) + die("unable to write compressed stream (err=%d)", zerr); + } + return r; +} diff --git a/archive.c b/archive.c index 42f2d2f..6073a8d 100644 --- a/archive.c +++ b/archive.c @@ -23,6 +23,7 @@ static const struct archiver { } archivers[] = { { "tar", write_tar_archive }, { "zip", write_zip_archive, USES_ZLIB_COMPRESSION }, + { "tgz", write_tgz_archive, USES_ZLIB_COMPRESSION }, }; static void format_subst(const struct commit *commit, diff --git a/archive.h b/archive.h index 038ac35..c1bf72e 100644 --- a/archive.h +++ b/archive.h @@ -23,6 +23,7 @@ typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsign */ extern int write_tar_archive(struct archiver_args *); extern int write_zip_archive(struct archiver_args *); +extern int write_tgz_archive(struct archiver_args *); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix); diff --git a/builtin/archive.c b/builtin/archive.c index b14eaba..4f60af5 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -71,6 +71,12 @@ static const char *format_from_name(const char *filename) ext++; if (!strcasecmp(ext, "zip")) return "--format=zip"; + if (!strcasecmp(ext, "tgz")) + return "--format=tgz"; + if (!strcasecmp(ext, "gz") && + ext - 4 >= filename && + !strcasecmp(ext - 4, "tar.gz")) + return "--format=tgz"; return NULL; } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index cff1b3e..faf2784 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -26,6 +26,7 @@ commit id embedding: . ./test-lib.sh UNZIP=${UNZIP:-unzip} +GUNZIP=${GUNZIP:-gunzip} SUBSTFORMAT=%H%n @@ -252,4 +253,29 @@ test_expect_success 'git-archive --prefix=olde-' ' test -f h/olde-a/bin/sh ' +test_expect_success 'git archive --format=tgz' ' + git archive --format=tgz HEAD >e.tgz +' + +test_expect_success 'infer tgz from .tgz filename' ' + git archive --output=e1.tgz HEAD && + test_cmp e.tgz e1.tgz +' + +test_expect_success 'infer tgz from .tar.gz filename' ' + git archive --output=e2.tar.gz HEAD && + test_cmp e.tgz e2.tar.gz +' + +if $GUNZIP --version >/dev/null 2>&1; then + test_set_prereq GUNZIP +else + say "Skipping tgz tests because gunzip was not found" +fi + +test_expect_success GUNZIP 'extract tgz file' ' + gunzip -c <e.tgz >e.tar && + test_cmp b.tar e.tar +' + test_done -- 1.7.6.rc1.37.g6d4ed.dirty ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 18:18 ` [PATCH 2/2] archive: support gzipped tar files Jeff King @ 2011-06-14 19:25 ` J.H. 2011-06-14 19:30 ` Jeff King 2011-06-14 19:39 ` René Scharfe 2011-06-14 20:30 ` [PATCH 2/2] archive: support gzipped tar files Junio C Hamano 2 siblings, 1 reply; 56+ messages in thread From: J.H. @ 2011-06-14 19:25 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe, git-dev On 06/14/2011 11:18 AM, Jeff King wrote: > git-archive already supports the creation of tar files. For > local cases, one can simply pipe the output to gzip, and > having git-archive do the gzip is a minor convenience. > > However, when running git-archive against a remote site, > having the remote side do the compression can save > considerable bandwidth. Service providers could always wrap > git-archive to provide that functionality, but this makes it > much simpler. > > Creating gzipped archives is of course more expensive than > regular tar archives; however, the amount of work should be > comparable to that of creating a zip file, which is already > possible. So there should be no new security implications > with respect to creating load on a remote server. Would it make sense to make this a little more generic and support bz2 and xz as well? - John 'Warthog9' Hawley ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 19:25 ` J.H. @ 2011-06-14 19:30 ` Jeff King 0 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-14 19:30 UTC (permalink / raw) To: J.H.; +Cc: git, René Scharfe, git-dev On Tue, Jun 14, 2011 at 12:25:17PM -0700, J.H. wrote: > On 06/14/2011 11:18 AM, Jeff King wrote: > > git-archive already supports the creation of tar files. For > > local cases, one can simply pipe the output to gzip, and > > having git-archive do the gzip is a minor convenience. > > > > However, when running git-archive against a remote site, > > having the remote side do the compression can save > > considerable bandwidth. Service providers could always wrap > > git-archive to provide that functionality, but this makes it > > much simpler. > > > > Creating gzipped archives is of course more expensive than > > regular tar archives; however, the amount of work should be > > comparable to that of creating a zip file, which is already > > possible. So there should be no new security implications > > with respect to creating load on a remote server. > > Would it make sense to make this a little more generic and support bz2 > and xz as well? I think it's a great idea if somebody wants to do it on top of my patch. They should be able to hook into the tar code just like I did in 2/2. But they will need library support that we don't already have in git. Doing gz was easy because we already require zlib. We could also support them by piping to an external compressor, which wouldn't be too hard (you could do it for gzip, too, but given that we have zlib, this was much simpler). There is a slight hitch with the "--list" command, though. Should git-archive advertise these formats, and if so, how should it know that they are available? -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 18:18 ` [PATCH 2/2] archive: support gzipped tar files Jeff King 2011-06-14 19:25 ` J.H. @ 2011-06-14 19:39 ` René Scharfe 2011-06-14 20:14 ` Jeff King 2011-06-14 20:30 ` [PATCH 2/2] archive: support gzipped tar files Junio C Hamano 2 siblings, 1 reply; 56+ messages in thread From: René Scharfe @ 2011-06-14 19:39 UTC (permalink / raw) To: Jeff King; +Cc: git, git-dev Am 14.06.2011 20:18, schrieb Jeff King: > git-archive already supports the creation of tar files. For > local cases, one can simply pipe the output to gzip, and > having git-archive do the gzip is a minor convenience. > > However, when running git-archive against a remote site, > having the remote side do the compression can save > considerable bandwidth. Service providers could always wrap > git-archive to provide that functionality, but this makes it > much simpler. That's a good point and one that was overlooked when this topic came up earlier (see http://kerneltrap.org/mailarchive/git/2009/9/10/11507 and http://kerneltrap.org/mailarchive/git/2009/9/11/11577). That implementation was ... heavier than yours, but it also avoided an unnecessary level of buffering. I wonder if it makes a measurable difference, though. > Creating gzipped archives is of course more expensive than > regular tar archives; however, the amount of work should be > comparable to that of creating a zip file, which is already > possible. So there should be no new security implications > with respect to creating load on a remote server. > > Signed-off-by: Jeff King <peff@peff.net> > --- > Documentation/git-archive.txt | 17 +++++++++++++++-- > archive-tar.c | 27 +++++++++++++++++++++++++++ > archive.c | 1 + > archive.h | 1 + > builtin/archive.c | 6 ++++++ > t/t5000-tar-tree.sh | 26 ++++++++++++++++++++++++++ > 6 files changed, 76 insertions(+), 2 deletions(-) > > diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt > index 9c750e2..963bec4 100644 > --- a/Documentation/git-archive.txt > +++ b/Documentation/git-archive.txt > @@ -34,10 +34,11 @@ OPTIONS > ------- > > --format=<fmt>:: > - Format of the resulting archive: 'tar' or 'zip'. If this option > + Format of the resulting archive: 'tar', 'tgz', or 'zip'. If this option > is not given, and the output file is specified, the format is > inferred from the filename if possible (e.g. writing to "foo.zip" > - makes the output to be in the zip format). Otherwise the output > + creates the output in the zip format; "foo.tgz" or "foo.tar.gz" > + creates the output in the tgz format). Otherwise the output > format is `tar`. > > -l:: > @@ -89,6 +90,12 @@ zip > Highest and slowest compression level. You can specify any > number from 1 to 9 to adjust compression speed and ratio. > > +tgz > +~~~ > +-9:: > + Highest and slowest compression level. You can specify any > + number from 1 to 9 to adjust compression speed and ratio. > + > > CONFIGURATION > ------------- > @@ -133,6 +140,12 @@ git archive --format=tar --prefix=git-1.4.0/ v1.4.0 | gzip >git-1.4.0.tar.gz:: > > Create a compressed tarball for v1.4.0 release. > > +git archive --prefix=git-1.4.0/ -o git-1.4.0.tar.gz v1.4.0 > + > + Same as above, except that we use the internal gzip. Note that > + the output format is inferred by the extension of the output > + file. > + > git archive --format=tar --prefix=git-1.4.0/ v1.4.0{caret}\{tree\} | gzip >git-1.4.0.tar.gz:: > > Create a compressed tarball for v1.4.0 release, but without a > diff --git a/archive-tar.c b/archive-tar.c > index b1aea87..86c8aa9 100644 > --- a/archive-tar.c > +++ b/archive-tar.c > @@ -260,3 +260,30 @@ int write_tar_archive(struct archiver_args *args) > output = output_write; > return write_tar_archive_internal(args); > } > + > +static gzFile gz_file; > +static void output_gz(const char *buf, unsigned long len) > +{ > + if (!gzwrite(gz_file, buf, len)) > + die("unable to write compressed stream: %s", > + gzerror(gz_file, NULL)); > +} Does this do the right things when faced with interrupted writes or truncated pipes? I ask because the earlier attempt had a gzwrite_or_die() which did that, but I don't know anymore if that is strictly needed. Oh, and bridging the gap between unsigned long and int was certainly another reason for the existence of this function. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 19:39 ` René Scharfe @ 2011-06-14 20:14 ` Jeff King 2011-06-14 20:45 ` Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-14 20:14 UTC (permalink / raw) To: René Scharfe; +Cc: git, git-dev On Tue, Jun 14, 2011 at 09:39:55PM +0200, René Scharfe wrote: > > However, when running git-archive against a remote site, > > having the remote side do the compression can save > > considerable bandwidth. Service providers could always wrap > > git-archive to provide that functionality, but this makes it > > much simpler. > > That's a good point and one that was overlooked when this topic came up > earlier (see http://kerneltrap.org/mailarchive/git/2009/9/10/11507 and > http://kerneltrap.org/mailarchive/git/2009/9/11/11577). Hmph, I should have done my homework better. I totally missed that thread. Yeah, I am unsurprised that doing it in a single process is actually slower. I do think because of the remote issue that we should provide something like this. But we could implement it by piping to an external gzip. That would make us just slightly less portable, but would give us the multi-processor speedup, or even allow using something like pigz. > > +static void output_gz(const char *buf, unsigned long len) > > +{ > > + if (!gzwrite(gz_file, buf, len)) > > + die("unable to write compressed stream: %s", > > + gzerror(gz_file, NULL)); > > +} > > Does this do the right things when faced with interrupted writes or > truncated pipes? I ask because the earlier attempt had a > gzwrite_or_die() which did that, but I don't know anymore if that is > strictly needed. No, I blindly assumed that gzwrite was a little bit smart, but looking at the zlib code, it really is just propagating whatever it got from fwrite. I need to handle both errors and short writes myself. So we do need gzwrite_or_die. > Oh, and bridging the gap between unsigned long and int > was certainly another reason for the existence of this function. Ugh. I correctly saw that it took an unsigned long, but it actually returns the number of bytes written as an int! Nice interface. All of this can go away, though, if we switch to an external process. It's tempting. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 20:14 ` Jeff King @ 2011-06-14 20:45 ` Jeff King 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-14 20:45 UTC (permalink / raw) To: René Scharfe; +Cc: git, git-dev On Tue, Jun 14, 2011 at 04:14:33PM -0400, Jeff King wrote: > Yeah, I am unsurprised that doing it in a single process is actually > slower. I do think because of the remote issue that we should provide > something like this. But we could implement it by piping to an external > gzip. That would make us just slightly less portable, but would give us > the multi-processor speedup, or even allow using something like pigz. So here's a relatively quick implementation of the pipe idea. It just handles .tar.gz, but it would be trivial to do bz2 or other formats, as long as they can act as a stdio filter. The gzip path is not configurable at all. Probably it should read the path and arguments from the config file. In fact, we could even allow arbitrary config like: [tarfilter "tgz"] command = gzip -c extension = tgz extension = tar.gz which also solves the "don't advertise in --list if you don't have it installed problem". At the same time, that is a lot to have to configure for somebody who is not providing remote service and just wants: git archive -o HEAD foo.tar.gz to work out of the box. I think we could probably allow arbitrary config, but provide a few sane, common defaults like gzip and bz2 unless the user specifically turns them off at build time. --- archive-tar.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ archive.c | 1 + archive.h | 1 + builtin/archive.c | 6 ++++++ t/t5000-tar-tree.sh | 26 ++++++++++++++++++++++++++ 5 files changed, 79 insertions(+), 0 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index cee06ce..a77d605 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -4,6 +4,7 @@ #include "cache.h" #include "tar.h" #include "archive.h" +#include "run-command.h" #define RECORDSIZE (512) #define BLOCKSIZE (RECORDSIZE * 20) @@ -248,3 +249,47 @@ int write_tar_archive(struct archiver_args *args) write_trailer(); return err; } + +static int write_tar_to_filter(struct archiver_args *args, const char **argv) +{ + struct child_process filter; + int r; + + memset(&filter, 0, sizeof(filter)); + filter.argv = argv; + filter.in = -1; + + if (start_command(&filter) < 0) + die_errno("unable to start '%s' filter", argv[0]); + close(1); + if (dup2(filter.in, 1) < 0) + die_errno("unable to redirect descriptor"); + close(filter.in); + + r = write_tar_archive(args); + + close(1); + if (finish_command(&filter) != 0) + die("'%s' filter reported error", argv[0]); + + return r; +} + +int write_tgz_archive(struct archiver_args *args) +{ + char compression[4]; + const char *argv[] = { + "gzip", + "-c", + NULL, /* compression level */ + NULL + }; + + if (args->compression_level >= 0) { + snprintf(compression, sizeof(compression), + "-%d", args->compression_level); + argv[2] = compression; + } + + return write_tar_to_filter(args, argv); +} diff --git a/archive.c b/archive.c index 42f2d2f..6073a8d 100644 --- a/archive.c +++ b/archive.c @@ -23,6 +23,7 @@ static const struct archiver { } archivers[] = { { "tar", write_tar_archive }, { "zip", write_zip_archive, USES_ZLIB_COMPRESSION }, + { "tgz", write_tgz_archive, USES_ZLIB_COMPRESSION }, }; static void format_subst(const struct commit *commit, diff --git a/archive.h b/archive.h index 038ac35..c1bf72e 100644 --- a/archive.h +++ b/archive.h @@ -23,6 +23,7 @@ typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsign */ extern int write_tar_archive(struct archiver_args *); extern int write_zip_archive(struct archiver_args *); +extern int write_tgz_archive(struct archiver_args *); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix); diff --git a/builtin/archive.c b/builtin/archive.c index b14eaba..4f60af5 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -71,6 +71,12 @@ static const char *format_from_name(const char *filename) ext++; if (!strcasecmp(ext, "zip")) return "--format=zip"; + if (!strcasecmp(ext, "tgz")) + return "--format=tgz"; + if (!strcasecmp(ext, "gz") && + ext - 4 >= filename && + !strcasecmp(ext - 4, "tar.gz")) + return "--format=tgz"; return NULL; } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index cff1b3e..faf2784 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -26,6 +26,7 @@ commit id embedding: . ./test-lib.sh UNZIP=${UNZIP:-unzip} +GUNZIP=${GUNZIP:-gunzip} SUBSTFORMAT=%H%n @@ -252,4 +253,29 @@ test_expect_success 'git-archive --prefix=olde-' ' test -f h/olde-a/bin/sh ' +test_expect_success 'git archive --format=tgz' ' + git archive --format=tgz HEAD >e.tgz +' + +test_expect_success 'infer tgz from .tgz filename' ' + git archive --output=e1.tgz HEAD && + test_cmp e.tgz e1.tgz +' + +test_expect_success 'infer tgz from .tar.gz filename' ' + git archive --output=e2.tar.gz HEAD && + test_cmp e.tgz e2.tar.gz +' + +if $GUNZIP --version >/dev/null 2>&1; then + test_set_prereq GUNZIP +else + say "Skipping tgz tests because gunzip was not found" +fi + +test_expect_success GUNZIP 'extract tgz file' ' + gunzip -c <e.tgz >e.tar && + test_cmp b.tar e.tar +' + test_done -- 1.7.6.rc1.4.g49204 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [RFC/PATCH 0/7] user-configurable git-archive output formats 2011-06-14 20:45 ` Jeff King @ 2011-06-15 22:30 ` Jeff King 2011-06-15 22:31 ` [PATCH 1/7] archive: reorder option parsing and config reading Jeff King ` (8 more replies) 0 siblings, 9 replies; 56+ messages in thread From: Jeff King @ 2011-06-15 22:30 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev On Tue, Jun 14, 2011 at 04:45:21PM -0400, Jeff King wrote: > The gzip path is not configurable at all. Probably it should read the > path and arguments from the config file. In fact, we could even allow > arbitrary config like: > > [tarfilter "tgz"] > command = gzip -c > extension = tgz > extension = tar.gz Here's a series implementing that. You can configure whatever you want, and it includes builtin gzip configuration by default. You can override to turn it off, or even switch it to run something like pigz instead. My biggest reservation with the patches as-is is that they are very tar-centric and not orthogonal. Specifically, they won't handle: 1. Other streamable archive formats you would want to pipe through compressors. Do any of these actually exist? I guess we could offer "pax" as a format eventually, and it might be like tar with different defaults? I dunno. Fixing this would not be too hard. Instead of these being "tarfilters", they would be "archive filters", and they would chain to some format, defaulting to "tar". Since there is no other format right now, we could even punt on writing most of the code until somebody adds one. But we would want to get the naming of the config options right, since those are user-facing. Maybe "archivefilter" (unfortunately the more readable archive.filter is a little awkward with the way we parse config files)? 2. In theory you might want to plug in external helpers that are not just stream filters, but actually their own container formats (like zip). I think people who want 7zip would want this. But how does git-archive interact with the helper? By definition the data it wants is the set of files, not a single stream. So either: a. We give the helper a temporary exported checkout, and it generates the stream from that. b. We use tar as the lingua franca of streaming file containers, and let the helper deal with converting to its preferred output format. Option (a) seems horribly inefficient on disk I/O. And if we did want to do that, I think it's largely unrelated to this patch series. You can actually do option (b) with this series. In its worst case, you can do the same as (a): just untar into a temporary directory and compress from there. But a well-written helper could convert tar into the output format on the fly. The patches are: [1/7]: archive: reorder option parsing and config reading [2/7]: archive: add user-configurable tar-filter infrastructure [3/7]: archive: support user tar-filters via --format [4/7]: archive: advertise user tar-filters in --list [5/7]: archive: refactor format-guessing from filename [6/7]: archive: match extensions from user-configured formats [7/7]: archive: provide builtin .tar.gz filter -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 1/7] archive: reorder option parsing and config reading 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King @ 2011-06-15 22:31 ` Jeff King 2011-06-15 22:33 ` [PATCH 2/7] archive: add user-configurable tar-filter infrastructure Jeff King ` (7 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-15 22:31 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev The archive command does three things during its initialization phase: 1. parse command-line options 2. setup the git directory 3. read config During phase (1), if we see any options that do not require a git directory (like "--list"), we handle them immediately and exit, making it safe to abort step (2) if we are not in a git directory. Step (3) must come after step (2), since the git directory may influence configuration. However, this leaves no possibility of configuration from step (3) impacting the command-line options in step (1) (which is useful, for example, for supporting user-configurable output formats). Instead, let's reorder this to: 1. setup the git directory, if it exists 2. read config 3. parse command-line options 4. if we are not in a git repository, die This should have the same external behavior, but puts configuration before command-line parsing. Signed-off-by: Jeff King <peff@peff.net> --- archive.c | 18 ++++++++++++++---- 1 files changed, 14 insertions(+), 4 deletions(-) diff --git a/archive.c b/archive.c index 42f2d2f..2616676 100644 --- a/archive.c +++ b/archive.c @@ -387,17 +387,27 @@ static int parse_archive_args(int argc, const char **argv, int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix) { + int nongit = 0; const struct archiver *ar = NULL; struct archiver_args args; - argc = parse_archive_args(argc, argv, &ar, &args); if (setup_prefix && prefix == NULL) - prefix = setup_git_directory(); + prefix = setup_git_directory_gently(&nongit); + + git_config(git_default_config, NULL); + + argc = parse_archive_args(argc, argv, &ar, &args); + if (nongit) { + /* + * We know this will die() with an error, so we could just + * die ourselves; but its error message will be more specific + * than what we could write here. + */ + setup_git_directory(); + } parse_treeish_arg(argv, &args, prefix); parse_pathspec_arg(argv + 1, &args); - git_config(git_default_config, NULL); - return ar->write_archive(&args); } -- 1.7.6.rc1.4.g49204 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 2/7] archive: add user-configurable tar-filter infrastructure 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King 2011-06-15 22:31 ` [PATCH 1/7] archive: reorder option parsing and config reading Jeff King @ 2011-06-15 22:33 ` Jeff King 2011-06-15 23:33 ` Junio C Hamano 2011-06-15 22:33 ` [PATCH 3/7] archive: support user tar-filters via --format Jeff King ` (6 subsequent siblings) 8 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-15 22:33 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev Archive supports two output formats: tar and zip. The tar files are totally uncompressed, and it is up to the user to pipe them through gzip or similar. This is no more than a minor inconvenience when running archive locally. However, for remote calls to upload-archive, having the server do the compression can save a lot of bandwidth. This patch lays the foundation for tar filters; it parses user configuration into an internal representation, but doesn't yet do anything with the result. It also introduces some tests to document the intended usage. Signed-off-by: Jeff King <peff@peff.net> --- Makefile | 1 + archive-tar-filter.c | 112 ++++++++++++++++++++++++++++++++++++++++++++++++++ archive.c | 1 + archive.h | 15 +++++++ t/t5000-tar-tree.sh | 54 ++++++++++++++++++++++++ 5 files changed, 183 insertions(+), 0 deletions(-) create mode 100644 archive-tar-filter.c diff --git a/Makefile b/Makefile index e40ac0c..bd3002f 100644 --- a/Makefile +++ b/Makefile @@ -573,6 +573,7 @@ LIB_OBJS += advice.o LIB_OBJS += alias.o LIB_OBJS += alloc.o LIB_OBJS += archive.o +LIB_OBJS += archive-tar-filter.o LIB_OBJS += archive-tar.o LIB_OBJS += archive-zip.o LIB_OBJS += attr.o diff --git a/archive-tar-filter.c b/archive-tar-filter.c new file mode 100644 index 0000000..211f1df --- /dev/null +++ b/archive-tar-filter.c @@ -0,0 +1,112 @@ +#include "cache.h" +#include "archive.h" + +struct tar_filter *tar_filters; +static struct tar_filter **tar_filters_tail = &tar_filters; + +static struct tar_filter *tar_filter_new(const char *name, int namelen) +{ + struct tar_filter *tf; + tf = xcalloc(1, sizeof(*tf)); + tf->name = xmemdupz(name, namelen); + tf->extensions.strdup_strings = 1; + *tar_filters_tail = tf; + tar_filters_tail = &tf->next; + return tf; +} + +static void tar_filter_free(struct tar_filter *tf) +{ + string_list_clear(&tf->extensions, 0); + free(tf->name); + free(tf->command); + free(tf); +} + +static struct tar_filter *tar_filter_by_namelen(const char *name, + int len) +{ + struct tar_filter *p; + for (p = tar_filters; p; p = p->next) + if (!strncmp(p->name, name, len) && !p->name[len]) + return p; + return NULL; +} + +struct tar_filter *tar_filter_by_name(const char *name) +{ + return tar_filter_by_namelen(name, strlen(name)); +} + +static int tar_filter_config(const char *var, const char *value, void *data) +{ + struct tar_filter *tf; + const char *dot; + const char *name; + const char *type; + int namelen; + + if (prefixcmp(var, "tarfilter.")) + return 0; + dot = strrchr(var, '.'); + if (dot == var + 9) + return 0; + + name = var + 10; + namelen = dot - name; + type = dot + 1; + + tf = tar_filter_by_namelen(name, namelen); + if (!tf) + tf = tar_filter_new(name, namelen); + + if (!strcmp(type, "command")) { + if (!value) + return config_error_nonbool(var); + tf->command = xstrdup(value); + return 0; + } + else if (!strcmp(type, "extension")) { + if (!value) + return config_error_nonbool(var); + string_list_append(&tf->extensions, value); + return 0; + } + else if (!strcmp(type, "compressionlevels")) { + tf->use_compression = git_config_bool(var, value); + return 0; + } + + return 0; +} + +static void remove_filters_without_command(void) +{ + struct tar_filter *p = tar_filters; + struct tar_filter **last = &tar_filters; + + while (p) { + if (p->command && *p->command) + last = &p->next; + else { + *last = p->next; + tar_filter_free(p); + } + p = *last; + } +} + +/* + * We don't want to load twice, since some of our + * values actually append rather than overwrite. + */ +static int tar_filter_config_loaded; +extern void tar_filter_load_config(void) +{ + if (tar_filter_config_loaded) + return; + tar_filter_config_loaded = 1; + + git_config(tar_filter_config, NULL); + remove_filters_without_command(); +} diff --git a/archive.c b/archive.c index 2616676..2ed9259 100644 --- a/archive.c +++ b/archive.c @@ -395,6 +395,7 @@ int write_archive(int argc, const char **argv, const char *prefix, prefix = setup_git_directory_gently(&nongit); git_config(git_default_config, NULL); + tar_filter_load_config(); argc = parse_archive_args(argc, argv, &ar, &args); if (nongit) { diff --git a/archive.h b/archive.h index 038ac35..8386c46 100644 --- a/archive.h +++ b/archive.h @@ -1,6 +1,8 @@ #ifndef ARCHIVE_H #define ARCHIVE_H +#include "string-list.h" + struct archiver_args { const char *base; size_t baselen; @@ -27,4 +29,17 @@ extern int write_zip_archive(struct archiver_args *); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix); +struct tar_filter { + char *name; + char *command; + struct string_list extensions; + unsigned use_compression:1; + struct tar_filter *next; +}; + +extern struct tar_filter *tar_filters; +extern struct tar_filter *tar_filter_by_name(const char *name); + +extern void tar_filter_load_config(void); + #endif /* ARCHIVE_H */ diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index cff1b3e..c3e1a4e 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -252,4 +252,58 @@ test_expect_success 'git-archive --prefix=olde-' ' test -f h/olde-a/bin/sh ' +test_expect_success 'setup fake tar filter' ' + git config tarfilter.fake.command "cat >/dev/null; echo args: " +' + +test_expect_failure 'filter does not allow compression levels by default' ' + test_must_fail git archive --format=fake -9 HEAD >output +' + +test_expect_failure 'filters can allow compression levels' ' + git config tarfilter.fake.compressionlevels true && + echo "args: -9" >expect && + git archive --format=fake -9 HEAD >output && + test_cmp expect output +' + +test_expect_failure 'archive --list mentions user filter' ' + git archive --list >output && + grep "^fake\$" output +' + +test_expect_failure 'archive --list shows remote user filters' ' + git archive --list --remote=. >output && + grep "^fake\$" output +' + +test_expect_success 'setup slightly more useful tar filter' ' + git config tarfilter.foo.command "tr ab ba" && + git config --add tarfilter.foo.extension tar.foo && + git config --add tarfilter.foo.extension bar +' + +test_expect_failure 'archive outputs in configurable format' ' + git archive --format=foo HEAD >config.tar.foo && + tr ab ba <config.tar.foo >config.tar && + test_cmp b.tar config.tar +' + +test_expect_failure 'archive selects implicit format by configured extension' ' + git archive -o config-implicit.tar.foo HEAD && + test_cmp config.tar.foo config-implicit.tar.foo && + git archive -o config-implicit.bar HEAD && + test_cmp config.tar.foo config-implicit.bar +' + +test_expect_success 'default output format remains tar' ' + git archive -o config-implicit.baz HEAD && + test_cmp b.tar config-implicit.baz +' + +test_expect_failure 'extension matching requires dot' ' + git archive -o config-implicittar.foo HEAD && + test_cmp b.tar config-implicittar.foo +' + test_done -- 1.7.6.rc1.4.g49204 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 2/7] archive: add user-configurable tar-filter infrastructure 2011-06-15 22:33 ` [PATCH 2/7] archive: add user-configurable tar-filter infrastructure Jeff King @ 2011-06-15 23:33 ` Junio C Hamano 2011-06-16 0:29 ` Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Junio C Hamano @ 2011-06-15 23:33 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, Junio C Hamano, J.H., git, git-dev Jeff King <peff@peff.net> writes: > Archive supports two output formats: tar and zip. The tar > ... > +static struct tar_filter *tar_filter_by_namelen(const char *name, > + int len) > +{ > + struct tar_filter *p; > + for (p = tar_filters; p; p = p->next) > + if (!strncmp(p->name, name, len) && !p->name[len]) > + return p; > + return NULL; > +} Makes me wonder if we want to have a generic table that is keyed by name whose contents can be looked up by counted string. string_list is the closest thing we already have, but I do not think it has counted string interface (shouldn't be a rocket surgery to add it, though). > +static int tar_filter_config(const char *var, const char *value, void *data) > +{ > ... > + if (!strcmp(type, "command")) { > + if (!value) > + return config_error_nonbool(var); > + tf->command = xstrdup(value); Does this result in small leak if the same filter is multiply defined, say in /etc/gitconfig and then in ~/.gitconfig? > diff --git a/archive.h b/archive.h > index 038ac35..8386c46 100644 > --- a/archive.h > +++ b/archive.h > @@ -1,6 +1,8 @@ > #ifndef ARCHIVE_H > #define ARCHIVE_H > > +#include "string-list.h" > + > struct archiver_args { > const char *base; > size_t baselen; > @@ -27,4 +29,17 @@ extern int write_zip_archive(struct archiver_args *); > extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); > extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix); > > +struct tar_filter { > + char *name; > + char *command; > + struct string_list extensions; > + unsigned use_compression:1; I suspect that you plan to pass sprintf("-%d", level) for the ones marked with this bit, but I wonder if we want to give a bit more control on how a compression level option is shaped for the particular command, and where on the command line the option comes. As long as we are targetting gzip and nothing else it is fine, and I suspect newer compression commands would try to mimic the -[0-9] command line interface gzip has (e.g. xz), so this probably is not an issue in practice. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/7] archive: add user-configurable tar-filter infrastructure 2011-06-15 23:33 ` Junio C Hamano @ 2011-06-16 0:29 ` Jeff King 0 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-16 0:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev On Wed, Jun 15, 2011 at 04:33:33PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Archive supports two output formats: tar and zip. The tar > > ... > > +static struct tar_filter *tar_filter_by_namelen(const char *name, > > + int len) > > +{ > > + struct tar_filter *p; > > + for (p = tar_filters; p; p = p->next) > > + if (!strncmp(p->name, name, len) && !p->name[len]) > > + return p; > > + return NULL; > > +} > > Makes me wonder if we want to have a generic table that is keyed by name > whose contents can be looked up by counted string. string_list is the > closest thing we already have, but I do not think it has counted string > interface (shouldn't be a rocket surgery to add it, though). I don't know that it would actually make this code significantly clearer or more efficient. If it were a sorted array, one could do a binary search, but we are really talking about a handful of elements (if you did want to refactor, this is almost identical to the matching code in userdiff, too). > > +static int tar_filter_config(const char *var, const char *value, void *data) > > +{ > > ... > > + if (!strcmp(type, "command")) { > > + if (!value) > > + return config_error_nonbool(var); > > + tf->command = xstrdup(value); > > Does this result in small leak if the same filter is multiply defined, say > in /etc/gitconfig and then in ~/.gitconfig? Yeah, it does. My original version had the builtin gzip statically allocated, and it wasn't safe to free() anything. But I ended up having to allocate it dynamically anyway because of the variable-sized list of extensions, so it would be safe to free(tf->command) here. I'll do that in my re-roll. > > +struct tar_filter { > > + char *name; > > + char *command; > > + struct string_list extensions; > > + unsigned use_compression:1; > > I suspect that you plan to pass sprintf("-%d", level) for the ones marked > with this bit, but I wonder if we want to give a bit more control on how a > compression level option is shaped for the particular command, and where > on the command line the option comes. As long as we are targetting gzip > and nothing else it is fine, and I suspect newer compression commands > would try to mimic the -[0-9] command line interface gzip has (e.g. xz), > so this probably is not an issue in practice. Yeah, I assumed everyone who would want this would support -[0-9]. After all, all the flag is doing is passing -[0-9] that was supplied to git-archive. We could allow something like: [tarfilter "gzip"] command = gzip %(compression) but I don't see much point. Either you want it or you don't. If there is a complex mapping of those numbers to some other options in your command, then point git to a helper script which does the conversion and then execs your command. If somebody has a counterexample, I'd be curious to hear it. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 3/7] archive: support user tar-filters via --format 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King 2011-06-15 22:31 ` [PATCH 1/7] archive: reorder option parsing and config reading Jeff King 2011-06-15 22:33 ` [PATCH 2/7] archive: add user-configurable tar-filter infrastructure Jeff King @ 2011-06-15 22:33 ` Jeff King 2011-06-15 22:33 ` [PATCH 4/7] archive: advertise user tar-filters in --list Jeff King ` (5 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-15 22:33 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev The previous commit set up the infrastructure to read tar-filter configuration. This commit actually uses it to pipe the tar output through the specified filter. Signed-off-by: Jeff King <peff@peff.net> --- archive-tar-filter.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ archive.c | 16 +++++++++++++--- archive.h | 2 ++ t/t5000-tar-tree.sh | 6 +++--- 4 files changed, 66 insertions(+), 6 deletions(-) diff --git a/archive-tar-filter.c b/archive-tar-filter.c index 211f1df..ffe510e 100644 --- a/archive-tar-filter.c +++ b/archive-tar-filter.c @@ -1,5 +1,6 @@ #include "cache.h" #include "archive.h" +#include "run-command.h" struct tar_filter *tar_filters; static struct tar_filter **tar_filters_tail = &tar_filters; @@ -110,3 +111,50 @@ extern void tar_filter_load_config(void) git_config(tar_filter_config, NULL); remove_filters_without_command(); } + +static int write_tar_to_filter(struct archiver_args *args, const char *cmd) +{ + struct child_process filter; + const char *argv[2]; + int r; + + memset(&filter, 0, sizeof(filter)); + argv[0] = cmd; + argv[1] = NULL; + filter.argv = argv; + filter.use_shell = 1; + filter.in = -1; + + if (start_command(&filter) < 0) + die_errno("unable to start '%s' filter", argv[0]); + close(1); + if (dup2(filter.in, 1) < 0) + die_errno("unable to redirect descriptor"); + close(filter.in); + + r = write_tar_archive(args); + + close(1); + if (finish_command(&filter) != 0) + die("'%s' filter reported error", argv[0]); + + return r; +} + +int write_tar_filter_archive(struct archiver_args *args) +{ + struct strbuf cmd = STRBUF_INIT; + int r; + + if (!args->tar_filter) + die("BUG: tar-filter archiver called with no filter defined"); + + strbuf_addstr(&cmd, args->tar_filter->command); + if (args->tar_filter->use_compression && args->compression_level >= 0) + strbuf_addf(&cmd, " -%d", args->compression_level); + + r = write_tar_to_filter(args, cmd.buf); + + strbuf_release(&cmd); + return r; +} diff --git a/archive.c b/archive.c index 2ed9259..cf58faa 100644 --- a/archive.c +++ b/archive.c @@ -24,6 +24,9 @@ static const struct archiver { { "tar", write_tar_archive }, { "zip", write_zip_archive, USES_ZLIB_COMPRESSION }, }; +static const struct archiver tar_filter_archiver = { + "tar-filter", write_tar_filter_archive +}; static void format_subst(const struct commit *commit, const char *src, size_t len, @@ -364,12 +367,19 @@ static int parse_archive_args(int argc, const char **argv, if (argc < 1) usage_with_options(archive_usage, opts); *ar = lookup_archiver(format); - if (!*ar) - die("Unknown archive format '%s'", format); + + /* Fallback to user-configured tar filters */ + if (!*ar) { + args->tar_filter = tar_filter_by_name(format); + if (!args->tar_filter) + die("Unknown archive format '%s'", format); + *ar = &tar_filter_archiver; + } args->compression_level = Z_DEFAULT_COMPRESSION; if (compression_level != -1) { - if ((*ar)->flags & USES_ZLIB_COMPRESSION) + if ((*ar)->flags & USES_ZLIB_COMPRESSION || + (args->tar_filter && args->tar_filter->use_compression)) args->compression_level = compression_level; else { die("Argument not supported for format '%s': -%d", diff --git a/archive.h b/archive.h index 8386c46..fb2bb9e 100644 --- a/archive.h +++ b/archive.h @@ -14,6 +14,7 @@ struct archiver_args { unsigned int verbose : 1; unsigned int worktree_attributes : 1; int compression_level; + struct tar_filter *tar_filter; }; typedef int (*write_archive_fn_t)(struct archiver_args *); @@ -25,6 +26,7 @@ typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsign */ extern int write_tar_archive(struct archiver_args *); extern int write_zip_archive(struct archiver_args *); +extern int write_tar_filter_archive(struct archiver_args *); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix); diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index c3e1a4e..2b2b128 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -256,11 +256,11 @@ test_expect_success 'setup fake tar filter' ' git config tarfilter.fake.command "cat >/dev/null; echo args: " ' -test_expect_failure 'filter does not allow compression levels by default' ' +test_expect_success 'filter does not allow compression levels by default' ' test_must_fail git archive --format=fake -9 HEAD >output ' -test_expect_failure 'filters can allow compression levels' ' +test_expect_success 'filters can allow compression levels' ' git config tarfilter.fake.compressionlevels true && echo "args: -9" >expect && git archive --format=fake -9 HEAD >output && @@ -283,7 +283,7 @@ test_expect_success 'setup slightly more useful tar filter' ' git config --add tarfilter.foo.extension bar ' -test_expect_failure 'archive outputs in configurable format' ' +test_expect_success 'archive outputs in configurable format' ' git archive --format=foo HEAD >config.tar.foo && tr ab ba <config.tar.foo >config.tar && test_cmp b.tar config.tar -- 1.7.6.rc1.4.g49204 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 4/7] archive: advertise user tar-filters in --list 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King ` (2 preceding siblings ...) 2011-06-15 22:33 ` [PATCH 3/7] archive: support user tar-filters via --format Jeff King @ 2011-06-15 22:33 ` Jeff King 2011-06-15 22:34 ` [PATCH 5/7] archive: refactor format-guessing from filename Jeff King ` (4 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-15 22:33 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev These can be selected by --format, so we need to let users know about them. It is especially important for the remote case, since this is the only method by which users can find out which formats the remote has configured. Signed-off-by: Jeff King <peff@peff.net> --- archive.c | 3 +++ t/t5000-tar-tree.sh | 4 ++-- 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/archive.c b/archive.c index cf58faa..a987936 100644 --- a/archive.c +++ b/archive.c @@ -358,8 +358,11 @@ static int parse_archive_args(int argc, const char **argv, base = ""; if (list) { + struct tar_filter *p; for (i = 0; i < ARRAY_SIZE(archivers); i++) printf("%s\n", archivers[i].name); + for (p = tar_filters; p; p = p->next) + printf("%s\n", p->name); exit(0); } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 2b2b128..9f959b1 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -267,12 +267,12 @@ test_expect_success 'filters can allow compression levels' ' test_cmp expect output ' -test_expect_failure 'archive --list mentions user filter' ' +test_expect_success 'archive --list mentions user filter' ' git archive --list >output && grep "^fake\$" output ' -test_expect_failure 'archive --list shows remote user filters' ' +test_expect_success 'archive --list shows remote user filters' ' git archive --list --remote=. >output && grep "^fake\$" output ' -- 1.7.6.rc1.4.g49204 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 5/7] archive: refactor format-guessing from filename 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King ` (3 preceding siblings ...) 2011-06-15 22:33 ` [PATCH 4/7] archive: advertise user tar-filters in --list Jeff King @ 2011-06-15 22:34 ` Jeff King 2011-06-15 23:48 ` Junio C Hamano 2011-06-15 22:34 ` [PATCH 6/7] archive: match extensions from user-configured formats Jeff King ` (3 subsequent siblings) 8 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-15 22:34 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev The process for guessing an archive output format based on the filename is something like this: a. parse --output in cmd_archive; check the filename against a static set of mapping heuristics (right now it just matches ".zip" for zip files). b. if found, stick a fake "--format=zip" at the beginning of the arguments list (if the user did specify a --format manually, the later option will override our fake one) c. if it's a remote call, ship the arguments to the remote (including the fake), which will call write_archive on their end d. if it's local, ship the arguments to write_archive locally There are two problems: 1. The set of mappings is static and at too high a level. The write_archive level is going to check config for user-defined formats, some of which will specify extensions. We need to delay lookup until those are parsed, so we can match against them. 2. For a remote archive call, our set of mappings (or formats) may not match the remote side's. This is OK in practice right now, because all versions of git understand "zip" and "tar". But as new formats are added, there is going to be a mismatch between what the client can do and what the remote server can do. To fix (1), this patch refactors the location guessing to happen at the write_archive level, instead of the cmd_archive level. So instead of sticking a fake --format field in the argv list, we actually pass a "name hint" down the callchain; this hint is used at the appropriate time to guess the format (if one hasn't been given already). This patch leaves (2) unfixed. The name_hint is converted to a "--format" option as before, and passed to the remote. We could in theory pass the name hint to the remote side and let it decide which format to use. But that introduces a compatibility problem, as we have no place to put that information during the remote call without adding a new "--name-hint=" argument. An older version of git would choke on that, and the client has no way of knowing if the server is new enough or not (i.e., there is no capabilities advertisement, as there is with the git protocol itself). On top of this, it's unclear whether the remote side should be in charge of format selection, anyway. There is a minor information leak; the server will learn about the filename you are using to save. If we sent just the basename, though, that would lessen the leak and still give the remote side enough information to make a decision. But more important is that the name hint is only a hint, and we default to the tar format. Which means that inconsistencies between the client's and server's set of formats will have confusing results. For example, imagine the client learns about "tar.gz" as an extension for gzip'd tar ("tgz") files, but the server does not. Locally, running: git archive -o file.tar.gz HEAD will produce a gzip'd file. If we make the mapping decision locally, then running: git archive --remote=origin -o file.tar.gz HEAD will send "--format=tgz" to the remote side. The server will complain, saying that it doesn't know about the tgz format. If we instead send the name hint to the remote side and let it make the decision, it will not know what ".tar.gz" is, and will silently default to a plain tar, without the user even realizing it. The flip side of this is an old client talking to a new server (i.e., only the servers knows about ".tar.gz"). If we map the filename remotely, then the user is happy. If we map it locally, though, we will send the server no --format and it will silently default to tar. So the question is: should the mapping of filenames to formats be consistent for a single client (i.e., doing it locally or against a remote will either produce the same format, or report an error if the remote does not support the format), or should it be consistent for multiple clients hitting the same server (i.e., no matter what machine I am on, if I use git-archive to hit kernel.org, I will always see the same format for the same filename)? I chose consistency on a single client (i.e., we do the mapping locally), because: 1. Using git on one machine against multiple remotes is more common than using git on many machines against the same remote. So it's less likely for the user to be surprised. 2. Even if we wanted to do the reverse, it is not as simple as making the decision and writing some code. Because the server literally says nothing before we send it arguments, it's difficult to get interoperability between versions. We'd probably end up having to write the server side, wait sufficiently long for everybody to have deployed it, and then write the client side. Signed-off-by: Jeff King <peff@peff.net> --- archive.c | 25 +++++++++++++++++++--- archive.h | 4 ++- builtin/archive.c | 51 ++++++++++++++------------------------------- builtin/upload-archive.c | 2 +- 4 files changed, 41 insertions(+), 41 deletions(-) diff --git a/archive.c b/archive.c index a987936..e04f689 100644 --- a/archive.c +++ b/archive.c @@ -302,9 +302,10 @@ static void parse_treeish_arg(const char **argv, PARSE_OPT_NOARG | PARSE_OPT_NONEG | PARSE_OPT_HIDDEN, NULL, (p) } static int parse_archive_args(int argc, const char **argv, - const struct archiver **ar, struct archiver_args *args) + const struct archiver **ar, struct archiver_args *args, + const char *name_hint) { - const char *format = "tar"; + const char *format = NULL; const char *base = NULL; const char *remote = NULL; const char *exec = NULL; @@ -366,6 +367,11 @@ static int parse_archive_args(int argc, const char **argv, exit(0); } + if (!format && name_hint) + format = archive_format_from_filename(name_hint); + if (!format) + format = "tar"; + /* We need at least one parameter -- tree-ish */ if (argc < 1) usage_with_options(archive_usage, opts); @@ -398,7 +404,7 @@ static int parse_archive_args(int argc, const char **argv, } int write_archive(int argc, const char **argv, const char *prefix, - int setup_prefix) + int setup_prefix, const char *name_hint) { int nongit = 0; const struct archiver *ar = NULL; @@ -410,7 +416,7 @@ int write_archive(int argc, const char **argv, const char *prefix, git_config(git_default_config, NULL); tar_filter_load_config(); - argc = parse_archive_args(argc, argv, &ar, &args); + argc = parse_archive_args(argc, argv, &ar, &args, name_hint); if (nongit) { /* * We know this will die() with an error, so we could just @@ -425,3 +431,14 @@ int write_archive(int argc, const char **argv, const char *prefix, return ar->write_archive(&args); } + +const char *archive_format_from_filename(const char *filename) +{ + const char *ext = strrchr(filename, '.'); + if (!ext) + return NULL; + ext++; + if (!strcasecmp(ext, "zip")) + return "zip"; + return NULL; +} diff --git a/archive.h b/archive.h index fb2bb9e..894d4c4 100644 --- a/archive.h +++ b/archive.h @@ -29,7 +29,7 @@ extern int write_zip_archive(struct archiver_args *); extern int write_tar_filter_archive(struct archiver_args *); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); -extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix); +extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint); struct tar_filter { char *name; @@ -44,4 +44,6 @@ extern struct tar_filter *tar_filter_by_name(const char *name); extern void tar_filter_load_config(void); +const char *archive_format_from_filename(const char *filename); + #endif /* ARCHIVE_H */ diff --git a/builtin/archive.c b/builtin/archive.c index b14eaba..2578cf5 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -24,7 +24,8 @@ static void create_output_file(const char *output_file) } static int run_remote_archiver(int argc, const char **argv, - const char *remote, const char *exec) + const char *remote, const char *exec, + const char *name_hint) { char buf[LARGE_PACKET_MAX]; int fd[2], i, len, rv; @@ -37,6 +38,17 @@ static int run_remote_archiver(int argc, const char **argv, transport = transport_get(_remote, _remote->url[0]); transport_connect(transport, "git-upload-archive", exec, fd); + /* + * Inject a fake --format field at the beginning of the + * arguments, with the format inferred from our output + * filename. This way explicit --format options can override + * it. + */ + if (name_hint) { + const char *format = archive_format_from_filename(name_hint); + if (format) + packet_write(fd[1], "argument --format=%s\n", format); + } for (i = 1; i < argc; i++) packet_write(fd[1], "argument %s\n", argv[i]); packet_flush(fd[1]); @@ -63,17 +75,6 @@ static int run_remote_archiver(int argc, const char **argv, return !!rv; } -static const char *format_from_name(const char *filename) -{ - const char *ext = strrchr(filename, '.'); - if (!ext) - return NULL; - ext++; - if (!strcasecmp(ext, "zip")) - return "--format=zip"; - return NULL; -} - #define PARSE_OPT_KEEP_ALL ( PARSE_OPT_KEEP_DASHDASH | \ PARSE_OPT_KEEP_ARGV0 | \ PARSE_OPT_KEEP_UNKNOWN | \ @@ -84,7 +85,6 @@ int cmd_archive(int argc, const char **argv, const char *prefix) const char *exec = "git-upload-archive"; const char *output = NULL; const char *remote = NULL; - const char *format_option = NULL; struct option local_opts[] = { OPT_STRING('o', "output", &output, "file", "write the archive to this file"), @@ -98,32 +98,13 @@ int cmd_archive(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, local_opts, NULL, PARSE_OPT_KEEP_ALL); - if (output) { + if (output) create_output_file(output); - format_option = format_from_name(output); - } - - /* - * We have enough room in argv[] to muck it in place, because - * --output must have been given on the original command line - * if we get to this point, and parse_options() must have eaten - * it, i.e. we can add back one element to the array. - * - * We add a fake --format option at the beginning, with the - * format inferred from our output filename. This way explicit - * --format options can override it, and the fake option is - * inserted before any "--" that might have been given. - */ - if (format_option) { - memmove(argv + 2, argv + 1, sizeof(*argv) * argc); - argv[1] = format_option; - argv[++argc] = NULL; - } if (remote) - return run_remote_archiver(argc, argv, remote, exec); + return run_remote_archiver(argc, argv, remote, exec, output); setvbuf(stderr, NULL, _IOLBF, BUFSIZ); - return write_archive(argc, argv, prefix, 1); + return write_archive(argc, argv, prefix, 1, output); } diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c index 73f788e..e6bb97d 100644 --- a/builtin/upload-archive.c +++ b/builtin/upload-archive.c @@ -64,7 +64,7 @@ static int run_upload_archive(int argc, const char **argv, const char *prefix) sent_argv[sent_argc] = NULL; /* parse all options sent by the client */ - return write_archive(sent_argc, sent_argv, prefix, 0); + return write_archive(sent_argc, sent_argv, prefix, 0, NULL); } __attribute__((format (printf, 1, 2))) -- 1.7.6.rc1.4.g49204 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 5/7] archive: refactor format-guessing from filename 2011-06-15 22:34 ` [PATCH 5/7] archive: refactor format-guessing from filename Jeff King @ 2011-06-15 23:48 ` Junio C Hamano 2011-06-16 0:34 ` Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Junio C Hamano @ 2011-06-15 23:48 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, Junio C Hamano, J.H., git, git-dev Jeff King <peff@peff.net> writes: > But more important is that the name hint is only a hint, and > we default to the tar format. Which means that > inconsistencies between the client's and server's set of > formats will have confusing results. For example, imagine > the client learns about "tar.gz" as an extension for gzip'd > tar ("tgz") files, but the server does not. Locally, > running: > > git archive -o file.tar.gz HEAD > > will produce a gzip'd file. If we make the mapping decision > locally, then running: > > git archive --remote=origin -o file.tar.gz HEAD > > will send "--format=tgz" to the remote side. The server will > complain, saying that it doesn't know about the tgz format. As long as that complaint is clearly marked as coming from the remote side, the user now knows that tgz is not supported, and can fall back to a plain tar. Am I being naïve thinking that barfing (and assuming that the user understands why the remote end barfed) actually is a good thing? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 5/7] archive: refactor format-guessing from filename 2011-06-15 23:48 ` Junio C Hamano @ 2011-06-16 0:34 ` Jeff King 0 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-16 0:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev On Wed, Jun 15, 2011 at 04:48:32PM -0700, Junio C Hamano wrote: > > will produce a gzip'd file. If we make the mapping decision > > locally, then running: > > > > git archive --remote=origin -o file.tar.gz HEAD > > > > will send "--format=tgz" to the remote side. The server will > > complain, saying that it doesn't know about the tgz format. > > As long as that complaint is clearly marked as coming from the remote > side, the user now knows that tgz is not supported, and can fall back to a > plain tar. It is. You can simulate with: $ git archive --format=foobar --remote=. HEAD remote: fatal: Unknown archive format 'foobar' remote: git upload-archive: archiver died with error fatal: sent error to the client: git upload-archive: archiver died with error Or if you want to be really thorough and you have an old git lying around, you can see the format auto-selection triggers the same code: $ git archive -o foo.tar.gz --remote=. --exec='git.v1.7.5 upload-archive' HEAD remote: fatal: Unknown archive format 'tgz' fatal: sent error to the client: git upload-archive: archiver died with error remote: git upload-archive: archiver died with error > Am I being naïve thinking that barfing (and assuming that the user > understands why the remote end barfed) actually is a good thing? No, I think barfing is totally fine there. I worry more about the cases where we silently produce a format the user was not expecting (the way mine is coded, it is "the server knows about .tar.gz, but the client does not; I expect gzip, but I get regular tar"). -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCH 6/7] archive: match extensions from user-configured formats 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King ` (4 preceding siblings ...) 2011-06-15 22:34 ` [PATCH 5/7] archive: refactor format-guessing from filename Jeff King @ 2011-06-15 22:34 ` Jeff King 2011-06-15 22:35 ` [PATCH 7/7] archive: provide builtin .tar.gz filter Jeff King ` (2 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-15 22:34 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev This lets you configure a format like: [tarfilter "tgz"] command = gzip extension = tgz extension = tar.gz and have it automatically used for "foo.tgz" or "foo.tar.gz". Signed-off-by: Jeff King <peff@peff.net> --- archive-tar-filter.c | 29 +++++++++++++++++++++++++++++ archive.c | 12 ++++++++++++ archive.h | 1 + t/t5000-tar-tree.sh | 4 ++-- 4 files changed, 44 insertions(+), 2 deletions(-) diff --git a/archive-tar-filter.c b/archive-tar-filter.c index ffe510e..e749133 100644 --- a/archive-tar-filter.c +++ b/archive-tar-filter.c @@ -39,6 +39,35 @@ struct tar_filter *tar_filter_by_name(const char *name) return tar_filter_by_namelen(name, strlen(name)); } +static int match_extension(const char *filename, const char *ext) +{ + int prefixlen = strlen(filename) - strlen(ext); + + /* + * We need 1 character for the '.', and 1 character to ensure that the + * prefix is non-empty (i.e., we don't match ".tar.gz" with no actual + * filename). + */ + if (prefixlen < 2 || filename[prefixlen-1] != '.') + return 0; + return !strcmp(filename + prefixlen, ext); +} + +struct tar_filter *tar_filter_by_extension(const char *filename) +{ + struct tar_filter *p; + + for (p = tar_filters; p; p = p->next) { + int i; + for (i = 0; i < p->extensions.nr; i++) { + const char *ext = p->extensions.items[i].string; + if (match_extension(filename, ext)) + return p; + } + } + return NULL; +} + static int tar_filter_config(const char *var, const char *value, void *data) { struct tar_filter *tf; diff --git a/archive.c b/archive.c index e04f689..e509b6c 100644 --- a/archive.c +++ b/archive.c @@ -434,11 +434,23 @@ int write_archive(int argc, const char **argv, const char *prefix, const char *archive_format_from_filename(const char *filename) { + struct tar_filter *tf; const char *ext = strrchr(filename, '.'); if (!ext) return NULL; ext++; if (!strcasecmp(ext, "zip")) return "zip"; + + /* + * Fallback to user-configured tar filters; but note + * that we might have to load config ourselves, first, + * if we are not being called via write_archive. + */ + tar_filter_load_config(); + tf = tar_filter_by_extension(filename); + if (tf) + return tf->name; + return NULL; } diff --git a/archive.h b/archive.h index 894d4c4..80c89dc 100644 --- a/archive.h +++ b/archive.h @@ -41,6 +41,7 @@ struct tar_filter { extern struct tar_filter *tar_filters; extern struct tar_filter *tar_filter_by_name(const char *name); +extern struct tar_filter *tar_filter_by_extension(const char *filename); extern void tar_filter_load_config(void); diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 9f959b1..fe661f3 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -289,7 +289,7 @@ test_expect_success 'archive outputs in configurable format' ' test_cmp b.tar config.tar ' -test_expect_failure 'archive selects implicit format by configured extension' ' +test_expect_success 'archive selects implicit format by configured extension' ' git archive -o config-implicit.tar.foo HEAD && test_cmp config.tar.foo config-implicit.tar.foo && git archive -o config-implicit.bar HEAD && @@ -301,7 +301,7 @@ test_expect_success 'default output format remains tar' ' test_cmp b.tar config-implicit.baz ' -test_expect_failure 'extension matching requires dot' ' +test_expect_success 'extension matching requires dot' ' git archive -o config-implicittar.foo HEAD && test_cmp b.tar config-implicittar.foo ' -- 1.7.6.rc1.4.g49204 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King ` (5 preceding siblings ...) 2011-06-15 22:34 ` [PATCH 6/7] archive: match extensions from user-configured formats Jeff King @ 2011-06-15 22:35 ` Jeff King 2011-06-15 23:55 ` Junio C Hamano 2011-06-18 14:52 ` [RFC/PATCH 0/7] user-configurable git-archive output formats René Scharfe 2011-06-18 15:40 ` René Scharfe 8 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-15 22:35 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev This works exactly as if the user had configured it via: [tarfilter "tgz"] command = gzip extension = tgz extension = tar.gz compressionlevels = true but since it is so common, it's convenient to have it builtin without the user needing to do anything. Signed-off-by: Jeff King <peff@peff.net> --- archive-tar-filter.c | 12 ++++++++++++ t/t5000-tar-tree.sh | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 60 insertions(+), 0 deletions(-) diff --git a/archive-tar-filter.c b/archive-tar-filter.c index e749133..de8719a 100644 --- a/archive-tar-filter.c +++ b/archive-tar-filter.c @@ -126,6 +126,17 @@ static void remove_filters_without_command(void) } } +static void load_builtin_filters(void) +{ + struct tar_filter *tf; + + tf = tar_filter_new("tgz", strlen("tgz")); + tf->command = xstrdup("gzip"); + string_list_append(&tf->extensions, "tgz"); + string_list_append(&tf->extensions, "tar.gz"); + tf->use_compression = 1; +} + /* * We don't want to load twice, since some of our * values actually append rather than overwrite. @@ -137,6 +148,7 @@ extern void tar_filter_load_config(void) return; tar_filter_config_loaded = 1; + load_builtin_filters(); git_config(tar_filter_config, NULL); remove_filters_without_command(); } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index fe661f3..ebad295 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -26,6 +26,7 @@ commit id embedding: . ./test-lib.sh UNZIP=${UNZIP:-unzip} +GUNZIP=${GUNZIP:-gzip -d} SUBSTFORMAT=%H%n @@ -306,4 +307,51 @@ test_expect_success 'extension matching requires dot' ' test_cmp b.tar config-implicittar.foo ' +test_expect_success 'git archive --format=tgz' ' + git archive --format=tgz HEAD >j.tgz +' + +test_expect_success 'infer tgz from .tgz filename' ' + git archive --output=j1.tgz HEAD && + test_cmp j.tgz j1.tgz +' + +test_expect_success 'infer tgz from .tar.gz filename' ' + git archive --output=j2.tar.gz HEAD && + test_cmp j.tgz j2.tar.gz +' + +if $GUNZIP --version >/dev/null 2>&1; then + test_set_prereq GUNZIP +else + say "Skipping some tgz tests because gunzip was not found" +fi + +test_expect_success GUNZIP 'extract tgz file' ' + $GUNZIP -c <j.tgz >j.tar && + test_cmp b.tar j.tar +' + +test_expect_success GUNZIP 'tgz allows compression levels' ' + git archive -1 --output=j3.tgz HEAD +' + +test_expect_success 'disable builtin tgz via config' ' + git config tarfilter.tgz.command "" +' + +test_expect_success 'disabled filter does not appear in --list' ' + git archive --list >output && + ! grep tgz output +' + +test_expect_success 'disabled filter cannot be used' ' + test_must_fail git archive --format=tgz HEAD >output +' + +test_expect_success 'disabled filter does not match extensions' ' + git archive -o disabled.tar.gz HEAD && + test_cmp b.tar disabled.tar.gz +' + test_done -- 1.7.6.rc1.4.g49204 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-15 22:35 ` [PATCH 7/7] archive: provide builtin .tar.gz filter Jeff King @ 2011-06-15 23:55 ` Junio C Hamano 2011-06-15 23:57 ` Junio C Hamano 2011-06-16 0:38 ` Jeff King 0 siblings, 2 replies; 56+ messages in thread From: Junio C Hamano @ 2011-06-15 23:55 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, J.H., git, git-dev Jeff King <peff@peff.net> writes: > +test_expect_success 'git archive --format=tgz' ' > + git archive --format=tgz HEAD >j.tgz > +' > + > +test_expect_success 'infer tgz from .tgz filename' ' > + git archive --output=j1.tgz HEAD && > + test_cmp j.tgz j1.tgz > +' I suspect this would get intermittent failures for the same reason as 0c8c385 (gitweb: supply '-n' to gzip for identical output, 2011-04-26) ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-15 23:55 ` Junio C Hamano @ 2011-06-15 23:57 ` Junio C Hamano 2011-06-16 0:38 ` Jeff King 1 sibling, 0 replies; 56+ messages in thread From: Junio C Hamano @ 2011-06-15 23:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: Jeff King, René Scharfe, J.H., git, git-dev Junio C Hamano <gitster@pobox.com> writes: > Jeff King <peff@peff.net> writes: > >> +test_expect_success 'git archive --format=tgz' ' >> + git archive --format=tgz HEAD >j.tgz >> +' >> + >> +test_expect_success 'infer tgz from .tgz filename' ' >> + git archive --output=j1.tgz HEAD && >> + test_cmp j.tgz j1.tgz >> +' > > I suspect this would get intermittent failures for the same reason as > 0c8c385 (gitweb: supply '-n' to gzip for identical output, 2011-04-26) To be squashed into 7/7, I guess... archive-tar-filter.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/archive-tar-filter.c b/archive-tar-filter.c index de8719a..d6e4e32 100644 --- a/archive-tar-filter.c +++ b/archive-tar-filter.c @@ -131,7 +131,7 @@ static void load_builtin_filters(void) struct tar_filter *tf; tf = tar_filter_new("tgz", strlen("tgz")); - tf->command = xstrdup("gzip"); + tf->command = xstrdup("gzip -n"); string_list_append(&tf->extensions, "tgz"); string_list_append(&tf->extensions, "tar.gz"); tf->use_compression = 1; ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-15 23:55 ` Junio C Hamano 2011-06-15 23:57 ` Junio C Hamano @ 2011-06-16 0:38 ` Jeff King 2011-06-16 6:27 ` Junio C Hamano 1 sibling, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-16 0:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev On Wed, Jun 15, 2011 at 04:55:57PM -0700, Junio C Hamano wrote: > > +test_expect_success 'infer tgz from .tgz filename' ' > > + git archive --output=j1.tgz HEAD && > > + test_cmp j.tgz j1.tgz > > +' > > I suspect this would get intermittent failures for the same reason as > 0c8c385 (gitweb: supply '-n' to gzip for identical output, 2011-04-26) Ick, yeah. I pulled these tests from my original internal implementation, which I suspect may have been more stable. The filename will always be stdin, which is OK, but the timestamp will probably get us. > diff --git a/archive-tar-filter.c b/archive-tar-filter.c > index de8719a..d6e4e32 100644 > --- a/archive-tar-filter.c > +++ b/archive-tar-filter.c > @@ -131,7 +131,7 @@ static void load_builtin_filters(void) > struct tar_filter *tf; > > tf = tar_filter_new("tgz", strlen("tgz")); > - tf->command = xstrdup("gzip"); > + tf->command = xstrdup("gzip -n"); > string_list_append(&tf->extensions, "tgz"); > string_list_append(&tf->extensions, "tar.gz"); > tf->use_compression = 1; This feels a little wrong, as we are changing what the tool outputs all the time just to appease a poorly-written test. Maybe nobody cares about the timestamp field (I certainly don't), but it seems like it might surprise some users. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 0:38 ` Jeff King @ 2011-06-16 6:27 ` Junio C Hamano 2011-06-16 6:51 ` Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Junio C Hamano @ 2011-06-16 6:27 UTC (permalink / raw) To: Jeff King; +Cc: René Scharfe, J.H., git, git-dev Jeff King <peff@github.com> writes: > This feels a little wrong, as we are changing what the tool outputs all > the time just to appease a poorly-written test. Now you confused me. Isn't '-n' to tell the tool *not* to write timestamp out, so that we can avoid changing what the tool outputs all the time? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 6:27 ` Junio C Hamano @ 2011-06-16 6:51 ` Jeff King 2011-06-16 7:56 ` Chris Webb 0 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-16 6:51 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev On Wed, Jun 15, 2011 at 11:27:26PM -0700, Junio C Hamano wrote: > Jeff King <peff@github.com> writes: > > > This feels a little wrong, as we are changing what the tool outputs all > > the time just to appease a poorly-written test. > > Now you confused me. Isn't '-n' to tell the tool *not* to write timestamp > out, so that we can avoid changing what the tool outputs all the time? No, I mean that people may _want_ the timestamp in day to day use. Using "-n" all the time suppresses it. And there is no reason to suppress it, except that our test does not account for it properly. So your patch is hurting people who don't want "-n" (i.e., want the timestamp) just to make our test happy. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 6:51 ` Jeff King @ 2011-06-16 7:56 ` Chris Webb 2011-06-16 17:46 ` Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Chris Webb @ 2011-06-16 7:56 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Ren?? Scharfe, J.H., git, git-dev Jeff King <peff@github.com> writes: > No, I mean that people may _want_ the timestamp in day to day use. Using > "-n" all the time suppresses it. And there is no reason to suppress it, > except that our test does not account for it properly. So your patch is > hurting people who don't want "-n" (i.e., want the timestamp) just to > make our test happy. It's useful to omit the timestamp outside of git too. Source-based package management systems generally store a URL from which to fetch a source tarball, and a hash of that source tarball to ensure it hasn't been tampered with. It's nice to be able to use a gitweb URL like http://git.kernel.org/?p=git/git.git;a=snapshot;h=e5af0de202e885b793482d416b8ce9d50dd2b8bc;sf=tgz as the tarball source, and still be able to verify its integrity against a prestored hash. Cheers, Chris. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 7:56 ` Chris Webb @ 2011-06-16 17:46 ` Jeff King 2011-06-16 18:02 ` Junio C Hamano 0 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-16 17:46 UTC (permalink / raw) To: Chris Webb; +Cc: Junio C Hamano, Ren?? Scharfe, J.H., git, git-dev On Thu, Jun 16, 2011 at 08:56:21AM +0100, Chris Webb wrote: > Jeff King <peff@github.com> writes: > > > No, I mean that people may _want_ the timestamp in day to day use. Using > > "-n" all the time suppresses it. And there is no reason to suppress it, > > except that our test does not account for it properly. So your patch is > > hurting people who don't want "-n" (i.e., want the timestamp) just to > > make our test happy. > > It's useful to omit the timestamp outside of git too. Source-based package > management systems generally store a URL from which to fetch a source > tarball, and a hash of that source tarball to ensure it hasn't been tampered > with. It's nice to be able to use a gitweb URL like > > http://git.kernel.org/?p=git/git.git;a=snapshot;h=e5af0de202e885b793482d416b8ce9d50dd2b8bc;sf=tgz > > as the tarball source, and still be able to verify its integrity against a > prestored hash. OK. I'm totally willing to accept that people actually prefer the "-n" behavior. I don't care either way myself. I just don't want the reason to default to "-n" to be "because our test scripts need it" and not "because this is what people actually want". -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 17:46 ` Jeff King @ 2011-06-16 18:02 ` Junio C Hamano 2011-06-16 18:21 ` Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Junio C Hamano @ 2011-06-16 18:02 UTC (permalink / raw) To: Jeff King; +Cc: Chris Webb, Junio C Hamano, Ren?? Scharfe, J.H., git, git-dev Jeff King <peff@github.com> writes: > OK. I'm totally willing to accept that people actually prefer the "-n" > behavior. I don't care either way myself. I just don't want the reason > to default to "-n" to be "because our test scripts need it" and not > "because this is what people actually want". Surely I share the exact feeling, and that is why I quoted the other "-n" added to gitweb because that was what people actually wanted. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 18:02 ` Junio C Hamano @ 2011-06-16 18:21 ` Jeff King 2011-06-16 18:27 ` John Szakmeister 2011-06-16 18:42 ` Junio C Hamano 0 siblings, 2 replies; 56+ messages in thread From: Jeff King @ 2011-06-16 18:21 UTC (permalink / raw) To: Junio C Hamano; +Cc: Chris Webb, Ren?? Scharfe, J.H., git, git-dev On Thu, Jun 16, 2011 at 11:02:09AM -0700, Junio C Hamano wrote: > Jeff King <peff@github.com> writes: > > > OK. I'm totally willing to accept that people actually prefer the "-n" > > behavior. I don't care either way myself. I just don't want the reason > > to default to "-n" to be "because our test scripts need it" and not > > "because this is what people actually want". > > Surely I share the exact feeling, and that is why I quoted the other "-n" > added to gitweb because that was what people actually wanted. Fair enough. I'll use "gzip -n" in my re-roll. Any comment on the "tarfilter" versus "generic archive filter" issue, or on the general interface? I think getting that right is my biggest issue in moving forward. Also, since it's easy via the external helper route, should there be any other builtin formats? Bzip2? It's not that big a deal for a big hosting site like kernel.org to stick it in their configuration, but I wonder if normal users would find it useful. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 18:21 ` Jeff King @ 2011-06-16 18:27 ` John Szakmeister 2011-06-16 18:42 ` Junio C Hamano 1 sibling, 0 replies; 56+ messages in thread From: John Szakmeister @ 2011-06-16 18:27 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, Chris Webb, Ren?? Scharfe, J.H., git, git-dev On Thu, Jun 16, 2011 at 2:21 PM, Jeff King <peff@github.com> wrote: [snip] > Also, since it's easy via the external helper route, should there be any > other builtin formats? Bzip2? It's not that big a deal for a big hosting > site like kernel.org to stick it in their configuration, but I wonder if > normal users would find it useful. I'd certainly find it useful. -John ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 18:21 ` Jeff King 2011-06-16 18:27 ` John Szakmeister @ 2011-06-16 18:42 ` Junio C Hamano 2011-06-16 18:57 ` Jeff King 1 sibling, 1 reply; 56+ messages in thread From: Junio C Hamano @ 2011-06-16 18:42 UTC (permalink / raw) To: Jeff King; +Cc: Chris Webb, Ren?? Scharfe, J.H., git, git-dev Jeff King <peff@github.com> writes: > Also, since it's easy via the external helper route, should there be any > other builtin formats? Bzip2? It's not that big a deal for a big hosting > site like kernel.org to stick it in their configuration, but I wonder if > normal users would find it useful. I know k.org statically prepares *.bz2 for any *.gz in /pub/software/ hierarchy, but isn't bzip2 significantly more expensive (like taking ten times as much memory or four times as much CPI time to squeeze the last extra 15% out), making it unsuitable for one-shot online use? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 7/7] archive: provide builtin .tar.gz filter 2011-06-16 18:42 ` Junio C Hamano @ 2011-06-16 18:57 ` Jeff King 0 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-16 18:57 UTC (permalink / raw) To: Junio C Hamano; +Cc: Chris Webb, Ren?? Scharfe, J.H., git, git-dev On Thu, Jun 16, 2011 at 11:42:17AM -0700, Junio C Hamano wrote: > Jeff King <peff@github.com> writes: > > > Also, since it's easy via the external helper route, should there be any > > other builtin formats? Bzip2? It's not that big a deal for a big hosting > > site like kernel.org to stick it in their configuration, but I wonder if > > normal users would find it useful. > > I know k.org statically prepares *.bz2 for any *.gz in /pub/software/ > hierarchy, but isn't bzip2 significantly more expensive (like taking ten > times as much memory or four times as much CPI time to squeeze the last > extra 15% out), making it unsuitable for one-shot online use? I will let J.H. comment on how appropriate that is for k.org; he is the one who mentioned bzip2 in the first place earlier in the thread. Even if kernel.org wants it, it is a departure from what we now, though. People who set up remote upload-archive access long ago and upgrade git would now suddenly let anyone convince their machines to chew up a lot of CPU. I don't mind too much doing it for gzip, which takes roughly the same amount of time as zip, which already exists. Silently adding other formats may be worse, though. At the same time, it would be nice for people running git-archive locally to avoid having to configure a bzip2 filter manually. Maybe upload-archive should ignore these filters by default, and require a special config variable to enable them? -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC/PATCH 0/7] user-configurable git-archive output formats 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King ` (6 preceding siblings ...) 2011-06-15 22:35 ` [PATCH 7/7] archive: provide builtin .tar.gz filter Jeff King @ 2011-06-18 14:52 ` René Scharfe 2011-06-18 15:28 ` Jakub Narebski ` (2 more replies) 2011-06-18 15:40 ` René Scharfe 8 siblings, 3 replies; 56+ messages in thread From: René Scharfe @ 2011-06-18 14:52 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, J.H., git, git-dev Am 16.06.2011 00:30, schrieb Jeff King: > On Tue, Jun 14, 2011 at 04:45:21PM -0400, Jeff King wrote: > >> The gzip path is not configurable at all. Probably it should read the >> path and arguments from the config file. In fact, we could even allow >> arbitrary config like: >> >> [tarfilter "tgz"] >> command = gzip -c >> extension = tgz >> extension = tar.gz Configuration options whose values are appended instead of overwritten by duplicate definitions are a new concept for git, I think. Perhaps it's not a big thing, but I think it's better avoided. The only (stupid) practical shortcoming I can think if is this, though: You can't remove anything from the list of supported extensions in a user config if the system config already contains e.g. tgz and tar.gz. > Here's a series implementing that. You can configure whatever you want, > and it includes builtin gzip configuration by default. You can override > to turn it off, or even switch it to run something like pigz instead. > > My biggest reservation with the patches as-is is that they are very > tar-centric and not orthogonal. Specifically, they won't handle: > > 1. Other streamable archive formats you would want to pipe through > compressors. Do any of these actually exist? I guess we could offer > "pax" as a format eventually, and it might be like tar with > different defaults? I dunno. > > Fixing this would not be too hard. Instead of these being > "tarfilters", they would be "archive filters", and they would chain > to some format, defaulting to "tar". Since there is no other > format right now, we could even punt on writing most of the code > until somebody adds one. But we would want to get the naming of the > config options right, since those are user-facing. Maybe > "archivefilter" (unfortunately the more readable archive.filter is > a little awkward with the way we parse config files)? The pax format is identical to the ustar format, which --format=tar produces. The other major format that comes to mind is cpio. The (never merged) predecessor of tar-tree actually used that format. Since then I have been waiting for users to request being able to export using cpio format (which is simpler and slightly smaller than tar), but that never happened. It seems the existence of the pax format really has pacified the tar vs. cpio war of old. I'm not sure "filter" is a good name, though. We have core.pager, which is technically a filter as well, but for a specific purpose. And we have the tar.umask setting as a precedence for format specfic config options. So how about tar.<extension>.compressor? [tar "tgz"] compressor = gzip -cn [tar "tar.gz"] compressor = gzip -cn [tar "tar.bz2"] compressor = bzip2 -c We don't need a compressionlevels option here because we can simply assume that the compressor commands do support them. (Side note: this is not fully true for bzip2, as it doesn't support -0, but I don't think this is worth special consideration in our code, as long as errors of the filter are displayed properly.) And we can also add a config option to restrict the formats creatable by upload-archive, to address concerns over DoS attacks with expensive compressors: [archive] remoteFormats = tar zip tgz tar.gz ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC/PATCH 0/7] user-configurable git-archive output formats 2011-06-18 14:52 ` [RFC/PATCH 0/7] user-configurable git-archive output formats René Scharfe @ 2011-06-18 15:28 ` Jakub Narebski 2011-06-20 15:58 ` Junio C Hamano 2011-06-21 16:01 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King 2 siblings, 0 replies; 56+ messages in thread From: Jakub Narebski @ 2011-06-18 15:28 UTC (permalink / raw) To: René Scharfe; +Cc: Jeff King, Junio C Hamano, J.H., git, git-dev René Scharfe <rene.scharfe@lsrfire.ath.cx> writes: > Am 16.06.2011 00:30, schrieb Jeff King: >> On Tue, Jun 14, 2011 at 04:45:21PM -0400, Jeff King wrote: >> >>> The gzip path is not configurable at all. Probably it should read the >>> path and arguments from the config file. In fact, we could even allow >>> arbitrary config like: >>> >>> [tarfilter "tgz"] >>> command = gzip -c >>> extension = tgz >>> extension = tar.gz > > Configuration options whose values are appended instead of overwritten > by duplicate definitions are a new concept for git, I think. Perhaps > it's not a big thing, but I think it's better avoided. Actually they are not a new concept, but they are quite rare. `git config` has even special options to deal with them ('--replace-all', '--add', '--get-all', '--unset-all'. For example "core.gitProxy" can be set multiple times, and of course "remote.<remotename>.fetch", 'pull' and 'url'. -- Jakub Narebski Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC/PATCH 0/7] user-configurable git-archive output formats 2011-06-18 14:52 ` [RFC/PATCH 0/7] user-configurable git-archive output formats René Scharfe 2011-06-18 15:28 ` Jakub Narebski @ 2011-06-20 15:58 ` Junio C Hamano 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King 2011-06-21 16:01 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King 2 siblings, 1 reply; 56+ messages in thread From: Junio C Hamano @ 2011-06-20 15:58 UTC (permalink / raw) To: René Scharfe; +Cc: Jeff King, J.H., git, git-dev René Scharfe <rene.scharfe@lsrfire.ath.cx> writes: > I'm not sure "filter" is a good name, though. We have core.pager, which > is technically a filter as well, but for a specific purpose. And we > have the tar.umask setting as a precedence for format specfic config > options. So how about tar.<extension>.compressor? > > [tar "tgz"] > compressor = gzip -cn > [tar "tar.gz"] > compressor = gzip -cn > [tar "tar.bz2"] > compressor = bzip2 -c > > We don't need a compressionlevels option here because we can simply > assume that the compressor commands do support them. (Side note: this > is not fully true for bzip2, as it doesn't support -0, but I don't think > this is worth special consideration in our code, as long as errors of > the filter are displayed properly.) > > And we can also add a config option to restrict the formats creatable by > upload-archive, to address concerns over DoS attacks with expensive > compressors: > > [archive] > remoteFormats = tar zip tgz tar.gz Both sounds sensible, I think. ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCHv2 0/9] configurable tar compressors 2011-06-20 15:58 ` Junio C Hamano @ 2011-06-22 1:19 ` Jeff King 2011-06-22 1:20 ` [PATCHv2 1/9] archive: reorder option parsing and config reading Jeff King ` (8 more replies) 0 siblings, 9 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:19 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev Here's my re-roll. In addition to integrating comments on the list, I made the integration with the archive code a little smoother. Filters now appear as their own "struct archiver" in the list, and don't have to be special-cased everywhere. [1/9]: archive: reorder option parsing and config reading [2/9]: archive-tar: don't reload default config options [3/9]: archive: refactor list of archive formats [4/9]: archive: pass archiver struct to write_archive callback [5/9]: archive: move file extension format-guessing lower [6/9]: archive: refactor file extension format-guessing [7/9]: archive: implement configurable tar filters [8/9]: archive: provide builtin .tar.gz filter [9/9]: upload-archive: allow user to turn off filters -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCHv2 1/9] archive: reorder option parsing and config reading 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King @ 2011-06-22 1:20 ` Jeff King 2011-06-22 1:22 ` [PATCHv2 2/9] archive-tar: don't reload default config options Jeff King ` (7 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev The archive command does three things during its initialization phase: 1. parse command-line options 2. setup the git directory 3. read config During phase (1), if we see any options that do not require a git directory (like "--list"), we handle them immediately and exit, making it safe to abort step (2) if we are not in a git directory. Step (3) must come after step (2), since the git directory may influence configuration. However, this leaves no possibility of configuration from step (3) impacting the command-line options in step (1) (which is useful, for example, for supporting user-configurable output formats). Instead, let's reorder this to: 1. setup the git directory, if it exists 2. read config 3. parse command-line options 4. if we are not in a git repository, die This should have the same external behavior, but puts configuration before command-line parsing. Signed-off-by: Jeff King <peff@peff.net> --- Same as v1. archive.c | 18 ++++++++++++++---- 1 files changed, 14 insertions(+), 4 deletions(-) diff --git a/archive.c b/archive.c index 42f2d2f..2616676 100644 --- a/archive.c +++ b/archive.c @@ -387,17 +387,27 @@ static int parse_archive_args(int argc, const char **argv, int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix) { + int nongit = 0; const struct archiver *ar = NULL; struct archiver_args args; - argc = parse_archive_args(argc, argv, &ar, &args); if (setup_prefix && prefix == NULL) - prefix = setup_git_directory(); + prefix = setup_git_directory_gently(&nongit); + + git_config(git_default_config, NULL); + + argc = parse_archive_args(argc, argv, &ar, &args); + if (nongit) { + /* + * We know this will die() with an error, so we could just + * die ourselves; but its error message will be more specific + * than what we could write here. + */ + setup_git_directory(); + } parse_treeish_arg(argv, &args, prefix); parse_pathspec_arg(argv + 1, &args); - git_config(git_default_config, NULL); - return ar->write_archive(&args); } -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCHv2 2/9] archive-tar: don't reload default config options 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King 2011-06-22 1:20 ` [PATCHv2 1/9] archive: reorder option parsing and config reading Jeff King @ 2011-06-22 1:22 ` Jeff King 2011-06-22 1:23 ` [PATCHv2 3/9] archive: refactor list of archive formats Jeff King ` (6 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:22 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev We load our own tar-specific config, and then chain to git_default_config. This is pointless, as our caller should already have loaded the default config. It also introduces a needless inconsistency with the zip archiver, which does not look at the config files at all (and therefore relies on the caller to have loaded config). Signed-off-by: Jeff King <peff@peff.net> --- Last time the tar-filter code was not integrated with the tar code. Since we are now under tar.*, and since the tar code reads the config anyway, it makes sense for us to use that config callback. This was just a little nit I noticed while changing it. archive-tar.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index cee06ce..1ab1a2c 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -231,7 +231,7 @@ static int git_tar_config(const char *var, const char *value, void *cb) } return 0; } - return git_default_config(var, value, cb); + return 0; } int write_tar_archive(struct archiver_args *args) -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCHv2 3/9] archive: refactor list of archive formats 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King 2011-06-22 1:20 ` [PATCHv2 1/9] archive: reorder option parsing and config reading Jeff King 2011-06-22 1:22 ` [PATCHv2 2/9] archive-tar: don't reload default config options Jeff King @ 2011-06-22 1:23 ` Jeff King 2011-06-23 17:05 ` Thiago Farina 2011-06-22 1:24 ` [PATCHv2 4/9] archive: pass archiver struct to write_archive callback Jeff King ` (5 subsequent siblings) 8 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-22 1:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev Most of the tar and zip code was nicely split out into two abstracted files which knew only about their specific formats. The entry point to this code was a single "write archive" function. However, as these basic formats grow more complex (e.g., by handling multiple file extensions and format names), a static list of the entry point functions won't be enough. Instead, let's provide a way for the tar and zip code to tell the main archive code what they support by registering archiver names and functions. Signed-off-by: Jeff King <peff@peff.net> --- New in v2. This turns archivers more into proper objects, rather than a hard-coded list of functions, and makes the rest of the series much cleaner. archive-tar.c | 16 +++++++++++++--- archive-zip.c | 13 ++++++++++++- archive.c | 33 +++++++++++++++++---------------- archive.h | 17 ++++++++++------- 4 files changed, 52 insertions(+), 27 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index 1ab1a2c..930375b 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -234,12 +234,10 @@ static int git_tar_config(const char *var, const char *value, void *cb) return 0; } -int write_tar_archive(struct archiver_args *args) +static int write_tar_archive(struct archiver_args *args) { int err = 0; - git_config(git_tar_config, NULL); - if (args->commit_sha1) err = write_global_extended_header(args); if (!err) @@ -248,3 +246,15 @@ int write_tar_archive(struct archiver_args *args) write_trailer(); return err; } + +static struct archiver tar_archiver = { + "tar", + write_tar_archive, + 0 +}; + +void init_tar_archiver(void) +{ + register_archiver(&tar_archiver); + git_config(git_tar_config, NULL); +} diff --git a/archive-zip.c b/archive-zip.c index cf28504..a776d83 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -261,7 +261,7 @@ static void dos_time(time_t *time, int *dos_date, int *dos_time) *dos_time = t->tm_sec / 2 + t->tm_min * 32 + t->tm_hour * 2048; } -int write_zip_archive(struct archiver_args *args) +static int write_zip_archive(struct archiver_args *args) { int err; @@ -278,3 +278,14 @@ int write_zip_archive(struct archiver_args *args) return err; } + +static struct archiver zip_archiver = { + "zip", + write_zip_archive, + ARCHIVER_WANT_COMPRESSION_LEVELS +}; + +void init_zip_archiver(void) +{ + register_archiver(&zip_archiver); +} diff --git a/archive.c b/archive.c index 2616676..f0b4e85 100644 --- a/archive.c +++ b/archive.c @@ -14,16 +14,15 @@ static char const * const archive_usage[] = { NULL }; -#define USES_ZLIB_COMPRESSION 1 - -static const struct archiver { - const char *name; - write_archive_fn_t write_archive; - unsigned int flags; -} archivers[] = { - { "tar", write_tar_archive }, - { "zip", write_zip_archive, USES_ZLIB_COMPRESSION }, -}; +static const struct archiver **archivers; +static int nr_archivers; +static int alloc_archivers; + +void register_archiver(struct archiver *ar) +{ + ALLOC_GROW(archivers, nr_archivers + 1, alloc_archivers); + archivers[nr_archivers++] = ar; +} static void format_subst(const struct commit *commit, const char *src, size_t len, @@ -208,9 +207,9 @@ static const struct archiver *lookup_archiver(const char *name) if (!name) return NULL; - for (i = 0; i < ARRAY_SIZE(archivers); i++) { - if (!strcmp(name, archivers[i].name)) - return &archivers[i]; + for (i = 0; i < nr_archivers; i++) { + if (!strcmp(name, archivers[i]->name)) + return archivers[i]; } return NULL; } @@ -355,8 +354,8 @@ static int parse_archive_args(int argc, const char **argv, base = ""; if (list) { - for (i = 0; i < ARRAY_SIZE(archivers); i++) - printf("%s\n", archivers[i].name); + for (i = 0; i < nr_archivers; i++) + printf("%s\n", archivers[i]->name); exit(0); } @@ -369,7 +368,7 @@ static int parse_archive_args(int argc, const char **argv, args->compression_level = Z_DEFAULT_COMPRESSION; if (compression_level != -1) { - if ((*ar)->flags & USES_ZLIB_COMPRESSION) + if ((*ar)->flags & ARCHIVER_WANT_COMPRESSION_LEVELS) args->compression_level = compression_level; else { die("Argument not supported for format '%s': -%d", @@ -395,6 +394,8 @@ int write_archive(int argc, const char **argv, const char *prefix, prefix = setup_git_directory_gently(&nongit); git_config(git_default_config, NULL); + init_tar_archiver(); + init_zip_archiver(); argc = parse_archive_args(argc, argv, &ar, &args); if (nongit) { diff --git a/archive.h b/archive.h index 038ac35..f39cede 100644 --- a/archive.h +++ b/archive.h @@ -14,15 +14,18 @@ struct archiver_args { int compression_level; }; -typedef int (*write_archive_fn_t)(struct archiver_args *); +#define ARCHIVER_WANT_COMPRESSION_LEVELS 1 +struct archiver { + const char *name; + int (*write_archive)(struct archiver_args *); + unsigned flags; +}; +extern void register_archiver(struct archiver *); -typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsigned char *sha1, const char *path, size_t pathlen, unsigned int mode, void *buffer, unsigned long size); +extern void init_tar_archiver(void); +extern void init_zip_archiver(void); -/* - * Archive-format specific backends. - */ -extern int write_tar_archive(struct archiver_args *); -extern int write_zip_archive(struct archiver_args *); +typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsigned char *sha1, const char *path, size_t pathlen, unsigned int mode, void *buffer, unsigned long size); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix); -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCHv2 3/9] archive: refactor list of archive formats 2011-06-22 1:23 ` [PATCHv2 3/9] archive: refactor list of archive formats Jeff King @ 2011-06-23 17:05 ` Thiago Farina 2011-06-23 17:30 ` Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Thiago Farina @ 2011-06-23 17:05 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, René Scharfe, J.H., git, git-dev On Tue, Jun 21, 2011 at 10:23 PM, Jeff King <peff@peff.net> wrote: > Most of the tar and zip code was nicely split out into two > abstracted files which knew only about their specific > formats. The entry point to this code was a single "write > archive" function. > > However, as these basic formats grow more complex (e.g., by > handling multiple file extensions and format names), a > static list of the entry point functions won't be enough. > Instead, let's provide a way for the tar and zip code to > tell the main archive code what they support by registering > archiver names and functions. > > Signed-off-by: Jeff King <peff@peff.net> > --- > New in v2. This turns archivers more into proper objects, rather than a > hard-coded list of functions, and makes the rest of the series much > cleaner. > > archive-tar.c | 16 +++++++++++++--- > archive-zip.c | 13 ++++++++++++- > archive.c | 33 +++++++++++++++++---------------- > archive.h | 17 ++++++++++------- > 4 files changed, 52 insertions(+), 27 deletions(-) > > diff --git a/archive-tar.c b/archive-tar.c > index 1ab1a2c..930375b 100644 > --- a/archive-tar.c > +++ b/archive-tar.c > @@ -234,12 +234,10 @@ static int git_tar_config(const char *var, const char *value, void *cb) > return 0; > } > > -int write_tar_archive(struct archiver_args *args) > +static int write_tar_archive(struct archiver_args *args) > { > int err = 0; > > - git_config(git_tar_config, NULL); > - > if (args->commit_sha1) > err = write_global_extended_header(args); > if (!err) > @@ -248,3 +246,15 @@ int write_tar_archive(struct archiver_args *args) > write_trailer(); > return err; > } > + > +static struct archiver tar_archiver = { > + "tar", > + write_tar_archive, > + 0 A named constant instead of 0, like you did with ARCHIVER_WANT_COMPRESSION_LEVELS, would be better? 0 here means the archiver does not want compression? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCHv2 3/9] archive: refactor list of archive formats 2011-06-23 17:05 ` Thiago Farina @ 2011-06-23 17:30 ` Jeff King 0 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-23 17:30 UTC (permalink / raw) To: Thiago Farina; +Cc: Junio C Hamano, René Scharfe, J.H., git, git-dev On Thu, Jun 23, 2011 at 02:05:35PM -0300, Thiago Farina wrote: > > +static struct archiver tar_archiver = { > > + "tar", > > + write_tar_archive, > > + 0 > A named constant instead of 0, like you did with > ARCHIVER_WANT_COMPRESSION_LEVELS, would be better? 0 here means the > archiver does not want compression? It's actually a bit-wise flag, so it is not "no compression", but "no flags". So "0" is fairly idiomatic. Given that it's a static initializer that will default to 0, probably a more readable version would be: static struct archiver tar_archiver = { "tar", write_tar_archive }; -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCHv2 4/9] archive: pass archiver struct to write_archive callback 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King ` (2 preceding siblings ...) 2011-06-22 1:23 ` [PATCHv2 3/9] archive: refactor list of archive formats Jeff King @ 2011-06-22 1:24 ` Jeff King 2011-06-22 1:24 ` [PATCHv2 5/9] archive: move file extension format-guessing lower Jeff King ` (4 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev The current archivers are very static; when you are in the write_tar_archive function, you know you are writing a tar. However, to facilitate runtime-configurable archivers that will share a common write function we need to tell the function which archiver was used. As a convenience, we also provide an opaque data pointer in the archiver struct so that individual archivers can put something useful there when they register themselves. Technically they could just use the "name" field to look in an internal map of names to data, but this is much simpler. Signed-off-by: Jeff King <peff@peff.net> --- New in v2; before there was magic special-casing of the tar_filter code. archive-tar.c | 3 ++- archive-zip.c | 3 ++- archive.c | 2 +- archive.h | 3 ++- 4 files changed, 7 insertions(+), 4 deletions(-) diff --git a/archive-tar.c b/archive-tar.c index 930375b..bed9a9b 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -234,7 +234,8 @@ static int git_tar_config(const char *var, const char *value, void *cb) return 0; } -static int write_tar_archive(struct archiver_args *args) +static int write_tar_archive(const struct archiver *ar, + struct archiver_args *args) { int err = 0; diff --git a/archive-zip.c b/archive-zip.c index a776d83..42df660 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -261,7 +261,8 @@ static void dos_time(time_t *time, int *dos_date, int *dos_time) *dos_time = t->tm_sec / 2 + t->tm_min * 32 + t->tm_hour * 2048; } -static int write_zip_archive(struct archiver_args *args) +static int write_zip_archive(const struct archiver *ar, + struct archiver_args *args) { int err; diff --git a/archive.c b/archive.c index f0b4e85..a0a5beb 100644 --- a/archive.c +++ b/archive.c @@ -410,5 +410,5 @@ int write_archive(int argc, const char **argv, const char *prefix, parse_treeish_arg(argv, &args, prefix); parse_pathspec_arg(argv + 1, &args); - return ar->write_archive(&args); + return ar->write_archive(ar, &args); } diff --git a/archive.h b/archive.h index f39cede..b3cf219 100644 --- a/archive.h +++ b/archive.h @@ -17,8 +17,9 @@ struct archiver_args { #define ARCHIVER_WANT_COMPRESSION_LEVELS 1 struct archiver { const char *name; - int (*write_archive)(struct archiver_args *); + int (*write_archive)(const struct archiver *, struct archiver_args *); unsigned flags; + void *data; }; extern void register_archiver(struct archiver *); -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCHv2 5/9] archive: move file extension format-guessing lower 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King ` (3 preceding siblings ...) 2011-06-22 1:24 ` [PATCHv2 4/9] archive: pass archiver struct to write_archive callback Jeff King @ 2011-06-22 1:24 ` Jeff King 2011-06-22 1:25 ` [PATCHv2 6/9] archive: refactor file extension format-guessing Jeff King ` (3 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:24 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev The process for guessing an archive output format based on the filename is something like this: a. parse --output in cmd_archive; check the filename against a static set of mapping heuristics (right now it just matches ".zip" for zip files). b. if found, stick a fake "--format=zip" at the beginning of the arguments list (if the user did specify a --format manually, the later option will override our fake one) c. if it's a remote call, ship the arguments to the remote (including the fake), which will call write_archive on their end d. if it's local, ship the arguments to write_archive locally There are two problems: 1. The set of mappings is static and at too high a level. The write_archive level is going to check config for user-defined formats, some of which will specify extensions. We need to delay lookup until those are parsed, so we can match against them. 2. For a remote archive call, our set of mappings (or formats) may not match the remote side's. This is OK in practice right now, because all versions of git understand "zip" and "tar". But as new formats are added, there is going to be a mismatch between what the client can do and what the remote server can do. To fix (1), this patch refactors the location guessing to happen at the write_archive level, instead of the cmd_archive level. So instead of sticking a fake --format field in the argv list, we actually pass a "name hint" down the callchain; this hint is used at the appropriate time to guess the format (if one hasn't been given already). This patch leaves (2) unfixed. The name_hint is converted to a "--format" option as before, and passed to the remote. This means the local side's idea of how extensions map to formats will take precedence. Another option would be to pass the name hint to the remote side and let the remote choose. This isn't a good idea for two reasons: 1. There's no room in the protocol for passing that information. We can pass a new argument, but older versions of git on the server will choke on it. 2. Letting the remote side decide creates a silent inconsistency in user experience. Consider the case that the locally installed git knows about the "tar.gz" format, but a remote server doesn't. Running "git archive -o foo.tar.gz" will use the tar.gz format. If we use --remote, and the local side chooses the format, then we send "--format=tar.gz" to the remote, which will complain about the unknown format. But if we let the remote side choose the format, then it will realize that it doesn't know about "tar.gz" and output uncompressed tar without even issuing a warning. Signed-off-by: Jeff King <peff@peff.net> --- Code same as v1, but I cleaned up the commit message to be a little less rambling. archive.c | 25 +++++++++++++++++++--- archive.h | 4 ++- builtin/archive.c | 51 ++++++++++++++------------------------------- builtin/upload-archive.c | 2 +- 4 files changed, 41 insertions(+), 41 deletions(-) diff --git a/archive.c b/archive.c index a0a5beb..7d0ca32 100644 --- a/archive.c +++ b/archive.c @@ -298,9 +298,10 @@ static void parse_treeish_arg(const char **argv, PARSE_OPT_NOARG | PARSE_OPT_NONEG | PARSE_OPT_HIDDEN, NULL, (p) } static int parse_archive_args(int argc, const char **argv, - const struct archiver **ar, struct archiver_args *args) + const struct archiver **ar, struct archiver_args *args, + const char *name_hint) { - const char *format = "tar"; + const char *format = NULL; const char *base = NULL; const char *remote = NULL; const char *exec = NULL; @@ -359,6 +360,11 @@ static int parse_archive_args(int argc, const char **argv, exit(0); } + if (!format && name_hint) + format = archive_format_from_filename(name_hint); + if (!format) + format = "tar"; + /* We need at least one parameter -- tree-ish */ if (argc < 1) usage_with_options(archive_usage, opts); @@ -384,7 +390,7 @@ static int parse_archive_args(int argc, const char **argv, } int write_archive(int argc, const char **argv, const char *prefix, - int setup_prefix) + int setup_prefix, const char *name_hint) { int nongit = 0; const struct archiver *ar = NULL; @@ -397,7 +403,7 @@ int write_archive(int argc, const char **argv, const char *prefix, init_tar_archiver(); init_zip_archiver(); - argc = parse_archive_args(argc, argv, &ar, &args); + argc = parse_archive_args(argc, argv, &ar, &args, name_hint); if (nongit) { /* * We know this will die() with an error, so we could just @@ -412,3 +418,14 @@ int write_archive(int argc, const char **argv, const char *prefix, return ar->write_archive(ar, &args); } + +const char *archive_format_from_filename(const char *filename) +{ + const char *ext = strrchr(filename, '.'); + if (!ext) + return NULL; + ext++; + if (!strcasecmp(ext, "zip")) + return "zip"; + return NULL; +} diff --git a/archive.h b/archive.h index b3cf219..202d528 100644 --- a/archive.h +++ b/archive.h @@ -29,6 +29,8 @@ extern void init_zip_archiver(void); typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsigned char *sha1, const char *path, size_t pathlen, unsigned int mode, void *buffer, unsigned long size); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); -extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix); +extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint); + +const char *archive_format_from_filename(const char *filename); #endif /* ARCHIVE_H */ diff --git a/builtin/archive.c b/builtin/archive.c index b14eaba..2578cf5 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -24,7 +24,8 @@ static void create_output_file(const char *output_file) } static int run_remote_archiver(int argc, const char **argv, - const char *remote, const char *exec) + const char *remote, const char *exec, + const char *name_hint) { char buf[LARGE_PACKET_MAX]; int fd[2], i, len, rv; @@ -37,6 +38,17 @@ static int run_remote_archiver(int argc, const char **argv, transport = transport_get(_remote, _remote->url[0]); transport_connect(transport, "git-upload-archive", exec, fd); + /* + * Inject a fake --format field at the beginning of the + * arguments, with the format inferred from our output + * filename. This way explicit --format options can override + * it. + */ + if (name_hint) { + const char *format = archive_format_from_filename(name_hint); + if (format) + packet_write(fd[1], "argument --format=%s\n", format); + } for (i = 1; i < argc; i++) packet_write(fd[1], "argument %s\n", argv[i]); packet_flush(fd[1]); @@ -63,17 +75,6 @@ static int run_remote_archiver(int argc, const char **argv, return !!rv; } -static const char *format_from_name(const char *filename) -{ - const char *ext = strrchr(filename, '.'); - if (!ext) - return NULL; - ext++; - if (!strcasecmp(ext, "zip")) - return "--format=zip"; - return NULL; -} - #define PARSE_OPT_KEEP_ALL ( PARSE_OPT_KEEP_DASHDASH | \ PARSE_OPT_KEEP_ARGV0 | \ PARSE_OPT_KEEP_UNKNOWN | \ @@ -84,7 +85,6 @@ int cmd_archive(int argc, const char **argv, const char *prefix) const char *exec = "git-upload-archive"; const char *output = NULL; const char *remote = NULL; - const char *format_option = NULL; struct option local_opts[] = { OPT_STRING('o', "output", &output, "file", "write the archive to this file"), @@ -98,32 +98,13 @@ int cmd_archive(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, local_opts, NULL, PARSE_OPT_KEEP_ALL); - if (output) { + if (output) create_output_file(output); - format_option = format_from_name(output); - } - - /* - * We have enough room in argv[] to muck it in place, because - * --output must have been given on the original command line - * if we get to this point, and parse_options() must have eaten - * it, i.e. we can add back one element to the array. - * - * We add a fake --format option at the beginning, with the - * format inferred from our output filename. This way explicit - * --format options can override it, and the fake option is - * inserted before any "--" that might have been given. - */ - if (format_option) { - memmove(argv + 2, argv + 1, sizeof(*argv) * argc); - argv[1] = format_option; - argv[++argc] = NULL; - } if (remote) - return run_remote_archiver(argc, argv, remote, exec); + return run_remote_archiver(argc, argv, remote, exec, output); setvbuf(stderr, NULL, _IOLBF, BUFSIZ); - return write_archive(argc, argv, prefix, 1); + return write_archive(argc, argv, prefix, 1, output); } diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c index 73f788e..e6bb97d 100644 --- a/builtin/upload-archive.c +++ b/builtin/upload-archive.c @@ -64,7 +64,7 @@ static int run_upload_archive(int argc, const char **argv, const char *prefix) sent_argv[sent_argc] = NULL; /* parse all options sent by the client */ - return write_archive(sent_argc, sent_argv, prefix, 0); + return write_archive(sent_argc, sent_argv, prefix, 0, NULL); } __attribute__((format (printf, 1, 2))) -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCHv2 6/9] archive: refactor file extension format-guessing 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King ` (4 preceding siblings ...) 2011-06-22 1:24 ` [PATCHv2 5/9] archive: move file extension format-guessing lower Jeff King @ 2011-06-22 1:25 ` Jeff King 2011-06-22 1:26 ` [PATCHv2 7/9] archive: implement configurable tar filters Jeff King ` (2 subsequent siblings) 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev Git-archive will guess a format from the output filename if no format is explicitly given. The current function just hardcodes "zip" to the zip format, and leaves everything else NULL (which will default to tar). Since we are about to add user-specified formats, we need to be more flexible. The new rule is "if a filename ends with a dot and the name of a format, it matches that format". For the existing "tar" and "zip" formats, this is identical to the current behavior. For new user-specified formats, this will do what the user expects if they name their formats appropriately. Because we will eventually start matching arbitrary user-specified extensions that may include dots, the strrchr search for the final dot is not sufficient. We need to do an actual suffix match with each extension. Signed-off-by: Jeff King <peff@peff.net> --- Similar to v1, except we don't need to special case tar-filter code anymore. archive.c | 25 +++++++++++++++++++------ 1 files changed, 19 insertions(+), 6 deletions(-) diff --git a/archive.c b/archive.c index 7d0ca32..41065a8 100644 --- a/archive.c +++ b/archive.c @@ -419,13 +419,26 @@ int write_archive(int argc, const char **argv, const char *prefix, return ar->write_archive(ar, &args); } +static int match_extension(const char *filename, const char *ext) +{ + int prefixlen = strlen(filename) - strlen(ext); + + /* + * We need 1 character for the '.', and 1 character to ensure that the + * prefix is non-empty (k.e., we don't match .tar.gz with no actual + * filename). + */ + if (prefixlen < 2 || filename[prefixlen-1] != '.') + return 0; + return !strcmp(filename + prefixlen, ext); +} + const char *archive_format_from_filename(const char *filename) { - const char *ext = strrchr(filename, '.'); - if (!ext) - return NULL; - ext++; - if (!strcasecmp(ext, "zip")) - return "zip"; + int i; + + for (i = 0; i < nr_archivers; i++) + if (match_extension(filename, archivers[i]->name)) + return archivers[i]->name; return NULL; } -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCHv2 7/9] archive: implement configurable tar filters 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King ` (5 preceding siblings ...) 2011-06-22 1:25 ` [PATCHv2 6/9] archive: refactor file extension format-guessing Jeff King @ 2011-06-22 1:26 ` Jeff King 2011-06-22 1:45 ` Jeff King 2011-06-22 6:09 ` René Scharfe 2011-06-22 1:27 ` [PATCHv2 8/9] archive: provide builtin .tar.gz filter Jeff King 2011-06-22 1:35 ` [PATCHv2 9/9] upload-archive: allow user to turn off filters Jeff King 8 siblings, 2 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:26 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev It's common to pipe the tar output produce by "git archive" through gzip or some other compressor. Locally, this can easily be done by using a shell pipe. When requesting a remote archive, though, it cannot be done through the upload-archive interface. This patch allows configurable tar filters, so that one could define a "tar.gz" format that automatically pipes tar output through gzip. Signed-off-by: Jeff King <peff@peff.net> --- This was split across several commits in the previous version of the series, but due to the cleanups it fits nicely into a single commit. Documentation/git-archive.txt | 16 ++++++ archive-tar.c | 107 ++++++++++++++++++++++++++++++++++++++++- t/t5000-tar-tree.sh | 43 ++++++++++++++++ 3 files changed, 165 insertions(+), 1 deletions(-) diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt index 9c750e2..726bf63 100644 --- a/Documentation/git-archive.txt +++ b/Documentation/git-archive.txt @@ -101,6 +101,16 @@ tar.umask:: details. If `--remote` is used then only the configuration of the remote repository takes effect. +tar.<format>.command:: + This variable specifies a shell command through which the tar + output generated by `git archive` should be piped. The command + is executed using the shell with the generated tar file on its + standard input, and should produce the final output on its + standard output. Any compression-level options will be passed + to the command (e.g., "-9"). An output file with the same + extension as `<format>` will be use this format if no other + format is given. + ATTRIBUTES ---------- @@ -149,6 +159,12 @@ git archive -o latest.zip HEAD:: commit on the current branch. Note that the output format is inferred by the extension of the output file. +git config tar.tar.xz.command "xz -c":: + + Configure a "tar.xz" format for making LZMA-compressed tarfiles. + You can use it specifying `--format=tar.xz`, or by creating an + output file like `-o foo.tar.xz`. + SEE ALSO -------- diff --git a/archive-tar.c b/archive-tar.c index bed9a9b..5c30747 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -4,6 +4,7 @@ #include "cache.h" #include "tar.h" #include "archive.h" +#include "run-command.h" #define RECORDSIZE (512) #define BLOCKSIZE (RECORDSIZE * 20) @@ -13,6 +14,9 @@ static unsigned long offset; static int tar_umask = 002; +static int write_tar_filter_archive(const struct archiver *ar, + struct archiver_args *args); + /* writes out the whole block, but only if it is full */ static void write_if_needed(void) { @@ -220,6 +224,60 @@ static int write_global_extended_header(struct archiver_args *args) return err; } +static struct archiver **tar_filters; +static int nr_tar_filters; +static int alloc_tar_filters; + +static struct archiver *find_tar_filter(const char *name, int len) +{ + int i; + for (i = 0; i < nr_tar_filters; i++) { + struct archiver *ar = tar_filters[i]; + if (!strncmp(ar->name, name, len) && !ar->name[len]) + return ar; + } + return NULL; +} + +static int tar_filter_config(const char *var, const char *value, void *data) +{ + struct archiver *ar; + const char *dot; + const char *name; + const char *type; + int namelen; + + if (prefixcmp(var, "tar.")) + return 0; + dot = strrchr(var, '.'); + if (dot == var + 9) + return 0; + + name = var + 4; + namelen = dot - name; + type = dot + 1; + + ar = find_tar_filter(name, namelen); + if (!ar) { + ar = xcalloc(1, sizeof(*ar)); + ar->name = xmemdupz(name, namelen); + ar->write_archive = write_tar_filter_archive; + ar->flags = ARCHIVER_WANT_COMPRESSION_LEVELS; + ALLOC_GROW(tar_filters, nr_tar_filters + 1, alloc_tar_filters); + tar_filters[nr_tar_filters++] = ar; + } + + if (!strcmp(type, "command")) { + if (!value) + return config_error_nonbool(var); + free(ar->data); + ar->data = xstrdup(value); + return 0; + } + + return 0; +} + static int git_tar_config(const char *var, const char *value, void *cb) { if (!strcmp(var, "tar.umask")) { @@ -231,7 +289,8 @@ static int git_tar_config(const char *var, const char *value, void *cb) } return 0; } - return 0; + + return tar_filter_config(var, value, cb); } static int write_tar_archive(const struct archiver *ar, @@ -248,6 +307,45 @@ static int write_tar_archive(const struct archiver *ar, return err; } +static int write_tar_filter_archive(const struct archiver *ar, + struct archiver_args *args) +{ + struct strbuf cmd = STRBUF_INIT; + struct child_process filter; + const char *argv[2]; + int r; + + if (!ar->data) + die("BUG: tar-filter archiver called with no filter defined"); + + strbuf_addstr(&cmd, ar->data); + if (args->compression_level >= 0) + strbuf_addf(&cmd, " -%d", args->compression_level); + + memset(&filter, 0, sizeof(filter)); + argv[0] = cmd.buf; + argv[1] = NULL; + filter.argv = argv; + filter.use_shell = 1; + filter.in = -1; + + if (start_command(&filter) < 0) + die_errno("unable to start '%s' filter", argv[0]); + close(1); + if (dup2(filter.in, 1) < 0) + die_errno("unable to redirect descriptor"); + close(filter.in); + + r = write_tar_archive(ar, args); + + close(1); + if (finish_command(&filter) != 0) + die("'%s' filter reported error", argv[0]); + + strbuf_release(&cmd); + return r; +} + static struct archiver tar_archiver = { "tar", write_tar_archive, @@ -256,6 +354,13 @@ static struct archiver tar_archiver = { void init_tar_archiver(void) { + int i; register_archiver(&tar_archiver); + git_config(git_tar_config, NULL); + for (i = 0; i < nr_tar_filters; i++) { + /* omit any filters that never had a command configured */ + if (tar_filters[i]->data) + register_archiver(tar_filters[i]); + } } diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index cff1b3e..1f90692 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -252,4 +252,47 @@ test_expect_success 'git-archive --prefix=olde-' ' test -f h/olde-a/bin/sh ' +test_expect_success 'setup tar filters' ' + git config tar.tar.foo.command "tr ab ba" && + git config tar.bar.command "tr ab ba" +' + +test_expect_success 'archive --list mentions user filter' ' + git archive --list >output && + grep "^tar\.foo\$" output && + grep "^bar\$" output +' + +test_expect_success 'archive --list shows remote user filters' ' + git archive --list --remote=. >output && + grep "^tar\.foo\$" output && + grep "^bar\$" output +' + +test_expect_success 'invoke tar filter by format' ' + git archive --format=tar.foo HEAD >config.tar.foo && + tr ab ba <config.tar.foo >config.tar && + test_cmp b.tar config.tar && + git archive --format=bar HEAD >config.bar && + tr ab ba <config.bar >config.tar && + test_cmp b.tar config.tar +' + +test_expect_success 'invoke tar filter by extension' ' + git archive -o config-implicit.tar.foo HEAD && + test_cmp config.tar.foo config-implicit.tar.foo && + git archive -o config-implicit.bar HEAD && + test_cmp config.tar.foo config-implicit.bar +' + +test_expect_success 'default output format remains tar' ' + git archive -o config-implicit.baz HEAD && + test_cmp b.tar config-implicit.baz +' + +test_expect_success 'extension matching requires dot' ' + git archive -o config-implicittar.foo HEAD && + test_cmp b.tar config-implicittar.foo +' + test_done -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCHv2 7/9] archive: implement configurable tar filters 2011-06-22 1:26 ` [PATCHv2 7/9] archive: implement configurable tar filters Jeff King @ 2011-06-22 1:45 ` Jeff King 2011-06-22 6:09 ` René Scharfe 1 sibling, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:45 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev On Tue, Jun 21, 2011 at 09:26:31PM -0400, Jeff King wrote: > It's common to pipe the tar output produce by "git archive" > through gzip or some other compressor. Locally, this can > easily be done by using a shell pipe. When requesting a > remote archive, though, it cannot be done through the > upload-archive interface. > > This patch allows configurable tar filters, so that one > could define a "tar.gz" format that automatically pipes tar > output through gzip. > > Signed-off-by: Jeff King <peff@peff.net> > --- > This was split across several commits in the previous version of the > series, but due to the cleanups it fits nicely into a single commit. A few comments on what I took and what I didn't: 1. config is now in tar.<filter>.*; this avoids having yet another config section. However, the flat names we give to "git config" look a little silly. E.g., "tar.tar.gz.command". 2. René suggested compressor as the config name. I wanted to stay away from that name, as this really is about generic filtering. I expect most uses will be compressors, but this is also our method for supporting other container formats via external helpers (e.g., you could convert to cpio on the fly). The word "filter" is generic, but it's also a bit redundant. The whole tar.<name> subsection is about the filter. I chose "command", as that is what is used for external diff in userdiff drivers, and it makes it clear that we are running an external helper. I'm lukewarm on it if somebody wants to argue for something else. 3. There's no config to say you want or don't want -<n> compression levels. We always allow them, and it is up to the tool to complain if it doesn't want them. My reasoning is that most everything either takes them already (e.g., gzip, bzip2, xz), or would require a helper script that can either map them (7z) or reject them (whatever helper somebody might write to convert tar2cpio on the fly). -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCHv2 7/9] archive: implement configurable tar filters 2011-06-22 1:26 ` [PATCHv2 7/9] archive: implement configurable tar filters Jeff King 2011-06-22 1:45 ` Jeff King @ 2011-06-22 6:09 ` René Scharfe 2011-06-22 14:59 ` Jeff King 1 sibling, 1 reply; 56+ messages in thread From: René Scharfe @ 2011-06-22 6:09 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, J.H., git, git-dev Hello, just a quick comment before I drop off the net for a few days. I like the series a lot, especially the refactorings in patches 1 to 6. Am 22.06.2011 03:26, schrieb Jeff King: > diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt > index 9c750e2..726bf63 100644 > --- a/Documentation/git-archive.txt > +++ b/Documentation/git-archive.txt > @@ -101,6 +101,16 @@ tar.umask:: > details. If `--remote` is used then only the configuration of > the remote repository takes effect. > > +tar.<format>.command:: Would switching around format and "command" be better? [tar "command"] tar.gz = gzip -cn tar.xz = xz -c > diff --git a/archive-tar.c b/archive-tar.c > index bed9a9b..5c30747 100644 > --- a/archive-tar.c > +++ b/archive-tar.c > @@ -4,6 +4,7 @@ > #include "cache.h" > #include "tar.h" > #include "archive.h" > +#include "run-command.h" > > #define RECORDSIZE (512) > #define BLOCKSIZE (RECORDSIZE * 20) > @@ -13,6 +14,9 @@ static unsigned long offset; > > static int tar_umask = 002; > > +static int write_tar_filter_archive(const struct archiver *ar, > + struct archiver_args *args); > + > /* writes out the whole block, but only if it is full */ > static void write_if_needed(void) > { > @@ -220,6 +224,60 @@ static int write_global_extended_header(struct archiver_args *args) > return err; > } > > +static struct archiver **tar_filters; > +static int nr_tar_filters; > +static int alloc_tar_filters; > + > +static struct archiver *find_tar_filter(const char *name, int len) > +{ > + int i; > + for (i = 0; i < nr_tar_filters; i++) { > + struct archiver *ar = tar_filters[i]; > + if (!strncmp(ar->name, name, len) && !ar->name[len]) > + return ar; > + } > + return NULL; > +} > + > +static int tar_filter_config(const char *var, const char *value, void *data) > +{ > + struct archiver *ar; > + const char *dot; > + const char *name; > + const char *type; > + int namelen; > + > + if (prefixcmp(var, "tar.")) > + return 0; > + dot = strrchr(var, '.'); > + if (dot == var + 9) > + return 0; > + > + name = var + 4; > + namelen = dot - name; > + type = dot + 1; > + > + ar = find_tar_filter(name, namelen); > + if (!ar) { > + ar = xcalloc(1, sizeof(*ar)); > + ar->name = xmemdupz(name, namelen); > + ar->write_archive = write_tar_filter_archive; > + ar->flags = ARCHIVER_WANT_COMPRESSION_LEVELS; > + ALLOC_GROW(tar_filters, nr_tar_filters + 1, alloc_tar_filters); > + tar_filters[nr_tar_filters++] = ar; > + } > + > + if (!strcmp(type, "command")) { > + if (!value) > + return config_error_nonbool(var); > + free(ar->data); > + ar->data = xstrdup(value); > + return 0; > + } Why not register it right here instead of adding it to the intermediate list? And are duplicates handled properly, e.g. system has "gzip -cn" and local wants "gzip -c"? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCHv2 7/9] archive: implement configurable tar filters 2011-06-22 6:09 ` René Scharfe @ 2011-06-22 14:59 ` Jeff King 0 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 14:59 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev On Wed, Jun 22, 2011 at 08:09:51AM +0200, René Scharfe wrote: > just a quick comment before I drop off the net for a few days. I like > the series a lot, especially the refactorings in patches 1 to 6. Thanks. I didn't know if I was going overboard, but the result looked cleaner to me, so it is good to have a second opinion. > > +tar.<format>.command:: > > Would switching around format and "command" be better? > > [tar "command"] > tar.gz = gzip -cn > tar.xz = xz -c That violates our usual convention that the first and final components are static, and the middle part can contain everything. So doing: git config tar.command.tar.gz "gzip -cn" is going to end up as: [tar "command.tar"] gz = gzip -cn Plus it doesn't leave room for any additional per-command config keys if we want to add them in the future. > > + ar = find_tar_filter(name, namelen); > > + if (!ar) { > > + ar = xcalloc(1, sizeof(*ar)); > > + ar->name = xmemdupz(name, namelen); > > + ar->write_archive = write_tar_filter_archive; > > + ar->flags = ARCHIVER_WANT_COMPRESSION_LEVELS; > > + ALLOC_GROW(tar_filters, nr_tar_filters + 1, alloc_tar_filters); > > + tar_filters[nr_tar_filters++] = ar; > > + } > > + > > + if (!strcmp(type, "command")) { > > + if (!value) > > + return config_error_nonbool(var); > > + free(ar->data); > > + ar->data = xstrdup(value); > > + return 0; > > + } > > Why not register it right here instead of adding it to the intermediate > list? If it were just this patch, you could do that. But as soon as you add more keys (e.g., a later patch adds tar.*.remote), then you run into the situation of getting only part of the config at a time, and maybe not getting the full config for a command at all. For example: [tar "tar.gz"] remote = true would make an archiver with no "command" set. We would need to special-case it everywhere to ignore it when we looked at the list, or later just remove it. This patch takes the approach of having a secondary list of all of the configured bits, and then only registering those that are actually valid. It also keeps the configured and builtin lists separate. Otherwise I have to special-case: [tar "zip"] command = ... to ignore the builtin zip archiver, which I think is not something we want to be able to override in this way. > And are duplicates handled properly, e.g. system has "gzip -cn" > and local wants "gzip -c"? Yes. We look up the archiver in the list of configured ones and overwrite its command field (that's why the .tar.gz patch actually calls the config parser as if you had those lines in your config file, instead of registering static archiver structs). I should probably include a test for that, though. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* [PATCHv2 8/9] archive: provide builtin .tar.gz filter 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King ` (6 preceding siblings ...) 2011-06-22 1:26 ` [PATCHv2 7/9] archive: implement configurable tar filters Jeff King @ 2011-06-22 1:27 ` Jeff King 2011-06-22 1:35 ` [PATCHv2 9/9] upload-archive: allow user to turn off filters Jeff King 8 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 1:27 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev This works exactly as if the user had configured it via: [tar "tgz"] command = gzip -cn [tar "tar.gz"] command = gzip -cn but since it is so common, it's convenient to have it builtin without the user needing to do anything. Signed-off-by: Jeff King <peff@peff.net> --- Similar to v1, but rebased. And with docs. Documentation/git-archive.txt | 11 +++++++++++ archive-tar.c | 2 ++ t/t5000-tar-tree.sh | 38 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 51 insertions(+), 0 deletions(-) diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt index 726bf63..8b0080a 100644 --- a/Documentation/git-archive.txt +++ b/Documentation/git-archive.txt @@ -110,6 +110,9 @@ tar.<format>.command:: to the command (e.g., "-9"). An output file with the same extension as `<format>` will be use this format if no other format is given. ++ +The "tar.gz" and "tgz" formats are defined automatically and default to +`gzip -cn`. You may override them with custom commands. ATTRIBUTES ---------- @@ -143,6 +146,14 @@ git archive --format=tar --prefix=git-1.4.0/ v1.4.0 | gzip >git-1.4.0.tar.gz:: Create a compressed tarball for v1.4.0 release. +git archive --format=tar.gz --prefix=git-1.4.0/ v1.4.0 >git-1.4.0.tar.gz:: + + Same as above, but using the builtin tar.gz handling. + +git archive --prefix=git-1.4.0/ -o git-1.4.0.tar.gz v1.4.0:: + + Same as above, but the format is inferred from the output file. + git archive --format=tar --prefix=git-1.4.0/ v1.4.0{caret}\{tree\} | gzip >git-1.4.0.tar.gz:: Create a compressed tarball for v1.4.0 release, but without a diff --git a/archive-tar.c b/archive-tar.c index 5c30747..f470ebe 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -357,6 +357,8 @@ void init_tar_archiver(void) int i; register_archiver(&tar_archiver); + tar_filter_config("tar.tgz.command", "gzip -cn", NULL); + tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL); git_config(git_tar_config, NULL); for (i = 0; i < nr_tar_filters; i++) { /* omit any filters that never had a command configured */ diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 1f90692..070250e 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -26,6 +26,8 @@ commit id embedding: . ./test-lib.sh UNZIP=${UNZIP:-unzip} +GZIP=${GZIP:-gzip} +GUNZIP=${GUNZIP:-gzip -d} SUBSTFORMAT=%H%n @@ -295,4 +297,40 @@ test_expect_success 'extension matching requires dot' ' test_cmp b.tar config-implicittar.foo ' +if $GZIP --version >/dev/null 2>&1; then + test_set_prereq GZIP +else + say "Skipping some tar.gz tests because gzip not found" +fi + +test_expect_success GZIP 'git archive --format=tgz' ' + git archive --format=tgz HEAD >j.tgz +' + +test_expect_success GZIP 'git archive --format=tar.gz' ' + git archive --format=tar.gz HEAD >j1.tar.gz && + test_cmp j.tgz j1.tar.gz +' + +test_expect_success GZIP 'infer tgz from .tgz filename' ' + git archive --output=j2.tgz HEAD && + test_cmp j.tgz j2.tgz +' + +test_expect_success GZIP 'infer tgz from .tar.gz filename' ' + git archive --output=j3.tar.gz HEAD && + test_cmp j.tgz j3.tar.gz +' + +if $GUNZIP --version >/dev/null 2>&1; then + test_set_prereq GUNZIP +else + say "Skipping some tar.gz tests because gunzip was not found" +fi + +test_expect_success GZIP,GUNZIP 'extract tgz file' ' + $GUNZIP -c <j.tgz >j.tar && + test_cmp b.tar j.tar +' + test_done -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* [PATCHv2 9/9] upload-archive: allow user to turn off filters 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King ` (7 preceding siblings ...) 2011-06-22 1:27 ` [PATCHv2 8/9] archive: provide builtin .tar.gz filter Jeff King @ 2011-06-22 1:35 ` Jeff King 2011-06-22 3:17 ` Jeff King 8 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-22 1:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev Some tar filters may be very expensive to run, so sites do not want to expose them via upload-archive. This patch lets users configure tar.<filter>.remote to turn them off. By default, gzip filters are left on, as they are about as expensive as creating zip archives. Signed-off-by: Jeff King <peff@peff.net> --- This is my response to René's: [archive] remoteFormats = tar zip tgz tar.gz It's a little more verbose, but it lets you turn individual formats off and on without having to enumerate the whole list. By itself, this may not be that useful. It seems unlikely that somebody would both to configure a filter globally on a machine serving upload-pack, and then not want it remotely available. You can also disable the builtin gzip filter, but it's not really much more expensive than zip. But two possible follow-on patches would be: 1. tar.remote and zip.remote, to turn those formats off. I can see wanting to turn off _all_ compression, including zip and gzip, if your CPU is terrible (but git servers are almost always I/O bound, so I don't know how common that will be). I can also see wanting to turn off uncompressed tar, because bandwidth is expensive, CPU is cheap, and source code repositories compress well. 2. adding default bzip2 config; this should be off by default for upload-archive Documentation/git-archive.txt | 6 ++++++ archive-tar.c | 11 ++++++++++- archive-zip.c | 2 +- archive.c | 11 ++++++----- archive.h | 3 ++- builtin/archive.c | 2 +- builtin/upload-archive.c | 2 +- t/t5000-tar-tree.sh | 25 ++++++++++++++++++++++--- 8 files changed, 49 insertions(+), 13 deletions(-) diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt index 8b0080a..1320c87 100644 --- a/Documentation/git-archive.txt +++ b/Documentation/git-archive.txt @@ -114,6 +114,12 @@ tar.<format>.command:: The "tar.gz" and "tgz" formats are defined automatically and default to `gzip -cn`. You may override them with custom commands. +tar.<format>.remote:: + If true, enable `<format>` for use by remote clients via + linkgit:git-upload-archive[1]. Defaults to false for + user-defined formats, but true for the "tar.gz" and "tgz" + formats. + ATTRIBUTES ---------- diff --git a/archive-tar.c b/archive-tar.c index f470ebe..01ab43f 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -274,6 +274,13 @@ static int tar_filter_config(const char *var, const char *value, void *data) ar->data = xstrdup(value); return 0; } + if (!strcmp(type, "remote")) { + if (git_config_bool(var, value)) + tf->flags |= ARCHIVER_REMOTE; + else + tf->flags &= ~ARCHIVER_REMOTE; + return 0; + } return 0; } @@ -349,7 +356,7 @@ static int write_tar_filter_archive(const struct archiver *ar, static struct archiver tar_archiver = { "tar", write_tar_archive, - 0 + ARCHIVER_REMOTE }; void init_tar_archiver(void) @@ -358,7 +365,9 @@ void init_tar_archiver(void) register_archiver(&tar_archiver); tar_filter_config("tar.tgz.command", "gzip -cn", NULL); + tar_filter_config("tar.tgz.remote", "true", NULL); tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL); + tar_filter_config("tar.tar.gz.remote", "true", NULL); git_config(git_tar_config, NULL); for (i = 0; i < nr_tar_filters; i++) { /* omit any filters that never had a command configured */ diff --git a/archive-zip.c b/archive-zip.c index 42df660..3c102a1 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -283,7 +283,7 @@ static int write_zip_archive(const struct archiver *ar, static struct archiver zip_archiver = { "zip", write_zip_archive, - ARCHIVER_WANT_COMPRESSION_LEVELS + ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE }; void init_zip_archiver(void) diff --git a/archive.c b/archive.c index 41065a8..2a7a28e 100644 --- a/archive.c +++ b/archive.c @@ -299,7 +299,7 @@ static void parse_treeish_arg(const char **argv, static int parse_archive_args(int argc, const char **argv, const struct archiver **ar, struct archiver_args *args, - const char *name_hint) + const char *name_hint, int is_remote) { const char *format = NULL; const char *base = NULL; @@ -356,7 +356,8 @@ static int parse_archive_args(int argc, const char **argv, if (list) { for (i = 0; i < nr_archivers; i++) - printf("%s\n", archivers[i]->name); + if (!is_remote || archivers[i]->flags & ARCHIVER_REMOTE) + printf("%s\n", archivers[i]->name); exit(0); } @@ -369,7 +370,7 @@ static int parse_archive_args(int argc, const char **argv, if (argc < 1) usage_with_options(archive_usage, opts); *ar = lookup_archiver(format); - if (!*ar) + if (!*ar || (is_remote && !((*ar)->flags & ARCHIVER_REMOTE))) die("Unknown archive format '%s'", format); args->compression_level = Z_DEFAULT_COMPRESSION; @@ -390,7 +391,7 @@ static int parse_archive_args(int argc, const char **argv, } int write_archive(int argc, const char **argv, const char *prefix, - int setup_prefix, const char *name_hint) + int setup_prefix, const char *name_hint, int remote) { int nongit = 0; const struct archiver *ar = NULL; @@ -403,7 +404,7 @@ int write_archive(int argc, const char **argv, const char *prefix, init_tar_archiver(); init_zip_archiver(); - argc = parse_archive_args(argc, argv, &ar, &args, name_hint); + argc = parse_archive_args(argc, argv, &ar, &args, name_hint, remote); if (nongit) { /* * We know this will die() with an error, so we could just diff --git a/archive.h b/archive.h index 202d528..2b0884f 100644 --- a/archive.h +++ b/archive.h @@ -15,6 +15,7 @@ struct archiver_args { }; #define ARCHIVER_WANT_COMPRESSION_LEVELS 1 +#define ARCHIVER_REMOTE 2 struct archiver { const char *name; int (*write_archive)(const struct archiver *, struct archiver_args *); @@ -29,7 +30,7 @@ extern void init_zip_archiver(void); typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsigned char *sha1, const char *path, size_t pathlen, unsigned int mode, void *buffer, unsigned long size); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); -extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint); +extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint, int remote); const char *archive_format_from_filename(const char *filename); diff --git a/builtin/archive.c b/builtin/archive.c index 2578cf5..883c009 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -106,5 +106,5 @@ int cmd_archive(int argc, const char **argv, const char *prefix) setvbuf(stderr, NULL, _IOLBF, BUFSIZ); - return write_archive(argc, argv, prefix, 1, output); + return write_archive(argc, argv, prefix, 1, output, 0); } diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c index e6bb97d..2d0b383 100644 --- a/builtin/upload-archive.c +++ b/builtin/upload-archive.c @@ -64,7 +64,7 @@ static int run_upload_archive(int argc, const char **argv, const char *prefix) sent_argv[sent_argc] = NULL; /* parse all options sent by the client */ - return write_archive(sent_argc, sent_argv, prefix, 0, NULL); + return write_archive(sent_argc, sent_argv, prefix, 0, NULL, 1); } __attribute__((format (printf, 1, 2))) diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 070250e..9e3ba98 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -256,7 +256,8 @@ test_expect_success 'git-archive --prefix=olde-' ' test_expect_success 'setup tar filters' ' git config tar.tar.foo.command "tr ab ba" && - git config tar.bar.command "tr ab ba" + git config tar.bar.command "tr ab ba" && + git config tar.bar.remote true ' test_expect_success 'archive --list mentions user filter' ' @@ -265,9 +266,9 @@ test_expect_success 'archive --list mentions user filter' ' grep "^bar\$" output ' -test_expect_success 'archive --list shows remote user filters' ' +test_expect_success 'archive --list shows only enabled remote filters' ' git archive --list --remote=. >output && - grep "^tar\.foo\$" output && + ! grep "^tar\.foo\$" output && grep "^bar\$" output ' @@ -297,6 +298,13 @@ test_expect_success 'extension matching requires dot' ' test_cmp b.tar config-implicittar.foo ' +test_expect_success 'only enabled filters are available remotely' ' + test_must_fail git archive --remote=. --format=tar.foo HEAD \ + >remote.tar.foo && + git archive --remote=. --format=bar >remote.bar HEAD && + test_cmp remote.bar config.bar +' + if $GZIP --version >/dev/null 2>&1; then test_set_prereq GZIP else @@ -333,4 +341,15 @@ test_expect_success GZIP,GUNZIP 'extract tgz file' ' test_cmp b.tar j.tar ' +test_expect_success GZIP 'remote tar.gz is allowed by default' ' + git archive --remote=. --format=tar.gz HEAD >remote.tar.gz && + test_cmp j.tgz remote.tar.gz +' + +test_expect_success GZIP 'remote tar.gz can be disabled' ' + git config tar.tar.gz.remote false && + test_must_fail git archive --remote=. --format=tar.gz HEAD \ + >remote.tar.gz +' + test_done -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCHv2 9/9] upload-archive: allow user to turn off filters 2011-06-22 1:35 ` [PATCHv2 9/9] upload-archive: allow user to turn off filters Jeff King @ 2011-06-22 3:17 ` Jeff King 0 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-22 3:17 UTC (permalink / raw) To: Junio C Hamano; +Cc: René Scharfe, J.H., git, git-dev On Tue, Jun 21, 2011 at 09:35:45PM -0400, Jeff King wrote: > Some tar filters may be very expensive to run, so sites do > not want to expose them via upload-archive. This patch lets > users configure tar.<filter>.remote to turn them off. > > By default, gzip filters are left on, as they are about as > expensive as creating zip archives. Argh, sorry, wrong version of the patch. This one has a slight refactoring typo that makes it not compile. Correct patch is below. -- >8 -- Subject: upload-archive: allow user to turn off filters Some tar filters may be very expensive to run, so sites do not want to expose them via upload-archive. This patch lets users configure tar.<filter>.remote to turn them off. By default, gzip filters are left on, as they are about as expensive as creating zip archives. Signed-off-by: Jeff King <peff@peff.net> --- Documentation/git-archive.txt | 6 ++++++ archive-tar.c | 11 ++++++++++- archive-zip.c | 2 +- archive.c | 11 ++++++----- archive.h | 3 ++- builtin/archive.c | 2 +- builtin/upload-archive.c | 2 +- t/t5000-tar-tree.sh | 25 ++++++++++++++++++++++--- 8 files changed, 49 insertions(+), 13 deletions(-) diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt index 8b0080a..1320c87 100644 --- a/Documentation/git-archive.txt +++ b/Documentation/git-archive.txt @@ -114,6 +114,12 @@ tar.<format>.command:: The "tar.gz" and "tgz" formats are defined automatically and default to `gzip -cn`. You may override them with custom commands. +tar.<format>.remote:: + If true, enable `<format>` for use by remote clients via + linkgit:git-upload-archive[1]. Defaults to false for + user-defined formats, but true for the "tar.gz" and "tgz" + formats. + ATTRIBUTES ---------- diff --git a/archive-tar.c b/archive-tar.c index f470ebe..20af005 100644 --- a/archive-tar.c +++ b/archive-tar.c @@ -274,6 +274,13 @@ static int tar_filter_config(const char *var, const char *value, void *data) ar->data = xstrdup(value); return 0; } + if (!strcmp(type, "remote")) { + if (git_config_bool(var, value)) + ar->flags |= ARCHIVER_REMOTE; + else + ar->flags &= ~ARCHIVER_REMOTE; + return 0; + } return 0; } @@ -349,7 +356,7 @@ static int write_tar_filter_archive(const struct archiver *ar, static struct archiver tar_archiver = { "tar", write_tar_archive, - 0 + ARCHIVER_REMOTE }; void init_tar_archiver(void) @@ -358,7 +365,9 @@ void init_tar_archiver(void) register_archiver(&tar_archiver); tar_filter_config("tar.tgz.command", "gzip -cn", NULL); + tar_filter_config("tar.tgz.remote", "true", NULL); tar_filter_config("tar.tar.gz.command", "gzip -cn", NULL); + tar_filter_config("tar.tar.gz.remote", "true", NULL); git_config(git_tar_config, NULL); for (i = 0; i < nr_tar_filters; i++) { /* omit any filters that never had a command configured */ diff --git a/archive-zip.c b/archive-zip.c index 42df660..3c102a1 100644 --- a/archive-zip.c +++ b/archive-zip.c @@ -283,7 +283,7 @@ static int write_zip_archive(const struct archiver *ar, static struct archiver zip_archiver = { "zip", write_zip_archive, - ARCHIVER_WANT_COMPRESSION_LEVELS + ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE }; void init_zip_archiver(void) diff --git a/archive.c b/archive.c index 41065a8..2a7a28e 100644 --- a/archive.c +++ b/archive.c @@ -299,7 +299,7 @@ static void parse_treeish_arg(const char **argv, static int parse_archive_args(int argc, const char **argv, const struct archiver **ar, struct archiver_args *args, - const char *name_hint) + const char *name_hint, int is_remote) { const char *format = NULL; const char *base = NULL; @@ -356,7 +356,8 @@ static int parse_archive_args(int argc, const char **argv, if (list) { for (i = 0; i < nr_archivers; i++) - printf("%s\n", archivers[i]->name); + if (!is_remote || archivers[i]->flags & ARCHIVER_REMOTE) + printf("%s\n", archivers[i]->name); exit(0); } @@ -369,7 +370,7 @@ static int parse_archive_args(int argc, const char **argv, if (argc < 1) usage_with_options(archive_usage, opts); *ar = lookup_archiver(format); - if (!*ar) + if (!*ar || (is_remote && !((*ar)->flags & ARCHIVER_REMOTE))) die("Unknown archive format '%s'", format); args->compression_level = Z_DEFAULT_COMPRESSION; @@ -390,7 +391,7 @@ static int parse_archive_args(int argc, const char **argv, } int write_archive(int argc, const char **argv, const char *prefix, - int setup_prefix, const char *name_hint) + int setup_prefix, const char *name_hint, int remote) { int nongit = 0; const struct archiver *ar = NULL; @@ -403,7 +404,7 @@ int write_archive(int argc, const char **argv, const char *prefix, init_tar_archiver(); init_zip_archiver(); - argc = parse_archive_args(argc, argv, &ar, &args, name_hint); + argc = parse_archive_args(argc, argv, &ar, &args, name_hint, remote); if (nongit) { /* * We know this will die() with an error, so we could just diff --git a/archive.h b/archive.h index 202d528..2b0884f 100644 --- a/archive.h +++ b/archive.h @@ -15,6 +15,7 @@ struct archiver_args { }; #define ARCHIVER_WANT_COMPRESSION_LEVELS 1 +#define ARCHIVER_REMOTE 2 struct archiver { const char *name; int (*write_archive)(const struct archiver *, struct archiver_args *); @@ -29,7 +30,7 @@ extern void init_zip_archiver(void); typedef int (*write_archive_entry_fn_t)(struct archiver_args *args, const unsigned char *sha1, const char *path, size_t pathlen, unsigned int mode, void *buffer, unsigned long size); extern int write_archive_entries(struct archiver_args *args, write_archive_entry_fn_t write_entry); -extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint); +extern int write_archive(int argc, const char **argv, const char *prefix, int setup_prefix, const char *name_hint, int remote); const char *archive_format_from_filename(const char *filename); diff --git a/builtin/archive.c b/builtin/archive.c index 2578cf5..883c009 100644 --- a/builtin/archive.c +++ b/builtin/archive.c @@ -106,5 +106,5 @@ int cmd_archive(int argc, const char **argv, const char *prefix) setvbuf(stderr, NULL, _IOLBF, BUFSIZ); - return write_archive(argc, argv, prefix, 1, output); + return write_archive(argc, argv, prefix, 1, output, 0); } diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c index e6bb97d..2d0b383 100644 --- a/builtin/upload-archive.c +++ b/builtin/upload-archive.c @@ -64,7 +64,7 @@ static int run_upload_archive(int argc, const char **argv, const char *prefix) sent_argv[sent_argc] = NULL; /* parse all options sent by the client */ - return write_archive(sent_argc, sent_argv, prefix, 0, NULL); + return write_archive(sent_argc, sent_argv, prefix, 0, NULL, 1); } __attribute__((format (printf, 1, 2))) diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh index 070250e..9e3ba98 100755 --- a/t/t5000-tar-tree.sh +++ b/t/t5000-tar-tree.sh @@ -256,7 +256,8 @@ test_expect_success 'git-archive --prefix=olde-' ' test_expect_success 'setup tar filters' ' git config tar.tar.foo.command "tr ab ba" && - git config tar.bar.command "tr ab ba" + git config tar.bar.command "tr ab ba" && + git config tar.bar.remote true ' test_expect_success 'archive --list mentions user filter' ' @@ -265,9 +266,9 @@ test_expect_success 'archive --list mentions user filter' ' grep "^bar\$" output ' -test_expect_success 'archive --list shows remote user filters' ' +test_expect_success 'archive --list shows only enabled remote filters' ' git archive --list --remote=. >output && - grep "^tar\.foo\$" output && + ! grep "^tar\.foo\$" output && grep "^bar\$" output ' @@ -297,6 +298,13 @@ test_expect_success 'extension matching requires dot' ' test_cmp b.tar config-implicittar.foo ' +test_expect_success 'only enabled filters are available remotely' ' + test_must_fail git archive --remote=. --format=tar.foo HEAD \ + >remote.tar.foo && + git archive --remote=. --format=bar >remote.bar HEAD && + test_cmp remote.bar config.bar +' + if $GZIP --version >/dev/null 2>&1; then test_set_prereq GZIP else @@ -333,4 +341,15 @@ test_expect_success GZIP,GUNZIP 'extract tgz file' ' test_cmp b.tar j.tar ' +test_expect_success GZIP 'remote tar.gz is allowed by default' ' + git archive --remote=. --format=tar.gz HEAD >remote.tar.gz && + test_cmp j.tgz remote.tar.gz +' + +test_expect_success GZIP 'remote tar.gz can be disabled' ' + git config tar.tar.gz.remote false && + test_must_fail git archive --remote=. --format=tar.gz HEAD \ + >remote.tar.gz +' + test_done -- 1.7.5.4.44.g4b107 ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [RFC/PATCH 0/7] user-configurable git-archive output formats 2011-06-18 14:52 ` [RFC/PATCH 0/7] user-configurable git-archive output formats René Scharfe 2011-06-18 15:28 ` Jakub Narebski 2011-06-20 15:58 ` Junio C Hamano @ 2011-06-21 16:01 ` Jeff King 2 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-21 16:01 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, J.H., git, git-dev On Sat, Jun 18, 2011 at 04:52:02PM +0200, René Scharfe wrote: > >> The gzip path is not configurable at all. Probably it should read the > >> path and arguments from the config file. In fact, we could even allow > >> arbitrary config like: > >> > >> [tarfilter "tgz"] > >> command = gzip -c > >> extension = tgz > >> extension = tar.gz > > Configuration options whose values are appended instead of overwritten > by duplicate definitions are a new concept for git, I think. Perhaps > it's not a big thing, but I think it's better avoided. > > The only (stupid) practical shortcoming I can think if is this, though: > You can't remove anything from the list of supported extensions in a > user config if the system config already contains e.g. tgz and tar.gz. Yeah, I have mixed feelings on that. As Jakub pointed out, we already have them in several places. I don't know that removal is that big a deal in this instance. If we did want to support it, I think it would make more sense to have a generic solution at the config level, like: [some-section] multivalue = foo multivalue = bar !multivalue multivalue = baz multivalue = whee at which point the value is ("baz", "whee"). That matches what we do on the command line, where: git foo --multivalue=foo --multivalue=bar --no-multivalue \ --multivalue=baz --multivalue=whee handles the same issue in a similar way. The other option, of course, is having a single value with list semantics. But then you have to invent separator syntax. In this instance whitespace would probably be fine, but I'd rather that each new multi-valued option did not invent its own syntax, and in the general case you may need to handle quoting. Plus you may need some kind of append syntax. For example, if we support "tgz" and "tar.gz" internally, how do you say 'add "pax.gz"' to that list without reiterating the whole list? > The pax format is identical to the ustar format, which --format=tar > produces. The other major format that comes to mind is cpio. The > (never merged) predecessor of tar-tree actually used that format. Thanks, cpio is probably the most likely example. > Since then I have been waiting for users to request being able to export > using cpio format (which is simpler and slightly smaller than tar), but > that never happened. It seems the existence of the pax format really > has pacified the tar vs. cpio war of old. Fair enough. I haven't heard anybody clamoring for it either. I just didn't want to paint us into a corner. Since it seems like the most likely format and nobody really wants it, it's perhaps not worth worrying about. > I'm not sure "filter" is a good name, though. We have core.pager, which > is technically a filter as well, but for a specific purpose. Yeah, any name would have to be "archive filter" or similar. But I would think being under the "tar" section would be enough to disambiguate it. > And we have the tar.umask setting as a precedence for format specfic > config options. So how about tar.<extension>.compressor? > > [tar "tgz"] > compressor = gzip -cn > [tar "tar.gz"] > compressor = gzip -cn > [tar "tar.bz2"] > compressor = bzip2 -c My two complaints are: 1. The user has to repeat themselves in describing the command for multiple extensions. In practice, that's probably not a big deal, though. 2. The namespace for user-defined extensions is the same as the namespace for tar options. I guess we can disambiguate based on the number of dots (so, e.g., I know that "tar.umask" is not the umask extension, because it doesn't have a third component). It does limit us a little bit for adding future options. I don't know if it's worth caring about. We have the same problem with the diff.* namespace (e.g., diff.color.* exists, but is not a userdiff driver). In that case, besides the code being a little careful to be tolerant of the clash, I don't think it has been a problem. > We don't need a compressionlevels option here because we can simply > assume that the compressor commands do support them. But we discussed elsewhere the concept of a tar-to-7z filter. I'm not sure I'd call that a "compressor" as much as a filter. And it wouldn't want the compression-level options (or maybe you would; I don't use it, but skimming the manpage, it looks like you would want to convert -5 into "-mx=5"; so maybe you would want a wrapper script anyway). > (Side note: this is not fully true for bzip2, as it doesn't support > -0, but I don't think this is worth special consideration in our code, > as long as errors of the filter are displayed properly.) Yeah, I think that can be ignored. bzip can take care of complaining itself. > And we can also add a config option to restrict the formats creatable by > upload-archive, to address concerns over DoS attacks with expensive > compressors: > > [archive] > remoteFormats = tar zip tgz tar.gz Right. It does have the ad-hoc list syntax I complained about above, though. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC/PATCH 0/7] user-configurable git-archive output formats 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King ` (7 preceding siblings ...) 2011-06-18 14:52 ` [RFC/PATCH 0/7] user-configurable git-archive output formats René Scharfe @ 2011-06-18 15:40 ` René Scharfe 8 siblings, 0 replies; 56+ messages in thread From: René Scharfe @ 2011-06-18 15:40 UTC (permalink / raw) To: git; +Cc: Junio C Hamano, J.H., git, git-dev Am 16.06.2011 00:30, schrieb Jeff King: > 2. In theory you might want to plug in external helpers that are not > just stream filters, but actually their own container formats (like > zip). I think people who want 7zip would want this. > > But how does git-archive interact with the helper? By definition > the data it wants is the set of files, not a single stream. So > either: > > a. We give the helper a temporary exported checkout, and it > generates the stream from that. > > b. We use tar as the lingua franca of streaming file containers, > and let the helper deal with converting to its preferred > output format. > > Option (a) seems horribly inefficient on disk I/O. And if we did > want to do that, I think it's largely unrelated to this patch > series. > > You can actually do option (b) with this series. In its worst > case, you can do the same as (a): just untar into a temporary > directory and compress from there. But a well-written helper could > convert tar into the output format on the fly. Both can be done today, locally. One just needs to add a tar-to-7z/any program for 2.b). :) Currently the easiest way to apply LZMA compression would be to pipe --format=tar through the xz utils. This method is supported by your patches and the result can be read by 7-Zip. It should be good enough for most uses; I think we can disregard the whole point 2 until users start to ask for these other formats or use cases. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 18:18 ` [PATCH 2/2] archive: support gzipped tar files Jeff King 2011-06-14 19:25 ` J.H. 2011-06-14 19:39 ` René Scharfe @ 2011-06-14 20:30 ` Junio C Hamano 2011-06-14 20:49 ` Jeff King 2 siblings, 1 reply; 56+ messages in thread From: Junio C Hamano @ 2011-06-14 20:30 UTC (permalink / raw) To: Jeff King; +Cc: git, René Scharfe, git-dev Jeff King <peff@peff.net> writes: I didn't know it was that easy (primarily because I didn't know zlib had a ready-to-eat interface to do this). > + if (!strcasecmp(ext, "tgz")) > + return "--format=tgz"; > + if (!strcasecmp(ext, "gz") && > + ext - 4 >= filename && > + !strcasecmp(ext - 4, "tar.gz")) Shouldn't this be if (!strcasecmp(ext, "gz") && filename < ext - 5 && !strcasecmp(ext - 5, ".tar.gz")) to exclude "hellotar.gz" and possibly ".tar.gz" ("<=" vs "<")? > diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh > index cff1b3e..faf2784 100755 > --- a/t/t5000-tar-tree.sh > +++ b/t/t5000-tar-tree.sh > @@ -26,6 +26,7 @@ commit id embedding: > > . ./test-lib.sh > UNZIP=${UNZIP:-unzip} > +GUNZIP=${GUNZIP:-gunzip} Just a personal preference but I find myself using "gzip -d" more often than "gunzip". ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 20:30 ` [PATCH 2/2] archive: support gzipped tar files Junio C Hamano @ 2011-06-14 20:49 ` Jeff King 2011-06-14 23:40 ` Miles Bader 0 siblings, 1 reply; 56+ messages in thread From: Jeff King @ 2011-06-14 20:49 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, René Scharfe, git-dev On Tue, Jun 14, 2011 at 01:30:47PM -0700, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > I didn't know it was that easy (primarily because I didn't know zlib had a > ready-to-eat interface to do this). Yes, though I think it may be worth doing the more flexible, external-filters approach. See elsewhere in the thread. > > + if (!strcasecmp(ext, "tgz")) > > + return "--format=tgz"; > > + if (!strcasecmp(ext, "gz") && > > + ext - 4 >= filename && > > + !strcasecmp(ext - 4, "tar.gz")) > > Shouldn't this be > > if (!strcasecmp(ext, "gz") && filename < ext - 5 && > !strcasecmp(ext - 5, ".tar.gz")) > > to exclude "hellotar.gz" and possibly ".tar.gz" ("<=" vs "<")? Yeah, definitely. I think the right way forward is to make this less hard-coded, to allow people to configure new filters, so this bit of code will get rewritten in my next version. But I'll make sure it is a little more strict on extension matching. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 20:49 ` Jeff King @ 2011-06-14 23:40 ` Miles Bader 2011-06-15 22:46 ` Jeff King 0 siblings, 1 reply; 56+ messages in thread From: Miles Bader @ 2011-06-14 23:40 UTC (permalink / raw) To: Jeff King; +Cc: Junio C Hamano, git, René Scharfe, git-dev Jeff King <peff@github.com> writes: >> I didn't know it was that easy (primarily because I didn't know zlib had a >> ready-to-eat interface to do this). > > Yes, though I think it may be worth doing the more flexible, > external-filters approach. See elsewhere in the thread. Given the relatively trivial code, isn't it worth doing both...? One method for flexibility/multi-threaded-speed, the other for portability/robustness (doesn't depend on configuration / setup details)... -Miles -- "Though they may have different meanings, the cries of 'Yeeeee-haw!' and 'Allahu akbar!' are, in spirit, not actually all that different." ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH 2/2] archive: support gzipped tar files 2011-06-14 23:40 ` Miles Bader @ 2011-06-15 22:46 ` Jeff King 0 siblings, 0 replies; 56+ messages in thread From: Jeff King @ 2011-06-15 22:46 UTC (permalink / raw) To: Miles Bader; +Cc: Junio C Hamano, git, René Scharfe, git-dev On Wed, Jun 15, 2011 at 08:40:22AM +0900, Miles Bader wrote: > Jeff King <peff@github.com> writes: > >> I didn't know it was that easy (primarily because I didn't know zlib had a > >> ready-to-eat interface to do this). > > > > Yes, though I think it may be worth doing the more flexible, > > external-filters approach. See elsewhere in the thread. > > Given the relatively trivial code, isn't it worth doing both...? > > One method for flexibility/multi-threaded-speed, the other for > portability/robustness (doesn't depend on configuration / setup > details)... Maybe, although the code is a little less trivial than I hoped (see René's response for some bugs in my original series). My series allowing external filters via configuration also comes with builtin config for gzip. So there's no extra config or setup details for the user, assuming they can run "gzip" from their PATH. So I think the only disadvantage now is for people who don't have gzip at all. I suspect people on such platforms are going to want another format anyway, but we could still help them out. However, I think instead of building it specially into the archive code, it would be cleaner to simply ship a bare-bones version of gzip that only stdio (i.e., a "git-gzip"). It would be no more code than what the internal solution would be, it could be easier to read (since the program is self-contained), and it benefits from the SMP speedup. Although at the point we are shipping "git-gzip", I really have to wonder if people wouldn't prefer to just install gzip themselves. So I'm inclined to wait until somebody complains that git+zlib are easy to get on their system, but gzip isn't. -Peff ^ permalink raw reply [flat|nested] 56+ messages in thread
end of thread, other threads:[~2011-06-23 17:31 UTC | newest] Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-06-14 18:17 [PATCH 1/2] archive: factor out write phase of tar format Jeff King 2011-06-14 18:18 ` [PATCH 2/2] archive: support gzipped tar files Jeff King 2011-06-14 19:25 ` J.H. 2011-06-14 19:30 ` Jeff King 2011-06-14 19:39 ` René Scharfe 2011-06-14 20:14 ` Jeff King 2011-06-14 20:45 ` Jeff King 2011-06-15 22:30 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King 2011-06-15 22:31 ` [PATCH 1/7] archive: reorder option parsing and config reading Jeff King 2011-06-15 22:33 ` [PATCH 2/7] archive: add user-configurable tar-filter infrastructure Jeff King 2011-06-15 23:33 ` Junio C Hamano 2011-06-16 0:29 ` Jeff King 2011-06-15 22:33 ` [PATCH 3/7] archive: support user tar-filters via --format Jeff King 2011-06-15 22:33 ` [PATCH 4/7] archive: advertise user tar-filters in --list Jeff King 2011-06-15 22:34 ` [PATCH 5/7] archive: refactor format-guessing from filename Jeff King 2011-06-15 23:48 ` Junio C Hamano 2011-06-16 0:34 ` Jeff King 2011-06-15 22:34 ` [PATCH 6/7] archive: match extensions from user-configured formats Jeff King 2011-06-15 22:35 ` [PATCH 7/7] archive: provide builtin .tar.gz filter Jeff King 2011-06-15 23:55 ` Junio C Hamano 2011-06-15 23:57 ` Junio C Hamano 2011-06-16 0:38 ` Jeff King 2011-06-16 6:27 ` Junio C Hamano 2011-06-16 6:51 ` Jeff King 2011-06-16 7:56 ` Chris Webb 2011-06-16 17:46 ` Jeff King 2011-06-16 18:02 ` Junio C Hamano 2011-06-16 18:21 ` Jeff King 2011-06-16 18:27 ` John Szakmeister 2011-06-16 18:42 ` Junio C Hamano 2011-06-16 18:57 ` Jeff King 2011-06-18 14:52 ` [RFC/PATCH 0/7] user-configurable git-archive output formats René Scharfe 2011-06-18 15:28 ` Jakub Narebski 2011-06-20 15:58 ` Junio C Hamano 2011-06-22 1:19 ` [PATCHv2 0/9] configurable tar compressors Jeff King 2011-06-22 1:20 ` [PATCHv2 1/9] archive: reorder option parsing and config reading Jeff King 2011-06-22 1:22 ` [PATCHv2 2/9] archive-tar: don't reload default config options Jeff King 2011-06-22 1:23 ` [PATCHv2 3/9] archive: refactor list of archive formats Jeff King 2011-06-23 17:05 ` Thiago Farina 2011-06-23 17:30 ` Jeff King 2011-06-22 1:24 ` [PATCHv2 4/9] archive: pass archiver struct to write_archive callback Jeff King 2011-06-22 1:24 ` [PATCHv2 5/9] archive: move file extension format-guessing lower Jeff King 2011-06-22 1:25 ` [PATCHv2 6/9] archive: refactor file extension format-guessing Jeff King 2011-06-22 1:26 ` [PATCHv2 7/9] archive: implement configurable tar filters Jeff King 2011-06-22 1:45 ` Jeff King 2011-06-22 6:09 ` René Scharfe 2011-06-22 14:59 ` Jeff King 2011-06-22 1:27 ` [PATCHv2 8/9] archive: provide builtin .tar.gz filter Jeff King 2011-06-22 1:35 ` [PATCHv2 9/9] upload-archive: allow user to turn off filters Jeff King 2011-06-22 3:17 ` Jeff King 2011-06-21 16:01 ` [RFC/PATCH 0/7] user-configurable git-archive output formats Jeff King 2011-06-18 15:40 ` René Scharfe 2011-06-14 20:30 ` [PATCH 2/2] archive: support gzipped tar files Junio C Hamano 2011-06-14 20:49 ` Jeff King 2011-06-14 23:40 ` Miles Bader 2011-06-15 22:46 ` Jeff King
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.