All of lore.kernel.org
 help / color / mirror / Atom feed
* zip files created with git archive flags text files as binaries
@ 2015-02-23 13:58 Ulrike Fischer
  2015-02-23 19:30 ` René Scharfe
  2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe
  0 siblings, 2 replies; 8+ messages in thread
From: Ulrike Fischer @ 2015-02-23 13:58 UTC (permalink / raw)
  To: git

I'm using git on windows 7.

$ git --version
git version 1.9.4.msysgit.0

Some days ago I uploaded a latex package to CTAN (www.ctan.org). 
I created the zip-file with 

git archive --format=zip --prefix=citeall/
--output=zip/citeall_2015-02-20.zip HEAD 

The zip contained four text files and a pdf. 

The CTAN maintainers informed me that all files in the zip are
flagged as binaries and this makes it difficult for them to process
them further (they want to correct line feeds of the text files:
http://mirror.ctan.org/tex-archive/help/ctan/CTAN-upload-addendum.html#crlf) 

unzip -Z reports for my zip:

$ unzip -Z citeall_2015_02_20.zip
Archive:  citeall_2015_02_20.zip
Zip file size: 105509 bytes, number of entries: 6
drwx---     0.0 fat        0 bx stor 15-Feb-20 17:07 citeall/
-rw----     0.0 fat      458 bx defN 15-Feb-20 17:07 citeall/README
-rw----     0.0 fat   102244 bx defN 15-Feb-20 17:07
citeall/citeall.pdf
-rw----     0.0 fat     3431 bx defN 15-Feb-20 17:07
citeall/citeall.sty
-rw----     0.0 fat     3971 bx defN 15-Feb-20 17:07
citeall/citeall.tex
-rw----     0.0 fat      557 bx defN 15-Feb-20 17:07
citeall/examples-citeall.bib
6 files, 110661 bytes uncompressed, 104669 bytes compressed:  5.4%

The problem are all the "bx" entries. 

When I zip all the files with the standard windows zip-tool I get
this:

$ unzip -Z citeall-win.zip
Archive:  citeall-win.zip
Zip file size: 105275 bytes, number of entries: 5
-rw----     2.0 fat   102244 b- defN 15-Feb-20 17:07
citeall/citeall.pdf
-rw----     2.0 fat     3431 t- defN 15-Feb-20 17:07
citeall/citeall.sty
-rw----     2.0 fat     3971 t- defN 15-Feb-20 17:07
citeall/citeall.tex
-rw----     2.0 fat      557 t- defN 15-Feb-20 17:07
citeall/examples-citeall.bib
-rw----     2.0 fat      458 t- defN 15-Feb-20 17:07 citeall/README
5 files, 110661 bytes uncompressed, 104675 bytes compressed:  5.4%

Here the text files have a correct t flag. 

I don't know if it the problem exists also with zips created with
git archive on non-windows OS.

Would it be possible to correct the zip-backend so that it flags
text files correctly? Or alternativly could one configure git
archive to use another zip programm? 

(I tried to sent this already some hours ago, but the message seems
to be lost ...)

-- 
Ulrike Fischer 
http://www.troubleshooting-tex.de/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zip files created with git archive flags text files as binaries
  2015-02-23 13:58 zip files created with git archive flags text files as binaries Ulrike Fischer
@ 2015-02-23 19:30 ` René Scharfe
  2015-03-04 21:13   ` René Scharfe
  2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe
  1 sibling, 1 reply; 8+ messages in thread
From: René Scharfe @ 2015-02-23 19:30 UTC (permalink / raw)
  To: luatex, git

Am 23.02.2015 um 14:58 schrieb Ulrike Fischer:
> I'm using git on windows 7.
>
> $ git --version
> git version 1.9.4.msysgit.0

Git's code for ZIP file creation hasn't changed since then.

> Some days ago I uploaded a latex package to CTAN (www.ctan.org).
> I created the zip-file with
>
> git archive --format=zip --prefix=citeall/
> --output=zip/citeall_2015-02-20.zip HEAD
>
> The zip contained four text files and a pdf.
>
> The CTAN maintainers informed me that all files in the zip are
> flagged as binaries and this makes it difficult for them to process
> them further (they want to correct line feeds of the text files:
> http://mirror.ctan.org/tex-archive/help/ctan/CTAN-upload-addendum.html#crlf)

I wouldn't have expected that this ZIP file attribute is actually used 
in the wild.

> unzip -Z reports for my zip:
>
> $ unzip -Z citeall_2015_02_20.zip
> Archive:  citeall_2015_02_20.zip
> Zip file size: 105509 bytes, number of entries: 6
> drwx---     0.0 fat        0 bx stor 15-Feb-20 17:07 citeall/
> -rw----     0.0 fat      458 bx defN 15-Feb-20 17:07 citeall/README
> -rw----     0.0 fat   102244 bx defN 15-Feb-20 17:07
> citeall/citeall.pdf
> -rw----     0.0 fat     3431 bx defN 15-Feb-20 17:07
> citeall/citeall.sty
> -rw----     0.0 fat     3971 bx defN 15-Feb-20 17:07
> citeall/citeall.tex
> -rw----     0.0 fat      557 bx defN 15-Feb-20 17:07
> citeall/examples-citeall.bib
> 6 files, 110661 bytes uncompressed, 104669 bytes compressed:  5.4%
>
> The problem are all the "bx" entries.

The "x" is due to the extra mtime header and unrelated to your issue.

"b" (binary) is set unconditionally to ensure that line endings are kept 
intact by unpackers on all platforms.

"fat" is used as lowest common denominator for regular files and 
directories; "unx" is used for symbolic links.

> When I zip all the files with the standard windows zip-tool I get
> this:
>
> $ unzip -Z citeall-win.zip
> Archive:  citeall-win.zip
> Zip file size: 105275 bytes, number of entries: 5
> -rw----     2.0 fat   102244 b- defN 15-Feb-20 17:07
> citeall/citeall.pdf
> -rw----     2.0 fat     3431 t- defN 15-Feb-20 17:07
> citeall/citeall.sty
> -rw----     2.0 fat     3971 t- defN 15-Feb-20 17:07
> citeall/citeall.tex
> -rw----     2.0 fat      557 t- defN 15-Feb-20 17:07
> citeall/examples-citeall.bib
> -rw----     2.0 fat      458 t- defN 15-Feb-20 17:07 citeall/README
> 5 files, 110661 bytes uncompressed, 104675 bytes compressed:  5.4%
>
> Here the text files have a correct t flag.
>
> I don't know if it the problem exists also with zips created with
> git archive on non-windows OS.

Yes, git archive creates the same ZIP files everywhere.

> Would it be possible to correct the zip-backend so that it flags
> text files correctly? Or alternativly could one configure git
> archive to use another zip programm?

We would have to detect the line ending format (DOS, Unix, Macintosh, 
etc.) of each file, then set the attribute "t" (text) and the host 
system.  The detection would slow down archive creation a bit and the 
resulting files would be different, of course, so this feature should 
only be enabled by a new command line option.  I'll take a look.

René

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zip files created with git archive flags text files as binaries
  2015-02-23 19:30 ` René Scharfe
@ 2015-03-04 21:13   ` René Scharfe
  0 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2015-03-04 21:13 UTC (permalink / raw)
  To: luatex, git

Am 23.02.2015 um 20:30 schrieb René Scharfe:
> Am 23.02.2015 um 14:58 schrieb Ulrike Fischer:
>> The zip contained four text files and a pdf.
>>
>> The CTAN maintainers informed me that all files in the zip are
>> flagged as binaries and this makes it difficult for them to process
>> them further (they want to correct line feeds of the text files:
>> http://mirror.ctan.org/tex-archive/help/ctan/CTAN-upload-addendum.html#crlf)

By the way, a workaround for CTAN could be to extract the files using 
unzip and zip them again using Info-ZIP zip (one of the "good zip 
programs" mentioned on that website).  The result will be a ZIP file 
with text files flagged appropriately.  Just saying.

>> Would it be possible to correct the zip-backend so that it flags
>> text files correctly? Or alternativly could one configure git
>> archive to use another zip programm?
>
> We would have to detect the line ending format (DOS, Unix, Macintosh,
> etc.) of each file, then set the attribute "t" (text) and the host
> system.  The detection would slow down archive creation a bit and the
> resulting files would be different, of course, so this feature should
> only be enabled by a new command line option.  I'll take a look.

Actually its easier than that.  unzip -a (with end-of-line conversion) 
doesn't care for the actual format, it simply converts all occurrences 
of CR, CRLF and LF to the appropriate newline chars for the platform it 
is running on if the text flag is set.  Files with mixed line endings 
are normalized, i.e. they have a consistent end-of-line style afterward.

I'll send a first patch shortly.

René

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] archive-zip: add --text parameter
  2015-02-23 13:58 zip files created with git archive flags text files as binaries Ulrike Fischer
  2015-02-23 19:30 ` René Scharfe
@ 2015-03-04 21:13 ` René Scharfe
  2015-03-05  2:16   ` Junio C Hamano
                     ` (2 more replies)
  1 sibling, 3 replies; 8+ messages in thread
From: René Scharfe @ 2015-03-04 21:13 UTC (permalink / raw)
  To: luatex, git

Entries in a ZIP file can be marked as text files.  Extractors can use
that flag to apply end-of-line conversions.  An example is unzip -a.

git archive currently marks all ZIP file entries as binary files.  This
patch adds the new option --text that can be used to mark non-binary
files or all files as text files, thus enabling the use of unzip -a.

No sign-off, yet, because I'm not sure we really need another option.
E.g. --text=all doesn't seem to be actually useful, but it was easy to
implement.  Info-ZIP's zip always creates archives like --text=auto
does, so perhaps we should make that our default behavior as well?

Changing the default behavior would cause newer versions of git archive
to create different ZIP files than older ones, of course.  This can
break caching and signature checking.

The last time we did that was in 2012 when we added an extended mtime
field (227bf5980), I think.  I don't remember any fallout from that
change, but there was a recent discussion about the stability of
generated tar files, so I'm a bit cautious:

    http://thread.gmane.org/gmane.comp.version-control.git/258516

---
 Documentation/git-archive.txt |  5 ++++
 archive-zip.c                 | 23 ++++++++++++++----
 archive.c                     | 18 ++++++++++++++
 archive.h                     |  7 ++++++
 t/t5003-archive-zip.sh        | 56 +++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 105 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-archive.txt b/Documentation/git-archive.txt
index cfa1e4e..684ca36 100644
--- a/Documentation/git-archive.txt
+++ b/Documentation/git-archive.txt
@@ -93,6 +93,11 @@ zip
 	Highest and slowest compression level.  You can specify any
 	number from 1 to 9 to adjust compression speed and ratio.
 
+--text=<which>::
+	Mark the specfied entries as text files so that `unzip -a`
+	converts end-of-line characters while extracting. The value
+	must be either 'all', 'auto', or 'none' (the default).
+
 
 CONFIGURATION
 -------------
diff --git a/archive-zip.c b/archive-zip.c
index 4bde019..3767940 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -5,6 +5,7 @@
 #include "archive.h"
 #include "streaming.h"
 #include "utf8.h"
+#include "xdiff-interface.h"
 
 static int zip_date;
 static int zip_time;
@@ -210,6 +211,7 @@ static int write_zip_entry(struct archiver_args *args,
 	struct git_istream *stream = NULL;
 	unsigned long flags = 0;
 	unsigned long size;
+	int is_binary = -1;
 
 	crc = crc32(0, NULL, 0);
 
@@ -238,8 +240,14 @@ static int write_zip_entry(struct archiver_args *args,
 		method = 0;
 		attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
 			(mode & 0111) ? ((mode) << 16) : 0;
-		if (S_ISREG(mode) && args->compression_level != 0 && size > 0)
-			method = 8;
+		if (S_ISREG(mode)) {
+			if (args->compression_level != 0 && size > 0)
+				method = 8;
+			if (args->text == ARCHIVE_TEXT_ALL)
+				is_binary = 0;
+			else if (args->text == ARCHIVE_TEXT_NONE)
+				is_binary = 1;
+		}
 
 		if (S_ISREG(mode) && type == OBJ_BLOB && !args->convert &&
 		    size > big_file_threshold) {
@@ -256,6 +264,8 @@ static int write_zip_entry(struct archiver_args *args,
 				return error("cannot read %s",
 					     sha1_to_hex(sha1));
 			crc = crc32(crc, buffer, size);
+			if (is_binary < 0)
+				is_binary = buffer_is_binary(buffer, size);
 			out = buffer;
 		}
 		compressed_size = (method == 0) ? size : 0;
@@ -300,7 +310,6 @@ static int write_zip_entry(struct archiver_args *args,
 	copy_le16(dirent.extra_length, ZIP_EXTRA_MTIME_SIZE);
 	copy_le16(dirent.comment_length, 0);
 	copy_le16(dirent.disk, 0);
-	copy_le16(dirent.attr1, 0);
 	copy_le32(dirent.attr2, attr2);
 	copy_le32(dirent.offset, zip_offset);
 
@@ -328,6 +337,8 @@ static int write_zip_entry(struct archiver_args *args,
 			if (readlen <= 0)
 				break;
 			crc = crc32(crc, buf, readlen);
+			if (is_binary < 0)
+				is_binary = buffer_is_binary(buffer, size);
 			write_or_die(1, buf, readlen);
 		}
 		close_istream(stream);
@@ -361,6 +372,8 @@ static int write_zip_entry(struct archiver_args *args,
 			if (readlen <= 0)
 				break;
 			crc = crc32(crc, buf, readlen);
+			if (is_binary < 0)
+				is_binary = buffer_is_binary(buffer, size);
 
 			zstream.next_in = buf;
 			zstream.avail_in = readlen;
@@ -405,6 +418,8 @@ static int write_zip_entry(struct archiver_args *args,
 	free(deflated);
 	free(buffer);
 
+	copy_le16(dirent.attr1, !is_binary);
+
 	memcpy(zip_dir + zip_dir_offset, &dirent, ZIP_DIR_HEADER_SIZE);
 	zip_dir_offset += ZIP_DIR_HEADER_SIZE;
 	memcpy(zip_dir + zip_dir_offset, path, pathlen);
@@ -466,7 +481,7 @@ static int write_zip_archive(const struct archiver *ar,
 static struct archiver zip_archiver = {
 	"zip",
 	write_zip_archive,
-	ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE
+	ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE|ARCHIVER_TEXT_ATTRIBUTE
 };
 
 void init_zip_archiver(void)
diff --git a/archive.c b/archive.c
index 96057ed..89bd23d 100644
--- a/archive.c
+++ b/archive.c
@@ -417,6 +417,7 @@ static int parse_archive_args(int argc, const char **argv,
 	const char *remote = NULL;
 	const char *exec = NULL;
 	const char *output = NULL;
+	const char *text = NULL;
 	int compression_level = -1;
 	int verbose = 0;
 	int i;
@@ -442,6 +443,8 @@ static int parse_archive_args(int argc, const char **argv,
 		OPT__COMPR_HIDDEN('7', &compression_level, 7),
 		OPT__COMPR_HIDDEN('8', &compression_level, 8),
 		OPT__COMPR('9', &compression_level, N_("compress better"), 9),
+		OPT_STRING(0, "text", &text, N_("which"),
+			N_("specify which files contain text")),
 		OPT_GROUP(""),
 		OPT_BOOL('l', "list", &list,
 			N_("list supported archive formats")),
@@ -493,6 +496,21 @@ static int parse_archive_args(int argc, const char **argv,
 					format, compression_level);
 		}
 	}
+	args->text = ARCHIVE_TEXT_NONE;
+	if (text) {
+		if (!strcmp(text, "auto"))
+			args->text = ARCHIVE_TEXT_AUTO;
+		else if (!strcmp(text, "all"))
+			args->text = ARCHIVE_TEXT_ALL;
+		else if (!strcmp(text, "none"))
+			args->text = ARCHIVE_TEXT_NONE;
+		else
+			die("Unknown argument: --text=%s", text);
+		if (args->text != ARCHIVE_TEXT_NONE &&
+		    !((*ar)->flags & ARCHIVER_TEXT_ATTRIBUTE))
+			die("Argument not supported for format '%s': --text=%s",
+			    format, text);
+	}
 	args->verbose = verbose;
 	args->base = base;
 	args->baselen = strlen(base);
diff --git a/archive.h b/archive.h
index 4a791e1..eabcd11 100644
--- a/archive.h
+++ b/archive.h
@@ -14,11 +14,18 @@ struct archiver_args {
 	unsigned int verbose : 1;
 	unsigned int worktree_attributes : 1;
 	unsigned int convert : 1;
+	unsigned int text : 2;
 	int compression_level;
 };
 
 #define ARCHIVER_WANT_COMPRESSION_LEVELS 1
 #define ARCHIVER_REMOTE 2
+#define ARCHIVER_TEXT_ATTRIBUTE 4
+
+#define ARCHIVE_TEXT_NONE	0
+#define ARCHIVE_TEXT_ALL	1
+#define ARCHIVE_TEXT_AUTO	2
+
 struct archiver {
 	const char *name;
 	int (*write_archive)(const struct archiver *, struct archiver_args *);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index c929db5..4e49aad 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -35,12 +35,56 @@ check_zip() {
 	"
 }
 
+zip_text() {
+	option=$1
+	zipfile=text_$1.zip
+	dir=text_$1
+
+	test_expect_success "git archive --format-zip --text=$option" "
+		git archive --format=zip --text=$option HEAD >$zipfile
+	"
+
+	test_expect_success UNZIP " extract ZIP archive with EOL conversion" '
+		(mkdir $dir && cd $dir && "$GIT_UNZIP" -a ../$zipfile)
+	'
+}
+
+check_text_converted() {
+	dir=text_$1
+	filetype=$2
+	extracted=$dir/a/$filetype
+
+	test_expect_success " validate that $filetype files are converted" "
+		test_cmp_bin $extracted.cr $extracted.crlf &&
+		test_cmp_bin $extracted.cr $extracted.lf
+	"
+}
+
+check_text_verbatim() {
+	dir=text_$1
+	filetype=$2
+	extracted=$dir/a/$filetype
+	original=a/$filetype
+
+	test_expect_success " validate that $filetype files are unchanged" "
+		test_cmp_bin $original.cr   $extracted.cr &&
+		test_cmp_bin $original.crlf $extracted.crlf &&
+		test_cmp_bin $original.lf   $extracted.lf
+	"
+}
+
 test_expect_success \
     'populate workdir' \
     'mkdir a &&
      echo simple textfile >a/a &&
      mkdir a/bin &&
      cp /bin/sh a/bin &&
+     printf "text\r"   >a/text.cr &&
+     printf "text\r\n" >a/text.crlf &&
+     printf "text\n"   >a/text.lf &&
+     printf "\0\r"   >a/binary.cr &&
+     printf "\0\r\n" >a/binary.crlf &&
+     printf "\0\n"   >a/binary.lf &&
      printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 &&
      printf "A not substituted O" >a/substfile2 &&
      (p=long_path_to_a_file && cd a &&
@@ -124,4 +168,16 @@ test_expect_success 'git archive --format=zip on large files' '
 
 check_zip large-compressed
 
+zip_text all
+check_text_converted all text
+check_text_converted all binary
+
+zip_text auto
+check_text_converted auto text
+check_text_verbatim auto binary
+
+zip_text none
+check_text_verbatim none text
+check_text_verbatim none binary
+
 test_done
-- 
2.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive-zip: add --text parameter
  2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe
@ 2015-03-05  2:16   ` Junio C Hamano
  2015-03-05 15:27     ` René Scharfe
  2015-03-05 15:27   ` René Scharfe
  2015-03-05 19:06   ` [PATCH v2] archive-zip: mark text files in archives René Scharfe
  2 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2015-03-05  2:16 UTC (permalink / raw)
  To: René Scharfe; +Cc: luatex, git

René Scharfe <l.s.r@web.de> writes:

> No sign-off, yet, because I'm not sure we really need another option.
> E.g. --text=all doesn't seem to be actually useful, but it was easy to
> implement.  Info-ZIP's zip always creates archives like --text=auto
> does, so perhaps we should make that our default behavior as well?

My knee-jerk reaction is "yeah, why not? what are the downsides,
other than the result will not be bit-for-bit identical to the
output from older Git".  I am sure I am missing something as I do
not regularly use this format.

> @@ -256,6 +264,8 @@ static int write_zip_entry(struct archiver_args *args,
>  				return error("cannot read %s",
>  					     sha1_to_hex(sha1));
>  			crc = crc32(crc, buffer, size);
> +			if (is_binary < 0)
> +				is_binary = buffer_is_binary(buffer, size);

In this codepath, do you have the path of the thing the buffer
contents came from?  I am wondering if consulting the attributes
system is a better idea. Anything that is explicitly marked as
"binary" or "-diff" is definitely binary, and anything that is not
marked as "binary" is text to us for all practical purposes, no?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive-zip: add --text parameter
  2015-03-05  2:16   ` Junio C Hamano
@ 2015-03-05 15:27     ` René Scharfe
  0 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2015-03-05 15:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: luatex, git

Am 05.03.2015 um 03:16 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
>
>> No sign-off, yet, because I'm not sure we really need another option.
>> E.g. --text=all doesn't seem to be actually useful, but it was easy to
>> implement.  Info-ZIP's zip always creates archives like --text=auto
>> does, so perhaps we should make that our default behavior as well?
>
> My knee-jerk reaction is "yeah, why not? what are the downsides,
> other than the result will not be bit-for-bit identical to the
> output from older Git".  I am sure I am missing something as I do
> not regularly use this format.

AFAICS there won't be any other downsides.  And archive stability is 
harder to achieve for ZIP anyway because it depends on compression level 
and (more fundamentally) on libz version.

>> @@ -256,6 +264,8 @@ static int write_zip_entry(struct archiver_args *args,
>>   				return error("cannot read %s",
>>   					     sha1_to_hex(sha1));
>>   			crc = crc32(crc, buffer, size);
>> +			if (is_binary < 0)
>> +				is_binary = buffer_is_binary(buffer, size);
>
> In this codepath, do you have the path of the thing the buffer
> contents came from?  I am wondering if consulting the attributes
> system is a better idea. Anything that is explicitly marked as
> "binary" or "-diff" is definitely binary, and anything that is not
> marked as "binary" is text to us for all practical purposes, no?

Yes, attributes can help, especially to allow users to correct wrong 
guesses of the heuristic.  Offering automatic detection of binary files 
by default like git diff and git grep is still a good idea, I think. 
buffer_is_binary() doesn't add a lot of overhead since it only looks at 
the first few bytes of the buffer.

René

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] archive-zip: add --text parameter
  2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe
  2015-03-05  2:16   ` Junio C Hamano
@ 2015-03-05 15:27   ` René Scharfe
  2015-03-05 19:06   ` [PATCH v2] archive-zip: mark text files in archives René Scharfe
  2 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2015-03-05 15:27 UTC (permalink / raw)
  To: luatex, git

Am 04.03.2015 um 22:13 schrieb René Scharfe:
> diff --git a/archive-zip.c b/archive-zip.c
> index 4bde019..3767940 100644
> --- a/archive-zip.c
> +++ b/archive-zip.c
> @@ -5,6 +5,7 @@
>   #include "archive.h"
>   #include "streaming.h"
>   #include "utf8.h"
> +#include "xdiff-interface.h"
>
>   static int zip_date;
>   static int zip_time;
> @@ -210,6 +211,7 @@ static int write_zip_entry(struct archiver_args *args,
>   	struct git_istream *stream = NULL;
>   	unsigned long flags = 0;
>   	unsigned long size;
> +	int is_binary = -1;
>
>   	crc = crc32(0, NULL, 0);
>
> @@ -238,8 +240,14 @@ static int write_zip_entry(struct archiver_args *args,
>   		method = 0;
>   		attr2 = S_ISLNK(mode) ? ((mode | 0777) << 16) :
>   			(mode & 0111) ? ((mode) << 16) : 0;
> -		if (S_ISREG(mode) && args->compression_level != 0 && size > 0)
> -			method = 8;
> +		if (S_ISREG(mode)) {
> +			if (args->compression_level != 0 && size > 0)
> +				method = 8;
> +			if (args->text == ARCHIVE_TEXT_ALL)
> +				is_binary = 0;
> +			else if (args->text == ARCHIVE_TEXT_NONE)
> +				is_binary = 1;
> +		}
>
>   		if (S_ISREG(mode) && type == OBJ_BLOB && !args->convert &&
>   		    size > big_file_threshold) {
> @@ -256,6 +264,8 @@ static int write_zip_entry(struct archiver_args *args,
>   				return error("cannot read %s",
>   					     sha1_to_hex(sha1));
>   			crc = crc32(crc, buffer, size);
> +			if (is_binary < 0)
> +				is_binary = buffer_is_binary(buffer, size);
>   			out = buffer;
>   		}
>   		compressed_size = (method == 0) ? size : 0;
> @@ -300,7 +310,6 @@ static int write_zip_entry(struct archiver_args *args,
>   	copy_le16(dirent.extra_length, ZIP_EXTRA_MTIME_SIZE);
>   	copy_le16(dirent.comment_length, 0);
>   	copy_le16(dirent.disk, 0);
> -	copy_le16(dirent.attr1, 0);
>   	copy_le32(dirent.attr2, attr2);
>   	copy_le32(dirent.offset, zip_offset);
>
> @@ -328,6 +337,8 @@ static int write_zip_entry(struct archiver_args *args,
>   			if (readlen <= 0)
>   				break;
>   			crc = crc32(crc, buf, readlen);
> +			if (is_binary < 0)
> +				is_binary = buffer_is_binary(buffer, size);

buffer is NULL here, so this crashes.  buf and readlen have to be used 
instead.  This code path is only used for entries that are too big to be 
compressed in one go.

>   			write_or_die(1, buf, readlen);
>   		}
>   		close_istream(stream);
> @@ -361,6 +372,8 @@ static int write_zip_entry(struct archiver_args *args,
>   			if (readlen <= 0)
>   				break;
>   			crc = crc32(crc, buf, readlen);
> +			if (is_binary < 0)
> +				is_binary = buffer_is_binary(buffer, size);

Same here.

>   			zstream.next_in = buf;
>   			zstream.avail_in = readlen;
> @@ -405,6 +418,8 @@ static int write_zip_entry(struct archiver_args *args,
>   	free(deflated);
>   	free(buffer);
>
> +	copy_le16(dirent.attr1, !is_binary);
> +
>   	memcpy(zip_dir + zip_dir_offset, &dirent, ZIP_DIR_HEADER_SIZE);
>   	zip_dir_offset += ZIP_DIR_HEADER_SIZE;
>   	memcpy(zip_dir + zip_dir_offset, path, pathlen);
> @@ -466,7 +481,7 @@ static int write_zip_archive(const struct archiver *ar,
>   static struct archiver zip_archiver = {
>   	"zip",
>   	write_zip_archive,
> -	ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE
> +	ARCHIVER_WANT_COMPRESSION_LEVELS|ARCHIVER_REMOTE|ARCHIVER_TEXT_ATTRIBUTE
>   };
>
>   void init_zip_archiver(void)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2] archive-zip: mark text files in archives
  2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe
  2015-03-05  2:16   ` Junio C Hamano
  2015-03-05 15:27   ` René Scharfe
@ 2015-03-05 19:06   ` René Scharfe
  2 siblings, 0 replies; 8+ messages in thread
From: René Scharfe @ 2015-03-05 19:06 UTC (permalink / raw)
  To: Ulrike Fischer, git; +Cc: Junio C Hamano

Set the text flag for ZIP archive entries that look like text files so
that unzip -a can be used to perform end-of-line conversions.  Info-ZIP
zip does the same.

Detect binary files the same way as git diff and git grep do, namely by
checking for the attribute "diff" and its negation "-diff", and if none
is found by falling back to checking for the presence of NUL bytes in
the first few bytes of the file contents.

7-Zip, Windows' built-in ZIP functionality and Info-ZIP unzip without
the switch -a are not affected by the change and still extract text
files without doing any end-of-line conversions.

NB: The actual end-of-line style used in the archive entries doesn't
matter to unzip -a, as it converts any CR, CRLF and LF to the line end
characters appropriate for the platform it is running on.

Suggested-by: Ulrike Fischer <luatex@nililand.de>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
 archive-zip.c          | 25 ++++++++++++++++++++++++-
 t/t5003-archive-zip.sh | 47 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 4bde019..0f9e87f 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -5,6 +5,8 @@
 #include "archive.h"
 #include "streaming.h"
 #include "utf8.h"
+#include "userdiff.h"
+#include "xdiff-interface.h"
 
 static int zip_date;
 static int zip_time;
@@ -189,6 +191,16 @@ static int has_only_ascii(const char *s)
 	}
 }
 
+static int entry_is_binary(const char *path, const void *buffer, size_t size)
+{
+	struct userdiff_driver *driver = userdiff_find_by_path(path);
+	if (!driver)
+		driver = userdiff_find_by_name("default");
+	if (driver->binary != -1)
+		return driver->binary;
+	return buffer_is_binary(buffer, size);
+}
+
 #define STREAM_BUFFER_SIZE (1024 * 16)
 
 static int write_zip_entry(struct archiver_args *args,
@@ -210,6 +222,8 @@ static int write_zip_entry(struct archiver_args *args,
 	struct git_istream *stream = NULL;
 	unsigned long flags = 0;
 	unsigned long size;
+	int is_binary = -1;
+	const char *path_without_prefix = path + args->baselen;
 
 	crc = crc32(0, NULL, 0);
 
@@ -256,6 +270,8 @@ static int write_zip_entry(struct archiver_args *args,
 				return error("cannot read %s",
 					     sha1_to_hex(sha1));
 			crc = crc32(crc, buffer, size);
+			is_binary = entry_is_binary(path_without_prefix,
+						    buffer, size);
 			out = buffer;
 		}
 		compressed_size = (method == 0) ? size : 0;
@@ -300,7 +316,6 @@ static int write_zip_entry(struct archiver_args *args,
 	copy_le16(dirent.extra_length, ZIP_EXTRA_MTIME_SIZE);
 	copy_le16(dirent.comment_length, 0);
 	copy_le16(dirent.disk, 0);
-	copy_le16(dirent.attr1, 0);
 	copy_le32(dirent.attr2, attr2);
 	copy_le32(dirent.offset, zip_offset);
 
@@ -328,6 +343,9 @@ static int write_zip_entry(struct archiver_args *args,
 			if (readlen <= 0)
 				break;
 			crc = crc32(crc, buf, readlen);
+			if (is_binary == -1)
+				is_binary = entry_is_binary(path_without_prefix,
+							    buf, readlen);
 			write_or_die(1, buf, readlen);
 		}
 		close_istream(stream);
@@ -361,6 +379,9 @@ static int write_zip_entry(struct archiver_args *args,
 			if (readlen <= 0)
 				break;
 			crc = crc32(crc, buf, readlen);
+			if (is_binary == -1)
+				is_binary = entry_is_binary(path_without_prefix,
+							    buf, readlen);
 
 			zstream.next_in = buf;
 			zstream.avail_in = readlen;
@@ -405,6 +426,8 @@ static int write_zip_entry(struct archiver_args *args,
 	free(deflated);
 	free(buffer);
 
+	copy_le16(dirent.attr1, !is_binary);
+
 	memcpy(zip_dir + zip_dir_offset, &dirent, ZIP_DIR_HEADER_SIZE);
 	zip_dir_offset += ZIP_DIR_HEADER_SIZE;
 	memcpy(zip_dir + zip_dir_offset, path, pathlen);
diff --git a/t/t5003-archive-zip.sh b/t/t5003-archive-zip.sh
index c929db5..14744b2 100755
--- a/t/t5003-archive-zip.sh
+++ b/t/t5003-archive-zip.sh
@@ -33,6 +33,37 @@ check_zip() {
 	test_expect_success UNZIP " validate file contents" "
 		diff -r a ${dir_with_prefix}a
 	"
+
+	dir=eol_$1
+	dir_with_prefix=$dir/$2
+	extracted=${dir_with_prefix}a
+	original=a
+
+	test_expect_success UNZIP " extract ZIP archive with EOL conversion" '
+		(mkdir $dir && cd $dir && "$GIT_UNZIP" -a ../$zipfile)
+	'
+
+	test_expect_success UNZIP " validate that text files are converted" "
+		test_cmp_bin $extracted/text.cr $extracted/text.crlf &&
+		test_cmp_bin $extracted/text.cr $extracted/text.lf
+	"
+
+	test_expect_success UNZIP " validate that binary files are unchanged" "
+		test_cmp_bin $original/binary.cr   $extracted/binary.cr &&
+		test_cmp_bin $original/binary.crlf $extracted/binary.crlf &&
+		test_cmp_bin $original/binary.lf   $extracted/binary.lf
+	"
+
+	test_expect_success UNZIP " validate that diff files are converted" "
+		test_cmp_bin $extracted/diff.cr $extracted/diff.crlf &&
+		test_cmp_bin $extracted/diff.cr $extracted/diff.lf
+	"
+
+	test_expect_success UNZIP " validate that -diff files are unchanged" "
+		test_cmp_bin $original/nodiff.cr   $extracted/nodiff.cr &&
+		test_cmp_bin $original/nodiff.crlf $extracted/nodiff.crlf &&
+		test_cmp_bin $original/nodiff.lf   $extracted/nodiff.lf
+	"
 }
 
 test_expect_success \
@@ -41,6 +72,18 @@ test_expect_success \
      echo simple textfile >a/a &&
      mkdir a/bin &&
      cp /bin/sh a/bin &&
+     printf "text\r"	>a/text.cr &&
+     printf "text\r\n"	>a/text.crlf &&
+     printf "text\n"	>a/text.lf &&
+     printf "text\r"	>a/nodiff.cr &&
+     printf "text\r\n"	>a/nodiff.crlf &&
+     printf "text\n"	>a/nodiff.lf &&
+     printf "\0\r"	>a/binary.cr &&
+     printf "\0\r\n"	>a/binary.crlf &&
+     printf "\0\n"	>a/binary.lf &&
+     printf "\0\r"	>a/diff.cr &&
+     printf "\0\r\n"	>a/diff.crlf &&
+     printf "\0\n"	>a/diff.lf &&
      printf "A\$Format:%s\$O" "$SUBSTFORMAT" >a/substfile1 &&
      printf "A not substituted O" >a/substfile2 &&
      (p=long_path_to_a_file && cd a &&
@@ -66,7 +109,9 @@ test_expect_success 'add files to repository' '
 	GIT_COMMITTER_DATE="2005-05-27 22:00" git commit -m initial
 '
 
-test_expect_success 'setup export-subst' '
+test_expect_success 'setup export-subst and diff attributes' '
+	echo "a/nodiff.* -diff" >>.git/info/attributes &&
+	echo "a/diff.* diff" >>.git/info/attributes &&
 	echo "substfile?" export-subst >>.git/info/attributes &&
 	git log --max-count=1 "--pretty=format:A${SUBSTFORMAT}O" HEAD \
 		>a/substfile1
-- 
2.3.1

^ permalink raw reply related	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-03-05 19:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-23 13:58 zip files created with git archive flags text files as binaries Ulrike Fischer
2015-02-23 19:30 ` René Scharfe
2015-03-04 21:13   ` René Scharfe
2015-03-04 21:13 ` [PATCH] archive-zip: add --text parameter René Scharfe
2015-03-05  2:16   ` Junio C Hamano
2015-03-05 15:27     ` René Scharfe
2015-03-05 15:27   ` René Scharfe
2015-03-05 19:06   ` [PATCH v2] archive-zip: mark text files in archives René Scharfe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.