All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] apply: handle traditional patches with spaces in filename
@ 2010-07-24  1:06 Jonathan Nieder
  2010-07-24  1:09 ` [PATCH 1/3] apply: Split quoted filename handling into new function Jonathan Nieder
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Jonathan Nieder @ 2010-07-24  1:06 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Guido Günther, Giuseppe Iuculano

The goal: let ‘git apply’ handle such filenames as
“b/debian/licenses/LICENSE.Apache (v2.0)” in patches produced by
non-git tools without erroring out.

When we last left our heroes[1] in April:

> | The name and last modification time of each file shall be output in
> | the following format:
> |
> | "---[space]%s  %s%s%s", file1, <file1 timestamp>, <file1 frac>, <file1 zone>
> | "+++[space]%s  %s%s%s", file2, <file2 timestamp>, <file2 frac>, <file2 zone>
[...]
> If this is really describing the format of patches in the wild, that
> means we should only look for a tab character to terminate the filename.
[...]
> A big downside: this does not cope with copy-and-pasted patches with
> tabs transformed to spaces.  The example [2] consists mostly of
> file-creation patches, so we can’t look to the repository for hints.
> Maybe the space-plus-date-plus-newline sequence should be used as a
> delimiter.

It turns out that is not so hard.  Maybe it could be rewritten using
regcomp() and regexec(); if someone wants to do that, I won’t stop
them. ;-)

Patch 1 factors out a function to handle "GNU-format" C-style quoted
filenames in patches.  The only tool I know of that produces this
format is git; the discussion in [2] about what characters to escape
seems to have come to no conclusion.

Patch 2 adds some tests for all those weird characters that might
appear in a filename.  They abuse “diff” and “pr”; testing on weird
platforms would be helpful.

Patch 3 adds the logic to search for a date at the end of a filename
line, for traditional (non --git) patches only.  If no date is found
at the end, we return to the previous heuristic, except that the only
accepted filename terminator is a tab.  Whitespace damage is only
accepted if there is a timestamp at the end of the line.

Thoughts, suggestions, improvements welcome.

Jonathan Nieder (3):
  apply: Split quoted filename handling into new function
  tests: Test how well “git apply” copes with weird filenames
  apply: Handle traditional patches with space in filename

 builtin/apply.c                  |  251 ++++++++++++++++++++++++++++++++------
 t/t4120-apply-popt.sh            |   35 +++++-
 t/t4135-apply-weird-filenames.sh |  119 ++++++++++++++++++
 3 files changed, 363 insertions(+), 42 deletions(-)
 create mode 100755 t/t4135-apply-weird-filenames.sh

[1] http://thread.gmane.org/gmane.linux.debian.devel.bugs.general/697969/focus=145543
[2] http://thread.gmane.org/gmane.comp.version-control.git/9813/focus=10046

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] apply: Split quoted filename handling into new function
  2010-07-24  1:06 [PATCH 0/3] apply: handle traditional patches with spaces in filename Jonathan Nieder
@ 2010-07-24  1:09 ` Jonathan Nieder
  2010-07-24  1:11 ` [PATCH 2/3] tests: Test how well "git apply" copes with weird filenames Jonathan Nieder
  2010-07-24  1:20 ` [PATCH 3/3] apply: Handle traditional patches with space in filename Jonathan Nieder
  2 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2010-07-24  1:09 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Guido Günther, Giuseppe Iuculano

The new find_name_gnu() function handles new-style ‘--- "a/foo"’
patch header lines, leaving find_name() itself a bit less
daunting.

Functional change: do not clobber the p-value when there are not
enough path components in a quoted file name to honor it.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 builtin/apply.c       |   70 +++++++++++++++++++++++++++---------------------
 t/t4120-apply-popt.sh |   35 ++++++++++++++++++++++--
 2 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 12ef9ea..efc109e 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -416,44 +416,52 @@ static char *squash_slash(char *name)
 	return name;
 }
 
+static char *find_name_gnu(const char *line, char *def, int p_value)
+{
+	struct strbuf name = STRBUF_INIT;
+	char *cp;
+
+	/*
+	 * Proposed "new-style" GNU patch/diff format; see
+	 * http://marc.theaimsgroup.com/?l=git&m=112927316408690&w=2
+	 */
+	if (unquote_c_style(&name, line, NULL)) {
+		strbuf_release(&name);
+		return NULL;
+	}
+
+	for (cp = name.buf; p_value; p_value--) {
+		cp = strchr(cp, '/');
+		if (!cp) {
+			strbuf_release(&name);
+			return NULL;
+		}
+		cp++;
+	}
+
+	/* name can later be freed, so we need
+	 * to memmove, not just return cp
+	 */
+	strbuf_remove(&name, 0, cp - name.buf);
+	free(def);
+	if (root)
+		strbuf_insert(&name, 0, root, root_len);
+	return squash_slash(strbuf_detach(&name, NULL));
+}
+
 static char *find_name(const char *line, char *def, int p_value, int terminate)
 {
 	int len;
 	const char *start = NULL;
 
+	if (*line == '"') {
+		char *name = find_name_gnu(line, def, p_value);
+		if (name)
+			return name;
+	}
+
 	if (p_value == 0)
 		start = line;
-
-	if (*line == '"') {
-		struct strbuf name = STRBUF_INIT;
-
-		/*
-		 * Proposed "new-style" GNU patch/diff format; see
-		 * http://marc.theaimsgroup.com/?l=git&m=112927316408690&w=2
-		 */
-		if (!unquote_c_style(&name, line, NULL)) {
-			char *cp;
-
-			for (cp = name.buf; p_value; p_value--) {
-				cp = strchr(cp, '/');
-				if (!cp)
-					break;
-				cp++;
-			}
-			if (cp) {
-				/* name can later be freed, so we need
-				 * to memmove, not just return cp
-				 */
-				strbuf_remove(&name, 0, cp - name.buf);
-				free(def);
-				if (root)
-					strbuf_insert(&name, 0, root, root_len);
-				return squash_slash(strbuf_detach(&name, NULL));
-			}
-		}
-		strbuf_release(&name);
-	}
-
 	for (;;) {
 		char c = *line;
 
diff --git a/t/t4120-apply-popt.sh b/t/t4120-apply-popt.sh
index b463b4f..2b2d00b 100755
--- a/t/t4120-apply-popt.sh
+++ b/t/t4120-apply-popt.sh
@@ -10,21 +10,50 @@ test_description='git apply -p handling.'
 test_expect_success setup '
 	mkdir sub &&
 	echo A >sub/file1 &&
-	cp sub/file1 file1 &&
+	cp sub/file1 file1.saved &&
 	git add sub/file1 &&
 	echo B >sub/file1 &&
 	git diff >patch.file &&
-	rm sub/file1 &&
-	rmdir sub
+	git checkout -- sub/file1 &&
+	git mv sub süb &&
+	echo B >süb/file1 &&
+	git diff >patch.escaped &&
+	grep "[\]" patch.escaped &&
+	rm süb/file1 &&
+	rmdir süb
 '
 
 test_expect_success 'apply git diff with -p2' '
+	cp file1.saved file1 &&
 	git apply -p2 patch.file
 '
 
 test_expect_success 'apply with too large -p' '
+	cp file1.saved file1 &&
 	test_must_fail git apply --stat -p3 patch.file 2>err &&
 	grep "removing 3 leading" err
 '
 
+test_expect_success 'apply (-p2) traditional diff with funny filenames' '
+	cat >patch.quotes <<-\EOF &&
+	diff -u "a/"sub/file1 "b/"sub/file1
+	--- "a/"sub/file1
+	+++ "b/"sub/file1
+	@@ -1 +1 @@
+	-A
+	+B
+	EOF
+	echo B >expected &&
+
+	cp file1.saved file1 &&
+	git apply -p2 patch.quotes &&
+	test_cmp expected file1
+'
+
+test_expect_success 'apply with too large -p and fancy filename' '
+	cp file1.saved file1 &&
+	test_must_fail git apply --stat -p3 patch.escaped 2>err &&
+	grep "removing 3 leading" err
+'
+
 test_done
-- 
1.7.2.rc3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/3] tests: Test how well "git apply" copes with weird filenames
  2010-07-24  1:06 [PATCH 0/3] apply: handle traditional patches with spaces in filename Jonathan Nieder
  2010-07-24  1:09 ` [PATCH 1/3] apply: Split quoted filename handling into new function Jonathan Nieder
@ 2010-07-24  1:11 ` Jonathan Nieder
  2010-07-24  8:03   ` Andreas Schwab
  2010-07-24  1:20 ` [PATCH 3/3] apply: Handle traditional patches with space in filename Jonathan Nieder
  2 siblings, 1 reply; 7+ messages in thread
From: Jonathan Nieder @ 2010-07-24  1:11 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Guido Günther, Giuseppe Iuculano

The tests assume a reasonably well-behaved “diff -u” and “pr”
to produce the (possibly whitespace-damaged) patches.  On platforms
without those commands, skip the tests.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 t/t4135-apply-weird-filenames.sh |  119 ++++++++++++++++++++++++++++++++++++++
 1 files changed, 119 insertions(+), 0 deletions(-)
 create mode 100755 t/t4135-apply-weird-filenames.sh

diff --git a/t/t4135-apply-weird-filenames.sh b/t/t4135-apply-weird-filenames.sh
new file mode 100755
index 0000000..2dcb040
--- /dev/null
+++ b/t/t4135-apply-weird-filenames.sh
@@ -0,0 +1,119 @@
+#!/bin/sh
+
+test_description='git apply with weird postimage filenames'
+
+. ./test-lib.sh
+
+test_expect_success 'setup: empty commit' '
+	test_tick &&
+	git commit --allow-empty -m preimage &&
+	git tag preimage
+'
+
+test_expect_success 'setup: clean-up functions' '
+	reset_preimage() {
+		git checkout -f preimage^0 &&
+		git read-tree -u --reset HEAD &&
+		git update-index --refresh
+	} &&
+
+	reset_subdirs() {
+		rm -fr a b &&
+		mkdir a b
+	}
+'
+
+test_expect_success 'setup: test prerequisites' '
+	echo one >1 &&
+	echo one >2 &&
+	if diff -u 1 2
+	then
+		test_set_prereq UNIDIFF
+	fi &&
+
+	if diff -pruN 1 2
+	then
+		test_set_prereq FULLDIFF
+	fi &&
+
+	echo "tab ->  ." >expected &&
+	echo "tab ->	." >with-tab &&
+
+	pr -tT -e8 with-tab >actual &&
+	if test_cmp expected actual
+	then
+		test_set_prereq PR
+	fi
+'
+
+try_filename() {
+	desc=$1
+	postimage=$2
+	exp1=${3:-success}
+	exp2=${4:-success}
+	exp3=${5:-success}
+
+	test_expect_$exp1 "$desc, git-style file creation patch" "
+		reset_preimage &&
+		echo postimage >'$postimage' &&
+		git add -N '$postimage' &&
+		git diff HEAD >'git-$desc.diff' &&
+
+		git rm -f --cached '$postimage' &&
+		mv '$postimage' postimage.saved &&
+		git apply -v 'git-$desc.diff' &&
+
+		test_cmp postimage.saved '$postimage'
+	"
+
+	test_expect_$exp2 UNIDIFF "$desc, traditional patch" "
+		reset_preimage &&
+		echo preimage >'$postimage.orig' &&
+		echo postimage >'$postimage' &&
+		! diff -u '$postimage.orig' '$postimage' >'diff-$desc.diff' &&
+
+		mv '$postimage' postimage.saved &&
+		mv '$postimage.orig' '$postimage' &&
+		git apply -v 'diff-$desc.diff' &&
+
+		test_cmp postimage.saved '$postimage'
+	"
+
+	test_expect_$exp3 FULLDIFF "$desc, traditional file creation patch" "
+		reset_preimage &&
+		reset_subdirs &&
+		echo postimage >b/'$postimage' &&
+		! diff -pruN a b >'add-$desc.diff' &&
+
+		rm -f '$postimage' &&
+		mv b/'$postimage' postimage.saved &&
+		git apply -v 'add-$desc.diff' &&
+
+		test_cmp postimage.saved '$postimage'
+	"
+}
+
+try_filename 'plain'            'postimage.txt'
+try_filename 'with spaces'      'post image.txt' success failure failure
+try_filename 'with tab'         'post	image.txt' success failure failure
+try_filename 'with backslash'   'post\image.txt'
+try_filename 'with quote'       '"postimage".txt' success failure success
+
+if test_have_prereq FULLDIFF && test_have_prereq PR
+then
+	test_set_prereq FULLDIFFPR
+fi
+test_expect_success FULLDIFFPR 'whitespace-damaged traditional patch' '
+	reset_preimage &&
+	reset_subdirs &&
+	echo postimage >b/postimage.txt &&
+	! diff -pruN a b >diff-plain.txt &&
+	pr -tT -e8 diff-plain.txt >damaged.diff &&
+
+	mv postimage.txt postimage.saved &&
+	git apply -v damaged.diff &&
+
+	test_cmp postimage.saved postimage.txt
+'
+
+test_done
-- 
1.7.2.rc3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] apply: Handle traditional patches with space in filename
  2010-07-24  1:06 [PATCH 0/3] apply: handle traditional patches with spaces in filename Jonathan Nieder
  2010-07-24  1:09 ` [PATCH 1/3] apply: Split quoted filename handling into new function Jonathan Nieder
  2010-07-24  1:11 ` [PATCH 2/3] tests: Test how well "git apply" copes with weird filenames Jonathan Nieder
@ 2010-07-24  1:20 ` Jonathan Nieder
  2 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2010-07-24  1:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Guido Günther, Giuseppe Iuculano

To discover filenames from the --- and +++ lines in a traditional
unified diff, currently ‘git apply’ scans forward for a whitespace
character on each line and stops there.  It can’t use the whole line
because ‘diff -u’ likes to include timestamps, like so:

 --- foo	2000-07-12 16:56:50.020000414 -0500
 +++ bar	2010-07-12 16:56:50.020000414 -0500

The whitespace-seeking heuristic works great, even when the tab
has been converted to spaces by some email + copy-and-paste
related corruption.

Except for one problem: if the filename itself contains whitespace,
the inferred filename will be too short.

When Giuseppe ran into this problem, it was for a file creation
patch (filename ‘debian/licenses/LICENSE.global BSD-style Chromium’).
So one can’t use the list of files present in the index to deduce an
appropriate filename (not to mention that way lies madness; see
v0.99~402, 2005-05-31).

Instead, look for a timestamp and use that if present to mark the end
of the filename.  If no timestamp is present, the old heuristic is
used, with one exception: the space character \040 is not considered
terminating whitespace any more unless it is followed by a timestamp.

Reported-by: Giuseppe Iuculano <iuculano@debian.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Acked-by: Guido Günther <agx@sigxcpu.org>
---
Guido, I have carried over your ack from <http://bugs.debian.org/578752>.
I hope that is okay.  And to both Guido and Giuseppe, sorry to have
taken so long on this.

That is the end of the series.  I hope it was not too unpleasant a read.
As always, thoughts, improvements, bugs welcome.

Regards,
Jonathan

 builtin/apply.c                  |  193 +++++++++++++++++++++++++++++++++++---
 t/t4135-apply-weird-filenames.sh |    4 +-
 2 files changed, 181 insertions(+), 16 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index efc109e..b975c99 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -449,23 +449,157 @@ static char *find_name_gnu(const char *line, char *def, int p_value)
 	return squash_slash(strbuf_detach(&name, NULL));
 }
 
-static char *find_name(const char *line, char *def, int p_value, int terminate)
+static size_t tz_len(const char *line, size_t len)
+{
+	const char *tz, *p;
+
+	if (len < strlen(" +0500") || line[len-strlen(" +0500")] != ' ')
+		return 0;
+	tz = line + len - strlen(" +0500");
+
+	if (tz[1] != '+' && tz[1] != '-')
+		return 0;
+
+	for (p = tz + 2; p != line + len; p++)
+		if (!isdigit(*p))
+			return 0;
+
+	return line + len - tz;
+}
+
+static size_t date_len(const char *line, size_t len)
+{
+	const char *date, *p;
+
+	if (len < strlen("72-02-05") || line[len-strlen("-05")] != '-')
+		return 0;
+	p = date = line + len - strlen("72-02-05");
+
+	if (!isdigit(*p++) || !isdigit(*p++) || *p++ != '-' ||
+	    !isdigit(*p++) || !isdigit(*p++) || *p++ != '-' ||
+	    !isdigit(*p++) || !isdigit(*p++))	/* Not a date. */
+		return 0;
+
+	if (date - line >= strlen("19") &&
+	    isdigit(date[-1]) && isdigit(date[-2]))	/* 4-digit year */
+		date -= strlen("19");
+
+	return line + len - date;
+}
+
+static size_t short_time_len(const char *line, size_t len)
+{
+	const char *time, *p;
+
+	if (len < strlen(" 07:01:32") || line[len-strlen(":32")] != ':')
+		return 0;
+	p = time = line + len - strlen(" 07:01:32");
+
+	/* Permit 1-digit hours? */
+	if (*p++ != ' ' ||
+	    !isdigit(*p++) || !isdigit(*p++) || *p++ != ':' ||
+	    !isdigit(*p++) || !isdigit(*p++) || *p++ != ':' ||
+	    !isdigit(*p++) || !isdigit(*p++))	/* Not a time. */
+		return 0;
+
+	return line + len - time;
+}
+
+static size_t fractional_time_len(const char *line, size_t len)
+{
+	const char *p;
+	size_t n;
+
+	/* Expected format: 19:41:17.620000023 */
+	if (!len || !isdigit(line[len - 1]))
+		return 0;
+	p = line + len - 1;
+
+	/* Fractional seconds. */
+	while (p > line && isdigit(*p))
+		p--;
+	if (*p != '.')
+		return 0;
+
+	/* Hours, minutes, and whole seconds. */
+	n = short_time_len(line, p - line);
+	if (!n)
+		return 0;
+
+	return line + len - p + n;
+}
+
+static size_t trailing_spaces_len(const char *line, size_t len)
+{
+	const char *p;
+
+	/* Expected format: ' ' x (1 or more)  */
+	if (!len || line[len - 1] != ' ')
+		return 0;
+
+	p = line + len;
+	while (p != line) {
+		p--;
+		if (*p != ' ')
+			return line + len - (p + 1);
+	}
+
+	/* All spaces! */
+	return len;
+}
+
+static size_t diff_timestamp_len(const char *line, size_t len)
+{
+	const char *end = line + len;
+	size_t n;
+
+	/*
+	 * Posix: 2010-07-05 19:41:17
+	 * GNU: 2010-07-05 19:41:17.620000023 -0500
+	 */
+
+	if (!isdigit(end[-1]))
+		return 0;
+
+	n = tz_len(line, end - line);
+	end -= n;
+
+	n = short_time_len(line, end - line);
+	if (!n)
+		n = fractional_time_len(line, end - line);
+	end -= n;
+
+	n = date_len(line, end - line);
+	if (!n)	/* No date.  Too bad. */
+		return 0;
+	end -= n;
+
+	if (end == line)	/* No space before date. */
+		return 0;
+	if (end[-1] == '\t') {	/* Success! */
+		end--;
+		return line + len - end;
+	}
+	if (end[-1] != ' ')	/* No space before date. */
+		return 0;
+
+	/* Whitespace damage. */
+	end -= trailing_spaces_len(line, end - line);
+	return line + len - end;
+}
+
+static char *find_name_common(const char *line, char *def, int p_value,
+				const char *end, int terminate)
 {
 	int len;
 	const char *start = NULL;
 
-	if (*line == '"') {
-		char *name = find_name_gnu(line, def, p_value);
-		if (name)
-			return name;
-	}
-
 	if (p_value == 0)
 		start = line;
-	for (;;) {
+	while (line != end) {
 		char c = *line;
 
-		if (isspace(c)) {
+		if (!end && isspace(c)) {
 			if (c == '\n')
 				break;
 			if (name_terminate(start, line-start, c, terminate))
@@ -505,6 +639,37 @@ static char *find_name(const char *line, char *def, int p_value, int terminate)
 	return squash_slash(xmemdupz(start, len));
 }
 
+static char *find_name(const char *line, char *def, int p_value, int terminate)
+{
+	if (*line == '"') {
+		char *name = find_name_gnu(line, def, p_value);
+		if (name)
+			return name;
+	}
+
+	return find_name_common(line, def, p_value, NULL, terminate);
+}
+
+static char *find_name_traditional(const char *line, char *def, int p_value)
+{
+	size_t len = strlen(line);
+	size_t date_len;
+
+	if (*line == '"') {
+		char *name = find_name_gnu(line, def, p_value);
+		if (name)
+			return name;
+	}
+
+	len = strchrnul(line, '\n') - line;
+	date_len = diff_timestamp_len(line, len);
+	if (!date_len)
+		return find_name(line, def, p_value, TERM_TAB);
+	len -= date_len;
+
+	return find_name_common(line, def, p_value, line + len, 0);
+}
+
 static int count_slashes(const char *cp)
 {
 	int cnt = 0;
@@ -527,7 +692,7 @@ static int guess_p_value(const char *nameline)
 
 	if (is_dev_null(nameline))
 		return -1;
-	name = find_name(nameline, NULL, 0, TERM_SPACE | TERM_TAB);
+	name = find_name_traditional(nameline, NULL, 0);
 	if (!name)
 		return -1;
 	cp = strchr(name, '/');
@@ -646,16 +811,16 @@ static void parse_traditional_patch(const char *first, const char *second, struc
 	if (is_dev_null(first)) {
 		patch->is_new = 1;
 		patch->is_delete = 0;
-		name = find_name(second, NULL, p_value, TERM_SPACE | TERM_TAB);
+		name = find_name_traditional(second, NULL, p_value);
 		patch->new_name = name;
 	} else if (is_dev_null(second)) {
 		patch->is_new = 0;
 		patch->is_delete = 1;
-		name = find_name(first, NULL, p_value, TERM_SPACE | TERM_TAB);
+		name = find_name_traditional(first, NULL, p_value);
 		patch->old_name = name;
 	} else {
-		name = find_name(first, NULL, p_value, TERM_SPACE | TERM_TAB);
-		name = find_name(second, name, p_value, TERM_SPACE | TERM_TAB);
+		name = find_name_traditional(first, NULL, p_value);
+		name = find_name_traditional(second, name, p_value);
 		if (has_epoch_timestamp(first)) {
 			patch->is_new = 1;
 			patch->is_delete = 0;
diff --git a/t/t4135-apply-weird-filenames.sh b/t/t4135-apply-weird-filenames.sh
index 2dcb040..52f9f5b 100755
--- a/t/t4135-apply-weird-filenames.sh
+++ b/t/t4135-apply-weird-filenames.sh
@@ -94,8 +94,8 @@ try_filename() {
 }
 
 try_filename 'plain'            'postimage.txt'
-try_filename 'with spaces'      'post image.txt' success failure failure
-try_filename 'with tab'         'post	image.txt' success failure failure
+try_filename 'with spaces'      'post image.txt'
+try_filename 'with tab'         'post	image.txt'
 try_filename 'with backslash'   'post\image.txt'
 try_filename 'with quote'       '"postimage".txt' success failure success
 
-- 
1.7.2.rc3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/3] tests: Test how well "git apply" copes with weird filenames
  2010-07-24  1:11 ` [PATCH 2/3] tests: Test how well "git apply" copes with weird filenames Jonathan Nieder
@ 2010-07-24  8:03   ` Andreas Schwab
  2010-07-24  8:48     ` Jonathan Nieder
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Schwab @ 2010-07-24  8:03 UTC (permalink / raw)
  To: Jonathan Nieder
  Cc: git, Junio C Hamano, Guido Günther, Giuseppe Iuculano

Jonathan Nieder <jrnieder@gmail.com> writes:

> The tests assume a reasonably well-behaved “diff -u” and “pr”
> to produce the (possibly whitespace-damaged) patches.  On platforms
> without those commands, skip the tests.

pr -T is not portable.  What's wrong with expand?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 2/3] tests: Test how well "git apply" copes with weird filenames
  2010-07-24  8:03   ` Andreas Schwab
@ 2010-07-24  8:48     ` Jonathan Nieder
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2010-07-24  8:48 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: git, Junio C Hamano, Guido Günther, Giuseppe Iuculano

Andreas Schwab wrote:

> pr -T is not portable.  What's wrong with expand?

Nothing at all.  Here’s a patch for squashing in.

-- 8< --
Subject: t4135 (apply): use expand instead of pr for portability

expand is just the thing for expanding tabs into spaces, and
unlike pr -T, it is portable.  Use it.

Noticed-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Thank you. :)

 t/t4135-apply-weird-filenames.sh |   17 ++---------------
 1 files changed, 2 insertions(+), 15 deletions(-)

diff --git i/t/t4135-apply-weird-filenames.sh w/t/t4135-apply-weird-filenames.sh
index 2dcb040..dda554e 100755
--- i/t/t4135-apply-weird-filenames.sh
+++ w/t/t4135-apply-weird-filenames.sh
@@ -34,15 +34,6 @@ test_expect_success 'setup: test prerequisites' '
 	if diff -pruN 1 2
 	then
 		test_set_prereq FULLDIFF
-	fi &&
-
-	echo "tab ->  ." >expected &&
-	echo "tab ->	." >with-tab &&
-
-	pr -tT -e8 with-tab >actual &&
-	if test_cmp expected actual
-	then
-		test_set_prereq PR
 	fi
 '
 
@@ -99,16 +90,12 @@ try_filename 'with tab'         'post	image.txt' success failure failure
 try_filename 'with backslash'   'post\image.txt'
 try_filename 'with quote'       '"postimage".txt' success failure success
 
-if test_have_prereq FULLDIFF && test_have_prereq PR
-then
-	test_set_prereq FULLDIFFPR
-fi
-test_expect_success FULLDIFFPR 'whitespace-damaged traditional patch' '
+test_expect_success FULLDIFF 'whitespace-damaged traditional patch' '
 	reset_preimage &&
 	reset_subdirs &&
 	echo postimage >b/postimage.txt &&
 	! diff -pruN a b >diff-plain.txt &&
-	pr -tT -e8 diff-plain.txt >damaged.diff &&
+	expand diff-plain.txt >damaged.diff &&
 
 	mv postimage.txt postimage.saved &&
 	git apply -v damaged.diff &&
-- 

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 3/3] apply: handle traditional patches with space in filename
  2010-08-19  1:45       ` [PATCH v2 0/3] apply: handle traditional patches with space in filename Jonathan Nieder
@ 2010-08-19  1:50         ` Jonathan Nieder
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Nieder @ 2010-08-19  1:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Johannes Sixt, git, Greg Brockman, Ilari Liusvaara,
	Elijah Newren, Andreas Schwab

To discover filenames from the --- and +++ lines in a traditional
unified diff, currently "git apply" scans forward for a whitespace
character on each line and stops there.  It can't use the whole line
because "diff -u" likes to include timestamps, like so:

 --- foo	2000-07-12 16:56:50.020000414 -0500
 +++ bar	2010-07-12 16:56:50.020000414 -0500

The whitespace-seeking heuristic works great, even when the tab
has been converted to spaces by some email + copy-and-paste
related corruption.

Except for one problem: if the filename itself contains whitespace,
the inferred filename will be too short.

When Giuseppe ran into this problem, it was for a file creation
patch (for debian/licenses/LICENSE.global BSD-style Chromium).
So one can't use the list of files present in the index to deduce an
appropriate filename (not to mention that way lies madness; see
v0.99~402, 2005-05-31).

Instead, look for a timestamp and use that if present to mark the end
of the filename.  If no timestamp is present, the old heuristic is
used, with one exception: the space character \040 is not considered
terminating whitespace any more unless it is followed by a timestamp.

Reported-by: Giuseppe Iuculano <iuculano@debian.org>
Acked-by: Guido Günther <agx@sigxcpu.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
Thanks for the helpful advice and patience at my use of it.

 builtin/apply.c                  |  193 +++++++++++++++++++++++++++++++++++---
 t/t4135-apply-weird-filenames.sh |    4 +-
 2 files changed, 181 insertions(+), 16 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index efc109e..bd2fcb3 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -449,23 +449,157 @@ static char *find_name_gnu(const char *line, char *def, int p_value)
 	return squash_slash(strbuf_detach(&name, NULL));
 }
 
-static char *find_name(const char *line, char *def, int p_value, int terminate)
+static size_t tz_len(const char *line, size_t len)
+{
+	const char *tz, *p;
+
+	if (len < strlen(" +0500") || line[len-strlen(" +0500")] != ' ')
+		return 0;
+	tz = line + len - strlen(" +0500");
+
+	if (tz[1] != '+' && tz[1] != '-')
+		return 0;
+
+	for (p = tz + 2; p != line + len; p++)
+		if (!isdigit(*p))
+			return 0;
+
+	return line + len - tz;
+}
+
+static size_t date_len(const char *line, size_t len)
+{
+	const char *date, *p;
+
+	if (len < strlen("72-02-05") || line[len-strlen("-05")] != '-')
+		return 0;
+	p = date = line + len - strlen("72-02-05");
+
+	if (!isdigit(*p++) || !isdigit(*p++) || *p++ != '-' ||
+	    !isdigit(*p++) || !isdigit(*p++) || *p++ != '-' ||
+	    !isdigit(*p++) || !isdigit(*p++))	/* Not a date. */
+		return 0;
+
+	if (date - line >= strlen("19") &&
+	    isdigit(date[-1]) && isdigit(date[-2]))	/* 4-digit year */
+		date -= strlen("19");
+
+	return line + len - date;
+}
+
+static size_t short_time_len(const char *line, size_t len)
+{
+	const char *time, *p;
+
+	if (len < strlen(" 07:01:32") || line[len-strlen(":32")] != ':')
+		return 0;
+	p = time = line + len - strlen(" 07:01:32");
+
+	/* Permit 1-digit hours? */
+	if (*p++ != ' ' ||
+	    !isdigit(*p++) || !isdigit(*p++) || *p++ != ':' ||
+	    !isdigit(*p++) || !isdigit(*p++) || *p++ != ':' ||
+	    !isdigit(*p++) || !isdigit(*p++))	/* Not a time. */
+		return 0;
+
+	return line + len - time;
+}
+
+static size_t fractional_time_len(const char *line, size_t len)
+{
+	const char *p;
+	size_t n;
+
+	/* Expected format: 19:41:17.620000023 */
+	if (!len || !isdigit(line[len - 1]))
+		return 0;
+	p = line + len - 1;
+
+	/* Fractional seconds. */
+	while (p > line && isdigit(*p))
+		p--;
+	if (*p != '.')
+		return 0;
+
+	/* Hours, minutes, and whole seconds. */
+	n = short_time_len(line, p - line);
+	if (!n)
+		return 0;
+
+	return line + len - p + n;
+}
+
+static size_t trailing_spaces_len(const char *line, size_t len)
+{
+	const char *p;
+
+	/* Expected format: ' ' x (1 or more)  */
+	if (!len || line[len - 1] != ' ')
+		return 0;
+
+	p = line + len;
+	while (p != line) {
+		p--;
+		if (*p != ' ')
+			return line + len - (p + 1);
+	}
+
+	/* All spaces! */
+	return len;
+}
+
+static size_t diff_timestamp_len(const char *line, size_t len)
+{
+	const char *end = line + len;
+	size_t n;
+
+	/*
+	 * Posix: 2010-07-05 19:41:17
+	 * GNU: 2010-07-05 19:41:17.620000023 -0500
+	 */
+
+	if (!isdigit(end[-1]))
+		return 0;
+
+	n = tz_len(line, end - line);
+	end -= n;
+
+	n = short_time_len(line, end - line);
+	if (!n)
+		n = fractional_time_len(line, end - line);
+	end -= n;
+
+	n = date_len(line, end - line);
+	if (!n)	/* No date.  Too bad. */
+		return 0;
+	end -= n;
+
+	if (end == line)	/* No space before date. */
+		return 0;
+	if (end[-1] == '\t') {	/* Success! */
+		end--;
+		return line + len - end;
+	}
+	if (end[-1] != ' ')	/* No space before date. */
+		return 0;
+
+	/* Whitespace damage. */
+	end -= trailing_spaces_len(line, end - line);
+	return line + len - end;
+}
+
+static char *find_name_common(const char *line, char *def, int p_value,
+				const char *end, int terminate)
 {
 	int len;
 	const char *start = NULL;
 
-	if (*line == '"') {
-		char *name = find_name_gnu(line, def, p_value);
-		if (name)
-			return name;
-	}
-
 	if (p_value == 0)
 		start = line;
-	for (;;) {
+	while (line != end) {
 		char c = *line;
 
-		if (isspace(c)) {
+		if (!end && isspace(c)) {
 			if (c == '\n')
 				break;
 			if (name_terminate(start, line-start, c, terminate))
@@ -505,6 +639,37 @@ static char *find_name(const char *line, char *def, int p_value, int terminate)
 	return squash_slash(xmemdupz(start, len));
 }
 
+static char *find_name(const char *line, char *def, int p_value, int terminate)
+{
+	if (*line == '"') {
+		char *name = find_name_gnu(line, def, p_value);
+		if (name)
+			return name;
+	}
+
+	return find_name_common(line, def, p_value, NULL, terminate);
+}
+
+static char *find_name_traditional(const char *line, char *def, int p_value)
+{
+	size_t len = strlen(line);
+	size_t date_len;
+
+	if (*line == '"') {
+		char *name = find_name_gnu(line, def, p_value);
+		if (name)
+			return name;
+	}
+
+	len = strchrnul(line, '\n') - line;
+	date_len = diff_timestamp_len(line, len);
+	if (!date_len)
+		return find_name_common(line, def, p_value, NULL, TERM_TAB);
+	len -= date_len;
+
+	return find_name_common(line, def, p_value, line + len, 0);
+}
+
 static int count_slashes(const char *cp)
 {
 	int cnt = 0;
@@ -527,7 +692,7 @@ static int guess_p_value(const char *nameline)
 
 	if (is_dev_null(nameline))
 		return -1;
-	name = find_name(nameline, NULL, 0, TERM_SPACE | TERM_TAB);
+	name = find_name_traditional(nameline, NULL, 0);
 	if (!name)
 		return -1;
 	cp = strchr(name, '/');
@@ -646,16 +811,16 @@ static void parse_traditional_patch(const char *first, const char *second, struc
 	if (is_dev_null(first)) {
 		patch->is_new = 1;
 		patch->is_delete = 0;
-		name = find_name(second, NULL, p_value, TERM_SPACE | TERM_TAB);
+		name = find_name_traditional(second, NULL, p_value);
 		patch->new_name = name;
 	} else if (is_dev_null(second)) {
 		patch->is_new = 0;
 		patch->is_delete = 1;
-		name = find_name(first, NULL, p_value, TERM_SPACE | TERM_TAB);
+		name = find_name_traditional(first, NULL, p_value);
 		patch->old_name = name;
 	} else {
-		name = find_name(first, NULL, p_value, TERM_SPACE | TERM_TAB);
-		name = find_name(second, name, p_value, TERM_SPACE | TERM_TAB);
+		name = find_name_traditional(first, NULL, p_value);
+		name = find_name_traditional(second, name, p_value);
 		if (has_epoch_timestamp(first)) {
 			patch->is_new = 1;
 			patch->is_delete = 0;
diff --git a/t/t4135-apply-weird-filenames.sh b/t/t4135-apply-weird-filenames.sh
index 9373f64..1e5aad5 100755
--- a/t/t4135-apply-weird-filenames.sh
+++ b/t/t4135-apply-weird-filenames.sh
@@ -59,8 +59,8 @@ try_filename() {
 }
 
 try_filename 'plain'            'postimage.txt'
-try_filename 'with spaces'      'post image.txt' '' success failure failure
-try_filename 'with tab'         'post	image.txt' FUNNYNAMES success failure failure
+try_filename 'with spaces'      'post image.txt'
+try_filename 'with tab'         'post	image.txt' FUNNYNAMES
 try_filename 'with backslash'   'post\image.txt' BSLASHPSPEC
 try_filename 'with quote'       '"postimage".txt' FUNNYNAMES success failure success
 
-- 
1.7.2.1.544.ga752d.dirty

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-08-19  1:52 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-24  1:06 [PATCH 0/3] apply: handle traditional patches with spaces in filename Jonathan Nieder
2010-07-24  1:09 ` [PATCH 1/3] apply: Split quoted filename handling into new function Jonathan Nieder
2010-07-24  1:11 ` [PATCH 2/3] tests: Test how well "git apply" copes with weird filenames Jonathan Nieder
2010-07-24  8:03   ` Andreas Schwab
2010-07-24  8:48     ` Jonathan Nieder
2010-07-24  1:20 ` [PATCH 3/3] apply: Handle traditional patches with space in filename Jonathan Nieder
2010-08-11 23:35 What's cooking in git.git (Aug 2010, #02; Wed, 11) Junio C Hamano
2010-08-13 21:44 ` jn/apply-filename-with-sp (Re: What's cooking in git.git (Aug 2010, #02; Wed, 11)) Johannes Sixt
2010-08-14  2:27   ` Jonathan Nieder
2010-08-14 18:37     ` Johannes Sixt
2010-08-19  1:45       ` [PATCH v2 0/3] apply: handle traditional patches with space in filename Jonathan Nieder
2010-08-19  1:50         ` [PATCH 3/3] " Jonathan Nieder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.