All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
@ 2010-09-18 17:25 Kirill Smelkov
  2010-09-18 17:25 ` [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program Kirill Smelkov
                   ` (3 more replies)
  0 siblings, 4 replies; 35+ messages in thread
From: Kirill Smelkov @ 2010-09-18 17:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Axel Bonnet, Clément Poulain, Diane Gasselin, Kirill Smelkov

Recently I've spot a bug in git blame --textconv, which was wrongly
calling pdftotext (my *.pdf conversion program) on a symlink.pdf, and I
was getting something like

    $ git blame -C -C regular-file.pdf
    Error: May not be a PDF file (continuing anyway)
    Error: PDF file is damaged - attempting to reconstruct xref table...
    Error: Couldn't find trailer dictionary
    Error: Couldn't read xref table
    Warning: program returned non-zero exit code #1
    fatal: unable to read files to diff

That errors come from pdftotext run on symlink.pdf being extracted to
/tmp/ with one-line plain-text content pointing to link destination.

First 2 patches demonstrate the problem (and are ready to go into git.git),
though third, while working for me, is only an RFC.

Thanks,
Kirill


Kirill Smelkov (3):
  tests: Prepare --textconv tests for correctly-failing conversion
    program
  blame,cat-file: Demonstrate --textconv is wrongly running converter
    on symlinks
  RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF |
    0664''

 builtin.h                        |    2 +-
 builtin/blame.c                  |   33 ++++++++++++++-------
 builtin/cat-file.c               |    2 +-
 sha1_name.c                      |    3 +-
 t/t4042-diff-textconv-caching.sh |   25 ++++++++--------
 t/t8006-blame-textconv.sh        |   57 +++++++++++++++++++++++++++++++++----
 t/t8007-cat-file-textconv.sh     |   38 ++++++++++++++++++++++---
 7 files changed, 122 insertions(+), 38 deletions(-)

-- 
1.7.3.rc2

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program
  2010-09-18 17:25 [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Kirill Smelkov
@ 2010-09-18 17:25 ` Kirill Smelkov
  2010-09-18 19:14   ` Matthieu Moy
  2010-09-18 17:25 ` [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks Kirill Smelkov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 35+ messages in thread
From: Kirill Smelkov @ 2010-09-18 17:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Axel Bonnet, Clément Poulain, Diane Gasselin,
	Kirill Smelkov, Jeff King

Recently I've spot a bug in git blame --textconv, which was wrongly
calling pdftotext (my *.pdf conversion program) on a symlink.pdf, and I
was getting something like

    $ git blame -C -C regular-file.pdf
    Error: May not be a PDF file (continuing anyway)
    Error: PDF file is damaged - attempting to reconstruct xref table...
    Error: Couldn't find trailer dictionary
    Error: Couldn't read xref table
    Warning: program returned non-zero exit code #1
    fatal: unable to read files to diff

That errors come from pdftotext run on symlink.pdf being extracted to
/tmp/ with one-line plain-text content pointing to link destination.

So git-blame is wrong here, and I'm going to write problem-demonstration
tests + try to fix it, but first we have to convert existing textconv
converter, so it will mimic pdftext behaviour -- if there is no '^bin:'
in input -- it's not a "binary" file and helper exits with error.

No other semantic changes at this stage.

Cc: Axel Bonnet <axel.bonnet@ensimag.imag.fr>
Cc: Clément Poulain <clement.poulain@ensimag.imag.fr>
Cc: Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Cc: Jeff King <peff@peff.net>
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
---
 t/t4042-diff-textconv-caching.sh |   25 +++++++++++++------------
 t/t8006-blame-textconv.sh        |   15 ++++++++-------
 t/t8007-cat-file-textconv.sh     |   11 ++++++-----
 3 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/t/t4042-diff-textconv-caching.sh b/t/t4042-diff-textconv-caching.sh
index 91f8198..7668099 100755
--- a/t/t4042-diff-textconv-caching.sh
+++ b/t/t4042-diff-textconv-caching.sh
@@ -5,18 +5,19 @@ test_description='test textconv caching'
 
 cat >helper <<'EOF'
 #!/bin/sh
-sed 's/^/converted: /' "$@" >helper.out
+grep -q '^bin: ' "$@" || { echo "E: $@ is not \"binary\" file" 1>&2; exit 1; }
+sed 's/^bin:/converted:/' "$@" >helper.out
 cat helper.out
 EOF
 chmod +x helper
 
 test_expect_success 'setup' '
-	echo foo content 1 >foo.bin &&
-	echo bar content 1 >bar.bin &&
+	echo "bin: foo content 1" >foo.bin &&
+	echo "bin: bar content 1" >bar.bin &&
 	git add . &&
 	git commit -m one &&
-	echo foo content 2 >foo.bin &&
-	echo bar content 2 >bar.bin &&
+	echo "bin: foo content 2" >foo.bin &&
+	echo "bin: bar content 2" >bar.bin &&
 	git commit -a -m two &&
 	echo "*.bin diff=magic" >.gitattributes &&
 	git config diff.magic.textconv ./helper &&
@@ -25,14 +26,14 @@ test_expect_success 'setup' '
 
 cat >expect <<EOF
 diff --git a/bar.bin b/bar.bin
-index fcf9166..28283d5 100644
+index 628fb83..f64d847 100644
 --- a/bar.bin
 +++ b/bar.bin
 @@ -1 +1 @@
 -converted: bar content 1
 +converted: bar content 2
 diff --git a/foo.bin b/foo.bin
-index d5b9fe3..1345db2 100644
+index 255496b..ad450ff 100644
 --- a/foo.bin
 +++ b/foo.bin
 @@ -1 +1 @@
@@ -59,7 +60,7 @@ test_expect_success 'cached textconv does not run helper' '
 
 cat >expect <<EOF
 diff --git a/bar.bin b/bar.bin
-index fcf9166..28283d5 100644
+index 628fb83..f64d847 100644
 --- a/bar.bin
 +++ b/bar.bin
 @@ -1,2 +1,2 @@
@@ -67,7 +68,7 @@ index fcf9166..28283d5 100644
 -converted: bar content 1
 +converted: bar content 2
 diff --git a/foo.bin b/foo.bin
-index d5b9fe3..1345db2 100644
+index 255496b..ad450ff 100644
 --- a/foo.bin
 +++ b/foo.bin
 @@ -1,2 +1,2 @@
@@ -76,7 +77,7 @@ index d5b9fe3..1345db2 100644
 +converted: foo content 2
 EOF
 test_expect_success 'changing textconv invalidates cache' '
-	echo other >other &&
+	echo "bin: other" >other &&
 	git config diff.magic.textconv "./helper other" &&
 	git diff HEAD^ HEAD >actual &&
 	test_cmp expect actual
@@ -84,7 +85,7 @@ test_expect_success 'changing textconv invalidates cache' '
 
 cat >expect <<EOF
 diff --git a/bar.bin b/bar.bin
-index fcf9166..28283d5 100644
+index 628fb83..f64d847 100644
 --- a/bar.bin
 +++ b/bar.bin
 @@ -1,2 +1,2 @@
@@ -92,7 +93,7 @@ index fcf9166..28283d5 100644
 -converted: bar content 1
 +converted: bar content 2
 diff --git a/foo.bin b/foo.bin
-index d5b9fe3..1345db2 100644
+index 255496b..ad450ff 100644
 --- a/foo.bin
 +++ b/foo.bin
 @@ -1 +1 @@
diff --git a/t/t8006-blame-textconv.sh b/t/t8006-blame-textconv.sh
index 9ad96d4..d0f8d62 100755
--- a/t/t8006-blame-textconv.sh
+++ b/t/t8006-blame-textconv.sh
@@ -9,22 +9,23 @@ find_blame() {
 
 cat >helper <<'EOF'
 #!/bin/sh
-sed 's/^/converted: /' "$@"
+grep -q '^bin: ' "$@" || { echo "E: $@ is not \"binary\" file" 1>&2; exit 1; }
+sed 's/^bin:/converted:/' "$@"
 EOF
 chmod +x helper
 
 test_expect_success 'setup ' '
-	echo test 1 >one.bin &&
-	echo test number 2 >two.bin &&
+	echo "bin: test 1" >one.bin &&
+	echo "bin: test number 2" >two.bin &&
 	git add . &&
 	GIT_AUTHOR_NAME=Number1 git commit -a -m First --date="2010-01-01 18:00:00" &&
-	echo test 1 version 2 >one.bin &&
-	echo test number 2 version 2 >>two.bin &&
+	echo "bin: test 1 version 2" >one.bin &&
+	echo "bin: test number 2 version 2" >>two.bin &&
 	GIT_AUTHOR_NAME=Number2 git commit -a -m Second --date="2010-01-01 20:00:00"
 '
 
 cat >expected <<EOF
-(Number2 2010-01-01 20:00:00 +0000 1) test 1 version 2
+(Number2 2010-01-01 20:00:00 +0000 1) bin: test 1 version 2
 EOF
 
 test_expect_success 'no filter specified' '
@@ -67,7 +68,7 @@ test_expect_success 'blame --textconv going through revisions' '
 '
 
 test_expect_success 'make a new commit' '
-	echo "test number 2 version 3" >>two.bin &&
+	echo "bin: test number 2 version 3" >>two.bin &&
 	GIT_AUTHOR_NAME=Number3 git commit -a -m Third --date="2010-01-01 22:00:00"
 '
 
diff --git a/t/t8007-cat-file-textconv.sh b/t/t8007-cat-file-textconv.sh
index 38ac05e..413d623 100755
--- a/t/t8007-cat-file-textconv.sh
+++ b/t/t8007-cat-file-textconv.sh
@@ -5,15 +5,16 @@ test_description='git cat-file textconv support'
 
 cat >helper <<'EOF'
 #!/bin/sh
-sed 's/^/converted: /' "$@"
+grep -q '^bin: ' "$@" || { echo "E: $@ is not \"binary\" file" 1>&2; exit 1; }
+sed 's/^bin:/converted:/' "$@"
 EOF
 chmod +x helper
 
 test_expect_success 'setup ' '
-	echo test >one.bin &&
+	echo "bin: test" >one.bin &&
 	git add . &&
 	GIT_AUTHOR_NAME=Number1 git commit -a -m First --date="2010-01-01 18:00:00" &&
-	echo test version 2 >one.bin &&
+	echo "bin: test version 2" >one.bin &&
 	GIT_AUTHOR_NAME=Number2 git commit -a -m Second --date="2010-01-01 20:00:00"
 '
 
@@ -33,7 +34,7 @@ test_expect_success 'setup textconv filters' '
 '
 
 cat >expected <<EOF
-test version 2
+bin: test version 2
 EOF
 
 test_expect_success 'cat-file without --textconv' '
@@ -42,7 +43,7 @@ test_expect_success 'cat-file without --textconv' '
 '
 
 cat >expected <<EOF
-test
+bin: test
 EOF
 
 test_expect_success 'cat-file without --textconv on previous commit' '
-- 
1.7.3.rc2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks
  2010-09-18 17:25 [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Kirill Smelkov
  2010-09-18 17:25 ` [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program Kirill Smelkov
@ 2010-09-18 17:25 ` Kirill Smelkov
  2010-09-18 19:26   ` Matthieu Moy
  2010-09-18 17:25 ` [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664'' Kirill Smelkov
  2010-09-18 18:08 ` [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Matthieu Moy
  3 siblings, 1 reply; 35+ messages in thread
From: Kirill Smelkov @ 2010-09-18 17:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Axel Bonnet, Clément Poulain, Diane Gasselin, Kirill Smelkov

Because as described in previous patch - it should not.

Several failures are demonstrated here:

  - git cat-file --textconv :symlink.bin    # also HEAD:symlink.bin
  - git blame --textconv symlink.bin
  - git blame -C -C --textconv regular-file # but also looks on symlink.bin

At present they all fail with something like.

    E: /tmp/j3ELEs_symlink.bin is not "binary" file

Cc: Axel Bonnet <axel.bonnet@ensimag.imag.fr>
Cc: Clément Poulain <clement.poulain@ensimag.imag.fr>
Cc: Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
---
 t/t8006-blame-textconv.sh    |   44 ++++++++++++++++++++++++++++++++++++++++++
 t/t8007-cat-file-textconv.sh |   29 +++++++++++++++++++++++++++
 2 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/t/t8006-blame-textconv.sh b/t/t8006-blame-textconv.sh
index d0f8d62..3ed6155 100755
--- a/t/t8006-blame-textconv.sh
+++ b/t/t8006-blame-textconv.sh
@@ -17,10 +17,12 @@ chmod +x helper
 test_expect_success 'setup ' '
 	echo "bin: test 1" >one.bin &&
 	echo "bin: test number 2" >two.bin &&
+	ln -s one.bin symlink.bin &&
 	git add . &&
 	GIT_AUTHOR_NAME=Number1 git commit -a -m First --date="2010-01-01 18:00:00" &&
 	echo "bin: test 1 version 2" >one.bin &&
 	echo "bin: test number 2 version 2" >>two.bin &&
+	ln -sf two.bin symlink.bin &&
 	GIT_AUTHOR_NAME=Number2 git commit -a -m Second --date="2010-01-01 20:00:00"
 '
 
@@ -78,4 +80,46 @@ test_expect_success 'blame from previous revision' '
 	test_cmp expected result
 '
 
+cat >expected <<EOF
+(Number2 2010-01-01 20:00:00 +0000 1) two.bin
+EOF
+
+test_expect_success 'blame with --no-textconv (on symlink)' '
+	git blame --no-textconv symlink.bin >blame &&
+	find_blame <blame >result &&
+	test_cmp expected result
+'
+
+# fails with '...symlink.bin is not "binary" file'
+test_expect_failure 'blame --textconv (on symlink)' '
+	git blame --textconv symlink.bin >blame &&
+	find_blame <blame >result &&
+	test_cmp expected result
+'
+
+# cp two.bin three.bin  and make small tweak
+# (this will direct blame -C -C three.bin to consider two.bin and symlink.bin)
+test_expect_success 'make another new commit' '
+	echo "bin: test number 2" >three.bin &&
+	echo "bin: test number 2 version 2" >>three.bin &&
+	echo "bin: test number 2 version 3" >>three.bin &&
+	echo "bin: test number 3" >>three.bin &&
+	git add three.bin &&
+	GIT_AUTHOR_NAME=Number4 git commit -a -m Fourth --date="2010-01-01 23:00:00"
+'
+
+cat >expected <<EOF
+(Number1 2010-01-01 18:00:00 +0000 1) converted: test number 2
+(Number2 2010-01-01 20:00:00 +0000 2) converted: test number 2 version 2
+(Number3 2010-01-01 22:00:00 +0000 3) converted: test number 2 version 3
+(Number4 2010-01-01 23:00:00 +0000 4) converted: test number 3
+EOF
+
+# fails with '...symlink.bin is not "binary" file'
+test_expect_failure 'blame on last commit (-C -C, symlink)' '
+	git blame -C -C three.bin >blame &&
+	find_blame <blame >result &&
+	test_cmp expected result
+'
+
 test_done
diff --git a/t/t8007-cat-file-textconv.sh b/t/t8007-cat-file-textconv.sh
index 413d623..dfb2b04 100755
--- a/t/t8007-cat-file-textconv.sh
+++ b/t/t8007-cat-file-textconv.sh
@@ -12,6 +12,7 @@ chmod +x helper
 
 test_expect_success 'setup ' '
 	echo "bin: test" >one.bin &&
+	ln -s one.bin symlink.bin &&
 	git add . &&
 	GIT_AUTHOR_NAME=Number1 git commit -a -m First --date="2010-01-01 18:00:00" &&
 	echo "bin: test version 2" >one.bin &&
@@ -68,4 +69,32 @@ test_expect_success 'cat-file --textconv on previous commit' '
 	git cat-file --textconv HEAD^:one.bin >result &&
 	test_cmp expected result
 '
+
+echo -n "one.bin" >expected
+
+test_expect_success 'cat-file without --textconv (symlink)' '
+	git cat-file blob :symlink.bin >result &&
+	test_cmp expected result
+'
+
+cat >expected <<EOF
+fatal: git cat-file --textconv: unable to run textconv on :symlink.bin
+EOF
+
+# fails because cat-file tries to run converter on symlink.bin
+test_expect_failure 'cat-file --textconv on index (symlink)' '
+	! git cat-file --textconv :symlink.bin 2>result &&
+	test_cmp expected result
+'
+
+cat >expected <<EOF
+fatal: git cat-file --textconv: unable to run textconv on HEAD:symlink.bin
+EOF
+
+# fails because cat-file tries to run converter on symlink.bin
+test_expect_failure 'cat-file --textconv on HEAD (symlink)' '
+	! git cat-file --textconv HEAD:symlink.bin 2>result &&
+	test_cmp expected result
+'
+
 test_done
-- 
1.7.3.rc2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664''
  2010-09-18 17:25 [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Kirill Smelkov
  2010-09-18 17:25 ` [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program Kirill Smelkov
  2010-09-18 17:25 ` [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks Kirill Smelkov
@ 2010-09-18 17:25 ` Kirill Smelkov
  2010-09-18 19:04   ` Matthieu Moy
                     ` (2 more replies)
  2010-09-18 18:08 ` [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Matthieu Moy
  3 siblings, 3 replies; 35+ messages in thread
From: Kirill Smelkov @ 2010-09-18 17:25 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Axel Bonnet, Clément Poulain, Diane Gasselin, Kirill Smelkov

Instead get the mode from either worktree, index, .git, or origin
entries when blaming and pass it to textconv_object() as context.

The reason to do it is not to run textconv filters on symlinks.

~~~~

I don't know blame code well, and I'm not sure I'm doing it right or
otherwise without mistakes. Thus an RFC.

Cc: Axel Bonnet <axel.bonnet@ensimag.imag.fr>
Cc: Clément Poulain <clement.poulain@ensimag.imag.fr>
Cc: Diane Gasselin <diane.gasselin@ensimag.imag.fr>
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>
---
 builtin.h                    |    2 +-
 builtin/blame.c              |   33 ++++++++++++++++++++++-----------
 builtin/cat-file.c           |    2 +-
 sha1_name.c                  |    3 ++-
 t/t8006-blame-textconv.sh    |    6 ++----
 t/t8007-cat-file-textconv.sh |    6 ++----
 6 files changed, 30 insertions(+), 22 deletions(-)

diff --git a/builtin.h b/builtin.h
index 0398d24..9bf69ee 100644
--- a/builtin.h
+++ b/builtin.h
@@ -35,7 +35,7 @@ void finish_copy_notes_for_rewrite(struct notes_rewrite_cfg *c);
 
 extern int check_pager_config(const char *cmd);
 
-extern int textconv_object(const char *path, const unsigned char *sha1, char **buf, unsigned long *buf_size);
+extern int textconv_object(const char *path, unsigned mode, const unsigned char *sha1, char **buf, unsigned long *buf_size);
 
 extern int cmd_add(int argc, const char **argv, const char *prefix);
 extern int cmd_annotate(int argc, const char **argv, const char *prefix);
diff --git a/builtin/blame.c b/builtin/blame.c
index 1015354..cfb7d30 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -83,6 +83,7 @@ struct origin {
 	struct commit *commit;
 	mmfile_t file;
 	unsigned char blob_sha1[20];
+	unsigned mode;
 	char path[FLEX_ARRAY];
 };
 
@@ -92,6 +93,7 @@ struct origin {
  * Return 1 if the conversion succeeds, 0 otherwise.
  */
 int textconv_object(const char *path,
+		    unsigned mode,
 		    const unsigned char *sha1,
 		    char **buf,
 		    unsigned long *buf_size)
@@ -100,7 +102,7 @@ int textconv_object(const char *path,
 	struct userdiff_driver *textconv;
 
 	df = alloc_filespec(path);
-	fill_filespec(df, sha1, S_IFREG | 0664);
+	fill_filespec(df, sha1, mode);
 	textconv = get_textconv(df);
 	if (!textconv) {
 		free_filespec(df);
@@ -125,7 +127,7 @@ static void fill_origin_blob(struct diff_options *opt,
 
 		num_read_blob++;
 		if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) &&
-		    textconv_object(o->path, o->blob_sha1, &file->ptr, &file_size))
+		    textconv_object(o->path, o->mode, o->blob_sha1, &file->ptr, &file_size))
 			;
 		else
 			file->ptr = read_sha1_file(o->blob_sha1, &type, &file_size);
@@ -313,21 +315,23 @@ static struct origin *get_origin(struct scoreboard *sb,
  * for an origin is also used to pass the blame for the entire file to
  * the parent to detect the case where a child's blob is identical to
  * that of its parent's.
+ *
+ * This also fills origin->mode for correspoinding tree path.
  */
-static int fill_blob_sha1(struct origin *origin)
+static int fill_blob_sha1_and_mode(struct origin *origin)
 {
-	unsigned mode;
 	if (!is_null_sha1(origin->blob_sha1))
 		return 0;
 	if (get_tree_entry(origin->commit->object.sha1,
 			   origin->path,
-			   origin->blob_sha1, &mode))
+			   origin->blob_sha1, &origin->mode))
 		goto error_out;
 	if (sha1_object_info(origin->blob_sha1, NULL) != OBJ_BLOB)
 		goto error_out;
 	return 0;
  error_out:
 	hashclr(origin->blob_sha1);
+	origin->mode = S_IFINVALID;
 	return -1;
 }
 
@@ -360,12 +364,14 @@ static struct origin *find_origin(struct scoreboard *sb,
 			/*
 			 * If the origin was newly created (i.e. get_origin
 			 * would call make_origin if none is found in the
-			 * scoreboard), it does not know the blob_sha1,
+			 * scoreboard), it does not know the blob_sha1/mode,
 			 * so copy it.  Otherwise porigin was in the
-			 * scoreboard and already knows blob_sha1.
+			 * scoreboard and already knows blob_sha1/mode.
 			 */
-			if (porigin->refcnt == 1)
+			if (porigin->refcnt == 1) {
 				hashcpy(porigin->blob_sha1, cached->blob_sha1);
+				porigin->mode = cached->mode;
+			}
 			return porigin;
 		}
 		/* otherwise it was not very useful; free it */
@@ -400,6 +406,7 @@ static struct origin *find_origin(struct scoreboard *sb,
 		/* The path is the same as parent */
 		porigin = get_origin(sb, parent, origin->path);
 		hashcpy(porigin->blob_sha1, origin->blob_sha1);
+		porigin->mode = origin->mode;
 	} else {
 		/*
 		 * Since origin->path is a pathspec, if the parent
@@ -425,6 +432,7 @@ static struct origin *find_origin(struct scoreboard *sb,
 		case 'M':
 			porigin = get_origin(sb, parent, origin->path);
 			hashcpy(porigin->blob_sha1, p->one->sha1);
+			porigin->mode = p->one->mode;
 			break;
 		case 'A':
 		case 'T':
@@ -444,6 +452,7 @@ static struct origin *find_origin(struct scoreboard *sb,
 
 		cached = make_origin(porigin->commit, porigin->path);
 		hashcpy(cached->blob_sha1, porigin->blob_sha1);
+		cached->mode = porigin->mode;
 		parent->util = cached;
 	}
 	return porigin;
@@ -486,6 +495,7 @@ static struct origin *find_rename(struct scoreboard *sb,
 		    !strcmp(p->two->path, origin->path)) {
 			porigin = get_origin(sb, parent, p->one->path);
 			hashcpy(porigin->blob_sha1, p->one->sha1);
+			porigin->mode = p->one->mode;
 			break;
 		}
 	}
@@ -1099,6 +1109,7 @@ static int find_copy_in_parent(struct scoreboard *sb,
 
 			norigin = get_origin(sb, parent, p->one->path);
 			hashcpy(norigin->blob_sha1, p->one->sha1);
+			norigin->mode = p->one->mode;
 			fill_origin_blob(&sb->revs->diffopt, norigin, &file_p);
 			if (!file_p.ptr)
 				continue;
@@ -2075,7 +2086,7 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 		switch (st.st_mode & S_IFMT) {
 		case S_IFREG:
 			if (DIFF_OPT_TST(opt, ALLOW_TEXTCONV) &&
-			    textconv_object(read_from, null_sha1, &buf.buf, &buf_len))
+			    textconv_object(read_from, mode, null_sha1, &buf.buf, &buf_len))
 				buf.len = buf_len;
 			else if (strbuf_read_file(&buf, read_from, st.st_size) != st.st_size)
 				die_errno("cannot open or read '%s'", read_from);
@@ -2455,11 +2466,11 @@ parse_done:
 	}
 	else {
 		o = get_origin(&sb, sb.final, path);
-		if (fill_blob_sha1(o))
+		if (fill_blob_sha1_and_mode(o))
 			die("no such path %s in %s", path, final_commit_name);
 
 		if (DIFF_OPT_TST(&sb.revs->diffopt, ALLOW_TEXTCONV) &&
-		    textconv_object(path, o->blob_sha1, (char **) &sb.final_buf,
+		    textconv_object(path, o->mode, o->blob_sha1, (char **) &sb.final_buf,
 				    &sb.final_buf_size))
 			;
 		else
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 76ec3fe..94632db 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -143,7 +143,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name)
 			die("git cat-file --textconv %s: <object> must be <sha1:path>",
 			    obj_name);
 
-		if (!textconv_object(obj_context.path, sha1, &buf, &size))
+		if (!textconv_object(obj_context.path, obj_context.mode, sha1, &buf, &size))
 			die("git cat-file --textconv: unable to run textconv on %s",
 			    obj_name);
 		break;
diff --git a/sha1_name.c b/sha1_name.c
index 7b7e617..5470a69 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1068,7 +1068,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
 		struct cache_entry *ce;
 		int pos;
 		if (namelen > 2 && name[1] == '/')
-			return get_sha1_oneline(name + 2, sha1);
+			return get_sha1_oneline(name + 2, sha1);    /* XXX also mode? */
 		if (namelen < 3 ||
 		    name[2] != ':' ||
 		    name[1] < '0' || '3' < name[1])
@@ -1095,6 +1095,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
 				break;
 			if (ce_stage(ce) == stage) {
 				hashcpy(sha1, ce->sha1);
+				oc->mode = ce->ce_mode; /* XXX ok? */
 				return 0;
 			}
 			pos++;
diff --git a/t/t8006-blame-textconv.sh b/t/t8006-blame-textconv.sh
index 3ed6155..43074d1 100755
--- a/t/t8006-blame-textconv.sh
+++ b/t/t8006-blame-textconv.sh
@@ -90,8 +90,7 @@ test_expect_success 'blame with --no-textconv (on symlink)' '
 	test_cmp expected result
 '
 
-# fails with '...symlink.bin is not "binary" file'
-test_expect_failure 'blame --textconv (on symlink)' '
+test_expect_success 'blame --textconv (on symlink)' '
 	git blame --textconv symlink.bin >blame &&
 	find_blame <blame >result &&
 	test_cmp expected result
@@ -115,8 +114,7 @@ cat >expected <<EOF
 (Number4 2010-01-01 23:00:00 +0000 4) converted: test number 3
 EOF
 
-# fails with '...symlink.bin is not "binary" file'
-test_expect_failure 'blame on last commit (-C -C, symlink)' '
+test_expect_success 'blame on last commit (-C -C, symlink)' '
 	git blame -C -C three.bin >blame &&
 	find_blame <blame >result &&
 	test_cmp expected result
diff --git a/t/t8007-cat-file-textconv.sh b/t/t8007-cat-file-textconv.sh
index dfb2b04..2b3d56a 100755
--- a/t/t8007-cat-file-textconv.sh
+++ b/t/t8007-cat-file-textconv.sh
@@ -81,8 +81,7 @@ cat >expected <<EOF
 fatal: git cat-file --textconv: unable to run textconv on :symlink.bin
 EOF
 
-# fails because cat-file tries to run converter on symlink.bin
-test_expect_failure 'cat-file --textconv on index (symlink)' '
+test_expect_success 'cat-file --textconv on index (symlink)' '
 	! git cat-file --textconv :symlink.bin 2>result &&
 	test_cmp expected result
 '
@@ -91,8 +90,7 @@ cat >expected <<EOF
 fatal: git cat-file --textconv: unable to run textconv on HEAD:symlink.bin
 EOF
 
-# fails because cat-file tries to run converter on symlink.bin
-test_expect_failure 'cat-file --textconv on HEAD (symlink)' '
+test_expect_success 'cat-file --textconv on HEAD (symlink)' '
 	! git cat-file --textconv HEAD:symlink.bin 2>result &&
 	test_cmp expected result
 '
-- 
1.7.3.rc2

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-18 17:25 [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Kirill Smelkov
                   ` (2 preceding siblings ...)
  2010-09-18 17:25 ` [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664'' Kirill Smelkov
@ 2010-09-18 18:08 ` Matthieu Moy
  2010-09-18 20:01   ` Junio C Hamano
  3 siblings, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2010-09-18 18:08 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Junio C Hamano, git, Axel Bonnet, Clément Poulain, Diane Gasselin

Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:

> Recently I've spot a bug in git blame --textconv, which was wrongly
> calling pdftotext (my *.pdf conversion program) on a symlink.pdf,

No time for a detailed review now, but textconv is also called by diff
(that's the original implementation). Does "git diff" work properly on
symlinks in the same case?

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664''
  2010-09-18 17:25 ` [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664'' Kirill Smelkov
@ 2010-09-18 19:04   ` Matthieu Moy
  2010-09-20 18:21   ` Jeff King
  2010-09-20 21:01   ` [PATCH] sha1_name.c: update comment to mention :/foo syntax Matthieu Moy
  2 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2010-09-18 19:04 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Junio C Hamano, git, Axel Bonnet, Clément Poulain, Diane Gasselin

Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:

> I don't know blame code well, and I'm not sure I'm doing it right or
> otherwise without mistakes. Thus an RFC.

I don't know the code well either, but I didn't see any obvious
problem in your code.

> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -1068,7 +1068,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
>  		struct cache_entry *ce;
>  		int pos;
>  		if (namelen > 2 && name[1] == '/')
> -			return get_sha1_oneline(name + 2, sha1);
> +			return get_sha1_oneline(name + 2, sha1);    /* XXX also mode? */

(This is the case where we're parsing ":/foo")

Currently, the mode is set a few lines above:

	oc->mode = S_IFINVALID;

I guess this is OK since :/foo will return a commit sha1, not a file
sha1.

> @@ -1095,6 +1095,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
>  				break;
>  			if (ce_stage(ce) == stage) {
>  				hashcpy(sha1, ce->sha1);
> +				oc->mode = ce->ce_mode; /* XXX ok? */

I'd say this is OK, you're setting the mode from the index. What was
the reason for your question mark?

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program
  2010-09-18 17:25 ` [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program Kirill Smelkov
@ 2010-09-18 19:14   ` Matthieu Moy
  0 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2010-09-18 19:14 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Junio C Hamano, git, Axel Bonnet, Clément Poulain,
	Diane Gasselin, Jeff King

Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:

> Recently I've spot a bug

We usually avoid the first person in commit messages. The cover letter
is a good place to tell about your personal story, but the commit
message is what will remain, what people will read after a blame or
bisect. They won't care whether you've "recently" found a bug, or in
which circumstances you've found it.

I'd write stg like (which would probably go to PATCH 2/3 instead of
here):

-----8<----
git blame --textconv is wrongly calling the textconv filter on
symlinks: symlinks are stored as blobs whose content is the target of
the link, and blame calls the textconv filter on a temporary file
filled-in with the content of this blob.

For example:

>     $ git blame -C -C regular-file.pdf
>     Error: May not be a PDF file (continuing anyway)
>     Error: PDF file is damaged - attempting to reconstruct xref table...
>     Error: Couldn't find trailer dictionary
>     Error: Couldn't read xref table
>     Warning: program returned non-zero exit code #1
>     fatal: unable to read files to diff
-----8<----

> So git-blame is wrong here, and I'm going to write problem-demonstration
> tests + try to fix it, but first we have to convert existing textconv
> converter, so it will mimic pdftext behaviour -- if there is no '^bin:'
> in input -- it's not a "binary" file and helper exits with error.

What's interesting here is not that you mimick pdftext behavior, but
that you allow to easily distinguish file content and symlink target.

Here's my try:

-----8<----
The textconv filter is sometimes incorrectly ran on a temporary file
whose content is the target of a symbolic link, instead of actual file
content. Prepare to test this by marking the content of the file to
convert with "bin:", and let the helper die if "bin:" is not found in
the file content.
-----8<----

> No other semantic changes at this stage.

Otherwise, the code looks OK.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks
  2010-09-18 17:25 ` [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks Kirill Smelkov
@ 2010-09-18 19:26   ` Matthieu Moy
  0 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2010-09-18 19:26 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Junio C Hamano, git, Axel Bonnet, Clément Poulain, Diane Gasselin

Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:

> Subject: Re: [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks

We try to keep the subject lines short (<80 chars, and as much as
possible less so that "git log --oneline" be pretty).

How about

blame,cat-file: add failing tests for --textconv on symlinks

> Because as described in previous patch - it should not.

Since the actual problem is exhibited here, I think it is the best
place to actually describe it.

> +test_expect_success 'make another new commit' '
> +	echo "bin: test number 2" >three.bin &&
> +	echo "bin: test number 2 version 2" >>three.bin &&
> +	echo "bin: test number 2 version 3" >>three.bin &&
> +	echo "bin: test number 3" >>three.bin &&

cat >three.bin <<EOF
bin: test number 2
bin: test number 2 version 2
bin: test number 2 version 3
bin: test number 3
EOF

?

> +cat >expected <<EOF
> +(Number1 2010-01-01 18:00:00 +0000 1) converted: test number 2
> +(Number2 2010-01-01 20:00:00 +0000 2) converted: test number 2 version 2
> +(Number3 2010-01-01 22:00:00 +0000 3) converted: test number 2 version 3
> +(Number4 2010-01-01 23:00:00 +0000 4) converted: test number 3
> +EOF

These days, it's recommanded to put this kind of code within the
test_expect_success/test_expect_failure.

> +
> +echo -n "one.bin" >expected

echo -n is not very portable (and doesn't seem to be used in git's t/
directory). Better use

printf "%s" "one.bin" >expected

(again, within text_expect_failure if possible)

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-18 18:08 ` [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Matthieu Moy
@ 2010-09-18 20:01   ` Junio C Hamano
  2010-09-19  8:58     ` Matthieu Moy
  2010-09-20 18:00     ` Jeff King
  0 siblings, 2 replies; 35+ messages in thread
From: Junio C Hamano @ 2010-09-18 20:01 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Kirill Smelkov, git, Axel Bonnet, Clément Poulain, Diane Gasselin

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:
>
>> Recently I've spot a bug in git blame --textconv, which was wrongly
>> calling pdftotext (my *.pdf conversion program) on a symlink.pdf,
>
> No time for a detailed review now, but textconv is also called by diff
> (that's the original implementation). Does "git diff" work properly on
> symlinks in the same case?

diff knows symlinks and regular files are different, and produces "delete
old then add new" if you changed a regular file to a symlink.

That said, if you changed a symlink from pointing at A to pointing at B,
it does run the textual diff between the string we get from readlink(3).

I happen to think that textconv, if specified, for such a path should be
honored, so that people can keep doing whatever munging they have been
doing in their existing textconv filters.

I didn't look at the thread or problem description, but are we running the
textconv filter on the file that symlink points at, instead of the
pathname stored in the symlink?  If so I'd call that a bug.

On the other hand, if we are running the textconv filter on the pathname,
then I don't think we are doing anything wrong.  If you have a filter that
is meant to munge a PDF file to some other format, and if you do not want
to apply that filter to munge a pathname a symlink that happens to be
named "foo.pdf", either the filter itself or the attributes pattern you
are using to choose what paths to apply that filter might want to be
written more carefully, that's all.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-18 20:01   ` Junio C Hamano
@ 2010-09-19  8:58     ` Matthieu Moy
  2010-09-19 18:17       ` Junio C Hamano
  2010-09-20 18:00     ` Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2010-09-19  8:58 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kirill Smelkov, git, Axel Bonnet, Clément Poulain, Diane Gasselin

Junio C Hamano <gitster@pobox.com> writes:

> That said, if you changed a symlink from pointing at A to pointing at B,
> it does run the textual diff between the string we get from readlink(3).

Yes, and my question was to make sure we don't run the textconv filter
on these strings.

I've just checked, and from my little tests, git diff doesn't try to
textconv the pathnames, it runs the textual diff without textconv,
which is the expected behavior.

> I didn't look at the thread or problem description, but are we running the
> textconv filter on the file that symlink points at, instead of the
> pathname stored in the symlink?  If so I'd call that a bug.

No, we don't do that. And yes, it's good not to do that: AFAIK, Git
never looks at the file symlinks points to (which allows one to store
broken symlinks or symlinks pointing outside the repository).

> On the other hand, if we are running the textconv filter on the pathname,
> then I don't think we are doing anything wrong.  If you have a filter that
> is meant to munge a PDF file to some other format, and if you do not want
> to apply that filter to munge a pathname a symlink that happens to be
> named "foo.pdf", either the filter itself or the attributes pattern you
> are using to choose what paths to apply that filter might want to be
> written more carefully, that's all.

If a symlink points to "foo.pdf", then you really want to call the
string "foo.pdf" a pathname, not a "content", although it happens to
be stored in a blob content.

If you have a program that can textconv PDF files, then you really
want to say

*.pdf textconv=the-driver-for-that-program

in your .gitattributes file, and let Git call the program on _files_,
not pathnames, and hence not symlink target (i.e. result of readlink).

If you think Git should call the textconv filter on pathnames, and let
the driver decide whether to call it depending on whether it's a file
content or a symlink target, then you should 1) give a use-case where
it's usefull to textconv symlink targets (I really don't see any, and
even on highly artificial examples, I can't imagine one where the same
driver should textconv file content and pathnames), and 2) tell us the
syntax to use in .gitattributes or in the driver to call it only on
file content (I don't know any, and the .gitattribute manpage doesn't
containt either "link" or "mode", so I don't think there is actually
any).

So, I'm pretty sure either I misunderstood you or you misunderstood
something ;-).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-19  8:58     ` Matthieu Moy
@ 2010-09-19 18:17       ` Junio C Hamano
  0 siblings, 0 replies; 35+ messages in thread
From: Junio C Hamano @ 2010-09-19 18:17 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Kirill Smelkov, git, Axel Bonnet, Clément Poulain, Diane Gasselin

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> That said, if you changed a symlink from pointing at A to pointing at B,
>> it does run the textual diff between the string we get from readlink(3).
>
> Yes, and my question was to make sure we don't run the textconv filter
> on these strings.
>
> I've just checked, and from my little tests, git diff doesn't try to
> textconv the pathnames, it runs the textual diff without textconv,
> which is the expected behavior.

Ok, then I think it makes sense to make sure we don't run the textconv
filter on them.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-18 20:01   ` Junio C Hamano
  2010-09-19  8:58     ` Matthieu Moy
@ 2010-09-20 18:00     ` Jeff King
  2010-09-20 20:18       ` Johannes Sixt
  2010-09-21 17:57       ` Junio C Hamano
  1 sibling, 2 replies; 35+ messages in thread
From: Jeff King @ 2010-09-20 18:00 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

On Sat, Sep 18, 2010 at 01:01:17PM -0700, Junio C Hamano wrote:

> diff knows symlinks and regular files are different, and produces "delete
> old then add new" if you changed a regular file to a symlink.
> 
> That said, if you changed a symlink from pointing at A to pointing at B,
> it does run the textual diff between the string we get from readlink(3).
> 
> I happen to think that textconv, if specified, for such a path should be
> honored, so that people can keep doing whatever munging they have been
> doing in their existing textconv filters.

I think you came to the conclusion later in the thread that this is a
bad idea, if only because it is not how "git diff" works, but I wanted
to make one additional point.

I think that git, being symlink aware, needs to behave similarly to
"lstat". That is, we should never dereference symlinks transparently
when diffing or analyzing content, because otherwise there is no way to
actually look at the symlink data itself. It is the user's
responsibility to dereference symlinks in their diffs (e.g., I can get
either the symlink data _or_ the actual file data by doing "git diff
symlink-to-foo.bin" or "git diff foo.bin". If git dereferenced for me, I
would get file data for _both_). Not to mention that we can't always
dereference anyway because of broken links or links outside the repo, as
Matthieu pointed out.

So doing anything but a straight text diff for the pathnames in symlink
blobs is, IMHO, a bug.

The one thing this does not enable is using a special diff driver on the
_pathnames_ of symlinks. Since these are by-definition text, I don't
know why anyone would want to do that. But it is an orthogonal problem,
anyway.  We would need some way in the .gitattributes or the .gitconfig
to say "this is the diff driver to use not based on pathname matching,
but based on the file's mode". E.g., a special "SYMLINK" diff driver
like:

  [diff "SYMLINK"]
    textconv = pointless-munge

But again, I have no idea why anyone would want such a feature, so it is
not worth thinking too hard about it.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664''
  2010-09-18 17:25 ` [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664'' Kirill Smelkov
  2010-09-18 19:04   ` Matthieu Moy
@ 2010-09-20 18:21   ` Jeff King
  2010-09-20 20:35     ` [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program Kirill Smelkov
  2010-09-20 21:01   ` [PATCH] sha1_name.c: update comment to mention :/foo syntax Matthieu Moy
  2 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2010-09-20 18:21 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Junio C Hamano, git, Axel Bonnet, Clément Poulain, Diane Gasselin

On Sat, Sep 18, 2010 at 09:25:06PM +0400, Kirill Smelkov wrote:

> Instead get the mode from either worktree, index, .git, or origin
> entries when blaming and pass it to textconv_object() as context.
> 
> The reason to do it is not to run textconv filters on symlinks.

I think this is absolutely a bug, and your solution is definitely in the
right direction. We obviously can't just ignore the mode when deciding
whether to textconv. I suspect there is similar breakage for S_IFGITLINK
files, though they are perhaps less likely in practice to match another
filetype's extension.

So all three patches look sane to me, with the caveat that I also don't
know the blame code very well.

I agree with Matthieu's points on cleaning up the commit messages, and
there is a small comment typo in this third patch:

> @@ -313,21 +315,23 @@ static struct origin *get_origin(struct scoreboard *sb,
>   * for an origin is also used to pass the blame for the entire file to
>   * the parent to detect the case where a child's blob is identical to
>   * that of its parent's.
> + *
> + * This also fills origin->mode for correspoinding tree path.

Typo: s/poind/pond

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-20 18:00     ` Jeff King
@ 2010-09-20 20:18       ` Johannes Sixt
  2010-09-21 17:57       ` Junio C Hamano
  1 sibling, 0 replies; 35+ messages in thread
From: Johannes Sixt @ 2010-09-20 20:18 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

On Montag, 20. September 2010, Jeff King wrote:
>   [diff "SYMLINK"]
>     textconv = pointless-munge
>
> But again, I have no idea why anyone would want such a feature, so it is
> not worth thinking too hard about it.

Some people dislike "\ No newline at end of file" for symlinks. This would be 
the opportunity to hook a suitable diff driver.

-- Hannes

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program
  2010-09-20 18:21   ` Jeff King
@ 2010-09-20 20:35     ` Kirill Smelkov
  2010-09-20 21:03       ` Matthieu Moy
  0 siblings, 1 reply; 35+ messages in thread
From: Kirill Smelkov @ 2010-09-20 20:35 UTC (permalink / raw)
  To: Matthieu Moy, Jeff King
  Cc: Junio C Hamano, git, Axel Bonnet, Cl?ment Poulain, Diane Gasselin

Matthieu, Jeff, Junio, thanks for your review and input.

Comments below inline...

On Sat, Sep 18, 2010 at 09:14:57PM +0200, Matthieu Moy wrote:
> Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:
> 
> > Recently I've spot a bug
> 
> We usually avoid the first person in commit messages. The cover letter
> is a good place to tell about your personal story, but the commit
> message is what will remain, what people will read after a blame or
> bisect. They won't care whether you've "recently" found a bug, or in
> which circumstances you've found it.
> 
> I'd write stg like (which would probably go to PATCH 2/3 instead of
> here):
> 
> -----8<----
> git blame --textconv is wrongly calling the textconv filter on
> symlinks: symlinks are stored as blobs whose content is the target of
> the link, and blame calls the textconv filter on a temporary file
> filled-in with the content of this blob.
> 
> For example:
> 
> >     $ git blame -C -C regular-file.pdf
> >     Error: May not be a PDF file (continuing anyway)
> >     Error: PDF file is damaged - attempting to reconstruct xref table...
> >     Error: Couldn't find trailer dictionary
> >     Error: Couldn't read xref table
> >     Warning: program returned non-zero exit code #1
> >     fatal: unable to read files to diff
> -----8<----

Agree, your wording is better.

> > So git-blame is wrong here, and I'm going to write problem-demonstration
> > tests + try to fix it, but first we have to convert existing textconv
> > converter, so it will mimic pdftext behaviour -- if there is no '^bin:'
> > in input -- it's not a "binary" file and helper exits with error.
> 
> What's interesting here is not that you mimick pdftext behavior, but
> that you allow to easily distinguish file content and symlink target.
> 
> Here's my try:
> 
> -----8<----
> The textconv filter is sometimes incorrectly ran on a temporary file
> whose content is the target of a symbolic link, instead of actual file
> content. Prepare to test this by marking the content of the file to
> convert with "bin:", and let the helper die if "bin:" is not found in
> the file content.
> -----8<----

Agree

> > No other semantic changes at this stage.
> 
> Otherwise, the code looks OK.

Thanks


On Sat, Sep 18, 2010 at 09:26:29PM +0200, Matthieu Moy wrote:
> Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:
> 
> > Subject: Re: [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks
> 
> We try to keep the subject lines short (<80 chars, and as much as
> possible less so that "git log --oneline" be pretty).
> 
> How about
> 
> blame,cat-file: add failing tests for --textconv on symlinks

I'd like to shorten it, but "add failing tests" is not as descriptive as
"Demonstrate --textconv misbehaves in such-and-such way", and I can't
come up with a shorter subject without making it more cryptic. And btw,
I've looked at log --oneline output, and (surprise, surprise)

479a56 4fccc04 etc ...

So if that's not a major obstacle, I'd leave it as is.

> > Because as described in previous patch - it should not.
> 
> Since the actual problem is exhibited here, I think it is the best
> place to actually describe it.

Agree - will move description here.

> > +test_expect_success 'make another new commit' '
> > +	echo "bin: test number 2" >three.bin &&
> > +	echo "bin: test number 2 version 2" >>three.bin &&
> > +	echo "bin: test number 2 version 3" >>three.bin &&
> > +	echo "bin: test number 3" >>three.bin &&
> 
> cat >three.bin <<EOF
> bin: test number 2
> bin: test number 2 version 2
> bin: test number 2 version 3
> bin: test number 3
> EOF
> 
> ?

Yes, thanks.

> > +cat >expected <<EOF
> > +(Number1 2010-01-01 18:00:00 +0000 1) converted: test number 2
> > +(Number2 2010-01-01 20:00:00 +0000 2) converted: test number 2 version 2
> > +(Number3 2010-01-01 22:00:00 +0000 3) converted: test number 2 version 3
> > +(Number4 2010-01-01 23:00:00 +0000 4) converted: test number 3
> > +EOF
> 
> These days, it's recommanded to put this kind of code within the
> test_expect_success/test_expect_failure.

I see, thanks.

I've moved other >expected preparation inside test_expect_*, but this
expect is special in that it is used in subsequent two tests. So I'd
leave this one outside.

And btw, I've originally copied in-style what was already there in t8006
and t8007 which date to Jun 2010.

> > +
> > +echo -n "one.bin" >expected
> 
> echo -n is not very portable (and doesn't seem to be used in git's t/
> directory). Better use
> 
> printf "%s" "one.bin" >expected
> 
> (again, within text_expect_failure if possible)

I didn't knew echo -n is not portable, thanks. Converted to printf and
moved inside test.


On Sat, Sep 18, 2010 at 09:04:00PM +0200, Matthieu Moy wrote:
> Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:
> 
> > I don't know blame code well, and I'm not sure I'm doing it right or
> > otherwise without mistakes. Thus an RFC.
> 
> I don't know the code well either, but I didn't see any obvious
> problem in your code.
> 
> > --- a/sha1_name.c
> > +++ b/sha1_name.c
> > @@ -1068,7 +1068,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
> >  		struct cache_entry *ce;
> >  		int pos;
> >  		if (namelen > 2 && name[1] == '/')
> > -			return get_sha1_oneline(name + 2, sha1);
> > +			return get_sha1_oneline(name + 2, sha1);    /* XXX also mode? */
> 
> (This is the case where we're parsing ":/foo")
> 
> Currently, the mode is set a few lines above:
> 
> 	oc->mode = S_IFINVALID;
> 
> I guess this is OK since :/foo will return a commit sha1, not a file
> sha1.

Ah, yes, thanks. I forgot ':/text' means `commit which log is text' -
yes, mode should be S_IFINVALID here then.

> > @@ -1095,6 +1095,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
> >  				break;
> >  			if (ce_stage(ce) == stage) {
> >  				hashcpy(sha1, ce->sha1);
> > +				oc->mode = ce->ce_mode; /* XXX ok? */
> 
> I'd say this is OK, you're setting the mode from the index. What was
> the reason for your question mark?

Yes, this turned out to be needed - without this copy, one of cat-file
tests does not pass.

I was worrying because initially I've also tried to propagete
origin->mode into ce->ce_mode in fake_working_tree_commit():

--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2133,9 +2133,9 @@ static struct commit *fake_working_tree_commit(struct diff_options *opt,
 	size = cache_entry_size(len);
 	ce = xcalloc(1, size);
 	hashcpy(ce->sha1, origin->blob_sha1);
+	ce->ce_mode = create_ce_mode(origin->mode);
 	memcpy(ce->name, path, len);
 	ce->ce_flags = create_ce_flags(len, 0);
-	ce->ce_mode = create_ce_mode(mode);
 	add_cache_entry(ce, ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE);
 
 	/*

and it caused test failures, e.g.

    --- expected    2010-09-20 20:26:53.000000000 +0000
    +++ result      2010-09-20 20:26:53.000000000 +0000
    @@ -1 +1 @@
    -(Number2 2010-01-01 20:00:00 +0000 1) two.bin
    +(Not Committed Yet 2010-09-20 20:26:53 +0000 1) two.bin
    not ok - 9 blame with --no-textconv (on symlink)
    #
    #               git blame --no-textconv symlink.bin >blame &&
    #               find_blame <blame >result &&
    #               test_cmp expected result
    #

in t8006-blame-textconv.sh

So I dummily conclude I'de better not touch ce_mode...

But now I read comments in there 

        /*
         * Read the current index, replace the path entry with
         * origin->blob_sha1 without mucking with its mode or type
         * bits; we are not going to write this index out -- we just
         * want to run "diff-index --cached".
         */

And guess I probably should not touch mode here. Not 100% sure, but more
confident (call it heuristic programming :)), so I'm removing my XXX.


On Mon, Sep 20, 2010 at 02:21:28PM -0400, Jeff King wrote:
> On Sat, Sep 18, 2010 at 09:25:06PM +0400, Kirill Smelkov wrote:
> 
> > Instead get the mode from either worktree, index, .git, or origin
> > entries when blaming and pass it to textconv_object() as context.
> > 
> > The reason to do it is not to run textconv filters on symlinks.
> 
> I think this is absolutely a bug, and your solution is definitely in the
> right direction. We obviously can't just ignore the mode when deciding
> whether to textconv. I suspect there is similar breakage for S_IFGITLINK
> files, though they are perhaps less likely in practice to match another
> filetype's extension.
> 
> So all three patches look sane to me, with the caveat that I also don't
> know the blame code very well.
> 
> I agree with Matthieu's points on cleaning up the commit messages, and
> there is a small comment typo in this third patch:

Jeff, thanks for you comments - appreciated.

> > @@ -313,21 +315,23 @@ static struct origin *get_origin(struct scoreboard *sb,
> >   * for an origin is also used to pass the blame for the entire file to
> >   * the parent to detect the case where a child's blob is identical to
> >   * that of its parent's.
> > + *
> > + * This also fills origin->mode for correspoinding tree path.
> 
> Typo: s/poind/pond

Good eyes, thanks.

I should sleep more :)



Thanks again to everyone. I'll repost this series as v2 shortly.

Kirill

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] sha1_name.c: update comment to mention :/foo syntax
  2010-09-18 17:25 ` [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664'' Kirill Smelkov
  2010-09-18 19:04   ` Matthieu Moy
  2010-09-20 18:21   ` Jeff King
@ 2010-09-20 21:01   ` Matthieu Moy
  2010-09-21 18:02     ` Junio C Hamano
  2 siblings, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2010-09-20 21:01 UTC (permalink / raw)
  To: git, gitster; +Cc: Matthieu Moy


Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
Noticed while reviewing the patch serie about textconv and symlinks.
If we have comments, better have them up-to-date ;-).

 sha1_name.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 7b7e617..d7ab72a 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1062,6 +1062,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
 	/* sha1:path --> object name of path in ent sha1
 	 * :path -> object name of path in index
 	 * :[0-3]:path -> object name of path in index at stage
+	 * :/foo -> last commit whose subject starts with foo
 	 */
 	if (name[0] == ':') {
 		int stage = 0;
-- 
1.7.3.2.g257b5f

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program
  2010-09-20 20:35     ` [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program Kirill Smelkov
@ 2010-09-20 21:03       ` Matthieu Moy
  2010-09-21 18:39         ` Kirill Smelkov
  0 siblings, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2010-09-20 21:03 UTC (permalink / raw)
  To: Kirill Smelkov
  Cc: Jeff King, Junio C Hamano, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:

> Matthieu, Jeff, Junio, thanks for your review and input.
>
> Comments below inline...

You've replied to several messages in one. That's OK here since the
content of your message is essentially uncontroversial, but I'd
suggest keeping separate threads for separate patches for next time.
(for example, my reply below looks strange since I'm commenting on
PATCH 2/3 in a message titled PATCH 1/3 ...). No big deal.

>> > Subject: Re: [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks
>>
>> We try to keep the subject lines short (<80 chars, and as much as
>> possible less so that "git log --oneline" be pretty).
>>
>> How about
>>
>> blame,cat-file: add failing tests for --textconv on symlinks
>
> I'd like to shorten it, but "add failing tests" is not as descriptive as
> "Demonstrate --textconv misbehaves in such-and-such way",

Right, but your "such-and-such way" is not really precise either. One
cannot know whether it's running, but the wrong way, whether it's
running and should not, etc. The subject line is here to be a summary,
you don't need details. And I actually find "add failing tests" more
precise than "demonstrate that ... is behaving wrongly".

> and I can't come up with a shorter subject without making it more
> cryptic. And btw, I've looked at log --oneline output, and
> (surprise, surprise)
>
> 479a56 4fccc04 etc ...

I don't understand what you mean. After applying your patches, I get:

6998f9a RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664''
025aaac blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks
3646401 tests: Prepare --textconv tests for correctly-failing conversion program

which do not fit on a 80-columns terminal.

> So if that's not a major obstacle, I'd leave it as is.

I don't consider it "major" (but other may have different opinions)
though.

>> These days, it's recommanded to put this kind of code within the
>> test_expect_success/test_expect_failure.
>
> I see, thanks.
>
> I've moved other >expected preparation inside test_expect_*, but this
> expect is special in that it is used in subsequent two tests. So I'd
> leave this one outside.
>
> And btw, I've originally copied in-style what was already there in t8006
> and t8007 which date to Jun 2010.

Yes, the guideline is new, and not widely applied (yet).

> Ah, yes, thanks. I forgot ':/text' means `commit which log is text' -

A patch to update the comment above follows.

> Thanks again to everyone.

That "everyone" includes you ;-). Thanks for the patch, good job both
for testing and for the actual fix!

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-20 18:00     ` Jeff King
  2010-09-20 20:18       ` Johannes Sixt
@ 2010-09-21 17:57       ` Junio C Hamano
  2010-09-21 18:42         ` Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2010-09-21 17:57 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

Jeff King <peff@peff.net> writes:

> The one thing this does not enable is using a special diff driver on the
> _pathnames_ of symlinks. Since these are by-definition text, I don't
> know why anyone would want to do that. But it is an orthogonal problem,
> anyway.  We would need some way in the .gitattributes or the .gitconfig
> to say "this is the diff driver to use not based on pathname matching,
> but based on the file's mode". E.g., a special "SYMLINK" diff driver
> like:
>
>   [diff "SYMLINK"]
>     textconv = pointless-munge
>
> But again, I have no idea why anyone would want such a feature, so it is
> not worth thinking too hard about it.

I agree with you; pointless-munge would just be 'printf "%s\n"' ;-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] sha1_name.c: update comment to mention :/foo syntax
  2010-09-20 21:01   ` [PATCH] sha1_name.c: update comment to mention :/foo syntax Matthieu Moy
@ 2010-09-21 18:02     ` Junio C Hamano
  2010-09-21 20:06       ` Matthieu Moy
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2010-09-21 18:02 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

Matthieu Moy <Matthieu.Moy@imag.fr> writes:

> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
> ---
> Noticed while reviewing the patch serie about textconv and symlinks.
> If we have comments, better have them up-to-date ;-).
>
>  sha1_name.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/sha1_name.c b/sha1_name.c
> index 7b7e617..d7ab72a 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -1062,6 +1062,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
>  	/* sha1:path --> object name of path in ent sha1
>  	 * :path -> object name of path in index
>  	 * :[0-3]:path -> object name of path in index at stage
> +	 * :/foo -> last commit whose subject starts with foo

Documenting what hasn't been is a good thing, but is it really up-to-date?

Isn't it "a randomly chosen recent commit whose subject contains regexp
foo" these days?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program
  2010-09-20 21:03       ` Matthieu Moy
@ 2010-09-21 18:39         ` Kirill Smelkov
  0 siblings, 0 replies; 35+ messages in thread
From: Kirill Smelkov @ 2010-09-21 18:39 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Jeff King, Junio C Hamano, git, Axel Bonnet, Cl?ment Poulain,
	Diane Gasselin

On Mon, Sep 20, 2010 at 11:03:35PM +0200, Matthieu Moy wrote:
> Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:
> 
> > Matthieu, Jeff, Junio, thanks for your review and input.
> >
> > Comments below inline...
> 
> You've replied to several messages in one. That's OK here since the
> content of your message is essentially uncontroversial, but I'd
> suggest keeping separate threads for separate patches for next time.
> (for example, my reply below looks strange since I'm commenting on
> PATCH 2/3 in a message titled PATCH 1/3 ...). No big deal.

I knew someone will say this :)

But let's try to think about it from a bit different point of view - we
had 3 patches, and description for at least two of them were intermixed.
In this situation the series is better considered as a whole, and if we
consider each patch git branch, what would you do if you need to get
cumulative result of all the branches?

Yes, merge them - so did I. And the reply actually had all the
references to parent messages in it's In-Reply-To header. It's just that
usually MUA's don't show merges graphically (please correct me if I'm
wrong), but otherwise isn't it ok?

I agree, subject should be better corrected somehow, but main idea still
stays valid!

Clearly I did lots of giting recently, so I project it's concepts on
the world :)


> >> > Subject: Re: [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks
> >>
> >> We try to keep the subject lines short (<80 chars, and as much as
> >> possible less so that "git log --oneline" be pretty).
> >>
> >> How about
> >>
> >> blame,cat-file: add failing tests for --textconv on symlinks
> >
> > I'd like to shorten it, but "add failing tests" is not as descriptive as
> > "Demonstrate --textconv misbehaves in such-and-such way",
> 
> Right, but your "such-and-such way" is not really precise either. One
> cannot know whether it's running, but the wrong way, whether it's
> running and should not, etc. The subject line is here to be a summary,
> you don't need details. And I actually find "add failing tests" more
> precise than "demonstrate that ... is behaving wrongly".

Hmm, maybe my wording is not that good, but to me "Demonstrate
--textconv misbehaves ..." is closer to being more descriptive than "add
failing tests".

This is maybe just a matter of preferences, so I'm ok with either way.

> > and I can't come up with a shorter subject without making it more
> > cryptic. And btw, I've looked at log --oneline output, and
> > (surprise, surprise)
> >
> > 479a56 4fccc04 etc ...
> 
> I don't understand what you mean. After applying your patches, I get:
> 
> 6998f9a RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664''
> 025aaac blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks
> 3646401 tests: Prepare --textconv tests for correctly-failing conversion program
> 
> which do not fit on a 80-columns terminal.

I meant do git log on those sha's, but late at night I've ommitted one
letter. Here you go:

$ git log -1 --oneline a479a56
a479a56 Documentation/Makefile: allow man.base.url.for.relative.link to be set from Make
$ git log -1 --oneline 4fccc04
4fccc04 Documentation: remove stray backslashes from "Fighting regressions" article


That's 88 & 83 characters. I mean to say maybe it's not that big deal to
sacrifice understandability for 80 chars, especially when git log output
is paged, usually through less -S.

That said I also try to keep subjects short when possible.

> > So if that's not a major obstacle, I'd leave it as is.
> 
> I don't consider it "major" (but other may have different opinions)
> though.

So what would be conclusion here?

> >> These days, it's recommanded to put this kind of code within the
> >> test_expect_success/test_expect_failure.
> >
> > I see, thanks.
> >
> > I've moved other >expected preparation inside test_expect_*, but this
> > expect is special in that it is used in subsequent two tests. So I'd
> > leave this one outside.
> >
> > And btw, I've originally copied in-style what was already there in t8006
> > and t8007 which date to Jun 2010.
> 
> Yes, the guideline is new, and not widely applied (yet).
> 
> > Ah, yes, thanks. I forgot ':/text' means `commit which log is text' -
> 
> A patch to update the comment above follows.

thanks

> > Thanks again to everyone.
> 
> That "everyone" includes you ;-). Thanks for the patch, good job both
> for testing and for the actual fix!

Come on, Matthieu, it's just one small correction :)

Thanks,
Kirill

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-21 17:57       ` Junio C Hamano
@ 2010-09-21 18:42         ` Jeff King
  2010-09-21 18:56           ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2010-09-21 18:42 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

On Tue, Sep 21, 2010 at 10:57:56AM -0700, Junio C Hamano wrote:

> >   [diff "SYMLINK"]
> >     textconv = pointless-munge
> >
> > But again, I have no idea why anyone would want such a feature, so it is
> > not worth thinking too hard about it.
> 
> I agree with you; pointless-munge would just be 'printf "%s\n"' ;-)

Almost. That would print the name of the tempfile; the actual pathname
is the contents of the tempfile (unless you are proposing totally
alternate semantics for the symlink diff section. :) ).

Just for fun, the patch to make this work is as tiny as:

diff --git a/diff.c b/diff.c
index 6fb18ae..58c4477 100644
--- a/diff.c
+++ b/diff.c
@@ -1771,8 +1771,14 @@ static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two, char *pre
 
 static void diff_filespec_load_driver(struct diff_filespec *one)
 {
-	if (!one->driver)
+	if (one->driver)
+		return;
+
+	if (S_ISLNK(one->mode))
+		one->driver = userdiff_find_by_name("SYMLINK");
+	else
 		one->driver = userdiff_find_by_path(one->path);
+
 	if (!one->driver)
 		one->driver = userdiff_find_by_name("default");
 }
@@ -1820,7 +1826,7 @@ struct userdiff_driver *get_textconv(struct diff_filespec *one)
 {
 	if (!DIFF_FILE_VALID(one))
 		return NULL;
-	if (!S_ISREG(one->mode))
+	if (!S_ISREG(one->mode) && !S_ISLNK(one->mode))
 		return NULL;
 	diff_filespec_load_driver(one);
 	if (!one->driver->textconv)

after which I successfully tested with:

  git init repo && cd repo &&
  echo content >file.txt &&
  ln -s file.txt link.txt &&
  echo '*.txt diff=txt' >.gitattributes &&
  git add . && git commit -m foo &&
  git config diff.txt.textconv "sed 's/^/converted: /'" &&
  git config diff.SYMLINK.textconv "perl -pe 's/$/\n/'" &&
  git show

It works with the whole range of diff config, so you could do something
as awesomely stupid as:

  $ git config diff.SYMLINK.binary true
  $ git show link.txt
  ...
  diff --git a/link.txt b/link.txt
  new file mode 120000
  index 0000000..4c33073
  Binary files /dev/null and b/link.txt differ

If you really wanted the "dereference my symlinks" behavior, you could
do:

  git config diff.SYMLINK.textconv 'sh -c "cat `cat $1`" -'

but that is not quite right; you would actually need to dereference
relative symlinks with respect to the link itself, which textconv never
gets (plus you would probably want to handle broken links more
gracefully).

Anyway, as I said at the beginning, for me this was just for fun. I find
the intended use rather silly, but maybe somebody else is interested. I
do think it's the right way of implementing such a feature, because we
already turn off textconv when making patches that are meant to be
applied rather than viewed. However, I didn't do any testing or give
much thought to whether this would affect any unintended code paths.

-Peff

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks
  2010-09-21 18:42         ` Jeff King
@ 2010-09-21 18:56           ` Jeff King
  2010-09-21 20:59             ` [PATCH 0/2] better userdiff behavior for symlinks Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2010-09-21 18:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

On Tue, Sep 21, 2010 at 02:42:41PM -0400, Jeff King wrote:

> diff --git a/diff.c b/diff.c
> index 6fb18ae..58c4477 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1771,8 +1771,14 @@ static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two, char *pre
>  
>  static void diff_filespec_load_driver(struct diff_filespec *one)
>  {
> -	if (!one->driver)
> +	if (one->driver)
> +		return;
> +
> +	if (S_ISLNK(one->mode))
> +		one->driver = userdiff_find_by_name("SYMLINK");
> +	else
>  		one->driver = userdiff_find_by_path(one->path);
> +
>  	if (!one->driver)
>  		one->driver = userdiff_find_by_name("default");
>  }
> @@ -1820,7 +1826,7 @@ struct userdiff_driver *get_textconv(struct diff_filespec *one)
>  {
>  	if (!DIFF_FILE_VALID(one))
>  		return NULL;
> -	if (!S_ISREG(one->mode))
> +	if (!S_ISREG(one->mode) && !S_ISLNK(one->mode))
>  		return NULL;
>  	diff_filespec_load_driver(one);
>  	if (!one->driver->textconv)

Actually, thinking about this a little more, I wonder if the test for
S_ISREG should perhaps have been in diff_filespec_load_driver from the
very beginning. With the current code, I can do:

  $ echo 'binary content' >file.bin
  $ ln -s file.bin link.bin
  $ echo '*.bin diff=bin' >.gitattributes
  $ git config diff.bin.binary true
  $ git add -N .
  $ git diff
  diff --git a/.gitattributes b/.gitattributes
  index e69de29..d38ad2e 100644
  --- a/.gitattributes
  +++ b/.gitattributes
  @@ -0,0 +1 @@
  +*.bin diff=bin
  diff --git a/file.bin b/file.bin
  index e69de29..f55142d 100644
  Binary files a/file.bin and b/file.bin differ
  diff --git a/link.bin b/link.bin
  index e69de29..dce41ec 120000
  Binary files a/link.bin and b/link.bin differ

which does not seem right. We are again acting on the symlink's
contents, which are a text pathname, as if they represented the content
of that pathname.

So I think that is a bug, albeit one that is relatively uncommon to
trigger. S_ISGITLINK files may have the same issues, but perhaps not, as
I think they get routed out of the regular diff codepath early.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH] sha1_name.c: update comment to mention :/foo syntax
  2010-09-21 18:02     ` Junio C Hamano
@ 2010-09-21 20:06       ` Matthieu Moy
  2010-09-21 23:29         ` Junio C Hamano
  0 siblings, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2010-09-21 20:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Matthieu Moy <Matthieu.Moy@imag.fr> writes:
>
>> Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
>> ---
>> Noticed while reviewing the patch serie about textconv and symlinks.
>> If we have comments, better have them up-to-date ;-).
>>
>>  sha1_name.c |    1 +
>>  1 files changed, 1 insertions(+), 0 deletions(-)
>>
>> diff --git a/sha1_name.c b/sha1_name.c
>> index 7b7e617..d7ab72a 100644
>> --- a/sha1_name.c
>> +++ b/sha1_name.c
>> @@ -1062,6 +1062,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
>>  	/* sha1:path --> object name of path in ent sha1
>>  	 * :path -> object name of path in index
>>  	 * :[0-3]:path -> object name of path in index at stage
>> +	 * :/foo -> last commit whose subject starts with foo
>
> Documenting what hasn't been is a good thing, but is it really up-to-date?
>
> Isn't it "a randomly chosen recent commit whose subject contains regexp
> foo" these days?

I don't know.

I just tried to summarize what man git-rev-parse says:

       ·   A colon, followed by a slash, followed by a text (e.g.  :/fix nasty bug):
           this names a commit whose commit message starts with the specified text.
           This name returns the youngest matching commit which is reachable from any
           ref. If the commit message starts with a !, you have to repeat that; the
           special sequence :/!, followed by something else than !  is reserved for
           now.

If your description is more accurate than mine (especially about
"randomly" Vs "last" and "regexp"), then the man should be updated.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 0/2] better userdiff behavior for symlinks
  2010-09-21 18:56           ` Jeff King
@ 2010-09-21 20:59             ` Jeff King
  2010-09-21 21:01               ` [PATCH 1/2] diff: don't use pathname-based diff drivers " Jeff King
  2010-09-21 21:13               ` [PATCH 2/2] diff: add a special SYMLINK user-diff driver Jeff King
  0 siblings, 2 replies; 35+ messages in thread
From: Jeff King @ 2010-09-21 20:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

On Tue, Sep 21, 2010 at 02:56:13PM -0400, Jeff King wrote:

> We are again acting on the symlink's contents, which are a text
> pathname, as if they represented the content of that pathname.
> 
> So I think that is a bug, albeit one that is relatively uncommon to
> trigger. S_ISGITLINK files may have the same issues, but perhaps not,
> as I think they get routed out of the regular diff codepath early.

Here is a patch series to address this:

  [1/2]: diff: don't use pathname-based diff drivers for symlinks
  [2/2]: diff: add a special SYMLINK user-diff driver

The first one fixes the bug. The second one is similar to what I posted
before, but it may actually be more useful now, as it re-enables some
dubious functionality that the first patch dropped (but in a more sane
way). See the comments in each patch for details.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/2] diff: don't use pathname-based diff drivers for symlinks
  2010-09-21 20:59             ` [PATCH 0/2] better userdiff behavior for symlinks Jeff King
@ 2010-09-21 21:01               ` Jeff King
  2010-09-22  5:40                 ` Matthieu Moy
  2010-09-21 21:13               ` [PATCH 2/2] diff: add a special SYMLINK user-diff driver Jeff King
  1 sibling, 1 reply; 35+ messages in thread
From: Jeff King @ 2010-09-21 21:01 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

When we're diffing symlinks, we consider the contents to be
the pathname that the symlink points to. When a user sets up
a userdiff driver like "*.pdf diff=pdf", their "diff.pdf.*"
config generally tells us what to do with the content of
pdf files.

With the current code, we will actually process a symlink
like "link.pdf" using a configured pdf driver, meaning we
are using contents which consist of a pathname with
configuration that is expecting contents that consist of an
actual pdf file.

The most noticeable example of this would have been
textconv; however, it was already protected in its own
textconv-specific code path. We can still see the breakage
with something like "diff.*.binary", though. You could
also see it with diff.*.funcname, though it is a bit harder
to trigger accidentally there.

This patch adds a check for S_ISREG lower in the callstack
than the textconv-specific check, which should block use of
any userdiff config for non-regular files. We can drop the
check in the textconv code, which is now redundant.

Signed-off-by: Jeff King <peff@peff.net>
---
Technically, this could be breaking somebody's setup if:

  1. They found some use for userdiff config on symlinks. Textconv is
     already disabled. A custom diff driver might work.

and

  2. They were willing to specify symlinks individually in
     .gitattributes, or name them according to some symlink-specific
     pattern. Attribute patterns that matched both regular files and
     symlinks were broken, as shown in the test.

I find it unlikely, and given the potential breakage, it seems more like
exploiting a bug to get what you want. A more sane way of doing the same
thing is provided in patch 2/2.

 diff.c                  |   11 ++++++++---
 t/t4011-diff-symlink.sh |   26 ++++++++++++++++++++++++++
 2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 9a5c77c..276e029 100644
--- a/diff.c
+++ b/diff.c
@@ -1771,8 +1771,14 @@ static void emit_binary_diff(FILE *file, mmfile_t *one, mmfile_t *two, char *pre
 
 static void diff_filespec_load_driver(struct diff_filespec *one)
 {
-	if (!one->driver)
+	/* Use already-loaded driver */
+	if (one->driver)
+		return;
+
+	if (S_ISREG(one->mode))
 		one->driver = userdiff_find_by_path(one->path);
+
+	/* Fallback to default settings */
 	if (!one->driver)
 		one->driver = userdiff_find_by_name("default");
 }
@@ -1820,8 +1826,7 @@ struct userdiff_driver *get_textconv(struct diff_filespec *one)
 {
 	if (!DIFF_FILE_VALID(one))
 		return NULL;
-	if (!S_ISREG(one->mode))
-		return NULL;
+
 	diff_filespec_load_driver(one);
 	if (!one->driver->textconv)
 		return NULL;
diff --git a/t/t4011-diff-symlink.sh b/t/t4011-diff-symlink.sh
index 6f69489..408a19c 100755
--- a/t/t4011-diff-symlink.sh
+++ b/t/t4011-diff-symlink.sh
@@ -88,4 +88,30 @@ test_expect_success SYMLINKS \
     test_must_fail git diff --no-index pinky brain > output 2> output.err &&
     grep narf output &&
     ! grep error output.err'
+
+test_expect_success SYMLINKS 'setup symlinks with attributes' '
+	echo "*.bin diff=bin" >>.gitattributes &&
+	echo content >file.bin &&
+	ln -s file.bin link.bin &&
+	git add -N file.bin link.bin
+'
+
+cat >expect <<'EOF'
+diff --git a/file.bin b/file.bin
+index e69de29..d95f3ad 100644
+Binary files a/file.bin and b/file.bin differ
+diff --git a/link.bin b/link.bin
+index e69de29..dce41ec 120000
+--- a/link.bin
++++ b/link.bin
+@@ -0,0 +1 @@
++file.bin
+\ No newline at end of file
+EOF
+test_expect_success SYMLINKS 'symlinks do not respect userdiff config by path' '
+	git config diff.bin.binary true &&
+	git diff file.bin link.bin >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
1.7.3.256.gb9713

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/2] diff: add a special SYMLINK user-diff driver
  2010-09-21 20:59             ` [PATCH 0/2] better userdiff behavior for symlinks Jeff King
  2010-09-21 21:01               ` [PATCH 1/2] diff: don't use pathname-based diff drivers " Jeff King
@ 2010-09-21 21:13               ` Jeff King
  2010-09-22  0:12                 ` Ævar Arnfjörð Bjarmason
                                   ` (2 more replies)
  1 sibling, 3 replies; 35+ messages in thread
From: Jeff King @ 2010-09-21 21:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

As of the previous patch, there is no way to impact the diff
characteristics of a symbolic link. So if you wanted to use
a custom external driver or textconv for a symbolic link,
you can't. This patch enables that. See the added
documentation for an example use.

Signed-off-by: Jeff King <peff@peff.net>
---
And this provides any people who really were using the symlink behavior
fixed in patch 1/2 with a less buggy way to do the same thing. In
addition, it now allows them to use textconv, which can be used to
address the "no newline at end of file" complaint.

Writing this, though, made me wonder if diff is alone in having this
bug. If I have "*.pdf merge=pdf" in my .gitattributes file, will we try
to merge a symoblic link to a pdf like a pdf? That seems just as wrong
as diffing.

So I wonder if the problem really is not in the userdiff code at all,
and we should be disabling gitattribute lookup entirely for non-regular
files.

Also, this special SYMLINK driver assumes you want to treat all symlinks
the same. Which is probably good enough in practice. But if
gitattributes were aware of file modes, it would be nice to be able to
be able to apply multiple criteria (i.e., something besides just a
pathname match) to matching files. And then you could do something like
this in your gitattributes file:

  # all symlinks match this
  * is_symlink diff=symlink-diff-driver
  # only a symlink that matches this filename will get this
  some-particular-symlink is_symlink diff=other-symlink-diff-driver
  # and here we can explicitly say that only regular files get this
  # treatment
  *.pdf is_regular diff=pdf
  # and since that's what you want 99% of the time, we can imply
  # is_regular, which is backwards compatible and syntactically nicer.
  # So this is equivalent to the above
  *.pdf diff=pdf

which is more flexible, and neatly solves the "attributes shouldn't
affect symlinks at all" issue.

It does feel like overengineering, though. I really don't think people
are going to want to treat various symlinks differently. There is also
the question of submodule links. In theory they could match
gitattributes, too, and maybe there is something more useful people
could do there. I dunno.

 Documentation/gitattributes.txt |   32 ++++++++++++++++++++++++++++++++
 diff.c                          |    2 ++
 t/t4011-diff-symlink.sh         |   11 +++++++++++
 3 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e5a27d8..cc3bddf 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -405,6 +405,8 @@ String::
 	by the configuration variables in the "diff.foo" section of the
 	git config file.
 
+NOTE: Diff attributes apply only to regular files, not symlinks or
+submodules. See the section on special diff drivers below.
 
 Defining an external diff driver
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -572,6 +574,36 @@ and now produces better output), you can remove the cache
 manually with `git update-ref -d refs/notes/textconv/jpg` (where
 "jpg" is the name of the diff driver, as in the example above).
 
+Special diff drivers
+^^^^^^^^^^^^^^^^^^^^
+
+Diff drivers matched by attributes are used only for regular files. They
+do not apply to symbolic links, as a diff of symbolic links is not
+performed on the content of the files the links point to, but rather
+on the names of the files pointed to by the links. This makes a
+configuration like the following work as expected:
+
+-----------------------------
+$ echo '*.pdf diff=pdf' >.gitattributes
+$ git config diff.pdf.textconv pdftotext
+$ ln -s file.pdf link.pdf
+-----------------------------
+
+The regular file "file.pdf" will be converted to text for diffing, but
+the symlink "link.pdf", whose diffed contents are not PDF at all but
+rather the pathname "file.pdf", will not be affected.
+
+However, it may happen that you do want to change the diff parameters
+specifically for symlinks. You can do that by configuring the special
+"SYMLINK" diff driver. For example, to add a newline to the end of
+the symlink contents (and suppress the usual "no newline at end of file"
+warning), you could configure:
+
+-----------------------------
+[diff "SYMLINK"]
+	textconv = perl -pe 's/$/\n/'
+-----------------------------
+
 Performing a three-way merge
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/diff.c b/diff.c
index 276e029..281b74a 100644
--- a/diff.c
+++ b/diff.c
@@ -1777,6 +1777,8 @@ static void diff_filespec_load_driver(struct diff_filespec *one)
 
 	if (S_ISREG(one->mode))
 		one->driver = userdiff_find_by_path(one->path);
+	else if(S_ISLNK(one->mode))
+		one->driver = userdiff_find_by_name("SYMLINK");
 
 	/* Fallback to default settings */
 	if (!one->driver)
diff --git a/t/t4011-diff-symlink.sh b/t/t4011-diff-symlink.sh
index 408a19c..aa1daa2 100755
--- a/t/t4011-diff-symlink.sh
+++ b/t/t4011-diff-symlink.sh
@@ -114,4 +114,15 @@ test_expect_success SYMLINKS 'symlinks do not respect userdiff config by path' '
 	test_cmp expect actual
 '
 
+cat >expect <<'EOF'
+diff --git a/link.bin b/link.bin
+index e69de29..dce41ec 120000
+Binary files a/link.bin and b/link.bin differ
+EOF
+test_expect_success SYMLINKS 'symlinks do respect SYMLINK userdiff config' '
+	git config diff.SYMLINK.binary true &&
+	git diff link.bin >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
1.7.3.256.gb9713

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH] sha1_name.c: update comment to mention :/foo syntax
  2010-09-21 20:06       ` Matthieu Moy
@ 2010-09-21 23:29         ` Junio C Hamano
  2010-09-24 16:43           ` [PATCH] update comment and documentation for " Matthieu Moy
  0 siblings, 1 reply; 35+ messages in thread
From: Junio C Hamano @ 2010-09-21 23:29 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: git

Matthieu Moy <Matthieu.Moy@grenoble-inp.fr> writes:

>>> +	 * :/foo -> last commit whose subject starts with foo
>>
>> Documenting what hasn't been is a good thing, but is it really up-to-date?
>>
>> Isn't it "a randomly chosen recent commit whose subject contains regexp
>> foo" these days?
>
> I don't know.

I was vaguely recalling this one when I wrote the above.

commit 57895105c4ff083d7c9bc59b2b88b9b5176c1915
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Fri Apr 23 08:20:20 2010 -0700

    Make :/ accept a regex rather than a fixed pattern
    
    This also makes it trigger anywhere in the commit message, rather than
    just at the beginning. Which tends to be a lot more useful.
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] diff: add a special SYMLINK user-diff driver
  2010-09-21 21:13               ` [PATCH 2/2] diff: add a special SYMLINK user-diff driver Jeff King
@ 2010-09-22  0:12                 ` Ævar Arnfjörð Bjarmason
  2010-09-22  0:30                   ` Jeff King
  2010-09-22  5:53                 ` Matthieu Moy
  2010-09-22 16:59                 ` Matthieu Moy
  2 siblings, 1 reply; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-09-22  0:12 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

2010/9/21 Jeff King <peff@peff.net>:
> +However, it may happen that you do want to change the diff parameters
> +specifically for symlinks. You can do that by configuring the special
> +"SYMLINK" diff driver. For example, to add a newline to the end of
> +the symlink contents (and suppress the usual "no newline at end of file"
> +warning), you could configure:
> +
> +-----------------------------
> +[diff "SYMLINK"]
> +       textconv = perl -pe 's/$/\n/'
> +-----------------------------

That'll turn every \n in the stream into \n\n, not add a newline to
the end of the file. Don't you mean:

    perl -0666 -pe 's/$/\n/'

Or, more efficiently:

    perl -ple 'END { print }'

?

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] diff: add a special SYMLINK user-diff driver
  2010-09-22  0:12                 ` Ævar Arnfjörð Bjarmason
@ 2010-09-22  0:30                   ` Jeff King
  2010-09-22  0:39                     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 35+ messages in thread
From: Jeff King @ 2010-09-22  0:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

On Wed, Sep 22, 2010 at 12:12:30AM +0000, Ævar Arnfjörð Bjarmason wrote:

> 2010/9/21 Jeff King <peff@peff.net>:
> > +However, it may happen that you do want to change the diff parameters
> > +specifically for symlinks. You can do that by configuring the special
> > +"SYMLINK" diff driver. For example, to add a newline to the end of
> > +the symlink contents (and suppress the usual "no newline at end of file"
> > +warning), you could configure:
> > +
> > +-----------------------------
> > +[diff "SYMLINK"]
> > +       textconv = perl -pe 's/$/\n/'
> > +-----------------------------
> 
> That'll turn every \n in the stream into \n\n, not add a newline to
> the end of the file. Don't you mean:
> 
>     perl -0666 -pe 's/$/\n/'

Yeah, it will add a newline to the end of file, but it will also double
existing newlines. I wanted something short and clear so the reader
could understand what was going on, and symlink paths don't generally
contain an extra newline.

Just "perl -0pe 's/$/\n/'" would work, as symlink targets can't possibly
have embedded NULs.

But:

> Or, more efficiently:
> 
>     perl -ple 'END { print }'

That one is even more readable, IMHO. But it isn't right. :)
The automatic line-handling actually adds the missing newline, so I
think it would have to be:

  perl -ple ''

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] diff: add a special SYMLINK user-diff driver
  2010-09-22  0:30                   ` Jeff King
@ 2010-09-22  0:39                     ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 35+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2010-09-22  0:39 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Matthieu Moy, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

On Wed, Sep 22, 2010 at 00:30, Jeff King <peff@peff.net> wrote:
> On Wed, Sep 22, 2010 at 12:12:30AM +0000, Ævar Arnfjörð Bjarmason wrote:
>
>> 2010/9/21 Jeff King <peff@peff.net>:
>> > +However, it may happen that you do want to change the diff parameters
>> > +specifically for symlinks. You can do that by configuring the special
>> > +"SYMLINK" diff driver. For example, to add a newline to the end of
>> > +the symlink contents (and suppress the usual "no newline at end of file"
>> > +warning), you could configure:
>> > +
>> > +-----------------------------
>> > +[diff "SYMLINK"]
>> > +       textconv = perl -pe 's/$/\n/'
>> > +-----------------------------
>>
>> That'll turn every \n in the stream into \n\n, not add a newline to
>> the end of the file. Don't you mean:
>>
>>     perl -0666 -pe 's/$/\n/'
>
> Yeah, it will add a newline to the end of file, but it will also double
> existing newlines. I wanted something short and clear so the reader
> could understand what was going on, and symlink paths don't generally
> contain an extra newline.
>
> Just "perl -0pe 's/$/\n/'" would work, as symlink targets can't possibly
> have embedded NULs.
>
> But:
>
>> Or, more efficiently:
>>
>>     perl -ple 'END { print }'
>
> That one is even more readable, IMHO. But it isn't right. :)
> The automatic line-handling actually adds the missing newline, so I
> think it would have to be:
>
>  perl -ple ''

    perl -ple1

:)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] diff: don't use pathname-based diff drivers for symlinks
  2010-09-21 21:01               ` [PATCH 1/2] diff: don't use pathname-based diff drivers " Jeff King
@ 2010-09-22  5:40                 ` Matthieu Moy
  2010-09-22  5:50                   ` Jeff King
  0 siblings, 1 reply; 35+ messages in thread
From: Matthieu Moy @ 2010-09-22  5:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

Jeff King <peff@peff.net> writes:

> We can drop the check in the textconv code, which is now redundant.

Am I correct if I say that this makes "[PATCH 3/3] RFC: blame,cat-file
--textconv: Don't assume mode is ``S_IFREF | 0664''" obsolete?

(but patches 1 and 2 are still useful to check the behavior)

> Technically, this could be breaking somebody's setup if:
>
>   1. They found some use for userdiff config on symlinks. Textconv is
>      already disabled. A custom diff driver might work.

I'm wondering about cases other than regular files and symlinks here.
Directories are not a problem, since for Git, they somehow don't
exist, but for example, "git diff" shows diff for subprojects too.
After little testing, it seem the textconv is never applied on
subprojects, but I couldn't say why.

> I find it unlikely, and given the potential breakage, it seems more like
> exploiting a bug to get what you want.

Agreed.

> +cat >expect <<'EOF'
> +diff --git a/file.bin b/file.bin
> +index e69de29..d95f3ad 100644
> +Binary files a/file.bin and b/file.bin differ
> +diff --git a/link.bin b/link.bin
> +index e69de29..dce41ec 120000
> +--- a/link.bin
> ++++ b/link.bin
> +@@ -0,0 +1 @@
> ++file.bin
> +\ No newline at end of file
> +EOF
> +test_expect_success SYMLINKS 'symlinks do not respect userdiff config by path' '
> +	git config diff.bin.binary true &&
> +	git diff file.bin link.bin >actual &&
> +	test_cmp expect actual
> +'
> +
>  test_done

It's recommanded to put these cat <<'EOF' within the
test_expect_success, but otherwise, the code looks good.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] diff: don't use pathname-based diff drivers for symlinks
  2010-09-22  5:40                 ` Matthieu Moy
@ 2010-09-22  5:50                   ` Jeff King
  0 siblings, 0 replies; 35+ messages in thread
From: Jeff King @ 2010-09-22  5:50 UTC (permalink / raw)
  To: Matthieu Moy
  Cc: Junio C Hamano, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

On Wed, Sep 22, 2010 at 07:40:51AM +0200, Matthieu Moy wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We can drop the check in the textconv code, which is now redundant.
> 
> Am I correct if I say that this makes "[PATCH 3/3] RFC: blame,cat-file
> --textconv: Don't assume mode is ``S_IFREF | 0664''" obsolete?
> 
> (but patches 1 and 2 are still useful to check the behavior)

No. This just moves the check of the mode further down in the callstack.
But the problem that Kirill's patch fixes is that we were feeding a
bogus mode in the first place. So we still need to start actually
sending the correct mode.

> > Technically, this could be breaking somebody's setup if:
> >
> >   1. They found some use for userdiff config on symlinks. Textconv is
> >      already disabled. A custom diff driver might work.
> 
> I'm wondering about cases other than regular files and symlinks here.
> Directories are not a problem, since for Git, they somehow don't
> exist, but for example, "git diff" shows diff for subprojects too.
> After little testing, it seem the textconv is never applied on
> subprojects, but I couldn't say why.

Yeah, gitlinks were the only example I could come up with, too. The
current textconv code already checked for S_ISREG, so you couldn't
textconv them. I'm not sure if you could do an external diff, or mark
them as binary. I suspect not, since we tend to treat them specially in
the diff code, but I didn't actually test. At any rate, those things are
explicitly disallowed by my change.

> > +cat >expect <<'EOF'
> [...]
> It's recommanded to put these cat <<'EOF' within the
> test_expect_success, but otherwise, the code looks good.

I didn't think we had actually reached consensus on that.

-Peff

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] diff: add a special SYMLINK user-diff driver
  2010-09-21 21:13               ` [PATCH 2/2] diff: add a special SYMLINK user-diff driver Jeff King
  2010-09-22  0:12                 ` Ævar Arnfjörð Bjarmason
@ 2010-09-22  5:53                 ` Matthieu Moy
  2010-09-22 16:59                 ` Matthieu Moy
  2 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2010-09-22  5:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

Jeff King <peff@peff.net> writes:

> So I wonder if the problem really is not in the userdiff code at all,
> and we should be disabling gitattribute lookup entirely for non-regular
> files.

Actually, I'd also say that what needs to be changed is on the
.gitattributes side, not on the .gitconfig side.

Another option would be zsh-style globbing, like:

# any symbolic link
*(@) diff=link

# Symbolic links named *.pdf
*.pdf(@)

(except that * means "all" to zsh, and should mean "regular" to Git).

But OTOH, that's a lot of work to implement, for very little benefit.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] diff: add a special SYMLINK user-diff driver
  2010-09-21 21:13               ` [PATCH 2/2] diff: add a special SYMLINK user-diff driver Jeff King
  2010-09-22  0:12                 ` Ævar Arnfjörð Bjarmason
  2010-09-22  5:53                 ` Matthieu Moy
@ 2010-09-22 16:59                 ` Matthieu Moy
  2 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2010-09-22 16:59 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Kirill Smelkov, git, Axel Bonnet,
	Clément Poulain, Diane Gasselin

Jeff King <peff@peff.net> writes:

> Also, this special SYMLINK driver assumes you want to treat all symlinks
> the same. Which is probably good enough in practice. But if
> gitattributes were aware of file modes, it would be nice to be able to
> be able to apply multiple criteria (i.e., something besides just a
> pathname match) to matching files. And then you could do something like
> this in your gitattributes file:
>
>   # all symlinks match this
>   * is_symlink diff=symlink-diff-driver
>   # only a symlink that matches this filename will get this
>   some-particular-symlink is_symlink diff=other-symlink-diff-driver
>   # and here we can explicitly say that only regular files get this
>   # treatment
>   *.pdf is_regular diff=pdf
>   # and since that's what you want 99% of the time, we can imply
>   # is_regular, which is backwards compatible and syntactically nicer.
>   # So this is equivalent to the above
>   *.pdf diff=pdf
>
> which is more flexible, and neatly solves the "attributes shouldn't
> affect symlinks at all" issue.
>
> It does feel like overengineering, though.

After thinking a bit more, I don't feel this is so much
overengineering. Your initial SYMLINK driver proposal was somehow a
solution looking for a problem, and now we've found just one problem.
So I'd say the SYMLINK driver is overengineering for the potential
problem it solves: instead, people would probably prefer a
diff.showFinalNewlineForSymlink config option or so --- btw, wasn't
there a patch for that recently?

OTOH, if the SYMLINK driver is ever used for something other than the
"no newline at end of file", I'm pretty sure it will not be general
enough. So, to me, the SYMLINK driver is a rather bad compromise
between complexity and flexibility.

(but the counter-argument "that's a lot of work for very little
benefit" still holds...)

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH] update comment and documentation for :/foo syntax
  2010-09-21 23:29         ` Junio C Hamano
@ 2010-09-24 16:43           ` Matthieu Moy
  0 siblings, 0 replies; 35+ messages in thread
From: Matthieu Moy @ 2010-09-24 16:43 UTC (permalink / raw)
  To: git, gitster; +Cc: Matthieu Moy

The documentation in revisions.txt did not match the implementation, and
the comment in sha1_name.c was incomplete.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
---
Junio wrote:

> Isn't it "a randomly chosen recent commit whose subject contains regexp
> foo" these days?

So, that's right. Here's an updated version. I've kept the comment
short and vague, to avoid duplicating the doc, it should be sufficient
as a reminder.

 Documentation/revisions.txt |    4 +++-
 sha1_name.c                 |    1 +
 2 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/Documentation/revisions.txt b/Documentation/revisions.txt
index fe846f0..3d4b79c 100644
--- a/Documentation/revisions.txt
+++ b/Documentation/revisions.txt
@@ -107,11 +107,13 @@ the `$GIT_DIR/refs` directory or from the `$GIT_DIR/packed-refs` file.
   found.
 
 * A colon, followed by a slash, followed by a text (e.g. `:/fix nasty bug`): this names
-  a commit whose commit message starts with the specified text.
+  a commit whose commit message matches the specified regular expression.
   This name returns the youngest matching commit which is
   reachable from any ref.  If the commit message starts with a
   '!', you have to repeat that;  the special sequence ':/!',
   followed by something else than '!' is reserved for now.
+  The regular expression can match any part of the commit message. To
+  match messages starting with a string, one can use e.g. `:/^foo`.
 
 * A suffix ':' followed by a path (e.g. `HEAD:README`); this names the blob or tree
   at the given path in the tree-ish object named by the part
diff --git a/sha1_name.c b/sha1_name.c
index 5470a69..ba95a79 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1062,6 +1062,7 @@ int get_sha1_with_context_1(const char *name, unsigned char *sha1,
 	/* sha1:path --> object name of path in ent sha1
 	 * :path -> object name of path in index
 	 * :[0-3]:path -> object name of path in index at stage
+	 * :/foo -> recent commit matching foo
 	 */
 	if (name[0] == ':') {
 		int stage = 0;
-- 
1.7.3.2.g257b5f

^ permalink raw reply related	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2010-09-24 16:45 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-09-18 17:25 [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Kirill Smelkov
2010-09-18 17:25 ` [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program Kirill Smelkov
2010-09-18 19:14   ` Matthieu Moy
2010-09-18 17:25 ` [PATCH 2/3] blame,cat-file: Demonstrate --textconv is wrongly running converter on symlinks Kirill Smelkov
2010-09-18 19:26   ` Matthieu Moy
2010-09-18 17:25 ` [PATCH 3/3] RFC: blame,cat-file --textconv: Don't assume mode is ``S_IFREF | 0664'' Kirill Smelkov
2010-09-18 19:04   ` Matthieu Moy
2010-09-20 18:21   ` Jeff King
2010-09-20 20:35     ` [PATCH 1/3] tests: Prepare --textconv tests for correctly-failing conversion program Kirill Smelkov
2010-09-20 21:03       ` Matthieu Moy
2010-09-21 18:39         ` Kirill Smelkov
2010-09-20 21:01   ` [PATCH] sha1_name.c: update comment to mention :/foo syntax Matthieu Moy
2010-09-21 18:02     ` Junio C Hamano
2010-09-21 20:06       ` Matthieu Moy
2010-09-21 23:29         ` Junio C Hamano
2010-09-24 16:43           ` [PATCH] update comment and documentation for " Matthieu Moy
2010-09-18 18:08 ` [BUG, PATCH 0/3] Fix {blame,cat-file} --textconv for cases with symlinks Matthieu Moy
2010-09-18 20:01   ` Junio C Hamano
2010-09-19  8:58     ` Matthieu Moy
2010-09-19 18:17       ` Junio C Hamano
2010-09-20 18:00     ` Jeff King
2010-09-20 20:18       ` Johannes Sixt
2010-09-21 17:57       ` Junio C Hamano
2010-09-21 18:42         ` Jeff King
2010-09-21 18:56           ` Jeff King
2010-09-21 20:59             ` [PATCH 0/2] better userdiff behavior for symlinks Jeff King
2010-09-21 21:01               ` [PATCH 1/2] diff: don't use pathname-based diff drivers " Jeff King
2010-09-22  5:40                 ` Matthieu Moy
2010-09-22  5:50                   ` Jeff King
2010-09-21 21:13               ` [PATCH 2/2] diff: add a special SYMLINK user-diff driver Jeff King
2010-09-22  0:12                 ` Ævar Arnfjörð Bjarmason
2010-09-22  0:30                   ` Jeff King
2010-09-22  0:39                     ` Ævar Arnfjörð Bjarmason
2010-09-22  5:53                 ` Matthieu Moy
2010-09-22 16:59                 ` Matthieu Moy

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.