All of lore.kernel.org
 help / color / mirror / Atom feed
* Bug: problem with file named with dash character
       [not found] <52ae7682-3e9a-4b52-bec1-08ba3aadffc0@office.digitalus.nl>
@ 2012-06-27  7:32 ` Daniel Lyubomirov -|- Digitalus Bulgaria
  2012-06-27  9:57   ` faux
  2012-06-27 18:28   ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Daniel Lyubomirov -|- Digitalus Bulgaria @ 2012-06-27  7:32 UTC (permalink / raw)
  To: git

Hi,

Аccidentally my colleague created a file in the root dir of the git repo called - (just dash).
As result for every commit having this file, diff , merge, cherry-pick maybe others just hang.
I tried to make rebase interactive to edit the commit and remove the file but that fails too.

system: Fedora Linux 3.4.2-1.fc16.x86_64
git version: 1.7.7.6

How to reproduce the bug:
1. create a file named -
2. add the file in the index and commit
3. try diff with another commit, branch

So far i found following workarounds:
1. You can delete the file. Then the problem stays in the history only for the commits that have the file.
2. Use git filter-branch and deleted the file from the history. 

   git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch -- -" <problem commit>..HEAD

   But this changes the history.


Thanks for your work.

Regards,
Daniel Lyubomirov

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27  7:32 ` Bug: problem with file named with dash character Daniel Lyubomirov -|- Digitalus Bulgaria
@ 2012-06-27  9:57   ` faux
  2012-06-27 18:28   ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: faux @ 2012-06-27  9:57 UTC (permalink / raw)
  To: git

On Wed, Jun 27, 2012 at 09:32:21AM +0200, Daniel Lyubomirov -|- Digitalus Bulgaria wrote:
> Hi,
> 
> Аccidentally my colleague created a file in the root dir of the git repo called - (just dash).
> As result for every commit having this file, diff , merge, cherry-pick maybe others just hang.
> I tried to make rebase interactive to edit the commit and remove the file but that fails too.

Haha, it tries to read from standard in:

$ git init && printf '' >- && git add -- - && git commit -m "lol"

Pressing ^D allows it to continue.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27  7:32 ` Bug: problem with file named with dash character Daniel Lyubomirov -|- Digitalus Bulgaria
  2012-06-27  9:57   ` faux
@ 2012-06-27 18:28   ` Junio C Hamano
  2012-06-27 19:52     ` Jeff King
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-06-27 18:28 UTC (permalink / raw)
  To: Daniel Lyubomirov; +Cc: git

Daniel Lyubomirov -|- Digitalus Bulgaria <daniel@digitalus.bg>
writes:

> Аccidentally my colleague created a file in the root dir of the git repo called - (just dash).
> As result for every commit having this file, diff , merge, cherry-pick maybe others just hang.

Thanks for a report.  I think the "diff" callchain should be
refactored so that the caller can mark the special "stdin" token in
a saner way, but until it happens, the following one-liner should
do.


 diff.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/diff.c b/diff.c
index 1a594df..caa2309 100644
--- a/diff.c
+++ b/diff.c
@@ -2589,6 +2589,14 @@ static int reuse_worktree_file(const char *name, const unsigned char *sha1, int
 	if (!FAST_WORKING_DIRECTORY && !want_file && has_sha1_pack(sha1))
 		return 0;
 
+	/*
+	 * And asking to read "-" from the working tree triggers stdin
+	 * input (which needs to be fixed separately by refactoring the
+	 * callchain), forbid "reuse" for now.
+	 */
+	if (!strcmp(name, "-"))
+		return 0;
+
 	len = strlen(name);
 	pos = cache_name_pos(name, len);
 	if (pos < 0)

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 18:28   ` Junio C Hamano
@ 2012-06-27 19:52     ` Jeff King
  2012-06-27 20:25       ` Jeff King
  2012-06-27 20:27       ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2012-06-27 19:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Lyubomirov, git

On Wed, Jun 27, 2012 at 11:28:21AM -0700, Junio C Hamano wrote:

> Thanks for a report.  I think the "diff" callchain should be
> refactored so that the caller can mark the special "stdin" token in
> a saner way, but until it happens, the following one-liner should
> do.

Yeah, I assume this is there at all to support "diff --no-index -", so the
special marking should happen at that layer.

> diff --git a/diff.c b/diff.c
> index 1a594df..caa2309 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2589,6 +2589,14 @@ static int reuse_worktree_file(const char *name, const unsigned char *sha1, int
>  	if (!FAST_WORKING_DIRECTORY && !want_file && has_sha1_pack(sha1))
>  		return 0;
>  
> +	/*
> +	 * And asking to read "-" from the working tree triggers stdin
> +	 * input (which needs to be fixed separately by refactoring the
> +	 * callchain), forbid "reuse" for now.
> +	 */
> +	if (!strcmp(name, "-"))
> +		return 0;
> +

Unfortunately this is not enough. The problematic code path is the call
to populate_from_stdin in diff_populate_filespec. And we follow that
conditional if reuse_worktree_file is true, _or_ if the sha1_valid flag
on the filespec is not set. We hit the latter due to the --no-index
case, but we can also hit it if we are comparing a working tree file in
the repo that is stat-dirty.

So without your patch, this reads from stdin:

  git init repo &&
  cd repo &&
  echo foo >- &&
  git add - &&
  git commit -m foo

when the commit command tries to generate the diff summary. Your patch
fixes it, but remains broken if you then do:

  echo changes >>- &&
  git diff

I think you'd want to do just do something like:

diff --git a/diff.c b/diff.c
index 1a594df..aac72b7 100644
--- a/diff.c
+++ b/diff.c
@@ -2684,9 +2684,6 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 		struct stat st;
 		int fd;
 
-		if (!strcmp(s->path, "-"))
-			return populate_from_stdin(s);
-
 		if (lstat(s->path, &st) < 0) {
 			if (errno == ENOENT) {
 			err_empty:

to temporarily fix it. That breaks

  echo content | git diff --no-index - some-file

but that code path should be fixed properly (with a use_stdin flag in
the filespec).

-Peff

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 19:52     ` Jeff King
@ 2012-06-27 20:25       ` Jeff King
  2012-06-27 20:27       ` Junio C Hamano
  1 sibling, 0 replies; 13+ messages in thread
From: Jeff King @ 2012-06-27 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Lyubomirov, git

On Wed, Jun 27, 2012 at 03:52:05PM -0400, Jeff King wrote:

> I think you'd want to do just do something like:
> 
> diff --git a/diff.c b/diff.c
> index 1a594df..aac72b7 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -2684,9 +2684,6 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
>  		struct stat st;
>  		int fd;
>  
> -		if (!strcmp(s->path, "-"))
> -			return populate_from_stdin(s);
> -
>  		if (lstat(s->path, &st) < 0) {
>  			if (errno == ENOENT) {
>  			err_empty:
> 
> to temporarily fix it. That breaks
> 
>   echo content | git diff --no-index - some-file
> 
> but that code path should be fixed properly (with a use_stdin flag in
> the filespec).

Something like the patch below, which keeps the stdin test in t4002
working for me.

I suspect there are other problems lurking with the stdin case. For
example, we try to drop filespec contents whenever we can to reduce
memory pressure, under the assumption that we can always re-read the
blob later. But with stdin, we would need to be careful to mark the
contents as precious somehow.

diff --git a/diff-no-index.c b/diff-no-index.c
index f0b0010..c64bd5c 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -51,6 +51,21 @@ static int get_mode(const char *path, int *mode)
 	return 0;
 }
 
+static struct diff_filespec *noindex_filespec(const char *name, int mode)
+{
+	struct diff_filespec *r;
+
+	if (!name)
+		name = "/dev/null";
+	r = alloc_filespec(name);
+
+	fill_filespec(r, null_sha1, mode);
+	if (!strcmp(name, "-"))
+		r->is_stdin = 1;
+
+	return r;
+}
+
 static int queue_diff(struct diff_options *o,
 		      const char *name1, const char *name2)
 {
@@ -137,15 +152,8 @@ static int queue_diff(struct diff_options *o,
 			tmp_c = name1; name1 = name2; name2 = tmp_c;
 		}
 
-		if (!name1)
-			name1 = "/dev/null";
-		if (!name2)
-			name2 = "/dev/null";
-		d1 = alloc_filespec(name1);
-		d2 = alloc_filespec(name2);
-		fill_filespec(d1, null_sha1, mode1);
-		fill_filespec(d2, null_sha1, mode2);
-
+		d1 = noindex_filespec(name1, mode1);
+		d2 = noindex_filespec(name2, mode2);
 		diff_queue(&diff_queued_diff, d1, d2);
 		return 0;
 	}
diff --git a/diff.c b/diff.c
index 1a594df..449bfba 100644
--- a/diff.c
+++ b/diff.c
@@ -2675,6 +2675,9 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 	if (size_only && 0 < s->size)
 		return 0;
 
+	if (s->is_stdin)
+		return populate_from_stdin(s);
+
 	if (S_ISGITLINK(s->mode))
 		return diff_populate_gitlink(s, size_only);
 
@@ -2684,9 +2687,6 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 		struct stat st;
 		int fd;
 
-		if (!strcmp(s->path, "-"))
-			return populate_from_stdin(s);
-
 		if (lstat(s->path, &st) < 0) {
 			if (errno == ENOENT) {
 			err_empty:
diff --git a/diffcore.h b/diffcore.h
index 8f32b82..be0739c 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -43,6 +43,7 @@ struct diff_filespec {
 	unsigned should_free : 1; /* data should be free()'ed */
 	unsigned should_munmap : 1; /* data should be munmap()'ed */
 	unsigned dirty_submodule : 2;  /* For submodules: its work tree is dirty */
+	unsigned is_stdin : 1;
 #define DIRTY_SUBMODULE_UNTRACKED 1
 #define DIRTY_SUBMODULE_MODIFIED  2
 	unsigned has_more_entries : 1; /* only appear in combined diff */

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 19:52     ` Jeff King
  2012-06-27 20:25       ` Jeff King
@ 2012-06-27 20:27       ` Junio C Hamano
  2012-06-27 20:33         ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-06-27 20:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Daniel Lyubomirov, git

Jeff King <peff@peff.net> writes:

> but that code path should be fixed properly (with a use_stdin flag in
> the filespec).

Yes, just as I said; I am finding more and more issues with the
no-index hack that I have been fixing a bit by bit since I send the
message you responded to.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 20:27       ` Junio C Hamano
@ 2012-06-27 20:33         ` Junio C Hamano
  2012-06-27 20:35           ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-06-27 20:33 UTC (permalink / raw)
  To: Jeff King; +Cc: Daniel Lyubomirov, git

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> but that code path should be fixed properly (with a use_stdin flag in
>> the filespec).
>
> Yes, just as I said; I am finding more and more issues with the
> no-index hack that I have been fixing a bit by bit since I send the
> message you responded to.

It is not ready yet, but here are a few patches WIP.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Wed, 27 Jun 2012 11:51:15 -0700
Subject: [PATCH 1/?] diff-index.c: do not pretend paths are pathspecs

"git diff --no-index" takes exactly two paths, not pathspecs, and
has its own way queue_diff() to populate the diff_queue.  Do not
call diff_tree_setup_paths(), pretending as it takes pathspecs.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff-no-index.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/diff-no-index.c b/diff-no-index.c
index f0b0010..ca875da 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -175,6 +175,7 @@ void diff_no_index(struct rev_info *revs,
 	int i;
 	int no_index = 0;
 	unsigned options = 0;
+	const char *paths[2];
 
 	/* Were we asked to do --no-index explicitly? */
 	for (i = 1; i < argc; i++) {
@@ -233,8 +234,6 @@ void diff_no_index(struct rev_info *revs,
 
 	if (prefix) {
 		int len = strlen(prefix);
-		const char *paths[3];
-		memset(paths, 0, sizeof(paths));
 
 		for (i = 0; i < 2; i++) {
 			const char *p = argv[argc - 2 + i];
@@ -247,10 +246,10 @@ void diff_no_index(struct rev_info *revs,
 			     : p);
 			paths[i] = p;
 		}
-		diff_tree_setup_paths(paths, &revs->diffopt);
+	} else {
+		for (i = 0; i < 2; i++)
+			paths[i] = argv[argc - 2 + i];
 	}
-	else
-		diff_tree_setup_paths(argv + argc - 2, &revs->diffopt);
 	revs->diffopt.skip_stat_unmatch = 1;
 	if (!revs->diffopt.output_format)
 		revs->diffopt.output_format = DIFF_FORMAT_PATCH;
@@ -262,8 +261,7 @@ void diff_no_index(struct rev_info *revs,
 	if (diff_setup_done(&revs->diffopt) < 0)
 		die("diff_setup_done failed");
 
-	if (queue_diff(&revs->diffopt, revs->diffopt.pathspec.raw[0],
-		       revs->diffopt.pathspec.raw[1]))
+	if (queue_diff(&revs->diffopt, paths[0], paths[1]))
 		exit(1);
 	diff_set_mnemonic_prefix(&revs->diffopt, "1/", "2/");
 	diffcore_std(&revs->diffopt);
-- 
1.7.11.1.184.g3ee8f69

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 20:33         ` Junio C Hamano
@ 2012-06-27 20:35           ` Junio C Hamano
  2012-06-27 20:39             ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-06-27 20:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Daniel Lyubomirov, git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Jeff King <peff@peff.net> writes:
>>
>>> but that code path should be fixed properly (with a use_stdin flag in
>>> the filespec).
>>
>> Yes, just as I said; I am finding more and more issues with the
>> no-index hack that I have been fixing a bit by bit since I send the
>> message you responded to.
>
> It is not ready yet, but here are a few patches WIP.

And this is the second clean-up

-- >8 --
Subject: [PATCH 2/?] diff-index.c: unify handling of command line paths

Regardless of where in the directory hierarchy you are, "-" on the
command line means the standard input.  The old code knew too much
about how the low level machinery uses paths to read from the
working tree and did not bother to have the same check for "-" when
the command is run from the top-level.

Unify the codepaths for subdirectory case and toplevel case into one
and make it clearer.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff-no-index.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/diff-no-index.c b/diff-no-index.c
index ca875da..a5c1e3e 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -172,7 +172,7 @@ void diff_no_index(struct rev_info *revs,
 		   int argc, const char **argv,
 		   int nongit, const char *prefix)
 {
-	int i;
+	int i, prefixlen;
 	int no_index = 0;
 	unsigned options = 0;
 	const char *paths[2];
@@ -232,23 +232,18 @@ void diff_no_index(struct rev_info *revs,
 	if (!DIFF_OPT_TST(&revs->diffopt, EXIT_WITH_STATUS))
 		setup_pager();
 
-	if (prefix) {
-		int len = strlen(prefix);
-
-		for (i = 0; i < 2; i++) {
-			const char *p = argv[argc - 2 + i];
+	prefixlen = prefix ? strlen(prefix) : 0;
+	for (i = 0; i < 2; i++) {
+		const char *p = argv[argc - 2 + i];
+		if (!strcmp(p, "-"))
 			/*
-			 * stdin should be spelled as '-'; if you have
-			 * path that is '-', spell it as ./-.
+			 * stdin should be spelled as "-"; if you have
+			 * path that is "-", spell it as "./-".
 			 */
-			p = (strcmp(p, "-")
-			     ? xstrdup(prefix_filename(prefix, len, p))
-			     : p);
-			paths[i] = p;
-		}
-	} else {
-		for (i = 0; i < 2; i++)
-			paths[i] = argv[argc - 2 + i];
+			p = p;
+		else if (prefixlen)
+			p = xstrdup(prefix_filename(prefix, prefixlen, p));
+		paths[i] = p;
 	}
 	revs->diffopt.skip_stat_unmatch = 1;
 	if (!revs->diffopt.output_format)
-- 
1.7.11.1.184.g3ee8f69

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 20:35           ` Junio C Hamano
@ 2012-06-27 20:39             ` Junio C Hamano
  2012-06-27 20:48               ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-06-27 20:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Daniel Lyubomirov, git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Jeff King <peff@peff.net> writes:
>>>
>>>> but that code path should be fixed properly (with a use_stdin flag in
>>>> the filespec).
>>>
>>> Yes, just as I said; I am finding more and more issues with the
>>> no-index hack that I have been fixing a bit by bit since I send the
>>> message you responded to.
>>
>> It is not ready yet, but here are a few patches WIP.
>
> And this is the second clean-up

And this is the third one.

-- >8 --
Subject: [PATCH 3/?] diff-index.c: "git diff" has no need to read blob from the standard input

Only "diff --no-index -" does.  Bolting the logic into the low-level
function diff_populate_filespec() was a layering violation from day
one.  Move populate_from_stdin() function out of the generic diff.c
to its only user, diff-index.c.

Also make sure "-" from the command line stays a special token "read
from the standard input", even if we later decide to sanitize the
result from prefix_filename() function in a few obvious ways,
e.g. removing unnecessary "./" prefix, duplicated slashes "//" in
the middle, etc.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 diff-no-index.c | 33 +++++++++++++++++++++++++++++++--
 diff.c          | 21 +--------------------
 diffcore.h      |  1 +
 3 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/diff-no-index.c b/diff-no-index.c
index a5c1e3e..9385d53 100644
--- a/diff-no-index.c
+++ b/diff-no-index.c
@@ -32,6 +32,12 @@ static int read_directory(const char *path, struct string_list *list)
 	return 0;
 }
 
+/*
+ * This should be "(standard input)", but it will probably expose many
+ * more breakages.
+ */
+static const char file_from_standard_input[] = "-";
+
 static int get_mode(const char *path, int *mode)
 {
 	struct stat st;
@@ -42,7 +48,7 @@ static int get_mode(const char *path, int *mode)
 	else if (!strcasecmp(path, "nul"))
 		*mode = 0;
 #endif
-	else if (!strcmp(path, "-"))
+	else if (path == file_from_standard_input)
 		*mode = create_ce_mode(0666);
 	else if (lstat(path, &st))
 		return error("Could not access '%s'", path);
@@ -51,6 +57,23 @@ static int get_mode(const char *path, int *mode)
 	return 0;
 }
 
+static int populate_from_stdin(struct diff_filespec *s)
+{
+	struct strbuf buf = STRBUF_INIT;
+	size_t size = 0;
+
+	if (strbuf_read(&buf, 0, 0) < 0)
+		return error("error while reading from stdin %s",
+				     strerror(errno));
+
+	s->should_munmap = 0;
+	s->data = strbuf_detach(&buf, &size);
+	s->size = size;
+	s->should_free = 1;
+	s->nongit_stdin = 1;
+	return 0;
+}
+
 static int queue_diff(struct diff_options *o,
 		      const char *name1, const char *name2)
 {
@@ -143,8 +166,14 @@ static int queue_diff(struct diff_options *o,
 			name2 = "/dev/null";
 		d1 = alloc_filespec(name1);
 		d2 = alloc_filespec(name2);
+
 		fill_filespec(d1, null_sha1, mode1);
+		if (name1 == file_from_standard_input)
+			populate_from_stdin(d1);
+
 		fill_filespec(d2, null_sha1, mode2);
+		if (name2 == file_from_standard_input)
+			populate_from_stdin(d2);
 
 		diff_queue(&diff_queued_diff, d1, d2);
 		return 0;
@@ -240,7 +269,7 @@ void diff_no_index(struct rev_info *revs,
 			 * stdin should be spelled as "-"; if you have
 			 * path that is "-", spell it as "./-".
 			 */
-			p = p;
+			p = file_from_standard_input;
 		else if (prefixlen)
 			p = xstrdup(prefix_filename(prefix, prefixlen, p));
 		paths[i] = p;
diff --git a/diff.c b/diff.c
index 1a594df..0c73c84 100644
--- a/diff.c
+++ b/diff.c
@@ -2619,22 +2619,6 @@ static int reuse_worktree_file(const char *name, const unsigned char *sha1, int
 	return 0;
 }
 
-static int populate_from_stdin(struct diff_filespec *s)
-{
-	struct strbuf buf = STRBUF_INIT;
-	size_t size = 0;
-
-	if (strbuf_read(&buf, 0, 0) < 0)
-		return error("error while reading from stdin %s",
-				     strerror(errno));
-
-	s->should_munmap = 0;
-	s->data = strbuf_detach(&buf, &size);
-	s->size = size;
-	s->should_free = 1;
-	return 0;
-}
-
 static int diff_populate_gitlink(struct diff_filespec *s, int size_only)
 {
 	int len;
@@ -2684,9 +2668,6 @@ int diff_populate_filespec(struct diff_filespec *s, int size_only)
 		struct stat st;
 		int fd;
 
-		if (!strcmp(s->path, "-"))
-			return populate_from_stdin(s);
-
 		if (lstat(s->path, &st) < 0) {
 			if (errno == ENOENT) {
 			err_empty:
@@ -3048,7 +3029,7 @@ static void diff_fill_sha1_info(struct diff_filespec *one)
 	if (DIFF_FILE_VALID(one)) {
 		if (!one->sha1_valid) {
 			struct stat st;
-			if (!strcmp(one->path, "-")) {
+			if (one->nongit_stdin) {
 				hashcpy(one->sha1, null_sha1);
 				return;
 			}
diff --git a/diffcore.h b/diffcore.h
index 8f32b82..ef4fe17 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -42,6 +42,7 @@ struct diff_filespec {
 #define DIFF_FILE_VALID(spec) (((spec)->mode) != 0)
 	unsigned should_free : 1; /* data should be free()'ed */
 	unsigned should_munmap : 1; /* data should be munmap()'ed */
+	unsigned nongit_stdin : 1; /* data is from a nongit source */
 	unsigned dirty_submodule : 2;  /* For submodules: its work tree is dirty */
 #define DIRTY_SUBMODULE_UNTRACKED 1
 #define DIRTY_SUBMODULE_MODIFIED  2
-- 
1.7.11.1.184.g3ee8f69

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 20:39             ` Junio C Hamano
@ 2012-06-27 20:48               ` Junio C Hamano
  2012-06-27 21:00                 ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-06-27 20:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Daniel Lyubomirov, git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> Junio C Hamano <gitster@pobox.com> writes:
>>>
>>>> Jeff King <peff@peff.net> writes:
>>>>
>>>>> but that code path should be fixed properly (with a use_stdin flag in
>>>>> the filespec).
>>>>
>>>> Yes, just as I said; I am finding more and more issues with the
>>>> no-index hack that I have been fixing a bit by bit since I send the
>>>> message you responded to.
>>>
>>> It is not ready yet, but here are a few patches WIP.
>>
>> And this is the second clean-up
>
> And this is the third one.

Some of the other breakages that comes from the "no-index" codepath
that we may want to consider addressing I have found so far are:

 - We say on the "diff --git" header uglyness like "a/-", "b/-";
   likewise in the metainfo;

 - We show on the "index" header "0*" value for these entries, even
   though we should be able to compute it (after all we do so for
   files on disk in a non-git directory);

 - We still apply attributes and textconv as if we are dealing with
   a regular file "-" at the root level.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 20:48               ` Junio C Hamano
@ 2012-06-27 21:00                 ` Jeff King
  2012-06-27 22:17                   ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2012-06-27 21:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Lyubomirov, git

On Wed, Jun 27, 2012 at 01:48:42PM -0700, Junio C Hamano wrote:

> >>> It is not ready yet, but here are a few patches WIP.
> >>
> >> And this is the second clean-up
> >
> > And this is the third one.
> 
> Some of the other breakages that comes from the "no-index" codepath
> that we may want to consider addressing I have found so far are:

>From a cursory look, these definitely go in the right direction. I like
how the third one was able to rip populate_from_stdin entirely out of
diff.c.

I suspect we could get by without even the nongit_stdin flag you added;
the only place which uses it is diff_fill_sha1_info, but theoretically
we don't even need it there; we could just index_mem the file contents
we get via diff_populate_filespec, and the stdin contents will
already be there.

Right now we call index_path for worktree files, but I don't really see
much point; we have to read the whole data either way, and
populate_filespec should be mmap-ing them for us.

>  - We say on the "diff --git" header uglyness like "a/-", "b/-";
>    likewise in the metainfo;

I'd consider changing the path to "/dev/stdin" for this case. It doesn't
exist on some platforms, of course, but neither does /dev/null, which we
use similarly.

>  - We show on the "index" header "0*" value for these entries, even
>    though we should be able to compute it (after all we do so for
>    files on disk in a non-git directory);

The index_mem I mentioned above would fix that.

>  - We still apply attributes and textconv as if we are dealing with
>    a regular file "-" at the root level.

I think that's bad. I wonder if it should have "*" attributes applied to
it or not. While I can see it being convenient in some cases, I think it
makes the rules confusingly complex.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 21:00                 ` Jeff King
@ 2012-06-27 22:17                   ` Junio C Hamano
  2012-06-27 22:41                     ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2012-06-27 22:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Daniel Lyubomirov, git

Jeff King <peff@peff.net> writes:

> From a cursory look, these definitely go in the right direction. I like
> how the third one was able to rip populate_from_stdin entirely out of
> diff.c.
>
> I suspect we could get by without even the nongit_stdin flag you added;
> the only place which uses it is diff_fill_sha1_info, but theoretically
> we don't even need it there; we could just index_mem the file contents
> we get via diff_populate_filespec, and the stdin contents will
> already be there.

Probably that is a sane thing to do.

And as you hinted, we definitely need to give the "the copy in the
filespec->data is the only one; do not discard until we are really
done with it" semantics to the bit, or introduce a separate bit that
is not necessarily related to the "this came from the standard input"
bit.  I offhand do not know what data source we would later add
that needs such a treatment.  "git diff http://example.com/{foo,bar}",
perhaps?  The "do not discard" bit at that point may need to learn
to swap it out to a temporary file or something when it happens.

> Right now we call index_path for worktree files, but I don't really see
> much point; we have to read the whole data either way, and
> populate_filespec should be mmap-ing them for us.
>
>>  - We say on the "diff --git" header uglyness like "a/-", "b/-";
>>    likewise in the metainfo;
>
> I'd consider changing the path to "/dev/stdin" for this case. It doesn't
> exist on some platforms, of course, but neither does /dev/null, which we
> use similarly.

Sensible.

>>  - We show on the "index" header "0*" value for these entries, even
>>    though we should be able to compute it (after all we do so for
>>    files on disk in a non-git directory);
>
> The index_mem I mentioned above would fix that.

Yes.

>>  - We still apply attributes and textconv as if we are dealing with
>>    a regular file "-" at the root level.
>
> I think that's bad. I wonder if it should have "*" attributes applied to
> it or not. While I can see it being convenient in some cases, I think it
> makes the rules confusingly complex.

It is likely that the crlf conversion on Dos systems wants to be
applied regardless.

This is unrelated to the "standard input confusion" issue, but there
are two more things coming either from the way the no-index code is
done or from the way it is bolted onto git.

With the current code, this:

	mkdir foo bar
        echo hello >foo/1
        echo bye >bar/2
        git diff --no-index foo bar

shows differences between a/foo/1 and b/bar/1, as the no-index code
records foo/1 and bar/1 as the paths in the filespec for them.

But if you are comparing two directory hierarchies, it is a lot more
likely that you would want to see corresponding files in these two
directories.  In other words, the patch is better shown as
differences between a/1 and b/1 (the code makes the core compare
foo/1 and bar/2 after all).  This of course requires no-index to
differentiate the logical pathname (i.e. "this is the path inside
collection a/ (or b/)") and the physical location from which the
contents are read.  Such a differentiation would allow us to also do
renames and rename classifications much more sanely.  We had to add
DIFF_PAIR_RENAME() and filepair->renamed_pair only to support this
codepath because of this misdesign.  We could have just run strcmp()
between the logical pathname of one/two members of the filepair to
see if the pair was renamed if it was done that way.

And the way diff-no-index.c::queue_diff() walks two directories to
grab paths from them in parallel and incrementally means that the
filesystem walking code cannot be reused for something like this:

	git diff master:Documentation /var/tmp/docs

to compare a hierarchy represented with a tree object with another
hierarchy stored in the filesystem outside git's control.  A natural
way to do so would be to grab a set of paths from /var/tmp/docs and
have that set compared against the other set that comes from the tree,
and the "grab a set of paths from /var/tmp/docs" machinery can be
used twice to implement the current

	git diff --no-index /var/tmp/foo /var/tmp/bar

which would have been a lot cleaner implementation.

Oh well.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Bug: problem with file named with dash character
  2012-06-27 22:17                   ` Junio C Hamano
@ 2012-06-27 22:41                     ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2012-06-27 22:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Daniel Lyubomirov, git

On Wed, Jun 27, 2012 at 03:17:54PM -0700, Junio C Hamano wrote:

> > I think that's bad. I wonder if it should have "*" attributes applied to
> > it or not. While I can see it being convenient in some cases, I think it
> > makes the rules confusingly complex.
> 
> It is likely that the crlf conversion on Dos systems wants to be
> applied regardless.

Yeah, that's specifically the case I was thinking of. I would say "well,
we don't need to care about path at all, they can just use
core.autocrlf", but I think autocrlf is discouraged these days in favor
of using attributes.

> This is unrelated to the "standard input confusion" issue, but there
> are two more things coming either from the way the no-index code is
> done or from the way it is bolted onto git.
> 
> With the current code, this:
> 
> 	mkdir foo bar
>         echo hello >foo/1
>         echo bye >bar/2
>         git diff --no-index foo bar
> 
> shows differences between a/foo/1 and b/bar/1, as the no-index code
> records foo/1 and bar/1 as the paths in the filespec for them.
> 
> But if you are comparing two directory hierarchies, it is a lot more
> likely that you would want to see corresponding files in these two
> directories.  In other words, the patch is better shown as
> differences between a/1 and b/1 (the code makes the core compare
> foo/1 and bar/2 after all).  This of course requires no-index to
> differentiate the logical pathname (i.e. "this is the path inside
> collection a/ (or b/)") and the physical location from which the
> contents are read.  Such a differentiation would allow us to also do
> renames and rename classifications much more sanely.  We had to add
> DIFF_PAIR_RENAME() and filepair->renamed_pair only to support this
> codepath because of this misdesign.  We could have just run strcmp()
> between the logical pathname of one/two members of the filepair to
> see if the pair was renamed if it was done that way.

Yeah, that makes sense. Really you want to split the idea of
diff_filespec into a logical unit of "the thing I am diffing" and a
source struct of "here is where I get the data from". And the latter
could be a union of blob information, filesystem path, and stdin flag,
all contained inside the filespec.

> And the way diff-no-index.c::queue_diff() walks two directories to
> grab paths from them in parallel and incrementally means that the
> filesystem walking code cannot be reused for something like this:
> 
> 	git diff master:Documentation /var/tmp/docs
> 
> to compare a hierarchy represented with a tree object with another
> hierarchy stored in the filesystem outside git's control.  A natural
> way to do so would be to grab a set of paths from /var/tmp/docs and
> have that set compared against the other set that comes from the tree,
> and the "grab a set of paths from /var/tmp/docs" machinery can be
> used twice to implement the current
> 
> 	git diff --no-index /var/tmp/foo /var/tmp/bar
> 
> which would have been a lot cleaner implementation.

Agreed. I have occasionally wanted to do something like the tree
comparison you mentioned above, and I think I resorted to actually
making a git tree out of it.

All of that is nice, and if you feel like working on it, great. But I
admit I don't care too much about the --no-index code path. The key
thing to me is fixing the "-" path in the regular code path without
regressing the no-index stdin code-path too badly. And I think your
patches already do that, so it might be a good stopping point.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2012-06-27 22:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <52ae7682-3e9a-4b52-bec1-08ba3aadffc0@office.digitalus.nl>
2012-06-27  7:32 ` Bug: problem with file named with dash character Daniel Lyubomirov -|- Digitalus Bulgaria
2012-06-27  9:57   ` faux
2012-06-27 18:28   ` Junio C Hamano
2012-06-27 19:52     ` Jeff King
2012-06-27 20:25       ` Jeff King
2012-06-27 20:27       ` Junio C Hamano
2012-06-27 20:33         ` Junio C Hamano
2012-06-27 20:35           ` Junio C Hamano
2012-06-27 20:39             ` Junio C Hamano
2012-06-27 20:48               ` Junio C Hamano
2012-06-27 21:00                 ` Jeff King
2012-06-27 22:17                   ` Junio C Hamano
2012-06-27 22:41                     ` Jeff King

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.