All of lore.kernel.org
 help / color / mirror / Atom feed
* [Bug] git branch -v has problems with carriage returns
@ 2017-05-16 23:22 Animi Vulpis
       [not found] ` <CA+izobutP-JY84RGG-JbPA5twbckL1uVwxknBRLVTuGG0MEJcg@mail.gmail.com>
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Animi Vulpis @ 2017-05-16 23:22 UTC (permalink / raw)
  To: git

Hi,

I upgraded to git v2.13.0 and since then git branch -v has problems
with carriage returns in subject lines.

We are using gitlab (not the newest version). So this bug (It's about
carriage returns in auto-generated merge messages (\r\n)) is not yet
fixed in our version:
https://gitlab.com/gitlab-org/gitlab-ce/issues/31671
That's were the carriage returns are coming from.

In my specific case the auto-generated merge message has three lines
with empty lines in between.
So every line ends with `\r\n\r\n`

If I do `git branch -v` with such a subject line somehow the third and
second line get combined before the hash. Example:

$ git branch -v
See merge request !XXXX temp space 84e18d22fd Merge branch
'feature-XXX' into 'develop'
# <begins with third line> <ending of seconds line (if longer than
third)> <commit hash (correct)> <subject line (correct)>

Before git v2.13.0 `git branch -v` worked completely normal.

I was not able to create a minimal local example, because my manually
created \r\n in commit messages were transformed into \n\n

Please let me know if I can provide any more information that would be helpful.

Cheers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Delivery Status Notification (Failure)
       [not found]     ` <591f6844.82dcca0a.451e5.8f76.GMRIR@mx.google.com>
@ 2017-05-19 21:51       ` Atousa Duprat
  0 siblings, 0 replies; 21+ messages in thread
From: Atousa Duprat @ 2017-05-19 21:51 UTC (permalink / raw)
  To: git, Animi Vulpis

Hi,

I have tried to repro this issue but git goes out of its way to store
the commit messages using unix end-of-line format.
I think that git itself cannot create a repo exhibiting this problem.

Most helpful would be if you could create a mini repo using gitlab.
All it would need is one file, two branches, and a merge.
With that in hand, it should be pretty easy to track down the problem
and fix git.

You mentioned that the previous version you were using was working
fine, can you tell me which version that was?
It'll help to narrow down the changes that could have affected the issue.

Thanks,

Atousa

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug] git branch -v has problems with carriage returns
  2017-05-16 23:22 [Bug] git branch -v has problems with carriage returns Animi Vulpis
       [not found] ` <CA+izobutP-JY84RGG-JbPA5twbckL1uVwxknBRLVTuGG0MEJcg@mail.gmail.com>
@ 2017-05-19 21:55 ` Atousa Duprat
  2017-05-19 23:20   ` Animi Vulpis
                     ` (2 more replies)
  2017-05-21 14:10 ` DOAN Tran Cong Danh
  2 siblings, 3 replies; 21+ messages in thread
From: Atousa Duprat @ 2017-05-19 21:55 UTC (permalink / raw)
  To: Animi Vulpis; +Cc: git

Sorry for the noise with previous response...

I have tried to repro this issue but git goes out of its way to store
the commit messages using unix end-of-line format.
I think that git itself cannot create a repo exhibiting this problem.

Most helpful would be if you could create a mini repo using gitlab.
All it would need is one file, two branches, and a merge.
With that in hand, it should be pretty easy to track down the problem
and fix git.

You mentioned that the previous version you were using was working
fine, can you tell me which version that was?
It'll help to narrow down the changes that could have affected the issue.

Thanks,

Atousa

On Tue, May 16, 2017 at 4:22 PM, Animi Vulpis <animi.vulpis@gmail.com> wrote:
> Hi,
>
> I upgraded to git v2.13.0 and since then git branch -v has problems
> with carriage returns in subject lines.
>
> We are using gitlab (not the newest version). So this bug (It's about
> carriage returns in auto-generated merge messages (\r\n)) is not yet
> fixed in our version:
> https://gitlab.com/gitlab-org/gitlab-ce/issues/31671
> That's were the carriage returns are coming from.
>
> In my specific case the auto-generated merge message has three lines
> with empty lines in between.
> So every line ends with `\r\n\r\n`
>
> If I do `git branch -v` with such a subject line somehow the third and
> second line get combined before the hash. Example:
>
> $ git branch -v
> See merge request !XXXX temp space 84e18d22fd Merge branch
> 'feature-XXX' into 'develop'
> # <begins with third line> <ending of seconds line (if longer than
> third)> <commit hash (correct)> <subject line (correct)>
>
> Before git v2.13.0 `git branch -v` worked completely normal.
>
> I was not able to create a minimal local example, because my manually
> created \r\n in commit messages were transformed into \n\n
>
> Please let me know if I can provide any more information that would be helpful.
>
> Cheers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug] git branch -v has problems with carriage returns
  2017-05-19 21:55 ` [Bug] git branch -v has problems with carriage returns Atousa Duprat
@ 2017-05-19 23:20   ` Animi Vulpis
  2017-05-20  6:48   ` Johannes Sixt
  2017-05-21 13:42   ` [PATCH] ref-filter: treat CRLF as same as LF in find_subpos DOAN Tran Cong Danh
  2 siblings, 0 replies; 21+ messages in thread
From: Animi Vulpis @ 2017-05-19 23:20 UTC (permalink / raw)
  To: Atousa Duprat; +Cc: git

No problem, thanks for taking the time to help me.

I managed to create a minimal repository that shows the bug.
(I was able to deploy gitlab-ce-v8.15.8-ce.0 from docker locally and
create the repo, create the merge request and merge it)

I created a github repository so everybody interested can use it:
https://github.com/AnimiVulpis/git-bug
A few additional informations are in the README.md inside the repository.

FYI: I also tried a lot of things to create commit messages with \r\n
but without success. git does a good job preventing this.

Based on the history of the homebrew git formula
(https://github.com/Homebrew/homebrew-core/commits/master/Formula/git.rb)
and the fact that I `brew udpate` at least once a week I am pretty
sure that this bug does not exist in
git v2.12.2

Hope that helps
Have a nice weekend
David

2017-05-19 23:55 GMT+02:00 Atousa Duprat <atousa.p@gmail.com>:
> Sorry for the noise with previous response...
>
> I have tried to repro this issue but git goes out of its way to store
> the commit messages using unix end-of-line format.
> I think that git itself cannot create a repo exhibiting this problem.
>
> Most helpful would be if you could create a mini repo using gitlab.
> All it would need is one file, two branches, and a merge.
> With that in hand, it should be pretty easy to track down the problem
> and fix git.
>
> You mentioned that the previous version you were using was working
> fine, can you tell me which version that was?
> It'll help to narrow down the changes that could have affected the issue.
>
> Thanks,
>
> Atousa
>
> On Tue, May 16, 2017 at 4:22 PM, Animi Vulpis <animi.vulpis@gmail.com> wrote:
>> Hi,
>>
>> I upgraded to git v2.13.0 and since then git branch -v has problems
>> with carriage returns in subject lines.
>>
>> We are using gitlab (not the newest version). So this bug (It's about
>> carriage returns in auto-generated merge messages (\r\n)) is not yet
>> fixed in our version:
>> https://gitlab.com/gitlab-org/gitlab-ce/issues/31671
>> That's were the carriage returns are coming from.
>>
>> In my specific case the auto-generated merge message has three lines
>> with empty lines in between.
>> So every line ends with `\r\n\r\n`
>>
>> If I do `git branch -v` with such a subject line somehow the third and
>> second line get combined before the hash. Example:
>>
>> $ git branch -v
>> See merge request !XXXX temp space 84e18d22fd Merge branch
>> 'feature-XXX' into 'develop'
>> # <begins with third line> <ending of seconds line (if longer than
>> third)> <commit hash (correct)> <subject line (correct)>
>>
>> Before git v2.13.0 `git branch -v` worked completely normal.
>>
>> I was not able to create a minimal local example, because my manually
>> created \r\n in commit messages were transformed into \n\n
>>
>> Please let me know if I can provide any more information that would be helpful.
>>
>> Cheers

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug] git branch -v has problems with carriage returns
  2017-05-19 21:55 ` [Bug] git branch -v has problems with carriage returns Atousa Duprat
  2017-05-19 23:20   ` Animi Vulpis
@ 2017-05-20  6:48   ` Johannes Sixt
  2017-05-31  5:32     ` Atousa Duprat
  2017-05-21 13:42   ` [PATCH] ref-filter: treat CRLF as same as LF in find_subpos DOAN Tran Cong Danh
  2 siblings, 1 reply; 21+ messages in thread
From: Johannes Sixt @ 2017-05-20  6:48 UTC (permalink / raw)
  To: Atousa Duprat; +Cc: Animi Vulpis, git

Am 19.05.2017 um 23:55 schrieb Atousa Duprat:
> I have tried to repro this issue but git goes out of its way to store
> the commit messages using unix end-of-line format.
> I think that git itself cannot create a repo exhibiting this problem.

Here is a recipe to reproduce the error:

   git init
   git commit --allow-empty -m initial
   git branch crlf $(printf '%s\r\n' subject '' line3_long line4 |
        git commit-tree HEAD:)

The reason for the "bug" is obviously that a line having CR in addition 
to LF is not "an empty line". Consequently, the second line is not 
treated as a separator between subject and body, whereupon Git 
concatenates all lines into one large subject line. This strips the LFs 
but leaves the CRs in tact, which, when printed on a terminal move the 
cursor to the beginning of the line, so that text after the CRs 
overwrites what is already in the terminal.

This is just to give you a head start. I'm not going to look into this.

-- Hannes

>> If I do `git branch -v` with such a subject line somehow the third and
>> second line get combined before the hash. Example:
>>
>> $ git branch -v
>> See merge request !XXXX temp space 84e18d22fd Merge branch
>> 'feature-XXX' into 'develop'
>> # <begins with third line> <ending of seconds line (if longer than
>> third)> <commit hash (correct)> <subject line (correct)>
>>
>> Before git v2.13.0 `git branch -v` worked completely normal.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] ref-filter: treat CRLF as same as LF in find_subpos
  2017-05-19 21:55 ` [Bug] git branch -v has problems with carriage returns Atousa Duprat
  2017-05-19 23:20   ` Animi Vulpis
  2017-05-20  6:48   ` Johannes Sixt
@ 2017-05-21 13:42   ` DOAN Tran Cong Danh
  2017-05-22  1:19     ` Junio C Hamano
  2 siblings, 1 reply; 21+ messages in thread
From: DOAN Tran Cong Danh @ 2017-05-21 13:42 UTC (permalink / raw)
  To: git; +Cc: animi.vulpis, j6t, DOAN Tran Cong Danh

Starting from commit 949af06 (branch: use ref-filter printing APIs, 2017-01-10),
`git branch -v` doesn't treat CRLF as line separator anymore.

Quote from git mailing-list:

> Here is a recipe to reproduce the error:
>
>    git init
>    git commit --allow-empty -m initial
>    git branch crlf $(printf '%s\r\n' subject '' line3_long line4 |
>        git commit-tree HEAD:)
> The reason for the "bug" is obviously that a line having CR in addition
> to LF is not "an empty line". Consequently, the second line is not
> treated as a separator between subject and body, whereupon Git
> concatenates all line into one large subject line. This strips the LFs
> but leaves the CRS in tact, which, when printed on a terminal move the
> cursor to the beginning of the line, so that text after the CRs
> overwrites what is already in the terminal.

Reported-by: Animi Vulpis <animi.vulpis@gmail.com>
Helped-by: Johannes Sixt <j6t@kbdg.org>
Signed-off-by: DOAN Tran Cong Danh <congdanhqx@gmail.com>
---
 ref-filter.c             | 19 +++++++++++++++----
 t/t3203-branch-output.sh |  3 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1fc5e9970..b3c2276a5 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -967,7 +967,8 @@ static void find_subpos(const char *buf, unsigned long sz,
 	/* subject is first non-empty line */
 	*sub = buf;
 	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
+	while (buf < *sig && *buf && *buf != '\n'
+	       && !(*buf == '\r' && *(buf + 1) == '\n')) {
 		eol = strchrnul(buf, '\n');
 		if (*eol)
 			eol++;
@@ -975,12 +976,22 @@ static void find_subpos(const char *buf, unsigned long sz,
 	}
 	*sublen = buf - *sub;
 	/* drop trailing newline, if present */
-	if (*sublen && (*sub)[*sublen - 1] == '\n')
+	if (*sublen && (*sub)[*sublen - 1] == '\n') {
 		*sublen -= 1;
+		/* also drop trailing CR before that LF */
+		if ((*sublen) && (*sub)[*sublen - 1] == '\r')
+			*sublen -= 1;
+	}
 
 	/* skip any empty lines */
-	while (*buf == '\n')
-		buf++;
+	while (1) {
+		if (*buf == '\n')
+			buf++;
+		else if (*buf == '\r' && *(buf + 1) == '\n')
+			buf += 2;
+		else
+			break;
+	}
 	*body = buf;
 	*bodylen = strlen(buf);
 	*nonsiglen = *sig - buf;
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 5778c0afe..29b392066 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -13,7 +13,8 @@ test_expect_success 'make commits' '
 
 test_expect_success 'make branches' '
 	git branch branch-one &&
-	git branch branch-two HEAD^
+	git branch branch-two $(printf "%s\r\n" one "" line3_long line4 |
+	     git commit-tree HEAD:)
 '
 
 test_expect_success 'make remote branches' '
-- 
2.13.0.67.g10c78a1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH] ref-filter: treat CRLF as same as LF in find_subpos
  2017-05-16 23:22 [Bug] git branch -v has problems with carriage returns Animi Vulpis
       [not found] ` <CA+izobutP-JY84RGG-JbPA5twbckL1uVwxknBRLVTuGG0MEJcg@mail.gmail.com>
  2017-05-19 21:55 ` [Bug] git branch -v has problems with carriage returns Atousa Duprat
@ 2017-05-21 14:10 ` DOAN Tran Cong Danh
  2017-05-22 14:57   ` [PATCH v2] ref-filter: trim end whitespace in subject DOAN Tran Cong Danh
  2 siblings, 1 reply; 21+ messages in thread
From: DOAN Tran Cong Danh @ 2017-05-21 14:10 UTC (permalink / raw)
  To: git
  Cc: animi.vulpis, j6t, peff, gitster, git, pclouds, karthik.188,
	DOAN Tran Cong Danh

Starting from commit 949af06 (branch: use ref-filter printing APIs, 2017-01-10),
`git branch -v` doesn't treat CRLF as line separator anymore.

Quote from git mailing-list:

> Here is a recipe to reproduce the error:
>
>    git init
>    git commit --allow-empty -m initial
>    git branch crlf $(printf '%s\r\n' subject '' line3_long line4 |
>        git commit-tree HEAD:)
> The reason for the "bug" is obviously that a line having CR in addition
> to LF is not "an empty line". Consequently, the second line is not
> treated as a separator between subject and body, whereupon Git
> concatenates all line into one large subject line. This strips the LFs
> but leaves the CRS in tact, which, when printed on a terminal move the
> cursor to the beginning of the line, so that text after the CRs
> overwrites what is already in the terminal.

Reported-by: Animi Vulpis <animi.vulpis@gmail.com>
Helped-by: Johannes Sixt <j6t@kbdg.org>
Signed-off-by: DOAN Tran Cong Danh <congdanhqx@gmail.com>
---
 ref-filter.c             | 19 +++++++++++++++----
 t/t3203-branch-output.sh |  3 ++-
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1fc5e9970..b3c2276a5 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -967,7 +967,8 @@ static void find_subpos(const char *buf, unsigned long sz,
 	/* subject is first non-empty line */
 	*sub = buf;
 	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
+	while (buf < *sig && *buf && *buf != '\n'
+	       && !(*buf == '\r' && *(buf + 1) == '\n')) {
 		eol = strchrnul(buf, '\n');
 		if (*eol)
 			eol++;
@@ -975,12 +976,22 @@ static void find_subpos(const char *buf, unsigned long sz,
 	}
 	*sublen = buf - *sub;
 	/* drop trailing newline, if present */
-	if (*sublen && (*sub)[*sublen - 1] == '\n')
+	if (*sublen && (*sub)[*sublen - 1] == '\n') {
 		*sublen -= 1;
+		/* also drop trailing CR before that LF */
+		if ((*sublen) && (*sub)[*sublen - 1] == '\r')
+			*sublen -= 1;
+	}
 
 	/* skip any empty lines */
-	while (*buf == '\n')
-		buf++;
+	while (1) {
+		if (*buf == '\n')
+			buf++;
+		else if (*buf == '\r' && *(buf + 1) == '\n')
+			buf += 2;
+		else
+			break;
+	}
 	*body = buf;
 	*bodylen = strlen(buf);
 	*nonsiglen = *sig - buf;
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 5778c0afe..29b392066 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -13,7 +13,8 @@ test_expect_success 'make commits' '
 
 test_expect_success 'make branches' '
 	git branch branch-one &&
-	git branch branch-two HEAD^
+	git branch branch-two $(printf "%s\r\n" one "" line3_long line4 |
+	     git commit-tree HEAD:)
 '
 
 test_expect_success 'make remote branches' '
-- 
2.13.0.67.g10c78a1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] ref-filter: treat CRLF as same as LF in find_subpos
  2017-05-21 13:42   ` [PATCH] ref-filter: treat CRLF as same as LF in find_subpos DOAN Tran Cong Danh
@ 2017-05-22  1:19     ` Junio C Hamano
  2017-05-22 11:29       ` DOAN Tran Cong Danh
  2017-05-22 20:12       ` Jeff King
  0 siblings, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2017-05-22  1:19 UTC (permalink / raw)
  To: DOAN Tran Cong Danh; +Cc: git, animi.vulpis, j6t, Karthik Nayak

DOAN Tran Cong Danh <congdanhqx@gmail.com> writes:

> Starting from commit 949af06 (branch: use ref-filter printing APIs, 2017-01-10),
> `git branch -v` doesn't treat CRLF as line separator anymore.

A seemingly good problem identification (but not quite; see below) ...

>
> Quote from git mailing-list:
>
>> Here is a recipe to reproduce the error:
>>
>>    git init
>>    git commit --allow-empty -m initial
>>    git branch crlf $(printf '%s\r\n' subject '' line3_long line4 |
>>        git commit-tree HEAD:)
>> The reason for the "bug" is obviously that a line having CR in addition
>> to LF is not "an empty line". Consequently, the second line is not
>> treated as a separator between subject and body, whereupon Git
>> concatenates all line into one large subject line. This strips the LFs
>> but leaves the CRS in tact, which, when printed on a terminal move the
>> cursor to the beginning of the line, so that text after the CRs
>> overwrites what is already in the terminal.

... and good analysis.

However.

If you look at how `git branch -v` before that problematic change
removed the extra CR, you would notice that pretty_print_commit()
was used for that, which eventually called format_subject() with
"one\r\n\r\nline3...", got one line "one\r\n" by calling
get_one_line(), adjusted the line length from 5 to 3 by calling
is_blank_line() which as a side effect trims all whitespaces (not
just LF and CR), and emitted "one".  The reason why the next \r\n
was not mistaken as a non-empty line is the same---is_blank_line()
call onthe next line said that "\r\n" is an all-white space line.

So, while I think the realization that we have a problem, and the
analysis above i.e. "emptiness of the line matters", are both good,
but I do not think this is suffucient to get back the old behaviour.

The thing is, we never treated CRLF as line separator in this code.
What we did was to treat LF as line separator, but trimmed trailing
bytes that are isspace().  That is what the analysis you quoted from
J6t says.

Here is your test addition:

> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
> index 5778c0afe..29b392066 100755
> --- a/t/t3203-branch-output.sh
> +++ b/t/t3203-branch-output.sh
> @@ -13,7 +13,8 @@ test_expect_success 'make commits' '
>  
>  test_expect_success 'make branches' '
>  	git branch branch-one &&
> -	git branch branch-two HEAD^
> +	git branch branch-two $(printf "%s\r\n" one "" line3_long line4 |
> +	     git commit-tree HEAD:)
>  '

If you change this test to

> +	git branch branch-two $(printf "%s\n" one " " line3_long line4 |
> +	     git commit-tree HEAD:)

then the old code before the problematic change will only show the
first line i.e. "one" in "branch -v" output.  I think with or
without your code change, the new code would still show line3_long
and line4 in the output.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ref-filter: treat CRLF as same as LF in find_subpos
  2017-05-22  1:19     ` Junio C Hamano
@ 2017-05-22 11:29       ` DOAN Tran Cong Danh
  2017-05-22 20:12       ` Jeff King
  1 sibling, 0 replies; 21+ messages in thread
From: DOAN Tran Cong Danh @ 2017-05-22 11:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, animi.vulpis, j6t, Karthik Nayak

On 05/22/2017 09:19 AM, Junio C Hamano wrote:
> If you look at how `git branch -v` before that problematic change
> removed the extra CR, you would notice that pretty_print_commit()
> was used for that, which eventually called format_subject() with
> "one\r\n\r\nline3...", got one line "one\r\n" by calling
> get_one_line(), adjusted the line length from 5 to 3 by calling
> is_blank_line() which as a side effect trims all whitespaces (not
> just LF and CR), and emitted "one".  The reason why the next \r\n
> was not mistaken as a non-empty line is the same---is_blank_line()
> call onthe next line said that "\r\n" is an all-white space line.
> 
> So, while I think the realization that we have a problem, and the
> analysis above i.e. "emptiness of the line matters", are both good,
> but I do not think this is suffucient to get back the old behaviour.
> 
> The thing is, we never treated CRLF as line separator in this code.
> What we did was to treat LF as line separator, but trimmed trailing
> bytes that are isspace().  That is what the analysis you quoted from
> J6t says.

If I understand correctly, we should trim all whitespaces at the end of
subject line, and we should treat lines contain only whitespaces as
empty lines, right?

> Here is your test addition:
> If you change this test to
> 
>> +	git branch branch-two $(printf "%s\n" one " " line3_long line4 |
>> +	     git commit-tree HEAD:)
> 
> then the old code before the problematic change will only show the
> first line i.e. "one" in "branch -v" output.  I think with or
> without your code change, the new code would still show line3_long
> and line4 in the output.

Agree!

If you could confirm my understand, I would happily provide new patch
to trim all trailing whitespaces at the end of subject line
and treat next all-whitespace-line as empty line.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v2] ref-filter: trim end whitespace in subject
  2017-05-21 14:10 ` DOAN Tran Cong Danh
@ 2017-05-22 14:57   ` DOAN Tran Cong Danh
  2017-05-22 17:10     ` [PATCH v3] " DOAN Tran Cong Danh
  0 siblings, 1 reply; 21+ messages in thread
From: DOAN Tran Cong Danh @ 2017-05-22 14:57 UTC (permalink / raw)
  To: git
  Cc: animi.vulpis, j6t, peff, gitster, git, pclouds, karthik.188,
	ĐOÀN Trần Công Danh

From: ĐOÀN Trần Công Danh <congdanhqx@gmail.com>

Commit 949af0684 ("branch: use ref-filter printing APIs", 2017-01-10)
make `git branch -v` stops trimming end-whitespace in subject,
and it stops treating next all-whitespace-line as an empty line.

Quote from git mailing-list:

> Here is a recipe to reproduce the error:
>
>    git init
>    git commit --allow-empty -m initial
>    git branch crlf $(printf '%s\r\n' subject '' line3_long line4 |
>        git commit-tree HEAD:)
> The reason for the "bug" is obviously that a line having CR in addition
> to LF is not "an empty line". Consequently, the second line is not
> treated as a separator between subject and body, whereupon Git
> concatenates all line into one large subject line. This strips the LFs
> but leaves the CRS in tact, which, when printed on a terminal move the
> cursor to the beginning of the line, so that text after the CRs
> overwrites what is already in the terminal.

To recover previous behavior, trim all whitespace at the end of
first line, and treat all-white-space line as empty line

Reported-by: Animi Vulpis <animi.vulpis@gmail.com>
Helped-by: Johannes Sixt <j6t@kbdg.org>
Signed-off-by: ĐOÀN Trần Công Danh <congdanhqx@gmail.com>
---
 ref-filter.c             | 21 +++++++++++++++++----
 t/t3203-branch-output.sh |  3 ++-
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1fc5e9970..3625d543c 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -942,6 +942,17 @@ static void grab_person(const char *who, struct atom_value *val, int deref, stru
 	}
 }
 
+/*
+ * check if line in range [start, end) is a blank line or not
+ * data in range [start, end) must be valid before calling this function
+ */
+static int is_blank_line(const char *start, const char *end)
+{
+	while (start != end && isspace(*start))
+		++start;
+	return start == end;
+}
+
 static void find_subpos(const char *buf, unsigned long sz,
 			const char **sub, unsigned long *sublen,
 			const char **body, unsigned long *bodylen,
@@ -967,19 +978,21 @@ static void find_subpos(const char *buf, unsigned long sz,
 	/* subject is first non-empty line */
 	*sub = buf;
 	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
+	while (buf < *sig) {
 		eol = strchrnul(buf, '\n');
 		if (*eol)
 			eol++;
+		if (is_blank_line(buf, eol))
+			break;
 		buf = eol;
 	}
 	*sublen = buf - *sub;
-	/* drop trailing newline, if present */
-	if (*sublen && (*sub)[*sublen - 1] == '\n')
+	/* drop trailing whitespace, if present */
+	while (*sublen && isspace((*sub)[*sublen - 1]))
 		*sublen -= 1;
 
 	/* skip any empty lines */
-	while (*buf == '\n')
+	while (isspace(*buf))
 		buf++;
 	*body = buf;
 	*bodylen = strlen(buf);
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 5778c0afe..fa4441868 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -13,7 +13,8 @@ test_expect_success 'make commits' '
 
 test_expect_success 'make branches' '
 	git branch branch-one &&
-	git branch branch-two HEAD^
+	git branch branch-two $(printf "%s\r\n" one " " line3_long line4 |
+	     git commit-tree HEAD:)
 '
 
 test_expect_success 'make remote branches' '
-- 
2.13.0.67.g10c78a1


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3] ref-filter: trim end whitespace in subject
  2017-05-22 14:57   ` [PATCH v2] ref-filter: trim end whitespace in subject DOAN Tran Cong Danh
@ 2017-05-22 17:10     ` DOAN Tran Cong Danh
  2017-05-22 19:47       ` Johannes Sixt
  0 siblings, 1 reply; 21+ messages in thread
From: DOAN Tran Cong Danh @ 2017-05-22 17:10 UTC (permalink / raw)
  To: git
  Cc: animi.vulpis, j6t, peff, gitster, git, pclouds, karthik.188,
	ĐOÀN Trần Công Danh

From: ĐOÀN Trần Công Danh <congdanhqx@gmail.com>

Commit 949af0684 ("branch: use ref-filter printing APIs", 2017-01-10)
make `git branch -v` stops trimming end-whitespace in subject,
and it stops treating next all-whitespace-line as an empty line.

Quote from git mailing-list:

> Here is a recipe to reproduce the error:
>
>    git init
>    git commit --allow-empty -m initial
>    git branch crlf $(printf '%s\r\n' subject '' line3_long line4 |
>        git commit-tree HEAD:)
> The reason for the "bug" is obviously that a line having CR in addition
> to LF is not "an empty line". Consequently, the second line is not
> treated as a separator between subject and body, whereupon Git
> concatenates all line into one large subject line. This strips the LFs
> but leaves the CRS in tact, which, when printed on a terminal move the
> cursor to the beginning of the line, so that text after the CRs
> overwrites what is already in the terminal.

To recover previous behavior, trim all whitespace at the end of
first line, and treat all-white-space line as empty line

Reported-by: Animi Vulpis <animi.vulpis@gmail.com>
Helped-by: Johannes Sixt <j6t@kbdg.org>
Signed-off-by: ĐOÀN Trần Công Danh <congdanhqx@gmail.com>
---
Sorry for the noise, after sending out v2,
I found that the body is calculated incorrectly.

 ref-filter.c             | 40 ++++++++++++++++++++++++++++++++--------
 t/t3203-branch-output.sh |  3 ++-
 2 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1fc5e9970..4b30edf61 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -942,6 +942,25 @@ static void grab_person(const char *who, struct atom_value *val, int deref, stru
 	}
 }
 
+/*
+ * check if line in range [start, end) is a blank line or not
+ * data in range [start, end) must be valid before calling this function
+ */
+static int is_blank_line(const char *start, const char *end)
+{
+	while (start != end && isspace(*start))
+		start++;
+	return start == end;
+}
+
+static const char* find_next_eol(const char *buf)
+{
+	const char* eol = strchrnul(buf, '\n');
+	if (*eol)
+		eol++;
+	return eol;
+}
+
 static void find_subpos(const char *buf, unsigned long sz,
 			const char **sub, unsigned long *sublen,
 			const char **body, unsigned long *bodylen,
@@ -949,6 +968,7 @@ static void find_subpos(const char *buf, unsigned long sz,
 			const char **sig, unsigned long *siglen)
 {
 	const char *eol;
+	int has_empty_line = 0;
 	/* skip past header until we hit empty line */
 	while (*buf && *buf != '\n') {
 		eol = strchrnul(buf, '\n');
@@ -967,20 +987,24 @@ static void find_subpos(const char *buf, unsigned long sz,
 	/* subject is first non-empty line */
 	*sub = buf;
 	/* subject goes to first empty line */
-	while (buf < *sig && *buf && *buf != '\n') {
-		eol = strchrnul(buf, '\n');
-		if (*eol)
-			eol++;
+	while (buf < *sig && !has_empty_line) {
+		eol = find_next_eol(buf);
+		has_empty_line = is_blank_line(buf, eol);
 		buf = eol;
 	}
 	*sublen = buf - *sub;
-	/* drop trailing newline, if present */
-	if (*sublen && (*sub)[*sublen - 1] == '\n')
+	/* drop trailing whitespace, if present */
+	while (*sublen && isspace((*sub)[*sublen - 1]))
 		*sublen -= 1;
 
 	/* skip any empty lines */
-	while (*buf == '\n')
-		buf++;
+	while (buf < *sig) {
+		eol = find_next_eol(buf);
+		if (is_blank_line(buf, eol))
+			buf = eol;
+		else
+			break;
+	}
 	*body = buf;
 	*bodylen = strlen(buf);
 	*nonsiglen = *sig - buf;
diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index 5778c0afe..fa4441868 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -13,7 +13,8 @@ test_expect_success 'make commits' '
 
 test_expect_success 'make branches' '
 	git branch branch-one &&
-	git branch branch-two HEAD^
+	git branch branch-two $(printf "%s\r\n" one " " line3_long line4 |
+	     git commit-tree HEAD:)
 '
 
 test_expect_success 'make remote branches' '
-- 
Danh


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] ref-filter: trim end whitespace in subject
  2017-05-22 17:10     ` [PATCH v3] " DOAN Tran Cong Danh
@ 2017-05-22 19:47       ` Johannes Sixt
  2017-05-22 19:53         ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: Johannes Sixt @ 2017-05-22 19:47 UTC (permalink / raw)
  To: DOAN Tran Cong Danh
  Cc: git, animi.vulpis, peff, gitster, git, pclouds, karthik.188

Am 22.05.2017 um 19:10 schrieb DOAN Tran Cong Danh:
> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
> index 5778c0afe..fa4441868 100755
> --- a/t/t3203-branch-output.sh
> +++ b/t/t3203-branch-output.sh
> @@ -13,7 +13,8 @@ test_expect_success 'make commits' '
>   
>   test_expect_success 'make branches' '
>   	git branch branch-one &&
> -	git branch branch-two HEAD^
> +	git branch branch-two $(printf "%s\r\n" one " " line3_long line4 |
> +	     git commit-tree HEAD:)
>   '
>   
>   test_expect_success 'make remote branches' '
> 

This updated test shows nothing, I am afraid: If I apply only this 
change without the rest of the patch, then all test in t3203 still pass. 
And I do not see how the code change could make any difference at all. 
What am I missing?

-- Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] ref-filter: trim end whitespace in subject
  2017-05-22 19:47       ` Johannes Sixt
@ 2017-05-22 19:53         ` Jeff King
  2017-05-22 20:19           ` Johannes Sixt
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2017-05-22 19:53 UTC (permalink / raw)
  To: Johannes Sixt
  Cc: DOAN Tran Cong Danh, git, animi.vulpis, gitster, git, pclouds,
	karthik.188

On Mon, May 22, 2017 at 09:47:59PM +0200, Johannes Sixt wrote:

> Am 22.05.2017 um 19:10 schrieb DOAN Tran Cong Danh:
> > diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
> > index 5778c0afe..fa4441868 100755
> > --- a/t/t3203-branch-output.sh
> > +++ b/t/t3203-branch-output.sh
> > @@ -13,7 +13,8 @@ test_expect_success 'make commits' '
> >   test_expect_success 'make branches' '
> >   	git branch branch-one &&
> > -	git branch branch-two HEAD^
> > +	git branch branch-two $(printf "%s\r\n" one " " line3_long line4 |
> > +	     git commit-tree HEAD:)
> >   '
> >   test_expect_success 'make remote branches' '
> > 
> 
> This updated test shows nothing, I am afraid: If I apply only this change
> without the rest of the patch, then all test in t3203 still pass. And I do
> not see how the code change could make any difference at all. What am I
> missing?

It does for me here on Linux; I wonder if the CRs are being eaten by the
shell expansion.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ref-filter: treat CRLF as same as LF in find_subpos
  2017-05-22  1:19     ` Junio C Hamano
  2017-05-22 11:29       ` DOAN Tran Cong Danh
@ 2017-05-22 20:12       ` Jeff King
  2017-05-23  1:49         ` Junio C Hamano
  2017-05-23  2:01         ` Junio C Hamano
  1 sibling, 2 replies; 21+ messages in thread
From: Jeff King @ 2017-05-22 20:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: DOAN Tran Cong Danh, git, animi.vulpis, j6t, Karthik Nayak

On Mon, May 22, 2017 at 10:19:52AM +0900, Junio C Hamano wrote:

> However.
> 
> If you look at how `git branch -v` before that problematic change
> removed the extra CR, you would notice that pretty_print_commit()
> was used for that, which eventually called format_subject() with
> "one\r\n\r\nline3...", got one line "one\r\n" by calling
> get_one_line(), adjusted the line length from 5 to 3 by calling
> is_blank_line() which as a side effect trims all whitespaces (not
> just LF and CR), and emitted "one".  The reason why the next \r\n
> was not mistaken as a non-empty line is the same---is_blank_line()
> call onthe next line said that "\r\n" is an all-white space line.

I noticed a similar thing regarding pretty_print_commit(). Which really
made me wonder whether the right solution is to drop the custom parsing
code in ref-filter.c and use the bits from pretty.c. Then we'd have only
one parser. That's less code, but more importantly, there can't be
inconsistencies between the two (we're fixing one now, but we have no
idea if there are others).

I suspect that's more work because we'd need to refactor pretty.c a bit
to make the right functionality available. But the end result would be
much more maintainable.

-Peff

PS I'm also a bit curious how a CRLF got into a commit message in the
   first place. Stripspace should be removing that for normal "git
   commit" runs. I don't know that we've ever said it explicitly, but I
   think we'd consider the canonical in-repo object representation to be
   LF-only (just like we do for smudge/clean filters). Which means that
   whatever is generating these commit objects is arguably buggy.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3] ref-filter: trim end whitespace in subject
  2017-05-22 19:53         ` Jeff King
@ 2017-05-22 20:19           ` Johannes Sixt
  0 siblings, 0 replies; 21+ messages in thread
From: Johannes Sixt @ 2017-05-22 20:19 UTC (permalink / raw)
  To: Jeff King
  Cc: DOAN Tran Cong Danh, git, animi.vulpis, gitster, git, pclouds,
	karthik.188

Am 22.05.2017 um 21:53 schrieb Jeff King:
> On Mon, May 22, 2017 at 09:47:59PM +0200, Johannes Sixt wrote:
> 
>> Am 22.05.2017 um 19:10 schrieb DOAN Tran Cong Danh:
>>> diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
>>> index 5778c0afe..fa4441868 100755
>>> --- a/t/t3203-branch-output.sh
>>> +++ b/t/t3203-branch-output.sh
>>> @@ -13,7 +13,8 @@ test_expect_success 'make commits' '
>>>    test_expect_success 'make branches' '
>>>    	git branch branch-one &&
>>> -	git branch branch-two HEAD^
>>> +	git branch branch-two $(printf "%s\r\n" one " " line3_long line4 |

I didn't notice earlier that there is a blank between the dq here.

>>> +	     git commit-tree HEAD:)
>>>    '
>>>    test_expect_success 'make remote branches' '
>>>
>>
>> This updated test shows nothing, I am afraid: If I apply only this change
>> without the rest of the patch, then all test in t3203 still pass. And I do
>> not see how the code change could make any difference at all. What am I
>> missing?

And I didn't look carefully enough at t3203. Some tests do check branch 
-v output.

> 
> It does for me here on Linux; I wonder if the CRs are being eaten by the
> shell expansion.

And I tested on Linux, too, but on the wrong branch. On a branch closer 
to master I see a failure as well. Sorry for the noise.

There are no CRs on the command line, BTW, only on stdin of commit-tree.

-- Hannes

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ref-filter: treat CRLF as same as LF in find_subpos
  2017-05-22 20:12       ` Jeff King
@ 2017-05-23  1:49         ` Junio C Hamano
  2017-05-23  2:01         ` Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2017-05-23  1:49 UTC (permalink / raw)
  To: Jeff King; +Cc: DOAN Tran Cong Danh, git, animi.vulpis, j6t, Karthik Nayak

Jeff King <peff@peff.net> writes:

> On Mon, May 22, 2017 at 10:19:52AM +0900, Junio C Hamano wrote:
>
>> However.
>> 
>> If you look at how `git branch -v` before that problematic change
>> removed the extra CR, you would notice that pretty_print_commit()
>> was used for that, which eventually called format_subject() with
>> "one\r\n\r\nline3...", got one line "one\r\n" by calling
>> get_one_line(), adjusted the line length from 5 to 3 by calling
>> is_blank_line() which as a side effect trims all whitespaces (not
>> just LF and CR), and emitted "one".  The reason why the next \r\n
>> was not mistaken as a non-empty line is the same---is_blank_line()
>> call onthe next line said that "\r\n" is an all-white space line.
>
> I noticed a similar thing regarding pretty_print_commit(). Which really
> made me wonder whether the right solution is to drop the custom parsing
> code in ref-filter.c and use the bits from pretty.c.
> ...
> I suspect that's more work because we'd need to refactor pretty.c a bit
> to make the right functionality available. But the end result would be
> much more maintainable.

Yes, before I sent my response I debated myself if I should suggest
going that route (I would have done so if I were working with a
contributor who is already familiar with our codebase).  I agree
that it is the right direction to go in the longer term.

> PS I'm also a bit curious how a CRLF got into a commit message in the
>    first place.

I think I read somebody mumbled gitlab somewhere upthread.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ref-filter: treat CRLF as same as LF in find_subpos
  2017-05-22 20:12       ` Jeff King
  2017-05-23  1:49         ` Junio C Hamano
@ 2017-05-23  2:01         ` Junio C Hamano
  2017-05-23  3:01           ` Junio C Hamano
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2017-05-23  2:01 UTC (permalink / raw)
  To: Jeff King; +Cc: DOAN Tran Cong Danh, git, animi.vulpis, j6t, Karthik Nayak

Jeff King <peff@peff.net> writes:

> I suspect that's more work because we'd need to refactor pretty.c a bit
> to make the right functionality available. But the end result would be
> much more maintainable.

I actually think the entire codeflow of "find positions and length
of threeparts" using find_subpos() and then "copy the length bytes
starting position for C_{SUB,BODY,SIG,LINES,...}" must be rethought,
if the behavior of pretty.c::pretty_print_commit() is to be matched.
With the current code, %(contents:body) and other atoms that are
handled in ref-filter.c::grab_sub_body_contents() keep trailing
whitespaces on their lines with the current code that copies length
bytes starting the position using xmemdupz().  There need to be some
code that loses these trailing whiltespaces in the copied result.

While I do not claim that refactoring and reusing code from pretty.c
is the only viable way forward, it is clear to me that a patch that
updates find_subpos() and changes nothing else falls short X-<.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] ref-filter: treat CRLF as same as LF in find_subpos
  2017-05-23  2:01         ` Junio C Hamano
@ 2017-05-23  3:01           ` Junio C Hamano
  0 siblings, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2017-05-23  3:01 UTC (permalink / raw)
  To: Jeff King; +Cc: DOAN Tran Cong Danh, git, animi.vulpis, j6t, Karthik Nayak

Junio C Hamano <gitster@pobox.com> writes:

> Jeff King <peff@peff.net> writes:
>
>> I suspect that's more work because we'd need to refactor pretty.c a bit
>> to make the right functionality available. But the end result would be
>> much more maintainable.
>
> I actually think the entire codeflow of "find positions and length
> of threeparts" using find_subpos() and then "copy the length bytes
> starting position for C_{SUB,BODY,SIG,LINES,...}" must be rethought,
> if the behavior of pretty.c::pretty_print_commit() is to be matched.
> With the current code, %(contents:body) and other atoms that are
> handled in ref-filter.c::grab_sub_body_contents() keep trailing
> whitespaces on their lines with the current code that copies length
> bytes starting the position using xmemdupz().  There need to be some
> code that loses these trailing whiltespaces in the copied result.
>
> While I do not claim that refactoring and reusing code from pretty.c
> is the only viable way forward, it is clear to me that a patch that
> updates find_subpos() and changes nothing else falls short X-<.

I wonder if this would be a viable alternative (this is just a
smoking-break hack without an attempt to think through corner
cases---for example we need to make sure we work sensibly when
the object does not have _any_ body past the header, but I do not
think the original works well in that case, either).

 ref-filter.c | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index 1fc5e9970d..10f8fe15f5 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -949,13 +949,7 @@ static void find_subpos(const char *buf, unsigned long sz,
 			const char **sig, unsigned long *siglen)
 {
 	const char *eol;
-	/* skip past header until we hit empty line */
-	while (*buf && *buf != '\n') {
-		eol = strchrnul(buf, '\n');
-		if (*eol)
-			eol++;
-		buf = eol;
-	}
+
 	/* skip any empty lines */
 	while (*buf == '\n')
 		buf++;
@@ -1011,10 +1005,11 @@ static void append_lines(struct strbuf *out, const char *buf, unsigned long size
 }
 
 /* See grab_values */
-static void grab_sub_body_contents(struct atom_value *val, int deref, struct object *obj, void *buf, unsigned long sz)
+static void grab_sub_body_contents(struct atom_value *val, int deref, struct object *obj, void *rawbuf, unsigned long sz)
 {
 	int i;
 	const char *subpos = NULL, *bodypos = NULL, *sigpos = NULL;
+	struct strbuf buf = STRBUF_INIT;
 	unsigned long sublen = 0, bodylen = 0, nonsiglen = 0, siglen = 0;
 
 	for (i = 0; i < used_atom_cnt; i++) {
@@ -1030,11 +1025,18 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct obj
 		    strcmp(name, "trailers") &&
 		    !starts_with(name, "contents"))
 			continue;
-		if (!subpos)
-			find_subpos(buf, sz,
+		if (!subpos) {
+			char *eoh = memmem(rawbuf, sz, "\n\n", 2);
+			eoh += 2;
+			sz -= eoh - (char *)rawbuf;
+			rawbuf = eoh;
+			strbuf_add(&buf, rawbuf, sz);
+			strbuf_stripspace(&buf, 0);
+			find_subpos(buf.buf, sz,
 				    &subpos, &sublen,
 				    &bodypos, &bodylen, &nonsiglen,
 				    &sigpos, &siglen);
+		}
 
 		if (atom->u.contents.option == C_SUB)
 			v->s = copy_subject(subpos, sublen);
@@ -1060,8 +1062,9 @@ static void grab_sub_body_contents(struct atom_value *val, int deref, struct obj
 					info.trailer_end - info.trailer_start);
 			trailer_info_release(&info);
 		} else if (atom->u.contents.option == C_BARE)
-			v->s = xstrdup(subpos);
+			v->s = xmemdupz(rawbuf, sz);
 	}
+	strbuf_release(&buf);
 }
 
 /*

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Bug] git branch -v has problems with carriage returns
  2017-05-20  6:48   ` Johannes Sixt
@ 2017-05-31  5:32     ` Atousa Duprat
  2017-05-31 20:58       ` Stefan Beller
  2017-06-01  1:31       ` Junio C Hamano
  0 siblings, 2 replies; 21+ messages in thread
From: Atousa Duprat @ 2017-05-31  5:32 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: Animi Vulpis, git

[-- Attachment #1: Type: text/plain, Size: 2592 bytes --]

Here is my first attempt at fixing the issue.

There are two problems in ref-filter.c:

First, copy_subject() has been modified to turn '\n' into a space and
every other ascii control character to be ignored.

Second, find_subpos() doesn't realize that a line that only contains a
'\r\n' is a blank line – at least when using crlf convention.
I have changed things so that a sequence of either '\n' or "\r\n"
separate the subject from the body of the commit message.
I am not looking at the crlf setting because it doesn't seem like a
useful distinction – when one would we ever care for \r\n not to be a
blank line?  But it could be done...

Both fixes are minimal, but it feels like they are a issues with the
specific encoding.  Does git mandate ascii or utf-8 commit messages?
If not, there may be a larger issue here with encodings and line-end
conventions at the very least in ref-filter.c
Guidance would be appreciated for how to deal with this issue...

Patch attached.


Atousa


On Fri, May 19, 2017 at 11:48 PM, Johannes Sixt <j6t@kdbg.org> wrote:
> Am 19.05.2017 um 23:55 schrieb Atousa Duprat:
>>
>> I have tried to repro this issue but git goes out of its way to store
>> the commit messages using unix end-of-line format.
>> I think that git itself cannot create a repo exhibiting this problem.
>
>
> Here is a recipe to reproduce the error:
>
>   git init
>   git commit --allow-empty -m initial
>   git branch crlf $(printf '%s\r\n' subject '' line3_long line4 |
>        git commit-tree HEAD:)
>
> The reason for the "bug" is obviously that a line having CR in addition to
> LF is not "an empty line". Consequently, the second line is not treated as a
> separator between subject and body, whereupon Git concatenates all lines
> into one large subject line. This strips the LFs but leaves the CRs in tact,
> which, when printed on a terminal move the cursor to the beginning of the
> line, so that text after the CRs overwrites what is already in the terminal.
>
> This is just to give you a head start. I'm not going to look into this.
>
> -- Hannes
>
>
>>> If I do `git branch -v` with such a subject line somehow the third and
>>> second line get combined before the hash. Example:
>>>
>>> $ git branch -v
>>> See merge request !XXXX temp space 84e18d22fd Merge branch
>>> 'feature-XXX' into 'develop'
>>> # <begins with third line> <ending of seconds line (if longer than
>>> third)> <commit hash (correct)> <subject line (correct)>
>>>
>>> Before git v2.13.0 `git branch -v` worked completely normal.

[-- Attachment #2: branch-crlf.patch --]
[-- Type: application/octet-stream, Size: 943 bytes --]

diff --git a/ref-filter.c b/ref-filter.c
index 3a640448f..bc573f481 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -836,11 +836,15 @@ static const char *copy_email(const char *buf)
 static char *copy_subject(const char *buf, unsigned long len)
 {
 	char *r = xmemdupz(buf, len);
-	int i;
+	int i, j;
 
-	for (i = 0; i < len; i++)
+	for (i = 0, j = 0; i < len; i++, j++)
 		if (r[i] == '\n')
-			r[i] = ' ';
+			r[j] = ' ';
+		else if (r[i] < 32)
+                    j--; // skip ascii control characters that are not '\n'
+                else r[j] = r[i];
+        r[j]=0;
 
 	return r;
 }
@@ -956,9 +960,12 @@ static void find_subpos(const char *buf, unsigned long sz,
 			eol++;
 		buf = eol;
 	}
+
 	/* skip any empty lines */
 	while (*buf == '\n')
 		buf++;
+	while (*buf == '\r' && *(buf+1) == '\n')
+		buf += 2;
 
 	/* parse signature first; we might not even have a subject line */
 	*sig = buf + parse_signature(buf, strlen(buf));

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [Bug] git branch -v has problems with carriage returns
  2017-05-31  5:32     ` Atousa Duprat
@ 2017-05-31 20:58       ` Stefan Beller
  2017-06-01  1:31       ` Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Stefan Beller @ 2017-05-31 20:58 UTC (permalink / raw)
  To: Atousa Duprat; +Cc: Johannes Sixt, Animi Vulpis, git

On Tue, May 30, 2017 at 10:32 PM, Atousa Duprat <atousa.p@gmail.com> wrote:
> Here is my first attempt at fixing the issue.

Cool you're looking into this. :)

>
> There are two problems in ref-filter.c:
>
> First, copy_subject() has been modified to turn '\n' into a space and
> every other ascii control character to be ignored.
>
> Second, find_subpos() doesn't realize that a line that only contains a
> '\r\n' is a blank line – at least when using crlf convention.
> I have changed things so that a sequence of either '\n' or "\r\n"
> separate the subject from the body of the commit message.
> I am not looking at the crlf setting because it doesn't seem like a
> useful distinction – when one would we ever care for \r\n not to be a
> blank line?  But it could be done...
>
> Both fixes are minimal, but it feels like they are a issues with the
> specific encoding.  Does git mandate ascii or utf-8 commit messages?
> If not, there may be a larger issue here with encodings and line-end
> conventions at the very least in ref-filter.c
> Guidance would be appreciated for how to deal with this issue...
>
> Patch attached.
>

Please read Documentation/SubmittingPatches
(tl;dr:
(a) please sign your patch, read https://developercertificate.org/
(b) if possible please send patches inline instead of attached)

> diff --git a/ref-filter.c b/ref-filter.c
> index 3a640448f..bc573f481 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -836,11 +836,15 @@ static const char *copy_email(const char *buf)
>  static char *copy_subject(const char *buf, unsigned long len)
>  {
>          char *r = xmemdupz(buf, len);
> -        int i;
> +        int i, j;
>
> -        for (i = 0; i < len; i++)
> +        for (i = 0, j = 0; i < len; i++, j++)
>                  if (r[i] == '\n')
> -                        r[i] = ' ';
> +                        r[j] = ' ';
> +                else if (r[i] < 32)
> +                    j--; // skip ascii control characters that are not '\n'

/*
 * Our comment style uses the other way,
 * as it is compatible with more compilers, still.
 */

This seems to solve a different problem than the carriage return
discussed? So it could go into a separate patch.


> +                else r[j] = r[i];
> +        r[j]=0;
>
>          return r;
>  }
> @@ -956,9 +960,12 @@ static void find_subpos(const char *buf, unsigned long sz,
>                          eol++;
>                  buf = eol;
>          }
> +

stray new line?

>          /* skip any empty lines */
>          while (*buf == '\n')
>                  buf++;
> +        while (*buf == '\r' && *(buf+1) == '\n')
> +                buf += 2;

This first skips LF empty lines and then skips CRLF empty
lines. What if they are mixed? I'd think if we extend the
empty line detection we'd want to robust to such as well,
so maybe

    while (*buf == '\r' || *buf == '\n')
        buf++;

Maybe this is a bit too greedy?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Bug] git branch -v has problems with carriage returns
  2017-05-31  5:32     ` Atousa Duprat
  2017-05-31 20:58       ` Stefan Beller
@ 2017-06-01  1:31       ` Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2017-06-01  1:31 UTC (permalink / raw)
  To: Atousa Duprat; +Cc: Johannes Sixt, Animi Vulpis, git

Atousa Duprat <atousa.p@gmail.com> writes:

> Here is my first attempt at fixing the issue.
>
> There are two problems in ref-filter.c:
>
> First, copy_subject() has been modified to turn '\n' into a space and
> every other ascii control character to be ignored.
>
> Second, find_subpos() doesn't realize that a line that only contains a
> '\r\n' is a blank line – at least when using crlf convention.
> I have changed things so that a sequence of either '\n' or "\r\n"
> separate the subject from the body of the commit message.
> I am not looking at the crlf setting because it doesn't seem like a
> useful distinction – when one would we ever care for \r\n not to be a
> blank line?  But it could be done...
>
> Both fixes are minimal, but it feels like they are a issues with the
> specific encoding.  Does git mandate ascii or utf-8 commit messages?
> If not, there may be a larger issue here with encodings and line-end
> conventions at the very least in ref-filter.c
> Guidance would be appreciated for how to deal with this issue...
>
> Patch attached.

Doesn't this share the problem I pointed out in the other attempt in

  https://public-inbox.org/git/xmqqy3tppu13.fsf@gitster.mtv.corp.google.com/

In short, any patch that special cases CR will not get the original
behaviour back correctly.  The original never special cased CR; it
stripped isspace() at the end of lines, and turning of CRLF into LF
is merely just one effect of it.

You can apply the attached patch and see what "git branch -v"
produces.  We should only see "one", not "one line3_long line4", for
branch-two in the output.  Then replace the SP between %s and \n in
the printf thing with \r and repeat the experiment.

Thanks.

 t/t3203-branch-output.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t3203-branch-output.sh b/t/t3203-branch-output.sh
index a428ae6703..79b80a2d3f 100755
--- a/t/t3203-branch-output.sh
+++ b/t/t3203-branch-output.sh
@@ -13,7 +13,8 @@ test_expect_success 'make commits' '
 
 test_expect_success 'make branches' '
 	git branch branch-one &&
-	git branch branch-two HEAD^
+	git branch branch-two $(printf "%s \n" one "" line3_long line4 |
+	     git commit-tree HEAD:)
 '
 
 test_expect_success 'make remote branches' '

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-06-01  1:31 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-16 23:22 [Bug] git branch -v has problems with carriage returns Animi Vulpis
     [not found] ` <CA+izobutP-JY84RGG-JbPA5twbckL1uVwxknBRLVTuGG0MEJcg@mail.gmail.com>
     [not found]   ` <CA+izobuSKtoQzNJuvuisjh7h3FF=nbt8u-hOHfdeFp8ZjgZF+Q@mail.gmail.com>
     [not found]     ` <591f6844.82dcca0a.451e5.8f76.GMRIR@mx.google.com>
2017-05-19 21:51       ` Delivery Status Notification (Failure) Atousa Duprat
2017-05-19 21:55 ` [Bug] git branch -v has problems with carriage returns Atousa Duprat
2017-05-19 23:20   ` Animi Vulpis
2017-05-20  6:48   ` Johannes Sixt
2017-05-31  5:32     ` Atousa Duprat
2017-05-31 20:58       ` Stefan Beller
2017-06-01  1:31       ` Junio C Hamano
2017-05-21 13:42   ` [PATCH] ref-filter: treat CRLF as same as LF in find_subpos DOAN Tran Cong Danh
2017-05-22  1:19     ` Junio C Hamano
2017-05-22 11:29       ` DOAN Tran Cong Danh
2017-05-22 20:12       ` Jeff King
2017-05-23  1:49         ` Junio C Hamano
2017-05-23  2:01         ` Junio C Hamano
2017-05-23  3:01           ` Junio C Hamano
2017-05-21 14:10 ` DOAN Tran Cong Danh
2017-05-22 14:57   ` [PATCH v2] ref-filter: trim end whitespace in subject DOAN Tran Cong Danh
2017-05-22 17:10     ` [PATCH v3] " DOAN Tran Cong Danh
2017-05-22 19:47       ` Johannes Sixt
2017-05-22 19:53         ` Jeff King
2017-05-22 20:19           ` Johannes Sixt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.