* Clarify the meaning of "character" in the documentation
@ 2024-03-05  8:43 Manlio Perillo
  2024-03-05  9:00 ` Kristoffer Haugsbakk
  0 siblings, 1 reply; 82+ messages in thread
From: Manlio Perillo @ 2024-03-05  8:43 UTC (permalink / raw)
  To: git

The term "character" is confusing: does it mean 7bit/ASCII character
or Unicode Code Point?

As an example, with
git config --add core.commentChar •  // Bullet (U+2022)
git does not complain, but it is rejected later.

A counter example is using UTF-8 with "user.name", where it is handled
correctly.

I sent this email after reading the documentation of "git diff
--color-moved=blocks", where the text says:
> Blocks of moved text of at least 20 alphanumeric characters are detected greedily.

In this case it is not clear whether the characters are counted as
UTF-8 code points or as plain 8-bit bytes.

Thanks
Manlio Perillo

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05  8:43 Clarify the meaning of "character" in the documentation Manlio Perillo
@ 2024-03-05  9:00 ` Kristoffer Haugsbakk
  2024-03-05 15:32   ` Junio C Hamano
  2024-03-05 22:48   ` brian m. carlson
  0 siblings, 2 replies; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-05  9:00 UTC (permalink / raw)
  To: Manlio Perillo; +Cc: git


On Tue, Mar 5, 2024, at 09:43, Manlio Perillo wrote:
> The term "character" is confusing: does it mean 7bit/ASCII character
> or Unicode Code Point?

IMO it should say “ASCII” in contexts where it is restricted to
that. Otherwise UTF-8 can be assumed since git(1) handles that well.

> As an example, with
> git config --add core.commentChar •  // Bullet (U+2022)
> git does not complain, but it is rejected later.

I think this is more about `git config --add` not doing any
validation. It just sets things. You can do `git config --add
core.commentChar 'ffd'` and get the same effect.

> A counter example is using UTF-8 with "user.name", where it is handled
> correctly.

Yep.

It will also handle UTF-8 in cross-system settings, in my experience: if
you generate patches with git-format-patch(1), it correctly handles UTF-8
that ends up in email headers (which need their own encoding).

It’s quite UTF-8 friendly.

> I sent this email after reading the documentation of "git diff
> --color-moved=blocks", where the text says:
>> Blocks of moved text of at least 20 alphanumeric characters are detected greedily.
>
> In this case it is not clear if the number of characters are counted
> as UTF-8 or normal 8bit bytes.

Alphanumeric characters (a-z and A-Z and 0-9) are ASCII. And one ASCII
char is represented using one byte in UTF-8. This already looks precise
to me.

I’ve never run into a case where git-diff(1) does not handle UTF-8. I
don’t even know if it really needs to “handle” it per se as opposed to
just treating it as opaque bytes. Maybe it matters for things like
whitespace and word-boundaries, I don’t know.
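
To make the byte-versus-character distinction concrete, here is a minimal sketch in C. It is not taken from git; the function name `utf8_codepoints` is made up for illustration, and it assumes well-formed UTF-8 input. It counts code points by skipping UTF-8 continuation bytes, so it can be compared against strlen(), which counts bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not a git function): count Unicode code points
 * in a NUL-terminated UTF-8 string by skipping continuation bytes,
 * i.e. bytes of the form 10xxxxxx.  Assumes well-formed UTF-8; no
 * validation is attempted.
 */
size_t utf8_codepoints(const char *s)
{
	size_t n = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			n++;
	return n;
}
```

For plain ASCII such as "abc" both strlen() and utf8_codepoints() give 3. For the bullet U+2022 (bytes E2 80 A2) strlen() gives 3 while utf8_codepoints() gives 1. For "e" followed by U+0301 (combining acute accent) the result is 3 bytes and 2 code points, even though a reader sees a single character.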

-- 
Kristoffer Haugsbakk


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05  9:00 ` Kristoffer Haugsbakk
@ 2024-03-05 15:32   ` Junio C Hamano
  2024-03-05 15:42     ` Dragan Simic
  2024-03-05 16:51     ` Kristoffer Haugsbakk
  2024-03-05 22:48   ` brian m. carlson
  1 sibling, 2 replies; 82+ messages in thread
From: Junio C Hamano @ 2024-03-05 15:32 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: Manlio Perillo, git

"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:

>> As an example, with
>> git config --add core.commentChar •  // Bullet (U+2022)
>> git does not complain, but it is rejected later.
>
> I think this is more about `git config --add` not doing any
> validation. It just sets things. You can do `git config --add
> core.commentChar 'ffd'` and get the same effect.

This is not wrong per-se, but it merely explains why "config" takes
it without complaining (the command just does not know anything
about what each variable means and what the valid range of values
are).  core.commentChar is limited to "a byte" so in the context of
everything else (like commit log message in the editor) being UTF-8,
it means ASCII would only work there.
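
As an illustration of what "limited to a byte, hence ASCII" amounts to, here is a minimal sketch of such a check. It is an assumption about the shape of the rule, not git's actual config code, and `is_single_ascii_char` is a made-up name.

```c
#include <string.h>

/*
 * Hypothetical check (not git's implementation): accept a config value
 * only if it is exactly one byte long and that byte is in the ASCII
 * range (0x00-0x7F).
 */
int is_single_ascii_char(const char *value)
{
	return strlen(value) == 1 && (unsigned char)value[0] < 0x80;
}
```

Under this rule "#" and "%" pass, while "ffd" (three bytes) and the bullet U+2022 (three bytes in UTF-8) do not.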

As you said, we should document core.commentChar as limited to an
ASCII character, at least as a short term solution.

I personally do not see a reason, however, why we need to be limited
to a single byte, though.  If a patch cleanly implements to allow us
to use any one-or-more-byte sequence as core.commentChar, I do not
offhand see a good reason to reject it---it would be fully backward
compatible and allows you to use a UTF-8 character outside ASCII, as
well as "//" and the like.

> Alphanumeric characters (a-z and A-Z and 0-9) are ASCII. And one ASCII
> char is represented using one byte in UTF-8. This already looks precise
> to me.

Correct.

> I’ve never run into a case where git-diff(1) does not handle UTF-8. I
> don’t even know if it really needs to “handle” it per se as opposed to
> just treating it as opaque bytes. Maybe it matters for things like
> whitespace and word-boundaries, I don’t know.

The core part of "diff" is very much line oriented, and after
chopping your random sequence of bytes at each LF that appears in
it, the code is pretty oblivious to the character boundary, except
for a few cases.  "-w" needs to know what the whitespace characters
are (it knows only the limited basic set like SP HT and probably
VT), "-i" needs to know that "A" and "a" are equivalent (I think it
only knows the ASCII, but I may be misremembering).  Outside the
core part of "diff", there are frills that need to know about
character boundaries, like chopping the function header comment
placed on a hunk header "@@ -1682,7 +1682,7 @@" to a reasonable
length, --color-words/--word-diff that first separates lines into
multi-character tokens and align matching sequences in them, etc.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 15:32   ` Junio C Hamano
@ 2024-03-05 15:42     ` Dragan Simic
  2024-03-05 16:38       ` Junio C Hamano
  2024-03-05 16:58       ` Clarify the meaning of "character" in the documentation Kristoffer Haugsbakk
  2024-03-05 16:51     ` Kristoffer Haugsbakk
  1 sibling, 2 replies; 82+ messages in thread
From: Dragan Simic @ 2024-03-05 15:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kristoffer Haugsbakk, Manlio Perillo, git

On 2024-03-05 16:32, Junio C Hamano wrote:
> "Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:
>> I think this is more about `git config --add` not doing any
>> validation. It just sets things. You can do `git config --add
>> core.commentChar 'ffd'` and get the same effect.
> 
> As you said, we should document core.commentChar as limited to an
> ASCII character, at least as a short term solution.
> 
> I personally do not see a reason, however, why we need to be limited
> to a single byte, though.  If a patch cleanly implements to allow us
> to use any one-or-more-byte sequence as core.commentChar, I do not
> offhand see a good reason to reject it---it would be fully backward
> compatible and allows you to use a UTF-8 character outside ASCII, as
> well as "//" and the like.

May I ask why would we want the comment character to possibly be
a multibyte character?  I mean, I support localization, to make it all
easier for the users who opt not to use English, but wouldn't allowing
multibyte characters for the comment character simply be a bit unneeded?

Maybe I'm missing something?

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 15:42     ` Dragan Simic
@ 2024-03-05 16:38       ` Junio C Hamano
  2024-03-05 17:28         ` Dragan Simic
  2024-03-06  8:08         ` [messy PATCH] multi-byte core.commentChar Jeff King
  2024-03-05 16:58       ` Clarify the meaning of "character" in the documentation Kristoffer Haugsbakk
  1 sibling, 2 replies; 82+ messages in thread
From: Junio C Hamano @ 2024-03-05 16:38 UTC (permalink / raw)
  To: Dragan Simic; +Cc: Kristoffer Haugsbakk, Manlio Perillo, git

Dragan Simic <dsimic@manjaro.org> writes:

> On 2024-03-05 16:32, Junio C Hamano wrote:
>> "Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:
>>> I think this is more about `git config --add` not doing any
>>> validation. It just sets things. You can do `git config --add
>>> core.commentChar 'ffd'` and get the same effect.
>> As you said, we should document core.commentChar as limited to an
>> ASCII character, at least as a short term solution.
>> I personally do not see a reason, however, why we need to be limited
>> to a single byte, though.  If a patch cleanly implements to allow us
>> to use any one-or-more-byte sequence as core.commentChar, I do not
>> offhand see a good reason to reject it---it would be fully backward
>> compatible and allows you to use a UTF-8 character outside ASCII, as
>> well as "//" and the like.
>
> May I ask why would we want the comment character to possibly be
> a multibyte character?  I mean, I support localization, to make it all
> easier for the users who opt not to use English, but wouldn't allowing
> multibyte characters for the comment character simply be a bit unneeded?
>
> Maybe I'm missing something?

That's not a question for me ;-).

It is not my personal itch, so I haven't done anything to make the
commentChar take more than one byte.  But if it is somebody else's
itch, I do not see a reason why we should forbid them from
scratching.  If the setting seeps through across repository
boundaries, that may create a compatibility issue and that by itself
might be such a reason.  If it greatly makes the code more complex,
that may be another reason you can use to argue against adding such
a "feature".  If it makes the semantics of what "a comment string"
is and how they are added and stripped at various stages of
processing commit log messages fuzzy and harder to document and
understand, that might be another reason.  I however do not think
any of these to be true.  Maybe I am overly optimistic.  I haven't
looked deeply into the code around commentChar for quite some time.



^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 15:32   ` Junio C Hamano
  2024-03-05 15:42     ` Dragan Simic
@ 2024-03-05 16:51     ` Kristoffer Haugsbakk
  2024-03-05 17:37       ` Junio C Hamano
  1 sibling, 1 reply; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-05 16:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kristoffer Haugsbakk, Manlio Perillo, git

On Tue, Mar 5, 2024, at 16:32, Junio C Hamano wrote:
> "Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:
>
>>> As an example, with
>>> git config --add core.commentChar •  // Bullet (U+2022)
>>> git does not complain, but it is rejected later.
>>
>> I think this is more about `git config --add` not doing any
>> validation. It just sets things. You can do `git config --add
>> core.commentChar 'ffd'` and get the same effect.
>
> This is not wrong per-se, but it merely explains why "config" takes
> it without complaining (the command just does not know anything
> about what each variable means and what the valid range of values
> are).  core.commentChar is limited to "a byte" so in the context of
> everything else (like commit log message in the editor) being UTF-8,
> it means ASCII would only work there.

Yep, I neglected to mention that part.

> I personally do not see a reason, however, why we need to be limited
> to a single byte, though.  If a patch cleanly implements to allow us
> to use any one-or-more-byte sequence as core.commentChar, I do not
> offhand see a good reason to reject it---it would be fully backward
> compatible and allows you to use a UTF-8 character outside ASCII, as
> well as "//" and the like.

Allow one codepoint or a string? Since a Unicode “character” can be
composed of multiple codepoints. And at that point it might be more work
to validate that it is a “character” compared to allowing any kind of
string.

Maybe introduce `core.commentString` and make it a synonym for
`core.commentChar`?

> The core part of "diff" is very much line oriented, and after
> chopping your random sequence of bytes at each LF that appears in
> it, the code is pretty oblivious to the character boundary, except
> for a few cases.  "-w" needs to know what the whitespace characters
> are (it knows only the limited basic set like SP HT and probably
> VT), "-i" needs to know that "A" and "a" are equivalent (I think it
> only knows the ASCII, but I may be misremembering).  Outside the
> core part of "diff", there are frills that need to know about
> character boundaries, like chopping the function header comment
> placed on a hunk header "@@ -1682,7 +1682,7 @@" to a reasonable
> length, --color-words/--word-diff that first separates lines into
> multi-character tokens and align matching sequences in them, etc.

Ah, interesting. Thanks :)

> As you said, we should document core.commentChar as limited to an
> ASCII character, at least as a short term solution.

Aha, I see now that the config documentation doesn’t make that clear.

-- >8 --
Subject: [PATCH] config: document `core.commentChar` as ASCII-only

d3b3419f8f2 (config: tell the user that we expect an ASCII character,
2023-03-27) updated an error message to make clear that this option
specifically wants an ASCII character but neglected to consider the
config documentation.

Reported-by: Manlio Perillo <manlio.perillo@gmail.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---

Notes (series):
    I didn’t find any other relevant occurrences with
    
        git grep 'commentChar' -- ':(exclude)po'
    
    `Documentation/git-commit.txt` mentions it but it doesn’t seem like a
    clarification is needed in that context.

 Documentation/config/core.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 0e8c2832bf9..2d4bbdb25fa 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -521,7 +521,7 @@ core.editor::
 
 core.commentChar::
 	Commands such as `commit` and `tag` that let you edit
-	messages consider a line that begins with this character
+	messages consider a line that begins with this ASCII character
 	commented, and removes them after the editor returns
 	(default '#').
 +
-- 
2.44.0.64.g52b67adbeb2


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 15:42     ` Dragan Simic
  2024-03-05 16:38       ` Junio C Hamano
@ 2024-03-05 16:58       ` Kristoffer Haugsbakk
  2024-03-05 17:20         ` Dragan Simic
  1 sibling, 1 reply; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-05 16:58 UTC (permalink / raw)
  To: Dragan Simic; +Cc: Manlio Perillo, git, Junio C Hamano

On Tue, Mar 5, 2024, at 16:42, Dragan Simic wrote:
>
> May I ask why would we want the comment character to possibly be
> a multibyte character?  I mean, I support localization, to make it all
> easier for the users who opt not to use English, but wouldn't allowing
> multibyte characters for the comment character simply be a bit unneeded?
>
> Maybe I'm missing something?

Personally I think it’s okay. `%` for example is a good candidate since
you seldom use that as a leading character in prose (after a
whitespace), and it seems that `%` is often recommended as an
alternative.

But if it doesn’t make the code more complex: why not? (I just
personally don’t have a use-case.)

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 16:58       ` Clarify the meaning of "character" in the documentation Kristoffer Haugsbakk
@ 2024-03-05 17:20         ` Dragan Simic
  2024-03-05 17:37           ` Kristoffer Haugsbakk
  0 siblings, 1 reply; 82+ messages in thread
From: Dragan Simic @ 2024-03-05 17:20 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: Manlio Perillo, git, Junio C Hamano

On 2024-03-05 17:58, Kristoffer Haugsbakk wrote:
> On Tue, Mar 5, 2024, at 16:42, Dragan Simic wrote:
>> May I ask why would we want the comment character to possibly be
>> a multibyte character?  I mean, I support localization, to make it all
>> easier for the users who opt not to use English, but wouldn't allowing
>> multibyte characters for the comment character simply be a bit unneeded?
>> 
>> Maybe I'm missing something?
> 
> Personally I think it’s okay. `%` for example is a good candidate since
> you seldom use that as a leading character in prose (after a
> whitespace), and it seems that `%` is often recommended as an
> alternative.

Isn't '%' actually an ASCII character?

> But if it doesn’t make the code more complex: why not? (I just
> personally don’t have a use-case.)

Frankly, allowing multibyte characters as comment characters looks to me
more like a programming exercise than a really needed feature.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 16:38       ` Junio C Hamano
@ 2024-03-05 17:28         ` Dragan Simic
  2024-03-06  8:08         ` [messy PATCH] multi-byte core.commentChar Jeff King
  1 sibling, 0 replies; 82+ messages in thread
From: Dragan Simic @ 2024-03-05 17:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kristoffer Haugsbakk, Manlio Perillo, git

On 2024-03-05 17:38, Junio C Hamano wrote:
> Dragan Simic <dsimic@manjaro.org> writes:
>> On 2024-03-05 16:32, Junio C Hamano wrote:
>>> "Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:
>>>> I think this is more about `git config --add` not doing any
>>>> validation. It just sets things. You can do `git config --add
>>>> core.commentChar 'ffd'` and get the same effect.
>>> As you said, we should document core.commentChar as limited to an
>>> ASCII character, at least as a short term solution.
>>> I personally do not see a reason, however, why we need to be limited
>>> to a single byte, though.  If a patch cleanly implements to allow us
>>> to use any one-or-more-byte sequence as core.commentChar, I do not
>>> offhand see a good reason to reject it---it would be fully backward
>>> compatible and allows you to use a UTF-8 character outside ASCII, as
>>> well as "//" and the like.
>> 
>> May I ask why would we want the comment character to possibly be
>> a multibyte character?  I mean, I support localization, to make it all
>> easier for the users who opt not to use English, but wouldn't allowing
>> multibyte characters for the comment character simply be a bit unneeded?
>> 
>> Maybe I'm missing something?
> 
> That's not a question for me ;-).
> 
> It is not my personal itch, so I haven't done anything to make the
> commentChar take more than one byte.  But if it is somebody else's
> itch, I do not see a reason why we should forbid them from
> scratching.  If the setting seeps through across repository
> boundaries, that may create a compatibility issue and that by itself
> might be such a reason.  If it greatly makes the code more complex,
> that may be another reason you can use to argue against adding such
> a "feature".  If it makes the semantics of what "a comment string"
> is and how they are added and stripped at various stages of
> processing commit log messages fuzzy and harder to document and
> understand, that might be another reason.  I however do not think
> any of these to be true.  Maybe I am overly optimistic.  I haven't
> looked deeply into the code around commentChar for quite some time.

Yes, there are quite a few possible obstacles.  As I replied to Kristoffer
a bit earlier, I see this more as a programming exercise.  Of course,
unless someone really needs it as a new feature, in which case they will
probably need to overcome all those obstacles. :)

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 17:20         ` Dragan Simic
@ 2024-03-05 17:37           ` Kristoffer Haugsbakk
  2024-03-05 21:19             ` Dragan Simic
  0 siblings, 1 reply; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-05 17:37 UTC (permalink / raw)
  To: Dragan Simic; +Cc: Manlio Perillo, git, Junio C Hamano

On Tue, Mar 5, 2024, at 18:20, Dragan Simic wrote:
> On 2024-03-05 17:58, Kristoffer Haugsbakk wrote:
>> Personally I think it’s okay. `%` for example is a good candidate since
>> you seldom use that as a leading character in prose (after a
>> whitespace), and it seems that `%` is often recommended as an
>> alternative.
>
> Isn't '%' actually an ASCII character?

I wasn’t clear: personally I think the status quo of only allowing ASCII
characters seems fine given that you can use something like `%` as an
alternative comment char.

-- 
Kristoffer Haugsbakk


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 16:51     ` Kristoffer Haugsbakk
@ 2024-03-05 17:37       ` Junio C Hamano
  2024-03-05 17:49         ` Kristoffer Haugsbakk
  0 siblings, 1 reply; 82+ messages in thread
From: Junio C Hamano @ 2024-03-05 17:37 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: Manlio Perillo, git

Kristoffer Haugsbakk <code@khaugsbakk.name> writes:

>> I personally do not see a reason, however, why we need to be limited
>> to a single byte, though.  If a patch cleanly implements to allow us
>> to use any one-or-more-byte sequence as core.commentChar, I do not
>> offhand see a good reason to reject it---it would be fully backward
>> compatible and allows you to use a UTF-8 character outside ASCII, as
>> well as "//" and the like.
>
> Allow one codepoint or a string?

I said "any one-or-more-byte sequence" and I meant it.  It does not
even have to be a full and complete UTF-8 character.  As long as we
correctly prefix the sequence and strip it from the front, I do not
care if the user chooses to use a broken half-character ;-).
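
A minimal sketch of the "strip it from the front" half, treating the comment marker as an opaque byte sequence. This is not git's code; `strip_comment_lines` is a made-up name, and the sketch assumes a non-empty prefix that contains no newline and an output buffer at least as large as the input.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch (not git's implementation): copy `in` to `out`,
 * dropping every line that begins with `prefix`.  The prefix is
 * compared bytewise, so it may be "#", "//", a multi-byte UTF-8
 * character, or even part of one.  `out` must have room for
 * strlen(in) + 1 bytes.  Returns the number of bytes written, not
 * counting the terminating NUL.
 */
size_t strip_comment_lines(const char *in, const char *prefix, char *out)
{
	size_t plen = strlen(prefix);
	size_t written = 0;

	while (*in) {
		const char *eol = strchr(in, '\n');
		/* Length of this line, including its LF if it has one. */
		size_t len = eol ? (size_t)(eol - in) + 1 : strlen(in);

		if (strncmp(in, prefix, plen) != 0) {
			memcpy(out + written, in, len);
			written += len;
		}
		in += len;
	}
	out[written] = '\0';
	return written;
}
```

With the prefix "//", the input "subject\n// note\nbody\n" becomes "subject\nbody\n". Because the match is bytewise, a prefix consisting of only the first two bytes of the bullet (E2 80) would also strip lines that start with the full bullet, which is the "broken half-character" case.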

> Maybe introduce `core.commentString` and make it a synonym for
> `core.commentChar`?

Yes, if we were to do so.  As I already said, this is not my itch,
but such a synonym would be part of the migration plan if somebody
seriously designs this as a new feature.

> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 0e8c2832bf9..2d4bbdb25fa 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -521,7 +521,7 @@ core.editor::
>  
>  core.commentChar::
>  	Commands such as `commit` and `tag` that let you edit
> -	messages consider a line that begins with this character
> +	messages consider a line that begins with this ASCII character
>  	commented, and removes them after the editor returns
>  	(default '#').
>  +

Looks sensible.  Thanks.  Will queue.


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 17:37       ` Junio C Hamano
@ 2024-03-05 17:49         ` Kristoffer Haugsbakk
  0 siblings, 0 replies; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-05 17:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Manlio Perillo, git, Dragan Simic

On Tue, Mar 5, 2024, at 18:37, Junio C Hamano wrote:
>> Maybe introduce `core.commentString` and make it a synonym for
>> `core.commentChar`?
>
> Yes, if we were to do so.  As I already said, this is not my itch,
> but such a synonym would be part of the migration plan if somebody
> seriously designs this as a new feature.

Maybe someone will discover an itch:

https://github.com/gitgitgadget/git/issues/1685

-- 
Kristoffer Haugsbakk


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05 17:37           ` Kristoffer Haugsbakk
@ 2024-03-05 21:19             ` Dragan Simic
  0 siblings, 0 replies; 82+ messages in thread
From: Dragan Simic @ 2024-03-05 21:19 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: Manlio Perillo, git, Junio C Hamano

On 2024-03-05 18:37, Kristoffer Haugsbakk wrote:
> On Tue, Mar 5, 2024, at 18:20, Dragan Simic wrote:
>> On 2024-03-05 17:58, Kristoffer Haugsbakk wrote:
>>> Personally I think it’s okay. `%` for example is a good candidate since
>>> you seldom use that as a leading character in prose (after a
>>> whitespace), and it seems that `%` is often recommended as an
>>> alternative.
>> 
>> Isn't '%' actually an ASCII character?
> 
> I wasn’t clear: personally I think the status quo of only allowing ASCII
> characters seems fine given that you can use something like `%` as an
> alternative comment char.

Ah, I see.  Thanks for the clarification.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: Clarify the meaning of "character" in the documentation
  2024-03-05  9:00 ` Kristoffer Haugsbakk
  2024-03-05 15:32   ` Junio C Hamano
@ 2024-03-05 22:48   ` brian m. carlson
  1 sibling, 0 replies; 82+ messages in thread
From: brian m. carlson @ 2024-03-05 22:48 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: Manlio Perillo, git

On 2024-03-05 at 09:00:06, Kristoffer Haugsbakk wrote:
> 
> On Tue, Mar 5, 2024, at 09:43, Manlio Perillo wrote:
> > I sent this email after reading the documentation of "git diff
> > --color-moved=blocks", where the text says:
> >> Blocks of moved text of at least 20 alphanumeric characters are detected greedily.
> >
> > In this case it is not clear if the number of characters are counted
> > as UTF-8 or normal 8bit bytes.
> 
> Alphanumeric characters (a-z and A-Z and 0-9) are ASCII. And one ASCII
> char is represented using one byte in UTF-8. This already looks precise
> to me.

I don't believe that's an appropriate definition. é is an alphanumeric
character, as is ç.  ½ is numeric.  I would argue an alphanumeric
character comprises at least Unicode classes Ll, Lm, Lo, Lt, Lu, and Nd.
Unicode TR#18 agrees with my assessment.

If we want to restrict it to ASCII, we need to state that explicitly.
Alternately, if the constraint is 20 UTF-8 octets or something else, we
should state that instead.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [messy PATCH] multi-byte core.commentChar
  2024-03-05 16:38       ` Junio C Hamano
  2024-03-05 17:28         ` Dragan Simic
@ 2024-03-06  8:08         ` Jeff King
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
  1 sibling, 1 reply; 82+ messages in thread
From: Jeff King @ 2024-03-06  8:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

On Tue, Mar 05, 2024 at 08:38:17AM -0800, Junio C Hamano wrote:

> It is not my personal itch, so I haven't done anything to make the
> commentChar take more than one byte.  But if it is somebody else's
> itch, I do not see a reason why we should forbid them from
> scratching.  If the setting seeps through across repository
> boundaries, that may create a compatibility issue and that by itself
> might be such a reason.  If it greatly makes the code more complex,
> that may be another reason you can use to argue against adding such
> a "feature".  If it makes the semantics of what "a comment string"
> is and how they are added and stripped at various stages of
> processing commit log messages fuzzy and harder to document and
> understand, that might be another reason.  I however do not think
> any of these to be true.  Maybe I am overly optimistic.  I haven't
> looked deeply into the code around commentChar for quite some time.

Here's a messy version of what it would look like, in case anybody is
interested. It passes the tests (using the string "#" for the most
part), but there may be corner cases lurking. The %c/%s conversions are
noisy but obvious. The trickier parts are matching, which goes from
single-character to a string match. I used starts_with() but had to
introduce a "_mem" variant for buffers that aren't NUL-terminated.
There's a bit of mild refactoring/cleanup to avoid awkwardness in a few
spots, as well.

I also did a few manual tests with "foo>" and "•" as comment chars,
which seemed to work.

I can't imagine using this myself (I don't even set core.commentChar at
all), so it was mostly that I nerd-sniped myself by thinking "how hard
could it be?". Not too bad, but not trivial. But maybe it spurs somebody
interested in working on it. I am on the fence whether supporting UTF-8
like the bullet-point above is maybe something we should just do on
principle.

For a more readable series, I'd guess it would make sense to introduce
comment_line_str as a separate variable (but continue to enforce the
single-char rule), convert the easy cases en masse, the tricky cases one
by one, and then finally drop comment_line_char entirely. At which point
the config rules can be lifted to allow multi-byte strings.
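
For readers who want the shape of the "_mem" matching idea without reading the whole diff, here is a minimal standalone sketch. The name `starts_with_mem_sketch` and its signature are assumptions for illustration and are not claimed to match the helper in the patch below.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: return 1 if the buffer `buf` of `len` bytes
 * (which need not be NUL-terminated) begins with the NUL-terminated
 * string `prefix`, else 0.  The length check comes first so memcmp()
 * never reads past the end of the buffer.
 */
int starts_with_mem_sketch(const char *buf, size_t len, const char *prefix)
{
	size_t plen = strlen(prefix);

	return plen <= len && memcmp(buf, prefix, plen) == 0;
}
```

The single-character test `buf[i] != comment_line_char` then generalizes to a prefix match that works the same way for "#", "foo>" or a multi-byte "•".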

---
 add-patch.c       |  4 ++--
 builtin/branch.c  |  2 +-
 builtin/commit.c  | 19 ++++++++++------
 builtin/merge.c   |  2 +-
 builtin/tag.c     |  4 ++--
 commit.c          |  3 ++-
 config.c          |  7 +++---
 environment.c     |  2 +-
 environment.h     |  2 +-
 fmt-merge-msg.c   |  2 +-
 sequencer.c       | 28 ++++++++++++-----------
 strbuf.c          | 43 +++++++++++++++++++----------------
 strbuf.h          |  7 +++---
 t/t7508-status.sh |  4 +++-
 trailer.c         |  6 ++---
 wt-status.c       | 23 +++++++------------
 16 files changed, 82 insertions(+), 76 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 68f525b35c..4b4db0f253 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -1114,7 +1114,7 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 				"To remove '%c' lines, make them ' ' lines "
 				"(context).\n"
 				"To remove '%c' lines, delete them.\n"
-				"Lines starting with %c will be removed.\n"),
+				"Lines starting with %s will be removed.\n"),
 			      s->mode->is_reverse ? '+' : '-',
 			      s->mode->is_reverse ? '-' : '+',
 			      comment_line_char);
@@ -1139,7 +1139,7 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 	for (i = 0; i < s->buf.len; ) {
 		size_t next = find_next_line(&s->buf, i);
 
-		if (s->buf.buf[i] != comment_line_char)
+		if (!starts_with(s->buf.buf + i, comment_line_char))
 			strbuf_add(&s->plain, s->buf.buf + i, next - i);
 		i = next;
 	}
diff --git a/builtin/branch.c b/builtin/branch.c
index cfb63cce5f..0b6e1d1adb 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -670,7 +670,7 @@ static int edit_branch_description(const char *branch_name)
 	strbuf_commented_addf(&buf, comment_line_char,
 		    _("Please edit the description for the branch\n"
 		      "  %s\n"
-		      "Lines starting with '%c' will be stripped.\n"),
+		      "Lines starting with '%s' will be stripped.\n"),
 		    branch_name, comment_line_char);
 	write_file_buf(edit_description(), buf.buf, buf.len);
 	strbuf_reset(&buf);
diff --git a/builtin/commit.c b/builtin/commit.c
index 6d1fa71676..898c8aadc7 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -678,15 +678,20 @@ static int author_date_is_interesting(void)
 	return author_message || force_date;
 }
 
+/*
+ * This only supports single-byte comment chars, but that's OK;
+ * our candidate list is fixed.
+ */
 static void adjust_comment_line_char(const struct strbuf *sb)
 {
 	char candidates[] = "#;@!$%^&|:";
 	char *candidate;
 	const char *p;
 
-	comment_line_char = candidates[0];
-	if (!memchr(sb->buf, comment_line_char, sb->len))
+	if (!memchr(sb->buf, candidates[0], sb->len)) {
+		comment_line_char = xstrfmt("%c", candidates[0]);
 		return;
+	}
 
 	p = sb->buf;
 	candidate = strchr(candidates, *p);
@@ -705,7 +710,7 @@ static void adjust_comment_line_char(const struct strbuf *sb)
 	if (!*p)
 		die(_("unable to select a comment character that is not used\n"
 		      "in the current commit message"));
-	comment_line_char = *p;
+	comment_line_char = xstrfmt("%c", *p);
 }
 
 static void prepare_amend_commit(struct commit *commit, struct strbuf *sb,
@@ -909,18 +914,18 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		struct ident_split ci, ai;
 		const char *hint_cleanup_all = allow_empty_message ?
 			_("Please enter the commit message for your changes."
-			  " Lines starting\nwith '%c' will be ignored.\n") :
+			  " Lines starting\nwith '%s' will be ignored.\n") :
 			_("Please enter the commit message for your changes."
-			  " Lines starting\nwith '%c' will be ignored, and an empty"
+			  " Lines starting\nwith '%s' will be ignored, and an empty"
 			  " message aborts the commit.\n");
 		const char *hint_cleanup_space = allow_empty_message ?
 			_("Please enter the commit message for your changes."
 			  " Lines starting\n"
-			  "with '%c' will be kept; you may remove them"
+			  "with '%s' will be kept; you may remove them"
 			  " yourself if you want to.\n") :
 			_("Please enter the commit message for your changes."
 			  " Lines starting\n"
-			  "with '%c' will be kept; you may remove them"
+			  "with '%s' will be kept; you may remove them"
 			  " yourself if you want to.\n"
 			  "An empty message aborts the commit.\n");
 		if (whence != FROM_COMMIT) {
diff --git a/builtin/merge.c b/builtin/merge.c
index 935c8a57dd..81b1cf5b90 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -821,7 +821,7 @@ static const char scissors_editor_comment[] =
 N_("An empty message aborts the commit.\n");
 
 static const char no_scissors_editor_comment[] =
-N_("Lines starting with '%c' will be ignored, and an empty message aborts\n"
+N_("Lines starting with '%s' will be ignored, and an empty message aborts\n"
    "the commit.\n");
 
 static void write_merge_heads(struct commit_list *);
diff --git a/builtin/tag.c b/builtin/tag.c
index 19a7e06bf4..8b17705cf6 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -158,11 +158,11 @@ static int do_sign(struct strbuf *buffer)
 
 static const char tag_template[] =
 	N_("\nWrite a message for tag:\n  %s\n"
-	"Lines starting with '%c' will be ignored.\n");
+	"Lines starting with '%s' will be ignored.\n");
 
 static const char tag_template_nocleanup[] =
 	N_("\nWrite a message for tag:\n  %s\n"
-	"Lines starting with '%c' will be kept; you may remove them"
+	"Lines starting with '%s' will be kept; you may remove them"
 	" yourself if you want to.\n");
 
 static int git_tag_config(const char *var, const char *value,
diff --git a/commit.c b/commit.c
index ef679a0b93..ff9d49a141 100644
--- a/commit.c
+++ b/commit.c
@@ -1796,7 +1796,8 @@ size_t ignored_log_message_bytes(const char *buf, size_t len)
 		else
 			next_line++;
 
-		if (buf[bol] == comment_line_char || buf[bol] == '\n') {
+		if (starts_with_mem(buf + bol, cutoff - bol, comment_line_char) ||
+		    buf[bol] == '\n') {
 			/* is this the first of the run of comments? */
 			if (!boc)
 				boc = bol;
diff --git a/config.c b/config.c
index 3cfeb3d8bd..9280dc9844 100644
--- a/config.c
+++ b/config.c
@@ -1565,11 +1565,10 @@ static int git_default_core_config(const char *var, const char *value,
 			return config_error_nonbool(var);
 		else if (!strcasecmp(value, "auto"))
 			auto_comment_line_char = 1;
-		else if (value[0] && !value[1]) {
-			comment_line_char = value[0];
+		else {
+			comment_line_char = xstrdup(value);
 			auto_comment_line_char = 0;
-		} else
-			return error(_("core.commentChar should only be one ASCII character"));
+		}
 		return 0;
 	}
 
diff --git a/environment.c b/environment.c
index 90632a39bc..4435866d4e 100644
--- a/environment.c
+++ b/environment.c
@@ -110,7 +110,7 @@ int protect_ntfs = PROTECT_NTFS_DEFAULT;
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
-char comment_line_char = '#';
+char *comment_line_char = "#";
 int auto_comment_line_char;
 
 /* Parallel index stat data preload? */
diff --git a/environment.h b/environment.h
index e5351c9dd9..821c6079af 100644
--- a/environment.h
+++ b/environment.h
@@ -8,7 +8,7 @@ struct strvec;
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
-extern char comment_line_char;
+extern char *comment_line_char;
 extern int auto_comment_line_char;
 
 /*
diff --git a/fmt-merge-msg.c b/fmt-merge-msg.c
index 66e47449a0..daf57917fa 100644
--- a/fmt-merge-msg.c
+++ b/fmt-merge-msg.c
@@ -321,7 +321,7 @@ static void credit_people(struct strbuf *out,
 	     skip_prefix(me, them->items->string, &me) &&
 	     starts_with(me, " <")))
 		return;
-	strbuf_addf(out, "\n%c %s ", comment_line_char, label);
+	strbuf_addf(out, "\n%s %s ", comment_line_char, label);
 	add_people_count(out, them);
 }
 
diff --git a/sequencer.c b/sequencer.c
index f49a871ac0..2370abc379 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -663,7 +663,7 @@ void append_conflicts_hint(struct index_state *istate,
 	if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) {
 		strbuf_addch(msgbuf, '\n');
 		wt_status_append_cut_line(msgbuf);
-		strbuf_addch(msgbuf, comment_line_char);
+		strbuf_addstr(msgbuf, comment_line_char);
 	}
 
 	strbuf_addch(msgbuf, '\n');
@@ -1779,14 +1779,16 @@ static const char *command_to_string(const enum todo_command command)
 {
 	if (command < TODO_COMMENT)
 		return todo_command_info[command].str;
+	if (command == TODO_COMMENT)
+		return comment_line_char;
 	die(_("unknown command: %d"), command);
 }
 
 static char command_to_char(const enum todo_command command)
 {
 	if (command < TODO_COMMENT)
 		return todo_command_info[command].c;
-	return comment_line_char;
+	return 0;
 }
 
 static int is_noop(const enum todo_command command)
@@ -1840,7 +1842,7 @@ static int is_fixup_flag(enum todo_command command, unsigned flag)
 static void add_commented_lines(struct strbuf *buf, const void *str, size_t len)
 {
 	const char *s = str;
-	while (len > 0 && s[0] == comment_line_char) {
+	while (len > 0 && starts_with_mem(s, len, comment_line_char)) {
 		size_t count;
 		const char *n = memchr(s, '\n', len);
 		if (!n)
@@ -1946,7 +1948,7 @@ static int append_squash_message(struct strbuf *buf, const char *body,
 	     (starts_with(body, "squash!") || starts_with(body, "fixup!"))))
 		commented_len = commit_subject_length(body);
 
-	strbuf_addf(buf, "\n%c ", comment_line_char);
+	strbuf_addf(buf, "\n%s ", comment_line_char);
 	strbuf_addf(buf, _(nth_commit_msg_fmt),
 		    ++opts->current_fixup_count + 1);
 	strbuf_addstr(buf, "\n\n");
@@ -2003,10 +2005,10 @@ static int update_squash_messages(struct repository *r,
 			return error(_("could not read '%s'"),
 				rebase_path_squash_msg());
 
-		eol = buf.buf[0] != comment_line_char ?
+		eol = !starts_with(buf.buf, comment_line_char) ?
 			buf.buf : strchrnul(buf.buf, '\n');
 
-		strbuf_addf(&header, "%c ", comment_line_char);
+		strbuf_addf(&header, "%s ", comment_line_char);
 		strbuf_addf(&header, _(combined_commit_msg_fmt),
 			    opts->current_fixup_count + 2);
 		strbuf_splice(&buf, 0, eol - buf.buf, header.buf, header.len);
@@ -2032,9 +2034,9 @@ static int update_squash_messages(struct repository *r,
 			repo_unuse_commit_buffer(r, head_commit, head_message);
 			return error(_("cannot write '%s'"), rebase_path_fixup_msg());
 		}
-		strbuf_addf(&buf, "%c ", comment_line_char);
+		strbuf_addf(&buf, "%s ", comment_line_char);
 		strbuf_addf(&buf, _(combined_commit_msg_fmt), 2);
-		strbuf_addf(&buf, "\n%c ", comment_line_char);
+		strbuf_addf(&buf, "\n%s ", comment_line_char);
 		strbuf_addstr(&buf, is_fixup_flag(command, flag) ?
 			      _(skip_first_commit_msg_str) :
 			      _(first_commit_msg_str));
@@ -2056,7 +2058,7 @@ static int update_squash_messages(struct repository *r,
 	if (command == TODO_SQUASH || is_fixup_flag(command, flag)) {
 		res = append_squash_message(&buf, body, command, opts, flag);
 	} else if (command == TODO_FIXUP) {
-		strbuf_addf(&buf, "\n%c ", comment_line_char);
+		strbuf_addf(&buf, "\n%s ", comment_line_char);
 		strbuf_addf(&buf, _(skip_nth_commit_msg_fmt),
 			    ++opts->current_fixup_count + 1);
 		strbuf_addstr(&buf, "\n\n");
@@ -2562,7 +2564,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
 	/* left-trim */
 	bol += strspn(bol, " \t");
 
-	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
+	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_char)) {
 		item->command = TODO_COMMENT;
 		item->commit = NULL;
 		item->arg_offset = bol - buf;
@@ -5659,7 +5661,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 				    oid_to_hex(&commit->object.oid),
 				    oneline.buf);
 			if (is_empty)
-				strbuf_addf(&buf, " %c empty",
+				strbuf_addf(&buf, " %s empty",
 					    comment_line_char);
 
 			FLEX_ALLOC_STR(entry, string, buf.buf);
@@ -5750,7 +5752,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 		entry = oidmap_get(&state.commit2label, &commit->object.oid);
 
 		if (entry)
-			strbuf_addf(out, "\n%c Branch %s\n", comment_line_char, entry->string);
+			strbuf_addf(out, "\n%s Branch %s\n", comment_line_char, entry->string);
 		else
 			strbuf_addch(out, '\n');
 
@@ -5887,7 +5889,7 @@ int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
 			    oid_to_hex(&commit->object.oid));
 		pretty_print_commit(&pp, commit, out);
 		if (is_empty)
-			strbuf_addf(out, " %c empty", comment_line_char);
+			strbuf_addf(out, " %s empty", comment_line_char);
 		strbuf_addch(out, '\n');
 	}
 	if (skipped_commit)
diff --git a/strbuf.c b/strbuf.c
index 7827178d8e..5d2a32d8f0 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -24,6 +24,17 @@ int istarts_with(const char *str, const char *prefix)
 			return 0;
 }
 
+int starts_with_mem(const char *str, size_t len, const char *prefix)
+{
+	const char *end = str + len;
+	for (; ; str++, prefix++) {
+		if (!*prefix)
+			return 1;
+		else if (str == end || *str != *prefix)
+			return 0;
+	}
+}
+
 int skip_to_optional_arg_default(const char *str, const char *prefix,
 				 const char **arg, const char *def)
 {
@@ -340,18 +351,17 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
 }
 
 static void add_lines(struct strbuf *out,
-			const char *prefix1,
-			const char *prefix2,
-			const char *buf, size_t size)
+			const char *prefix,
+			const char *buf, size_t size,
+			int space_after_prefix)
 {
 	while (size) {
-		const char *prefix;
 		const char *next = memchr(buf, '\n', size);
 		next = next ? (next + 1) : (buf + size);
 
-		prefix = ((prefix2 && (buf[0] == '\n' || buf[0] == '\t'))
-			  ? prefix2 : prefix1);
 		strbuf_addstr(out, prefix);
+		if (space_after_prefix && buf[0] != '\n' && buf[0] != '\t')
+			strbuf_addch(out, ' ');
 		strbuf_add(out, buf, next - buf);
 		size -= next - buf;
 		buf = next;
@@ -360,19 +370,12 @@ static void add_lines(struct strbuf *out,
 }
 
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
-				size_t size, char comment_line_char)
+				size_t size, const char *comment_line_char)
 {
-	static char prefix1[3];
-	static char prefix2[2];
-
-	if (prefix1[0] != comment_line_char) {
-		xsnprintf(prefix1, sizeof(prefix1), "%c ", comment_line_char);
-		xsnprintf(prefix2, sizeof(prefix2), "%c", comment_line_char);
-	}
-	add_lines(out, prefix1, prefix2, buf, size);
+	add_lines(out, comment_line_char, buf, size, 1);
 }
 
-void strbuf_commented_addf(struct strbuf *sb, char comment_line_char,
+void strbuf_commented_addf(struct strbuf *sb, const char *comment_line_char,
 			   const char *fmt, ...)
 {
 	va_list params;
@@ -750,7 +753,7 @@ ssize_t strbuf_read_file(struct strbuf *sb, const char *path, size_t hint)
 void strbuf_add_lines(struct strbuf *out, const char *prefix,
 		      const char *buf, size_t size)
 {
-	add_lines(out, prefix, NULL, buf, size);
+	add_lines(out, prefix, buf, size, 0);
 }
 
 void strbuf_addstr_xml_quoted(struct strbuf *buf, const char *s)
@@ -1005,10 +1008,10 @@ static size_t cleanup(char *line, size_t len)
  *
  * If last line does not have a newline at the end, one is added.
  *
- * Pass a non-NUL comment_line_char to skip every line starting
+ * Pass a non-NULL comment_line_char to skip every line starting
  * with it.
  */
-void strbuf_stripspace(struct strbuf *sb, char comment_line_char)
+void strbuf_stripspace(struct strbuf *sb, const char *comment_line_char)
 {
 	size_t empties = 0;
 	size_t i, j, len, newlen;
@@ -1022,7 +1025,7 @@ void strbuf_stripspace(struct strbuf *sb, char comment_line_char)
 		len = eol ? eol - (sb->buf + i) + 1 : sb->len - i;
 
 		if (comment_line_char && len &&
-		    sb->buf[i] == comment_line_char) {
+		    starts_with(sb->buf + i, comment_line_char)) {
 			newlen = 0;
 			continue;
 		}
diff --git a/strbuf.h b/strbuf.h
index e959caca87..b310a55095 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -288,7 +288,7 @@ void strbuf_splice(struct strbuf *sb, size_t pos, size_t len,
  */
 void strbuf_add_commented_lines(struct strbuf *out,
 				const char *buf, size_t size,
-				char comment_line_char);
+				const char *comment_line_char);
 
 
 /**
@@ -379,7 +379,7 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...);
  * blank to the buffer.
  */
 __attribute__((format (printf, 3, 4)))
-void strbuf_commented_addf(struct strbuf *sb, char comment_line_char, const char *fmt, ...);
+void strbuf_commented_addf(struct strbuf *sb, const char *comment_line_char, const char *fmt, ...);
 
 __attribute__((format (printf,2,0)))
 void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap);
@@ -517,7 +517,7 @@ int strbuf_normalize_path(struct strbuf *sb);
  * then lines beginning with that character are considered comments,
  * thus removed.
  */
-void strbuf_stripspace(struct strbuf *buf, char comment_line_char);
+void strbuf_stripspace(struct strbuf *buf, const char *comment_line_char);
 
 static inline int strbuf_strip_suffix(struct strbuf *sb, const char *suffix)
 {
@@ -673,6 +673,7 @@ char *xstrfmt(const char *fmt, ...);
 
 int starts_with(const char *str, const char *prefix);
 int istarts_with(const char *str, const char *prefix);
+int starts_with_mem(const char *str, size_t len, const char *prefix);
 
 /*
  * If the string "str" is the same as the string in "prefix", then the "arg"
diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index a3c18a4fc2..10ed8b32bc 100755
--- a/t/t7508-status.sh
+++ b/t/t7508-status.sh
@@ -1403,7 +1403,9 @@ test_expect_success "status (core.commentchar with submodule summary)" '
 
 test_expect_success "status (core.commentchar with two chars with submodule summary)" '
 	test_config core.commentchar ";;" &&
-	test_must_fail git -c status.displayCommentPrefix=true status
+	sed "s/^/;/" <expect >expect.double &&
+	git -c status.displayCommentPrefix=true status >output &&
+	test_cmp expect.double output
 '
 
 test_expect_success "--ignore-submodules=all suppresses submodule summary" '
diff --git a/trailer.c b/trailer.c
index ef9df4af55..b3edcfe695 100644
--- a/trailer.c
+++ b/trailer.c
@@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
 
 	/* The first paragraph is the title and cannot be trailers */
 	for (s = buf; s < buf + len; s = next_line(s)) {
-		if (s[0] == comment_line_char)
+		if (starts_with_mem(s, buf + len - s, comment_line_char))
 			continue;
 		if (is_blank_line(s))
 			break;
@@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
 		const char **p;
 		ssize_t separator_pos;
 
-		if (bol[0] == comment_line_char) {
+		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_char)) {
 			non_trailer_lines += possible_continuation_lines;
 			possible_continuation_lines = 0;
 			continue;
@@ -1013,7 +1013,7 @@ static void parse_trailers(struct trailer_info *info,
 	for (i = 0; i < info->trailer_nr; i++) {
 		int separator_pos;
 		char *trailer = info->trailers[i];
-		if (trailer[0] == comment_line_char)
+		if (starts_with(trailer, comment_line_char))
 			continue;
 		separator_pos = find_separator(trailer, separators);
 		if (separator_pos >= 1) {
diff --git a/wt-status.c b/wt-status.c
index b5a29083df..dfe120a559 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -70,7 +70,7 @@ static void status_vprintf(struct wt_status *s, int at_bol, const char *color,
 	strbuf_vaddf(&sb, fmt, ap);
 	if (!sb.len) {
 		if (s->display_comment_prefix) {
-			strbuf_addch(&sb, comment_line_char);
+			strbuf_addstr(&sb, comment_line_char);
 			if (!trail)
 				strbuf_addch(&sb, ' ');
 		}
@@ -85,7 +85,7 @@ static void status_vprintf(struct wt_status *s, int at_bol, const char *color,
 
 		strbuf_reset(&linebuf);
 		if (at_bol && s->display_comment_prefix) {
-			strbuf_addch(&linebuf, comment_line_char);
+			strbuf_addstr(&linebuf, comment_line_char);
 			if (*line != '\n' && *line != '\t')
 				strbuf_addch(&linebuf, ' ');
 		}
@@ -1090,7 +1090,7 @@ size_t wt_status_locate_end(const char *s, size_t len)
 	const char *p;
 	struct strbuf pattern = STRBUF_INIT;
 
-	strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line);
+	strbuf_addf(&pattern, "\n%s %s", comment_line_char, cut_line);
 	if (starts_with(s, pattern.buf + 1))
 		len = 0;
 	else if ((p = strstr(s, pattern.buf)))
@@ -1176,8 +1176,6 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 	struct strbuf sb = STRBUF_INIT;
 	const char *cp, *ep, *branch_name;
 	struct branch *branch;
-	char comment_line_string[3];
-	int i;
 	uint64_t t_begin = 0;
 
 	assert(s->branch && !s->is_initial);
@@ -1202,19 +1200,14 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 		}
 	}
 
-	i = 0;
-	if (s->display_comment_prefix) {
-		comment_line_string[i++] = comment_line_char;
-		comment_line_string[i++] = ' ';
-	}
-	comment_line_string[i] = '\0';
-
 	for (cp = sb.buf; (ep = strchr(cp, '\n')) != NULL; cp = ep + 1)
 		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s),
-				 "%s%.*s", comment_line_string,
+				 "%s%s%.*s",
+				 s->display_comment_prefix ? comment_line_char : "",
+				 s->display_comment_prefix ? " " : "",
 				 (int)(ep - cp), cp);
 	if (s->display_comment_prefix)
-		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s), "%c",
+		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s), "%s",
 				 comment_line_char);
 	else
 		fputs("\n", s->fp);
@@ -1382,7 +1375,7 @@ static int read_rebase_todolist(const char *fname, struct string_list *lines)
 			  git_path("%s", fname));
 	}
 	while (!strbuf_getline_lf(&line, f)) {
-		if (line.len && line.buf[0] == comment_line_char)
+		if (starts_with(line.buf, comment_line_char))
 			continue;
 		strbuf_trim(&line);
 		if (!line.len)

* [PATCH 0/15] allow multi-byte core.commentChar
  2024-03-06  8:08         ` [messy PATCH] multi-byte core.commentChar Jeff King
@ 2024-03-07  9:14           ` Jeff King
  2024-03-07  9:15             ` [PATCH 01/15] strbuf: simplify comment-handling in add_lines() helper Jeff King
                               ` (16 more replies)
  0 siblings, 17 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:14 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

On Wed, Mar 06, 2024 at 03:08:04AM -0500, Jeff King wrote:

> For a more readable series, I'd guess it would make sense to introduce
> comment_line_str as a separate variable (but continue to enforce the
> single-char rule), convert the easy cases en masse, the tricky cases one
> by one, and then finally drop comment_line_char entirely. At which point
> the config rules can be lifted to allow multi-byte strings.

I ended up cleaning this up. Like I said, this isn't something I'm
personally that interested in. But it just seemed like a wart that this
one spot could not handle multi-byte characters that all the cool kids
are using in their prompts, etc., these days.

Plus it was kind of an interesting puzzle for how to lay out the
refactoring to make each step self-consistent. At the very least, I
think the first couple of cleanups are worth it even if we do not see
the whole thing through. ;)

It obviously nullifies kh/doc-commentchar-is-a-byte, which is in 'next'.
Sadly "git merge" does not find a conflict with the documentation update
in patch 15, so we'll have to remember to pick up one topic or the
other.

I'm using U+00BB as my commentChar for now to see if any bugs show up,
but I expect I'll get sick of it after a few days.

  [01/15]: strbuf: simplify comment-handling in add_lines() helper
  [02/15]: strbuf: avoid static variables in strbuf_add_commented_lines()
  [03/15]: commit: refactor base-case of adjust_comment_line_char()
  [04/15]: strbuf: avoid shadowing global comment_line_char name

    These four are cleanups that could be taken independently.

  [05/15]: environment: store comment_line_char as a string

    This one preps us for incrementally moving code over to the new
    system.

  [06/15]: strbuf: accept a comment string for strbuf_stripspace()
  [07/15]: strbuf: accept a comment string for strbuf_commented_addf()
  [08/15]: strbuf: accept a comment string for strbuf_add_commented_lines()
  [09/15]: prefer comment_line_str to comment_line_char for printing
  [10/15]: find multi-byte comment chars in NUL-terminated strings
  [11/15]: find multi-byte comment chars in unterminated buffers
  [12/15]: sequencer: handle multi-byte comment characters when writing todo list
  [13/15]: wt-status: drop custom comment-char stringification

    These ones are the actual transition.

  [14/15]: environment: drop comment_line_char compatibility macro
  [15/15]: config: allow multi-byte core.commentChar

    And then we tie it off by dropping the now-unused bits and loosening
    the config logic.
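
As a rough illustration of why the transition patches replace single-byte
comparisons with prefix matching, here is a minimal sketch. The
starts_with_mem() loop is the one from the messy patch earlier in the
thread; demo_multibyte_comment() and its sample buffers are hypothetical
and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same loop as the starts_with_mem() helper from the messy patch. */
int starts_with_mem(const char *str, size_t len, const char *prefix)
{
	const char *end = str + len;
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;
		else if (str == end || *str != *prefix)
			return 0;
	}
}

/* Hypothetical demo; U+00BB is the two bytes 0xC2 0xBB in UTF-8. */
void demo_multibyte_comment(void)
{
	const char *comment = "\xc2\xbb";
	const char todo[] = "\xc2\xbb a comment\npick 1234abc subject\n";
	const char *second = strchr(todo, '\n') + 1;
	/* U+00A9 is 0xC2 0xA9, so it shares its first byte with U+00BB. */
	const char copyright[] = "\xc2\xa9 2024";

	assert(starts_with_mem(todo, strlen(todo), comment));
	assert(!starts_with_mem(second, strlen(second), comment));

	/* A buffer that ends in the middle of the prefix must not match. */
	assert(!starts_with_mem(todo, 1, comment));

	/* Comparing only the first byte would misread this line as a comment. */
	assert(copyright[0] == comment[0]);
	assert(!starts_with_mem(copyright, strlen(copyright), comment));
}
```

Calling demo_multibyte_comment() from a main() runs the checks. The
explicit length argument is what makes the helper usable on buffers that
are not NUL-terminated at the end of the line being examined.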

 Documentation/config/core.txt |  4 ++-
 add-patch.c                   | 14 +++++-----
 builtin/branch.c              |  8 +++---
 builtin/commit.c              | 19 +++++++-------
 builtin/merge.c               | 12 ++++-----
 builtin/notes.c               | 10 ++++----
 builtin/rebase.c              |  2 +-
 builtin/stripspace.c          |  4 +--
 builtin/tag.c                 | 14 +++++-----
 commit.c                      |  3 ++-
 config.c                      |  6 ++---
 environment.c                 |  2 +-
 environment.h                 |  2 +-
 fmt-merge-msg.c               |  8 +++---
 rebase-interactive.c          | 10 ++++----
 sequencer.c                   | 48 ++++++++++++++++++-----------------
 strbuf.c                      | 47 ++++++++++++++++++----------------
 strbuf.h                      |  9 ++++---
 t/t0030-stripspace.sh         |  5 ++++
 t/t7507-commit-verbose.sh     | 10 ++++++++
 t/t7508-status.sh             |  4 ++-
 trailer.c                     |  6 ++---
 wt-status.c                   | 31 +++++++++-------------
 23 files changed, 149 insertions(+), 129 deletions(-)

* [PATCH 01/15] strbuf: simplify comment-handling in add_lines() helper
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
@ 2024-03-07  9:15             ` Jeff King
  2024-03-07  9:16             ` [PATCH 02/15] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
                               ` (15 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:15 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

In strbuf_add_commented_lines(), we prepare two strings with potential
prefixes: one with just the comment char, and one with an additional
space. In the add_lines() helper, we use the one without the extra space
for blank lines or lines starting with a tab.

While passing in two separate prefixes to the helper is very flexible,
it's more flexibility than we actually use (or are likely to use, since
the rules inside add_lines() only make sense if "prefix2" is a variant
of "prefix1" without the extra space). And setting up the two strings
makes refactoring in strbuf_add_commented_lines() awkward.

Instead, let's pass in a single string, and just let add_lines() add the
extra space to the result as appropriate.

We do still need to pass in a flag to trigger this behavior. The helper
is shared by strbuf_add_lines(), which until now passed in a NULL
"prefix2" to inhibit this extra handling, and which now passes a zero
flag instead.
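
The single-prefix behavior described above can be sketched outside of git
as follows. This is only an approximation: struct outbuf and outbuf_add()
are hypothetical stand-ins for git's strbuf, and the real helper also
completes a trailing partial line, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strbuf: a fixed-size, NUL-terminated buffer. */
struct outbuf {
	char data[256];
	size_t len;
};

static void outbuf_add(struct outbuf *out, const char *s, size_t n)
{
	assert(out->len + n < sizeof(out->data));
	memcpy(out->data + out->len, s, n);
	out->len += n;
	out->data[out->len] = '\0';
}

/* Prefix every line; optionally add a space unless the line is blank or tab-led. */
static void add_lines(struct outbuf *out, const char *prefix,
		      const char *buf, size_t size, int space_after_prefix)
{
	while (size) {
		const char *next = memchr(buf, '\n', size);
		next = next ? (next + 1) : (buf + size);

		outbuf_add(out, prefix, strlen(prefix));
		if (space_after_prefix && buf[0] != '\n' && buf[0] != '\t')
			outbuf_add(out, " ", 1);
		outbuf_add(out, buf, next - buf);
		size -= next - buf;
		buf = next;
	}
}
```

With the flag set, a blank line gets the bare prefix and a tab-indented
line keeps its tab directly after the prefix; with the flag clear, the
prefix is copied verbatim in front of every line.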

Signed-off-by: Jeff King <peff@peff.net>
---
 strbuf.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 7827178d8e..689d8acd5e 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -340,18 +340,17 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
 }
 
 static void add_lines(struct strbuf *out,
-			const char *prefix1,
-			const char *prefix2,
-			const char *buf, size_t size)
+			const char *prefix,
+			const char *buf, size_t size,
+			int space_after_prefix)
 {
 	while (size) {
-		const char *prefix;
 		const char *next = memchr(buf, '\n', size);
 		next = next ? (next + 1) : (buf + size);
 
-		prefix = ((prefix2 && (buf[0] == '\n' || buf[0] == '\t'))
-			  ? prefix2 : prefix1);
 		strbuf_addstr(out, prefix);
+		if (space_after_prefix && buf[0] != '\n' && buf[0] != '\t')
+			strbuf_addch(out, ' ');
 		strbuf_add(out, buf, next - buf);
 		size -= next - buf;
 		buf = next;
@@ -362,14 +361,11 @@ static void add_lines(struct strbuf *out,
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
 				size_t size, char comment_line_char)
 {
-	static char prefix1[3];
-	static char prefix2[2];
+	static char prefix[2];
 
-	if (prefix1[0] != comment_line_char) {
-		xsnprintf(prefix1, sizeof(prefix1), "%c ", comment_line_char);
-		xsnprintf(prefix2, sizeof(prefix2), "%c", comment_line_char);
-	}
-	add_lines(out, prefix1, prefix2, buf, size);
+	if (prefix[0] != comment_line_char)
+		xsnprintf(prefix, sizeof(prefix), "%c", comment_line_char);
+	add_lines(out, prefix, buf, size, 1);
 }
 
 void strbuf_commented_addf(struct strbuf *sb, char comment_line_char,
@@ -750,7 +746,7 @@ ssize_t strbuf_read_file(struct strbuf *sb, const char *path, size_t hint)
 void strbuf_add_lines(struct strbuf *out, const char *prefix,
 		      const char *buf, size_t size)
 {
-	add_lines(out, prefix, NULL, buf, size);
+	add_lines(out, prefix, buf, size, 0);
 }
 
 void strbuf_addstr_xml_quoted(struct strbuf *buf, const char *s)
-- 
2.44.0.463.g71abcb3a9f


* [PATCH 02/15] strbuf: avoid static variables in strbuf_add_commented_lines()
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
  2024-03-07  9:15             ` [PATCH 01/15] strbuf: simplify comment-handling in add_lines() helper Jeff King
@ 2024-03-07  9:16             ` Jeff King
  2024-03-07  9:18             ` [PATCH 03/15] commit: refactor base-case of adjust_comment_line_char() Jeff King
                               ` (14 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:16 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

In strbuf_add_commented_lines(), we have to convert the single-byte
comment_line_char into a string to pass to add_lines(). We cache the
created string using a static-local variable. But this makes the
function non-reentrant, and it's doubtful that this provides any real
performance benefit given that we know the string always contains a
single character.

So let's just create it from scratch each time. To give the compiler
the best chance of making it fast, we'll ditch the over-complicated
xsnprintf() and just assign directly into the array.
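
To make the shared-state concern concrete, here is a small sketch that
contrasts the two approaches. Both functions are hypothetical and are not
part of the patch. cached_prefix() returns its static buffer only so that
the sharing becomes observable; the function in the patch keeps the buffer
internal, where the hazard is overlapping calls rather than a stale
pointer.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: caches the string in static storage shared by all callers. */
const char *cached_prefix(char c)
{
	static char prefix[2];

	if (prefix[0] != c) {
		prefix[0] = c;
		prefix[1] = '\0';
	}
	return prefix;
}

/* Hypothetical: writes into storage owned by the caller, so calls are independent. */
void make_prefix(char prefix[2], char c)
{
	prefix[0] = c;
	prefix[1] = '\0';
}
```

A second call to cached_prefix() with a different character silently
changes what the first caller's pointer refers to, while two arrays filled
by make_prefix() stay independent.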

Signed-off-by: Jeff King <peff@peff.net>
---
In the long run we'll end up just passing in the comment-string that the
caller gives us, so this patch could arguably be dropped until that
point.

 strbuf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 689d8acd5e..ca80a2c77e 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -361,10 +361,10 @@ static void add_lines(struct strbuf *out,
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
 				size_t size, char comment_line_char)
 {
-	static char prefix[2];
+	char prefix[2];
 
-	if (prefix[0] != comment_line_char)
-		xsnprintf(prefix, sizeof(prefix), "%c", comment_line_char);
+	prefix[0] = comment_line_char;
+	prefix[1] = '\0';
 	add_lines(out, prefix, buf, size, 1);
 }
 
-- 
2.44.0.463.g71abcb3a9f


* [PATCH 03/15] commit: refactor base-case of adjust_comment_line_char()
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
  2024-03-07  9:15             ` [PATCH 01/15] strbuf: simplify comment-handling in add_lines() helper Jeff King
  2024-03-07  9:16             ` [PATCH 02/15] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
@ 2024-03-07  9:18             ` Jeff King
  2024-03-07  9:19             ` [PATCH 04/15] strbuf: avoid shadowing global comment_line_char name Jeff King
                               ` (13 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:18 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

When core.commentChar is set to "auto", we check a set of candidate
characters against the proposed buffer to see which, if any, can be used
without ambiguity. But before we do that, we optimize for the common
case that the default "#" is fine by just seeing if it is present in the
buffer at all.

The way we do this is a bit subtle, though: we assign the candidate
character to comment_line_char preemptively, then check if it works, and
return if it does. The subtle part is that sometimes setting
comment_line_char is important (after we return, the important outcome
is the fact that we have set the variable) and sometimes it is useless
(if our optimization fails, we go on to do the more careful checks and
eventually assign something else instead).

To make it clearer what is happening (and to make further refactoring
of comment_line_char easier), let's check our candidate character
directly, and then assign as part of returning if it worked out.
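
The selection logic described above can be sketched as a standalone
function. This is an approximation under stated assumptions, not the code
in builtin/commit.c: pick_comment_char() is a hypothetical name, it
returns the choice instead of assigning a global, and it returns 0 where
the real function would die().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return a candidate that begins no line of buf, or 0 if every
 * candidate is already used at the start of some line.
 */
char pick_comment_char(const char *buf, size_t len)
{
	char candidates[] = "#;@!$%^&|:";
	char *candidate;
	size_t i;

	/* Fast path: the default is safe if it appears nowhere in the buffer. */
	if (!memchr(buf, candidates[0], len))
		return candidates[0];

	/* Slow path: strike out each candidate that begins a line. */
	for (i = 0; i < len; i++) {
		if (i && buf[i - 1] != '\n' && buf[i - 1] != '\r')
			continue;
		/* Skip NUL bytes: strchr() would match the list's terminator. */
		if (!buf[i])
			continue;
		candidate = strchr(candidates, buf[i]);
		if (candidate)
			*candidate = ' ';
	}

	/* The first slot that was not struck out wins. */
	for (candidate = candidates; *candidate == ' '; candidate++)
		;
	return *candidate;
}
```

Note that a "#" in the middle of a line defeats the fast path but still
leaves "#" as the result of the slow path, which is why the early check
is only an optimization and not the deciding rule.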

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 6d1fa71676..d496980421 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -684,9 +684,10 @@ static void adjust_comment_line_char(const struct strbuf *sb)
 	char *candidate;
 	const char *p;
 
-	comment_line_char = candidates[0];
-	if (!memchr(sb->buf, comment_line_char, sb->len))
+	if (!memchr(sb->buf, candidates[0], sb->len)) {
+		comment_line_char = candidates[0];
 		return;
+	}
 
 	p = sb->buf;
 	candidate = strchr(candidates, *p);
-- 
2.44.0.463.g71abcb3a9f


* [PATCH 04/15] strbuf: avoid shadowing global comment_line_char name
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (2 preceding siblings ...)
  2024-03-07  9:18             ` [PATCH 03/15] commit: refactor base-case of adjust_comment_line_char() Jeff King
@ 2024-03-07  9:19             ` Jeff King
  2024-03-07  9:20             ` [PATCH 05/15] environment: store comment_line_char as a string Jeff King
                               ` (12 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:19 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Several comment-related strbuf functions take a comment_line_char
parameter. There's also a global comment_line_char variable, which is
closely related (most callers pass it in as this parameter). Let's avoid
shadowing the global name. This makes it more obvious that we're not
using the global value, and it will be especially helpful as we refactor
the global in future patches (in particular, any macro trickery wouldn't
work because the preprocessor doesn't respect scope).

We'll use "comment_prefix". That should be descriptive enough, and as a
bonus is more neutral with respect to the "char" type (since we'll
eventually swap it out for a string).

Signed-off-by: Jeff King <peff@peff.net>
---
 strbuf.c | 16 ++++++++--------
 strbuf.h |  8 ++++----
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index ca80a2c77e..a33aed6c07 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -359,16 +359,16 @@ static void add_lines(struct strbuf *out,
 }
 
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
-				size_t size, char comment_line_char)
+				size_t size, char comment_prefix)
 {
 	char prefix[2];
 
-	prefix[0] = comment_line_char;
+	prefix[0] = comment_prefix;
 	prefix[1] = '\0';
 	add_lines(out, prefix, buf, size, 1);
 }
 
-void strbuf_commented_addf(struct strbuf *sb, char comment_line_char,
+void strbuf_commented_addf(struct strbuf *sb, char comment_prefix,
 			   const char *fmt, ...)
 {
 	va_list params;
@@ -379,7 +379,7 @@ void strbuf_commented_addf(struct strbuf *sb, char comment_line_char,
 	strbuf_vaddf(&buf, fmt, params);
 	va_end(params);
 
-	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_line_char);
+	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix);
 	if (incomplete_line)
 		sb->buf[--sb->len] = '\0';
 
@@ -1001,10 +1001,10 @@ static size_t cleanup(char *line, size_t len)
  *
  * If last line does not have a newline at the end, one is added.
  *
- * Pass a non-NUL comment_line_char to skip every line starting
+ * Pass a non-NUL comment_prefix to skip every line starting
  * with it.
  */
-void strbuf_stripspace(struct strbuf *sb, char comment_line_char)
+void strbuf_stripspace(struct strbuf *sb, char comment_prefix)
 {
 	size_t empties = 0;
 	size_t i, j, len, newlen;
@@ -1017,8 +1017,8 @@ void strbuf_stripspace(struct strbuf *sb, char comment_line_char)
 		eol = memchr(sb->buf + i, '\n', sb->len - i);
 		len = eol ? eol - (sb->buf + i) + 1 : sb->len - i;
 
-		if (comment_line_char && len &&
-		    sb->buf[i] == comment_line_char) {
+		if (comment_prefix && len &&
+		    sb->buf[i] == comment_prefix) {
 			newlen = 0;
 			continue;
 		}
diff --git a/strbuf.h b/strbuf.h
index e959caca87..860fcec5fb 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -288,7 +288,7 @@ void strbuf_splice(struct strbuf *sb, size_t pos, size_t len,
  */
 void strbuf_add_commented_lines(struct strbuf *out,
 				const char *buf, size_t size,
-				char comment_line_char);
+				char comment_prefix);
 
 
 /**
@@ -379,7 +379,7 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...);
  * blank to the buffer.
  */
 __attribute__((format (printf, 3, 4)))
-void strbuf_commented_addf(struct strbuf *sb, char comment_line_char, const char *fmt, ...);
+void strbuf_commented_addf(struct strbuf *sb, char comment_prefix, const char *fmt, ...);
 
 __attribute__((format (printf,2,0)))
 void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap);
@@ -513,11 +513,11 @@ int strbuf_getcwd(struct strbuf *sb);
 int strbuf_normalize_path(struct strbuf *sb);
 
 /**
- * Strip whitespace from a buffer. If comment_line_char is non-NUL,
+ * Strip whitespace from a buffer. If comment_prefix is non-NUL,
  * then lines beginning with that character are considered comments,
  * thus removed.
  */
-void strbuf_stripspace(struct strbuf *buf, char comment_line_char);
+void strbuf_stripspace(struct strbuf *buf, char comment_prefix);
 
 static inline int strbuf_strip_suffix(struct strbuf *sb, const char *suffix)
 {
-- 
2.44.0.463.g71abcb3a9f


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 05/15] environment: store comment_line_char as a string
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (3 preceding siblings ...)
  2024-03-07  9:19             ` [PATCH 04/15] strbuf: avoid shadowing global comment_line_char name Jeff King
@ 2024-03-07  9:20             ` Jeff King
  2024-03-07  9:21             ` [PATCH 06/15] strbuf: accept a comment string for strbuf_stripspace() Jeff King
                               ` (11 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:20 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

We'd like to eventually support multi-byte comment prefixes, but the
comment_line_char variable is referenced in many spots, making the
transition difficult.

Let's start by storing the character in a NUL-terminated string. That
will let us switch code over incrementally to the string format, and we
can easily support the existing code with a macro wrapper (since we'll
continue to allow only a single-byte prefix, this will behave
identically).

Once all references to the "char" variable have been converted, we can
drop it and enable longer strings.

We'll still have to touch all of the spots that create or set the
variable in this patch, but there are only a few (reading the config,
and the "auto" character selector).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 4 ++--
 config.c         | 2 +-
 environment.c    | 2 +-
 environment.h    | 3 ++-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index d496980421..d8abbe48b1 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -685,7 +685,7 @@ static void adjust_comment_line_char(const struct strbuf *sb)
 	const char *p;
 
 	if (!memchr(sb->buf, candidates[0], sb->len)) {
-		comment_line_char = candidates[0];
+		comment_line_str = xstrfmt("%c", candidates[0]);
 		return;
 	}
 
@@ -706,7 +706,7 @@ static void adjust_comment_line_char(const struct strbuf *sb)
 	if (!*p)
 		die(_("unable to select a comment character that is not used\n"
 		      "in the current commit message"));
-	comment_line_char = *p;
+	comment_line_str = xstrfmt("%c", *p);
 }
 
 static void prepare_amend_commit(struct commit *commit, struct strbuf *sb,
diff --git a/config.c b/config.c
index 3cfeb3d8bd..e12ea68f24 100644
--- a/config.c
+++ b/config.c
@@ -1566,7 +1566,7 @@ static int git_default_core_config(const char *var, const char *value,
 		else if (!strcasecmp(value, "auto"))
 			auto_comment_line_char = 1;
 		else if (value[0] && !value[1]) {
-			comment_line_char = value[0];
+			comment_line_str = xstrfmt("%c", value[0]);
 			auto_comment_line_char = 0;
 		} else
 			return error(_("core.commentChar should only be one ASCII character"));
diff --git a/environment.c b/environment.c
index 90632a39bc..0a9f5db407 100644
--- a/environment.c
+++ b/environment.c
@@ -110,7 +110,7 @@ int protect_ntfs = PROTECT_NTFS_DEFAULT;
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
-char comment_line_char = '#';
+const char *comment_line_str = "#";
 int auto_comment_line_char;
 
 /* Parallel index stat data preload? */
diff --git a/environment.h b/environment.h
index e5351c9dd9..3496474cce 100644
--- a/environment.h
+++ b/environment.h
@@ -8,7 +8,8 @@ struct strvec;
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
-extern char comment_line_char;
+#define comment_line_char (comment_line_str[0])
+extern const char *comment_line_str;
 extern int auto_comment_line_char;
 
 /*
-- 
2.44.0.463.g71abcb3a9f


^ permalink raw reply related	[flat|nested] 82+ messages in thread
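
The transition trick in patch 05 (store a string, keep the old "char"
name alive through a macro) can be tried outside of git with the sketch
below. The two declarations mirror the ones in the patch; char_to_str()
and set_comment_char() are hypothetical helpers, with char_to_str()
standing in for xstrfmt("%c", ...), which is not available here.

```c
#include <assert.h>
#include <stdlib.h>

const char *comment_line_str = "#";
/* Old readers keep compiling: they see the first byte of the string. */
#define comment_line_char (comment_line_str[0])

/* Hypothetical replacement for xstrfmt("%c", c). */
char *char_to_str(char c)
{
	char *s = malloc(2);
	if (!s)
		abort();
	s[0] = c;
	s[1] = '\0';
	return s;
}

/*
 * Writers must be converted by hand, because the macro is a read-only
 * view. This sketch does not free the previous value, since the initial
 * one is a string literal.
 */
void set_comment_char(char c)
{
	assert(c != '\0');
	comment_line_str = char_to_str(c);
}
```

As long as the string is exactly one byte long, code reading
comment_line_char behaves as it did before the conversion.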

* [PATCH 06/15] strbuf: accept a comment string for strbuf_stripspace()
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (4 preceding siblings ...)
  2024-03-07  9:20             ` [PATCH 05/15] environment: store comment_line_char as a string Jeff King
@ 2024-03-07  9:21             ` Jeff King
  2024-03-07  9:53               ` Jeff King
  2024-03-07  9:22             ` [PATCH 07/15] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
                               ` (10 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:21 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

As part of our transition to multi-byte comment characters, let's take a
NUL-terminated string pointer for strbuf_stripspace(), rather than a
single character. We can continue to support disabling comment stripping
by accepting a NULL pointer (as opposed to the current behavior of
passing a NUL byte).

All of the callers have to be adjusted, but they can all just pass
comment_line_str (or NULL).

Inside the function we detect comments by comparing the first byte of a
line to the comment character. We'll adjust that to use starts_with(),
which will match multiple bytes (though for now, of course, we still
only allow a single byte, so it's academic).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/branch.c     | 2 +-
 builtin/notes.c      | 2 +-
 builtin/rebase.c     | 2 +-
 builtin/stripspace.c | 2 +-
 builtin/tag.c        | 2 +-
 rebase-interactive.c | 2 +-
 sequencer.c          | 6 +++---
 strbuf.c             | 6 +++---
 strbuf.h             | 4 ++--
 9 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index cfb63cce5f..c03c0407d1 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -678,7 +678,7 @@ static int edit_branch_description(const char *branch_name)
 		strbuf_release(&buf);
 		return -1;
 	}
-	strbuf_stripspace(&buf, comment_line_char);
+	strbuf_stripspace(&buf, comment_line_str);
 
 	strbuf_addf(&name, "branch.%s.description", branch_name);
 	if (buf.len || exists)
diff --git a/builtin/notes.c b/builtin/notes.c
index caf20fd5bd..5223a3f350 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -223,7 +223,7 @@ static void prepare_note_data(const struct object_id *object, struct note_data *
 			die(_("please supply the note contents using either -m or -F option"));
 		}
 		if (d->stripspace)
-			strbuf_stripspace(&d->buf, comment_line_char);
+			strbuf_stripspace(&d->buf, comment_line_str);
 	}
 }
 
diff --git a/builtin/rebase.c b/builtin/rebase.c
index 6ead9465a4..bf78402129 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -204,7 +204,7 @@ static int edit_todo_file(unsigned flags)
 	if (strbuf_read_file(&todo_list.buf, todo_file, 0) < 0)
 		return error_errno(_("could not read '%s'."), todo_file);
 
-	strbuf_stripspace(&todo_list.buf, comment_line_char);
+	strbuf_stripspace(&todo_list.buf, comment_line_str);
 	res = edit_todo_list(the_repository, &todo_list, &new_todo, NULL, NULL, flags);
 	if (!res && todo_list_write_to_file(the_repository, &new_todo, todo_file,
 					    NULL, NULL, -1, flags & ~(TODO_LIST_SHORTEN_IDS)))
diff --git a/builtin/stripspace.c b/builtin/stripspace.c
index 7b700a9fb1..434ac490cb 100644
--- a/builtin/stripspace.c
+++ b/builtin/stripspace.c
@@ -59,7 +59,7 @@ int cmd_stripspace(int argc, const char **argv, const char *prefix)
 
 	if (mode == STRIP_DEFAULT || mode == STRIP_COMMENTS)
 		strbuf_stripspace(&buf,
-			  mode == STRIP_COMMENTS ? comment_line_char : '\0');
+			  mode == STRIP_COMMENTS ? comment_line_str : NULL);
 	else
 		comment_lines(&buf);
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 19a7e06bf4..07327d3c04 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -310,7 +310,7 @@ static void create_tag(const struct object_id *object, const char *object_ref,
 
 	if (opt->cleanup_mode != CLEANUP_NONE)
 		strbuf_stripspace(buf,
-		  opt->cleanup_mode == CLEANUP_ALL ? comment_line_char : '\0');
+		  opt->cleanup_mode == CLEANUP_ALL ? comment_line_str : NULL);
 
 	if (!opt->message_given && !buf->len)
 		die(_("no tag message?"));
diff --git a/rebase-interactive.c b/rebase-interactive.c
index d9718409b3..6dfc33e4e3 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -130,7 +130,7 @@ int edit_todo_list(struct repository *r, struct todo_list *todo_list,
 	if (launch_sequence_editor(todo_file, &new_todo->buf, NULL))
 		return -2;
 
-	strbuf_stripspace(&new_todo->buf, comment_line_char);
+	strbuf_stripspace(&new_todo->buf, comment_line_str);
 	if (initial && new_todo->buf.len == 0)
 		return -3;
 
diff --git a/sequencer.c b/sequencer.c
index f49a871ac0..6a1b7b200e 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1152,7 +1152,7 @@ void cleanup_message(struct strbuf *msgbuf,
 		strbuf_setlen(msgbuf, wt_status_locate_end(msgbuf->buf, msgbuf->len));
 	if (cleanup_mode != COMMIT_MSG_CLEANUP_NONE)
 		strbuf_stripspace(msgbuf,
-		  cleanup_mode == COMMIT_MSG_CLEANUP_ALL ? comment_line_char : '\0');
+		  cleanup_mode == COMMIT_MSG_CLEANUP_ALL ? comment_line_str : NULL);
 }
 
 /*
@@ -1184,7 +1184,7 @@ int template_untouched(const struct strbuf *sb, const char *template_file,
 		return 0;
 
 	strbuf_stripspace(&tmpl,
-	  cleanup_mode == COMMIT_MSG_CLEANUP_ALL ? comment_line_char : '\0');
+	  cleanup_mode == COMMIT_MSG_CLEANUP_ALL ? comment_line_str : NULL);
 	if (!skip_prefix(sb->buf, tmpl.buf, &start))
 		start = sb->buf;
 	strbuf_release(&tmpl);
@@ -1557,7 +1557,7 @@ static int try_to_commit(struct repository *r,
 
 	if (cleanup != COMMIT_MSG_CLEANUP_NONE)
 		strbuf_stripspace(msg,
-		  cleanup == COMMIT_MSG_CLEANUP_ALL ? comment_line_char : '\0');
+		  cleanup == COMMIT_MSG_CLEANUP_ALL ? comment_line_str : NULL);
 	if ((flags & EDIT_MSG) && message_is_empty(msg, cleanup)) {
 		res = 1; /* run 'git commit' to display error message */
 		goto out;
diff --git a/strbuf.c b/strbuf.c
index a33aed6c07..e9b6127e76 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1001,10 +1001,10 @@ static size_t cleanup(char *line, size_t len)
  *
  * If last line does not have a newline at the end, one is added.
  *
- * Pass a non-NUL comment_prefix to skip every line starting
+ * Pass a non-NULL comment_prefix to skip every line starting
  * with it.
  */
-void strbuf_stripspace(struct strbuf *sb, char comment_prefix)
+void strbuf_stripspace(struct strbuf *sb, const char *comment_prefix)
 {
 	size_t empties = 0;
 	size_t i, j, len, newlen;
@@ -1018,7 +1018,7 @@ void strbuf_stripspace(struct strbuf *sb, char comment_prefix)
 		len = eol ? eol - (sb->buf + i) + 1 : sb->len - i;
 
 		if (comment_prefix && len &&
-		    sb->buf[i] == comment_prefix) {
+		    starts_with(sb->buf + i, comment_prefix)) {
 			newlen = 0;
 			continue;
 		}
diff --git a/strbuf.h b/strbuf.h
index 860fcec5fb..dc4710adbb 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -513,11 +513,11 @@ int strbuf_getcwd(struct strbuf *sb);
 int strbuf_normalize_path(struct strbuf *sb);
 
 /**
- * Strip whitespace from a buffer. If comment_prefix is non-NUL,
+ * Strip whitespace from a buffer. If comment_prefix is non-NULL,
  * then lines beginning with that character are considered comments,
  * thus removed.
  */
-void strbuf_stripspace(struct strbuf *buf, char comment_prefix);
+void strbuf_stripspace(struct strbuf *buf, const char *comment_prefix);
 
 static inline int strbuf_strip_suffix(struct strbuf *sb, const char *suffix)
 {
-- 
2.44.0.463.g71abcb3a9f


^ permalink raw reply related	[flat|nested] 82+ messages in thread
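
The comment test that patch 06 introduces (a NULL pointer disables it, a
prefix match replaces the single-byte comparison) can be sketched as
follows. has_prefix() is a hypothetical stand-in for git's
starts_with(), and is_comment_line() is an illustrative helper rather
than the body of strbuf_stripspace().

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if `str` begins with `prefix`, comparing byte by byte. */
int has_prefix(const char *str, const char *prefix)
{
	for (; *prefix; str++, prefix++)
		if (*str != *prefix)
			return 0;
	return 1;
}

/*
 * A NULL comment_prefix means "do not treat any line as a comment".
 * Otherwise the whole prefix must match, which also works for
 * multi-byte prefixes such as a UTF-8 encoded character.
 */
int is_comment_line(const char *line, const char *comment_prefix)
{
	assert(line != NULL);
	return comment_prefix != NULL && has_prefix(line, comment_prefix);
}
```

Because the comparison stops at the first mismatch and a mismatch is
guaranteed at the line's terminating NUL, the helper never reads past
the end of a NUL-terminated line.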

* [PATCH 07/15] strbuf: accept a comment string for strbuf_commented_addf()
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (5 preceding siblings ...)
  2024-03-07  9:21             ` [PATCH 06/15] strbuf: accept a comment string for strbuf_stripspace() Jeff King
@ 2024-03-07  9:22             ` Jeff King
  2024-03-07  9:23             ` [PATCH 08/15] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
                               ` (9 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:22 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

As part of our transition to multi-byte comment characters, let's take a
NUL-terminated string pointer for strbuf_commented_addf() rather than a
single character.

All of the callers have to be adjusted, but they can just pass
comment_line_str rather than comment_line_char.

Note that we rely on strbuf_add_commented_lines() under the hood, so
we'll cheat a bit to squeeze our string into a single character (for now
the two are equivalent, and we'll address this TODO in the next patch).

Signed-off-by: Jeff King <peff@peff.net>
---
 add-patch.c          |  8 ++++----
 builtin/branch.c     |  2 +-
 builtin/merge.c      |  8 ++++----
 builtin/tag.c        |  4 ++--
 rebase-interactive.c |  2 +-
 sequencer.c          |  4 ++--
 strbuf.c             | 10 ++++++++--
 strbuf.h             |  2 +-
 wt-status.c          |  2 +-
 9 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 68f525b35c..7390677795 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -1105,11 +1105,11 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 	size_t i;
 
 	strbuf_reset(&s->buf);
-	strbuf_commented_addf(&s->buf, comment_line_char,
+	strbuf_commented_addf(&s->buf, comment_line_str,
 			      _("Manual hunk edit mode -- see bottom for "
 				"a quick guide.\n"));
 	render_hunk(s, hunk, 0, 0, &s->buf);
-	strbuf_commented_addf(&s->buf, comment_line_char,
+	strbuf_commented_addf(&s->buf, comment_line_str,
 			      _("---\n"
 				"To remove '%c' lines, make them ' ' lines "
 				"(context).\n"
@@ -1118,13 +1118,13 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 			      s->mode->is_reverse ? '+' : '-',
 			      s->mode->is_reverse ? '-' : '+',
 			      comment_line_char);
-	strbuf_commented_addf(&s->buf, comment_line_char, "%s",
+	strbuf_commented_addf(&s->buf, comment_line_str, "%s",
 			      _(s->mode->edit_hunk_hint));
 	/*
 	 * TRANSLATORS: 'it' refers to the patch mentioned in the previous
 	 * messages.
 	 */
-	strbuf_commented_addf(&s->buf, comment_line_char,
+	strbuf_commented_addf(&s->buf, comment_line_str,
 			      _("If it does not apply cleanly, you will be "
 				"given an opportunity to\n"
 				"edit again.  If all lines of the hunk are "
diff --git a/builtin/branch.c b/builtin/branch.c
index c03c0407d1..8904a1e5d9 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -667,7 +667,7 @@ static int edit_branch_description(const char *branch_name)
 	exists = !read_branch_desc(&buf, branch_name);
 	if (!buf.len || buf.buf[buf.len-1] != '\n')
 		strbuf_addch(&buf, '\n');
-	strbuf_commented_addf(&buf, comment_line_char,
+	strbuf_commented_addf(&buf, comment_line_str,
 		    _("Please edit the description for the branch\n"
 		      "  %s\n"
 		      "Lines starting with '%c' will be stripped.\n"),
diff --git a/builtin/merge.c b/builtin/merge.c
index 935c8a57dd..6d048fb628 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -852,15 +852,15 @@ static void prepare_to_commit(struct commit_list *remoteheads)
 		strbuf_addch(&msg, '\n');
 		if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) {
 			wt_status_append_cut_line(&msg);
-			strbuf_commented_addf(&msg, comment_line_char, "\n");
+			strbuf_commented_addf(&msg, comment_line_str, "\n");
 		}
-		strbuf_commented_addf(&msg, comment_line_char,
+		strbuf_commented_addf(&msg, comment_line_str,
 				      _(merge_editor_comment));
 		if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
-			strbuf_commented_addf(&msg, comment_line_char,
+			strbuf_commented_addf(&msg, comment_line_str,
 					      _(scissors_editor_comment));
 		else
-			strbuf_commented_addf(&msg, comment_line_char,
+			strbuf_commented_addf(&msg, comment_line_str,
 				_(no_scissors_editor_comment), comment_line_char);
 	}
 	if (signoff)
diff --git a/builtin/tag.c b/builtin/tag.c
index 07327d3c04..1c708785bf 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -291,10 +291,10 @@ static void create_tag(const struct object_id *object, const char *object_ref,
 			struct strbuf buf = STRBUF_INIT;
 			strbuf_addch(&buf, '\n');
 			if (opt->cleanup_mode == CLEANUP_ALL)
-				strbuf_commented_addf(&buf, comment_line_char,
+				strbuf_commented_addf(&buf, comment_line_str,
 				      _(tag_template), tag, comment_line_char);
 			else
-				strbuf_commented_addf(&buf, comment_line_char,
+				strbuf_commented_addf(&buf, comment_line_str,
 				      _(tag_template_nocleanup), tag, comment_line_char);
 			write_or_die(fd, buf.buf, buf.len);
 			strbuf_release(&buf);
diff --git a/rebase-interactive.c b/rebase-interactive.c
index 6dfc33e4e3..affc93a8e4 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -71,7 +71,7 @@ void append_todo_help(int command_count,
 
 	if (!edit_todo) {
 		strbuf_addch(buf, '\n');
-		strbuf_commented_addf(buf, comment_line_char,
+		strbuf_commented_addf(buf, comment_line_str,
 				      Q_("Rebase %s onto %s (%d command)",
 					 "Rebase %s onto %s (%d commands)",
 					 command_count),
diff --git a/sequencer.c b/sequencer.c
index 6a1b7b200e..852c3f9f4e 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -667,11 +667,11 @@ void append_conflicts_hint(struct index_state *istate,
 	}
 
 	strbuf_addch(msgbuf, '\n');
-	strbuf_commented_addf(msgbuf, comment_line_char, "Conflicts:\n");
+	strbuf_commented_addf(msgbuf, comment_line_str, "Conflicts:\n");
 	for (i = 0; i < istate->cache_nr;) {
 		const struct cache_entry *ce = istate->cache[i++];
 		if (ce_stage(ce)) {
-			strbuf_commented_addf(msgbuf, comment_line_char,
+			strbuf_commented_addf(msgbuf, comment_line_str,
 					      "\t%s\n", ce->name);
 			while (i < istate->cache_nr &&
 			       !strcmp(ce->name, istate->cache[i]->name))
diff --git a/strbuf.c b/strbuf.c
index e9b6127e76..76d02e0920 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -368,7 +368,7 @@ void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
 	add_lines(out, prefix, buf, size, 1);
 }
 
-void strbuf_commented_addf(struct strbuf *sb, char comment_prefix,
+void strbuf_commented_addf(struct strbuf *sb, const char *comment_prefix,
 			   const char *fmt, ...)
 {
 	va_list params;
@@ -379,7 +379,13 @@ void strbuf_commented_addf(struct strbuf *sb, char comment_prefix,
 	strbuf_vaddf(&buf, fmt, params);
 	va_end(params);
 
-	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix);
+	/*
+	 * TODO Our commented_lines helper does not yet understand
+	 * comment strings. But since we know that the strings are
+	 * always single-char, we can cheat for the moment, and
+	 * fix this later.
+	 */
+	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix[0]);
 	if (incomplete_line)
 		sb->buf[--sb->len] = '\0';
 
diff --git a/strbuf.h b/strbuf.h
index dc4710adbb..b128ca539a 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -379,7 +379,7 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...);
  * blank to the buffer.
  */
 __attribute__((format (printf, 3, 4)))
-void strbuf_commented_addf(struct strbuf *sb, char comment_prefix, const char *fmt, ...);
+void strbuf_commented_addf(struct strbuf *sb, const char *comment_prefix, const char *fmt, ...);
 
 __attribute__((format (printf,2,0)))
 void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap);
diff --git a/wt-status.c b/wt-status.c
index b5a29083df..2be2eb094c 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1103,7 +1103,7 @@ void wt_status_append_cut_line(struct strbuf *buf)
 {
 	const char *explanation = _("Do not modify or remove the line above.\nEverything below it will be ignored.");
 
-	strbuf_commented_addf(buf, comment_line_char, "%s", cut_line);
+	strbuf_commented_addf(buf, comment_line_str, "%s", cut_line);
 	strbuf_add_commented_lines(buf, explanation, strlen(explanation), comment_line_char);
 }
 
-- 
2.44.0.463.g71abcb3a9f


^ permalink raw reply related	[flat|nested] 82+ messages in thread
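
The temporary "cheat" in patch 07 is an adapter: the outer function
already takes a string, while the helper underneath still takes a single
char. A minimal sketch of that intermediate state, with hypothetical
function names and snprintf() instead of strbuf, might look like this:

```c
#include <assert.h>
#include <stdio.h>

/* Legacy-style helper (hypothetical): still takes a single char. */
void format_commented_char(char *out, size_t outsz,
			   const char *line, char prefix)
{
	snprintf(out, outsz, "%c %s", prefix, line);
}

/*
 * New-style entry point: accepts a string, but forwards only its first
 * byte. That is safe only while every prefix is exactly one byte long,
 * which the assertion spells out.
 */
void format_commented(char *out, size_t outsz,
		      const char *line, const char *prefix)
{
	assert(prefix && prefix[0] && !prefix[1]);
	format_commented_char(out, outsz, line, prefix[0]);
}
```

Callers can be switched to the string interface immediately, and the
adapter is dropped once the helper itself learns about strings.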

* [PATCH 08/15] strbuf: accept a comment string for strbuf_add_commented_lines()
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (6 preceding siblings ...)
  2024-03-07  9:22             ` [PATCH 07/15] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
@ 2024-03-07  9:23             ` Jeff King
  2024-03-07  9:23             ` [PATCH 09/15] prefer comment_line_str to comment_line_char for printing Jeff King
                               ` (8 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

As part of our transition to multi-byte comment characters, let's take a
NUL-terminated string pointer for strbuf_add_commented_lines() rather
than a single character.

All of the callers have to be adjusted; most can just pass
comment_line_str rather than comment_line_char.

And now our "cheat" in strbuf_commented_addf() can go away, as we can
take the full string from it.

Signed-off-by: Jeff King <peff@peff.net>
---
This could also be squashed into the previous patch. I wasn't sure if it
would be more overwhelming to have so many changes intermingled, or if
the "cheat" / "uncheat" back-and-forth would be too confusing. Pick your
poison.

 builtin/notes.c      |  8 ++++----
 builtin/stripspace.c |  2 +-
 fmt-merge-msg.c      |  6 +++---
 rebase-interactive.c |  6 +++---
 sequencer.c          |  8 ++++----
 strbuf.c             | 16 +++-------------
 strbuf.h             |  2 +-
 wt-status.c          |  4 ++--
 8 files changed, 21 insertions(+), 31 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 5223a3f350..1a67f01d00 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -179,7 +179,7 @@ static void write_commented_object(int fd, const struct object_id *object)
 
 	if (strbuf_read(&buf, show.out, 0) < 0)
 		die_errno(_("could not read 'show' output"));
-	strbuf_add_commented_lines(&cbuf, buf.buf, buf.len, comment_line_char);
+	strbuf_add_commented_lines(&cbuf, buf.buf, buf.len, comment_line_str);
 	write_or_die(fd, cbuf.buf, cbuf.len);
 
 	strbuf_release(&cbuf);
@@ -207,10 +207,10 @@ static void prepare_note_data(const struct object_id *object, struct note_data *
 			copy_obj_to_fd(fd, old_note);
 
 		strbuf_addch(&buf, '\n');
-		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_char);
+		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_str);
 		strbuf_add_commented_lines(&buf, _(note_template), strlen(_(note_template)),
-					   comment_line_char);
-		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_char);
+					   comment_line_str);
+		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_str);
 		write_or_die(fd, buf.buf, buf.len);
 
 		write_commented_object(fd, object);
diff --git a/builtin/stripspace.c b/builtin/stripspace.c
index 434ac490cb..e5626e5126 100644
--- a/builtin/stripspace.c
+++ b/builtin/stripspace.c
@@ -13,7 +13,7 @@ static void comment_lines(struct strbuf *buf)
 	size_t len;
 
 	msg = strbuf_detach(buf, &len);
-	strbuf_add_commented_lines(buf, msg, len, comment_line_char);
+	strbuf_add_commented_lines(buf, msg, len, comment_line_str);
 	free(msg);
 }
 
diff --git a/fmt-merge-msg.c b/fmt-merge-msg.c
index 66e47449a0..79e8aad086 100644
--- a/fmt-merge-msg.c
+++ b/fmt-merge-msg.c
@@ -510,7 +510,7 @@ static void fmt_tag_signature(struct strbuf *tagbuf,
 	if (sig->len) {
 		strbuf_addch(tagbuf, '\n');
 		strbuf_add_commented_lines(tagbuf, sig->buf, sig->len,
-					   comment_line_char);
+					   comment_line_str);
 	}
 }
 
@@ -557,7 +557,7 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 				strbuf_add_commented_lines(&tagline,
 						origins.items[first_tag].string,
 						strlen(origins.items[first_tag].string),
-						comment_line_char);
+						comment_line_str);
 				strbuf_insert(&tagbuf, 0, tagline.buf,
 					      tagline.len);
 				strbuf_release(&tagline);
@@ -566,7 +566,7 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 			strbuf_add_commented_lines(&tagbuf,
 					origins.items[i].string,
 					strlen(origins.items[i].string),
-					comment_line_char);
+					comment_line_str);
 			fmt_tag_signature(&tagbuf, &sig, buf, len);
 		}
 		strbuf_release(&payload);
diff --git a/rebase-interactive.c b/rebase-interactive.c
index affc93a8e4..c343e16fcd 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -78,7 +78,7 @@ void append_todo_help(int command_count,
 				      shortrevisions, shortonto, command_count);
 	}
 
-	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_char);
+	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_str);
 
 	if (get_missing_commit_check_level() == MISSING_COMMIT_CHECK_ERROR)
 		msg = _("\nDo not remove any line. Use 'drop' "
@@ -87,7 +87,7 @@ void append_todo_help(int command_count,
 		msg = _("\nIf you remove a line here "
 			 "THAT COMMIT WILL BE LOST.\n");
 
-	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_char);
+	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_str);
 
 	if (edit_todo)
 		msg = _("\nYou are editing the todo file "
@@ -98,7 +98,7 @@ void append_todo_help(int command_count,
 		msg = _("\nHowever, if you remove everything, "
 			"the rebase will be aborted.\n\n");
 
-	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_char);
+	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_str);
 }
 
 int edit_todo_list(struct repository *r, struct todo_list *todo_list,
diff --git a/sequencer.c b/sequencer.c
index 852c3f9f4e..032e213a3f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1851,7 +1851,7 @@ static void add_commented_lines(struct strbuf *buf, const void *str, size_t len)
 		s += count;
 		len -= count;
 	}
-	strbuf_add_commented_lines(buf, s, len, comment_line_char);
+	strbuf_add_commented_lines(buf, s, len, comment_line_str);
 }
 
 /* Does the current fixup chain contain a squash command? */
@@ -1950,7 +1950,7 @@ static int append_squash_message(struct strbuf *buf, const char *body,
 	strbuf_addf(buf, _(nth_commit_msg_fmt),
 		    ++opts->current_fixup_count + 1);
 	strbuf_addstr(buf, "\n\n");
-	strbuf_add_commented_lines(buf, body, commented_len, comment_line_char);
+	strbuf_add_commented_lines(buf, body, commented_len, comment_line_str);
 	/* buf->buf may be reallocated so store an offset into the buffer */
 	fixup_off = buf->len;
 	strbuf_addstr(buf, body + commented_len);
@@ -2041,7 +2041,7 @@ static int update_squash_messages(struct repository *r,
 		strbuf_addstr(&buf, "\n\n");
 		if (is_fixup_flag(command, flag))
 			strbuf_add_commented_lines(&buf, body, strlen(body),
-						   comment_line_char);
+						   comment_line_str);
 		else
 			strbuf_addstr(&buf, body);
 
@@ -2061,7 +2061,7 @@ static int update_squash_messages(struct repository *r,
 			    ++opts->current_fixup_count + 1);
 		strbuf_addstr(&buf, "\n\n");
 		strbuf_add_commented_lines(&buf, body, strlen(body),
-					   comment_line_char);
+					   comment_line_str);
 	} else
 		return error(_("unknown command: %d"), command);
 	repo_unuse_commit_buffer(r, commit, message);
diff --git a/strbuf.c b/strbuf.c
index 76d02e0920..7c8f582127 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -359,13 +359,9 @@ static void add_lines(struct strbuf *out,
 }
 
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
-				size_t size, char comment_prefix)
+				size_t size, const char *comment_prefix)
 {
-	char prefix[2];
-
-	prefix[0] = comment_prefix;
-	prefix[1] = '\0';
-	add_lines(out, prefix, buf, size, 1);
+	add_lines(out, comment_prefix, buf, size, 1);
 }
 
 void strbuf_commented_addf(struct strbuf *sb, const char *comment_prefix,
@@ -379,13 +375,7 @@ void strbuf_commented_addf(struct strbuf *sb, const char *comment_prefix,
 	strbuf_vaddf(&buf, fmt, params);
 	va_end(params);
 
-	/*
-	 * TODO Our commented_lines helper does not yet understand
-	 * comment strings. But since we know that the strings are
-	 * always single-char, we can cheat for the moment, and
-	 * fix this later.
-	 */
-	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix[0]);
+	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix);
 	if (incomplete_line)
 		sb->buf[--sb->len] = '\0';
 
diff --git a/strbuf.h b/strbuf.h
index b128ca539a..58dddf2777 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -288,7 +288,7 @@ void strbuf_splice(struct strbuf *sb, size_t pos, size_t len,
  */
 void strbuf_add_commented_lines(struct strbuf *out,
 				const char *buf, size_t size,
-				char comment_prefix);
+				const char *comment_prefix);
 
 
 /**
diff --git a/wt-status.c b/wt-status.c
index 2be2eb094c..6b81f5349c 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1028,7 +1028,7 @@ static void wt_longstatus_print_submodule_summary(struct wt_status *s, int uncom
 	if (s->display_comment_prefix) {
 		size_t len;
 		summary_content = strbuf_detach(&summary, &len);
-		strbuf_add_commented_lines(&summary, summary_content, len, comment_line_char);
+		strbuf_add_commented_lines(&summary, summary_content, len, comment_line_str);
 		free(summary_content);
 	}
 
@@ -1104,7 +1104,7 @@ void wt_status_append_cut_line(struct strbuf *buf)
 	const char *explanation = _("Do not modify or remove the line above.\nEverything below it will be ignored.");
 
 	strbuf_commented_addf(buf, comment_line_str, "%s", cut_line);
-	strbuf_add_commented_lines(buf, explanation, strlen(explanation), comment_line_char);
+	strbuf_add_commented_lines(buf, explanation, strlen(explanation), comment_line_str);
 }
 
 void wt_status_add_cut_line(FILE *fp)
-- 
2.44.0.463.g71abcb3a9f



* [PATCH 09/15] prefer comment_line_str to comment_line_char for printing
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (7 preceding siblings ...)
  2024-03-07  9:23             ` [PATCH 08/15] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
@ 2024-03-07  9:23             ` Jeff King
  2024-03-07  9:24             ` [PATCH 10/15] find multi-byte comment chars in NUL-terminated strings Jeff King
                               ` (7 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:23 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

As part of our transition to multi-byte comment characters, we should
use the string variable rather than the historical character variable.
All of the sites adjusted here are just swapping out "%c" for "%s" in
format strings, or strbuf_addch() for strbuf_addstr(). The type system
and printf-attribute give the compiler enough information to make sure
our formats and variable changes all match (especially important for
cases where the format string is defined far away from its use, like
prepare_to_commit() in commit.c).
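
To illustrate the compiler checking described above, here is a minimal
sketch (not code from this series). The names hint_fmt(), build_hint(),
and comment_str are hypothetical stand-ins for git's own helpers and for
comment_line_str, and the sketch assumes GCC or Clang, which support the
printf format attribute.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for git's comment_line_str global. */
static const char *comment_str = "//";

/*
 * Hypothetical printf-style helper. The format attribute tells the
 * compiler that argument 3 is a printf format and that the values to
 * check against it start at argument 4.
 */
__attribute__((format(printf, 3, 4)))
static int hint_fmt(char *out, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(size > 0);
	va_start(ap, fmt);
	n = vsnprintf(out, size, fmt, ap);
	va_end(ap);
	return n;
}

/*
 * Passing a string where the format says "%c" would draw a -Wformat
 * warning here; with "%s" the format and the argument type agree.
 */
static int build_hint(char *out, size_t size)
{
	return hint_fmt(out, size,
			"Lines starting with '%s' will be ignored.",
			comment_str);
}
```

Under these assumptions, build_hint() fills the buffer with the hint text
using the multi-byte prefix, and a leftover "%c" in the literal format
would be flagged at compile time rather than at run time.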

Signed-off-by: Jeff King <peff@peff.net>
---
 add-patch.c      |  4 ++--
 builtin/branch.c |  4 ++--
 builtin/commit.c | 12 ++++++------
 builtin/merge.c  |  4 ++--
 builtin/tag.c    |  8 ++++----
 fmt-merge-msg.c  |  2 +-
 sequencer.c      | 20 ++++++++++----------
 wt-status.c      | 10 +++++-----
 8 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 7390677795..4a10237d50 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -1114,10 +1114,10 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 				"To remove '%c' lines, make them ' ' lines "
 				"(context).\n"
 				"To remove '%c' lines, delete them.\n"
-				"Lines starting with %c will be removed.\n"),
+				"Lines starting with %s will be removed.\n"),
 			      s->mode->is_reverse ? '+' : '-',
 			      s->mode->is_reverse ? '-' : '+',
-			      comment_line_char);
+			      comment_line_str);
 	strbuf_commented_addf(&s->buf, comment_line_str, "%s",
 			      _(s->mode->edit_hunk_hint));
 	/*
diff --git a/builtin/branch.c b/builtin/branch.c
index 8904a1e5d9..1cdcae8454 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -670,8 +670,8 @@ static int edit_branch_description(const char *branch_name)
 	strbuf_commented_addf(&buf, comment_line_str,
 		    _("Please edit the description for the branch\n"
 		      "  %s\n"
-		      "Lines starting with '%c' will be stripped.\n"),
-		    branch_name, comment_line_char);
+		      "Lines starting with '%s' will be stripped.\n"),
+		    branch_name, comment_line_str);
 	write_file_buf(edit_description(), buf.buf, buf.len);
 	strbuf_reset(&buf);
 	if (launch_editor(edit_description(), &buf, NULL)) {
diff --git a/builtin/commit.c b/builtin/commit.c
index d8abbe48b1..8519a004d0 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -910,18 +910,18 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		struct ident_split ci, ai;
 		const char *hint_cleanup_all = allow_empty_message ?
 			_("Please enter the commit message for your changes."
-			  " Lines starting\nwith '%c' will be ignored.\n") :
+			  " Lines starting\nwith '%s' will be ignored.\n") :
 			_("Please enter the commit message for your changes."
-			  " Lines starting\nwith '%c' will be ignored, and an empty"
+			  " Lines starting\nwith '%s' will be ignored, and an empty"
 			  " message aborts the commit.\n");
 		const char *hint_cleanup_space = allow_empty_message ?
 			_("Please enter the commit message for your changes."
 			  " Lines starting\n"
-			  "with '%c' will be kept; you may remove them"
+			  "with '%s' will be kept; you may remove them"
 			  " yourself if you want to.\n") :
 			_("Please enter the commit message for your changes."
 			  " Lines starting\n"
-			  "with '%c' will be kept; you may remove them"
+			  "with '%s' will be kept; you may remove them"
 			  " yourself if you want to.\n"
 			  "An empty message aborts the commit.\n");
 		if (whence != FROM_COMMIT) {
@@ -945,12 +945,12 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 
 		fprintf(s->fp, "\n");
 		if (cleanup_mode == COMMIT_MSG_CLEANUP_ALL)
-			status_printf(s, GIT_COLOR_NORMAL, hint_cleanup_all, comment_line_char);
+			status_printf(s, GIT_COLOR_NORMAL, hint_cleanup_all, comment_line_str);
 		else if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) {
 			if (whence == FROM_COMMIT && !merge_contains_scissors)
 				wt_status_add_cut_line(s->fp);
 		} else /* COMMIT_MSG_CLEANUP_SPACE, that is. */
-			status_printf(s, GIT_COLOR_NORMAL, hint_cleanup_space, comment_line_char);
+			status_printf(s, GIT_COLOR_NORMAL, hint_cleanup_space, comment_line_str);
 
 		/*
 		 * These should never fail because they come from our own
diff --git a/builtin/merge.c b/builtin/merge.c
index 6d048fb628..ba4308883f 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -821,7 +821,7 @@ static const char scissors_editor_comment[] =
 N_("An empty message aborts the commit.\n");
 
 static const char no_scissors_editor_comment[] =
-N_("Lines starting with '%c' will be ignored, and an empty message aborts\n"
+N_("Lines starting with '%s' will be ignored, and an empty message aborts\n"
    "the commit.\n");
 
 static void write_merge_heads(struct commit_list *);
@@ -861,7 +861,7 @@ static void prepare_to_commit(struct commit_list *remoteheads)
 					      _(scissors_editor_comment));
 		else
 			strbuf_commented_addf(&msg, comment_line_str,
-				_(no_scissors_editor_comment), comment_line_char);
+				_(no_scissors_editor_comment), comment_line_str);
 	}
 	if (signoff)
 		append_signoff(&msg, ignored_log_message_bytes(msg.buf, msg.len), 0);
diff --git a/builtin/tag.c b/builtin/tag.c
index 1c708785bf..721d07a589 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -158,11 +158,11 @@ static int do_sign(struct strbuf *buffer)
 
 static const char tag_template[] =
 	N_("\nWrite a message for tag:\n  %s\n"
-	"Lines starting with '%c' will be ignored.\n");
+	"Lines starting with '%s' will be ignored.\n");
 
 static const char tag_template_nocleanup[] =
 	N_("\nWrite a message for tag:\n  %s\n"
-	"Lines starting with '%c' will be kept; you may remove them"
+	"Lines starting with '%s' will be kept; you may remove them"
 	" yourself if you want to.\n");
 
 static int git_tag_config(const char *var, const char *value,
@@ -292,10 +292,10 @@ static void create_tag(const struct object_id *object, const char *object_ref,
 			strbuf_addch(&buf, '\n');
 			if (opt->cleanup_mode == CLEANUP_ALL)
 				strbuf_commented_addf(&buf, comment_line_str,
-				      _(tag_template), tag, comment_line_char);
+				      _(tag_template), tag, comment_line_str);
 			else
 				strbuf_commented_addf(&buf, comment_line_str,
-				      _(tag_template_nocleanup), tag, comment_line_char);
+				      _(tag_template_nocleanup), tag, comment_line_str);
 			write_or_die(fd, buf.buf, buf.len);
 			strbuf_release(&buf);
 		}
diff --git a/fmt-merge-msg.c b/fmt-merge-msg.c
index 79e8aad086..ae201e21db 100644
--- a/fmt-merge-msg.c
+++ b/fmt-merge-msg.c
@@ -321,7 +321,7 @@ static void credit_people(struct strbuf *out,
 	     skip_prefix(me, them->items->string, &me) &&
 	     starts_with(me, " <")))
 		return;
-	strbuf_addf(out, "\n%c %s ", comment_line_char, label);
+	strbuf_addf(out, "\n%s %s ", comment_line_str, label);
 	add_people_count(out, them);
 }
 
diff --git a/sequencer.c b/sequencer.c
index 032e213a3f..241e185f87 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -663,7 +663,7 @@ void append_conflicts_hint(struct index_state *istate,
 	if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) {
 		strbuf_addch(msgbuf, '\n');
 		wt_status_append_cut_line(msgbuf);
-		strbuf_addch(msgbuf, comment_line_char);
+		strbuf_addstr(msgbuf, comment_line_str);
 	}
 
 	strbuf_addch(msgbuf, '\n');
@@ -1946,7 +1946,7 @@ static int append_squash_message(struct strbuf *buf, const char *body,
 	     (starts_with(body, "squash!") || starts_with(body, "fixup!"))))
 		commented_len = commit_subject_length(body);
 
-	strbuf_addf(buf, "\n%c ", comment_line_char);
+	strbuf_addf(buf, "\n%s ", comment_line_str);
 	strbuf_addf(buf, _(nth_commit_msg_fmt),
 		    ++opts->current_fixup_count + 1);
 	strbuf_addstr(buf, "\n\n");
@@ -2006,7 +2006,7 @@ static int update_squash_messages(struct repository *r,
 		eol = buf.buf[0] != comment_line_char ?
 			buf.buf : strchrnul(buf.buf, '\n');
 
-		strbuf_addf(&header, "%c ", comment_line_char);
+		strbuf_addf(&header, "%s ", comment_line_str);
 		strbuf_addf(&header, _(combined_commit_msg_fmt),
 			    opts->current_fixup_count + 2);
 		strbuf_splice(&buf, 0, eol - buf.buf, header.buf, header.len);
@@ -2032,9 +2032,9 @@ static int update_squash_messages(struct repository *r,
 			repo_unuse_commit_buffer(r, head_commit, head_message);
 			return error(_("cannot write '%s'"), rebase_path_fixup_msg());
 		}
-		strbuf_addf(&buf, "%c ", comment_line_char);
+		strbuf_addf(&buf, "%s ", comment_line_str);
 		strbuf_addf(&buf, _(combined_commit_msg_fmt), 2);
-		strbuf_addf(&buf, "\n%c ", comment_line_char);
+		strbuf_addf(&buf, "\n%s ", comment_line_str);
 		strbuf_addstr(&buf, is_fixup_flag(command, flag) ?
 			      _(skip_first_commit_msg_str) :
 			      _(first_commit_msg_str));
@@ -2056,7 +2056,7 @@ static int update_squash_messages(struct repository *r,
 	if (command == TODO_SQUASH || is_fixup_flag(command, flag)) {
 		res = append_squash_message(&buf, body, command, opts, flag);
 	} else if (command == TODO_FIXUP) {
-		strbuf_addf(&buf, "\n%c ", comment_line_char);
+		strbuf_addf(&buf, "\n%s ", comment_line_str);
 		strbuf_addf(&buf, _(skip_nth_commit_msg_fmt),
 			    ++opts->current_fixup_count + 1);
 		strbuf_addstr(&buf, "\n\n");
@@ -5659,8 +5659,8 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 				    oid_to_hex(&commit->object.oid),
 				    oneline.buf);
 			if (is_empty)
-				strbuf_addf(&buf, " %c empty",
-					    comment_line_char);
+				strbuf_addf(&buf, " %s empty",
+					    comment_line_str);
 
 			FLEX_ALLOC_STR(entry, string, buf.buf);
 			oidcpy(&entry->entry.oid, &commit->object.oid);
@@ -5750,7 +5750,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 		entry = oidmap_get(&state.commit2label, &commit->object.oid);
 
 		if (entry)
-			strbuf_addf(out, "\n%c Branch %s\n", comment_line_char, entry->string);
+			strbuf_addf(out, "\n%s Branch %s\n", comment_line_str, entry->string);
 		else
 			strbuf_addch(out, '\n');
 
@@ -5887,7 +5887,7 @@ int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
 			    oid_to_hex(&commit->object.oid));
 		pretty_print_commit(&pp, commit, out);
 		if (is_empty)
-			strbuf_addf(out, " %c empty", comment_line_char);
+			strbuf_addf(out, " %s empty", comment_line_str);
 		strbuf_addch(out, '\n');
 	}
 	if (skipped_commit)
diff --git a/wt-status.c b/wt-status.c
index 6b81f5349c..b66c30775b 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -70,7 +70,7 @@ static void status_vprintf(struct wt_status *s, int at_bol, const char *color,
 	strbuf_vaddf(&sb, fmt, ap);
 	if (!sb.len) {
 		if (s->display_comment_prefix) {
-			strbuf_addch(&sb, comment_line_char);
+			strbuf_addstr(&sb, comment_line_str);
 			if (!trail)
 				strbuf_addch(&sb, ' ');
 		}
@@ -85,7 +85,7 @@ static void status_vprintf(struct wt_status *s, int at_bol, const char *color,
 
 		strbuf_reset(&linebuf);
 		if (at_bol && s->display_comment_prefix) {
-			strbuf_addch(&linebuf, comment_line_char);
+			strbuf_addstr(&linebuf, comment_line_str);
 			if (*line != '\n' && *line != '\t')
 				strbuf_addch(&linebuf, ' ');
 		}
@@ -1090,7 +1090,7 @@ size_t wt_status_locate_end(const char *s, size_t len)
 	const char *p;
 	struct strbuf pattern = STRBUF_INIT;
 
-	strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line);
+	strbuf_addf(&pattern, "\n%s %s", comment_line_str, cut_line);
 	if (starts_with(s, pattern.buf + 1))
 		len = 0;
 	else if ((p = strstr(s, pattern.buf)))
@@ -1214,8 +1214,8 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 				 "%s%.*s", comment_line_string,
 				 (int)(ep - cp), cp);
 	if (s->display_comment_prefix)
-		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s), "%c",
-				 comment_line_char);
+		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s), "%s",
+				 comment_line_str);
 	else
 		fputs("\n", s->fp);
 	strbuf_release(&sb);
-- 
2.44.0.463.g71abcb3a9f



* [PATCH 10/15] find multi-byte comment chars in NUL-terminated strings
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (8 preceding siblings ...)
  2024-03-07  9:23             ` [PATCH 09/15] prefer comment_line_str to comment_line_char for printing Jeff King
@ 2024-03-07  9:24             ` Jeff King
  2024-03-07  9:26             ` [PATCH 11/15] find multi-byte comment chars in unterminated buffers Jeff King
                               ` (6 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:24 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Several parts of the code need to identify lines that begin with the
comment character, and do so with a simple byte equality check. As part
of the transition to handling multi-byte characters, we need to match
all of the bytes. For cases where we are looking in a NUL-terminated
string, we can just use starts_with(), which checks all of the
characters in comment_line_str.

Note that we can drop the "line.len" check in wt-status.c's
read_rebase_todolist(). The starts_with() function handles the case of
an empty haystack buffer (it will always return false for a non-empty
prefix).
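
As a rough illustration of why a single-byte check is not enough, the
sketch below compares a first-byte test with a full prefix match. The
helper names are hypothetical; sketch_starts_with() is only modeled on
the behavior described above, not copied from git.

```c
#include <assert.h>
#include <string.h>

/* Sketch of a prefix check on NUL-terminated strings. */
static int sketch_starts_with(const char *str, const char *prefix)
{
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;	/* consumed the whole prefix: match */
		if (*str != *prefix)
			return 0;	/* mismatch, or str ended first */
	}
}

/* Old-style check: looks at the first byte only. */
static int first_byte_matches(const char *str, const char *prefix)
{
	return str[0] == prefix[0];
}

static void demo_prefix_checks(void)
{
	const char *bullet = "\xe2\x80\xa2";		 /* U+2022 in UTF-8 */
	const char *other = "\xe2\x80\xa6 not a comment"; /* starts with U+2026 */

	/* Both characters share their first byte, so this misfires. */
	assert(first_byte_matches(other, bullet));
	/* Matching every byte of the prefix tells them apart. */
	assert(!sketch_starts_with(other, bullet));
	assert(sketch_starts_with("\xe2\x80\xa2 a comment", bullet));
	/* An empty haystack never matches a non-empty prefix. */
	assert(!sketch_starts_with("", bullet));
}
```

The last assertion is the property that makes a separate "line.len"
check unnecessary: the NUL terminator of the empty string differs from
the first prefix byte, so the loop returns 0 immediately.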

Signed-off-by: Jeff King <peff@peff.net>
---
I think the main way these hunks could be wrong is if the buffer is not
in fact NUL-terminated. In most cases we're working with a strbuf,
though.

 add-patch.c | 2 +-
 sequencer.c | 2 +-
 trailer.c   | 2 +-
 wt-status.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 4a10237d50..d599ca53e1 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -1139,7 +1139,7 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 	for (i = 0; i < s->buf.len; ) {
 		size_t next = find_next_line(&s->buf, i);
 
-		if (s->buf.buf[i] != comment_line_char)
+		if (!starts_with(s->buf.buf + i, comment_line_str))
 			strbuf_add(&s->plain, s->buf.buf + i, next - i);
 		i = next;
 	}
diff --git a/sequencer.c b/sequencer.c
index 241e185f87..991a2dbe96 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2003,7 +2003,7 @@ static int update_squash_messages(struct repository *r,
 			return error(_("could not read '%s'"),
 				rebase_path_squash_msg());
 
-		eol = buf.buf[0] != comment_line_char ?
+		eol = !starts_with(buf.buf, comment_line_str) ?
 			buf.buf : strchrnul(buf.buf, '\n');
 
 		strbuf_addf(&header, "%s ", comment_line_str);
diff --git a/trailer.c b/trailer.c
index ef9df4af55..fe18faf6c5 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1013,7 +1013,7 @@ static void parse_trailers(struct trailer_info *info,
 	for (i = 0; i < info->trailer_nr; i++) {
 		int separator_pos;
 		char *trailer = info->trailers[i];
-		if (trailer[0] == comment_line_char)
+		if (starts_with(trailer, comment_line_str))
 			continue;
 		separator_pos = find_separator(trailer, separators);
 		if (separator_pos >= 1) {
diff --git a/wt-status.c b/wt-status.c
index b66c30775b..084bfc584f 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1382,7 +1382,7 @@ static int read_rebase_todolist(const char *fname, struct string_list *lines)
 			  git_path("%s", fname));
 	}
 	while (!strbuf_getline_lf(&line, f)) {
-		if (line.len && line.buf[0] == comment_line_char)
+		if (starts_with(line.buf, comment_line_str))
 			continue;
 		strbuf_trim(&line);
 		if (!line.len)
-- 
2.44.0.463.g71abcb3a9f



* [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (9 preceding siblings ...)
  2024-03-07  9:24             ` [PATCH 10/15] find multi-byte comment chars in NUL-terminated strings Jeff King
@ 2024-03-07  9:26             ` Jeff King
  2024-03-07 11:08               ` Jeff King
  2024-03-07 19:42               ` René Scharfe
  2024-03-07  9:27             ` [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list Jeff King
                               ` (5 subsequent siblings)
  16 siblings, 2 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:26 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Junio C Hamano, Dragan Simic,
	Kristoffer Haugsbakk, Manlio Perillo

As with the previous patch, we need to swap out single-byte matching for
something like starts_with() to match all bytes of a multi-byte comment
character. But for cases where the buffer is not NUL-terminated (and we
instead have an explicit size or end pointer), it's not safe to use
starts_with(), as it might walk off the end of the buffer.

Let's introduce a new starts_with_mem() that does the same thing but
also accepts the length of the "haystack" str and makes sure not to walk
past it.

Note that in most cases the existing code did not need a length check at
all, since it was written in a way that knew we had at least one byte
available (and that was all we checked). So I had to read each one to
find the appropriate bounds. The one exception is sequencer.c's
add_commented_lines(), where we can actually get rid of the length
check. Just like starts_with(), our starts_with_mem() handles an empty
haystack variable by not matching (assuming a non-empty prefix).

A few notes on the implementation of starts_with_mem():

  - it would be equally correct to take an "end" pointer (and indeed,
    many of the callers have this and have to subtract to come up with
    the length). I think taking a ptr/size combo is a more usual
    interface for our codebase, though, and has the added benefit that
    the function signature makes it harder to mix up the three
    parameters.

  - we could obviously build starts_with() on top of this by passing
    strlen(str) as the length. But it's possible that starts_with() is a
    relatively hot code path, and it should not pay that penalty (it can
    generally return an answer proportional to the size of the prefix,
    not the whole string).

  - it naively feels like xstrncmpz() should be able to do the same
    thing, but that's not quite true. If you pass the length of the
    haystack buffer, then strncmp() finds that a shorter prefix string
    is "less than" the haystack, even if the haystack starts with
    the prefix. If you pass the length of the prefix, then you risk
    reading past the end of the haystack if it is shorter than the
    prefix. So I think we really do need a new function.
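
The notes above can be sketched as a standalone program fragment. The
function sketch_starts_with_mem() follows the same loop shape as the
starts_with_mem() added in the diff below, under a different name so it
can be compiled outside git; demo_bounded_prefix() is a hypothetical
harness, and plain strncmp() stands in for the xstrncmpz() discussion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefix check that never reads at or beyond str + len. */
static int sketch_starts_with_mem(const char *str, size_t len,
				  const char *prefix)
{
	const char *end = str + len;

	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;	/* whole prefix matched */
		if (str == end || *str != *prefix)
			return 0;	/* ran out of haystack, or mismatch */
	}
}

static void demo_bounded_prefix(void)
{
	const char buf[3] = { '#', 'a', 'b' };	/* no trailing NUL */

	assert(sketch_starts_with_mem(buf, sizeof(buf), "#"));
	/* Prefix longer than the bytes we may read: no match, no overrun. */
	assert(!sketch_starts_with_mem(buf, 1, "#a"));
	/* Empty haystack never matches a non-empty prefix. */
	assert(!sketch_starts_with_mem(buf, 0, "#"));
	/*
	 * strncmp() bounded by the haystack length compares the whole
	 * strings, so a true prefix still reports "not equal".
	 */
	assert(strncmp("#ab", "#", 3) != 0);
}
```

The bounds test comes before the dereference of str in the loop, which
is what keeps the function from touching memory at or past the end
pointer.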

Signed-off-by: Jeff King <peff@peff.net>
---
Arguably starts_with() and this new function should both be inlined,
like we do for skip_prefix(), but I think that's out of scope for this
series.

And it's possible I was simply too dumb to figure out xstrncmpz() here.
I'm waiting for René to show up and tell me how to do it. ;)

IMHO this is the trickiest commit of the whole series, as it would be
easy to get the length computations subtly wrong.

 commit.c    |  3 ++-
 sequencer.c |  4 ++--
 strbuf.c    | 11 +++++++++++
 strbuf.h    |  1 +
 trailer.c   |  4 ++--
 5 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/commit.c b/commit.c
index ef679a0b93..531a666cba 100644
--- a/commit.c
+++ b/commit.c
@@ -1796,7 +1796,8 @@ size_t ignored_log_message_bytes(const char *buf, size_t len)
 		else
 			next_line++;
 
-		if (buf[bol] == comment_line_char || buf[bol] == '\n') {
+		if (starts_with_mem(buf + bol, cutoff - bol, comment_line_str) ||
+		    buf[bol] == '\n') {
 			/* is this the first of the run of comments? */
 			if (!boc)
 				boc = bol;
diff --git a/sequencer.c b/sequencer.c
index 991a2dbe96..664986e3b2 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1840,7 +1840,7 @@ static int is_fixup_flag(enum todo_command command, unsigned flag)
 static void add_commented_lines(struct strbuf *buf, const void *str, size_t len)
 {
 	const char *s = str;
-	while (len > 0 && s[0] == comment_line_char) {
+	while (starts_with_mem(s, len, comment_line_str)) {
 		size_t count;
 		const char *n = memchr(s, '\n', len);
 		if (!n)
@@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
 	/* left-trim */
 	bol += strspn(bol, " \t");
 
-	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
+	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
 		item->command = TODO_COMMENT;
 		item->commit = NULL;
 		item->arg_offset = bol - buf;
diff --git a/strbuf.c b/strbuf.c
index 7c8f582127..291bdc2a65 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -24,6 +24,17 @@ int istarts_with(const char *str, const char *prefix)
 			return 0;
 }
 
+int starts_with_mem(const char *str, size_t len, const char *prefix)
+{
+	const char *end = str + len;
+	for (; ; str++, prefix++) {
+		if (!*prefix)
+			return 1;
+		else if (str == end || *str != *prefix)
+			return 0;
+	}
+}
+
 int skip_to_optional_arg_default(const char *str, const char *prefix,
 				 const char **arg, const char *def)
 {
diff --git a/strbuf.h b/strbuf.h
index 58dddf2777..3156d6ea8c 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -673,6 +673,7 @@ char *xstrfmt(const char *fmt, ...);
 
 int starts_with(const char *str, const char *prefix);
 int istarts_with(const char *str, const char *prefix);
+int starts_with_mem(const char *str, size_t len, const char *prefix);
 
 /*
  * If the string "str" is the same as the string in "prefix", then the "arg"
diff --git a/trailer.c b/trailer.c
index fe18faf6c5..f59c90b4b5 100644
--- a/trailer.c
+++ b/trailer.c
@@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
 
 	/* The first paragraph is the title and cannot be trailers */
 	for (s = buf; s < buf + len; s = next_line(s)) {
-		if (s[0] == comment_line_char)
+		if (starts_with_mem(s, buf + len - s, comment_line_str))
 			continue;
 		if (is_blank_line(s))
 			break;
@@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
 		const char **p;
 		ssize_t separator_pos;
 
-		if (bol[0] == comment_line_char) {
+		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
 			non_trailer_lines += possible_continuation_lines;
 			possible_continuation_lines = 0;
 			continue;
-- 
2.44.0.463.g71abcb3a9f



* [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (10 preceding siblings ...)
  2024-03-07  9:26             ` [PATCH 11/15] find multi-byte comment chars in unterminated buffers Jeff King
@ 2024-03-07  9:27             ` Jeff King
  2024-03-08 10:20               ` Phillip Wood
  2024-03-07  9:28             ` [PATCH 13/15] wt-status: drop custom comment-char stringification Jeff King
                               ` (4 subsequent siblings)
  16 siblings, 1 reply; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:27 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

We already match multi-byte comment characters in parse_insn_line(),
thanks to the previous commit, yielding a TODO_COMMENT entry. But in
todo_list_to_strbuf(), we may call command_to_char() to convert that
back into something we can output.

We can't just return comment_line_char anymore, since it may require
multiple bytes. Instead, we'll return 0 for this case, which is the
same thing we'd return for a command which does not have a single-letter
abbreviation (e.g., "revert" or "noop"). In that case the caller then
falls back to outputting the full name via command_to_string(). So we
can handle TODO_COMMENT there, returning the full string.

Note that there are many other callers of command_to_string(), which
will now behave differently if they pass TODO_COMMENT. But we would not
expect that to happen; prior to this commit, the function just calls
die() in this case. And looking at those callers, that makes sense;
e.g., do_pick_commit() will only be called when servicing a pick
command, and should never be called for a comment in the first place.
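
The fallback described above can be sketched in isolation. Everything
in this fragment is hypothetical (the enum, the table, format_cmd(), and
comment_str); it only mirrors the shape of the logic, with a caller that
prefers the one-letter form and falls back to the full string when the
char lookup returns 0.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum cmd { CMD_PICK, CMD_REVERT, CMD_COMMENT };

static const struct {
	char c;			/* one-letter abbreviation, or 0 if none */
	const char *str;	/* full command name */
} cmd_info[] = {
	[CMD_PICK] = { 'p', "pick" },
	[CMD_REVERT] = { 0, "revert" },
};

/* Hypothetical stand-in for a multi-byte comment prefix. */
static const char *comment_str = "//";

static char cmd_to_char(enum cmd command)
{
	if (command < CMD_COMMENT)
		return cmd_info[command].c;
	return 0;	/* a comment prefix may not fit in one char */
}

static const char *cmd_to_string(enum cmd command)
{
	if (command < CMD_COMMENT)
		return cmd_info[command].str;
	if (command == CMD_COMMENT)
		return comment_str;
	abort();	/* unknown command */
}

/* Caller side: use the abbreviation when there is one, else the name. */
static int format_cmd(char *out, size_t size, enum cmd command,
		      int abbreviate)
{
	char c = abbreviate ? cmd_to_char(command) : 0;

	assert(size > 0);
	if (c)
		return snprintf(out, size, "%c", c);
	return snprintf(out, size, "%s", cmd_to_string(command));
}
```

With this shape, a comment entry takes the same path as a command that
has no abbreviation, so the caller needs no special case for it.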

Signed-off-by: Jeff King <peff@peff.net>
---
 sequencer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index 664986e3b2..9e2851428b 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1779,14 +1779,16 @@ static const char *command_to_string(const enum todo_command command)
 {
 	if (command < TODO_COMMENT)
 		return todo_command_info[command].str;
+	if (command == TODO_COMMENT)
+		return comment_line_str;
 	die(_("unknown command: %d"), command);
 }
 
 static char command_to_char(const enum todo_command command)
 {
 	if (command < TODO_COMMENT)
 		return todo_command_info[command].c;
-	return comment_line_char;
+	return 0;
 }
 
 static int is_noop(const enum todo_command command)
-- 
2.44.0.463.g71abcb3a9f



* [PATCH 13/15] wt-status: drop custom comment-char stringification
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (11 preceding siblings ...)
  2024-03-07  9:27             ` [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list Jeff King
@ 2024-03-07  9:28             ` Jeff King
  2024-03-07  9:30             ` [PATCH 14/15] environment: drop comment_line_char compatibility macro Jeff King
                               ` (3 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:28 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

In wt_longstatus_print_tracking() we may conditionally show a comment
prefix based on the wt_status->display_comment_prefix flag. We handle
that by creating a local "comment_line_string" that is either the empty
string or the comment character followed by a space.

For a single-byte comment, the maximum length of this string is 2 (plus
a NUL byte). But to handle multi-byte comment characters, it can be
arbitrarily large. One way to handle this is to just call
xstrfmt("%s ", comment_line_str), and then free it when we're done.

But we can simplify things further by just conditionally switching
between our prefix string and an empty string when formatting. We
couldn't just do that with the previous code, because the comment
character was a single byte. There's no way to have a "%c" format switch
between some character and "no character at all". Whereas with "%s" you
can switch between some string and the empty string. So now that we have
a comment string and not a comment char, we can just use it directly
when formatting. Do note that we have to also conditionally add the
trailing space at the same time.
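
A minimal sketch of that formatting trick, outside of git: the function
format_line() and the comment_str variable are hypothetical, and
snprintf() stands in for color_fprintf_ln().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a multi-byte comment prefix. */
static const char *comment_str = "//";

/*
 * Format the first len bytes of line, optionally preceded by the
 * comment prefix and a space. Each "%s" can collapse to the empty
 * string, which a "%c" conversion cannot do.
 */
static int format_line(char *out, size_t size, int show_prefix,
		       const char *line, size_t len)
{
	assert(size > 0);
	return snprintf(out, size, "%s%s%.*s",
			show_prefix ? comment_str : "",
			show_prefix ? " " : "",
			(int)len, line);
}
```

Because both conditionals test the same flag, the prefix and its
trailing space always appear or disappear together, and no temporary
buffer sized for the prefix is needed.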

Signed-off-by: Jeff King <peff@peff.net>
---
I had hoped to clean this up as a preparatory commit, but it really is
awkward until we can make use of comment_line_str, for the reasons given
above.

 wt-status.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 084bfc584f..823e8e81b0 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1176,8 +1176,6 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 	struct strbuf sb = STRBUF_INIT;
 	const char *cp, *ep, *branch_name;
 	struct branch *branch;
-	char comment_line_string[3];
-	int i;
 	uint64_t t_begin = 0;
 
 	assert(s->branch && !s->is_initial);
@@ -1202,16 +1200,11 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 		}
 	}
 
-	i = 0;
-	if (s->display_comment_prefix) {
-		comment_line_string[i++] = comment_line_char;
-		comment_line_string[i++] = ' ';
-	}
-	comment_line_string[i] = '\0';
-
 	for (cp = sb.buf; (ep = strchr(cp, '\n')) != NULL; cp = ep + 1)
 		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s),
-				 "%s%.*s", comment_line_string,
+				 "%s%s%.*s",
+				 s->display_comment_prefix ? comment_line_str : "",
+				 s->display_comment_prefix ? " " : "",
 				 (int)(ep - cp), cp);
 	if (s->display_comment_prefix)
 		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s), "%s",
-- 
2.44.0.463.g71abcb3a9f



* [PATCH 14/15] environment: drop comment_line_char compatibility macro
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (12 preceding siblings ...)
  2024-03-07  9:28             ` [PATCH 13/15] wt-status: drop custom comment-char stringification Jeff King
@ 2024-03-07  9:30             ` Jeff King
  2024-03-07  9:34             ` [PATCH 15/15] config: allow multi-byte core.commentChar Jeff King
                               ` (2 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:30 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

There is no longer any code which references the single-byte
comment_line_char. Let's drop it, clearing the way for true multi-byte
entries in comment_line_str.

It's possible there are topics in flight that have added new references
to comment_line_char. But we would prefer to fail compilation (and then
fix it) upon merging with this, rather than have them quietly ignore the
bytes after the first.

Signed-off-by: Jeff King <peff@peff.net>
---
I did merge against 'next' and there are no such topics. And likewise
"log -Scomment_line_char next..seen" shows nothing. But as somebody who
maintained a long-running fork for many years, who knows what people are
carrying in their private trees. ;)

 environment.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/environment.h b/environment.h
index 3496474cce..a8b06674eb 100644
--- a/environment.h
+++ b/environment.h
@@ -8,7 +8,6 @@ struct strvec;
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
-#define comment_line_char (comment_line_str[0])
 extern const char *comment_line_str;
 extern int auto_comment_line_char;
 
-- 
2.44.0.463.g71abcb3a9f


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH 15/15] config: allow multi-byte core.commentChar
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (13 preceding siblings ...)
  2024-03-07  9:30             ` [PATCH 14/15] environment: drop comment_line_char compatibility macro Jeff King
@ 2024-03-07  9:34             ` Jeff King
  2024-03-08 11:07             ` [PATCH 0/15] " Phillip Wood
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:34 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Now that all of the code handles multi-byte comment characters, it's
safe to allow users to set them.

There is one special case I kept: we still will not allow an empty
string for the commentChar. While it might make sense in some contexts
(e.g., output where you don't want any comment prefix), there are plenty
where it will behave badly (e.g., all of our starts_with() checks will
indicate that every line is a comment!). It might be reasonable to
assign some meaningful semantics, but it would probably involve checking
how each site behaves. In the interim let's forbid it and we can loosen
things later.
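
To illustrate the empty-string problem described above, here is a minimal sketch. The names starts_with_sketch() and count_comment_lines() are hypothetical stand-ins, not functions from git; the first only mirrors the usual "does str begin with prefix" logic. With an empty prefix the loop returns a match before looking at a single byte, so every line counts as a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a starts_with() helper: 1 if str begins with prefix. */
static int starts_with_sketch(const char *str, const char *prefix)
{
	assert(str && prefix);
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;	/* prefix exhausted: match (immediately so if empty) */
		else if (*str != *prefix)
			return 0;	/* mismatch, or str ended before prefix did */
	}
}

/* Count how many of the n lines would be treated as comments. */
static size_t count_comment_lines(const char **lines, size_t n,
				  const char *comment_str)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (starts_with_sketch(lines[i], comment_str))
			count++;
	return count;
}
```

Given the lines "subject", "", "# comment" and "body", a prefix of "#" flags one line, while an empty prefix flags all four.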

Since comment_line_str is used in many parts of the code, it's hard to
cover all possibilities with tests. We can convert the existing
double-semicolon prefix test to show that "git status" works. And we'll
give it a more challenging case in t7507, where we confirm that
git-commit strips out the commit template along with any --verbose text
when reading the edited commit message back in. That covers the basics,
though it's possible there could be issues in more exotic spots (e.g.,
the sequencer todo list uses its own code).

Signed-off-by: Jeff King <peff@peff.net>
---
Obviously everything works using the "str" variant with a single
character, and many tests are already covering that. You can swap out
the default to "foo>" or something and run the test suite, but there are
many spots that hard-code "#" in their expectations.

I do think it's an acceptable risk, though; for the most part you'd only
find new bugs if you set a multi-byte core.commentChar, which was simply
not allowed before. So we're more likely to see bugs in the new feature
than regression of existing cases.

 Documentation/config/core.txt |  4 +++-
 config.c                      |  6 +++---
 t/t0030-stripspace.sh         |  5 +++++
 t/t7507-commit-verbose.sh     | 10 ++++++++++
 t/t7508-status.sh             |  4 +++-
 5 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 0e8c2832bf..c86b8c8408 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -523,7 +523,9 @@ core.commentChar::
 	Commands such as `commit` and `tag` that let you edit
 	messages consider a line that begins with this character
 	commented, and removes them after the editor returns
-	(default '#').
+	(default '#'). Note that this option can take values larger than
+	a byte (whether a single multi-byte character, or you
+	could even go wild with a multi-character sequence).
 +
 If set to "auto", `git-commit` would select a character that is not
 the beginning character of any line in existing commit messages.
diff --git a/config.c b/config.c
index e12ea68f24..4dea34936c 100644
--- a/config.c
+++ b/config.c
@@ -1565,11 +1565,11 @@ static int git_default_core_config(const char *var, const char *value,
 			return config_error_nonbool(var);
 		else if (!strcasecmp(value, "auto"))
 			auto_comment_line_char = 1;
-		else if (value[0] && !value[1]) {
-			comment_line_str = xstrfmt("%c", value[0]);
+		else if (value[0]) {
+			comment_line_str = xstrdup(value);
 			auto_comment_line_char = 0;
 		} else
-			return error(_("core.commentChar should only be one ASCII character"));
+			return error(_("core.commentChar must have at least one character"));
 		return 0;
 	}
 
diff --git a/t/t0030-stripspace.sh b/t/t0030-stripspace.sh
index d1b3be8725..9cdf2bddbd 100755
--- a/t/t0030-stripspace.sh
+++ b/t/t0030-stripspace.sh
@@ -401,6 +401,11 @@ test_expect_success 'strip comments with changed comment char' '
 	test -z "$(echo "; comment" | git -c core.commentchar=";" stripspace -s)"
 '
 
+test_expect_success 'empty commentchar is forbidden' '
+	test_must_fail git -c core.commentchar= stripspace -s 2>err &&
+	grep "core.commentChar must have at least one character" err
+'
+
 test_expect_success '-c with single line' '
 	printf "# foo\n" >expect &&
 	printf "foo" | git stripspace -c >actual &&
diff --git a/t/t7507-commit-verbose.sh b/t/t7507-commit-verbose.sh
index c3281b192e..4c7db19ce7 100755
--- a/t/t7507-commit-verbose.sh
+++ b/t/t7507-commit-verbose.sh
@@ -101,6 +101,16 @@ test_expect_success 'verbose diff is stripped out with set core.commentChar' '
 	test_grep "Aborting commit due to empty commit message." err
 '
 
+test_expect_success 'verbose diff is stripped with multi-byte comment char' '
+	(
+		GIT_EDITOR=cat &&
+		export GIT_EDITOR &&
+		test_must_fail git -c core.commentchar="foo>" commit -a -v >out 2>err
+	) &&
+	grep "^foo> " out &&
+	test_grep "Aborting commit due to empty commit message." err
+'
+
 test_expect_success 'status does not verbose without --verbose' '
 	git status >actual &&
 	! grep "^diff --git" actual
diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index a3c18a4fc2..10ed8b32bc 100755
--- a/t/t7508-status.sh
+++ b/t/t7508-status.sh
@@ -1403,7 +1403,9 @@ test_expect_success "status (core.commentchar with submodule summary)" '
 
 test_expect_success "status (core.commentchar with two chars with submodule summary)" '
 	test_config core.commentchar ";;" &&
-	test_must_fail git -c status.displayCommentPrefix=true status
+	sed "s/^/;/" <expect >expect.double &&
+	git -c status.displayCommentPrefix=true status >output &&
+	test_cmp expect.double output
 '
 
 test_expect_success "--ignore-submodules=all suppresses submodule summary" '
-- 
2.44.0.463.g71abcb3a9f

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH 06/15] strbuf: accept a comment string for strbuf_stripspace()
  2024-03-07  9:21             ` [PATCH 06/15] strbuf: accept a comment string for strbuf_stripspace() Jeff King
@ 2024-03-07  9:53               ` Jeff King
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-07  9:53 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

On Thu, Mar 07, 2024 at 04:21:26AM -0500, Jeff King wrote:

> As part of our transition to multi-byte comment characters, let's take a
> NUL-terminated string pointer for strbuf_stripspace(), rather than a
> single character. We can continue to support its feature of ignoring
> comments by accepting a NULL pointer (as opposed to the current behavior
> of a NUL byte).
> 
> All of the callers have to be adjusted, but they can all just pass
> comment_line_str (or NULL).

Bah. I relied on the compiler to tell me the call-sites that needed to
be adjusted. But interestingly gcc is quite happy to allow '\0' to be
passed in place of a pointer, but clang complains:

  gpg-interface.c:589:37: error: expression which evaluates to zero treated as a null pointer constant of type 'const char *' [-Werror,-Wnon-literal-null-conversion]
          strbuf_stripspace(&ssh_keygen_out, '\0');
                                             ^~~~

Likewise there are a few bare "0"'s which do not cause a warning, but
which violate our style standards. So I think we'd want to squash the
patch below in to this step. The other functions don't need the same
treatment because they never treated NUL specially.
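
As a rough sketch of the convention being adopted here (a NULL pointer rather than a NUL byte means "no comment handling"), consider the helper below. line_is_kept() is a hypothetical name and is not part of git; it only shows why the call sites should spell the argument as NULL. In C, '\0' is an integer constant with value zero, so it converts to a null pointer silently on some compilers, which hides the intent.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether a line survives stripping.
 * A NULL comment_prefix means "do not treat any line as a comment".
 */
static int line_is_kept(const char *line, const char *comment_prefix)
{
	assert(line);
	if (!comment_prefix)
		return 1;
	/* Drop the line only if it begins with every byte of the prefix. */
	return strncmp(line, comment_prefix, strlen(comment_prefix)) != 0;
}
```

Calling line_is_kept("# note", NULL) keeps the line, while line_is_kept("# note", "#") drops it.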

---
diff --git a/builtin/am.c b/builtin/am.c
index d1990d7edc..5bc72d7822 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1286,7 +1286,7 @@ static int parse_mail(struct am_state *state, const char *mail)
 
 	strbuf_addstr(&msg, "\n\n");
 	strbuf_addbuf(&msg, &mi.log_message);
-	strbuf_stripspace(&msg, '\0');
+	strbuf_stripspace(&msg, NULL);
 
 	assert(!state->author_name);
 	state->author_name = strbuf_detach(&author_name, NULL);
diff --git a/builtin/commit.c b/builtin/commit.c
index 8519a004d0..e04f1236e8 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -890,7 +890,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 	s->hints = 0;
 
 	if (clean_message_contents)
-		strbuf_stripspace(&sb, '\0');
+		strbuf_stripspace(&sb, NULL);
 
 	if (signoff)
 		append_signoff(&sb, ignored_log_message_bytes(sb.buf, sb.len), 0);
diff --git a/builtin/notes.c b/builtin/notes.c
index 1a67f01d00..cb011303e6 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -264,7 +264,7 @@ static void concat_messages(struct note_data *d)
 		if ((d->stripspace == UNSPECIFIED &&
 		     d->messages[i]->stripspace == STRIPSPACE) ||
 		    d->stripspace == STRIPSPACE)
-			strbuf_stripspace(&d->buf, 0);
+			strbuf_stripspace(&d->buf, NULL);
 		strbuf_reset(&msg);
 	}
 	strbuf_release(&msg);
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 9c76b62b02..f0aa962cf8 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -657,7 +657,7 @@ static int can_use_local_refs(const struct add_opts *opts)
 			strbuf_add_real_path(&path, get_worktree_git_dir(NULL));
 			strbuf_addstr(&path, "/HEAD");
 			strbuf_read_file(&contents, path.buf, 64);
-			strbuf_stripspace(&contents, 0);
+			strbuf_stripspace(&contents, NULL);
 			strbuf_strip_suffix(&contents, "\n");
 
 			warning(_("HEAD points to an invalid (or orphaned) reference.\n"
diff --git a/gpg-interface.c b/gpg-interface.c
index 95e764acb1..b5993385ff 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -586,8 +586,8 @@ static int verify_ssh_signed_buffer(struct signature_check *sigc,
 		}
 	}
 
-	strbuf_stripspace(&ssh_keygen_out, '\0');
-	strbuf_stripspace(&ssh_keygen_err, '\0');
+	strbuf_stripspace(&ssh_keygen_out, NULL);
+	strbuf_stripspace(&ssh_keygen_err, NULL);
 	/* Add stderr outputs to show the user actual ssh-keygen errors */
 	strbuf_add(&ssh_keygen_out, ssh_principals_err.buf, ssh_principals_err.len);
 	strbuf_add(&ssh_keygen_out, ssh_keygen_err.buf, ssh_keygen_err.len);

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-07  9:26             ` [PATCH 11/15] find multi-byte comment chars in unterminated buffers Jeff King
@ 2024-03-07 11:08               ` Jeff King
  2024-03-07 19:41                 ` René Scharfe
  2024-03-07 19:42               ` René Scharfe
  1 sibling, 1 reply; 82+ messages in thread
From: Jeff King @ 2024-03-07 11:08 UTC (permalink / raw)
  To: git
  Cc: René Scharfe, Junio C Hamano, Dragan Simic,
	Kristoffer Haugsbakk, Manlio Perillo

On Thu, Mar 07, 2024 at 04:26:38AM -0500, Jeff King wrote:

> IMHO this is the trickiest commit of the whole series, as it would be
> easy to get the length computations subtly wrong.

And sure enough...

> diff --git a/trailer.c b/trailer.c
> index fe18faf6c5..f59c90b4b5 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>  
>  	/* The first paragraph is the title and cannot be trailers */
>  	for (s = buf; s < buf + len; s = next_line(s)) {
> -		if (s[0] == comment_line_char)
> +		if (starts_with_mem(s, buf + len - s, comment_line_str))
>  			continue;
>  		if (is_blank_line(s))
>  			break;
> @@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>  		const char **p;
>  		ssize_t separator_pos;
>  
> -		if (bol[0] == comment_line_char) {
> +		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
>  			non_trailer_lines += possible_continuation_lines;
>  			possible_continuation_lines = 0;
>  			continue;

This second hunk needs:

diff --git a/trailer.c b/trailer.c
index f59c90b4b5..fdb0b8137e 100644
--- a/trailer.c
+++ b/trailer.c
@@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
 		const char **p;
 		ssize_t separator_pos;
 
-		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
+		if (starts_with_mem(bol, buf + len - bol, comment_line_str)) {
 			non_trailer_lines += possible_continuation_lines;
 			possible_continuation_lines = 0;
 			continue;

I was trying to bound the size based on the loop, which is:

          for (l = last_line(buf, len);
               l >= end_of_title;
               l = last_line(buf, l)) {
                  const char *bol = buf + l;

but I misread "end_of_title" as an upper bound, not a lower one. Which
makes sense because we're iterating backwards over the lines. So I
suppose we could bound it by the previous "bol" value. But in practice,
your prefix won't cross such a boundary anyway, as it won't have a
newline in it (maybe that's something we should enforce? I guess you
could set core.commentChar to '\n' even before my series, which would be
slightly insane).

So just bounding ourselves to "buf + len" seems reasonable, as that
makes sure we don't step outside the buffer passed into the function.
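
A small sketch of the bound discussed above, under the assumption that lines are addressed by an offset l into buf[0..len). The matching loop copies the logic of starts_with_mem() from the patch; line_at_is_comment() is a hypothetical wrapper added only for illustration. The remaining length is always len - l, which cannot go negative as long as l <= len, whereas a bound derived from a lower limit such as end_of_title would wrap around once l exceeds it.

```c
#include <assert.h>
#include <stddef.h>

/* Same logic as the starts_with_mem() proposed in this series. */
static int starts_with_mem_sketch(const char *str, size_t len,
				  const char *prefix)
{
	const char *end = str + len;

	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;
		else if (str == end || *str != *prefix)
			return 0;
	}
}

/*
 * Hypothetical wrapper: does the line at offset l start with comment_str?
 * The haystack is bounded by the end of the buffer, never by a lower limit.
 */
static int line_at_is_comment(const char *buf, size_t len, size_t l,
			      const char *comment_str)
{
	assert(l <= len);
	return starts_with_mem_sketch(buf + l, len - l, comment_str);
}
```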

Curiously, this was found by the sanitizer job in CI, where UBSan
complains of integer overflow in the pointer computation. I had run with
both ASan/UBSan locally, but just using gcc, which doesn't seem to find
it (the CI job uses clang). So I'll add that to my mental tally of "clang
seems to be better with sanitizers".

-Peff

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-07 11:08               ` Jeff King
@ 2024-03-07 19:41                 ` René Scharfe
  2024-03-07 19:47                   ` René Scharfe
  0 siblings, 1 reply; 82+ messages in thread
From: René Scharfe @ 2024-03-07 19:41 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Am 07.03.24 um 12:08 schrieb Jeff King:
> On Thu, Mar 07, 2024 at 04:26:38AM -0500, Jeff King wrote:
>
>> IMHO this is the trickiest commit of the whole series, as it would be
>> easy to get the length computations subtly wrong.
>
> And sure enough...
>
>> diff --git a/trailer.c b/trailer.c
>> index fe18faf6c5..f59c90b4b5 100644
>> --- a/trailer.c
>> +++ b/trailer.c
>> @@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>>
>>  	/* The first paragraph is the title and cannot be trailers */
>>  	for (s = buf; s < buf + len; s = next_line(s)) {
>> -		if (s[0] == comment_line_char)
>> +		if (starts_with_mem(s, buf + len - s, comment_line_str))
>>  			continue;
>>  		if (is_blank_line(s))
>>  			break;
>> @@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>>  		const char **p;
>>  		ssize_t separator_pos;
>>
>> -		if (bol[0] == comment_line_char) {
>> +		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
>>  			non_trailer_lines += possible_continuation_lines;
>>  			possible_continuation_lines = 0;
>>  			continue;
>
> This second hunk needs:
>
> diff --git a/trailer.c b/trailer.c
> index f59c90b4b5..fdb0b8137e 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>  		const char **p;
>  		ssize_t separator_pos;
>
> -		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
> +		if (starts_with_mem(bol, buf + len - bol, comment_line_str)) {
>  			non_trailer_lines += possible_continuation_lines;
>  			possible_continuation_lines = 0;
>  			continue;
>
> I was trying to bound the size based on the loop, which is:
>
>           for (l = last_line(buf, len);
>                l >= end_of_title;
>                l = last_line(buf, l)) {
>                   const char *bol = buf + l;
>
> but I misread "end_of_title" as an upper bound, not a lower one. Which
> makes sense because we're iterating backwards over the lines. So I
> suppose we could bound it by the previous "bol" value. But in practice,
> your prefix won't cross such a boundary anyway, as it won't have a
> newline in it (maybe that's something we should enforce? I guess you
> could set core.commentChar to '\n' even before my series, which would be
> slightly insane).
>
> So just bounding ourselves to "buf + len" seems reasonable, as that
> makes sure we don't step outside the buffer passed into the function.
>
> Curiously, this was found by the sanitizer job in CI, where UBSan
> complains of integer overflow in the pointer computation. I had run with
> both ASan/UBSan locally, but just using gcc, which doesn't seem to find
> it (the CI job uses clang). So I'll add that to my mental tally of "clang
> seems to be better with sanitizers".
>
> -Peff


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-07  9:26             ` [PATCH 11/15] find multi-byte comment chars in unterminated buffers Jeff King
  2024-03-07 11:08               ` Jeff King
@ 2024-03-07 19:42               ` René Scharfe
  2024-03-08 10:17                 ` Phillip Wood
  2024-03-12  8:05                 ` Jeff King
  1 sibling, 2 replies; 82+ messages in thread
From: René Scharfe @ 2024-03-07 19:42 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Am 07.03.24 um 10:26 schrieb Jeff King:
> As with the previous patch, we need to swap out single-byte matching for
> something like starts_with() to match all bytes of a multi-byte comment
> character. But for cases where the buffer is not NUL-terminated (and we
> instead have an explicit size or end pointer), it's not safe to use
> starts_with(), as it might walk off the end of the buffer.
>
> Let's introduce a new starts_with_mem() that does the same thing but
> also accepts the length of the "haystack" str and makes sure not to walk
> past it.
>
> Note that in most cases the existing code did not need a length check at
> all, since it was written in a way that knew we had at least one byte
> available (and that was all we checked). So I had to read each one to
> find the appropriate bounds. The one exception is sequencer.c's
> add_commented_lines(), where we can actually get rid of the length
> check. Just like starts_with(), our starts_with_mem() handles an empty
> haystack variable by not matching (assuming a non-empty prefix).
>
> A few notes on the implementation of starts_with_mem():
>
>   - it would be equally correct to take an "end" pointer (and indeed,
>     many of the callers have this and have to subtract to come up with
>     the length). I think taking a ptr/size combo is a more usual
>     interface for our codebase, though, and has the added benefit that
>     the function signature makes it harder to mix up the three
>     parameters.
>
>   - we could obviously build starts_with() on top of this by passing
>     strlen(str) as the length. But it's possible that starts_with() is a
>     relatively hot code path, and it should not pay that penalty (it can
>     generally return an answer proportional to the size of the prefix,
>     not the whole string).
>
>   - it naively feels like xstrncmpz() should be able to do the same
>     thing, but that's not quite true. If you pass the length of the
>     haystack buffer, then strncmp() finds that a shorter prefix string
>     is "less than" the haystack, even if the haystack starts with
>     the prefix. If you pass the length of the prefix, then you risk
>     reading past the end of the haystack if it is shorter than the
>     prefix. So I think we really do need a new function.

Yes.  xstrncmpz() compares a NUL-terminated string and a length-limited
string.  If you want to check whether the former is a prefix of the
latter then you need to stop comparing when reaching its NUL, and also
after exhausting the latter.  So you need to take both lengths into
account:

int starts_with_mem(const char *str, size_t len, const char *prefix)
{
	size_t prefixlen = strlen(prefix);
	return prefixlen <= len && !xstrncmpz(prefix, str, prefixlen);
}

Using memcmp() here is equivalent and simpler:

int starts_with_mem(const char *str, size_t len, const char *prefix)
{
	size_t prefixlen = strlen(prefix);
	return prefixlen <= len && !memcmp(str, prefix, prefixlen);
}

And your version below avoids function calls and avoids traversing the
strings beyond their common prefix, of course.
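
One way to gain confidence that the two formulations agree is to run them side by side. The sketch below copies the byte-by-byte loop from the patch and the strlen()+memcmp() variant from above under distinct names; variants_agree() is a hypothetical helper added only for this comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-by-byte variant, as in the patch under review. */
static int starts_with_mem_loop(const char *str, size_t len,
				const char *prefix)
{
	const char *end = str + len;

	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;
		else if (str == end || *str != *prefix)
			return 0;
	}
}

/* strlen()+memcmp() variant: check the length first, then compare. */
static int starts_with_mem_memcmp(const char *str, size_t len,
				  const char *prefix)
{
	size_t prefixlen = strlen(prefix);

	return prefixlen <= len && !memcmp(str, prefix, prefixlen);
}

/* Return 1 if both variants give the same answer for these inputs. */
static int variants_agree(const char *str, size_t len, const char *prefix)
{
	assert(str && prefix);
	return starts_with_mem_loop(str, len, prefix) ==
	       starts_with_mem_memcmp(str, len, prefix);
}
```

The interesting inputs are a haystack shorter than the prefix, an empty haystack, and an empty prefix; both variants should reject the first two for a non-empty prefix and accept the last.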

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> Arguably starts_with() and this new function should both be inlined,
> like we do for skip_prefix(), but I think that's out of scope for this
> series.

Inlining would allow the compiler to unroll the loop for string
constants.  I doubt it would do that for variables, as in the code
below.

Inlining the strlen()+memcmp() version above might allow the compiler
to push the strlen() call out of a loop.

Would any of that improve performance noticeably?  For the call sites
below I doubt it.  But it would probably increase the object text size.

> And it's possible I was simply too dumb to figure out xstrncmpz() here.
> I'm waiting for René to show up and tell me how to do it. ;)

Nah, it's not a good fit, as it requires the two strings to have the
same length.

>
> IMHO this is the trickiest commit of the whole series, as it would be
> easy to get the length computations subtly wrong.
>
>  commit.c    |  3 ++-
>  sequencer.c |  4 ++--
>  strbuf.c    | 11 +++++++++++
>  strbuf.h    |  1 +
>  trailer.c   |  4 ++--
>  5 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index ef679a0b93..531a666cba 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -1796,7 +1796,8 @@ size_t ignored_log_message_bytes(const char *buf, size_t len)
>  		else
>  			next_line++;
>
> -		if (buf[bol] == comment_line_char || buf[bol] == '\n') {
> +		if (starts_with_mem(buf + bol, cutoff - bol, comment_line_str) ||
> +		    buf[bol] == '\n') {
>  			/* is this the first of the run of comments? */
>  			if (!boc)
>  				boc = bol;
> diff --git a/sequencer.c b/sequencer.c
> index 991a2dbe96..664986e3b2 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -1840,7 +1840,7 @@ static int is_fixup_flag(enum todo_command command, unsigned flag)
>  static void add_commented_lines(struct strbuf *buf, const void *str, size_t len)
>  {
>  	const char *s = str;
> -	while (len > 0 && s[0] == comment_line_char) {
> +	while (starts_with_mem(s, len, comment_line_str)) {
>  		size_t count;
>  		const char *n = memchr(s, '\n', len);
>  		if (!n)
> @@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>  	/* left-trim */
>  	bol += strspn(bol, " \t");
>
> -	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
> +	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {

If the strspn() call is safe (which it is, as the caller expects the
string to be NUL-terminated) then you could use starts_with() here and
avoid the length calculation.  But that would also match
comment_line_str values that contain LF, which the _mem version does not
and that's better.

Not sure why lines that start with CR are considered comment lines,
though.

>  		item->command = TODO_COMMENT;
>  		item->commit = NULL;
>  		item->arg_offset = bol - buf;
> diff --git a/strbuf.c b/strbuf.c
> index 7c8f582127..291bdc2a65 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -24,6 +24,17 @@ int istarts_with(const char *str, const char *prefix)
>  			return 0;
>  }
>
> +int starts_with_mem(const char *str, size_t len, const char *prefix)
> +{
> +	const char *end = str + len;
> +	for (; ; str++, prefix++) {
> +		if (!*prefix)
> +			return 1;
> +		else if (str == end || *str != *prefix)
> +			return 0;
> +	}
> +}

So this checks whether a length-limited string has a prefix given as a
NUL-terminated string.  I'd have called it mem_starts_with() and have
expected starts_with_mem() to check a NUL-terminated string for a
length-limited prefix (think !strncmp(str, prefix, prefixlen)).

> +
>  int skip_to_optional_arg_default(const char *str, const char *prefix,
>  				 const char **arg, const char *def)
>  {
> diff --git a/strbuf.h b/strbuf.h
> index 58dddf2777..3156d6ea8c 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -673,6 +673,7 @@ char *xstrfmt(const char *fmt, ...);
>
>  int starts_with(const char *str, const char *prefix);
>  int istarts_with(const char *str, const char *prefix);
> +int starts_with_mem(const char *str, size_t len, const char *prefix);
>
>  /*
>   * If the string "str" is the same as the string in "prefix", then the "arg"
> diff --git a/trailer.c b/trailer.c
> index fe18faf6c5..f59c90b4b5 100644
> --- a/trailer.c
> +++ b/trailer.c
> @@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>
>  	/* The first paragraph is the title and cannot be trailers */
>  	for (s = buf; s < buf + len; s = next_line(s)) {
> -		if (s[0] == comment_line_char)
> +		if (starts_with_mem(s, buf + len - s, comment_line_str))
>  			continue;
>  		if (is_blank_line(s))

Another case where starts_with() would be safe to use, as
is_blank_line() expects (and gets) a NUL-terminated string, but it would
allow matching comment_line_str values that contain LF.

>  			break;
> @@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>  		const char **p;
>  		ssize_t separator_pos;
>
> -		if (bol[0] == comment_line_char) {
> +		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {

We're in the same buffer, so the above comment applies here as well.

>  			non_trailer_lines += possible_continuation_lines;
>  			possible_continuation_lines = 0;
>  			continue;

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-07 19:41                 ` René Scharfe
@ 2024-03-07 19:47                   ` René Scharfe
  0 siblings, 0 replies; 82+ messages in thread
From: René Scharfe @ 2024-03-07 19:47 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Am 07.03.24 um 20:41 schrieb René Scharfe:

Sorry, sent too early.

> Am 07.03.24 um 12:08 schrieb Jeff King:
>> On Thu, Mar 07, 2024 at 04:26:38AM -0500, Jeff King wrote:
>>
>>> IMHO this is the trickiest commit of the whole series, as it would be
>>> easy to get the length computations subtly wrong.
>>
>> And sure enough...
>>
>>> diff --git a/trailer.c b/trailer.c
>>> index fe18faf6c5..f59c90b4b5 100644
>>> --- a/trailer.c
>>> +++ b/trailer.c
>>> @@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>>>
>>>  	/* The first paragraph is the title and cannot be trailers */
>>>  	for (s = buf; s < buf + len; s = next_line(s)) {
>>> -		if (s[0] == comment_line_char)
>>> +		if (starts_with_mem(s, buf + len - s, comment_line_str))
>>>  			continue;
>>>  		if (is_blank_line(s))
>>>  			break;
>>> @@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>>>  		const char **p;
>>>  		ssize_t separator_pos;
>>>
>>> -		if (bol[0] == comment_line_char) {
>>> +		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
>>>  			non_trailer_lines += possible_continuation_lines;
>>>  			possible_continuation_lines = 0;
>>>  			continue;
>>
>> This second hunk needs:
>>
>> diff --git a/trailer.c b/trailer.c
>> index f59c90b4b5..fdb0b8137e 100644
>> --- a/trailer.c
>> +++ b/trailer.c
>> @@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>>  		const char **p;
>>  		ssize_t separator_pos;
>>
>> -		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
>> +		if (starts_with_mem(bol, buf + len - bol, comment_line_str)) {
>>  			non_trailer_lines += possible_continuation_lines;
>>  			possible_continuation_lines = 0;
>>  			continue;
>>
>> I was trying to bound the size based on the loop, which is:
>>
>>           for (l = last_line(buf, len);
>>                l >= end_of_title;
>>                l = last_line(buf, l)) {
>>                   const char *bol = buf + l;
>>
>> but I misread "end_of_title" as an upper bound, not a lower one. Which
>> makes sense because we're iterating backwards over the lines. So I
>> suppose we could bound it by the previous "bol" value. But in practice,
>> your prefix won't cross such a boundary anyway, as it won't have a
>> newline in it (maybe that's something we should enforce? I guess you
>> could set core.commentChar to '\n' even before my series, which would be
>> slightly insane).
>>
>> So just bounding ourselves to "buf + len" seems reasonable, as that
>> makes sure we don't step outside the buffer passed into the function.

If you don't want or expect LF in comment_line_str, better check it.
And if you do that, most callers of starts_with_mem() -- including this
one -- can use starts_with() instead, as mentioned in my reply to your
patch.  Fewer calculations, fewer errors.

René




^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-07 19:42               ` René Scharfe
@ 2024-03-08 10:17                 ` Phillip Wood
  2024-03-08 15:58                   ` Junio C Hamano
  2024-03-12  8:05                 ` Jeff King
  1 sibling, 1 reply; 82+ messages in thread
From: Phillip Wood @ 2024-03-08 10:17 UTC (permalink / raw)
  To: René Scharfe, Jeff King, git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Hi Peff and René

On 07/03/2024 19:42, René Scharfe wrote:
> Am 07.03.24 um 10:26 schrieb Jeff King:
>> diff --git a/sequencer.c b/sequencer.c
>> index 991a2dbe96..664986e3b2 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -1840,7 +1840,7 @@ static int is_fixup_flag(enum todo_command command, unsigned flag)
>>   static void add_commented_lines(struct strbuf *buf, const void *str, size_t len)
>>   {
>>   	const char *s = str;
>> -	while (len > 0 && s[0] == comment_line_char) {
>> +	while (starts_with_mem(s, len, comment_line_str)) {
>>   		size_t count;
>>   		const char *n = memchr(s, '\n', len);
>>   		if (!n)
>> @@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>>   	/* left-trim */
>>   	bol += strspn(bol, " \t");
>>
>> -	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
>> +	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
> 
> If the strspn() call is safe (which it is, as the caller expects the
> string to be NUL-terminated) then you could use starts_with() here and
> avoid the length calculation.  But that would also match
> comment_line_str values that contain LF, which the _mem version does not
> and that's better.

I agree with your analysis. I do wonder though if we should reject 
whitespace and control characters when parsing core.commentChar, it 
feels like accepting them is a bug waiting to happen. If 
comment_line_char starts with ' ' or '\t' that part will be eaten by the 
strspn() above and so starts_with_mem() wont match. Also we will never 
match a comment if comment_line_str contains '\n'.
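
To make the failure mode concrete, here is a minimal standalone sketch.
It is not the sequencer.c code: line_is_comment() is a hypothetical
stand-in for the left-trim plus comment check in parse_insn_line(), and
starts_with_mem() is copied from the helper proposed in this series.

```c
#include <stddef.h>
#include <string.h>

/* Same logic as the starts_with_mem() helper proposed in this series. */
static int starts_with_mem(const char *str, size_t len, const char *prefix)
{
	const char *end = str + len;
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;
		else if (str == end || *str != *prefix)
			return 0;
	}
}

/*
 * Hypothetical stand-in for the check in parse_insn_line(): left-trim
 * the line, find its end, then test for the comment string.
 */
static int line_is_comment(const char *line, const char *comment_str)
{
	const char *bol = line + strspn(line, " \t");
	const char *eol = bol + strcspn(bol, "\n");
	return starts_with_mem(bol, eol - bol, comment_str);
}
```

With this sketch, line_is_comment("  # note\n", "#") is true, but
line_is_comment(" # note\n", " #") is false because the leading space
was already trimmed, and a comment string of "#\n" can never match
because the bounded compare stops at the end of the line.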

> Not sure why lines that start with CR are considered comment lines,
> though.

I think it is a lazy way of looking for an empty line ending in CR LF, 
it should really be

	|| (bol[0] == '\r' && bol[1] == '\n') ||
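
As a rough sketch of that stricter check (is_empty_crlf_line() and its
"end" parameter are hypothetical names, and I am assuming the caller
knows where the readable buffer ends so that bol[1] is never read out
of bounds):

```c
#include <stddef.h>

/*
 * Hypothetical helper: true only if the line starting at bol is an
 * empty line terminated by CR LF. "end" points one past the last
 * readable byte of the buffer.
 */
static int is_empty_crlf_line(const char *bol, const char *end)
{
	return end - bol >= 2 && bol[0] == '\r' && bol[1] == '\n';
}
```

Unlike the current *bol == '\r' test, this would not treat a line such
as "\rpick ..." as empty.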

Best Wishes

Phillip

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list
  2024-03-07  9:27             ` [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list Jeff King
@ 2024-03-08 10:20               ` Phillip Wood
  2024-03-12  8:21                 ` Jeff King
  0 siblings, 1 reply; 82+ messages in thread
From: Phillip Wood @ 2024-03-08 10:20 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Hi Peff

On 07/03/2024 09:27, Jeff King wrote:
> We already match multi-byte comment characters in parse_insn_line(),
> thanks to the previous commit, yielding a TODO_COMMENT entry. But in
> todo_list_to_strbuf(), we may call command_to_char() to convert that
> back into something we can output.
> 
> We can't just return comment_line_char anymore, since it may require
> multiple bytes. Instead, we'll return "0" for this case, which is the
> same thing we'd return for a command which does not have a single-letter
> abbreviation (e.g., "revert" or "noop"). In that case the caller then
> falls back to outputting the full name via command_to_string(). So we
> can handle TODO_COMMENT there, returning the full string.

If you do re-roll it might be helpful to emphasize that there is only 
one caller.

> Note that there are many other callers of command_to_string(), which
> will now behave differently if they pass TODO_COMMENT. But we would not
> expect that to happen; prior to this commit, the function just calls
> die() in this case. And looking at those callers, that makes sense;
> e.g., do_pick_commit() will only be called when servicing a pick
> command, and should never be called for a comment in the first place.

I've checked the other callers and agree with your analysis. The fact 
that it used to die() also makes it pretty clear that this should be safe.
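
For anyone following along, the fallback can be sketched in isolation
like this. The enum and table are cut down to three entries, the
comment string is hard-coded to U+00BB, and format_command() is a
hypothetical stand-in for the single caller in todo_list_to_strbuf();
only command_to_string() and command_to_char() mirror the patch.

```c
#include <stdio.h>

enum todo_command { TODO_PICK, TODO_REVERT, TODO_COMMENT };

/* Cut-down table: "revert" has no single-letter abbreviation. */
static const struct { char c; const char *str; } todo_command_info[] = {
	[TODO_PICK] = { 'p', "pick" },
	[TODO_REVERT] = { 0, "revert" },
};

static const char *comment_line_str = "\xc2\xbb"; /* U+00BB in UTF-8 */

static const char *command_to_string(enum todo_command command)
{
	if (command < TODO_COMMENT)
		return todo_command_info[command].str;
	if (command == TODO_COMMENT)
		return comment_line_str;
	return NULL; /* the real function calls die() here */
}

static char command_to_char(enum todo_command command)
{
	if (command < TODO_COMMENT)
		return todo_command_info[command].c;
	return 0; /* no single-byte form for a comment any more */
}

/* Hypothetical caller: use the abbreviation if there is one. */
static int format_command(char *out, size_t outlen,
			  enum todo_command command, int abbreviate)
{
	char c = abbreviate ? command_to_char(command) : 0;
	if (c)
		return snprintf(out, outlen, "%c", c);
	return snprintf(out, outlen, "%s", command_to_string(command));
}
```

With abbreviation requested, TODO_PICK yields "p", while TODO_REVERT
and TODO_COMMENT both take the same fallback path and yield their full
strings.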


Best Wishes

Phillip

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>   sequencer.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 664986e3b2..9e2851428b 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -1779,14 +1779,16 @@ static const char *command_to_string(const enum todo_command command)
>   {
>   	if (command < TODO_COMMENT)
>   		return todo_command_info[command].str;
> +	if (command == TODO_COMMENT)
> +		return comment_line_str;
>   	die(_("unknown command: %d"), command);
>   }
>   
>   static char command_to_char(const enum todo_command command)
>   {
>   	if (command < TODO_COMMENT)
>   		return todo_command_info[command].c;
> -	return comment_line_char;
> +	return 0;
>   }
>   
>   static int is_noop(const enum todo_command command)

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 0/15] allow multi-byte core.commentChar
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (14 preceding siblings ...)
  2024-03-07  9:34             ` [PATCH 15/15] config: allow multi-byte core.commentChar Jeff King
@ 2024-03-08 11:07             ` Phillip Wood
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
  16 siblings, 0 replies; 82+ messages in thread
From: Phillip Wood @ 2024-03-08 11:07 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Hi Peff

On 07/03/2024 09:14, Jeff King wrote:
> On Wed, Mar 06, 2024 at 03:08:04AM -0500, Jeff King wrote:
> 
>> For a more readable series, I'd guess it would make sense to introduce
>> comment_line_str as a separate variable (but continue to enforce the
>> single-char rule), convert the easy cases en masse, the tricky cases one
>> by one, and then finally drop comment_line_char entirely. At which point
>> the config rules can be lifted to allow multi-byte strings.
> 
> I ended up cleaning this up. Like I said, this isn't something I'm
> personally that interested in. But it just seemed like a wart that this
> one spot could not handle multi-byte characters that all the cool kids
> are using in their prompts etc these days.

I agree it would be nice to support multibyte comment characters on 
principle even if I don't think I'd use that feature myself. I've looked 
through the changes to the sequencer and they all look sensible to me. 
As I mentioned when looking at patch 11 I do wonder if we want to reject 
ASCII whitespace and control characters when parsing core.commentChar. 
At a minimum leading whitespace and LF anywhere in the comment string 
feel like they are asking for trouble.

Best Wishes

Phillip

> Plus it was kind of an interesting puzzle for how to lay out the
> refactoring to make each step self-consistent. At the very least, I
> think the first couple of cleanups are worth it even if we do not see
> the whole thing through. ;)
> 
> It obviously nullifies kh/doc-commentchar-is-a-byte, which is in 'next'.
> Sadly "git merge" does not find a conflict with the documentation update
> in patch 15, so we'll have to remember to pick up one topic or the
> other.
> 
> I'm using U+00BB as my commentChar for now to see if any bugs show up,
> but I expect I'll get sick of it after a few days.
> 
>    [01/15]: strbuf: simplify comment-handling in add_lines() helper
>    [02/15]: strbuf: avoid static variables in strbuf_add_commented_lines()
>    [03/15]: commit: refactor base-case of adjust_comment_line_char()
>    [04/15]: strbuf: avoid shadowing global comment_line_char name
> 
>      These four are cleanups that could be taken independently.
> 
>    [05/15]: environment: store comment_line_char as a string
> 
>      This one preps us for incrementally moving code over to the new
>      system.
> 
>    [06/15]: strbuf: accept a comment string for strbuf_stripspace()
>    [07/15]: strbuf: accept a comment string for strbuf_commented_addf()
>    [08/15]: strbuf: accept a comment string for strbuf_add_commented_lines()
>    [09/15]: prefer comment_line_str to comment_line_char for printing
>    [10/15]: find multi-byte comment chars in NUL-terminated strings
>    [11/15]: find multi-byte comment chars in unterminated buffers
>    [12/15]: sequencer: handle multi-byte comment characters when writing todo list
>    [13/15]: wt-status: drop custom comment-char stringification
> 
>      These ones are the actual transition.
> 
>    [14/15]: environment: drop comment_line_char compatibility macro
>    [15/15]: config: allow multi-byte core.commentChar
> 
>      And then we tie it off by dropping the now-unused bits and loosening
>      the config logic.
> 
>   Documentation/config/core.txt |  4 ++-
>   add-patch.c                   | 14 +++++-----
>   builtin/branch.c              |  8 +++---
>   builtin/commit.c              | 19 +++++++-------
>   builtin/merge.c               | 12 ++++-----
>   builtin/notes.c               | 10 ++++----
>   builtin/rebase.c              |  2 +-
>   builtin/stripspace.c          |  4 +--
>   builtin/tag.c                 | 14 +++++-----
>   commit.c                      |  3 ++-
>   config.c                      |  6 ++---
>   environment.c                 |  2 +-
>   environment.h                 |  2 +-
>   fmt-merge-msg.c               |  8 +++---
>   rebase-interactive.c          | 10 ++++----
>   sequencer.c                   | 48 ++++++++++++++++++-----------------
>   strbuf.c                      | 47 ++++++++++++++++++----------------
>   strbuf.h                      |  9 ++++---
>   t/t0030-stripspace.sh         |  5 ++++
>   t/t7507-commit-verbose.sh     | 10 ++++++++
>   t/t7508-status.sh             |  4 ++-
>   trailer.c                     |  6 ++---
>   wt-status.c                   | 31 +++++++++-------------
>   23 files changed, 149 insertions(+), 129 deletions(-)
> 


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-08 10:17                 ` Phillip Wood
@ 2024-03-08 15:58                   ` Junio C Hamano
  2024-03-08 16:20                     ` Phillip Wood
  0 siblings, 1 reply; 82+ messages in thread
From: Junio C Hamano @ 2024-03-08 15:58 UTC (permalink / raw)
  To: Phillip Wood
  Cc: René Scharfe, Jeff King, git, Dragan Simic,
	Kristoffer Haugsbakk, Manlio Perillo

Phillip Wood <phillip.wood123@gmail.com> writes:

> I agree with your analysis. I do wonder though if we should reject
> whitespace and control characters when parsing core.commentChar, it
> feels like accepting them is a bug waiting to happen. If
> comment_line_char starts with ' ' or '\t' that part will be eaten by
> the strspn() above and so starts_with_mem() won't match. Also we will
> never match a comment if comment_line_str contains '\n'.

Another thing I was wondering is what we want to do with a random
byte-sequence that may match from the middle of a multi-byte UTF-8
character.
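
A small sketch of what I mean, using a plain byte-wise prefix test
(starts_with_bytes() is a hypothetical name for illustration, not a
function in our codebase):

```c
#include <string.h>

/* Byte-wise prefix test on NUL-terminated strings. */
static int starts_with_bytes(const char *str, const char *prefix)
{
	return !strncmp(str, prefix, strlen(prefix));
}
```

If the comment string were the lone byte 0xC2, then
starts_with_bytes("\xc2\xbb comment", "\xc2") is true, but so is
starts_with_bytes("\xc2\xa9 2024 Example", "\xc2"): every line that
begins with a character in the range U+0080 to U+00BF shares that
leading byte in UTF-8 and would be taken as a comment.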

The reason I haven't mentioned these "nonsense inputs" is because
they will at worst only lead to self-denial-of-service to those who
are too curious, and will fall into "don't do it then" category.

Also, what exactly is the definition of "nonsense" will become a can
of worms.  I can sympathise if somebody wants to use "#\t" to give
themselves a bit more room than usual on the left for visibility,
for example, so there might be a case to want whitespace characters.

>> Not sure why lines that start with CR are considered comment lines,
>> though.
>
> I think it is a lazy way of looking for an empty line ending in CR LF,
> it should really be
>
> 	|| (bol[0] == '\r' && bol[1] == '\n') ||

My recollection matches your speculation. 

IIRC the lazy person was probably me but I didn't run "git blame".

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-08 15:58                   ` Junio C Hamano
@ 2024-03-08 16:20                     ` Phillip Wood
  2024-03-12  8:19                       ` Jeff King
  0 siblings, 1 reply; 82+ messages in thread
From: Phillip Wood @ 2024-03-08 16:20 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: René Scharfe, Jeff King, git, Dragan Simic,
	Kristoffer Haugsbakk, Manlio Perillo

On 08/03/2024 15:58, Junio C Hamano wrote:
> Phillip Wood <phillip.wood123@gmail.com> writes:
> 
>> I agree with your analysis. I do wonder though if we should reject
>> whitespace and control characters when parsing core.commentChar, it
>> feels like accepting them is a bug waiting to happen. If
>> comment_line_char starts with ' ' or '\t' that part will be eaten by
>> the strspn() above and so starts_with_mem() won't match. Also we will
>> never match a comment if comment_line_str contains '\n'.
> 
> Another thing I was wondering is what we want to do with a random
> byte-sequence that may match from the middle of a multi-byte UTF-8
> character.
> 
> The reason I haven't mentioned these "nonsense inputs" is because
> they will at worst only lead to self-denial-of-service to those who
> are too curious, and will fall into "don't do it then" category.

We could certainly leave it as-is and tell users they are only hurting 
themselves if they complain when it does not work.

> Also, what exactly is the definition of "nonsense" will become a can
> of worms.  I can sympathise if somebody wants to use "#\t" to give
> themselves a bit more room than usual on the left for visibility,
> for example, so there might be a case to want whitespace characters.

That's fair, maybe we could just ban leading whitespace if we do decide 
to restrict core.commentChar

Best Wishes

Phillip

>>> Not sure why lines that start with CR are considered comment lines,
>>> though.
>>
>> I think it is a lazy way of looking for an empty line ending in CR LF,
>> it should really be
>>
>> 	|| (bol[0] == '\r' && bol[1] == '\n') ||
> 
> My recollection matches your speculation.
> 
> IIRC the lazy person was probably me but I didn't run "git blame".

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-07 19:42               ` René Scharfe
  2024-03-08 10:17                 ` Phillip Wood
@ 2024-03-12  8:05                 ` Jeff King
  2024-03-14 19:37                   ` René Scharfe
  1 sibling, 1 reply; 82+ messages in thread
From: Jeff King @ 2024-03-12  8:05 UTC (permalink / raw)
  To: René Scharfe
  Cc: git, Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

On Thu, Mar 07, 2024 at 08:42:22PM +0100, René Scharfe wrote:

> > Arguably starts_with() and this new function should both be inlined,
> > like we do for skip_prefix(), but I think that's out of scope for this
> > series.
> 
> Inlining would allow the compiler to unroll the loop for string
> constants.  I doubt it would do that for variables, as in the code
> below.
> 
> Inlining the strlen()+memcmp() version above might allow the compiler
> to push the strlen() call out of a loop.
> 
> Would any of that improve performance noticeably?  For the call sites
> below I doubt it.  But it would probably increase the object text size.

Good point. With non-constant prefixes in these cases, it probably
wouldn't buy much. There are a lot of other cases with actual string
constants. A compiler in theory could turn starts_with(str, "foo") into
a few instructions. But it's not even clear that it's in very many hot
paths. It would definitely be something we'd have to measure.

> > And it's possible I was simply too dumb to figure out xstrncmpz() here.
> > I'm waiting for René to show up and tell me how to do it. ;)
> 
> Nah, it's not a good fit, as it requires the two strings to have the
> same length.

Thanks for confirming I wasn't missing anything. :)

> > @@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
> >  	/* left-trim */
> >  	bol += strspn(bol, " \t");
> >
> > -	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
> > +	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
> 
> If the strspn() call is safe (which it is, as the caller expects the
> string to be NUL-terminated) then you could use starts_with() here and
> avoid the length calculation.  But that would also match
> comment_line_str values that contain LF, which the _mem version does not
> and that's better.

I try not to read too much into the use of string functions on what
otherwise appears to be an unterminated buffer. While in Git it is quite
often terminated at allocation time (coming from a strbuf, etc) I feel
like I've fixed a number of out-of-bounds reads simply due to sloppy
practices. And even if something is correct today, it is easy for it to
change, since the assumption is made far away from allocation.

So I dunno. Like you said, fewer computations means fewer opportunities to
mess things up. I don't like the idea of introducing a new hand-grenade
that might blow up later, but maybe if it's right next to a strspn()
call that's already a problem, it's not materially making anything
worse.

> > +int starts_with_mem(const char *str, size_t len, const char *prefix)
> > +{
> > +	const char *end = str + len;
> > +	for (; ; str++, prefix++) {
> > +		if (!*prefix)
> > +			return 1;
> > +		else if (str == end || *str != *prefix)
> > +			return 0;
> > +	}
> > +}
> 
> So this checks whether a length-limited string has a prefix given as a
> NUL-terminated string.  I'd have called it mem_starts_with() and have
> expected starts_with_mem() to check a NUL-terminated string for a
> length-limited prefix (think !strncmp(str, prefix, prefixlen)).

I was going for consistency with skip_prefix_mem() and strip_suffix_mem().
To be fair, I probably also named those ones, but I think it's pretty
established. We've never needed the length-limited prefix variant yet,
so I don't know that we're squatting on anything too valuable.
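
To spell out the two readings of the name, here is a sketch of both
variants side by side. The first is the function from this patch; the
second, starts_with_prefix_mem(), is a hypothetical name for the
variant you describe and does not exist in our codebase.

```c
#include <stddef.h>
#include <string.h>

/* Length-limited string, NUL-terminated prefix (what this patch adds). */
static int starts_with_mem(const char *str, size_t len, const char *prefix)
{
	const char *end = str + len;
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;
		else if (str == end || *str != *prefix)
			return 0;
	}
}

/* NUL-terminated string, length-limited prefix (hypothetical). */
static int starts_with_prefix_mem(const char *str, const char *prefix,
				  size_t prefixlen)
{
	return !strncmp(str, prefix, prefixlen);
}
```

So starts_with_mem("#x", 1, "#") is true and starts_with_mem("#x", 0, "#")
is false, whereas starts_with_prefix_mem("pick abc", "picture", 3) is
true because only the first three bytes of the prefix are considered.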

> > @@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
> >
> >  	/* The first paragraph is the title and cannot be trailers */
> >  	for (s = buf; s < buf + len; s = next_line(s)) {
> > -		if (s[0] == comment_line_char)
> > +		if (starts_with_mem(s, buf + len - s, comment_line_str))
> >  			continue;
> >  		if (is_blank_line(s))
> 
> Another case where starts_with() would be safe to use, as
> is_blank_line() expects (and gets) a NUL-terminated string, but it would
> allow matching comment_line_str values that contain LF.

Hmm. Yes, it is a NUL-terminated string always, but the caller has told
us not to look past end_of_log_message(). I suspect that if there is no
newline in comment_line_str it's probably impossible to go past "len"
(just because the end of the log surely ends with either a NUL or a
newline). But it feels iffy to me. I dunno.

-Peff

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-08 16:20                     ` Phillip Wood
@ 2024-03-12  8:19                       ` Jeff King
  2024-03-12 14:36                         ` phillip.wood123
  0 siblings, 1 reply; 82+ messages in thread
From: Jeff King @ 2024-03-12  8:19 UTC (permalink / raw)
  To: phillip.wood
  Cc: Junio C Hamano, René Scharfe, git, Dragan Simic,
	Kristoffer Haugsbakk, Manlio Perillo

On Fri, Mar 08, 2024 at 04:20:12PM +0000, Phillip Wood wrote:

> On 08/03/2024 15:58, Junio C Hamano wrote:
> > Phillip Wood <phillip.wood123@gmail.com> writes:
> > 
> > > I agree with your analysis. I do wonder though if we should reject
> > > whitespace and control characters when parsing core.commentChar, it
> > > feels like accepting them is a bug waiting to happen. If
> > > comment_line_char starts with ' ' or '\t' that part will be eaten by
> > > the strspn() above and so starts_with_mem() won't match. Also we will
> > > never match a comment if comment_line_str contains '\n'.
> > 
> > Another thing I was wondering is what we want to do with a random
> > byte-sequence that may match from the middle of a multi-byte UTF-8
> > character.
> > 
> > The reason I haven't mentioned these "nonsense inputs" is because
> > they will at worst only lead to self-denial-of-service to those who
> > are too curious, and will fall into "don't do it then" category.
> 
> We could certainly leave it as-is and tell users they are only hurting
> themselves if they complain when it does not work.

That was mostly my plan. To some degree I think this is orthogonal to my
series. You can already set core.commentChar to space or newline, and
I'm sure the results are not very good. Actually, I guess it is easy to
try:

  git -c core.commentChar=$'\n' commit --allow-empty

treats everything as not-a-comment.

Maybe it's worth forbidding this at the start of the series, and then
carrying it through. I really do think newline is the most special
character here, just because it's obviously going to be meaningful to
all of our line-oriented parsing. So you'll get weird results, as
opposed to broken multibyte characters, where things would still work if
you choose to consistently use them (and arguably we cannot even define
"broken" as the user can use a different encoding).

Likewise, I guess people might complain that their core.commentChar is
NFD and their editor writes out NFC characters or something, and we
don't match. I was hoping we could just punt on that and nobody would
ever notice (certainly I think it is OK to punt for now and somebody who
truly cares can make a utf8_starts_with() or similar).
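
To illustrate the normalization mismatch, here is a sketch with "e
with acute accent" in both forms. The byte sequences are the standard
UTF-8 encodings; starts_with_bytes() is a hypothetical byte-wise
helper for illustration, and a utf8_starts_with() that normalized
first would be needed to make the two forms compare equal.

```c
#include <string.h>

static const char nfc_e_acute[] = "\xc3\xa9";  /* U+00E9, precomposed */
static const char nfd_e_acute[] = "e\xcc\x81"; /* U+0065 U+0301, decomposed */

/* Byte-wise prefix test on NUL-terminated strings. */
static int starts_with_bytes(const char *str, const char *prefix)
{
	return !strncmp(str, prefix, strlen(prefix));
}
```

A line written by the editor as "\xc3\xa9 note" matches nfc_e_acute but
not nfd_e_acute, even though both render as the same character.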

> > Also, what exactly is the definition of "nonsense" will become a can
> > of worms.  I can sympathise if somebody wants to use "#\t" to give
> > themselves a bit more room than usual on the left for visibility,
> > for example, so there might be a case to want whitespace characters.
> 
> That's fair, maybe we could just ban leading whitespace if we do decide to
> restrict core.commentChar

Leading whitespace actually does work, though I think you'd be slightly
insane to use it.

I'm currently using "! COMMENT !" (after using a unicode char for a few
days). It's horribly ugly, but I wanted to see if any bugs cropped up
(and vim's built-in git syntax highlighting colors it correctly ;) ).

-Peff

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list
  2024-03-08 10:20               ` Phillip Wood
@ 2024-03-12  8:21                 ` Jeff King
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  8:21 UTC (permalink / raw)
  To: phillip.wood
  Cc: git, Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

On Fri, Mar 08, 2024 at 10:20:45AM +0000, Phillip Wood wrote:

> Hi Peff
> 
> On 07/03/2024 09:27, Jeff King wrote:
> > We already match multi-byte comment characters in parse_insn_line(),
> > thanks to the previous commit, yielding a TODO_COMMENT entry. But in
> > todo_list_to_strbuf(), we may call command_to_char() to convert that
> > back into something we can output.
> > 
> > We can't just return comment_line_char anymore, since it may require
> > multiple bytes. Instead, we'll return "0" for this case, which is the
> > same thing we'd return for a command which does not have a single-letter
> > abbreviation (e.g., "revert" or "noop"). In that case the caller then
> > falls back to outputting the full name via command_to_string(). So we
> > can handle TODO_COMMENT there, returning the full string.
> 
> If you do re-roll it might be helpful to emphasize that there is only one
> caller.

Thanks, will do.

-Peff

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH v2 0/16] allow multi-byte core.commentChar
  2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
                               ` (15 preceding siblings ...)
  2024-03-08 11:07             ` [PATCH 0/15] " Phillip Wood
@ 2024-03-12  9:10             ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 01/16] config: forbid newline as core.commentChar Jeff King
                                 ` (16 more replies)
  16 siblings, 17 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:10 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

Here's a revised version of my series. It incorporates the fixups I sent
(which I think Junio had applied already), and incorporates a new patch
at the beginning to forbid newlines.
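
As a standalone sketch of the rule as it stands at the end of the
series (check_comment_str() is a hypothetical helper for illustration;
the real check lives inline in git_default_core_config() and also
handles "auto", which is omitted here):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns NULL if value is acceptable as a
 * comment string, or the error message otherwise.
 */
static const char *check_comment_str(const char *value)
{
	if (!value[0])
		return "core.commentChar must have at least one character";
	if (strchr(value, '\n'))
		return "core.commentChar cannot contain newline";
	return NULL;
}
```

Multi-byte values such as "! COMMENT !" pass, while "" and anything
containing LF (for example "#\n") are rejected.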

I _didn't_ convert any of the starts_with_mem() calls to starts_with().
I'm on the fence on whether that is simplifying things or creating
potential confusion/bugs later.

If we don't like the new patch 1 (or if we prefer to do it on top; there
is really not much reason to prefer one or the other), then this should
otherwise be the same as what Junio has already queued as
jk/core-comment-char.

Range diff (from v1, without my fixups) is below.

 -:  ---------- >  1:  86efec435d config: forbid newline as core.commentChar
 1:  be18aa04e3 =  2:  7c016e5dc3 strbuf: simplify comment-handling in add_lines() helper
 2:  0f8ea2a86d =  3:  2b4170b5f0 strbuf: avoid static variables in strbuf_add_commented_lines()
 3:  9b56d9f4f0 =  4:  24ca214986 commit: refactor base-case of adjust_comment_line_char()
 4:  0a191e5588 =  5:  9f6433dbe6 strbuf: avoid shadowing global comment_line_char name
 5:  f41e196138 !  6:  d0f32f10f9 environment: store comment_line_char as a string
    @@ builtin/commit.c: static void adjust_comment_line_char(const struct strbuf *sb)
     
      ## config.c ##
     @@ config.c: static int git_default_core_config(const char *var, const char *value,
    - 		else if (!strcasecmp(value, "auto"))
    - 			auto_comment_line_char = 1;
      		else if (value[0] && !value[1]) {
    + 			if (value[0] == '\n')
    + 				return error(_("core.commentChar cannot be newline"));
     -			comment_line_char = value[0];
     +			comment_line_str = xstrfmt("%c", value[0]);
      			auto_comment_line_char = 0;
 6:  84261af2ed !  7:  2c91628564 strbuf: accept a comment string for strbuf_stripspace()
    @@ Commit message
     
         Signed-off-by: Jeff King <peff@peff.net>
     
    + ## builtin/am.c ##
    +@@ builtin/am.c: static int parse_mail(struct am_state *state, const char *mail)
    + 
    + 	strbuf_addstr(&msg, "\n\n");
    + 	strbuf_addbuf(&msg, &mi.log_message);
    +-	strbuf_stripspace(&msg, '\0');
    ++	strbuf_stripspace(&msg, NULL);
    + 
    + 	assert(!state->author_name);
    + 	state->author_name = strbuf_detach(&author_name, NULL);
    +
      ## builtin/branch.c ##
     @@ builtin/branch.c: static int edit_branch_description(const char *branch_name)
      		strbuf_release(&buf);
    @@ builtin/branch.c: static int edit_branch_description(const char *branch_name)
      	strbuf_addf(&name, "branch.%s.description", branch_name);
      	if (buf.len || exists)
     
    + ## builtin/commit.c ##
    +@@ builtin/commit.c: static int prepare_to_commit(const char *index_file, const char *prefix,
    + 	s->hints = 0;
    + 
    + 	if (clean_message_contents)
    +-		strbuf_stripspace(&sb, '\0');
    ++		strbuf_stripspace(&sb, NULL);
    + 
    + 	if (signoff)
    + 		append_signoff(&sb, ignored_log_message_bytes(sb.buf, sb.len), 0);
    +
      ## builtin/notes.c ##
     @@ builtin/notes.c: static void prepare_note_data(const struct object_id *object, struct note_data *
      			die(_("please supply the note contents using either -m or -F option"));
    @@ builtin/notes.c: static void prepare_note_data(const struct object_id *object, s
      	}
      }
      
    +@@ builtin/notes.c: static void concat_messages(struct note_data *d)
    + 		if ((d->stripspace == UNSPECIFIED &&
    + 		     d->messages[i]->stripspace == STRIPSPACE) ||
    + 		    d->stripspace == STRIPSPACE)
    +-			strbuf_stripspace(&d->buf, 0);
    ++			strbuf_stripspace(&d->buf, NULL);
    + 		strbuf_reset(&msg);
    + 	}
    + 	strbuf_release(&msg);
     
      ## builtin/rebase.c ##
     @@ builtin/rebase.c: static int edit_todo_file(unsigned flags)
    @@ builtin/tag.c: static void create_tag(const struct object_id *object, const char
      	if (!opt->message_given && !buf->len)
      		die(_("no tag message?"));
     
    + ## builtin/worktree.c ##
    +@@ builtin/worktree.c: static int can_use_local_refs(const struct add_opts *opts)
    + 			strbuf_add_real_path(&path, get_worktree_git_dir(NULL));
    + 			strbuf_addstr(&path, "/HEAD");
    + 			strbuf_read_file(&contents, path.buf, 64);
    +-			strbuf_stripspace(&contents, 0);
    ++			strbuf_stripspace(&contents, NULL);
    + 			strbuf_strip_suffix(&contents, "\n");
    + 
    + 			warning(_("HEAD points to an invalid (or orphaned) reference.\n"
    +
    + ## gpg-interface.c ##
    +@@ gpg-interface.c: static int verify_ssh_signed_buffer(struct signature_check *sigc,
    + 		}
    + 	}
    + 
    +-	strbuf_stripspace(&ssh_keygen_out, '\0');
    +-	strbuf_stripspace(&ssh_keygen_err, '\0');
    ++	strbuf_stripspace(&ssh_keygen_out, NULL);
    ++	strbuf_stripspace(&ssh_keygen_err, NULL);
    + 	/* Add stderr outputs to show the user actual ssh-keygen errors */
    + 	strbuf_add(&ssh_keygen_out, ssh_principals_err.buf, ssh_principals_err.len);
    + 	strbuf_add(&ssh_keygen_out, ssh_keygen_err.buf, ssh_keygen_err.len);
    +
      ## rebase-interactive.c ##
     @@ rebase-interactive.c: int edit_todo_list(struct repository *r, struct todo_list *todo_list,
      	if (launch_sequence_editor(todo_file, &new_todo->buf, NULL))
 7:  bb22f9c9c5 =  8:  a271207e48 strbuf: accept a comment string for strbuf_commented_addf()
 8:  8d20688e87 =  9:  c1831453d8 strbuf: accept a comment string for strbuf_add_commented_lines()
 9:  4b22efb941 = 10:  523eb9e534 prefer comment_line_str to comment_line_char for printing
10:  cd03310902 = 11:  85428eadaa find multi-byte comment chars in NUL-terminated strings
11:  13a346480e ! 12:  b9e2e2302d find multi-byte comment chars in unterminated buffers
    @@ trailer.c: static size_t find_trailer_block_start(const char *buf, size_t len)
      		ssize_t separator_pos;
      
     -		if (bol[0] == comment_line_char) {
    -+		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
    ++		if (starts_with_mem(bol, buf + len - bol, comment_line_str)) {
      			non_trailer_lines += possible_continuation_lines;
      			possible_continuation_lines = 0;
      			continue;
12:  fb3c6659fc ! 13:  7661ca6306 sequencer: handle multi-byte comment characters when writing todo list
    @@ Commit message
         We can't just return comment_line_char anymore, since it may require
         multiple bytes. Instead, we'll return "0" for this case, which is the
         same thing we'd return for a command which does not have a single-letter
    -    abbreviation (e.g., "revert" or "noop"). In that case the caller then
    -    falls back to outputting the full name via command_to_string(). So we
    -    can handle TODO_COMMENT there, returning the full string.
    +    abbreviation (e.g., "revert" or "noop"). There is only a single caller
    +    of command_to_char(), and upon seeing "0" it falls back to outputting
    +    the full name via command_to_string(). So we can handle TODO_COMMENT
    +    there, returning the full string.
     
         Note that there are many other callers of command_to_string(), which
         will now behave differently if they pass TODO_COMMENT. But we would not
13:  94524b8817 = 14:  8ddab67432 wt-status: drop custom comment-char stringification
14:  d754e86f7b = 15:  16d65f9179 environment: drop comment_line_char compatibility macro
15:  a6ffe08469 ! 16:  461cc720a0 config: allow multi-byte core.commentChar
    @@ Commit message
         how each site behaves. In the interim let's forbid it and we can loosen
         things later.
     
    +    Likewise, the "commentChar cannot be a newline" rule is now extended to
    +    "it cannot contain a newline" (for the same reason: it can confuse our
    +    parsing loops).
    +
         Since comment_line_str is used in many parts of the code, it's hard to
         cover all possibilities with tests. We can convert the existing
         double-semicolon prefix test to show that "git status" works. And we'll
    @@ config.c: static int git_default_core_config(const char *var, const char *value,
      		else if (!strcasecmp(value, "auto"))
      			auto_comment_line_char = 1;
     -		else if (value[0] && !value[1]) {
    +-			if (value[0] == '\n')
    +-				return error(_("core.commentChar cannot be newline"));
     -			comment_line_str = xstrfmt("%c", value[0]);
     +		else if (value[0]) {
    ++			if (strchr(value, '\n'))
    ++				return error(_("core.commentChar cannot contain newline"));
     +			comment_line_str = xstrdup(value);
      			auto_comment_line_char = 0;
      		} else
    @@ config.c: static int git_default_core_config(const char *var, const char *value,
     
      ## t/t0030-stripspace.sh ##
     @@ t/t0030-stripspace.sh: test_expect_success 'strip comments with changed comment char' '
    - 	test -z "$(echo "; comment" | git -c core.commentchar=";" stripspace -s)"
    - '
      
    + test_expect_success 'newline as commentchar is forbidden' '
    + 	test_must_fail git -c core.commentChar="$LF" stripspace -s 2>err &&
    +-	grep "core.commentChar cannot be newline" err
    ++	grep "core.commentChar cannot contain newline" err
    ++'
    ++
     +test_expect_success 'empty commentchar is forbidden' '
     +	test_must_fail git -c core.commentchar= stripspace -s 2>err &&
     +	grep "core.commentChar must have at least one character" err
    -+'
    -+
    + '
    + 
      test_expect_success '-c with single line' '
    - 	printf "# foo\n" >expect &&
    - 	printf "foo" | git stripspace -c >actual &&
     
      ## t/t7507-commit-verbose.sh ##
     @@ t/t7507-commit-verbose.sh: test_expect_success 'verbose diff is stripped out with set core.commentChar' '

  [01/16]: config: forbid newline as core.commentChar
  [02/16]: strbuf: simplify comment-handling in add_lines() helper
  [03/16]: strbuf: avoid static variables in strbuf_add_commented_lines()
  [04/16]: commit: refactor base-case of adjust_comment_line_char()
  [05/16]: strbuf: avoid shadowing global comment_line_char name
  [06/16]: environment: store comment_line_char as a string
  [07/16]: strbuf: accept a comment string for strbuf_stripspace()
  [08/16]: strbuf: accept a comment string for strbuf_commented_addf()
  [09/16]: strbuf: accept a comment string for strbuf_add_commented_lines()
  [10/16]: prefer comment_line_str to comment_line_char for printing
  [11/16]: find multi-byte comment chars in NUL-terminated strings
  [12/16]: find multi-byte comment chars in unterminated buffers
  [13/16]: sequencer: handle multi-byte comment characters when writing todo list
  [14/16]: wt-status: drop custom comment-char stringification
  [15/16]: environment: drop comment_line_char compatibility macro
  [16/16]: config: allow multi-byte core.commentChar

 Documentation/config/core.txt |  4 ++-
 add-patch.c                   | 14 +++++-----
 builtin/am.c                  |  2 +-
 builtin/branch.c              |  8 +++---
 builtin/commit.c              | 21 +++++++--------
 builtin/merge.c               | 12 ++++-----
 builtin/notes.c               | 12 ++++-----
 builtin/rebase.c              |  2 +-
 builtin/stripspace.c          |  4 +--
 builtin/tag.c                 | 14 +++++-----
 builtin/worktree.c            |  2 +-
 commit.c                      |  3 ++-
 config.c                      |  8 +++---
 environment.c                 |  2 +-
 environment.h                 |  2 +-
 fmt-merge-msg.c               |  8 +++---
 gpg-interface.c               |  4 +--
 rebase-interactive.c          | 10 ++++----
 sequencer.c                   | 48 ++++++++++++++++++-----------------
 strbuf.c                      | 47 ++++++++++++++++++----------------
 strbuf.h                      |  9 ++++---
 t/t0030-stripspace.sh         | 10 ++++++++
 t/t7507-commit-verbose.sh     | 10 ++++++++
 t/t7508-status.sh             |  4 ++-
 trailer.c                     |  6 ++---
 wt-status.c                   | 31 +++++++++-------------
 26 files changed, 162 insertions(+), 135 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 82+ messages in thread

* [PATCH v2 01/16] config: forbid newline as core.commentChar
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 02/16] strbuf: simplify comment-handling in add_lines() helper Jeff King
                                 ` (15 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

Since we usually look for a comment char while parsing line-oriented
files, setting core.commentChar to a single newline can confuse our code
quite a bit. For example, using it with "git commit" causes us to fail
to recognize any of the template as comments, including it in the commit
message. Which kind of makes sense, since the template content is on its
own line (so no line can "start" with a newline). In other spots I would
not be surprised if you can create more mischief (e.g., violating loop
assumptions) but I didn't dig into it.

Since comment characters are a local preference, to some degree this is
a case of "if it hurts, don't do it". But given that this would be a
silly and pointless thing to do, and that it makes it harder to reason
about code parsing comment lines, let's just forbid it.

There are other cases that are perhaps questionable (e.g., setting the
comment char to a single space), but they seem to behave reasonably (at
least a simple "git commit" will correctly identify and strip the
template lines). So I haven't worried about going on a hunt for every
stupid thing a user might do to themselves, and just focused on the most
confusing case.

Signed-off-by: Jeff King <peff@peff.net>
---
In the string version I suppose you could set it to "\nexec rm -rf /" if
you really wanted to treat yourself to a fun "git rebase". Again, this
is all local, but it's perhaps nice to know that core.commentChar is not
a vector for arbitrary code execution.

(That of course made me wonder if setting it to just "exec rm -rf / "
would work, as the rest of the template line would be ignored by "rm";
but that is self-defeating as we'd recognize the line as a comment and
remove it).

 config.c              | 2 ++
 t/t0030-stripspace.sh | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/config.c b/config.c
index 3cfeb3d8bd..f561631374 100644
--- a/config.c
+++ b/config.c
@@ -1566,6 +1566,8 @@ static int git_default_core_config(const char *var, const char *value,
 		else if (!strcasecmp(value, "auto"))
 			auto_comment_line_char = 1;
 		else if (value[0] && !value[1]) {
+			if (value[0] == '\n')
+				return error(_("core.commentChar cannot be newline"));
 			comment_line_char = value[0];
 			auto_comment_line_char = 0;
 		} else
diff --git a/t/t0030-stripspace.sh b/t/t0030-stripspace.sh
index d1b3be8725..e399dd9189 100755
--- a/t/t0030-stripspace.sh
+++ b/t/t0030-stripspace.sh
@@ -401,6 +401,11 @@ test_expect_success 'strip comments with changed comment char' '
 	test -z "$(echo "; comment" | git -c core.commentchar=";" stripspace -s)"
 '
 
+test_expect_success 'newline as commentchar is forbidden' '
+	test_must_fail git -c core.commentChar="$LF" stripspace -s 2>err &&
+	grep "core.commentChar cannot be newline" err
+'
+
 test_expect_success '-c with single line' '
 	printf "# foo\n" >expect &&
 	printf "foo" | git stripspace -c >actual &&
-- 
2.44.0.481.gf1a6d20963
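
For readers following along outside the git tree, the rule this patch adds
can be sketched in isolation. This is only an illustration:
parse_comment_char() is a hypothetical name, the real check lives inline in
git_default_core_config(), and the "auto" value (handled earlier in the real
code) is ignored here.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the single-byte core.commentChar check.
 * On success, stores the byte in *out and returns 0. Returns -1 if the
 * value is missing, empty, longer than one byte, or a lone newline.
 */
static int parse_comment_char(const char *value, char *out)
{
	if (!value || !value[0] || value[1])
		return -1; /* must be exactly one byte at this point in the series */
	if (value[0] == '\n')
		return -1; /* the new rule: newline confuses line-oriented parsers */
	*out = value[0];
	return 0;
}
```

With this sketch, ";" is accepted while "\n", "" and ";;" are all rejected.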


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v2 02/16] strbuf: simplify comment-handling in add_lines() helper
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
  2024-03-12  9:17               ` [PATCH v2 01/16] config: forbid newline as core.commentChar Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 03/16] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
                                 ` (14 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

In strbuf_add_commented_lines(), we prepare two strings with potential
prefixes: one with just the comment char, and one with an additional
space. In the add_lines() helper, we use the one without the extra space
for blank lines or lines starting with a tab.

While passing in two separate prefixes to the helper is very flexible,
it's more flexibility than we actually use (or are likely to use, since
the rules inside add_lines() only make sense if "prefix2" is a variant
of "prefix1" without the extra space). And setting up the two strings
makes refactoring in strbuf_add_commented_lines() awkward.

Instead, let's pass in a single string, and just let add_lines() add the
extra space to the result as appropriate.

We do still need to pass in a flag to trigger this behavior. The helper
is shared by strbuf_add_lines(), which passes in a NULL "prefix2" to
inhibit this extra handling.

Signed-off-by: Jeff King <peff@peff.net>
---
 strbuf.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 7827178d8e..689d8acd5e 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -340,18 +340,17 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...)
 }
 
 static void add_lines(struct strbuf *out,
-			const char *prefix1,
-			const char *prefix2,
-			const char *buf, size_t size)
+			const char *prefix,
+			const char *buf, size_t size,
+			int space_after_prefix)
 {
 	while (size) {
-		const char *prefix;
 		const char *next = memchr(buf, '\n', size);
 		next = next ? (next + 1) : (buf + size);
 
-		prefix = ((prefix2 && (buf[0] == '\n' || buf[0] == '\t'))
-			  ? prefix2 : prefix1);
 		strbuf_addstr(out, prefix);
+		if (space_after_prefix && buf[0] != '\n' && buf[0] != '\t')
+			strbuf_addch(out, ' ');
 		strbuf_add(out, buf, next - buf);
 		size -= next - buf;
 		buf = next;
@@ -362,14 +361,11 @@ static void add_lines(struct strbuf *out,
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
 				size_t size, char comment_line_char)
 {
-	static char prefix1[3];
-	static char prefix2[2];
+	static char prefix[2];
 
-	if (prefix1[0] != comment_line_char) {
-		xsnprintf(prefix1, sizeof(prefix1), "%c ", comment_line_char);
-		xsnprintf(prefix2, sizeof(prefix2), "%c", comment_line_char);
-	}
-	add_lines(out, prefix1, prefix2, buf, size);
+	if (prefix[0] != comment_line_char)
+		xsnprintf(prefix, sizeof(prefix), "%c", comment_line_char);
+	add_lines(out, prefix, buf, size, 1);
 }
 
 void strbuf_commented_addf(struct strbuf *sb, char comment_line_char,
@@ -750,7 +746,7 @@ ssize_t strbuf_read_file(struct strbuf *sb, const char *path, size_t hint)
 void strbuf_add_lines(struct strbuf *out, const char *prefix,
 		      const char *buf, size_t size)
 {
-	add_lines(out, prefix, NULL, buf, size);
+	add_lines(out, prefix, buf, size, 0);
 }
 
 void strbuf_addstr_xml_quoted(struct strbuf *buf, const char *s)
-- 
2.44.0.481.gf1a6d20963
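
The single-prefix behavior can be tried outside of git with a rough
standalone sketch. It assumes a plain char buffer instead of a strbuf, and
prefix_lines() is a made-up name, so treat it as an approximation of
add_lines() rather than the real helper.

```c
#include <string.h>

/*
 * Copy each line of buf (size bytes) into out, preceded by prefix.
 * When space_after_prefix is set, a space follows the prefix unless the
 * line is blank or starts with a tab. Returns the number of bytes
 * written (not counting the NUL), or -1 if out is too small.
 */
static long prefix_lines(char *out, size_t outsz, const char *prefix,
			 const char *buf, size_t size, int space_after_prefix)
{
	size_t plen = strlen(prefix);
	size_t pos = 0;

	while (size) {
		const char *next = memchr(buf, '\n', size);
		size_t linelen;
		int space;

		next = next ? next + 1 : buf + size;
		linelen = (size_t)(next - buf);
		space = space_after_prefix && buf[0] != '\n' && buf[0] != '\t';

		/* room for prefix, optional space, the line, and a final NUL */
		if (pos + plen + (size_t)space + linelen + 1 > outsz)
			return -1;
		memcpy(out + pos, prefix, plen);
		pos += plen;
		if (space)
			out[pos++] = ' ';
		memcpy(out + pos, buf, linelen);
		pos += linelen;

		size -= linelen;
		buf = next;
	}
	if (pos >= outsz)
		return -1;
	out[pos] = '\0';
	return (long)pos;
}
```

Passing 1 as the last argument mimics strbuf_add_commented_lines(), and
passing 0 mimics strbuf_add_lines(), which never adds the space.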


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v2 03/16] strbuf: avoid static variables in strbuf_add_commented_lines()
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
  2024-03-12  9:17               ` [PATCH v2 01/16] config: forbid newline as core.commentChar Jeff King
  2024-03-12  9:17               ` [PATCH v2 02/16] strbuf: simplify comment-handling in add_lines() helper Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 04/16] commit: refactor base-case of adjust_comment_line_char() Jeff King
                                 ` (13 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

In strbuf_add_commented_lines(), we have to convert the single-byte
comment_line_char into a string to pass to add_lines(). We cache the
created string using a static-local variable. But this makes the
function non-reentrant, and it's doubtful that this provides any real
performance benefit given that we know the string always contains a
single character.

So let's just create it from scratch each time, and to give the compiler
the maximal opportunity to make it fast we'll ditch the over-complicated
xsnprintf() and just assign directly into the array.

Signed-off-by: Jeff King <peff@peff.net>
---
 strbuf.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index 689d8acd5e..ca80a2c77e 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -361,10 +361,10 @@ static void add_lines(struct strbuf *out,
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
 				size_t size, char comment_line_char)
 {
-	static char prefix[2];
+	char prefix[2];
 
-	if (prefix[0] != comment_line_char)
-		xsnprintf(prefix, sizeof(prefix), "%c", comment_line_char);
+	prefix[0] = comment_line_char;
+	prefix[1] = '\0';
 	add_lines(out, prefix, buf, size, 1);
 }
 
-- 
2.44.0.481.gf1a6d20963


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v2 04/16] commit: refactor base-case of adjust_comment_line_char()
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (2 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 03/16] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 05/16] strbuf: avoid shadowing global comment_line_char name Jeff King
                                 ` (12 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

When core.commentChar is set to "auto", we check a set of candidate
characters against the proposed buffer to see which, if any, can be used
without ambiguity. But before we do that, we optimize for the common
case that the default "#" is fine by just seeing if it is present in the
buffer at all.

The way we do this is a bit subtle, though: we assign the candidate
character to comment_line_char preemptively, then check if it works, and
return if it does. The subtle part is that sometimes setting
comment_line_char is important (after we return, the important outcome
is the fact that we have set the variable) and sometimes it is useless
(if our optimization fails, we go on to do the more careful checks and
eventually assign something else instead).

To make it more clear what is happening (and to make further refactoring
of comment_line_char easier), let's check our candidate character
directly, and then assign as part of returning if it worked out.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index a91197245f..b2d05c0cc9 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -685,9 +685,10 @@ static void adjust_comment_line_char(const struct strbuf *sb)
 	char *candidate;
 	const char *p;
 
-	comment_line_char = candidates[0];
-	if (!memchr(sb->buf, comment_line_char, sb->len))
+	if (!memchr(sb->buf, candidates[0], sb->len)) {
+		comment_line_char = candidates[0];
 		return;
+	}
 
 	p = sb->buf;
 	candidate = strchr(candidates, *p);
-- 
2.44.0.481.gf1a6d20963
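
To make the "check first, assign on success" shape easier to see, here is a
simplified standalone sketch of the selection logic. pick_comment_char() is
a hypothetical name, the candidate list is my recollection of the one in
builtin/commit.c and should be treated as an assumption, and the function
returns the choice instead of assigning a global.

```c
#include <string.h>

/*
 * Return a comment character that does not start any line of buf,
 * or '\0' if every candidate is taken.
 */
static char pick_comment_char(const char *buf, size_t len)
{
	char candidates[] = "#;@!$%^&|:"; /* assumed list, default first */
	const char *p;
	size_t i;

	/* Fast path: test the default directly, commit to it only if it works. */
	if (!memchr(buf, candidates[0], len))
		return candidates[0];

	/* Slow path: blank out every candidate that begins some line. */
	for (i = 0; i < len; i++) {
		if (i == 0 || buf[i - 1] == '\n' || buf[i - 1] == '\r') {
			char *hit = buf[i] ? strchr(candidates, buf[i]) : NULL;
			if (hit)
				*hit = ' ';
		}
	}

	/* First surviving candidate wins; NUL means none survived. */
	for (p = candidates; *p == ' '; p++)
		;
	return *p;
}
```

Note that a "#" in the middle of a line only defeats the fast path. The
slow path still picks "#" as long as no line starts with it.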


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v2 05/16] strbuf: avoid shadowing global comment_line_char name
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (3 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 04/16] commit: refactor base-case of adjust_comment_line_char() Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 06/16] environment: store comment_line_char as a string Jeff King
                                 ` (11 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

Several comment-related strbuf functions take a comment_line_char
parameter. There's also a global comment_line_char variable, which is
closely related (most callers pass it in as this parameter). Let's avoid
shadowing the global name. This makes it more obvious that we're not
using the global value, and it will be especially helpful as we refactor
the global in future patches (in particular, any macro trickery wouldn't
work because the preprocessor doesn't respect scope).

We'll use "comment_prefix". That should be descriptive enough, and as a
bonus is more neutral with respect to the "char" type (since we'll
eventually swap it out for a string).

Signed-off-by: Jeff King <peff@peff.net>
---
 strbuf.c | 16 ++++++++--------
 strbuf.h |  8 ++++----
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/strbuf.c b/strbuf.c
index ca80a2c77e..a33aed6c07 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -359,16 +359,16 @@ static void add_lines(struct strbuf *out,
 }
 
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
-				size_t size, char comment_line_char)
+				size_t size, char comment_prefix)
 {
 	char prefix[2];
 
-	prefix[0] = comment_line_char;
+	prefix[0] = comment_prefix;
 	prefix[1] = '\0';
 	add_lines(out, prefix, buf, size, 1);
 }
 
-void strbuf_commented_addf(struct strbuf *sb, char comment_line_char,
+void strbuf_commented_addf(struct strbuf *sb, char comment_prefix,
 			   const char *fmt, ...)
 {
 	va_list params;
@@ -379,7 +379,7 @@ void strbuf_commented_addf(struct strbuf *sb, char comment_line_char,
 	strbuf_vaddf(&buf, fmt, params);
 	va_end(params);
 
-	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_line_char);
+	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix);
 	if (incomplete_line)
 		sb->buf[--sb->len] = '\0';
 
@@ -1001,10 +1001,10 @@ static size_t cleanup(char *line, size_t len)
  *
  * If last line does not have a newline at the end, one is added.
  *
- * Pass a non-NUL comment_line_char to skip every line starting
+ * Pass a non-NUL comment_prefix to skip every line starting
  * with it.
  */
-void strbuf_stripspace(struct strbuf *sb, char comment_line_char)
+void strbuf_stripspace(struct strbuf *sb, char comment_prefix)
 {
 	size_t empties = 0;
 	size_t i, j, len, newlen;
@@ -1017,8 +1017,8 @@ void strbuf_stripspace(struct strbuf *sb, char comment_line_char)
 		eol = memchr(sb->buf + i, '\n', sb->len - i);
 		len = eol ? eol - (sb->buf + i) + 1 : sb->len - i;
 
-		if (comment_line_char && len &&
-		    sb->buf[i] == comment_line_char) {
+		if (comment_prefix && len &&
+		    sb->buf[i] == comment_prefix) {
 			newlen = 0;
 			continue;
 		}
diff --git a/strbuf.h b/strbuf.h
index e959caca87..860fcec5fb 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -288,7 +288,7 @@ void strbuf_splice(struct strbuf *sb, size_t pos, size_t len,
  */
 void strbuf_add_commented_lines(struct strbuf *out,
 				const char *buf, size_t size,
-				char comment_line_char);
+				char comment_prefix);
 
 
 /**
@@ -379,7 +379,7 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...);
  * blank to the buffer.
  */
 __attribute__((format (printf, 3, 4)))
-void strbuf_commented_addf(struct strbuf *sb, char comment_line_char, const char *fmt, ...);
+void strbuf_commented_addf(struct strbuf *sb, char comment_prefix, const char *fmt, ...);
 
 __attribute__((format (printf,2,0)))
 void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap);
@@ -513,11 +513,11 @@ int strbuf_getcwd(struct strbuf *sb);
 int strbuf_normalize_path(struct strbuf *sb);
 
 /**
- * Strip whitespace from a buffer. If comment_line_char is non-NUL,
+ * Strip whitespace from a buffer. If comment_prefix is non-NUL,
  * then lines beginning with that character are considered comments,
  * thus removed.
  */
-void strbuf_stripspace(struct strbuf *buf, char comment_line_char);
+void strbuf_stripspace(struct strbuf *buf, char comment_prefix);
 
 static inline int strbuf_strip_suffix(struct strbuf *sb, const char *suffix)
 {
-- 
2.44.0.481.gf1a6d20963


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v2 06/16] environment: store comment_line_char as a string
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (4 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 05/16] strbuf: avoid shadowing global comment_line_char name Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 07/16] strbuf: accept a comment string for strbuf_stripspace() Jeff King
                                 ` (10 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

We'd like to eventually support multi-byte comment prefixes, but the
comment_line_char variable is referenced in many spots, making the
transition difficult.

Let's start by storing the character in a NUL-terminated string. That
will let us switch code over incrementally to the string format, and we
can easily support the existing code with a macro wrapper (since we'll
continue to allow only a single-byte prefix, this will behave
identically).

Once all references to the "char" variable have been converted, we can
drop it and enable longer strings.

We'll still have to touch all of the spots that create or set the
variable in this patch, but there are only a few (reading the config,
and the "auto" character selector).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/commit.c | 4 ++--
 config.c         | 2 +-
 environment.c    | 2 +-
 environment.h    | 3 ++-
 4 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index b2d05c0cc9..82229c3100 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -686,7 +686,7 @@ static void adjust_comment_line_char(const struct strbuf *sb)
 	const char *p;
 
 	if (!memchr(sb->buf, candidates[0], sb->len)) {
-		comment_line_char = candidates[0];
+		comment_line_str = xstrfmt("%c", candidates[0]);
 		return;
 	}
 
@@ -707,7 +707,7 @@ static void adjust_comment_line_char(const struct strbuf *sb)
 	if (!*p)
 		die(_("unable to select a comment character that is not used\n"
 		      "in the current commit message"));
-	comment_line_char = *p;
+	comment_line_str = xstrfmt("%c", *p);
 }
 
 static void prepare_amend_commit(struct commit *commit, struct strbuf *sb,
diff --git a/config.c b/config.c
index f561631374..7e5dbca4bd 100644
--- a/config.c
+++ b/config.c
@@ -1568,7 +1568,7 @@ static int git_default_core_config(const char *var, const char *value,
 		else if (value[0] && !value[1]) {
 			if (value[0] == '\n')
 				return error(_("core.commentChar cannot be newline"));
-			comment_line_char = value[0];
+			comment_line_str = xstrfmt("%c", value[0]);
 			auto_comment_line_char = 0;
 		} else
 			return error(_("core.commentChar should only be one ASCII character"));
diff --git a/environment.c b/environment.c
index 60706ea398..a73ba9c12c 100644
--- a/environment.c
+++ b/environment.c
@@ -110,7 +110,7 @@ int protect_ntfs = PROTECT_NTFS_DEFAULT;
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
-char comment_line_char = '#';
+const char *comment_line_str = "#";
 int auto_comment_line_char;
 
 /* Parallel index stat data preload? */
diff --git a/environment.h b/environment.h
index 5cec19cecc..1c7d0c2f74 100644
--- a/environment.h
+++ b/environment.h
@@ -8,7 +8,8 @@ struct strvec;
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
-extern char comment_line_char;
+#define comment_line_char (comment_line_str[0])
+extern const char *comment_line_str;
 extern int auto_comment_line_char;
 
 /*
-- 
2.44.0.481.gf1a6d20963
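
The transition trick is perhaps easiest to see in a toy program. In the
sketch below, set_comment_char() and is_comment_line() are hypothetical
helpers, and a static two-byte buffer stands in for the xstrfmt()
allocation the patch uses. Only the macro and the variable names come from
the patch.

```c
/* Storage is now a NUL-terminated string ... */
static char comment_buf[2] = "#";
static const char *comment_line_str = comment_buf;

/* ... while not-yet-converted code keeps using the old name. */
#define comment_line_char (comment_line_str[0])

/* Hypothetical setter, standing in for the config and "auto" code paths. */
static void set_comment_char(char c)
{
	comment_buf[0] = c;
	comment_buf[1] = '\0';
	comment_line_str = comment_buf;
}

/* Hypothetical legacy caller that still compares a single byte. */
static int is_comment_line(const char *line)
{
	return line[0] == comment_line_char;
}
```

Because the macro expands to an expression rather than a variable, code
that assigns to comment_line_char has to be converted in the same patch,
which is why the config and "auto" sites are touched here.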


^ permalink raw reply related	[flat|nested] 82+ messages in thread

* [PATCH v2 07/16] strbuf: accept a comment string for strbuf_stripspace()
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (5 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 06/16] environment: store comment_line_char as a string Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 08/16] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
                                 ` (9 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

As part of our transition to multi-byte comment characters, let's take a
NUL-terminated string pointer for strbuf_stripspace(), rather than a
single character. We can continue to support its feature of ignoring
comments by accepting a NULL pointer (as opposed to the current behavior
of a NUL byte).

All of the callers have to be adjusted, but they can all just pass
comment_line_str (or NULL).

Inside the function we detect comments by comparing the first byte of a
line to the comment character. We'll adjust that to use starts_with(),
which will match multiple bytes (though for now, of course, we still
only allow a single byte, so it's academic).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/am.c         | 2 +-
 builtin/branch.c     | 2 +-
 builtin/commit.c     | 2 +-
 builtin/notes.c      | 4 ++--
 builtin/rebase.c     | 2 +-
 builtin/stripspace.c | 2 +-
 builtin/tag.c        | 2 +-
 builtin/worktree.c   | 2 +-
 gpg-interface.c      | 4 ++--
 rebase-interactive.c | 2 +-
 sequencer.c          | 6 +++---
 strbuf.c             | 6 +++---
 strbuf.h             | 4 ++--
 13 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d1990d7edc..5bc72d7822 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1286,7 +1286,7 @@ static int parse_mail(struct am_state *state, const char *mail)
 
 	strbuf_addstr(&msg, "\n\n");
 	strbuf_addbuf(&msg, &mi.log_message);
-	strbuf_stripspace(&msg, '\0');
+	strbuf_stripspace(&msg, NULL);
 
 	assert(!state->author_name);
 	state->author_name = strbuf_detach(&author_name, NULL);
diff --git a/builtin/branch.c b/builtin/branch.c
index b3cbb7fd44..f6091f3438 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -684,7 +684,7 @@ static int edit_branch_description(const char *branch_name)
 		strbuf_release(&buf);
 		return -1;
 	}
-	strbuf_stripspace(&buf, comment_line_char);
+	strbuf_stripspace(&buf, comment_line_str);
 
 	strbuf_addf(&name, "branch.%s.description", branch_name);
 	if (buf.len || exists)
diff --git a/builtin/commit.c b/builtin/commit.c
index 82229c3100..9b139fc795 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -890,7 +890,7 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 	s->hints = 0;
 
 	if (clean_message_contents)
-		strbuf_stripspace(&sb, '\0');
+		strbuf_stripspace(&sb, NULL);
 
 	if (signoff)
 		append_signoff(&sb, ignored_log_message_bytes(sb.buf, sb.len), 0);
diff --git a/builtin/notes.c b/builtin/notes.c
index caf20fd5bd..ae981085ea 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -223,7 +223,7 @@ static void prepare_note_data(const struct object_id *object, struct note_data *
 			die(_("please supply the note contents using either -m or -F option"));
 		}
 		if (d->stripspace)
-			strbuf_stripspace(&d->buf, comment_line_char);
+			strbuf_stripspace(&d->buf, comment_line_str);
 	}
 }
 
@@ -264,7 +264,7 @@ static void concat_messages(struct note_data *d)
 		if ((d->stripspace == UNSPECIFIED &&
 		     d->messages[i]->stripspace == STRIPSPACE) ||
 		    d->stripspace == STRIPSPACE)
-			strbuf_stripspace(&d->buf, 0);
+			strbuf_stripspace(&d->buf, NULL);
 		strbuf_reset(&msg);
 	}
 	strbuf_release(&msg);
diff --git a/builtin/rebase.c b/builtin/rebase.c
index be787690bd..dc17c4727f 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -204,7 +204,7 @@ static int edit_todo_file(unsigned flags)
 	if (strbuf_read_file(&todo_list.buf, todo_file, 0) < 0)
 		return error_errno(_("could not read '%s'."), todo_file);
 
-	strbuf_stripspace(&todo_list.buf, comment_line_char);
+	strbuf_stripspace(&todo_list.buf, comment_line_str);
 	res = edit_todo_list(the_repository, &todo_list, &new_todo, NULL, NULL, flags);
 	if (!res && todo_list_write_to_file(the_repository, &new_todo, todo_file,
 					    NULL, NULL, -1, flags & ~(TODO_LIST_SHORTEN_IDS)))
diff --git a/builtin/stripspace.c b/builtin/stripspace.c
index 7b700a9fb1..434ac490cb 100644
--- a/builtin/stripspace.c
+++ b/builtin/stripspace.c
@@ -59,7 +59,7 @@ int cmd_stripspace(int argc, const char **argv, const char *prefix)
 
 	if (mode == STRIP_DEFAULT || mode == STRIP_COMMENTS)
 		strbuf_stripspace(&buf,
-			  mode == STRIP_COMMENTS ? comment_line_char : '\0');
+			  mode == STRIP_COMMENTS ? comment_line_str : NULL);
 	else
 		comment_lines(&buf);
 
diff --git a/builtin/tag.c b/builtin/tag.c
index 19a7e06bf4..07327d3c04 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -310,7 +310,7 @@ static void create_tag(const struct object_id *object, const char *object_ref,
 
 	if (opt->cleanup_mode != CLEANUP_NONE)
 		strbuf_stripspace(buf,
-		  opt->cleanup_mode == CLEANUP_ALL ? comment_line_char : '\0');
+		  opt->cleanup_mode == CLEANUP_ALL ? comment_line_str : NULL);
 
 	if (!opt->message_given && !buf->len)
 		die(_("no tag message?"));
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 9c76b62b02..f0aa962cf8 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -657,7 +657,7 @@ static int can_use_local_refs(const struct add_opts *opts)
 			strbuf_add_real_path(&path, get_worktree_git_dir(NULL));
 			strbuf_addstr(&path, "/HEAD");
 			strbuf_read_file(&contents, path.buf, 64);
-			strbuf_stripspace(&contents, 0);
+			strbuf_stripspace(&contents, NULL);
 			strbuf_strip_suffix(&contents, "\n");
 
 			warning(_("HEAD points to an invalid (or orphaned) reference.\n"
diff --git a/gpg-interface.c b/gpg-interface.c
index 95e764acb1..b5993385ff 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -586,8 +586,8 @@ static int verify_ssh_signed_buffer(struct signature_check *sigc,
 		}
 	}
 
-	strbuf_stripspace(&ssh_keygen_out, '\0');
-	strbuf_stripspace(&ssh_keygen_err, '\0');
+	strbuf_stripspace(&ssh_keygen_out, NULL);
+	strbuf_stripspace(&ssh_keygen_err, NULL);
 	/* Add stderr outputs to show the user actual ssh-keygen errors */
 	strbuf_add(&ssh_keygen_out, ssh_principals_err.buf, ssh_principals_err.len);
 	strbuf_add(&ssh_keygen_out, ssh_keygen_err.buf, ssh_keygen_err.len);
diff --git a/rebase-interactive.c b/rebase-interactive.c
index d9718409b3..6dfc33e4e3 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -130,7 +130,7 @@ int edit_todo_list(struct repository *r, struct todo_list *todo_list,
 	if (launch_sequence_editor(todo_file, &new_todo->buf, NULL))
 		return -2;
 
-	strbuf_stripspace(&new_todo->buf, comment_line_char);
+	strbuf_stripspace(&new_todo->buf, comment_line_str);
 	if (initial && new_todo->buf.len == 0)
 		return -3;
 
diff --git a/sequencer.c b/sequencer.c
index 5c6f541126..4819265bf1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1154,7 +1154,7 @@ void cleanup_message(struct strbuf *msgbuf,
 		strbuf_setlen(msgbuf, wt_status_locate_end(msgbuf->buf, msgbuf->len));
 	if (cleanup_mode != COMMIT_MSG_CLEANUP_NONE)
 		strbuf_stripspace(msgbuf,
-		  cleanup_mode == COMMIT_MSG_CLEANUP_ALL ? comment_line_char : '\0');
+		  cleanup_mode == COMMIT_MSG_CLEANUP_ALL ? comment_line_str : NULL);
 }
 
 /*
@@ -1186,7 +1186,7 @@ int template_untouched(const struct strbuf *sb, const char *template_file,
 		return 0;
 
 	strbuf_stripspace(&tmpl,
-	  cleanup_mode == COMMIT_MSG_CLEANUP_ALL ? comment_line_char : '\0');
+	  cleanup_mode == COMMIT_MSG_CLEANUP_ALL ? comment_line_str : NULL);
 	if (!skip_prefix(sb->buf, tmpl.buf, &start))
 		start = sb->buf;
 	strbuf_release(&tmpl);
@@ -1559,7 +1559,7 @@ static int try_to_commit(struct repository *r,
 
 	if (cleanup != COMMIT_MSG_CLEANUP_NONE)
 		strbuf_stripspace(msg,
-		  cleanup == COMMIT_MSG_CLEANUP_ALL ? comment_line_char : '\0');
+		  cleanup == COMMIT_MSG_CLEANUP_ALL ? comment_line_str : NULL);
 	if ((flags & EDIT_MSG) && message_is_empty(msg, cleanup)) {
 		res = 1; /* run 'git commit' to display error message */
 		goto out;
diff --git a/strbuf.c b/strbuf.c
index a33aed6c07..e9b6127e76 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -1001,10 +1001,10 @@ static size_t cleanup(char *line, size_t len)
  *
  * If last line does not have a newline at the end, one is added.
  *
- * Pass a non-NUL comment_prefix to skip every line starting
+ * Pass a non-NULL comment_prefix to skip every line starting
  * with it.
  */
-void strbuf_stripspace(struct strbuf *sb, char comment_prefix)
+void strbuf_stripspace(struct strbuf *sb, const char *comment_prefix)
 {
 	size_t empties = 0;
 	size_t i, j, len, newlen;
@@ -1018,7 +1018,7 @@ void strbuf_stripspace(struct strbuf *sb, char comment_prefix)
 		len = eol ? eol - (sb->buf + i) + 1 : sb->len - i;
 
 		if (comment_prefix && len &&
-		    sb->buf[i] == comment_prefix) {
+		    starts_with(sb->buf + i, comment_prefix)) {
 			newlen = 0;
 			continue;
 		}
diff --git a/strbuf.h b/strbuf.h
index 860fcec5fb..dc4710adbb 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -513,11 +513,11 @@ int strbuf_getcwd(struct strbuf *sb);
 int strbuf_normalize_path(struct strbuf *sb);
 
 /**
- * Strip whitespace from a buffer. If comment_prefix is non-NUL,
+ * Strip whitespace from a buffer. If comment_prefix is non-NULL,
  * then lines beginning with that character are considered comments,
  * thus removed.
  */
-void strbuf_stripspace(struct strbuf *buf, char comment_prefix);
+void strbuf_stripspace(struct strbuf *buf, const char *comment_prefix);
 
 static inline int strbuf_strip_suffix(struct strbuf *sb, const char *suffix)
 {
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 08/16] strbuf: accept a comment string for strbuf_commented_addf()
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (6 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 07/16] strbuf: accept a comment string for strbuf_stripspace() Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 09/16] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
                                 ` (8 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

As part of our transition to multi-byte comment characters, let's take a
NUL-terminated string pointer for strbuf_commented_addf() rather than a
single character.

All of the callers have to be adjusted, but they can just pass
comment_line_str rather than comment_line_char.

Note that we rely on strbuf_add_commented_lines() under the hood, so
we'll cheat a bit to squeeze our string into a single character (for now
the two are equivalent, and we'll address this TODO in the next patch).

Signed-off-by: Jeff King <peff@peff.net>
---
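A note for readers on the temporary "cheat" described above: passing
comment_prefix[0] is only equivalent to passing the whole string while
the string is exactly one byte long. The sketch below makes that
condition explicit. The helper name is hypothetical and is not part of
this patch or of git.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: returns 1 when using
 * prefix[0] in place of the full prefix loses no information, i.e.
 * when the prefix is exactly one byte long.
 */
static int single_char_cheat_is_safe(const char *prefix)
{
	assert(prefix != NULL);
	return strlen(prefix) == 1;
}
```

For "#" this returns 1. For a multi-byte prefix such as the UTF-8
encoding of U+2022 (three bytes) it returns 0, which is why the cheat
has to go away before multi-byte values are allowed.
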
 add-patch.c          |  8 ++++----
 builtin/branch.c     |  2 +-
 builtin/merge.c      |  8 ++++----
 builtin/tag.c        |  4 ++--
 rebase-interactive.c |  2 +-
 sequencer.c          |  4 ++--
 strbuf.c             | 10 ++++++++--
 strbuf.h             |  2 +-
 wt-status.c          |  2 +-
 9 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 68f525b35c..7390677795 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -1105,11 +1105,11 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 	size_t i;
 
 	strbuf_reset(&s->buf);
-	strbuf_commented_addf(&s->buf, comment_line_char,
+	strbuf_commented_addf(&s->buf, comment_line_str,
 			      _("Manual hunk edit mode -- see bottom for "
 				"a quick guide.\n"));
 	render_hunk(s, hunk, 0, 0, &s->buf);
-	strbuf_commented_addf(&s->buf, comment_line_char,
+	strbuf_commented_addf(&s->buf, comment_line_str,
 			      _("---\n"
 				"To remove '%c' lines, make them ' ' lines "
 				"(context).\n"
@@ -1118,13 +1118,13 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 			      s->mode->is_reverse ? '+' : '-',
 			      s->mode->is_reverse ? '-' : '+',
 			      comment_line_char);
-	strbuf_commented_addf(&s->buf, comment_line_char, "%s",
+	strbuf_commented_addf(&s->buf, comment_line_str, "%s",
 			      _(s->mode->edit_hunk_hint));
 	/*
 	 * TRANSLATORS: 'it' refers to the patch mentioned in the previous
 	 * messages.
 	 */
-	strbuf_commented_addf(&s->buf, comment_line_char,
+	strbuf_commented_addf(&s->buf, comment_line_str,
 			      _("If it does not apply cleanly, you will be "
 				"given an opportunity to\n"
 				"edit again.  If all lines of the hunk are "
diff --git a/builtin/branch.c b/builtin/branch.c
index f6091f3438..2d8c89e9ac 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -673,7 +673,7 @@ static int edit_branch_description(const char *branch_name)
 	exists = !read_branch_desc(&buf, branch_name);
 	if (!buf.len || buf.buf[buf.len-1] != '\n')
 		strbuf_addch(&buf, '\n');
-	strbuf_commented_addf(&buf, comment_line_char,
+	strbuf_commented_addf(&buf, comment_line_str,
 		    _("Please edit the description for the branch\n"
 		      "  %s\n"
 		      "Lines starting with '%c' will be stripped.\n"),
diff --git a/builtin/merge.c b/builtin/merge.c
index a0ba1f9815..4e47434708 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -852,15 +852,15 @@ static void prepare_to_commit(struct commit_list *remoteheads)
 		strbuf_addch(&msg, '\n');
 		if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) {
 			wt_status_append_cut_line(&msg);
-			strbuf_commented_addf(&msg, comment_line_char, "\n");
+			strbuf_commented_addf(&msg, comment_line_str, "\n");
 		}
-		strbuf_commented_addf(&msg, comment_line_char,
+		strbuf_commented_addf(&msg, comment_line_str,
 				      _(merge_editor_comment));
 		if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
-			strbuf_commented_addf(&msg, comment_line_char,
+			strbuf_commented_addf(&msg, comment_line_str,
 					      _(scissors_editor_comment));
 		else
-			strbuf_commented_addf(&msg, comment_line_char,
+			strbuf_commented_addf(&msg, comment_line_str,
 				_(no_scissors_editor_comment), comment_line_char);
 	}
 	if (signoff)
diff --git a/builtin/tag.c b/builtin/tag.c
index 07327d3c04..1c708785bf 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -291,10 +291,10 @@ static void create_tag(const struct object_id *object, const char *object_ref,
 			struct strbuf buf = STRBUF_INIT;
 			strbuf_addch(&buf, '\n');
 			if (opt->cleanup_mode == CLEANUP_ALL)
-				strbuf_commented_addf(&buf, comment_line_char,
+				strbuf_commented_addf(&buf, comment_line_str,
 				      _(tag_template), tag, comment_line_char);
 			else
-				strbuf_commented_addf(&buf, comment_line_char,
+				strbuf_commented_addf(&buf, comment_line_str,
 				      _(tag_template_nocleanup), tag, comment_line_char);
 			write_or_die(fd, buf.buf, buf.len);
 			strbuf_release(&buf);
diff --git a/rebase-interactive.c b/rebase-interactive.c
index 6dfc33e4e3..affc93a8e4 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -71,7 +71,7 @@ void append_todo_help(int command_count,
 
 	if (!edit_todo) {
 		strbuf_addch(buf, '\n');
-		strbuf_commented_addf(buf, comment_line_char,
+		strbuf_commented_addf(buf, comment_line_str,
 				      Q_("Rebase %s onto %s (%d command)",
 					 "Rebase %s onto %s (%d commands)",
 					 command_count),
diff --git a/sequencer.c b/sequencer.c
index 4819265bf1..051929c9f1 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -667,11 +667,11 @@ void append_conflicts_hint(struct index_state *istate,
 	}
 
 	strbuf_addch(msgbuf, '\n');
-	strbuf_commented_addf(msgbuf, comment_line_char, "Conflicts:\n");
+	strbuf_commented_addf(msgbuf, comment_line_str, "Conflicts:\n");
 	for (i = 0; i < istate->cache_nr;) {
 		const struct cache_entry *ce = istate->cache[i++];
 		if (ce_stage(ce)) {
-			strbuf_commented_addf(msgbuf, comment_line_char,
+			strbuf_commented_addf(msgbuf, comment_line_str,
 					      "\t%s\n", ce->name);
 			while (i < istate->cache_nr &&
 			       !strcmp(ce->name, istate->cache[i]->name))
diff --git a/strbuf.c b/strbuf.c
index e9b6127e76..76d02e0920 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -368,7 +368,7 @@ void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
 	add_lines(out, prefix, buf, size, 1);
 }
 
-void strbuf_commented_addf(struct strbuf *sb, char comment_prefix,
+void strbuf_commented_addf(struct strbuf *sb, const char *comment_prefix,
 			   const char *fmt, ...)
 {
 	va_list params;
@@ -379,7 +379,13 @@ void strbuf_commented_addf(struct strbuf *sb, char comment_prefix,
 	strbuf_vaddf(&buf, fmt, params);
 	va_end(params);
 
-	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix);
+	/*
+	 * TODO Our commented_lines helper does not yet understand
+	 * comment strings. But since we know that the strings are
+	 * always single-char, we can cheat for the moment, and
+	 * fix this later.
+	 */
+	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix[0]);
 	if (incomplete_line)
 		sb->buf[--sb->len] = '\0';
 
diff --git a/strbuf.h b/strbuf.h
index dc4710adbb..b128ca539a 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -379,7 +379,7 @@ void strbuf_addf(struct strbuf *sb, const char *fmt, ...);
  * blank to the buffer.
  */
 __attribute__((format (printf, 3, 4)))
-void strbuf_commented_addf(struct strbuf *sb, char comment_prefix, const char *fmt, ...);
+void strbuf_commented_addf(struct strbuf *sb, const char *comment_prefix, const char *fmt, ...);
 
 __attribute__((format (printf,2,0)))
 void strbuf_vaddf(struct strbuf *sb, const char *fmt, va_list ap);
diff --git a/wt-status.c b/wt-status.c
index 7108a92b52..3845e1d383 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1103,7 +1103,7 @@ void wt_status_append_cut_line(struct strbuf *buf)
 {
 	const char *explanation = _("Do not modify or remove the line above.\nEverything below it will be ignored.");
 
-	strbuf_commented_addf(buf, comment_line_char, "%s", cut_line);
+	strbuf_commented_addf(buf, comment_line_str, "%s", cut_line);
 	strbuf_add_commented_lines(buf, explanation, strlen(explanation), comment_line_char);
 }
 
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 09/16] strbuf: accept a comment string for strbuf_add_commented_lines()
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (7 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 08/16] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 10/16] prefer comment_line_str to comment_line_char for printing Jeff King
                                 ` (7 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

As part of our transition to multi-byte comment characters, let's take a
NUL-terminated string pointer for strbuf_add_commented_lines() rather
than a single character.

All of the callers have to be adjusted; most can just pass
comment_line_str rather than comment_line_char.

And now our "cheat" in strbuf_commented_addf() can go away, as we can
take the full string from it.

Signed-off-by: Jeff King <peff@peff.net>
---
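For readers who want to see what a string prefix means for the output,
here is a minimal standalone sketch of commenting lines with a
NUL-terminated prefix. It is not git's code: comment_lines_sketch() is
a hypothetical name, malloc() stands in for struct strbuf, and the
spacing rule (a space after the prefix unless the line is empty or
starts with a tab) is an assumption about the helper's behaviour.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for strbuf_add_commented_lines(): returns a
 * newly allocated string in which every line of buf is preceded by
 * prefix. The caller frees the result. Returns NULL if malloc() fails.
 */
static char *comment_lines_sketch(const char *buf, size_t size,
				  const char *prefix)
{
	size_t plen, nlines = 0, i, pos = 0;
	char *out;

	assert(prefix != NULL);
	plen = strlen(prefix);

	/* Count lines; a final line without a newline still counts. */
	for (i = 0; i < size; i++)
		if (buf[i] == '\n')
			nlines++;
	if (size && buf[size - 1] != '\n')
		nlines++;

	/* Room per line for the prefix, a space, and one added newline. */
	out = malloc(size + nlines * (plen + 2) + 1);
	if (!out)
		return NULL;

	while (size) {
		const char *eol = memchr(buf, '\n', size);
		size_t len = eol ? (size_t)(eol - buf) + 1 : size;

		/* Copy every byte of the prefix, not just the first. */
		memcpy(out + pos, prefix, plen);
		pos += plen;
		/* No separating space on empty or tab-led lines. */
		if (buf[0] != '\n' && buf[0] != '\t')
			out[pos++] = ' ';
		memcpy(out + pos, buf, len);
		pos += len;
		buf += len;
		size -= len;
	}
	/* Complete the last line if it lacked a newline. */
	if (pos && out[pos - 1] != '\n')
		out[pos++] = '\n';
	out[pos] = '\0';
	return out;
}
```

Calling comment_lines_sketch("one\n\ntwo", 8, "//") should yield three
lines, each starting with "//". The same call works unchanged with a
multi-byte prefix, which is the point of taking a string.
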
 builtin/notes.c      |  8 ++++----
 builtin/stripspace.c |  2 +-
 fmt-merge-msg.c      |  6 +++---
 rebase-interactive.c |  6 +++---
 sequencer.c          |  8 ++++----
 strbuf.c             | 16 +++-------------
 strbuf.h             |  2 +-
 wt-status.c          |  4 ++--
 8 files changed, 21 insertions(+), 31 deletions(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index ae981085ea..cb011303e6 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -179,7 +179,7 @@ static void write_commented_object(int fd, const struct object_id *object)
 
 	if (strbuf_read(&buf, show.out, 0) < 0)
 		die_errno(_("could not read 'show' output"));
-	strbuf_add_commented_lines(&cbuf, buf.buf, buf.len, comment_line_char);
+	strbuf_add_commented_lines(&cbuf, buf.buf, buf.len, comment_line_str);
 	write_or_die(fd, cbuf.buf, cbuf.len);
 
 	strbuf_release(&cbuf);
@@ -207,10 +207,10 @@ static void prepare_note_data(const struct object_id *object, struct note_data *
 			copy_obj_to_fd(fd, old_note);
 
 		strbuf_addch(&buf, '\n');
-		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_char);
+		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_str);
 		strbuf_add_commented_lines(&buf, _(note_template), strlen(_(note_template)),
-					   comment_line_char);
-		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_char);
+					   comment_line_str);
+		strbuf_add_commented_lines(&buf, "\n", strlen("\n"), comment_line_str);
 		write_or_die(fd, buf.buf, buf.len);
 
 		write_commented_object(fd, object);
diff --git a/builtin/stripspace.c b/builtin/stripspace.c
index 434ac490cb..e5626e5126 100644
--- a/builtin/stripspace.c
+++ b/builtin/stripspace.c
@@ -13,7 +13,7 @@ static void comment_lines(struct strbuf *buf)
 	size_t len;
 
 	msg = strbuf_detach(buf, &len);
-	strbuf_add_commented_lines(buf, msg, len, comment_line_char);
+	strbuf_add_commented_lines(buf, msg, len, comment_line_str);
 	free(msg);
 }
 
diff --git a/fmt-merge-msg.c b/fmt-merge-msg.c
index 66e47449a0..79e8aad086 100644
--- a/fmt-merge-msg.c
+++ b/fmt-merge-msg.c
@@ -510,7 +510,7 @@ static void fmt_tag_signature(struct strbuf *tagbuf,
 	if (sig->len) {
 		strbuf_addch(tagbuf, '\n');
 		strbuf_add_commented_lines(tagbuf, sig->buf, sig->len,
-					   comment_line_char);
+					   comment_line_str);
 	}
 }
 
@@ -557,7 +557,7 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 				strbuf_add_commented_lines(&tagline,
 						origins.items[first_tag].string,
 						strlen(origins.items[first_tag].string),
-						comment_line_char);
+						comment_line_str);
 				strbuf_insert(&tagbuf, 0, tagline.buf,
 					      tagline.len);
 				strbuf_release(&tagline);
@@ -566,7 +566,7 @@ static void fmt_merge_msg_sigs(struct strbuf *out)
 			strbuf_add_commented_lines(&tagbuf,
 					origins.items[i].string,
 					strlen(origins.items[i].string),
-					comment_line_char);
+					comment_line_str);
 			fmt_tag_signature(&tagbuf, &sig, buf, len);
 		}
 		strbuf_release(&payload);
diff --git a/rebase-interactive.c b/rebase-interactive.c
index affc93a8e4..c343e16fcd 100644
--- a/rebase-interactive.c
+++ b/rebase-interactive.c
@@ -78,7 +78,7 @@ void append_todo_help(int command_count,
 				      shortrevisions, shortonto, command_count);
 	}
 
-	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_char);
+	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_str);
 
 	if (get_missing_commit_check_level() == MISSING_COMMIT_CHECK_ERROR)
 		msg = _("\nDo not remove any line. Use 'drop' "
@@ -87,7 +87,7 @@ void append_todo_help(int command_count,
 		msg = _("\nIf you remove a line here "
 			 "THAT COMMIT WILL BE LOST.\n");
 
-	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_char);
+	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_str);
 
 	if (edit_todo)
 		msg = _("\nYou are editing the todo file "
@@ -98,7 +98,7 @@ void append_todo_help(int command_count,
 		msg = _("\nHowever, if you remove everything, "
 			"the rebase will be aborted.\n\n");
 
-	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_char);
+	strbuf_add_commented_lines(buf, msg, strlen(msg), comment_line_str);
 }
 
 int edit_todo_list(struct repository *r, struct todo_list *todo_list,
diff --git a/sequencer.c b/sequencer.c
index 051929c9f1..d12c5a8a03 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1853,7 +1853,7 @@ static void add_commented_lines(struct strbuf *buf, const void *str, size_t len)
 		s += count;
 		len -= count;
 	}
-	strbuf_add_commented_lines(buf, s, len, comment_line_char);
+	strbuf_add_commented_lines(buf, s, len, comment_line_str);
 }
 
 /* Does the current fixup chain contain a squash command? */
@@ -1952,7 +1952,7 @@ static int append_squash_message(struct strbuf *buf, const char *body,
 	strbuf_addf(buf, _(nth_commit_msg_fmt),
 		    ++opts->current_fixup_count + 1);
 	strbuf_addstr(buf, "\n\n");
-	strbuf_add_commented_lines(buf, body, commented_len, comment_line_char);
+	strbuf_add_commented_lines(buf, body, commented_len, comment_line_str);
 	/* buf->buf may be reallocated so store an offset into the buffer */
 	fixup_off = buf->len;
 	strbuf_addstr(buf, body + commented_len);
@@ -2043,7 +2043,7 @@ static int update_squash_messages(struct repository *r,
 		strbuf_addstr(&buf, "\n\n");
 		if (is_fixup_flag(command, flag))
 			strbuf_add_commented_lines(&buf, body, strlen(body),
-						   comment_line_char);
+						   comment_line_str);
 		else
 			strbuf_addstr(&buf, body);
 
@@ -2063,7 +2063,7 @@ static int update_squash_messages(struct repository *r,
 			    ++opts->current_fixup_count + 1);
 		strbuf_addstr(&buf, "\n\n");
 		strbuf_add_commented_lines(&buf, body, strlen(body),
-					   comment_line_char);
+					   comment_line_str);
 	} else
 		return error(_("unknown command: %d"), command);
 	repo_unuse_commit_buffer(r, commit, message);
diff --git a/strbuf.c b/strbuf.c
index 76d02e0920..7c8f582127 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -359,13 +359,9 @@ static void add_lines(struct strbuf *out,
 }
 
 void strbuf_add_commented_lines(struct strbuf *out, const char *buf,
-				size_t size, char comment_prefix)
+				size_t size, const char *comment_prefix)
 {
-	char prefix[2];
-
-	prefix[0] = comment_prefix;
-	prefix[1] = '\0';
-	add_lines(out, prefix, buf, size, 1);
+	add_lines(out, comment_prefix, buf, size, 1);
 }
 
 void strbuf_commented_addf(struct strbuf *sb, const char *comment_prefix,
@@ -379,13 +375,7 @@ void strbuf_commented_addf(struct strbuf *sb, const char *comment_prefix,
 	strbuf_vaddf(&buf, fmt, params);
 	va_end(params);
 
-	/*
-	 * TODO Our commented_lines helper does not yet understand
-	 * comment strings. But since we know that the strings are
-	 * always single-char, we can cheat for the moment, and
-	 * fix this later.
-	 */
-	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix[0]);
+	strbuf_add_commented_lines(sb, buf.buf, buf.len, comment_prefix);
 	if (incomplete_line)
 		sb->buf[--sb->len] = '\0';
 
diff --git a/strbuf.h b/strbuf.h
index b128ca539a..58dddf2777 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -288,7 +288,7 @@ void strbuf_splice(struct strbuf *sb, size_t pos, size_t len,
  */
 void strbuf_add_commented_lines(struct strbuf *out,
 				const char *buf, size_t size,
-				char comment_prefix);
+				const char *comment_prefix);
 
 
 /**
diff --git a/wt-status.c b/wt-status.c
index 3845e1d383..ae623e760e 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1028,7 +1028,7 @@ static void wt_longstatus_print_submodule_summary(struct wt_status *s, int uncom
 	if (s->display_comment_prefix) {
 		size_t len;
 		summary_content = strbuf_detach(&summary, &len);
-		strbuf_add_commented_lines(&summary, summary_content, len, comment_line_char);
+		strbuf_add_commented_lines(&summary, summary_content, len, comment_line_str);
 		free(summary_content);
 	}
 
@@ -1104,7 +1104,7 @@ void wt_status_append_cut_line(struct strbuf *buf)
 	const char *explanation = _("Do not modify or remove the line above.\nEverything below it will be ignored.");
 
 	strbuf_commented_addf(buf, comment_line_str, "%s", cut_line);
-	strbuf_add_commented_lines(buf, explanation, strlen(explanation), comment_line_char);
+	strbuf_add_commented_lines(buf, explanation, strlen(explanation), comment_line_str);
 }
 
 void wt_status_add_cut_line(struct wt_status *s)
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 10/16] prefer comment_line_str to comment_line_char for printing
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (8 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 09/16] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 11/16] find multi-byte comment chars in NUL-terminated strings Jeff King
                                 ` (6 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

As part of our transition to multi-byte comment characters, we should
use the string variable rather than the historical character variable.
All of the sites adjusted here are just swapping out "%c" for "%s" in
format strings, or strbuf_addch() for strbuf_addstr(). The type system
and printf-attribute give the compiler enough information to make sure
our formats and variable changes all match (especially important for
cases where the format string is defined far away from its use, like
prepare_to_commit() in builtin/commit.c).

Signed-off-by: Jeff King <peff@peff.net>
---
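As a rough illustration of the "%c" to "%s" swap, the sketch below
declares a printf-style wrapper with the format attribute and formats
one of the hints with a string argument. The function name
hint_sketch() is hypothetical and this is not code from the series. It
assumes a GCC-compatible compiler for the attribute and falls back to
no checking elsewhere.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical printf-style wrapper. With the format attribute, a
 * GCC-compatible compiler can warn when a literal format string and
 * its arguments disagree (for example "%c" with a char pointer).
 */
#if defined(__GNUC__)
__attribute__((format (printf, 3, 4)))
#endif
static int hint_sketch(char *out, size_t outsz, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(out != NULL && outsz > 0);
	va_start(ap, fmt);
	n = vsnprintf(out, outsz, fmt, ap);
	va_end(ap);
	return n;
}
```

With "%s" the full comment string ends up in the message, so a value
such as "//" is printed whole rather than truncated to its first byte.
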
 add-patch.c      |  4 ++--
 builtin/branch.c |  4 ++--
 builtin/commit.c | 12 ++++++------
 builtin/merge.c  |  4 ++--
 builtin/tag.c    |  8 ++++----
 fmt-merge-msg.c  |  2 +-
 sequencer.c      | 20 ++++++++++----------
 wt-status.c      | 10 +++++-----
 8 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 7390677795..4a10237d50 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -1114,10 +1114,10 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 				"To remove '%c' lines, make them ' ' lines "
 				"(context).\n"
 				"To remove '%c' lines, delete them.\n"
-				"Lines starting with %c will be removed.\n"),
+				"Lines starting with %s will be removed.\n"),
 			      s->mode->is_reverse ? '+' : '-',
 			      s->mode->is_reverse ? '-' : '+',
-			      comment_line_char);
+			      comment_line_str);
 	strbuf_commented_addf(&s->buf, comment_line_str, "%s",
 			      _(s->mode->edit_hunk_hint));
 	/*
diff --git a/builtin/branch.c b/builtin/branch.c
index 2d8c89e9ac..faf6ea1b7b 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -676,8 +676,8 @@ static int edit_branch_description(const char *branch_name)
 	strbuf_commented_addf(&buf, comment_line_str,
 		    _("Please edit the description for the branch\n"
 		      "  %s\n"
-		      "Lines starting with '%c' will be stripped.\n"),
-		    branch_name, comment_line_char);
+		      "Lines starting with '%s' will be stripped.\n"),
+		    branch_name, comment_line_str);
 	write_file_buf(edit_description(), buf.buf, buf.len);
 	strbuf_reset(&buf);
 	if (launch_editor(edit_description(), &buf, NULL)) {
diff --git a/builtin/commit.c b/builtin/commit.c
index 9b139fc795..066dc42a3d 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -910,18 +910,18 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 		struct ident_split ci, ai;
 		const char *hint_cleanup_all = allow_empty_message ?
 			_("Please enter the commit message for your changes."
-			  " Lines starting\nwith '%c' will be ignored.\n") :
+			  " Lines starting\nwith '%s' will be ignored.\n") :
 			_("Please enter the commit message for your changes."
-			  " Lines starting\nwith '%c' will be ignored, and an empty"
+			  " Lines starting\nwith '%s' will be ignored, and an empty"
 			  " message aborts the commit.\n");
 		const char *hint_cleanup_space = allow_empty_message ?
 			_("Please enter the commit message for your changes."
 			  " Lines starting\n"
-			  "with '%c' will be kept; you may remove them"
+			  "with '%s' will be kept; you may remove them"
 			  " yourself if you want to.\n") :
 			_("Please enter the commit message for your changes."
 			  " Lines starting\n"
-			  "with '%c' will be kept; you may remove them"
+			  "with '%s' will be kept; you may remove them"
 			  " yourself if you want to.\n"
 			  "An empty message aborts the commit.\n");
 		if (whence != FROM_COMMIT) {
@@ -944,12 +944,12 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
 
 		fprintf(s->fp, "\n");
 		if (cleanup_mode == COMMIT_MSG_CLEANUP_ALL)
-			status_printf(s, GIT_COLOR_NORMAL, hint_cleanup_all, comment_line_char);
+			status_printf(s, GIT_COLOR_NORMAL, hint_cleanup_all, comment_line_str);
 		else if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) {
 			if (whence == FROM_COMMIT)
 				wt_status_add_cut_line(s);
 		} else /* COMMIT_MSG_CLEANUP_SPACE, that is. */
-			status_printf(s, GIT_COLOR_NORMAL, hint_cleanup_space, comment_line_char);
+			status_printf(s, GIT_COLOR_NORMAL, hint_cleanup_space, comment_line_str);
 
 		/*
 		 * These should never fail because they come from our own
diff --git a/builtin/merge.c b/builtin/merge.c
index 4e47434708..1e33aa49a0 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -821,7 +821,7 @@ static const char scissors_editor_comment[] =
 N_("An empty message aborts the commit.\n");
 
 static const char no_scissors_editor_comment[] =
-N_("Lines starting with '%c' will be ignored, and an empty message aborts\n"
+N_("Lines starting with '%s' will be ignored, and an empty message aborts\n"
    "the commit.\n");
 
 static void write_merge_heads(struct commit_list *);
@@ -861,7 +861,7 @@ static void prepare_to_commit(struct commit_list *remoteheads)
 					      _(scissors_editor_comment));
 		else
 			strbuf_commented_addf(&msg, comment_line_str,
-				_(no_scissors_editor_comment), comment_line_char);
+				_(no_scissors_editor_comment), comment_line_str);
 	}
 	if (signoff)
 		append_signoff(&msg, ignored_log_message_bytes(msg.buf, msg.len), 0);
diff --git a/builtin/tag.c b/builtin/tag.c
index 1c708785bf..721d07a589 100644
--- a/builtin/tag.c
+++ b/builtin/tag.c
@@ -158,11 +158,11 @@ static int do_sign(struct strbuf *buffer)
 
 static const char tag_template[] =
 	N_("\nWrite a message for tag:\n  %s\n"
-	"Lines starting with '%c' will be ignored.\n");
+	"Lines starting with '%s' will be ignored.\n");
 
 static const char tag_template_nocleanup[] =
 	N_("\nWrite a message for tag:\n  %s\n"
-	"Lines starting with '%c' will be kept; you may remove them"
+	"Lines starting with '%s' will be kept; you may remove them"
 	" yourself if you want to.\n");
 
 static int git_tag_config(const char *var, const char *value,
@@ -292,10 +292,10 @@ static void create_tag(const struct object_id *object, const char *object_ref,
 			strbuf_addch(&buf, '\n');
 			if (opt->cleanup_mode == CLEANUP_ALL)
 				strbuf_commented_addf(&buf, comment_line_str,
-				      _(tag_template), tag, comment_line_char);
+				      _(tag_template), tag, comment_line_str);
 			else
 				strbuf_commented_addf(&buf, comment_line_str,
-				      _(tag_template_nocleanup), tag, comment_line_char);
+				      _(tag_template_nocleanup), tag, comment_line_str);
 			write_or_die(fd, buf.buf, buf.len);
 			strbuf_release(&buf);
 		}
diff --git a/fmt-merge-msg.c b/fmt-merge-msg.c
index 79e8aad086..ae201e21db 100644
--- a/fmt-merge-msg.c
+++ b/fmt-merge-msg.c
@@ -321,7 +321,7 @@ static void credit_people(struct strbuf *out,
 	     skip_prefix(me, them->items->string, &me) &&
 	     starts_with(me, " <")))
 		return;
-	strbuf_addf(out, "\n%c %s ", comment_line_char, label);
+	strbuf_addf(out, "\n%s %s ", comment_line_str, label);
 	add_people_count(out, them);
 }
 
diff --git a/sequencer.c b/sequencer.c
index d12c5a8a03..b75d0c098d 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -663,7 +663,7 @@ void append_conflicts_hint(struct index_state *istate,
 	if (cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS) {
 		strbuf_addch(msgbuf, '\n');
 		wt_status_append_cut_line(msgbuf);
-		strbuf_addch(msgbuf, comment_line_char);
+		strbuf_addstr(msgbuf, comment_line_str);
 	}
 
 	strbuf_addch(msgbuf, '\n');
@@ -1948,7 +1948,7 @@ static int append_squash_message(struct strbuf *buf, const char *body,
 	     (starts_with(body, "squash!") || starts_with(body, "fixup!"))))
 		commented_len = commit_subject_length(body);
 
-	strbuf_addf(buf, "\n%c ", comment_line_char);
+	strbuf_addf(buf, "\n%s ", comment_line_str);
 	strbuf_addf(buf, _(nth_commit_msg_fmt),
 		    ++opts->current_fixup_count + 1);
 	strbuf_addstr(buf, "\n\n");
@@ -2008,7 +2008,7 @@ static int update_squash_messages(struct repository *r,
 		eol = buf.buf[0] != comment_line_char ?
 			buf.buf : strchrnul(buf.buf, '\n');
 
-		strbuf_addf(&header, "%c ", comment_line_char);
+		strbuf_addf(&header, "%s ", comment_line_str);
 		strbuf_addf(&header, _(combined_commit_msg_fmt),
 			    opts->current_fixup_count + 2);
 		strbuf_splice(&buf, 0, eol - buf.buf, header.buf, header.len);
@@ -2034,9 +2034,9 @@ static int update_squash_messages(struct repository *r,
 			repo_unuse_commit_buffer(r, head_commit, head_message);
 			return error(_("cannot write '%s'"), rebase_path_fixup_msg());
 		}
-		strbuf_addf(&buf, "%c ", comment_line_char);
+		strbuf_addf(&buf, "%s ", comment_line_str);
 		strbuf_addf(&buf, _(combined_commit_msg_fmt), 2);
-		strbuf_addf(&buf, "\n%c ", comment_line_char);
+		strbuf_addf(&buf, "\n%s ", comment_line_str);
 		strbuf_addstr(&buf, is_fixup_flag(command, flag) ?
 			      _(skip_first_commit_msg_str) :
 			      _(first_commit_msg_str));
@@ -2058,7 +2058,7 @@ static int update_squash_messages(struct repository *r,
 	if (command == TODO_SQUASH || is_fixup_flag(command, flag)) {
 		res = append_squash_message(&buf, body, command, opts, flag);
 	} else if (command == TODO_FIXUP) {
-		strbuf_addf(&buf, "\n%c ", comment_line_char);
+		strbuf_addf(&buf, "\n%s ", comment_line_str);
 		strbuf_addf(&buf, _(skip_nth_commit_msg_fmt),
 			    ++opts->current_fixup_count + 1);
 		strbuf_addstr(&buf, "\n\n");
@@ -5667,8 +5667,8 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 				    oid_to_hex(&commit->object.oid),
 				    oneline.buf);
 			if (is_empty)
-				strbuf_addf(&buf, " %c empty",
-					    comment_line_char);
+				strbuf_addf(&buf, " %s empty",
+					    comment_line_str);
 
 			FLEX_ALLOC_STR(entry, string, buf.buf);
 			oidcpy(&entry->entry.oid, &commit->object.oid);
@@ -5758,7 +5758,7 @@ static int make_script_with_merges(struct pretty_print_context *pp,
 		entry = oidmap_get(&state.commit2label, &commit->object.oid);
 
 		if (entry)
-			strbuf_addf(out, "\n%c Branch %s\n", comment_line_char, entry->string);
+			strbuf_addf(out, "\n%s Branch %s\n", comment_line_str, entry->string);
 		else
 			strbuf_addch(out, '\n');
 
@@ -5895,7 +5895,7 @@ int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
 			    oid_to_hex(&commit->object.oid));
 		pretty_print_commit(&pp, commit, out);
 		if (is_empty)
-			strbuf_addf(out, " %c empty", comment_line_char);
+			strbuf_addf(out, " %s empty", comment_line_str);
 		strbuf_addch(out, '\n');
 	}
 	if (skipped_commit)
diff --git a/wt-status.c b/wt-status.c
index ae623e760e..6201a97de0 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -70,7 +70,7 @@ static void status_vprintf(struct wt_status *s, int at_bol, const char *color,
 	strbuf_vaddf(&sb, fmt, ap);
 	if (!sb.len) {
 		if (s->display_comment_prefix) {
-			strbuf_addch(&sb, comment_line_char);
+			strbuf_addstr(&sb, comment_line_str);
 			if (!trail)
 				strbuf_addch(&sb, ' ');
 		}
@@ -85,7 +85,7 @@ static void status_vprintf(struct wt_status *s, int at_bol, const char *color,
 
 		strbuf_reset(&linebuf);
 		if (at_bol && s->display_comment_prefix) {
-			strbuf_addch(&linebuf, comment_line_char);
+			strbuf_addstr(&linebuf, comment_line_str);
 			if (*line != '\n' && *line != '\t')
 				strbuf_addch(&linebuf, ' ');
 		}
@@ -1090,7 +1090,7 @@ size_t wt_status_locate_end(const char *s, size_t len)
 	const char *p;
 	struct strbuf pattern = STRBUF_INIT;
 
-	strbuf_addf(&pattern, "\n%c %s", comment_line_char, cut_line);
+	strbuf_addf(&pattern, "\n%s %s", comment_line_str, cut_line);
 	if (starts_with(s, pattern.buf + 1))
 		len = 0;
 	else if ((p = strstr(s, pattern.buf)))
@@ -1218,8 +1218,8 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 				 "%s%.*s", comment_line_string,
 				 (int)(ep - cp), cp);
 	if (s->display_comment_prefix)
-		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s), "%c",
-				 comment_line_char);
+		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s), "%s",
+				 comment_line_str);
 	else
 		fputs("\n", s->fp);
 	strbuf_release(&sb);
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 11/16] find multi-byte comment chars in NUL-terminated strings
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (9 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 10/16] prefer comment_line_str to comment_line_char for printing Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 12/16] find multi-byte comment chars in unterminated buffers Jeff King
                                 ` (5 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

Several parts of the code need to identify lines that begin with the
comment character, and do so with a simple byte equality check. As part
of the transition to handling multi-byte characters, we need to match
all of the bytes. For cases where we are looking in a NUL-terminated
string, we can just use starts_with(), which checks all of the
characters in comment_line_str.

Note that we can drop the "line.len" check in wt-status.c's
read_rebase_todolist(). The starts_with() function handles the case of
an empty haystack buffer (it will always return false for a non-empty
prefix).

Signed-off-by: Jeff King <peff@peff.net>
---
 add-patch.c | 2 +-
 sequencer.c | 2 +-
 trailer.c   | 2 +-
 wt-status.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/add-patch.c b/add-patch.c
index 4a10237d50..d599ca53e1 100644
--- a/add-patch.c
+++ b/add-patch.c
@@ -1139,7 +1139,7 @@ static int edit_hunk_manually(struct add_p_state *s, struct hunk *hunk)
 	for (i = 0; i < s->buf.len; ) {
 		size_t next = find_next_line(&s->buf, i);
 
-		if (s->buf.buf[i] != comment_line_char)
+		if (!starts_with(s->buf.buf + i, comment_line_str))
 			strbuf_add(&s->plain, s->buf.buf + i, next - i);
 		i = next;
 	}
diff --git a/sequencer.c b/sequencer.c
index b75d0c098d..42125e57a4 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -2005,7 +2005,7 @@ static int update_squash_messages(struct repository *r,
 			return error(_("could not read '%s'"),
 				rebase_path_squash_msg());
 
-		eol = buf.buf[0] != comment_line_char ?
+		eol = !starts_with(buf.buf, comment_line_str) ?
 			buf.buf : strchrnul(buf.buf, '\n');
 
 		strbuf_addf(&header, "%s ", comment_line_str);
diff --git a/trailer.c b/trailer.c
index ef9df4af55..fe18faf6c5 100644
--- a/trailer.c
+++ b/trailer.c
@@ -1013,7 +1013,7 @@ static void parse_trailers(struct trailer_info *info,
 	for (i = 0; i < info->trailer_nr; i++) {
 		int separator_pos;
 		char *trailer = info->trailers[i];
-		if (trailer[0] == comment_line_char)
+		if (starts_with(trailer, comment_line_str))
 			continue;
 		separator_pos = find_separator(trailer, separators);
 		if (separator_pos >= 1) {
diff --git a/wt-status.c b/wt-status.c
index 6201a97de0..8753d59f90 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1386,7 +1386,7 @@ static int read_rebase_todolist(const char *fname, struct string_list *lines)
 			  git_path("%s", fname));
 	}
 	while (!strbuf_getline_lf(&line, f)) {
-		if (line.len && line.buf[0] == comment_line_char)
+		if (starts_with(line.buf, comment_line_str))
 			continue;
 		strbuf_trim(&line);
 		if (!line.len)
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 12/16] find multi-byte comment chars in unterminated buffers
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (10 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 11/16] find multi-byte comment chars in NUL-terminated strings Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 13/16] sequencer: handle multi-byte comment characters when writing todo list Jeff King
                                 ` (4 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

As with the previous patch, we need to swap out single-byte matching for
something like starts_with() to match all bytes of a multi-byte comment
character. But for cases where the buffer is not NUL-terminated (and we
instead have an explicit size or end pointer), it's not safe to use
starts_with(), as it might walk off the end of the buffer.

Let's introduce a new starts_with_mem() that does the same thing but
also accepts the length of the "haystack" str and makes sure not to walk
past it.

Note that in most cases the existing code did not need a length check at
all, since it was written in a way that knew we had at least one byte
available (and that was all we checked). So I had to read each one to
find the appropriate bounds. The one exception is sequencer.c's
add_commented_lines(), where we can actually get rid of the length
check. Just like starts_with(), our starts_with_mem() handles an empty
haystack variable by not matching (assuming a non-empty prefix).

A few notes on the implementation of starts_with_mem():

  - it would be equally correct to take an "end" pointer (and indeed,
    many of the callers have this and have to subtract to come up with
    the length). I think taking a ptr/size combo is a more usual
    interface for our codebase, though, and has the added benefit that
    the function signature makes it harder to mix up the three
    parameters.

  - we could obviously build starts_with() on top of this by passing
    strlen(str) as the length. But it's possible that starts_with() is a
    relatively hot code path, and it should not pay that penalty (it can
    generally return an answer proportional to the size of the prefix,
    not the whole string).

  - it naively feels like xstrncmpz() should be able to do the same
    thing, but that's not quite true. If you pass the length of the
    haystack buffer, then strncmp() finds that a shorter prefix string
    is "less than" the haystack, even if the haystack starts with
    the prefix. If you pass the length of the prefix, then you risk
    reading past the end of the haystack if it is shorter than the
    prefix. So I think we really do need a new function.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c    |  3 ++-
 sequencer.c |  4 ++--
 strbuf.c    | 11 +++++++++++
 strbuf.h    |  1 +
 trailer.c   |  4 ++--
 5 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/commit.c b/commit.c
index 467be9f7f9..9cfbe9d657 100644
--- a/commit.c
+++ b/commit.c
@@ -1797,7 +1797,8 @@ size_t ignored_log_message_bytes(const char *buf, size_t len)
 		else
 			next_line++;
 
-		if (buf[bol] == comment_line_char || buf[bol] == '\n') {
+		if (starts_with_mem(buf + bol, cutoff - bol, comment_line_str) ||
+		    buf[bol] == '\n') {
 			/* is this the first of the run of comments? */
 			if (!boc)
 				boc = bol;
diff --git a/sequencer.c b/sequencer.c
index 42125e57a4..ef84832855 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1842,7 +1842,7 @@ static int is_fixup_flag(enum todo_command command, unsigned flag)
 static void add_commented_lines(struct strbuf *buf, const void *str, size_t len)
 {
 	const char *s = str;
-	while (len > 0 && s[0] == comment_line_char) {
+	while (starts_with_mem(s, len, comment_line_str)) {
 		size_t count;
 		const char *n = memchr(s, '\n', len);
 		if (!n)
@@ -2564,7 +2564,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
 	/* left-trim */
 	bol += strspn(bol, " \t");
 
-	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
+	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
 		item->command = TODO_COMMENT;
 		item->commit = NULL;
 		item->arg_offset = bol - buf;
diff --git a/strbuf.c b/strbuf.c
index 7c8f582127..291bdc2a65 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -24,6 +24,17 @@ int istarts_with(const char *str, const char *prefix)
 			return 0;
 }
 
+int starts_with_mem(const char *str, size_t len, const char *prefix)
+{
+	const char *end = str + len;
+	for (; ; str++, prefix++) {
+		if (!*prefix)
+			return 1;
+		else if (str == end || *str != *prefix)
+			return 0;
+	}
+}
+
 int skip_to_optional_arg_default(const char *str, const char *prefix,
 				 const char **arg, const char *def)
 {
diff --git a/strbuf.h b/strbuf.h
index 58dddf2777..3156d6ea8c 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -673,6 +673,7 @@ char *xstrfmt(const char *fmt, ...);
 
 int starts_with(const char *str, const char *prefix);
 int istarts_with(const char *str, const char *prefix);
+int starts_with_mem(const char *str, size_t len, const char *prefix);
 
 /*
  * If the string "str" is the same as the string in "prefix", then the "arg"
diff --git a/trailer.c b/trailer.c
index fe18faf6c5..fdb0b8137e 100644
--- a/trailer.c
+++ b/trailer.c
@@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
 
 	/* The first paragraph is the title and cannot be trailers */
 	for (s = buf; s < buf + len; s = next_line(s)) {
-		if (s[0] == comment_line_char)
+		if (starts_with_mem(s, buf + len - s, comment_line_str))
 			continue;
 		if (is_blank_line(s))
 			break;
@@ -902,7 +902,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
 		const char **p;
 		ssize_t separator_pos;
 
-		if (bol[0] == comment_line_char) {
+		if (starts_with_mem(bol, buf + len - bol, comment_line_str)) {
 			non_trailer_lines += possible_continuation_lines;
 			possible_continuation_lines = 0;
 			continue;
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 13/16] sequencer: handle multi-byte comment characters when writing todo list
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (11 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 12/16] find multi-byte comment chars in unterminated buffers Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 14/16] wt-status: drop custom comment-char stringification Jeff King
                                 ` (3 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

We already match multi-byte comment characters in parse_insn_line(),
thanks to the previous commit, yielding a TODO_COMMENT entry. But in
todo_list_to_strbuf(), we may call command_to_char() to convert that
back into something we can output.

We can't just return comment_line_char anymore, since it may require
multiple bytes. Instead, we'll return "0" for this case, which is the
same thing we'd return for a command which does not have a single-letter
abbreviation (e.g., "revert" or "noop"). There is only a single caller
of command_to_char(), and upon seeing "0" it falls back to outputting
the full name via command_to_string(). So we can handle TODO_COMMENT
there, returning the full string.

Note that there are many other callers of command_to_string(), which
will now behave differently if they pass TODO_COMMENT. But we would not
expect that to happen; prior to this commit, the function just calls
die() in this case. And looking at those callers, that makes sense;
e.g., do_pick_commit() will only be called when servicing a pick
command, and should never be called for a comment in the first place.

Signed-off-by: Jeff King <peff@peff.net>
---
 sequencer.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/sequencer.c b/sequencer.c
index ef84832855..a8fdf00e89 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1781,14 +1781,16 @@ static const char *command_to_string(const enum todo_command command)
 {
 	if (command < TODO_COMMENT)
 		return todo_command_info[command].str;
+	if (command == TODO_COMMENT)
+		return comment_line_str;
 	die(_("unknown command: %d"), command);
 }
 
 static char command_to_char(const enum todo_command command)
 {
 	if (command < TODO_COMMENT)
 		return todo_command_info[command].c;
-	return comment_line_char;
+	return 0;
 }
 
 static int is_noop(const enum todo_command command)
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 14/16] wt-status: drop custom comment-char stringification
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (12 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 13/16] sequencer: handle multi-byte comment characters when writing todo list Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 15/16] environment: drop comment_line_char compatibility macro Jeff King
                                 ` (2 subsequent siblings)
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

In wt_longstatus_print_tracking() we may conditionally show a comment
prefix based on the wt_status->display_comment_prefix flag. We handle
that by creating a local "comment_line_string" that is either the empty
string or the comment character followed by a space.

For a single-byte comment, the maximum length of this string is 2 (plus
a NUL byte). But to handle multi-byte comment characters, it can be
arbitrarily large. One way to handle this is to just call
xstrfmt("%s ", comment_line_str), and then free it when we're done.

But we can simplify things further by just conditionally switching
between our prefix string and an empty string when formatting. We
couldn't just do that with the previous code, because the comment
character was a single byte. There's no way to have a "%c" format switch
between some character and "no character at all". Whereas with "%s" you
can switch between some string and the empty string. So now that we have
a comment string and not a comment char, we can just use it directly
when formatting. Do note that we have to also conditionally add the
trailing space at the same time.

Signed-off-by: Jeff King <peff@peff.net>
---
 wt-status.c | 13 +++----------
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/wt-status.c b/wt-status.c
index 8753d59f90..7217ff30c5 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1180,8 +1180,6 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 	struct strbuf sb = STRBUF_INIT;
 	const char *cp, *ep, *branch_name;
 	struct branch *branch;
-	char comment_line_string[3];
-	int i;
 	uint64_t t_begin = 0;
 
 	assert(s->branch && !s->is_initial);
@@ -1206,16 +1204,11 @@ static void wt_longstatus_print_tracking(struct wt_status *s)
 		}
 	}
 
-	i = 0;
-	if (s->display_comment_prefix) {
-		comment_line_string[i++] = comment_line_char;
-		comment_line_string[i++] = ' ';
-	}
-	comment_line_string[i] = '\0';
-
 	for (cp = sb.buf; (ep = strchr(cp, '\n')) != NULL; cp = ep + 1)
 		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s),
-				 "%s%.*s", comment_line_string,
+				 "%s%s%.*s",
+				 s->display_comment_prefix ? comment_line_str : "",
+				 s->display_comment_prefix ? " " : "",
 				 (int)(ep - cp), cp);
 	if (s->display_comment_prefix)
 		color_fprintf_ln(s->fp, color(WT_STATUS_HEADER, s), "%s",
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 15/16] environment: drop comment_line_char compatibility macro
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (13 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 14/16] wt-status: drop custom comment-char stringification Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-12  9:17               ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Jeff King
  2024-03-12 14:40               ` [PATCH v2 0/16] " phillip.wood123
  16 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

There is no longer any code which references the single-byte
comment_line_char. Let's drop it, clearing the way for true multi-byte
entries in comment_line_str.

It's possible there are topics in flight that have added new references
to comment_line_char. But we would prefer to fail compilation (and then
fix it) upon merging with this, rather than have them quietly ignore the
bytes after the first.

Signed-off-by: Jeff King <peff@peff.net>
---
 environment.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/environment.h b/environment.h
index 1c7d0c2f74..05fd94d7be 100644
--- a/environment.h
+++ b/environment.h
@@ -8,7 +8,6 @@ struct strvec;
  * The character that begins a commented line in user-editable file
  * that is subject to stripspace.
  */
-#define comment_line_char (comment_line_str[0])
 extern const char *comment_line_str;
 extern int auto_comment_line_char;
 
-- 
2.44.0.481.gf1a6d20963



* [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (14 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 15/16] environment: drop comment_line_char compatibility macro Jeff King
@ 2024-03-12  9:17               ` Jeff King
  2024-03-13 18:23                 ` Kristoffer Haugsbakk
  2024-03-12 14:40               ` [PATCH v2 0/16] " phillip.wood123
  16 siblings, 1 reply; 82+ messages in thread
From: Jeff King @ 2024-03-12  9:17 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

Now that all of the code handles multi-byte comment characters, it's
safe to allow users to set them.

There is one special case I kept: we still will not allow an empty
string for the commentChar. While it might make sense in some contexts
(e.g., output where you don't want any comment prefix), there are plenty
where it will behave badly (e.g., all of our starts_with() checks will
indicate that every line is a comment!). It might be reasonable to
assign some meaningful semantics, but it would probably involve checking
how each site behaves. In the interim let's forbid it and we can loosen
things later.

Likewise, the "commentChar cannot be a newline" rule is now extended to
"it cannot contain a newline" (for the same reason: it can confuse our
parsing loops).

Since comment_line_str is used in many parts of the code, it's hard to
cover all possibilities with tests. We can convert the existing
double-semicolon prefix test to show that "git status" works. And we'll
give it a more challenging case in t7507, where we confirm that
git-commit strips out the commit template along with any --verbose text
when reading the edited commit message back in. That covers the basics,
though it's possible there could be issues in more exotic spots (e.g.,
the sequencer todo list uses its own code).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/config/core.txt |  4 +++-
 config.c                      | 10 +++++-----
 t/t0030-stripspace.sh         |  7 ++++++-
 t/t7507-commit-verbose.sh     | 10 ++++++++++
 t/t7508-status.sh             |  4 +++-
 5 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 0e8c2832bf..c86b8c8408 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -523,7 +523,9 @@ core.commentChar::
 	Commands such as `commit` and `tag` that let you edit
 	messages consider a line that begins with this character
 	commented, and removes them after the editor returns
-	(default '#').
+	(default '#'). Note that this option can take values larger than
+	a byte (whether a single multi-byte character, or you
+	could even go wild with a multi-character sequence).
 +
 If set to "auto", `git-commit` would select a character that is not
 the beginning character of any line in existing commit messages.
diff --git a/config.c b/config.c
index 7e5dbca4bd..92c752ed9f 100644
--- a/config.c
+++ b/config.c
@@ -1565,13 +1565,13 @@ static int git_default_core_config(const char *var, const char *value,
 			return config_error_nonbool(var);
 		else if (!strcasecmp(value, "auto"))
 			auto_comment_line_char = 1;
-		else if (value[0] && !value[1]) {
-			if (value[0] == '\n')
-				return error(_("core.commentChar cannot be newline"));
-			comment_line_str = xstrfmt("%c", value[0]);
+		else if (value[0]) {
+			if (strchr(value, '\n'))
+				return error(_("core.commentChar cannot contain newline"));
+			comment_line_str = xstrdup(value);
 			auto_comment_line_char = 0;
 		} else
-			return error(_("core.commentChar should only be one ASCII character"));
+			return error(_("core.commentChar must have at least one character"));
 		return 0;
 	}
 
diff --git a/t/t0030-stripspace.sh b/t/t0030-stripspace.sh
index e399dd9189..a161faf702 100755
--- a/t/t0030-stripspace.sh
+++ b/t/t0030-stripspace.sh
@@ -403,7 +403,12 @@ test_expect_success 'strip comments with changed comment char' '
 
 test_expect_success 'newline as commentchar is forbidden' '
 	test_must_fail git -c core.commentChar="$LF" stripspace -s 2>err &&
-	grep "core.commentChar cannot be newline" err
+	grep "core.commentChar cannot contain newline" err
+'
+
+test_expect_success 'empty commentchar is forbidden' '
+	test_must_fail git -c core.commentchar= stripspace -s 2>err &&
+	grep "core.commentChar must have at least one character" err
 '
 
 test_expect_success '-c with single line' '
diff --git a/t/t7507-commit-verbose.sh b/t/t7507-commit-verbose.sh
index c3281b192e..4c7db19ce7 100755
--- a/t/t7507-commit-verbose.sh
+++ b/t/t7507-commit-verbose.sh
@@ -101,6 +101,16 @@ test_expect_success 'verbose diff is stripped out with set core.commentChar' '
 	test_grep "Aborting commit due to empty commit message." err
 '
 
+test_expect_success 'verbose diff is stripped with multi-byte comment char' '
+	(
+		GIT_EDITOR=cat &&
+		export GIT_EDITOR &&
+		test_must_fail git -c core.commentchar="foo>" commit -a -v >out 2>err
+	) &&
+	grep "^foo> " out &&
+	test_grep "Aborting commit due to empty commit message." err
+'
+
 test_expect_success 'status does not verbose without --verbose' '
 	git status >actual &&
 	! grep "^diff --git" actual
diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index a3c18a4fc2..10ed8b32bc 100755
--- a/t/t7508-status.sh
+++ b/t/t7508-status.sh
@@ -1403,7 +1403,9 @@ test_expect_success "status (core.commentchar with submodule summary)" '
 
 test_expect_success "status (core.commentchar with two chars with submodule summary)" '
 	test_config core.commentchar ";;" &&
-	test_must_fail git -c status.displayCommentPrefix=true status
+	sed "s/^/;/" <expect >expect.double &&
+	git -c status.displayCommentPrefix=true status >output &&
+	test_cmp expect.double output
 '
 
 test_expect_success "--ignore-submodules=all suppresses submodule summary" '
-- 
2.44.0.481.gf1a6d20963


* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-12  8:19                       ` Jeff King
@ 2024-03-12 14:36                         ` phillip.wood123
  2024-03-13  6:23                           ` Jeff King
  0 siblings, 1 reply; 82+ messages in thread
From: phillip.wood123 @ 2024-03-12 14:36 UTC (permalink / raw)
  To: Jeff King, phillip.wood
  Cc: Junio C Hamano, René Scharfe, git, Dragan Simic,
	Kristoffer Haugsbakk, Manlio Perillo

Hi Peff

On 12/03/2024 08:19, Jeff King wrote:
> On Fri, Mar 08, 2024 at 04:20:12PM +0000, Phillip Wood wrote:
>> We could certainly leave it as-is and tell users they are only hurting
>> themselves if they complain when it does not work.
> 
> That was mostly my plan. To some degree I think this is orthogonal to my
> series. You can already set core.commentChar to space or newline, and
> I'm sure the results are not very good. Actually, I guess it is easy to
> try:
> 
>    git -c core.commentChar=$'\n' commit --allow-empty
> 
> treats everything as not-a-comment.
> 
> Maybe it's worth forbidding this at the start of the series, and then
> carrying it through. I really do think newline is the most special
> character here, just because it's obviously going to be meaningful to
> all of our line-oriented parsing. So you'll get weird results, as
> opposed to broken multibyte characters, where things would still work if
> you choose to consistently use them (and arguably we cannot even define
> "broken" as the user can use a different encoding).

I agree newline is a special case compared to broken multibyte 
characters, I see you've disallowed it in v2 which seems like a good idea.

> Likewise, I guess people might complain that their core.commentChar is
> NFD and their editor writes out NFC characters or something, and we
> don't match. I was hoping we could just punt on that and nobody would
> ever notice (certainly I think it is OK to punt for now and somebody who
> truly cares can make a utf8_starts_with() or similar).
> 
>>> Also, what exactly is the definition of "nonsense" will become a can
>>> of worms.  I can sympathise if somebody wants to use "#\t" to give
>>> themselves a bit more room than usual on the left for visibility,
>>> for example, so there might be a case to want whitespace characters.
>>
>> That's fair, maybe we could just ban leading whitespace if we do decide to
>> restrict core.commentChar
> 
> Leading whitespace actually does work, though I think you'd be slightly
> insane to use it.

For "git rebase" it only works if you edit the todo list with "git 
rebase --edit-todo" which calls strbuf_stripspace() and therefore 
parse_insn_line() never sees the comments. If you edit the todo list 
directly then it will error out. You can see this with

     git -c core.commentChar=' ' rebase -x 'echo " this is a comment" >>"$(git rev-parse --git-path rebase-merge/git-rebase-todo)"' HEAD^

which successfully picks HEAD but then gives

     error: invalid command 'this'

when it tries to parse the todo list after the exec command is run. 
Given it is broken already I'm not sure we should worry about it here. 
In any case it is not clear how much we should worry about problems 
caused by users editing the todo list without using "git rebase 
--edit-todo". There is code in parse_insn_line() which is clearly there 
to handle direct editing of the file but I don't think it is tested and 
directly editing the file probably bypasses the 
rebase.missingCommitsCheck checks as well.

Best Wishes

Phillip

> I'm currently using "! COMMENT !" (after using a unicode char for a few
> days). It's horribly ugly, but I wanted to see if any bugs cropped up
> (and vim's built-in git syntax highlighting colors it correctly ;) ).
> 
> -Peff


* Re: [PATCH v2 0/16] allow multi-byte core.commentChar
  2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
                                 ` (15 preceding siblings ...)
  2024-03-12  9:17               ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Jeff King
@ 2024-03-12 14:40               ` phillip.wood123
  2024-03-12 20:30                 ` Junio C Hamano
  16 siblings, 1 reply; 82+ messages in thread
From: phillip.wood123 @ 2024-03-12 14:40 UTC (permalink / raw)
  To: Jeff King, git
  Cc: Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

Hi Peff

On 12/03/2024 09:10, Jeff King wrote:
> Here's a revised version of my series. It incorporates the fixups I sent
> (which I think Junio had applied already), and incorporates a new patch
> at the beginning to forbid newlines.
> 
> I _didn't_ convert any of the starts_with_mem() calls to starts_with().
> I'm on the fence on whether that is simplifying things or creating
> potential confusion/bugs later.
> 
> If we don't like the new patch 1 (or if we prefer to do it on top; there
> is really not much reason to prefer one or the other), then this should
> otherwise be the same as what Junio has already queued as
> jk/core-comment-char.

Looking through the range-diff it addresses all of my (sequencer 
focused) comments on v1.

Best Wishes

Phillip

> Range diff (from v1, without my fixups) is below.
> 
>   -:  ---------- >  1:  86efec435d config: forbid newline as core.commentChar
>   1:  be18aa04e3 =  2:  7c016e5dc3 strbuf: simplify comment-handling in add_lines() helper
>   2:  0f8ea2a86d =  3:  2b4170b5f0 strbuf: avoid static variables in strbuf_add_commented_lines()
>   3:  9b56d9f4f0 =  4:  24ca214986 commit: refactor base-case of adjust_comment_line_char()
>   4:  0a191e5588 =  5:  9f6433dbe6 strbuf: avoid shadowing global comment_line_char name
>   5:  f41e196138 !  6:  d0f32f10f9 environment: store comment_line_char as a string
>      @@ builtin/commit.c: static void adjust_comment_line_char(const struct strbuf *sb)
>       
>        ## config.c ##
>       @@ config.c: static int git_default_core_config(const char *var, const char *value,
>      - 		else if (!strcasecmp(value, "auto"))
>      - 			auto_comment_line_char = 1;
>        		else if (value[0] && !value[1]) {
>      + 			if (value[0] == '\n')
>      + 				return error(_("core.commentChar cannot be newline"));
>       -			comment_line_char = value[0];
>       +			comment_line_str = xstrfmt("%c", value[0]);
>        			auto_comment_line_char = 0;
>   6:  84261af2ed !  7:  2c91628564 strbuf: accept a comment string for strbuf_stripspace()
>      @@ Commit message
>       
>           Signed-off-by: Jeff King <peff@peff.net>
>       
>      + ## builtin/am.c ##
>      +@@ builtin/am.c: static int parse_mail(struct am_state *state, const char *mail)
>      +
>      + 	strbuf_addstr(&msg, "\n\n");
>      + 	strbuf_addbuf(&msg, &mi.log_message);
>      +-	strbuf_stripspace(&msg, '\0');
>      ++	strbuf_stripspace(&msg, NULL);
>      +
>      + 	assert(!state->author_name);
>      + 	state->author_name = strbuf_detach(&author_name, NULL);
>      +
>        ## builtin/branch.c ##
>       @@ builtin/branch.c: static int edit_branch_description(const char *branch_name)
>        		strbuf_release(&buf);
>      @@ builtin/branch.c: static int edit_branch_description(const char *branch_name)
>        	strbuf_addf(&name, "branch.%s.description", branch_name);
>        	if (buf.len || exists)
>       
>      + ## builtin/commit.c ##
>      +@@ builtin/commit.c: static int prepare_to_commit(const char *index_file, const char *prefix,
>      + 	s->hints = 0;
>      +
>      + 	if (clean_message_contents)
>      +-		strbuf_stripspace(&sb, '\0');
>      ++		strbuf_stripspace(&sb, NULL);
>      +
>      + 	if (signoff)
>      + 		append_signoff(&sb, ignored_log_message_bytes(sb.buf, sb.len), 0);
>      +
>        ## builtin/notes.c ##
>       @@ builtin/notes.c: static void prepare_note_data(const struct object_id *object, struct note_data *
>        			die(_("please supply the note contents using either -m or -F option"));
>      @@ builtin/notes.c: static void prepare_note_data(const struct object_id *object, s
>        	}
>        }
>        
>      +@@ builtin/notes.c: static void concat_messages(struct note_data *d)
>      + 		if ((d->stripspace == UNSPECIFIED &&
>      + 		     d->messages[i]->stripspace == STRIPSPACE) ||
>      + 		    d->stripspace == STRIPSPACE)
>      +-			strbuf_stripspace(&d->buf, 0);
>      ++			strbuf_stripspace(&d->buf, NULL);
>      + 		strbuf_reset(&msg);
>      + 	}
>      + 	strbuf_release(&msg);
>       
>        ## builtin/rebase.c ##
>       @@ builtin/rebase.c: static int edit_todo_file(unsigned flags)
>      @@ builtin/tag.c: static void create_tag(const struct object_id *object, const char
>        	if (!opt->message_given && !buf->len)
>        		die(_("no tag message?"));
>       
>      + ## builtin/worktree.c ##
>      +@@ builtin/worktree.c: static int can_use_local_refs(const struct add_opts *opts)
>      + 			strbuf_add_real_path(&path, get_worktree_git_dir(NULL));
>      + 			strbuf_addstr(&path, "/HEAD");
>      + 			strbuf_read_file(&contents, path.buf, 64);
>      +-			strbuf_stripspace(&contents, 0);
>      ++			strbuf_stripspace(&contents, NULL);
>      + 			strbuf_strip_suffix(&contents, "\n");
>      +
>      + 			warning(_("HEAD points to an invalid (or orphaned) reference.\n"
>      +
>      + ## gpg-interface.c ##
>      +@@ gpg-interface.c: static int verify_ssh_signed_buffer(struct signature_check *sigc,
>      + 		}
>      + 	}
>      +
>      +-	strbuf_stripspace(&ssh_keygen_out, '\0');
>      +-	strbuf_stripspace(&ssh_keygen_err, '\0');
>      ++	strbuf_stripspace(&ssh_keygen_out, NULL);
>      ++	strbuf_stripspace(&ssh_keygen_err, NULL);
>      + 	/* Add stderr outputs to show the user actual ssh-keygen errors */
>      + 	strbuf_add(&ssh_keygen_out, ssh_principals_err.buf, ssh_principals_err.len);
>      + 	strbuf_add(&ssh_keygen_out, ssh_keygen_err.buf, ssh_keygen_err.len);
>      +
>        ## rebase-interactive.c ##
>       @@ rebase-interactive.c: int edit_todo_list(struct repository *r, struct todo_list *todo_list,
>        	if (launch_sequence_editor(todo_file, &new_todo->buf, NULL))
>   7:  bb22f9c9c5 =  8:  a271207e48 strbuf: accept a comment string for strbuf_commented_addf()
>   8:  8d20688e87 =  9:  c1831453d8 strbuf: accept a comment string for strbuf_add_commented_lines()
>   9:  4b22efb941 = 10:  523eb9e534 prefer comment_line_str to comment_line_char for printing
> 10:  cd03310902 = 11:  85428eadaa find multi-byte comment chars in NUL-terminated strings
> 11:  13a346480e ! 12:  b9e2e2302d find multi-byte comment chars in unterminated buffers
>      @@ trailer.c: static size_t find_trailer_block_start(const char *buf, size_t len)
>        		ssize_t separator_pos;
>        
>       -		if (bol[0] == comment_line_char) {
>      -+		if (starts_with_mem(bol, buf + end_of_title - bol, comment_line_str)) {
>      ++		if (starts_with_mem(bol, buf + len - bol, comment_line_str)) {
>        			non_trailer_lines += possible_continuation_lines;
>        			possible_continuation_lines = 0;
>        			continue;
> 12:  fb3c6659fc ! 13:  7661ca6306 sequencer: handle multi-byte comment characters when writing todo list
>      @@ Commit message
>           We can't just return comment_line_char anymore, since it may require
>           multiple bytes. Instead, we'll return "0" for this case, which is the
>           same thing we'd return for a command which does not have a single-letter
>      -    abbreviation (e.g., "revert" or "noop"). In that case the caller then
>      -    falls back to outputting the full name via command_to_string(). So we
>      -    can handle TODO_COMMENT there, returning the full string.
>      +    abbreviation (e.g., "revert" or "noop"). There is only a single caller
>      +    of command_to_char(), and upon seeing "0" it falls back to outputting
>      +    the full name via command_to_string(). So we can handle TODO_COMMENT
>      +    there, returning the full string.
>       
>           Note that there are many other callers of command_to_string(), which
>           will now behave differently if they pass TODO_COMMENT. But we would not
> 13:  94524b8817 = 14:  8ddab67432 wt-status: drop custom comment-char stringification
> 14:  d754e86f7b = 15:  16d65f9179 environment: drop comment_line_char compatibility macro
> 15:  a6ffe08469 ! 16:  461cc720a0 config: allow multi-byte core.commentChar
>      @@ Commit message
>           how each site behaves. In the interim let's forbid it and we can loosen
>           things later.
>       
>      +    Likewise, the "commentChar cannot be a newline" rule is now extended to
>      +    "it cannot contain a newline" (for the same reason: it can confuse our
>      +    parsing loops).
>      +
>           Since comment_line_str is used in many parts of the code, it's hard to
>           cover all possibilities with tests. We can convert the existing
>           double-semicolon prefix test to show that "git status" works. And we'll
>      @@ config.c: static int git_default_core_config(const char *var, const char *value,
>        		else if (!strcasecmp(value, "auto"))
>        			auto_comment_line_char = 1;
>       -		else if (value[0] && !value[1]) {
>      +-			if (value[0] == '\n')
>      +-				return error(_("core.commentChar cannot be newline"));
>       -			comment_line_str = xstrfmt("%c", value[0]);
>       +		else if (value[0]) {
>      ++			if (strchr(value, '\n'))
>      ++				return error(_("core.commentChar cannot contain newline"));
>       +			comment_line_str = xstrdup(value);
>        			auto_comment_line_char = 0;
>        		} else
>      @@ config.c: static int git_default_core_config(const char *var, const char *value,
>       
>        ## t/t0030-stripspace.sh ##
>       @@ t/t0030-stripspace.sh: test_expect_success 'strip comments with changed comment char' '
>      - 	test -z "$(echo "; comment" | git -c core.commentchar=";" stripspace -s)"
>      - '
>        
>      + test_expect_success 'newline as commentchar is forbidden' '
>      + 	test_must_fail git -c core.commentChar="$LF" stripspace -s 2>err &&
>      +-	grep "core.commentChar cannot be newline" err
>      ++	grep "core.commentChar cannot contain newline" err
>      ++'
>      ++
>       +test_expect_success 'empty commentchar is forbidden' '
>       +	test_must_fail git -c core.commentchar= stripspace -s 2>err &&
>       +	grep "core.commentChar must have at least one character" err
>      -+'
>      -+
>      + '
>      +
>        test_expect_success '-c with single line' '
>      - 	printf "# foo\n" >expect &&
>      - 	printf "foo" | git stripspace -c >actual &&
>       
>        ## t/t7507-commit-verbose.sh ##
>       @@ t/t7507-commit-verbose.sh: test_expect_success 'verbose diff is stripped out with set core.commentChar' '
> 
>    [01/16]: config: forbid newline as core.commentChar
>    [02/16]: strbuf: simplify comment-handling in add_lines() helper
>    [03/16]: strbuf: avoid static variables in strbuf_add_commented_lines()
>    [04/16]: commit: refactor base-case of adjust_comment_line_char()
>    [05/16]: strbuf: avoid shadowing global comment_line_char name
>    [06/16]: environment: store comment_line_char as a string
>    [07/16]: strbuf: accept a comment string for strbuf_stripspace()
>    [08/16]: strbuf: accept a comment string for strbuf_commented_addf()
>    [09/16]: strbuf: accept a comment string for strbuf_add_commented_lines()
>    [10/16]: prefer comment_line_str to comment_line_char for printing
>    [11/16]: find multi-byte comment chars in NUL-terminated strings
>    [12/16]: find multi-byte comment chars in unterminated buffers
>    [13/16]: sequencer: handle multi-byte comment characters when writing todo list
>    [14/16]: wt-status: drop custom comment-char stringification
>    [15/16]: environment: drop comment_line_char compatibility macro
>    [16/16]: config: allow multi-byte core.commentChar
> 
>   Documentation/config/core.txt |  4 ++-
>   add-patch.c                   | 14 +++++-----
>   builtin/am.c                  |  2 +-
>   builtin/branch.c              |  8 +++---
>   builtin/commit.c              | 21 +++++++--------
>   builtin/merge.c               | 12 ++++-----
>   builtin/notes.c               | 12 ++++-----
>   builtin/rebase.c              |  2 +-
>   builtin/stripspace.c          |  4 +--
>   builtin/tag.c                 | 14 +++++-----
>   builtin/worktree.c            |  2 +-
>   commit.c                      |  3 ++-
>   config.c                      |  8 +++---
>   environment.c                 |  2 +-
>   environment.h                 |  2 +-
>   fmt-merge-msg.c               |  8 +++---
>   gpg-interface.c               |  4 +--
>   rebase-interactive.c          | 10 ++++----
>   sequencer.c                   | 48 ++++++++++++++++++-----------------
>   strbuf.c                      | 47 ++++++++++++++++++----------------
>   strbuf.h                      |  9 ++++---
>   t/t0030-stripspace.sh         | 10 ++++++++
>   t/t7507-commit-verbose.sh     | 10 ++++++++
>   t/t7508-status.sh             |  4 ++-
>   trailer.c                     |  6 ++---
>   wt-status.c                   | 31 +++++++++-------------
>   26 files changed, 162 insertions(+), 135 deletions(-)
> 
> -Peff

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v2 0/16] allow multi-byte core.commentChar
  2024-03-12 14:40               ` [PATCH v2 0/16] " phillip.wood123
@ 2024-03-12 20:30                 ` Junio C Hamano
  0 siblings, 0 replies; 82+ messages in thread
From: Junio C Hamano @ 2024-03-12 20:30 UTC (permalink / raw)
  To: phillip.wood123
  Cc: Jeff King, git, Dragan Simic, Kristoffer Haugsbakk,
	Manlio Perillo, René Scharfe, Phillip Wood

phillip.wood123@gmail.com writes:

> Looking through the range-diff it addresses all of my (sequencer
> focused) comments on v1.

Thanks, both.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-12 14:36                         ` phillip.wood123
@ 2024-03-13  6:23                           ` Jeff King
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-13  6:23 UTC (permalink / raw)
  To: phillip.wood
  Cc: Junio C Hamano, René Scharfe, git, Dragan Simic,
	Kristoffer Haugsbakk, Manlio Perillo

On Tue, Mar 12, 2024 at 02:36:36PM +0000, phillip.wood123@gmail.com wrote:

> > Leading whitespace actually does work, though I think you'd be slightly
> > insane to use it.
> 
> For "git rebase" it only works if you edit the todo list with "git rebase
> --edit-todo" which calls strbuf_stripspace() and therefore parse_insn_line()
> never sees the comments. If you edit the todo list directly then it will
> error out. You can see this with
> 
>     git -c core.commentChar=' ' rebase -x 'echo " this is a comment"
> >>"$(git rev-parse --git-path rebase-merge/git-rebase-todo)"' HEAD^
> 
> which successfully picks HEAD but then gives
> 
>     error: invalid command 'this'
> 
> when it tries to parse the todo list after the exec command is run. Given it
> is broken already I'm not sure we should worry about it here. In any case it
> is not clear how much we should worry about problems caused by users editing
> the todo list without using "git rebase --edit-todo". There is code in
> parse_insn_line() which is clearly there to handle direct editing of the
> file but I don't think it is tested and directly editing the file probably
> bypasses the rebase.missingCommitsCheck checks as well.

Ah, thanks for the example. I guess it's not too surprising that it can
cause confusion. Given that it's an existing issue, I think my
preference would be to leave it out of the series under discussion
(given how long and complicated it is already), but I'd have no
objection to tightening things further on top as a separate series.
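
To spell out why the space-as-comment case falls over once the todo parser
sees the line, here is a minimal sketch. It assumes only what the thread
shows: parse_insn_line() left-trims with strspn() before checking for the
comment prefix. The helper name is made up for illustration and this is not
the sequencer's code.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for the comment check in the todo parser.
 * Returns 1 if the line still looks like a comment after left-trimming.
 */
static int is_comment_after_trim(const char *line, const char *comment_str)
{
	assert(line && comment_str && *comment_str);

	/* left-trim, as parse_insn_line() does before looking at the line */
	line += strspn(line, " \t");

	/* prefix comparison against the configured comment string */
	return !strncmp(line, comment_str, strlen(comment_str));
}
```

With comment_str set to " ", the trim eats the very byte the check is
looking for, so " this is a comment" is handed on as the command "this".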

-Peff

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-12  9:17               ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Jeff King
@ 2024-03-13 18:23                 ` Kristoffer Haugsbakk
  2024-03-13 18:39                   ` Junio C Hamano
  2024-03-15  5:59                   ` Jeff King
  0 siblings, 2 replies; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-13 18:23 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Dragan Simic, Manlio Perillo, René Scharfe,
	Phillip Wood, git

Thanks for your work on this. Now I can use dingbats as my comment char.

On Tue, Mar 12, 2024, at 10:17, Jeff King wrote:
> diff --git a/Documentation/config/core.txt
> b/Documentation/config/core.txt
> index 0e8c2832bf..c86b8c8408 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -523,7 +523,9 @@ core.commentChar::
>  	Commands such as `commit` and `tag` that let you edit
>  	messages consider a line that begins with this character
>  	commented, and removes them after the editor returns
> -	(default '#').
> +	(default '#'). Note that this option can take values larger than
> +	a byte (whether a single multi-byte character, or you
> +	could even go wild with a multi-character sequence).

I don’t know if this expanded description focuses a bit much on the
history of the change[1] or if it is intentionally indirect about this
char-is-really-a-string behavior as a sort of easter egg.[2]

Maybe it could be more directly stated like:

  “ Note that this variable can in fact be a string like `foo`; it
    doesn’t have to be a single character.

(Hopefully UTF-8 is implied by “foo”. Or else “føø”.)

Terms like “a byte” and “multi-byte characters” seem a bit too technical
in this context when you can just say “string”.

† 1: (1) What’s a “char”, is it ASCII? (2) It’s ASCII but could in
    principle be made multi-byte (3) And also a multi-byte *string*,
    right? (4) …
† 2: In five years: (1) How come this Git tutorial’s commit message
    template has `(commit)` as the ignore-these-lines marker? How did he
    abuse “comment char” to make a long string? (2) Actually…

❦ Please enter the email reply. Lines starting with '❦' will be ignored,
❦ and an empty message aborts the sendout.

-- 
Kristoffer Haugsbakk

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-13 18:23                 ` Kristoffer Haugsbakk
@ 2024-03-13 18:39                   ` Junio C Hamano
  2024-03-15  5:59                   ` Jeff King
  1 sibling, 0 replies; 82+ messages in thread
From: Junio C Hamano @ 2024-03-13 18:39 UTC (permalink / raw)
  To: Kristoffer Haugsbakk
  Cc: Jeff King, Dragan Simic, Manlio Perillo, René Scharfe,
	Phillip Wood, git

"Kristoffer Haugsbakk" <code@khaugsbakk.name> writes:

>> +	(default '#'). Note that this option can take values larger than
>> +	a byte (whether a single multi-byte character, or you
>> +	could even go wild with a multi-character sequence).
>
> I don’t know if this expanded description focuses a bit much on the
> history of the change[1] or if it is intentionally indirect about this
> char-is-really-a-string behavior as a sort of easter egg.[2]

> Maybe it could be more directly stated like:
>
>   “ Note that this variable can in fact be a string like `foo`; it
>     doesn’t have to be a single character.
>
> (Hopefully UTF-8 is implied by “foo”. Or else “føø”.)

That's definitely an improvement, but I would say that using a
dingbat instead of "foo", and "single character" -> "single ASCII
character" (or "single byte") would make it even clearer.

Thanks.

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH 11/15] find multi-byte comment chars in unterminated buffers
  2024-03-12  8:05                 ` Jeff King
@ 2024-03-14 19:37                   ` René Scharfe
  0 siblings, 0 replies; 82+ messages in thread
From: René Scharfe @ 2024-03-14 19:37 UTC (permalink / raw)
  To: Jeff King
  Cc: git, Junio C Hamano, Dragan Simic, Kristoffer Haugsbakk, Manlio Perillo

Am 12.03.24 um 09:05 schrieb Jeff King:
> On Thu, Mar 07, 2024 at 08:42:22PM +0100, René Scharfe wrote:
>
>>> @@ -2562,7 +2562,7 @@ static int parse_insn_line(struct repository *r, struct todo_item *item,
>>>  	/* left-trim */
>>>  	bol += strspn(bol, " \t");
>>>
>>> -	if (bol == eol || *bol == '\r' || *bol == comment_line_char) {
>>> +	if (bol == eol || *bol == '\r' || starts_with_mem(bol, eol - bol, comment_line_str)) {
>>
>> If the strspn() call is safe (which it is, as the caller expects the
>> string to be NUL-terminated) then you could use starts_with() here and
>> avoid the length calculation.  But that would also match
>> comment_line_str values that contain LF, which the _mem version does not
>> and that's better.
>
> I try not to read too much into the use of string functions on what
> otherwise appears to be an unterminated buffer. While in Git it is quite
> often terminated at allocation time (coming from a strbuf, etc) I feel
> like I've fixed a number of out-of-bounds reads simply due to sloppy
> practices. And even if something is correct today, it is easy for it to
> change, since the assumption is made far away from allocation.
>
> So I dunno. Like you said, fewer computations means fewer opportunities to
> mess things up. I don't like the idea of introducing a new hand-grenade
> that might blow up later, but maybe if it's right next to a strspn()
> call that's already a problem, it's not materially making anything
> worse.

Yeah, and my logic was flawed: If the caller somehow guarantees that a
space or tab occurs before eol then the strspn() call is safe.  Its
presence doesn't guarantee NUL termination.  parse_insn_line() would
not be safe to use without that prerequisite, but that's a different
matter.
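
For reference, a length-bounded prefix test of the kind discussed here can
be sketched as below. The function name is invented for this mail and the
body is my own illustration, not the starts_with_mem() from the series. It
never reads past buf + len, and a prefix containing LF cannot match when
len stops at the end of the line.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: does the (possibly unterminated) buffer of
 * length len begin with the NUL-terminated prefix?
 */
static int prefix_matches_mem(const char *buf, size_t len, const char *prefix)
{
	size_t i;

	assert(buf && prefix);

	for (i = 0; prefix[i]; i++) {
		/* ran out of buffer, or hit a mismatch, before the prefix ended */
		if (i == len || buf[i] != prefix[i])
			return 0;
	}
	return 1;
}
```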

>>> @@ -882,7 +882,7 @@ static size_t find_trailer_block_start(const char *buf, size_t len)
>>>
>>>  	/* The first paragraph is the title and cannot be trailers */
>>>  	for (s = buf; s < buf + len; s = next_line(s)) {
>>> -		if (s[0] == comment_line_char)
>>> +		if (starts_with_mem(s, buf + len - s, comment_line_str))
>>>  			continue;
>>>  		if (is_blank_line(s))
>>
>> Another case where starts_with() would be safe to use, as
>> is_blank_line() expects (and gets) a NUL-terminated string, but it would
>> allow matching comment_line_str values that contain LF.
>
> Hmm. Yes, it is a NUL-terminated string always, but the caller has told
> us not to look past end_of_log_message(). I suspect that if there is no
> newline in comment_line_str it's probably impossible to go past "len"
> (just because the end of the log surely ends with either a NUL or a
> newline). But it feels iffy to me. I dunno.

Same flawed thinking on my part: As long as we're guaranteed a blank
line in the buffer we won't walk past its end.  That doesn't mean we can
assume a NUL is present.  But that's fragile.  The code should use
memchr() instead of strchrnul().

That's not the problem you set out to solve in your series, though, and
you avoid making it worse by respecting the length limit in the code
you change.  #leftoverbits

Keeping track of the remaining length increases code size and adds
opportunities for mistakes.  Not sure how to avoid it, however.  Using
eol instead of len at least avoids subtractions.
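
A rough sketch of what the memchr() variant could look like, working on
offsets so the caller does not have to redo pointer subtractions. The name
and signature are hypothetical; trailer.c's next_line() does not look like
this today.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: return the offset just past the next LF at or
 * after pos, or len if the remaining bytes contain no LF. Never reads
 * at or beyond buf + len, so no NUL terminator is required.
 */
static size_t next_line_offset(const char *buf, size_t len, size_t pos)
{
	const char *nl;

	assert(buf && pos <= len);

	nl = memchr(buf + pos, '\n', len - pos);
	return nl ? (size_t)(nl - buf) + 1 : len;
}
```

A loop such as "for (pos = 0; pos < len; pos = next_line_offset(buf, len,
pos))" then terminates on the length alone.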

tl;dr: Good patch (in v2).

René

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-13 18:23                 ` Kristoffer Haugsbakk
  2024-03-13 18:39                   ` Junio C Hamano
@ 2024-03-15  5:59                   ` Jeff King
  2024-03-15  7:16                     ` Kristoffer Haugsbakk
  1 sibling, 1 reply; 82+ messages in thread
From: Jeff King @ 2024-03-15  5:59 UTC (permalink / raw)
  To: Kristoffer Haugsbakk
  Cc: Junio C Hamano, Dragan Simic, Manlio Perillo, René Scharfe,
	Phillip Wood, git

On Wed, Mar 13, 2024 at 07:23:25PM +0100, Kristoffer Haugsbakk wrote:

> Thanks for your work on this. Now I can use dingbats as my comment char.

Truly we have entered a golden age of technology. ;)

> > @@ -523,7 +523,9 @@ core.commentChar::
> >  	Commands such as `commit` and `tag` that let you edit
> >  	messages consider a line that begins with this character
> >  	commented, and removes them after the editor returns
> > -	(default '#').
> > +	(default '#'). Note that this option can take values larger than
> > +	a byte (whether a single multi-byte character, or you
> > +	could even go wild with a multi-character sequence).
> 
> I don’t know if this expanded description focuses a bit much on the
> history of the change[1] or if it is intentionally indirect about this
> char-is-really-a-string behavior as a sort of easter egg.[2]

Mostly I was worried that people would take "char" in the name to assume
it could only be a single byte (I had originally even started the new
sentence with "Despite the word 'char' in the name, this option
can..."). And that is not just history, but a name we are stuck with
forever[1].

Certainly "char" is an ambiguous term, though. I didn't mean to leave
char-is-a-string as an easter egg; that's what I meant by
"multi-character sequence". Certainly "string" is a shorter way of
saying that. ;) But I wasn't sure its meaning would be obvious without
the word "multi-character". Giving an example as you suggested does
help that.

That said...

> Maybe it could be more directly stated like:
> 
>   “ Note that this variable can in fact be a string like `foo`; it
>     doesn’t have to be a single character.

I actually do think the "string" nature is mostly uninteresting, and I'd
be OK leaving it as an easter egg. What your suggestion doesn't say is
that multi-byte characters are OK. But if we think people will just
assume that in a modern UTF-8 world, then maybe we don't need to say
anything at all?

> (Hopefully UTF-8 is implied by “foo”. Or else “føø”.)

It actually does not have to be UTF-8. If you use an alternate encoding
in your editor (and probably set i18n.commitEncoding to match), I think
everything might just work. (Though to be clear, I think anybody using
non-UTF8 in 2024 deserves our pity either for being crazy or for being
stuck working on an antiquated system).

-Peff

^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-15  5:59                   ` Jeff King
@ 2024-03-15  7:16                     ` Kristoffer Haugsbakk
  2024-03-15  8:10                       ` Jeff King
  0 siblings, 1 reply; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-15  7:16 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Dragan Simic, Manlio Perillo, René Scharfe,
	Phillip Wood, git

On Fri, Mar 15, 2024, at 06:59, Jeff King wrote:
> On Wed, Mar 13, 2024 at 07:23:25PM +0100, Kristoffer Haugsbakk wrote:
>
>> Thanks for your work on this. Now I can use dingbats as my comment char.
>
> Truly we have entered a golden age of technology. ;)

QoL features can in aggregate have a surprising impact :)

>
>> > @@ -523,7 +523,9 @@ core.commentChar::
>> >  	Commands such as `commit` and `tag` that let you edit
>> >  	messages consider a line that begins with this character
>> >  	commented, and removes them after the editor returns
>> > -	(default '#').
>> > +	(default '#'). Note that this option can take values larger than
>> > +	a byte (whether a single multi-byte character, or you
>> > +	could even go wild with a multi-character sequence).
>>
>> I don’t know if this expanded description focuses a bit much on the
>> history of the change[1] or if it is intentionally indirect about this
>> char-is-really-a-string behavior as a sort of easter egg.[2]
>
> Mostly I was worried that people would take "char" in the name to assume
> it could only be a single byte (I had originally even started the new
> sentence with "Despite the word 'char' in the name, this option
> can..."). And that is not just history, but a name we are stuck with
> forever[1].

Missing footnote or referring to my footnote?

My suggestion was to use a `core.commentString` alias. Which might
matter for new answers to questions about its use. It might not matter
if in practice most people get their config tips from a 1500-point
StackOverflow question about how git-commit(1) keeps swallowing their
GitHub issue numbers (due to automatic linewrap) from 2011.

> Certainly "char" is an ambiguous term, though. I didn't mean to leave
> char-is-a-string as an easter egg; that's what I meant by
> "multi-character sequence". Certainly "string" is a shorter way of
> saying that. ;) But I wasn't sure its meaning would be obvious without
> the word "multi-character". Giving an example as you suggested does
> help that.
>
> That said...
>
>> Maybe it could be more directly stated like:
>>
>>   “ Note that this variable can in fact be a string like `foo`; it
>>     doesn’t have to be a single character.
>
> I actually do think the "string" nature is mostly uninteresting, and I'd
> be OK leaving it as an easter egg.

To my mind a string subsumes a char (multi- or not). Like in programming
languages: some might be used to single-char `#`, but I don’t think they
do a double take when they see languages with `//` or `--`.

> What your suggestion doesn't say is that multi-byte characters are
> OK. But if we think people will just assume that in a modern UTF-8
> world, then maybe we don't need to say anything at all?

Given that we’re mostly in the context of a commit message, an
ASCII-only restriction would feel archaic.

I guess it depends on what the *normal* is in the documentation at
large. As a user I’m used to Git handling the text that I give it.

> It actually does not have to be UTF-8.

Good point. Unicode is more appropriate.

> (Though to be clear, I think anybody using non-UTF8 in 2024 deserves
> our pity either for being crazy or for being stuck working on an
> antiquated system).

I honestly feel blessed that I have to worry so little about text
encoding.

-- 
Kristoffer Haugsbakk


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-15  7:16                     ` Kristoffer Haugsbakk
@ 2024-03-15  8:10                       ` Jeff King
  2024-03-15 13:30                         ` Kristoffer Haugsbakk
                                           ` (2 more replies)
  0 siblings, 3 replies; 82+ messages in thread
From: Jeff King @ 2024-03-15  8:10 UTC (permalink / raw)
  To: Kristoffer Haugsbakk
  Cc: Junio C Hamano, Dragan Simic, Manlio Perillo, René Scharfe,
	Phillip Wood, git

On Fri, Mar 15, 2024 at 08:16:53AM +0100, Kristoffer Haugsbakk wrote:

> > Mostly I was worried that people would take "char" in the name to assume
> > it could only be a single byte (I had originally even started the new
> > sentence with "Despite the word 'char' in the name, this option
> > can..."). And that is not just history, but a name we are stuck with
> > forever[1].
> 
> Missing footnote or referring to my footnote?
> 
> My suggestion was to use a `core.commentString` alias. Which might
> matter for new answers to questions about its use. It might not matter
> if in practice most people get their config tips from a 1500-point
> StackOverflow question about how git-commit(1) keeps swallowing their
> GitHub issue numbers (due to automatic linewrap) from 2011.

Heh, missing footnote. I was going to say "we could introduce
core.commentStr or similar", but after your comment I searched in the
archive and see that you did indeed already suggest it.

I'm not sure if it would make things more or less confusing to have two
related values. One nice side effect is that the new variable would be
ignored by older versions of Git (whereas by extending core.commentChar,
you end up with config that causes older versions to barf). That
probably doesn't matter that much for most users, but as somebody who
works on Git I frequently run old versions for bug testing, bisection,
and so forth.
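
To make the compatibility point concrete, here are the two acceptance
conditions side by side, lifted from the config.c hunk in the range-diff.
This is only a sketch: the function names are made up, and the "auto"
value, which is handled before this point, is left out.

```c
#include <assert.h>
#include <string.h>

/* Condition removed by the final patch: the value is exactly one byte. */
static int single_byte_rule_accepts(const char *value)
{
	assert(value);
	return value[0] && !value[1];
}

/* Condition added by the final patch: non-empty and containing no LF. */
static int string_rule_accepts(const char *value)
{
	assert(value);
	return value[0] && !strchr(value, '\n');
}
```

So "//" or a UTF-8 bullet (three bytes) passes the new rule but fails the
old one, which is why a config written for a new Git trips up an old one.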

> > I actually do think the "string" nature is mostly uninteresting, and I'd
> > be OK leaving it as an easter egg.
> 
> To my mind a string subsumes a char (multi- or not). Like in programming
> languages: some might be used to single-char `#`, but I don’t think they
> do a double take when they see languages with `//` or `--`.

Hmm, good point. I was mostly focused on UTF-8 characters, but "//" is
quite a reasonable thing for people to try. It is probably a better
example than "foo".

> > What your suggestion doesn't say is that multi-byte characters are
> > OK. But if we think people will just assume that in a modern UTF-8
> > world, then maybe we don't need to say anything at all?
> 
> Given that we’re mostly in the context of a commit message, an
> ASCII-only restriction would feel archaic.
>
> I guess it depends on what the *normal* is in the documentation at
> large. As a user I’m used to Git handling the text that I give it.

Right, that's what I was asking. To me "character" means an ASCII byte,
but I think I might be archaic myself. ;) If most of our readers would
just assume that multi-byte characters work, perhaps it is confusing
things to even mention it.

> > It actually does not have to be UTF-8.
> 
> Good point. Unicode is more appropriate.

I think other Unicode encodings are likely to have problems (because
they embed NULs). Specifically I was thinking that you could probably
get away with latin1 or other 8-bit encodings. But again, I really hope
nobody is doing that anymore.
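
A tiny illustration of the embedded-NUL problem, assuming the comment
string is treated as a NUL-terminated C string (which is how the series
stores it). The byte values are the standard encodings of "//" and of
U+00A7; the helper name is just for this example.

```c
#include <assert.h>
#include <string.h>

/* "//" as UTF-8 and as UTF-16LE, each followed by its terminator. */
static const char slashes_utf8[]    = { 0x2f, 0x2f, 0x00 };
static const char slashes_utf16le[] = { 0x2f, 0x00, 0x2f, 0x00, 0x00, 0x00 };
/* U+00A7 SECTION SIGN as a single latin1 byte. */
static const char section_latin1[]  = { (char)0xa7, 0x00 };

/* Length as seen by any NUL-terminated string function. */
static size_t c_string_length(const char *s)
{
	assert(s);
	return strlen(s);
}
```

The UTF-16LE value is cut off after its first byte, while the latin1 byte
survives intact as a one-byte string.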

So anyway, adapting your original suggestion based on discussion in the
thread, maybe squash in (to the final patch):

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c86b8c8408..c5a8033df9 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -523,9 +523,8 @@ core.commentChar::
 	Commands such as `commit` and `tag` that let you edit
 	messages consider a line that begins with this character
 	commented, and removes them after the editor returns
-	(default '#'). Note that this option can take values larger than
-	a byte (whether a single multi-byte character, or you
-	could even go wild with a multi-character sequence).
+	(default '#'). Note that this variable can be a string like
+	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
 +
 If set to "auto", `git-commit` would select a character that is not
 the beginning character of any line in existing commit messages.


That's assuming we don't want to go the commentString route, which would
require a bit more re-working of the patch. I'm also open to a more
clever or pretty multi-byte example if we have one. ;)

-Peff

^ permalink raw reply related	[flat|nested] 82+ messages in thread

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-15  8:10                       ` Jeff King
@ 2024-03-15 13:30                         ` Kristoffer Haugsbakk
  2024-03-15 15:40                         ` Junio C Hamano
  2024-03-26 22:10                         ` Junio C Hamano
  2 siblings, 0 replies; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-15 13:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Dragan Simic, Manlio Perillo, René Scharfe,
	Phillip Wood, git

> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index c86b8c8408..c5a8033df9 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -523,9 +523,8 @@ core.commentChar::
>  	Commands such as `commit` and `tag` that let you edit
>  	messages consider a line that begins with this character
>  	commented, and removes them after the editor returns
> -	(default '#'). Note that this option can take values larger than
> -	a byte (whether a single multi-byte character, or you
> -	could even go wild with a multi-character sequence).
> +	(default '#'). Note that this variable can be a string like
> +	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.

This is perfect :)

-- 
Kristoffer Haugsbakk


^ permalink raw reply	[flat|nested] 82+ messages in thread

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-15  8:10                       ` Jeff King
  2024-03-15 13:30                         ` Kristoffer Haugsbakk
@ 2024-03-15 15:40                         ` Junio C Hamano
  2024-03-16  5:50                           ` Jeff King
  2024-03-26 22:10                         ` Junio C Hamano
  2 siblings, 1 reply; 82+ messages in thread
From: Junio C Hamano @ 2024-03-15 15:40 UTC (permalink / raw)
  To: Jeff King
  Cc: Kristoffer Haugsbakk, Dragan Simic, Manlio Perillo,
	René Scharfe, Phillip Wood, git

Jeff King <peff@peff.net> writes:

> +	(default '#'). Note that this variable can be a string like
> +	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.

Looking good.

> That's assuming we don't want to go the commentString route, which would
> require a bit more re-working of the patch. I'm also open to a more
> clever or pretty multi-byte example if we have one. ;)

Adding core.commentString can be done long after the dust settles
and I would expect that most of the changes in the patch would not
have to be updated.  The parts that use comment_line_str variable do
not have to change, the documentation needs "core.commentString is a
synonym for core.commentChar, the latter of which is understood by
older versions of Git (but they may use only the first byte of the
string)" or something, but other than that, the existing text after
this patch does not have to be updated.  If we add a proper synonym
support to the config machinery, that would be a sizable project,
but otherwise it would be just another "if (!strcmp()) var = val".

Stepping back a bit, one thing that we do need to mention in this
round is what happens when you use multi-byte sequence and have it
accessed by existing versions of Git.  "use only the first byte" I
wrote above came out of thin air without experimenting or reading
the code, but something like that ought to be part of the "Note
that" paragraph above.

	(default '#'). Note that this variable can be a string like
	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
	Also note that older versions of Git used only the first byte
	(not necessarily a character) of the value of this variable,
	so you may want to be careful if you plan to use versions of
	Git older than 2.45.

or something like that, perhaps.





* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-15 15:40                         ` Junio C Hamano
@ 2024-03-16  5:50                           ` Jeff King
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-16  5:50 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kristoffer Haugsbakk, Dragan Simic, Manlio Perillo,
	René Scharfe, Phillip Wood, git

On Fri, Mar 15, 2024 at 08:40:56AM -0700, Junio C Hamano wrote:

> > That's assuming we don't want to go the commentString route, which would
> > require a bit more re-working of the patch. I'm also open to a more
> > clever or pretty multi-byte example if we have one. ;)
> 
> Adding core.commentString can be done long after the dust settles
> and I would expect that most of the changes in the patch would not
> have to be updated.  The parts that use comment_line_str variable do
> not have to change, the documentation needs "core.commentString is a
> synonym for core.commentChar, the latter of which is understood by
> older versions of Git (but they may use only the first byte of the
> string)" or something, but other than that, the existing text after
> this patch does not have to be updated.  If we add a proper synonym
> support to the config machinery, that would be a sizable project,
> but otherwise it would be just another "if (!strcmp()) var = val".

Yeah, I agree we could add core.commentString on top of what's here, as
long as we're OK with core.commentChar starting to accept strings in the
meantime. Which is probably reasonable, and in which case the code
portion of the patch really is just:

diff --git a/config.c b/config.c
index 92c752ed9f..13fb922bf5 100644
--- a/config.c
+++ b/config.c
@@ -1560,7 +1560,8 @@ static int git_default_core_config(const char *var, const char *value,
 	if (!strcmp(var, "core.editor"))
 		return git_config_string(&editor_program, var, value);
 
-	if (!strcmp(var, "core.commentchar")) {
+	if (!strcmp(var, "core.commentchar") ||
+	    !strcmp(var, "core.commentstring")) {
 		if (!value)
 			return config_error_nonbool(var);
 		else if (!strcasecmp(value, "auto"))

(the real work of course being in docs and tests).

If we wanted to distinguish them more (say, core.commentChar remains
as-is but core.commentString allows strings and takes precedence), then
we'd need to do it now to avoid flip-flopping between versions. I don't
see a huge benefit in restricting commentChar though.

> Stepping back a bit, one thing that we do need to mention in this
> round is what happens when you use multi-byte sequence and have it
> accessed by existing versions of Git.  "use only the first byte" I
> wrote above came out of thin air without experimenting or reading
> the code, but something like that ought to be part of the "Note
> that" paragraph above.
> 
> 	(default '#'). Note that this variable can be a string like
> 	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
> 	Also note that older versions of Git used only the first byte
> 	(not necessarily a character) of the value of this variable,
> 	so you may want to be careful if you plan to use versions of
> 	Git older than 2.45.

The current code barfs for anything larger than a byte:

  $ git.v2.44.0 -c core.commentchar=foo stripspace -s
  error: core.commentChar should only be one ASCII character
  fatal: unable to parse 'core.commentchar' from command-line config

I'm mixed on these sorts of version-specific notes in the documentation.
For people who aren't mixing versions, the history is useless noise
(whose value decreases as time goes on and 2.45 becomes "old" itself).
For people who do use older versions, they'd quickly get an error like
the one above.

So I dunno. I'm not strictly opposed, but if this is something we think
is worth warning about, then that implies to me that it is worth
providing a more ergonomic solution like core.commentString.

-Peff

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-15  8:10                       ` Jeff King
  2024-03-15 13:30                         ` Kristoffer Haugsbakk
  2024-03-15 15:40                         ` Junio C Hamano
@ 2024-03-26 22:10                         ` Junio C Hamano
  2024-03-26 22:12                           ` Kristoffer Haugsbakk
  2024-03-27  7:46                           ` Jeff King
  2 siblings, 2 replies; 82+ messages in thread
From: Junio C Hamano @ 2024-03-26 22:10 UTC (permalink / raw)
  To: Jeff King
  Cc: Kristoffer Haugsbakk, Dragan Simic, Manlio Perillo,
	René Scharfe, Phillip Wood, git

Jeff King <peff@peff.net> writes:

> So anyway, adapting your original suggestion based on discussion in the
> thread, maybe squash in (to the final patch):
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index c86b8c8408..c5a8033df9 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -523,9 +523,8 @@ core.commentChar::
>  	Commands such as `commit` and `tag` that let you edit
>  	messages consider a line that begins with this character
>  	commented, and removes them after the editor returns
> -	(default '#'). Note that this option can take values larger than
> -	a byte (whether a single multi-byte character, or you
> -	could even go wild with a multi-character sequence).
> +	(default '#'). Note that this variable can be a string like
> +	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
>  +
>  If set to "auto", `git-commit` would select a character that is not
>  the beginning character of any line in existing commit messages.
>
>
> That's assuming we don't want to go the commentString route, which would
> require a bit more re-working of the patch. I'm also open to a more
> clever or pretty multi-byte example if we have one. ;)

It has been 10 days since this discussion petered out.

My preference is to introduce core.commentString to avoid confusion
coming from an older Git using the first-byte of a multi-byte
string, or dying upon reading a configuration file meant for a newer
Git, and then let core.commentString override core.commentChar, but
I would prefer to see the discussion participants to raise their
opinions and reach a conclusion.

Thanks.

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-26 22:10                         ` Junio C Hamano
@ 2024-03-26 22:12                           ` Kristoffer Haugsbakk
  2024-03-27  7:46                           ` Jeff King
  1 sibling, 0 replies; 82+ messages in thread
From: Kristoffer Haugsbakk @ 2024-03-26 22:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Dragan Simic, Manlio Perillo, René Scharfe, Phillip Wood,
	git, Jeff King

On Tue, Mar 26, 2024, at 23:10, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
>> So anyway, adapting your original suggestion based on discussion in the
>> thread, maybe squash in (to the final patch):
>>
>> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
>> index c86b8c8408..c5a8033df9 100644
>> --- a/Documentation/config/core.txt
>> +++ b/Documentation/config/core.txt
>> @@ -523,9 +523,8 @@ core.commentChar::
>>  	Commands such as `commit` and `tag` that let you edit
>>  	messages consider a line that begins with this character
>>  	commented, and removes them after the editor returns
>> -	(default '#'). Note that this option can take values larger than
>> -	a byte (whether a single multi-byte character, or you
>> -	could even go wild with a multi-character sequence).
>> +	(default '#'). Note that this variable can be a string like
>> +	`//` or `⁑⁕⁑`; it doesn't have to be a single ASCII character.
>>  +
>>  If set to "auto", `git-commit` would select a character that is not
>>  the beginning character of any line in existing commit messages.
>>
>>
>> That's assuming we don't want to go the commentString route, which would
>> require a bit more re-working of the patch. I'm also open to a more
>> clever or pretty multi-byte example if we have one. ;)
>
> It has been 10 days since this discussion petered out.
>
> My preference is to introduce core.commentString to avoid confusion
> coming from an older Git using the first-byte of a multi-byte
> string, or dying upon reading a configuration file meant for a newer
> Git, and then let core.commentString override core.commentChar, but
> I would prefer to see the discussion participants to raise their
> opinions and reach a conclusion.
>
> Thanks.

Sounds good to me.

-- 
Kristoffer Haugsbakk


* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-26 22:10                         ` Junio C Hamano
  2024-03-26 22:12                           ` Kristoffer Haugsbakk
@ 2024-03-27  7:46                           ` Jeff King
  2024-03-27  8:19                             ` [PATCH 17/16] config: add core.commentString Jeff King
  2024-03-27 14:53                             ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Junio C Hamano
  1 sibling, 2 replies; 82+ messages in thread
From: Jeff King @ 2024-03-27  7:46 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kristoffer Haugsbakk, Dragan Simic, Manlio Perillo,
	René Scharfe, Phillip Wood, git

On Tue, Mar 26, 2024 at 03:10:23PM -0700, Junio C Hamano wrote:

> It has been 10 days since this discussion petered out.

I wrote the last message, so I was waiting for you to respond. ;)

  https://lore.kernel.org/git/20240316055013.GA32145@coredump.intra.peff.net/

> My preference is to introduce core.commentString to avoid confusion
> coming from an older Git using the first-byte of a multi-byte
> string, or dying upon reading a configuration file meant for a newer
> Git, and then let core.commentString override core.commentChar, but
> I would prefer to see the discussion participants to raise their
> opinions and reach a conclusion.

OK. I don't have a strong opinion. Are you OK with core.commentString as
a strict synonym (so last-one-wins and either name overwrites previous)?
Or do you want an override (i.e., commentString always overrides
commentChar, regardless of order). I think it's mostly academic, and the
strict synonym version is much easier to implement.

-Peff

* [PATCH 17/16] config: add core.commentString
  2024-03-27  7:46                           ` Jeff King
@ 2024-03-27  8:19                             ` Jeff King
  2024-03-27 12:45                               ` Chris Torek
  2024-03-27 16:13                               ` Junio C Hamano
  2024-03-27 14:53                             ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Junio C Hamano
  1 sibling, 2 replies; 82+ messages in thread
From: Jeff King @ 2024-03-27  8:19 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kristoffer Haugsbakk, Dragan Simic, Manlio Perillo,
	René Scharfe, Phillip Wood, git

On Wed, Mar 27, 2024 at 03:46:55AM -0400, Jeff King wrote:

> > My preference is to introduce core.commentString to avoid confusion
> > coming from an older Git using the first-byte of a multi-byte
> > string, or dying upon reading a configuration file meant for a newer
> > Git, and then let core.commentString override core.commentChar, but
> > I would prefer to see the discussion participants to raise their
> > opinions and reach a conclusion.
> 
> OK. I don't have a strong opinion. Are you OK with core.commentString as
> a strict synonym (so last-one-wins and either name overwrites previous)?
> Or do you want an override (i.e., commentString always overrides
> commentChar, regardless of order). I think it's mostly academic, and the
> strict synonym version is much easier to implement.

Like this, on top of what you have queued in jk/core-comment-string.

Note that you graduated kh/doc-commentchar-is-a-byte, which says "this
ASCII character" early in the description, which will be incorrect if my
series is merged. This would need to be fixed (possibly as part of
merging my topic, though I don't think it actually triggers a conflict,
so you'll have to remember to do so manually). Or mine could be rebased
on top of master and then remove it as part of the series.

-- >8 --
Subject: [PATCH] config: add core.commentString

The core.commentChar code recently learned to accept more than a
single ASCII character. But using it is annoying with multiple versions
of Git, since older ones will reject it outright:

    $ git.v2.44.0 -c core.commentchar=foo stripspace -s
    error: core.commentChar should only be one ASCII character
    fatal: unable to parse 'core.commentchar' from command-line config

Let's add an alias core.commentString. That's arguably a better name
anyway, since we now can handle strings, and it makes it possible to
have a config that works reasonably with both old and new versions of
Git (see the example in the documentation).

This is strictly an alias, so there's not much point in adding duplicate
tests; I added a single one to t0030 that exercises the alias code.

Note also that the error messages for invalid values will now show the
variable the config parser handed us, and thus will be normalized to
lowercase (rather than camelcase). A few tests in t0030 are adjusted to
match.

Signed-off-by: Jeff King <peff@peff.net>
---
An alternative to using "$var cannot ..." in the error messages (if we
don't like the all-lowercase variable name) is to just say "comment
strings cannot ...". That vaguely covers both cases, and the message
printed by the config code itself does mention the actual variable name
that triggered the error.

 Documentation/config/core.txt | 19 ++++++++++++++++---
 config.c                      |  7 ++++---
 t/t0030-stripspace.sh         |  9 +++++++--
 3 files changed, 27 insertions(+), 8 deletions(-)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index c86b8c8408..bbe869c497 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -520,15 +520,28 @@ core.editor::
 	`GIT_EDITOR` is not set.  See linkgit:git-var[1].
 
 core.commentChar::
+core.commentString::
 	Commands such as `commit` and `tag` that let you edit
 	messages consider a line that begins with this character
 	commented, and removes them after the editor returns
-	(default '#'). Note that this option can take values larger than
-	a byte (whether a single multi-byte character, or you
-	could even go wild with a multi-character sequence).
+	(default '#').
 +
 If set to "auto", `git-commit` would select a character that is not
 the beginning character of any line in existing commit messages.
++
+Note that these two variables are aliases of each other, and in modern
+versions of Git you are free to use a string (e.g., `//` or `⁑⁕⁑`) with
+`commentChar`. Versions of Git prior to v2.45.0 will ignore
+`commentString` but will reject a value of `commentChar` that consists
+of more than a single ASCII byte. If you plan to use your config with
+older and newer versions of Git, you may want to specify both:
++
+    [core]
+    # single character for older versions
+    commentChar = "#"
+    # string for newer versions (which will override commentChar
+    # because it comes later in the file)
+    commentString = "//"
 
 core.filesRefLockTimeout::
 	The length of time, in milliseconds, to retry when trying to
diff --git a/config.c b/config.c
index 92c752ed9f..d12e0f34f1 100644
--- a/config.c
+++ b/config.c
@@ -1560,18 +1560,19 @@ static int git_default_core_config(const char *var, const char *value,
 	if (!strcmp(var, "core.editor"))
 		return git_config_string(&editor_program, var, value);
 
-	if (!strcmp(var, "core.commentchar")) {
+	if (!strcmp(var, "core.commentchar") ||
+	    !strcmp(var, "core.commentstring")) {
 		if (!value)
 			return config_error_nonbool(var);
 		else if (!strcasecmp(value, "auto"))
 			auto_comment_line_char = 1;
 		else if (value[0]) {
 			if (strchr(value, '\n'))
-				return error(_("core.commentChar cannot contain newline"));
+				return error(_("%s cannot contain newline"), var);
 			comment_line_str = xstrdup(value);
 			auto_comment_line_char = 0;
 		} else
-			return error(_("core.commentChar must have at least one character"));
+			return error(_("%s must have at least one character"), var);
 		return 0;
 	}
 
diff --git a/t/t0030-stripspace.sh b/t/t0030-stripspace.sh
index a161faf702..f10f42ff1e 100755
--- a/t/t0030-stripspace.sh
+++ b/t/t0030-stripspace.sh
@@ -401,14 +401,19 @@ test_expect_success 'strip comments with changed comment char' '
 	test -z "$(echo "; comment" | git -c core.commentchar=";" stripspace -s)"
 '
 
+test_expect_success 'strip comments with changed comment string' '
+	test ! -z "$(echo "// comment" | git -c core.commentstring=// stripspace)" &&
+	test -z "$(echo "// comment" | git -c core.commentstring="//" stripspace -s)"
+'
+
 test_expect_success 'newline as commentchar is forbidden' '
 	test_must_fail git -c core.commentChar="$LF" stripspace -s 2>err &&
-	grep "core.commentChar cannot contain newline" err
+	grep "core.commentchar cannot contain newline" err
 '
 
 test_expect_success 'empty commentchar is forbidden' '
 	test_must_fail git -c core.commentchar= stripspace -s 2>err &&
-	grep "core.commentChar must have at least one character" err
+	grep "core.commentchar must have at least one character" err
 '
 
 test_expect_success '-c with single line' '
-- 
2.44.0.727.g4d9414de3a
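
For readers following the config.c hunk above, the checks it applies to the value can be sketched in isolation. This is only an illustration under stated assumptions, not Git's code: `enum comment_status` and `check_comment_value()` are hypothetical names, and the sketch returns a status instead of calling error() or setting comment_line_str.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical result codes; Git itself reports these cases via error(). */
enum comment_status {
	COMMENT_OK,      /* value is usable as the comment string */
	COMMENT_AUTO,    /* value was "auto", compared case-insensitively */
	COMMENT_MISSING, /* key present but no value given */
	COMMENT_EMPTY,   /* "must have at least one character" */
	COMMENT_NEWLINE  /* "cannot contain newline" */
};

/*
 * Mirrors the order of the checks in the hunk: missing value first,
 * then "auto", then empty string, then embedded newline.
 */
enum comment_status check_comment_value(const char *value)
{
	if (!value)
		return COMMENT_MISSING;
	if (!strcasecmp(value, "auto"))
		return COMMENT_AUTO;
	if (!value[0])
		return COMMENT_EMPTY;
	if (strchr(value, '\n'))
		return COMMENT_NEWLINE;
	return COMMENT_OK;
}
```

Note that nothing here restricts the value to one byte, which is what lets strings such as `//` through.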


* Re: [PATCH 17/16] config: add core.commentString
  2024-03-27  8:19                             ` [PATCH 17/16] config: add core.commentString Jeff King
@ 2024-03-27 12:45                               ` Chris Torek
  2024-03-27 16:13                               ` Junio C Hamano
  1 sibling, 0 replies; 82+ messages in thread
From: Chris Torek @ 2024-03-27 12:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Kristoffer Haugsbakk, Dragan Simic,
	Manlio Perillo, René Scharfe, Phillip Wood, git

Assuming the implementation continues as suggested, I'll mention
here that I really like this note:

On Wed, Mar 27, 2024 at 1:19 AM Jeff King <peff@peff.net> wrote:
> +Note that these two variables are aliases of each other, and in modern
> +versions of Git you are free to use a string (e.g., `//` or `⁑⁕⁑`) with
> +`commentChar`. Versions of Git prior to v2.45.0 will ignore
> +`commentString` but will reject a value of `commentChar` that consists
> +of more than a single ASCII byte. If you plan to use your config with
> +older and newer versions of Git, you may want to specify both:

One of the big things I think is missing from existing Git documentation
(and would, alas, be a huge effort to provide) is backwards-compatibility
notes. People are often stuck with old versions of software, at least
during initial bringup, for a variety of reasons, and such notes can
be quite helpful.

Examples of modern systems that have extensive notes include
Python, where the documentation often says "new in 3.7" or
whatever, and Go, where the automatically-built documentation
notes which version of Go introduced some new function.

I'm not exactly volunteering here for the heavy lifting though. :-)

Chris

* Re: [PATCH v2 16/16] config: allow multi-byte core.commentChar
  2024-03-27  7:46                           ` Jeff King
  2024-03-27  8:19                             ` [PATCH 17/16] config: add core.commentString Jeff King
@ 2024-03-27 14:53                             ` Junio C Hamano
  1 sibling, 0 replies; 82+ messages in thread
From: Junio C Hamano @ 2024-03-27 14:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Kristoffer Haugsbakk, Dragan Simic, Manlio Perillo,
	René Scharfe, Phillip Wood, git

Jeff King <peff@peff.net> writes:

> OK. I don't have a strong opinion. Are you OK with core.commentString as
> a strict synonym (so last-one-wins and either name overwrites previous)?
> Or do you want an override (i.e., commentString always overrides
> commentChar, regardless of order). I think it's mostly academic, and the
> strict synonym version is much easier to implement.

When I wrote it, I meant "String is a successor of Char; if both
exist, String is used regardless of the order", but either is OK.
Older versions of Git would not understand the "String" version, so
it matters only to those who use mixed versions of Git, and they can
control the last-one-wins in their configuration file, I would
guess.
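
The two behaviours being compared can be sketched side by side. This is only an illustration, not Git's code: `struct entry`, `resolve_synonym()` and `resolve_override()` are hypothetical names, keys are assumed to be already lowercased by the config parser, and the "auto" value is ignored for brevity.

```c
#include <stddef.h>
#include <string.h>

/* One parsed config entry, listed in the order it was read. */
struct entry {
	const char *key;   /* assumed lowercased, e.g. "core.commentchar" */
	const char *value;
};

/* Strict synonym: either name overwrites the previous value (last one wins). */
const char *resolve_synonym(const struct entry *e, size_t n)
{
	const char *result = "#"; /* documented default */
	size_t i;

	for (i = 0; i < n; i++)
		if (!strcmp(e[i].key, "core.commentchar") ||
		    !strcmp(e[i].key, "core.commentstring"))
			result = e[i].value;
	return result;
}

/* Override: commentString beats commentChar regardless of order. */
const char *resolve_override(const struct entry *e, size_t n)
{
	const char *chr = NULL, *str = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!strcmp(e[i].key, "core.commentchar"))
			chr = e[i].value;
		else if (!strcmp(e[i].key, "core.commentstring"))
			str = e[i].value;
	}
	if (str)
		return str;
	return chr ? chr : "#";
}
```

The two only disagree when commentChar appears after commentString, which is why the choice is mostly academic for a config file whose author controls the order.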

Thanks.


* Re: [PATCH 17/16] config: add core.commentString
  2024-03-27  8:19                             ` [PATCH 17/16] config: add core.commentString Jeff King
  2024-03-27 12:45                               ` Chris Torek
@ 2024-03-27 16:13                               ` Junio C Hamano
  2024-03-28  9:47                                 ` Jeff King
  1 sibling, 1 reply; 82+ messages in thread
From: Junio C Hamano @ 2024-03-27 16:13 UTC (permalink / raw)
  To: Jeff King
  Cc: Kristoffer Haugsbakk, Dragan Simic, Manlio Perillo,
	René Scharfe, Phillip Wood, git

Jeff King <peff@peff.net> writes:

> Note that you graduated kh/doc-commentchar-is-a-byte, which says "this
> ASCII character" early in the description, which will be incorrect if my
> series is merged.

True.  I could tweak this patch to force a conflict

 core.commentChar::
 core.commentString::
 	Commands such as `commit` and `tag` that let you edit
-	messages consider a line that begins with this character
+	messages consider a line that begins with this string
 	commented, and removes them after the editor returns
 	(default '#').

and let the rerere database remember the resolution (which will
tweak "string" back to "character").  But I'll prepare a merge-fix
before I forget, which is a cleaner approach.

> An alternative to using "$var cannot ..." in the error messages (if we
> don't like the all-lowercase variable name) is to just say "comment
> strings cannot ...". That vaguely covers both cases, and the message
> printed by the config code itself does mention the actual variable name
> that triggered the error.

OK, because the error() return from this function will trigger
another die() in the caller, e.g.

    error: core.commentchar must have at least one character
    fatal: bad config variable 'core.commentchar' in file '.git/config' at line 6

so we can afford to make the "error" side vague, except that the
"fatal" one is also downcased already, so we are not really solving
anything by making the message vague, I would think.  The posted
patch as-is is perfectly fine.

Side note:
    I wonder if we would later want to somehow _merge_ these two
    error messages, i.e. the lower-level will notice and record the
    nature of the problem instead of calling error(), and the caller
    will use the recorded information while composing the "fatal"
    message to die with.  I actually do not know if it is a good
    idea to begin with.  If we want to do it right, the "record"
    part probably cannot be a simple "stringify into strbuf" that
    will result in lego message that is harder for i18n folks.


$ git diff refs/merge-fix/jk/core-comment-string^!
diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index bd033ab100..bbe869c497 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -522,7 +522,7 @@ core.editor::
 core.commentChar::
 core.commentString::
 	Commands such as `commit` and `tag` that let you edit
-	messages consider a line that begins with this ASCII character
+	messages consider a line that begins with this character
 	commented, and removes them after the editor returns
 	(default '#').
 +

* Re: [PATCH 17/16] config: add core.commentString
  2024-03-27 16:13                               ` Junio C Hamano
@ 2024-03-28  9:47                                 ` Jeff King
  0 siblings, 0 replies; 82+ messages in thread
From: Jeff King @ 2024-03-28  9:47 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Kristoffer Haugsbakk, Dragan Simic, Manlio Perillo,
	René Scharfe, Phillip Wood, git

On Wed, Mar 27, 2024 at 09:13:31AM -0700, Junio C Hamano wrote:

> > An alternative to using "$var cannot ..." in the error messages (if we
> > don't like the all-lowercase variable name) is to just say "comment
> > strings cannot ...". That vaguely covers both cases, and the message
> > printed by the config code itself does mention the actual variable name
> > that triggered the error.
> 
> OK, because the error() return from this function will trigger
> another die() in the caller, e.g.
> 
>     error: core.commentchar must have at least one character
>     fatal: bad config variable 'core.commentchar' in file '.git/config' at line 6
> 
> so we can afford to make the "error" side vague, except that the
> "fatal" one is also downcased already, so we are not really solving
> anything by making the message vague, I would think.  The posted
> patch as-is is perfectly fine.

Oh, right.  For some reason I thought the die() message would have the
variable as written by the user, but that obviously is not true. So I
agree it would not even be an improvement (and the normalizing in my new
error() message is something we've been living with all along anyway for
other messages).

> Side note:
>     I wonder if we would later want to somehow _merge_ these two
>     error messages, i.e. the lower-level will notice and record the
>     nature of the problem instead of calling error(), and the caller
>     will use the recorded information while composing the "fatal"
>     message to die with.  I actually do not know if it is a good
>     idea to begin with.  If we want to do it right, the "record"
>     part probably cannot be a simple "stringify into strbuf" that
>     will result in lego message that is harder for i18n folks.

Yeah, this is a general problem of accumulating errors. I had always
assumed in cases like this that we could have some language-independent
syntax like:

  die("%s:%d: error parsing '%s': %s",
      file, line_nr, var, err_from_callback);

It's certainly lego-like, but it avoids the worst lego cases where
we're literally composing sentences. But as somebody who does not do
translations, it's possible I'm just being optimistic. ;)
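
As a rough, standalone sketch of that format (not Git's code: `format_config_error()` is a hypothetical helper, and it writes into a caller-supplied buffer rather than calling die()):

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Compose the single combined message from the location, the variable
 * name, and the message recorded by the lower-level callback.
 * Returns what snprintf() returns: the length the full message needs,
 * which exceeds len - 1 if the output was truncated.
 */
int format_config_error(char *buf, size_t len,
			const char *file, int line_nr,
			const char *var, const char *err_from_callback)
{
	return snprintf(buf, len, "%s:%d: error parsing '%s': %s",
			file, line_nr, var, err_from_callback);
}
```

Only the callback's message would need translating here; the surrounding punctuation stays fixed, which is the sense in which the syntax is language-independent.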

-Peff

Thread overview: 82+ messages
2024-03-05  8:43 Clarify the meaning of "character" in the documentation Manlio Perillo
2024-03-05  9:00 ` Kristoffer Haugsbakk
2024-03-05 15:32   ` Junio C Hamano
2024-03-05 15:42     ` Dragan Simic
2024-03-05 16:38       ` Junio C Hamano
2024-03-05 17:28         ` Dragan Simic
2024-03-06  8:08         ` [messy PATCH] multi-byte core.commentChar Jeff King
2024-03-07  9:14           ` [PATCH 0/15] allow " Jeff King
2024-03-07  9:15             ` [PATCH 01/15] strbuf: simplify comment-handling in add_lines() helper Jeff King
2024-03-07  9:16             ` [PATCH 02/15] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
2024-03-07  9:18             ` [PATCH 03/15] commit: refactor base-case of adjust_comment_line_char() Jeff King
2024-03-07  9:19             ` [PATCH 04/15] strbuf: avoid shadowing global comment_line_char name Jeff King
2024-03-07  9:20             ` [PATCH 05/15] environment: store comment_line_char as a string Jeff King
2024-03-07  9:21             ` [PATCH 06/15] strbuf: accept a comment string for strbuf_stripspace() Jeff King
2024-03-07  9:53               ` Jeff King
2024-03-07  9:22             ` [PATCH 07/15] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
2024-03-07  9:23             ` [PATCH 08/15] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
2024-03-07  9:23             ` [PATCH 09/15] prefer comment_line_str to comment_line_char for printing Jeff King
2024-03-07  9:24             ` [PATCH 10/15] find multi-byte comment chars in NUL-terminated strings Jeff King
2024-03-07  9:26             ` [PATCH 11/15] find multi-byte comment chars in unterminated buffers Jeff King
2024-03-07 11:08               ` Jeff King
2024-03-07 19:41                 ` René Scharfe
2024-03-07 19:47                   ` René Scharfe
2024-03-07 19:42               ` René Scharfe
2024-03-08 10:17                 ` Phillip Wood
2024-03-08 15:58                   ` Junio C Hamano
2024-03-08 16:20                     ` Phillip Wood
2024-03-12  8:19                       ` Jeff King
2024-03-12 14:36                         ` phillip.wood123
2024-03-13  6:23                           ` Jeff King
2024-03-12  8:05                 ` Jeff King
2024-03-14 19:37                   ` René Scharfe
2024-03-07  9:27             ` [PATCH 12/15] sequencer: handle multi-byte comment characters when writing todo list Jeff King
2024-03-08 10:20               ` Phillip Wood
2024-03-12  8:21                 ` Jeff King
2024-03-07  9:28             ` [PATCH 13/15] wt-status: drop custom comment-char stringification Jeff King
2024-03-07  9:30             ` [PATCH 14/15] environment: drop comment_line_char compatibility macro Jeff King
2024-03-07  9:34             ` [PATCH 15/15] config: allow multi-byte core.commentChar Jeff King
2024-03-08 11:07             ` [PATCH 0/15] " Phillip Wood
2024-03-12  9:10             ` [PATCH v2 0/16] " Jeff King
2024-03-12  9:17               ` [PATCH v2 01/16] config: forbid newline as core.commentChar Jeff King
2024-03-12  9:17               ` [PATCH v2 02/16] strbuf: simplify comment-handling in add_lines() helper Jeff King
2024-03-12  9:17               ` [PATCH v2 03/16] strbuf: avoid static variables in strbuf_add_commented_lines() Jeff King
2024-03-12  9:17               ` [PATCH v2 04/16] commit: refactor base-case of adjust_comment_line_char() Jeff King
2024-03-12  9:17               ` [PATCH v2 05/16] strbuf: avoid shadowing global comment_line_char name Jeff King
2024-03-12  9:17               ` [PATCH v2 06/16] environment: store comment_line_char as a string Jeff King
2024-03-12  9:17               ` [PATCH v2 07/16] strbuf: accept a comment string for strbuf_stripspace() Jeff King
2024-03-12  9:17               ` [PATCH v2 08/16] strbuf: accept a comment string for strbuf_commented_addf() Jeff King
2024-03-12  9:17               ` [PATCH v2 09/16] strbuf: accept a comment string for strbuf_add_commented_lines() Jeff King
2024-03-12  9:17               ` [PATCH v2 10/16] prefer comment_line_str to comment_line_char for printing Jeff King
2024-03-12  9:17               ` [PATCH v2 11/16] find multi-byte comment chars in NUL-terminated strings Jeff King
2024-03-12  9:17               ` [PATCH v2 12/16] find multi-byte comment chars in unterminated buffers Jeff King
2024-03-12  9:17               ` [PATCH v2 13/16] sequencer: handle multi-byte comment characters when writing todo list Jeff King
2024-03-12  9:17               ` [PATCH v2 14/16] wt-status: drop custom comment-char stringification Jeff King
2024-03-12  9:17               ` [PATCH v2 15/16] environment: drop comment_line_char compatibility macro Jeff King
2024-03-12  9:17               ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Jeff King
2024-03-13 18:23                 ` Kristoffer Haugsbakk
2024-03-13 18:39                   ` Junio C Hamano
2024-03-15  5:59                   ` Jeff King
2024-03-15  7:16                     ` Kristoffer Haugsbakk
2024-03-15  8:10                       ` Jeff King
2024-03-15 13:30                         ` Kristoffer Haugsbakk
2024-03-15 15:40                         ` Junio C Hamano
2024-03-16  5:50                           ` Jeff King
2024-03-26 22:10                         ` Junio C Hamano
2024-03-26 22:12                           ` Kristoffer Haugsbakk
2024-03-27  7:46                           ` Jeff King
2024-03-27  8:19                             ` [PATCH 17/16] config: add core.commentString Jeff King
2024-03-27 12:45                               ` Chris Torek
2024-03-27 16:13                               ` Junio C Hamano
2024-03-28  9:47                                 ` Jeff King
2024-03-27 14:53                             ` [PATCH v2 16/16] config: allow multi-byte core.commentChar Junio C Hamano
2024-03-12 14:40               ` [PATCH v2 0/16] " phillip.wood123
2024-03-12 20:30                 ` Junio C Hamano
2024-03-05 16:58       ` Clarify the meaning of "character" in the documentation Kristoffer Haugsbakk
2024-03-05 17:20         ` Dragan Simic
2024-03-05 17:37           ` Kristoffer Haugsbakk
2024-03-05 21:19             ` Dragan Simic
2024-03-05 16:51     ` Kristoffer Haugsbakk
2024-03-05 17:37       ` Junio C Hamano
2024-03-05 17:49         ` Kristoffer Haugsbakk
2024-03-05 22:48   ` brian m. carlson
