git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] Fix a bug in configuration parsing, and improve tests and documentation
@ 2024-03-15 13:22 Dragan Simic
  2024-03-15 13:22 ` [PATCH 1/4] config: minor addition of whitespace Dragan Simic
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Dragan Simic @ 2024-03-15 13:22 UTC (permalink / raw)
  To: git; +Cc: gitster, rsbecker, github

This series is an evolvement from another recent series, [1] as a result
of a decision to fix a longstanding bug in the parsing of configuration
option values, instead of documenting the status quo. [2][3]

The bufgix introduced in this series _should_ have no hidden negative
effects.  All of the configuration-related tests, both the old and the
new ones, pass with the patches applied.

[1] https://lore.kernel.org/git/cover.1710258538.git.dsimic@manjaro.org/T/#u
[2] https://lore.kernel.org/git/ff7b0a2ead90ad9a9456141da5e4df4a@manjaro.org/
[3] https://lore.kernel.org/git/11be11f231f3bf41d0245c780c20693f@manjaro.org/

Dragan Simic (4):
  config: minor addition of whitespace
  config: really keep value-internal whitespace verbatim
  t1300: add more tests for whitespace and inline comments
  config.txt: describe handling of whitespace further

 Documentation/config.txt |  19 +++++---
 config.c                 |  15 ++++--
 t/t1300-config.sh        | 102 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 122 insertions(+), 14 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] config: minor addition of whitespace
  2024-03-15 13:22 [PATCH 0/4] Fix a bug in configuration parsing, and improve tests and documentation Dragan Simic
@ 2024-03-15 13:22 ` Dragan Simic
  2024-03-15 13:22 ` [PATCH 2/4] config: really keep value-internal whitespace verbatim Dragan Simic
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Dragan Simic @ 2024-03-15 13:22 UTC (permalink / raw)
  To: git; +Cc: gitster, rsbecker, github

In general, binary operators should be enclosed in a pair of leading and
trailing space characters.  Thus, clean up one spotted expression that for
some reason had a "bunched up" operator.

Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---
 config.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/config.c b/config.c
index 3cfeb3d8bd99..a86a20cdf5cb 100644
--- a/config.c
+++ b/config.c
@@ -869,7 +869,7 @@ static char *parse_value(struct config_source *cs)
 			continue;
 		}
 		if (c == '"') {
-			quote = 1-quote;
+			quote = 1 - quote;
 			continue;
 		}
 		strbuf_addch(&cs->value, c);

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] config: really keep value-internal whitespace verbatim
  2024-03-15 13:22 [PATCH 0/4] Fix a bug in configuration parsing, and improve tests and documentation Dragan Simic
  2024-03-15 13:22 ` [PATCH 1/4] config: minor addition of whitespace Dragan Simic
@ 2024-03-15 13:22 ` Dragan Simic
  2024-03-15 17:46   ` Junio C Hamano
  2024-03-15 13:22 ` [PATCH 3/4] t1300: add more tests for whitespace and inline comments Dragan Simic
  2024-03-15 13:22 ` [PATCH 4/4] config.txt: describe handling of whitespace further Dragan Simic
  3 siblings, 1 reply; 11+ messages in thread
From: Dragan Simic @ 2024-03-15 13:22 UTC (permalink / raw)
  To: git; +Cc: gitster, rsbecker, github

Fix a bug in function parse_value() that prevented whitespace characters
(i.e. spaces and horizontal tabs) found inside configuration option values
from being parsed and returned in their original form.  The bug caused any
number of consecutive whitespace characters to be wrongly "squashed" into
the same number of space characters.

This bug was introduced back in July 2009, in commit ebdaae372b46 ("config:
Keep inner whitespace verbatim").

Further investigation showed that setting a configuration value, by invoking
git-config(1), converts value-internal horizontal tabs into "\t" escape
sequences, which the buggy value-parsing logic in function parse_value()
didn't "squash" into spaces.  That's why the test included in the ebdaae37
commit passed, which presumably made the bug remain undetected for this long.
On the other hand, value-internal literal horizontal tab characters, found in
a configuration file edited by hand, do get "squashed" by the value-parsing
logic, so the right choice was to fix this bug by making the value-internal
whitespace characters preserved verbatim.

Fixes: ebdaae372b46 ("config: Keep inner whitespace verbatim")
Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---
 config.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index a86a20cdf5cb..5072f12e62e4 100644
--- a/config.c
+++ b/config.c
@@ -817,33 +817,38 @@ static int get_next_char(struct config_source *cs)
 
 static char *parse_value(struct config_source *cs)
 {
-	int quote = 0, comment = 0, space = 0;
+	int quote = 0, comment = 0;
+	size_t trim_len = 0;
 
 	strbuf_reset(&cs->value);
 	for (;;) {
 		int c = get_next_char(cs);
 		if (c == '\n') {
 			if (quote) {
 				cs->linenr--;
 				return NULL;
 			}
+			if (trim_len)
+				strbuf_setlen(&cs->value, trim_len);
 			return cs->value.buf;
 		}
 		if (comment)
 			continue;
 		if (isspace(c) && !quote) {
+			if (!trim_len)
+				trim_len = cs->value.len;
 			if (cs->value.len)
-				space++;
+				strbuf_addch(&cs->value, c);
 			continue;
 		}
 		if (!quote) {
 			if (c == ';' || c == '#') {
 				comment = 1;
 				continue;
 			}
 		}
-		for (; space; space--)
-			strbuf_addch(&cs->value, ' ');
+		if (trim_len)
+			trim_len = 0;
 		if (c == '\\') {
 			c = get_next_char(cs);
 			switch (c) {

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] t1300: add more tests for whitespace and inline comments
  2024-03-15 13:22 [PATCH 0/4] Fix a bug in configuration parsing, and improve tests and documentation Dragan Simic
  2024-03-15 13:22 ` [PATCH 1/4] config: minor addition of whitespace Dragan Simic
  2024-03-15 13:22 ` [PATCH 2/4] config: really keep value-internal whitespace verbatim Dragan Simic
@ 2024-03-15 13:22 ` Dragan Simic
  2024-03-15 19:39   ` Eric Sunshine
  2024-03-15 13:22 ` [PATCH 4/4] config.txt: describe handling of whitespace further Dragan Simic
  3 siblings, 1 reply; 11+ messages in thread
From: Dragan Simic @ 2024-03-15 13:22 UTC (permalink / raw)
  To: git; +Cc: gitster, rsbecker, github

Add a handful of additional automated tests, to improve the coverage of
configuration file entries whose values contain internal whitespace, leading
and/or trailing whitespace, or which contain an additional inline comment.

Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---
 t/t1300-config.sh | 102 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 101 insertions(+), 1 deletion(-)

diff --git a/t/t1300-config.sh b/t/t1300-config.sh
index 31c387868708..6eef8a48098c 100755
--- a/t/t1300-config.sh
+++ b/t/t1300-config.sh
@@ -11,6 +11,96 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
+cat > .git/config << EOF
+[section]
+	solid = rock
+	sparse = big 		 blue
+	sparseAndTail = big 		 blue 
+	sparseAndTailQuoted = "big 		 blue "
+	sparseAndBiggerTail = big 		 blue 	 	
+	sparseAndBiggerTailQuoted = "big 		 blue 	 	"
+	sparseAndBiggerTailQuotedPlus =  "big 		 blue 	 	"	 
+	headAndTail = 	big blue 
+	headAndTailQuoted = "	big blue "
+	headAndTailQuotedPlus =  "	big blue " 
+	annotated = big blue	# to be discarded
+	annotatedQuoted = "big blue"	# to be discarded
+EOF
+
+echo 'rock' > expect
+
+test_expect_success 'no internal whitespace' '
+	git config --get section.solid > output &&
+	test_cmp expect output
+'
+
+echo 'big 		 blue' > expect
+
+test_expect_success 'internal whitespace' '
+	git config --get section.sparse > output &&
+	test_cmp expect output
+'
+
+test_expect_success 'internal and trailing whitespace' '
+	git config --get section.sparseAndTail > output &&
+	test_cmp expect output
+'
+
+test_expect_success 'internal and more trailing whitespace' '
+	git config --get section.sparseAndBiggerTail > output &&
+	test_cmp expect output
+'
+
+echo 'big 		 blue ' > expect
+
+test_expect_success 'internal and trailing whitespace, all quoted' '
+	git config --get section.sparseAndTailQuoted > output &&
+	test_cmp expect output
+'
+
+echo 'big 		 blue 	 	' > expect
+
+test_expect_success 'internal and more trailing whitespace, all quoted' '
+	git config --get section.sparseAndBiggerTailQuoted > output &&
+	test_cmp expect output
+'
+
+test_expect_success 'internal and more trailing whitespace, not all quoted' '
+	git config --get section.sparseAndBiggerTailQuotedPlus > output &&
+	test_cmp expect output
+'
+
+echo 'big blue' > expect
+
+test_expect_success 'leading and trailing whitespace' '
+	git config --get section.headAndTail > output &&
+	test_cmp expect output
+'
+
+echo '	big blue ' > expect
+
+test_expect_success 'leading and trailing whitespace, all quoted' '
+	git config --get section.headAndTailQuoted > output &&
+	test_cmp expect output
+'
+
+test_expect_success 'leading and trailing whitespace, not all quoted' '
+	git config --get section.headAndTailQuotedPlus > output &&
+	test_cmp expect output
+'
+
+echo 'big blue' > expect
+
+test_expect_success 'inline comment' '
+	git config --get section.annotated > output &&
+	test_cmp expect output
+'
+
+test_expect_success 'inline comment, quoted' '
+	git config --get section.annotatedQuoted > output &&
+	test_cmp expect output
+'
+
 test_expect_success 'clear default config' '
 	rm -f .git/config
 '
@@ -1066,7 +1156,17 @@ test_expect_success '--null --get-regexp' '
 	test_cmp expect result
 '
 
-test_expect_success 'inner whitespace kept verbatim' '
+test_expect_success 'inner whitespace kept verbatim, spaces only' '
+	git config section.val "foo   bar" &&
+	test_cmp_config "foo   bar" section.val
+'
+
+test_expect_success 'inner whitespace kept verbatim, horizontal tabs only' '
+	git config section.val "foo		bar" &&
+	test_cmp_config "foo		bar" section.val
+'
+
+test_expect_success 'inner whitespace kept verbatim, horizontal tabs and spaces' '
 	git config section.val "foo 	  bar" &&
 	test_cmp_config "foo 	  bar" section.val
 '

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] config.txt: describe handling of whitespace further
  2024-03-15 13:22 [PATCH 0/4] Fix a bug in configuration parsing, and improve tests and documentation Dragan Simic
                   ` (2 preceding siblings ...)
  2024-03-15 13:22 ` [PATCH 3/4] t1300: add more tests for whitespace and inline comments Dragan Simic
@ 2024-03-15 13:22 ` Dragan Simic
  3 siblings, 0 replies; 11+ messages in thread
From: Dragan Simic @ 2024-03-15 13:22 UTC (permalink / raw)
  To: git; +Cc: gitster, rsbecker, github

Make it more clear what the whitespace characters are in the context of git
configuration files, and improve the description of the trailing whitespace
handling a bit.

Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---
 Documentation/config.txt | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 782c2bab906c..20f3300dc706 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -22,9 +22,10 @@ multivalued.
 Syntax
 ~~~~~~
 
-The syntax is fairly flexible and permissive; whitespaces are mostly
-ignored.  The '#' and ';' characters begin comments to the end of line,
-blank lines are ignored.
+The syntax is fairly flexible and permissive.  Whitespace characters,
+which in this context are the space character (SP) and the horizontal
+tabulation (HT), are mostly ignored.  The '#' and ';' characters begin
+comments to the end of line.  Blank lines are ignored.
 
 The file consists of sections and variables.  A section begins with
 the name of the section in square brackets and continues until the next
@@ -64,12 +65,14 @@ The variable names are case-insensitive, allow only alphanumeric characters
 and `-`, and must start with an alphabetic character.
 
 A line that defines a value can be continued to the next line by
-ending it with a `\`; the backslash and the end-of-line are
-stripped.  Leading whitespaces after 'name =', the remainder of the
+ending it with a `\`; the backslash and the end-of-line are stripped.
+Leading whitespace characters after 'name =', the remainder of the
 line after the first comment character '#' or ';', and trailing
-whitespaces of the line are discarded unless they are enclosed in
-double quotes.  Internal whitespaces within the value are retained
-verbatim.
+whitespace characters of the line are discarded unless they are enclosed
+in double quotes.  The discarding of the trailing whitespace characters
+applies regardless of the discarding of the portion of the line after
+the first comment character.  Internal whitespace characters within the
+value are retained verbatim.
 
 Inside double quotes, double quote `"` and backslash `\` characters
 must be escaped: use `\"` for `"` and `\\` for `\`.

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/4] config: really keep value-internal whitespace verbatim
  2024-03-15 13:22 ` [PATCH 2/4] config: really keep value-internal whitespace verbatim Dragan Simic
@ 2024-03-15 17:46   ` Junio C Hamano
  2024-03-15 19:50     ` Dragan Simic
  0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2024-03-15 17:46 UTC (permalink / raw)
  To: Dragan Simic; +Cc: git, rsbecker, github

Dragan Simic <dsimic@manjaro.org> writes:

> Fix a bug in function parse_value() that prevented whitespace characters
> (i.e. spaces and horizontal tabs) found inside configuration option values
> from being parsed and returned in their original form.  The bug caused any
> number of consecutive whitespace characters to be wrongly "squashed" into
> the same number of space characters.
>
> This bug was introduced back in July 2009, in commit ebdaae372b46 ("config:
> Keep inner whitespace verbatim").
>
> Further investigation showed that setting a configuration value, by invoking
> git-config(1), converts value-internal horizontal tabs into "\t" escape
> sequences, which the buggy value-parsing logic in function parse_value()
> didn't "squash" into spaces.  That's why the test included in the ebdaae37
> commit passed, which presumably made the bug remain undetected for this long.
> On the other hand, value-internal literal horizontal tab characters, found in
> a configuration file edited by hand, do get "squashed" by the value-parsing
> logic, so the right choice was to fix this bug by making the value-internal
> whitespace characters preserved verbatim.

OK.

> Fixes: ebdaae372b46 ("config: Keep inner whitespace verbatim")

No need for this line.  You mentioned it in the text already, and
more importantly, grepping for trailers is not the right thing to do
because we may think we fixed all bugs in ebdaae372b46 did with this
patch, which may in 6 months turn out to be false but we cannot undo
the trailer.  On the other hand, the discussion of the problem in
the proposed log message gives readers a richer context and the
future developers can understand that this fixed one thing in
ebdaae372b46 but didn't even know about other bugs introduced by
that old commit and make more intelligent decision based on that
better understanding.

> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> ---
>  config.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/config.c b/config.c
> index a86a20cdf5cb..5072f12e62e4 100644
> --- a/config.c
> +++ b/config.c
> @@ -817,33 +817,38 @@ static int get_next_char(struct config_source *cs)
>  
>  static char *parse_value(struct config_source *cs)
>  {
> -	int quote = 0, comment = 0, space = 0;
> +	int quote = 0, comment = 0;
> +	size_t trim_len = 0;


>  	strbuf_reset(&cs->value);
>  	for (;;) {
>  		int c = get_next_char(cs);
>  		if (c == '\n') {
>  			if (quote) {
>  				cs->linenr--;
>  				return NULL;
>  			}
> +			if (trim_len)
> +				strbuf_setlen(&cs->value, trim_len);
>  			return cs->value.buf;

So the idea is that we copy everything we read verbatim in cs->value
but keep track of the beginning of the run of whitespace characters
at the end of the line in trim_len so that we can rtrim it?  That
should be the most straight-forward implementation.

>  		}
>  		if (comment)
>  			continue;
>  		if (isspace(c) && !quote) {
> +			if (!trim_len)
> +				trim_len = cs->value.len;
>  			if (cs->value.len)
> +				strbuf_addch(&cs->value, c);
>  			continue;

While we are not inside a dq-pair, we keep ignoring a whitespace
character at the beginning (i.e. cs->value.len == 0).

If we have some value in cs->value, however, we add the whitespace
character to cs->value verbatim.  trim_len==0 signals that the last
character we processed was not a whitespace character, and we copy
the current length of cs->value there---this is so that we can rtrim
away the run of whitespaces at the end of the input, including the
byte we are adding here, if it turns out that we are looking at the
first whitespace character of such trailing blanks.

>  		}
>  		if (!quote) {
>  			if (c == ';' || c == '#') {
>  				comment = 1;
>  				continue;
>  			}
>  		}

> +		if (trim_len)
> +			trim_len = 0;

If we are outside a dq-pair, we reach here only when we are looking
at a non-whitespace character.  If we are counting a run of unquoted
whitespaces, we can reset trim_len here to record that we do not
have to trim.

But can we be seeing a whitespace that is inside quote here?  Is
resetting trim_len to zero what we want in such a case?  Let's see.
Inside dq, we'd want to accumulate bytes, possibly after unwrapping
their backslash escapes, to cs->value, and these bytes include the
whitespace characters---we want to keep them literally without
trimming.  OK.

Looking good.  We should already demonstrate that this works well
with a new test in the same patch, can't we?  If we can, we should.

Thanks.

>  		if (c == '\\') {
>  			c = get_next_char(cs);
>  			switch (c) {

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] t1300: add more tests for whitespace and inline comments
  2024-03-15 13:22 ` [PATCH 3/4] t1300: add more tests for whitespace and inline comments Dragan Simic
@ 2024-03-15 19:39   ` Eric Sunshine
  2024-03-15 20:04     ` Dragan Simic
  2024-03-15 20:29     ` Eric Sunshine
  0 siblings, 2 replies; 11+ messages in thread
From: Eric Sunshine @ 2024-03-15 19:39 UTC (permalink / raw)
  To: Dragan Simic; +Cc: git, gitster, rsbecker, github

On Fri, Mar 15, 2024 at 9:22 AM Dragan Simic <dsimic@manjaro.org> wrote:
> Add a handful of additional automated tests, to improve the coverage of
> configuration file entries whose values contain internal whitespace, leading
> and/or trailing whitespace, or which contain an additional inline comment.
>
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>

Just some minor style-related comments...

> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
> @@ -11,6 +11,96 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +cat > .git/config << EOF
> +[section]
> +       solid = rock
> +       sparse = big             blue
> +       sparseAndTail = big              blue
> +       sparseAndTailQuoted = "big               blue "
> +       sparseAndBiggerTail = big                blue
> +       sparseAndBiggerTailQuoted = "big                 blue           "
> +       sparseAndBiggerTailQuotedPlus =  "big            blue           "
> +       headAndTail =   big blue
> +       headAndTailQuoted = "   big blue "
> +       headAndTailQuotedPlus =  "      big blue "
> +       annotated = big blue    # to be discarded
> +       annotatedQuoted = "big blue"    # to be discarded
> +EOF

These days we try to place all test-related code inside a
test_expect_success() context rather than having it standalone. In
this case, since the file being created is (presumably) shared by
multiple tests in this script, you may want to add a new test which
performs this setup step.

Use \EOF rather than EOF to signal to readers that we don't expect any
variable interpolation to happen within the here-doc body.

Further, use -\EOF inside test_expect_success() to allow us to indent
the body of the heredoc to match the test indentation.

Style guideline says to omit whitespace after redirection operators
(such as `<<` and `>`).

We have a q_to_tab() function which lets us state explicitly for
readers the location of TAB characters in the heredoc body. You'll
often see that used instead of literal TABs.

Taking all the above into account, perhaps:

    test_expect_success 'setup whitespace' '
        q_to_tab >.git/config <<-\EOF
        [section]
        solid = rock
        sparse = bigQblue
        ...
        EOF

Same comments apply to rest of patch.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/4] config: really keep value-internal whitespace verbatim
  2024-03-15 17:46   ` Junio C Hamano
@ 2024-03-15 19:50     ` Dragan Simic
  0 siblings, 0 replies; 11+ messages in thread
From: Dragan Simic @ 2024-03-15 19:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, rsbecker, github

On 2024-03-15 18:46, Junio C Hamano wrote:
> Dragan Simic <dsimic@manjaro.org> writes:
> 
>> Fixes: ebdaae372b46 ("config: Keep inner whitespace verbatim")
> 
> No need for this line.  You mentioned it in the text already, and
> more importantly, grepping for trailers is not the right thing to do
> because we may think we fixed all bugs in ebdaae372b46 did with this
> patch, which may in 6 months turn out to be false but we cannot undo
> the trailer.  On the other hand, the discussion of the problem in
> the proposed log message gives readers a richer context and the
> future developers can understand that this fixed one thing in
> ebdaae372b46 but didn't even know about other bugs introduced by
> that old commit and make more intelligent decision based on that
> better understanding.

Got it, and I agree that the discussions linked in the patch description
provide much better ways for understanding what brought us to the 
current
situation, and what led to the proposed bugfixes.

It's simply that I'm kind of used to providing "Fixes" tags for patches
in other projects;  I'll make a mental note not to do that for git.

I'll also send the v2 with no "Fixes" tag, after the v1 spends a few 
days
on the mailing list.

>> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
>> ---
>>  config.c | 13 +++++++++----
>>  1 file changed, 9 insertions(+), 4 deletions(-)
>> 
>> diff --git a/config.c b/config.c
>> index a86a20cdf5cb..5072f12e62e4 100644
>> --- a/config.c
>> +++ b/config.c
>> @@ -817,33 +817,38 @@ static int get_next_char(struct config_source 
>> *cs)
>> 
>>  static char *parse_value(struct config_source *cs)
>>  {
>> -	int quote = 0, comment = 0, space = 0;
>> +	int quote = 0, comment = 0;
>> +	size_t trim_len = 0;
> 
> 
>>  	strbuf_reset(&cs->value);
>>  	for (;;) {
>>  		int c = get_next_char(cs);
>>  		if (c == '\n') {
>>  			if (quote) {
>>  				cs->linenr--;
>>  				return NULL;
>>  			}
>> +			if (trim_len)
>> +				strbuf_setlen(&cs->value, trim_len);
>>  			return cs->value.buf;
> 
> So the idea is that we copy everything we read verbatim in cs->value
> but keep track of the beginning of the run of whitespace characters
> at the end of the line in trim_len so that we can rtrim it?  That
> should be the most straight-forward implementation.

Yes, that's it.  It's as simple as it can be.  A different, somewhat
more complex approach of putting the encountered whitespace in a 
separate
buffer may be warranted if we expected _a lot_ of whitespace, i.e. much
more whitespace than the other characters, but that shouldn't be the 
case,
which makes the simple approach a good choice.

>>  		}
>>  		if (comment)
>>  			continue;
>>  		if (isspace(c) && !quote) {
>> +			if (!trim_len)
>> +				trim_len = cs->value.len;
>>  			if (cs->value.len)
>> +				strbuf_addch(&cs->value, c);
>>  			continue;
> 
> While we are not inside a dq-pair, we keep ignoring a whitespace
> character at the beginning (i.e. cs->value.len == 0).

Yes, that "eats up" any leading whitespace.

> If we have some value in cs->value, however, we add the whitespace
> character to cs->value verbatim.  trim_len==0 signals that the last
> character we processed was not a whitespace character, and we copy
> the current length of cs->value there---this is so that we can rtrim
> away the run of whitespaces at the end of the input, including the
> byte we are adding here, if it turns out that we are looking at the
> first whitespace character of such trailing blanks.

Exactly.  By the way, please know that I _really_ appreciate your
highly detailed patch reviews!

>>  		}
>>  		if (!quote) {
>>  			if (c == ';' || c == '#') {
>>  				comment = 1;
>>  				continue;
>>  			}
>>  		}
> 
>> +		if (trim_len)
>> +			trim_len = 0;
> 
> If we are outside a dq-pair, we reach here only when we are looking
> at a non-whitespace character.  If we are counting a run of unquoted
> whitespaces, we can reset trim_len here to record that we do not
> have to trim.

Exactly.  It resets the "need to rtrim flag", so to speak.

> But can we be seeing a whitespace that is inside quote here?  Is
> resetting trim_len to zero what we want in such a case?  Let's see.
> Inside dq, we'd want to accumulate bytes, possibly after unwrapping
> their backslash escapes, to cs->value, and these bytes include the
> whitespace characters---we want to keep them literally without
> trimming.  OK.

Yes, any whitespace inside a quoted option value is to be taken
verbatim, so we basically reset the "need to rtrim flag" in that case,
because value-internal whitespace isn't to be rtrimmed.

> Looking good.  We should already demonstrate that this works well
> with a new test in the same patch, can't we?  If we can, we should.

Sorry, this sounds a bit confusing to me?  I think there are already
more than enough new tests in the 3/4 patch I sent?

Could you, please, clarify a bit?

>>  		if (c == '\\') {
>>  			c = get_next_char(cs);
>>  			switch (c) {

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] t1300: add more tests for whitespace and inline comments
  2024-03-15 19:39   ` Eric Sunshine
@ 2024-03-15 20:04     ` Dragan Simic
  2024-03-15 20:29     ` Eric Sunshine
  1 sibling, 0 replies; 11+ messages in thread
From: Dragan Simic @ 2024-03-15 20:04 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git, gitster, rsbecker, github

Hello Eric,

On 2024-03-15 20:39, Eric Sunshine wrote:
> On Fri, Mar 15, 2024 at 9:22 AM Dragan Simic <dsimic@manjaro.org> 
> wrote:
>> Add a handful of additional automated tests, to improve the coverage 
>> of
>> configuration file entries whose values contain internal whitespace, 
>> leading
>> and/or trailing whitespace, or which contain an additional inline 
>> comment.
>> 
>> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> 
> Just some minor style-related comments...
> 
>> diff --git a/t/t1300-config.sh b/t/t1300-config.sh
>> @@ -11,6 +11,96 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>> +cat > .git/config << EOF
>> +[section]
>> +       solid = rock
>> +       sparse = big             blue
>> +       sparseAndTail = big              blue
>> +       sparseAndTailQuoted = "big               blue "
>> +       sparseAndBiggerTail = big                blue
>> +       sparseAndBiggerTailQuoted = "big                 blue          
>>  "
>> +       sparseAndBiggerTailQuotedPlus =  "big            blue          
>>  "
>> +       headAndTail =   big blue
>> +       headAndTailQuoted = "   big blue "
>> +       headAndTailQuotedPlus =  "      big blue "
>> +       annotated = big blue    # to be discarded
>> +       annotatedQuoted = "big blue"    # to be discarded
>> +EOF
> 
> These days we try to place all test-related code inside a
> test_expect_success() context rather than having it standalone. In
> this case, since the file being created is (presumably) shared by
> multiple tests in this script, you may want to add a new test which
> performs this setup step.
> 
> Use \EOF rather than EOF to signal to readers that we don't expect any
> variable interpolation to happen within the here-doc body.
> 
> Further, use -\EOF inside test_expect_success() to allow us to indent
> the body of the heredoc to match the test indentation.
> 
> Style guideline says to omit whitespace after redirection operators
> (such as `<<` and `>`).
> 
> We have a q_to_tab() function which lets us state explicitly for
> readers the location of TAB characters in the heredoc body. You'll
> often see that used instead of literal TABs.
> 
> Taking all the above into account, perhaps:
> 
>     test_expect_success 'setup whitespace' '
>         q_to_tab >.git/config <<-\EOF
>         [section]
>         solid = rock
>         sparse = bigQblue
>         ...
>         EOF
> 
> Same comments apply to rest of patch.

Thank you for your review and all suggestions!  I'll make sure to rework
the tests in v2, in the way you described it above.

I'll come back with any questions I might have while reworking the new
tests.  I hope that's fine.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] t1300: add more tests for whitespace and inline comments
  2024-03-15 19:39   ` Eric Sunshine
  2024-03-15 20:04     ` Dragan Simic
@ 2024-03-15 20:29     ` Eric Sunshine
  2024-03-15 21:42       ` Dragan Simic
  1 sibling, 1 reply; 11+ messages in thread
From: Eric Sunshine @ 2024-03-15 20:29 UTC (permalink / raw)
  To: Dragan Simic; +Cc: git, gitster, rsbecker, github

On Fri, Mar 15, 2024 at 3:39 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> These days we try to place all test-related code inside a
> test_expect_success() context rather than having it standalone. In
> this case, since the file being created is (presumably) shared by
> multiple tests in this script, you may want to add a new test which
> performs this setup step.
>
> Taking all the above into account, perhaps:
>
>     test_expect_success 'setup whitespace' '
>         q_to_tab >.git/config <<-\EOF
>         [section]
>         solid = rock
>         sparse = bigQblue
>         ...
>         EOF
>
> Same comments apply to rest of patch.

To be clear, this case is special because the file being created is
shared by multiple tests, so it deserves being placed in its own
test_expect_success() invocation.

For the remaining cases where you're doing some set-up outside of
test_expect_success(), just move the set-up code into the
corresponding test_expect_success() invocation. For instance, rather
than:

    echo 'big               blue' > expect

    test_expect_success 'internal whitespace' '
        git config --get section.sparse > output &&
        test_cmp expect output
    '

do this:

    test_expect_success 'internal whitespace' '
        echo 'bigQblue' | q_to_tab >expect
        git config --get section.sparse >actual &&
        test_cmp expect actual
    '

(I changed "output" to "actual" above since the names "expect" and
"actual" are common in the tests.)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/4] t1300: add more tests for whitespace and inline comments
  2024-03-15 20:29     ` Eric Sunshine
@ 2024-03-15 21:42       ` Dragan Simic
  0 siblings, 0 replies; 11+ messages in thread
From: Dragan Simic @ 2024-03-15 21:42 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: git, gitster, rsbecker, github

On 2024-03-15 21:29, Eric Sunshine wrote:
> On Fri, Mar 15, 2024 at 3:39 PM Eric Sunshine <sunshine@sunshineco.com> 
> wrote:
>> These days we try to place all test-related code inside a
>> test_expect_success() context rather than having it standalone. In
>> this case, since the file being created is (presumably) shared by
>> multiple tests in this script, you may want to add a new test which
>> performs this setup step.
>> 
>> Taking all the above into account, perhaps:
>> 
>>     test_expect_success 'setup whitespace' '
>>         q_to_tab >.git/config <<-\EOF
>>         [section]
>>         solid = rock
>>         sparse = bigQblue
>>         ...
>>         EOF
>> 
>> Same comments apply to rest of patch.
> 
> To be clear, this case is special because the file being created is
> shared by multiple tests, so it deserves being placed in its own
> test_expect_success() invocation.
> 
> For the remaining cases where you're doing some set-up outside of
> test_expect_success(), just move the set-up code into the
> corresponding test_expect_success() invocation. For instance, rather
> than:
> 
>     echo 'big               blue' > expect
> 
>     test_expect_success 'internal whitespace' '
>         git config --get section.sparse > output &&
>         test_cmp expect output
>     '
> 
> do this:
> 
>     test_expect_success 'internal whitespace' '
>         echo 'bigQblue' | q_to_tab >expect
>         git config --get section.sparse >actual &&
>         test_cmp expect actual
>     '
> 
> (I changed "output" to "actual" above since the names "expect" and
> "actual" are common in the tests.)

This looks nice, thanks again.  It keeps the expected results and
the test execution in a single "block", making it a bit easier to
keep track of different tests and their expected results.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2024-03-15 21:42 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-15 13:22 [PATCH 0/4] Fix a bug in configuration parsing, and improve tests and documentation Dragan Simic
2024-03-15 13:22 ` [PATCH 1/4] config: minor addition of whitespace Dragan Simic
2024-03-15 13:22 ` [PATCH 2/4] config: really keep value-internal whitespace verbatim Dragan Simic
2024-03-15 17:46   ` Junio C Hamano
2024-03-15 19:50     ` Dragan Simic
2024-03-15 13:22 ` [PATCH 3/4] t1300: add more tests for whitespace and inline comments Dragan Simic
2024-03-15 19:39   ` Eric Sunshine
2024-03-15 20:04     ` Dragan Simic
2024-03-15 20:29     ` Eric Sunshine
2024-03-15 21:42       ` Dragan Simic
2024-03-15 13:22 ` [PATCH 4/4] config.txt: describe handling of whitespace further Dragan Simic

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).