From: Dragan Simic <dsimic@manjaro.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, rsbecker@nexbridge.com, github@seichter.de
Subject: Re: [PATCH 1/3] config.txt: describe whitespace characters further and more accurately
Date: Thu, 14 Mar 2024 07:20:00 +0100 [thread overview]
Message-ID: <ff7b0a2ead90ad9a9456141da5e4df4a@manjaro.org> (raw)
In-Reply-To: <xmqqjzm51ugt.fsf@gitster.g>
On 2024-03-14 02:18, Junio C Hamano wrote:
> Dragan Simic <dsimic@manjaro.org> writes:
>> -The syntax is fairly flexible and permissive; whitespaces are mostly
>> -ignored. The '#' and ';' characters begin comments to the end of
>> line,
>> -blank lines are ignored.
>> +The syntax is fairly flexible and permissive. Whitespace characters,
>> +which in this context are the space character (SP) and the horizontal
>> +tabulation (HT), are mostly ignored. The '#' and ';' characters
>> begin
>> +comments to the end of line. Blank lines are ignored.
>
> OK, except for "whitespace characters"---do we need to say
> "whitespace characters", after we already listed HT and SP are the
> ones, instead of just "whitespaces"?
I also spent some time thinking about that. To me, the plural form,
i.e.
"whitespaces", simply doesn't sound very good, because "whitespace"
feels
to me more like a mass noun, and I really haven't seen it used in plural
form in other projects.
>> A line that defines a value can be continued to the next line by
>> +ending it with a `\`; the backslash and the end-of-line are stripped.
>> +Leading whitespace characters after 'name =', the remainder of the
>> line after the first comment character '#' or ';', and trailing
>> +whitespace characters of the line are discarded unless they are
>> enclosed
>> +in double quotes. This discarding of the trailing whitespace
>> characters
>> +also applies after the remainder of the line after the comment
>> character
>> +is discarded.
>
> "also" makes it sound as if we do it twice, once to remove trailing
> whitespaces after the remainder of the line after '#", and then trim
> the trailing whitespaces after we removed the comment.
Good point, I also felt it the same way, but went with such wording
simply
because I thought it should be more understandable to the users, despite
being technically a bit incorrect.
> I wonder if we can make it clearer by following the step-by-step
> nature of the earlier part of the paragraph through. We already say
> the folded line processing is done first, so break things down in
> conceptual phases/steps, perhaps like
>
> * The backslash at the end-of-line is removed, together with the
> end-of-line, to form a single long line.
>
> * Anything that come after the first unquoted comment character,
> either '#' or ';', are discarded.
>
> * The leading and trailing whitespaces around the value part
> (i.e. what follows 'name =') are discarded.
>
> * Remaining unquoted whitespaces inside the value part are munged.
Hmm, I'm not really sure that such a description would be more clear
to the users, despite being technically more correct. I'll think a bit
more about it.
>> +Any number of internal whitespace characters found within
>> +the value are converted to the same number of space (SP) characters.
>
> The last one sounds like a bug to me, by the way.
>
> At least the very original 17712991 (Add ".git/config" file parser,
> 2005-10-10) squashed a run of whitespace characters into a single
> SP, which makes sense as a "clean-up".
>
> But ebdaae37 (config: Keep inner whitespace verbatim, 2009-07-30),
> while claiming to "Keep" inner whitespaces, broke it by replacing
> any isspace() bytes that are not SP with SP, contradicting its
> stated purpose.
Thank you for the investigation. The ebdaae37 commit certainly
introduced a bug to the value parsing, which presumably has remained
undected because the included test passes.
The way I see it, fixing the bug may actually be a breaking change,
because some user configurations may actually rely on the current
(mis)behavior. This makes me somewhat afraid that fixing this bug,
which I already thought about, may actually do more harm than good.
However, fixing this bug seems to be only right thing to do, which
I'll explain further below.
> As the latest change by the author of that change is from more than
> 10 years ago, I do not expect that he is still interested in this
> part of the codebase, but thanks to a very clearly written log
> message, we can read what the motivation behind that change was, and
> seeing that what the code does contradicts with the stated
> motivation we can safely declare that this is an ancient bug.
Agreed, the evidence is clear.
> Fixing that bug can of course be left outside the series. For those
> who are looking for microproject ideas who discovered this message
> by searching for the #leftoverbits keyword, one possible fix would
> be to revert ebdaae37, make sure a value with any whitespace in it
> gets quoted, and document clearly that an unquoted run of
> whitespaces is squashed into a single SP. Another way that is
> milder is to finish what ebdaae37 wanted to do and retain the
> whitespaces "verbatim".
I already though about fixing the bug so the value parser actually does
what git-config(1) currently says, but as I already noted above, I'm
afraid a bit that fixing this bug may actually do more harm than good.
Though, further investigation shows that setting a configuration value,
by invoking git-config(1), converts value-internal tabs into "\t" escape
sequences, which the value-parsing logic doesn't "squash" into spaces.
That's why the test included in the ebdaae37 commit passes. On the
other
hand, value-internal literal tab characters, found in a configuration
file, do get "squashed" by the value-parsing logic, so I'd say that the
only right thing to do is to fix this bug by making the value-internal
whitespace characters preserved verbatim.
I'd be happy to include the bugfix into this series, if my
above-mentioned
fears prove to be unnecessary.
next prev parent reply other threads:[~2024-03-14 6:20 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-03-12 15:55 [PATCH 0/3] Improve the documentation and test coverage for whitespace and comments Dragan Simic
2024-03-12 15:55 ` [PATCH 1/3] config.txt: describe whitespace characters further and more accurately Dragan Simic
2024-03-14 1:18 ` Junio C Hamano
2024-03-14 6:20 ` Dragan Simic [this message]
2024-03-14 16:45 ` Junio C Hamano
2024-03-14 18:48 ` Dragan Simic
2024-03-12 15:55 ` [PATCH 2/3] config.txt: perform some minor reformatting Dragan Simic
2024-03-14 1:58 ` Junio C Hamano
2024-03-14 6:20 ` Dragan Simic
2024-03-14 16:22 ` Junio C Hamano
2024-03-14 18:40 ` Dragan Simic
2024-03-12 15:55 ` [PATCH 3/3] t1300: add tests for internal whitespace and inline comments Dragan Simic
2024-03-14 2:18 ` Junio C Hamano
2024-03-14 6:20 ` Dragan Simic
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ff7b0a2ead90ad9a9456141da5e4df4a@manjaro.org \
--to=dsimic@manjaro.org \
--cc=git@vger.kernel.org \
--cc=github@seichter.de \
--cc=gitster@pobox.com \
--cc=rsbecker@nexbridge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).