* [PATCHv2] parse-options: report uncorrupted multi-byte options
@ 2013-02-11 23:13 Erik Faye-Lund
2013-02-11 23:51 ` Junio C Hamano
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Erik Faye-Lund @ 2013-02-11 23:13 UTC (permalink / raw)
To: git; +Cc: gitster, peff, matthieu.moy, tboegi
Because our command-line parser considers only one byte at a time
for short options, we incorrectly report only the first byte when
multi-byte input was provided. This makes user errors slightly
awkward to diagnose, for instance under UTF-8 locales and non-English
keyboard layouts.
Make the reporting code report the whole argument-string when a
non-ASCII short-option is detected.
Signed-off-by: Erik Faye-Lund <kusmabite@gmail.com>
Improved-by: Jeff King <peff@peff.net>
---
Here's a second attempt at fixing error-reporting with UTF-8 encoded
input, this time without corrupting other non-ascii multi-byte
encodings.
I decided to change the text from what Jeff suggested; all we know is
that it's non-ASCII. It might be Latin-1 or some other non-ASCII,
single byte encoding. And since we're trying not to care, let's also
try to not be overly specific :)
I wasn't entirely sure who to attribute for the improvement, so I just
picked Jeff; he provided some code. That decision might not be correct,
feel free to change it.
parse-options.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/parse-options.c b/parse-options.c
index 67e98a6..6a39446 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -461,8 +461,11 @@ int parse_options(int argc, const char **argv, const char *prefix,
 	default: /* PARSE_OPT_UNKNOWN */
 		if (ctx.argv[0][1] == '-') {
 			error("unknown option `%s'", ctx.argv[0] + 2);
-		} else {
+		} else if (isascii(*ctx.opt)) {
 			error("unknown switch `%c'", *ctx.opt);
+		} else {
+			error("unknown non-ascii option in string: `%s'",
+			      ctx.argv[0]);
 		}
 		usage_with_options(usagestr, options);
 	}
--
1.8.1.1
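The branching in the hunk above can be tried outside of git with a small standalone sketch. The helper name `format_unknown_switch` and its buffer-based interface are hypothetical and exist only for illustration; the sketch also tests the byte against 0x80 directly instead of calling `isascii()`, which is an assumption made to keep it portable, not what the patch does.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of git): format the diagnostic for an
 * unrecognized short option into buf.  `arg` is the whole argument
 * (for example "-x") and `opt` points at the offending byte.
 */
static int format_unknown_switch(char *buf, size_t len,
				 const char *arg, const char *opt)
{
	unsigned char c = (unsigned char)*opt;

	if (c < 0x80)
		/* A 7-bit byte is a complete character; report only it. */
		return snprintf(buf, len, "unknown switch `%c'", c);

	/*
	 * A byte >= 0x80 may be one piece of a multi-byte sequence, so
	 * printing it alone could emit an invalid sequence.  Report the
	 * whole argument instead, without guessing its encoding.
	 */
	return snprintf(buf, len,
			"unknown non-ascii option in string: `%s'", arg);
}
```

Called with the argument "-x" it yields the single-character message, while an argument whose option byte is 0xc3 (the first byte of a two-byte UTF-8 sequence) yields the message that quotes the argument intact.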
* Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
From: Junio C Hamano @ 2013-02-11 23:51 UTC (permalink / raw)
To: Erik Faye-Lund; +Cc: git, peff, matthieu.moy, tboegi
Thanks.
* Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
From: Jeff King @ 2013-02-12 1:00 UTC (permalink / raw)
To: Erik Faye-Lund; +Cc: git, gitster, matthieu.moy, tboegi
On Tue, Feb 12, 2013 at 12:13:48AM +0100, Erik Faye-Lund wrote:
> I decided to change the text from what Jeff suggested; all we know is
> that it's non-ASCII. It might be Latin-1 or some other non-ASCII,
> single byte encoding. And since we're trying not to care, let's also
> try to not be overly specific :)
Yeah, that makes more sense (I did not put too much thought into the
original wording). Thanks.
-Peff
* Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
From: Duy Nguyen @ 2013-02-12 1:21 UTC (permalink / raw)
To: Erik Faye-Lund; +Cc: git, gitster, peff, matthieu.moy, tboegi
On Tue, Feb 12, 2013 at 6:13 AM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
> Because our command-line parser considers only one byte at a time
> for short options, we incorrectly report only the first byte when
> multi-byte input was provided. This makes user errors slightly
> awkward to diagnose, for instance under UTF-8 locales and non-English
> keyboard layouts.
>
> Make the reporting code report the whole argument-string when a
> non-ASCII short-option is detected.
Similar cases:
config.c:git_default_core_config() assumes core.commentchar is ascii.
We should catch and report non-ascii chars, or simply accept it as a
string.
builtin/update-index.c:cmd_update_index():
error("unknown switch '%c'", *ctx.opt);
builtin/apply.c:apply_one_fragment():
error(_("invalid start of line: '%c'"), first);
where 'first' may be a part of utf-8 from a broken patch.
--
Duy
* Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
From: Junio C Hamano @ 2013-02-12 2:10 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Erik Faye-Lund, git, peff, matthieu.moy, tboegi
Duy Nguyen <pclouds@gmail.com> writes:
> On Tue, Feb 12, 2013 at 6:13 AM, Erik Faye-Lund <kusmabite@gmail.com> wrote:
>> Because our command-line parser considers only one byte at a time
>> for short options, we incorrectly report only the first byte when
>> multi-byte input was provided. This makes user errors slightly
>> awkward to diagnose, for instance under UTF-8 locales and non-English
>> keyboard layouts.
>>
>> Make the reporting code report the whole argument-string when a
>> non-ASCII short-option is detected.
>
> Similar cases:
>
> config.c:git_default_core_config() assumes core.commentchar is ascii.
> We should catch and report non-ascii chars, or simply accept it as a
> string.
That one is just an uninterpreted byte. core.commentString might be
a nice extension to the concept, but it is an entirely different
category.
> builtin/update-index.c:cmd_update_index(): error("unknown switch
> '%c'", *ctx.opt);
This one is in the same category as this topic.
> builtin/apply.c:apply_one_fragment(): error(_("invalid start of line:
> '%c'"), first); where 'first' may be a part of utf-8 from a broken
> patch.
This is where the patch is expected to have either " ", "-" or "+";
again, anything else is an uninterpreted byte. It is more like
reporting to the user's terminal the name of a file we found an
error in, when that filename is not encoded in UTF-8.
* Re: [PATCHv2] parse-options: report uncorrupted multi-byte options
From: Duy Nguyen @ 2013-02-12 2:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Erik Faye-Lund, git, peff, matthieu.moy, tboegi
On Tue, Feb 12, 2013 at 9:10 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Similar cases:
>>
>> config.c:git_default_core_config() assumes core.commentchar is ascii.
>> We should catch and report non-ascii chars, or simply accept it as a
>> string.
>
> That one is just an uninterpreted byte. core.commentString might be
> a nice extension to the concept, but it is an entirely different
> category.
My point is to avoid outputting broken UTF-8 where we can. If someone
accidentally puts a multi-byte UTF-8 character in core.commentChar, it
will produce templates containing broken UTF-8 that editors might react
to, but that are hard to spot by eye. Something like this may give
sufficient protection:
diff --git a/config.c b/config.c
index aefd80b..b6f73e0 100644
--- a/config.c
+++ b/config.c
@@ -726,8 +726,11 @@ static int git_default_core_config(const char *var, const char *value)
 	if (!strcmp(var, "core.commentchar")) {
 		const char *comment;
 		int ret = git_config_string(&comment, var, value);
-		if (!ret)
+		if (!ret) {
+			if (comment[1])
+				return error("core.commentchar must be one ASCII character");
 			comment_line_char = comment[0];
+		}
 		return ret;
 	}
--
Duy
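The `comment[1]` test in the patch above rejects any value longer than one byte. A somewhat stricter variant is sketched below, assuming one also wants to reject an empty value and a single non-ASCII byte (such as a Latin-1 character). The helper name `parse_comment_char` and its return convention are hypothetical and are not taken from git.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not git code): accept a core.commentchar value
 * only if it is exactly one 7-bit ASCII byte.  On success store the
 * byte in *out and return 0; otherwise return -1.
 */
static int parse_comment_char(const char *value, char *out)
{
	if (!value || !value[0])
		return -1;	/* missing or empty value */
	if (value[1])
		return -1;	/* more than one byte, e.g. a UTF-8 sequence */
	if ((unsigned char)value[0] >= 0x80)
		return -1;	/* one byte, but outside the ASCII range */
	*out = value[0];
	return 0;
}
```

Checking `value[0]` before `value[1]` also avoids reading past the terminator of an empty string.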