From: Lars Schneider <larsxschneider@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: lars.schneider@autodesk.com, git@vger.kernel.org, tboegi@web.de,
	j6t@kdbg.org, sunshine@sunshineco.com, peff@peff.net,
	ramsay@ramsayjones.plus.com, Johannes.Schindelin@gmx.de,
	pclouds@gmail.com
Subject: Re: [PATCH v11 08/10] convert: advise canonical UTF encoding names
Date: Thu, 15 Mar 2018 23:42:26 +0100
Message-ID: <D1598F51-5D9E-42FA-A9B7-C1462526B9CB@gmail.com>
In-Reply-To: <xmqqefkt5ak0.fsf@gitster-ct.c.googlers.com>


> On 09 Mar 2018, at 20:11, Junio C Hamano <gitster@pobox.com> wrote:
> 
> lars.schneider@autodesk.com writes:
> 
>> From: Lars Schneider <larsxschneider@gmail.com>
>> 
>> The canonical name of a UTF encoding has the format UTF, dash, number,
>> and optionally a byte order in upper case (e.g. UTF-8 or UTF-16BE).
>> Some iconv versions support alternative names without a dash or with
>> lower case characters.
>> 
>> To avoid problems between different iconv versions, always suggest the
>> canonical UTF names in advice messages.
>> 
>> Signed-off-by: Lars Schneider <larsxschneider@gmail.com>
>> ---
> 
> I think it is probably better to squash this into an earlier step,
> i.e. jumping straight to the endgame solution.

ok!


>> diff --git a/convert.c b/convert.c
>> index b80d666a6b..9a3ae7cce1 100644
>> --- a/convert.c
>> +++ b/convert.c
>> @@ -279,12 +279,20 @@ static int validate_encoding(const char *path, const char *enc,
>> 				"BOM is prohibited in '%s' if encoded as %s");
>> 			/*
>> 			 * This advice is shown for UTF-??BE and UTF-??LE encodings.
>> +			 * We cut off the last two characters of the encoding name
>> +			 # to generate the encoding name suitable for BOMs.
>> 			 */
> 
> I somehow thought that I saw "s/#/*/" in somebody's response during
> the previous round?

Oops. Will fix!


>> 			const char *advise_msg = _(
>> 				"The file '%s' contains a byte order "
>> -				"mark (BOM). Please use %.6s as "
>> +				"mark (BOM). Please use UTF-%s as "
>> 				"working-tree-encoding.");
>> -			advise(advise_msg, path, enc);
>> +			const char *stripped = "";
>> +			char *upper = xstrdup_toupper(enc);
>> +			upper[strlen(upper)-2] = '\0';
>> +			if (!skip_prefix(upper, "UTF-", &stripped))
>> +				skip_prefix(stripped, "UTF", &stripped);
>> +			advise(advise_msg, path, stripped);
>> +			free(upper);
> 
> If this codepath is ever entered with "enc" that does not begin with
> "UTF" (e.g. "Shift_JIS", which is impossible in the current code,
> but I'll talk about future-proofing here), then neither of these
> skip_prefix will trigger, and then you'd end up suggesting to use
> "UTF-" that is nonsense.  Perhaps initialize stripped to NULL and
> force advise to segv to catch such a programmer error?

Agreed!
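
For reference, the direction discussed above could be sketched roughly as
follows. This is a standalone illustration, not the code from the series:
skip_prefix_sketch() and bom_encoding_name() are hypothetical names, and the
first is only a stand-in for git's skip_prefix(). Two things differ from the
quoted hunk and are assumptions on my part: both prefix checks are made
against the uppercased copy (in the quoted hunk the second call is passed
"stripped", which is still the empty string at that point, so it looks like
it could never match), and an unexpected encoding yields NULL that the
caller must check, a softer variant of the "force a segv" idea.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-in for git's skip_prefix(): if str starts with prefix, store a
 * pointer to the remainder in *out and return 1; otherwise return 0 and
 * leave *out untouched.
 */
static int skip_prefix_sketch(const char *str, const char *prefix,
			      const char **out)
{
	size_t n = strlen(prefix);

	if (strncmp(str, prefix, n))
		return 0;
	*out = str + n;
	return 1;
}

/*
 * Map a UTF-??BE / UTF-??LE style name (any case, with or without the
 * dash) to the canonical BOM-capable name, e.g. "utf16le" -> "UTF-16".
 * Returns a newly allocated string the caller must free, or NULL if the
 * name does not start with "UTF" (a programmer error in this context).
 */
static char *bom_encoding_name(const char *enc)
{
	const char *stripped = NULL;
	size_t len = strlen(enc), i;
	char *upper, *result;

	if (len < 2)
		return NULL;

	upper = malloc(len + 1);
	if (!upper)
		return NULL;
	for (i = 0; i < len; i++)
		upper[i] = (char)toupper((unsigned char)enc[i]);
	upper[len - 2] = '\0';	/* cut off the trailing "BE" or "LE" */

	/* Try the dashed form first, then the dash-less form. */
	if (!skip_prefix_sketch(upper, "UTF-", &stripped) &&
	    !skip_prefix_sketch(upper, "UTF", &stripped)) {
		free(upper);
		return NULL;
	}

	result = malloc(strlen(stripped) + sizeof("UTF-"));
	if (result)
		sprintf(result, "UTF-%s", stripped);
	free(upper);
	return result;
}
```

With this shape, a caller would only print the advice when the helper
returns non-NULL, so a name such as "Shift_JIS" can never turn into a bare
"UTF-" suggestion.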


Thanks,
Lars



Thread overview: 25+ messages
2018-03-09 17:35 [PATCH v11 00/10] convert: add support for different encodings lars.schneider
2018-03-09 17:35 ` [PATCH v11 01/10] strbuf: remove unnecessary NUL assignment in xstrdup_tolower() lars.schneider
2018-03-09 17:35 ` [PATCH v11 02/10] strbuf: add xstrdup_toupper() lars.schneider
2018-03-09 17:35 ` [PATCH v11 03/10] strbuf: add a case insensitive starts_with() lars.schneider
2018-03-09 17:35 ` [PATCH v11 04/10] utf8: add function to detect prohibited UTF-16/32 BOM lars.schneider
2018-03-09 17:35 ` [PATCH v11 05/10] utf8: add function to detect a missing " lars.schneider
2018-03-09 17:35 ` [PATCH v11 06/10] convert: add 'working-tree-encoding' attribute lars.schneider
2018-03-09 19:10   ` Junio C Hamano
2018-03-15 21:23     ` Lars Schneider
2018-03-18  7:24   ` Torsten Bögershausen
2018-04-01 13:24     ` Lars Schneider
2018-04-05 16:41       ` Torsten Bögershausen
2018-04-15 16:54         ` Lars Schneider
2018-03-09 17:35 ` [PATCH v11 07/10] convert: check for detectable errors in UTF encodings lars.schneider
2018-03-09 19:00   ` Junio C Hamano
2018-03-09 19:04     ` Lars Schneider
2018-03-09 19:10   ` Junio C Hamano
2018-03-09 17:35 ` [PATCH v11 08/10] convert: advise canonical UTF encoding names lars.schneider
2018-03-09 19:11   ` Junio C Hamano
2018-03-15 22:42     ` Lars Schneider [this message]
2018-03-09 17:35 ` [PATCH v11 09/10] convert: add tracing for 'working-tree-encoding' attribute lars.schneider
2018-03-09 17:35 ` [PATCH v11 10/10] convert: add round trip check based on 'core.checkRoundtripEncoding' lars.schneider
2018-03-09 20:18   ` Eric Sunshine
2018-03-09 20:22     ` Junio C Hamano
2018-03-09 20:27       ` Eric Sunshine
