dash.vger.kernel.org archive mirror
From: Eric Blake <eblake@redhat.com>
To: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Fredrik Fornwall <fredrik@fornwall.net>, dash@vger.kernel.org
Subject: Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins
Date: Fri, 22 May 2015 07:02:57 -0600
Message-ID: <555F2901.1080108@redhat.com>
In-Reply-To: <20150522044515.GA32740@gondor.apana.org.au>

On 05/21/2015 10:45 PM, Herbert Xu wrote:
>> Setting LC_ALL has the nice property that LC_COLLATE and LC_CTYPE are
>> guaranteed to be compatible; if you just set LC_COLLATE but leave
>> LC_CTYPE unchanged and unset LC_ALL, it is possible to attempt a
>> collation that assumes one character set while still living in a ctype
>> that assumes another, and get garbled results.
> 
> Show me an actual pair of values for these two that produce
> incorrect results for mkbuiltins and I'll happily change both.

'sort -b' ignores leading blanks, and which characters count as blank
is decided by isblank() under the current LC_CTYPE.  There are locales
where isblank() returns true for a larger set of characters than in the
LC_CTYPE=C locale.  Suppose that I can find a single-byte locale where
isblank('\xff') is true.  If that is the case, then the input
'\xffa\nb\n' will sort differently for 'LC_ALL=C sort -b' (output
'b\n\xffa\n') than for 'LANG=C LC_CTYPE=$locale sort -b' (output
'\xffa\nb\n'), because the change in CTYPE changes whether the \xff is
ignored as a blank or included as part of the name being sorted.
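
To make the comparison concrete, here is a minimal sketch of the two
invocations.  The helper names make_input and sort_b_hex are made up
for illustration, and $locale remains a placeholder: no locale with
this property has been identified, so only the LC_ALL=C half can
actually be run.  The second call also assumes LC_ALL is unset, since
LC_ALL would override LC_CTYPE.

```shell
# Two-line input from the analysis: byte 0xff followed by 'a', then 'b'.
# The octal escape \377 is the portable printf spelling of \xff.
make_input() {
    printf '\377a\nb\n'
}

# Run 'sort -b' under the given environment assignments and print the
# result as a string of hex digits so the 0xff byte stays visible.
sort_b_hex() {
    make_input | env "$@" sort -b | od -An -tx1 | tr -d ' \n'
}

# Baseline: everything in the C locale, where 0xff is not a blank,
# so 'b' (0x62) sorts ahead of the line starting with 0xff.
sort_b_hex LC_ALL=C
echo

# Hypothetical counterpart: collation stays C, but LC_CTYPE comes from
# a locale that classifies 0xff as blank.  Left commented out because
# $locale is a placeholder, not a known locale name.
# sort_b_hex LANG=C LC_CTYPE="$locale"
```

The first call prints 620aff610a, that is 'b\n\xffa\n'.  If a suitable
$locale existed, the second call would be expected to print the two
lines in the opposite order.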

However, the man pages for 'locale(1)' and 'localedef(1)' did not make
it obvious to me how to search for such a locale, so I'm open to
suggestions on how to prove my point with more than just analysis.
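
One possible way to run such a search is sketched below.  It rests on
an assumption: that tr classifies [:blank:] byte by byte using the
current LC_CTYPE, which I believe holds for GNU tr but have not
verified elsewhere.  The function names are made up for illustration.
The result is only meaningful for single-byte locales; a tr that is
multibyte-aware may reject these bytes outright in a UTF-8 locale and
skew the count.

```shell
# Emit the 128 bytes 0x80..0xff, one after another.
high_bytes() {
    i=128
    while [ "$i" -le 255 ]; do
        # Build an octal escape such as \377 and let printf expand it.
        printf "\\$(printf '%03o' "$i")"
        i=$((i + 1))
    done
}

# Print how many of those bytes the locale named by $1 classifies as
# blank, judged by how many of them tr deletes.
blank_high_bytes() {
    kept=$(high_bytes | LC_ALL="$1" tr -d '[:blank:]' | wc -c)
    echo $((128 - kept))
}

# List every installed locale that has at least one such byte.
scan_locales() {
    locale -a | while IFS= read -r loc; do
        n=$(blank_high_bytes "$loc" 2>/dev/null)
        if [ "${n:-0}" -gt 0 ]; then
            printf '%s: %s blank byte(s) in 0x80..0xff\n' "$loc" "$n"
        fi
    done
}

scan_locales
```

Any locale this reports would be a candidate for $locale; an empty
result only shows that none of the installed locales qualifies, not
that no such locale can exist.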

And there's still the point that mkbuiltins is being run on controlled
input, where you are sticking only to a subset of characters that happen
to be portable (that is, you are unlikely to be tripped up by a locale
where \xff is a blank, since you are not using \xff in your input).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

