dash.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins
@ 2015-05-17 23:15 Fredrik Fornwall
  2015-05-22  4:25 ` Herbert Xu
  0 siblings, 1 reply; 7+ messages in thread
From: Fredrik Fornwall @ 2015-05-17 23:15 UTC (permalink / raw)
  To: dash

In mkbuiltins LC_COLLATE is set, but since "The value of the LC_ALL
environment variable has precedence over any of the other environment
variables starting with LC_"
(http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html), this
has no effect when LC_ALL is set.

This breaks when having e.g. LC_ALL=en_US.UTF-8 during make, which
causes the test case
    dash -c :
to fail, probably due to broken ordering in builtins.c. The patch
corrects that by setting LC_ALL instead of LC_COLLATE.

Fredrik

diff -u -r ../dash-0.5.8/src/mkbuiltins ./src/mkbuiltins
--- ../dash-0.5.8/src/mkbuiltins 2014-09-28 04:19:32.000000000 -0400
+++ ./src/mkbuiltins 2015-05-17 19:08:00.076452891 -0400
@@ -78,7 +78,7 @@
  if ($i ~ /^-/)
  line = $(++i) "\t" line
  print line
- }}' $temp | LC_COLLATE=C sort -k 1,1 | tee $temp2 | awk '{
+ }}' $temp | LC_ALL=C sort -k 1,1 | tee $temp2 | awk '{
  opt = ""
  if (NF > 2) {
  opt = substr($2, 2)
@@ -97,7 +97,7 @@
  */

 !
-sed 's/ -[a-z]*//' $temp2 | nl -b a -v 0 | LC_COLLATE=C sort -u -k 3,3 |
+sed 's/ -[a-z]*//' $temp2 | nl -b a -v 0 | LC_ALL=C sort -u -k 3,3 |
 tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ |
  awk '{ printf "#define %s (builtincmd + %d)\n", $3, $1}'
 printf '\n#define NUMBUILTINS %d\n' $(wc -l < $temp2)
$ diff -u -r ../dash-0.5.8/ .

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins
  2015-05-17 23:15 [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins Fredrik Fornwall
@ 2015-05-22  4:25 ` Herbert Xu
  2015-05-22  4:40   ` Eric Blake
  2015-05-24 21:05   ` Fredrik Fornwall
  0 siblings, 2 replies; 7+ messages in thread
From: Herbert Xu @ 2015-05-22  4:25 UTC (permalink / raw)
  To: Fredrik Fornwall; +Cc: dash

Fredrik Fornwall <fredrik@fornwall.net> wrote:
> In mkbuiltins LC_COLLATE is set, but since "The value of the LC_ALL
> environment variable has precedence over any of the other environment
> variables starting with LC_"
> (http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html), this
> has no effect when LC_ALL is set.
> 
> This breaks when having e.g. LC_ALL=en_US.UTF-8 during make, which
> causes the test case
>    dash -c :
> to fail, probably due to broken ordering in builtins.c. The patch
> corrects that by setting LC_ALL instead of LC_COLLATE.

This causes any errors printed by sort to come out in English.

Please fix this by simply setting LC_ALL to empty alongside
LC_COLLATE=C.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins
  2015-05-22  4:25 ` Herbert Xu
@ 2015-05-22  4:40   ` Eric Blake
  2015-05-22  4:45     ` Herbert Xu
  2015-05-24 21:05   ` Fredrik Fornwall
  1 sibling, 1 reply; 7+ messages in thread
From: Eric Blake @ 2015-05-22  4:40 UTC (permalink / raw)
  To: Herbert Xu, Fredrik Fornwall; +Cc: dash

[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]

On 05/21/2015 10:25 PM, Herbert Xu wrote:
> Fredrik Fornwall <fredrik@fornwall.net> wrote:
>> In mkbuiltins LC_COLLATE is set, but since "The value of the LC_ALL
>> environment variable has precedence over any of the other environment
>> variables starting with LC_"
>> (http://pubs.opengroup.org/onlinepubs/7908799/xbd/envvar.html), this
>> has no effect when LC_ALL is set.
>>
>> This breaks when having e.g. LC_ALL=en_US.UTF-8 during make, which
>> causes the test case
>>    dash -c :
>> to fail, probably due to broken ordering in builtins.c. The patch
>> corrects that by setting LC_ALL instead of LC_COLLATE.
> 
> This causes any errors printed by sort to come out in English.

Why do you care whether any errors printed by sort are in the "C" locale
(in English) rather than localized?  Ideally, there won't be any sort
errors in the first place, because this tool is run on controlled input
as part of the build process.

> 
> Please fix this by simply setting LC_ALL to empty alongside
> LC_COLLATE=C.

Setting LC_ALL has the nice property that LC_COLLATE and LC_CTYPE are
guaranteed to be compatible; if you just set LC_COLLATE but leave
LC_CTYPE unchanged and unset LC_ALL, it is possible to attempt a
collation that assumes one character set while still living in a ctype
that assumes another, and get garbled results.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins
  2015-05-22  4:40   ` Eric Blake
@ 2015-05-22  4:45     ` Herbert Xu
  2015-05-22 13:02       ` Eric Blake
  0 siblings, 1 reply; 7+ messages in thread
From: Herbert Xu @ 2015-05-22  4:45 UTC (permalink / raw)
  To: Eric Blake; +Cc: Fredrik Fornwall, dash

On Thu, May 21, 2015 at 10:40:19PM -0600, Eric Blake wrote:
>
> Why do you care whether any errors printed by sort are in the "C" locale
> (in English) rather than localized?  Ideally, there won't be any sort
> errors in the first place, because this tool is run on controlled input
> as part of the build process.

Your /tmp could be full or sort could be out of memory.

> Setting LC_ALL has the nice property that LC_COLLATE and LC_CTYPE are
> guaranteed to be compatible; if you just set LC_COLLATE but leave
> LC_CTYPE unchanged and unset LC_ALL, it is possible to attempt a
> collation that assumes one character set while still living in a ctype
> that assumes another, and get garbled results.

Show me an actual pair of values for these two that produce
incorrect results for mkbuiltins and I'll happily change both.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins
  2015-05-22  4:45     ` Herbert Xu
@ 2015-05-22 13:02       ` Eric Blake
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Blake @ 2015-05-22 13:02 UTC (permalink / raw)
  To: Herbert Xu; +Cc: Fredrik Fornwall, dash

[-- Attachment #1: Type: text/plain, Size: 1713 bytes --]

On 05/21/2015 10:45 PM, Herbert Xu wrote:
>> Setting LC_ALL has the nice property that LC_COLLATE and LC_CTYPE are
>> guaranteed to be compatible; if you just set LC_COLLATE but leave
>> LC_CTYPE unchanged and unset LC_ALL, it is possible to attempt a
>> collation that assumes one character set while still living in a ctype
>> that assumes another, and get garbled results.
> 
> Show me an actual pair of values for these two that produce
> incorrect results for mkbuiltins and I'll happily change both.

'sort -b' uses isspace() to determine which characters to strip.  There
are locales with a larger set of characters where isspace() returns true
than for the LC_CTYPE=C locale.  Suppose that I can find a single-byte
locale where isblank('\xff') is true.  If that is the case, then the
input '\xffa\nb\n' will sort differently for 'LC_ALL=C sort -b' (output
'b\n'\xffa\n') than for 'LANG=C LC_CTYPE=$locale' (output '\xffa\nb\n')
because the change in CTYPE changes whether the \xff is ignored as a
blank or included as part of the name being sorted.

However, the man pages for 'locale(1)' and 'localedef(1)' did not make
it obvious for me how to perform a search that would easily find such a
locale, so I'm open to suggestions on how to prove my point via more
than just analysis.

And there's still the point that mkbuiltins is being run on controlled
input, where you are sticking only to a subset of characters that happen
to be portable (that is, you are unlikely to be tripped up by a locale
where \xff is a blank, since you are not using \xff in your input).

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins
  2015-05-22  4:25 ` Herbert Xu
  2015-05-22  4:40   ` Eric Blake
@ 2015-05-24 21:05   ` Fredrik Fornwall
  2015-05-26  2:49     ` Herbert Xu
  1 sibling, 1 reply; 7+ messages in thread
From: Fredrik Fornwall @ 2015-05-24 21:05 UTC (permalink / raw)
  To: Herbert Xu; +Cc: dash

On Fri, May 22, 2015 at 6:25 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> This causes any errors printed by sort to come out in English.
>
> Please fix this by simply setting LC_ALL to empty alongside
> LC_COLLATE=C.

A version with LC_ALL= follows:

diff -u -r ../dash-0.5.8/src/mkbuiltins ./src/mkbuiltins
--- ../dash-0.5.8/src/mkbuiltins 2014-09-28 04:19:32.000000000 -0400
+++ ./src/mkbuiltins 2015-05-17 19:08:00.076452891 -0400
@@ -78,7 +78,7 @@
  if ($i ~ /^-/)
  line = $(++i) "\t" line
  print line
- }}' $temp | LC_COLLATE=C sort -k 1,1 | tee $temp2 | awk '{
+ }}' $temp | LC_ALL= LC_COLLATE=C sort -k 1,1 | tee $temp2 | awk '{
  opt = ""
  if (NF > 2) {
  opt = substr($2, 2)
@@ -97,7 +97,7 @@
  */

 !
-sed 's/ -[a-z]*//' $temp2 | nl -b a -v 0 | LC_COLLATE=C sort -u -k 3,3 |
+sed 's/ -[a-z]*//' $temp2 | nl -b a -v 0 | LC_ALL= LC_COLLATE=C sort
-u -k 3,3 |
 tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ |
  awk '{ printf "#define %s (builtincmd + %d)\n", $3, $1}'
 printf '\n#define NUMBUILTINS %d\n' $(wc -l < $temp2)

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins
  2015-05-24 21:05   ` Fredrik Fornwall
@ 2015-05-26  2:49     ` Herbert Xu
  0 siblings, 0 replies; 7+ messages in thread
From: Herbert Xu @ 2015-05-26  2:49 UTC (permalink / raw)
  To: Fredrik Fornwall; +Cc: dash

On Sun, May 24, 2015 at 11:05:48PM +0200, Fredrik Fornwall wrote:
> On Fri, May 22, 2015 at 6:25 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> > This causes any errors printed by sort to come out in English.
> >
> > Please fix this by simply setting LC_ALL to empty alongside
> > LC_COLLATE=C.
> 
> A version with LC_ALL= follows:

Patch applied.  Though I had to do it by hand because your patch
has been mangled by your mailer.  I also fixed up the indentation.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2015-05-26  2:49 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-17 23:15 [PATCH] Set LC_ALL instead LC_COLLATE in mkbuiltins Fredrik Fornwall
2015-05-22  4:25 ` Herbert Xu
2015-05-22  4:40   ` Eric Blake
2015-05-22  4:45     ` Herbert Xu
2015-05-22 13:02       ` Eric Blake
2015-05-24 21:05   ` Fredrik Fornwall
2015-05-26  2:49     ` Herbert Xu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).