linux-kernel.vger.kernel.org archive mirror
* [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
@ 2009-12-25 17:13 Sergei Trofimovich
  2009-12-25 23:36 ` H. Peter Anvin
  0 siblings, 1 reply; 7+ messages in thread
From: Sergei Trofimovich @ 2009-12-25 17:13 UTC
  To: linux-kernel; +Cc: Michal Marek, Sergei Trofimovich

We restricted LC_CTYPE to ASCII recently but not messages from, say,
gcc. So instead of nice warnings I get '???? ??????? ???????'
(ru_RU.UTF-8 locale) as a gcc warning, which is not nice. So, set
LC_MESSAGES=C too.

Signed-off-by: Sergei Trofimovich <slyfox@inbox.ru>
---
 Makefile |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index c628a5c..67bc799 100644
--- a/Makefile
+++ b/Makefile
@@ -21,7 +21,8 @@ unexport LC_ALL
 LC_CTYPE=C
 LC_COLLATE=C
 LC_NUMERIC=C
-export LC_CTYPE LC_COLLATE LC_NUMERIC
+LC_MESSAGES=C
+export LC_CTYPE LC_COLLATE LC_NUMERIC LC_MESSAGES
 
 # We are using a recursive build, so we need to do a little thinking
 # to get the ordering right.
-- 
1.6.4.4



* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
  2009-12-25 17:13 [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is) Sergei Trofimovich
@ 2009-12-25 23:36 ` H. Peter Anvin
  2009-12-26  1:17   ` Roland Dreier
  0 siblings, 1 reply; 7+ messages in thread
From: H. Peter Anvin @ 2009-12-25 23:36 UTC
  To: Sergei Trofimovich, Linus Torvalds
  Cc: linux-kernel, Michal Marek, Sergei Trofimovich

[-- Attachment #1: Type: text/plain, Size: 612 bytes --]

On 12/25/2009 09:13 AM, Sergei Trofimovich wrote:
> We restricted LC_CTYPE to ASCII recently but not messages from, say,
> gcc. So instead of nice warnings I get '???? ??????? ???????'
> (ru_RU.UTF-8 locale) as a gcc warning, which is not nice. So, set
> LC_MESSAGES=C too.

The whole reason with only setting some LC_* to C was to be able to
leave LC_MESSAGES intact, but it seems it breaks on too many real-life
systems.

As such, I suggest we should set LC_ALL=C and get rid of the rest of it:

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


[-- Attachment #2: 0001-Makefile-Set-LC_ALL-C.patch --]
[-- Type: text/x-patch, Size: 1017 bytes --]

From 633dcb9167582064ec5d2d832450e93768cfe376 Mon Sep 17 00:00:00 2001
From: H. Peter Anvin <hpa@zytor.com>
Date: Fri, 25 Dec 2009 15:34:33 -0800
Subject: [PATCH] Makefile: Set LC_ALL=C

We were setting LC_CTYPE, LC_COLLATE and LC_NUMERIC to the C locale,
with the intent that LC_MESSAGES would still be localized.
Unfortunately, that doesn't seem to actually work in real life, so
just be done with it and set LC_ALL=C.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 Makefile |    7 ++-----
 1 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index c628a5c..a801d1d 100644
--- a/Makefile
+++ b/Makefile
@@ -17,11 +17,8 @@ NAME = Man-Eating Seals of Antiquity
 MAKEFLAGS += -rR --no-print-directory
 
 # Avoid funny character set dependencies
-unexport LC_ALL
-LC_CTYPE=C
-LC_COLLATE=C
-LC_NUMERIC=C
-export LC_CTYPE LC_COLLATE LC_NUMERIC
+LC_ALL=C
+export LC_ALL
 
 # We are using a recursive build, so we need to do a little thinking
 # to get the ordering right.
-- 
1.6.2.5
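
The difference between the two patches comes down to the POSIX lookup
order for a locale category: LC_ALL is consulted first, then the
category variable (here LC_MESSAGES), then LANG. A minimal sketch of
that order follows; effective_messages_locale is a hypothetical helper
written for illustration and is not part of Kbuild, and the ru_RU.UTF-8
value is only an example user setting.

```shell
#!/bin/sh
# Sketch of the POSIX lookup order for one locale category:
# LC_ALL first, then the category variable itself, then LANG, else "C".
# effective_messages_locale is a hypothetical helper, not part of Kbuild.
effective_messages_locale() {
    # The arguments stand in for the values of LC_ALL, LC_MESSAGES and LANG;
    # an empty string is treated the same as an unset variable.
    lc_all=$1
    lc_messages=$2
    lang=$3
    printf '%s\n' "${lc_all:-${lc_messages:-${lang:-C}}}"
}

# Old Makefile: LC_ALL unexported, LC_MESSAGES untouched, so LANG decides.
effective_messages_locale "" "" "ru_RU.UTF-8"             # prints ru_RU.UTF-8
# First patch: LC_MESSAGES=C takes priority over LANG.
effective_messages_locale "" "C" "ru_RU.UTF-8"            # prints C
# Second patch: LC_ALL=C overrides every other setting.
effective_messages_locale "C" "ru_RU.UTF-8" "ru_RU.UTF-8" # prints C
```

This also suggests why the old Makefile needed "unexport LC_ALL": an
LC_ALL inherited from the user's environment would otherwise override
the individual LC_* assignments.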



* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
  2009-12-25 23:36 ` H. Peter Anvin
@ 2009-12-26  1:17   ` Roland Dreier
  2009-12-26  1:30     ` H. Peter Anvin
  2009-12-26 20:04     ` H. Peter Anvin
  0 siblings, 2 replies; 7+ messages in thread
From: Roland Dreier @ 2009-12-26  1:17 UTC
  To: H. Peter Anvin
  Cc: Sergei Trofimovich, Linus Torvalds, linux-kernel, Michal Marek,
	Sergei Trofimovich


 > The whole reason with only setting some LC_* to C was to be able to
 > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
 > systems.

 > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:

Seems unfortunate to lose localized error messages.  (Although in my
en_US.UTF-8 case, all I get is non-ASCII quote characters)

This all started because of the awk invocation in arch/x86/lib.  Maybe
the best idea would be to confine the locale monkeying to that one
place?

 - R.


* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
  2009-12-26  1:17   ` Roland Dreier
@ 2009-12-26  1:30     ` H. Peter Anvin
  2009-12-26  6:58       ` Roland Dreier
  2009-12-26 20:04     ` H. Peter Anvin
  1 sibling, 1 reply; 7+ messages in thread
From: H. Peter Anvin @ 2009-12-26  1:30 UTC
  To: Roland Dreier
  Cc: Sergei Trofimovich, Linus Torvalds, linux-kernel, Michal Marek,
	Sergei Trofimovich

On 12/25/2009 05:17 PM, Roland Dreier wrote:
> 
>  > The whole reason with only setting some LC_* to C was to be able to
>  > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
>  > systems.
> 
>  > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
> 
> Seems unfortunate to lose localized error messages.  (Although in my
> en_US.UTF-8 case, all I get is non-ASCII quote characters)
> 

The whole problem is that for some people we lose *all* messages.  This
all seems very strange to me, but I guess it tweaks some internal
detail inside the glibc message library, sigh.

> This all started because of the awk invocation in arch/x86/lib.  Maybe
> the best idea would be to confine the locale monkeying to that one
> place?

Except that sed, etc. and even the shell itself have the same class of
problems.  Perl doesn't, since it has saner rules for how regular
expressions handle ranges.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
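
The "class of problems" with ranges can be sidestepped for a single
command by forcing the C locale on that command only. The sketch below
assumes a POSIX grep; ascii_lower_lines is a hypothetical helper name
used for illustration. In the C locale the bracket range [a-z] covers
only the 26 lowercase ASCII letters, whereas in other locales its
meaning depends on the collation rules.

```shell
#!/bin/sh
# Hypothetical helper: print the input lines that contain a character
# in [a-z], with every locale category forced to C for grep only.
ascii_lower_lines() {
    LC_ALL=C grep '[a-z]'
}

# Only the line holding a lowercase ASCII letter is printed.
printf 'a\nB\n' | ascii_lower_lines    # prints a
```

The same prefix form (LC_ALL=C before the command name) works for sed
and awk, since the assignment applies to that one child process.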



* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
  2009-12-26  1:30     ` H. Peter Anvin
@ 2009-12-26  6:58       ` Roland Dreier
  0 siblings, 0 replies; 7+ messages in thread
From: Roland Dreier @ 2009-12-26  6:58 UTC
  To: H. Peter Anvin
  Cc: Sergei Trofimovich, Linus Torvalds, linux-kernel, Michal Marek,
	Sergei Trofimovich


 > > Seems unfortunate to lose localized error messages.  (Although in my
 > > en_US.UTF-8 case, all I get is non-ASCII quote characters)

 > The whole problem is that for some people we lose *all* messages.  This
 > seems all very strange to me at all, but I guess it tweaks some internal
 > detail inside the glibc message library, sigh.

I just meant that people used to be able to get localized error messages
by setting LANG or whatever.  And now they're stuck with ASCII English.

 > > This all started because of the awk invocation in arch/x86/lib.  Maybe
 > > the best idea would be to confine the locale monkeying to that one
 > > place?

 > Except that sed, etc. and even the shell itself have the same class of
 > problems.  Perl doesn't, since it has saner rules for how regular
 > expressions handle ranges.

But pretty much everyone on a modern distro has had a UTF8 locale for
quite a while.  And as far as I know there have been no problems caused
by collation order or anything else.  So this change to always build in
the C locale is just worrying about theoretical problems.

Anyway, not a big deal I guess.

 - R.


* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
  2009-12-26  1:17   ` Roland Dreier
  2009-12-26  1:30     ` H. Peter Anvin
@ 2009-12-26 20:04     ` H. Peter Anvin
  2010-01-04 14:44       ` Michal Marek
  1 sibling, 1 reply; 7+ messages in thread
From: H. Peter Anvin @ 2009-12-26 20:04 UTC
  To: Roland Dreier
  Cc: Sergei Trofimovich, Linus Torvalds, linux-kernel, Michal Marek,
	Sergei Trofimovich

On 12/25/2009 05:17 PM, Roland Dreier wrote:
> 
>  > The whole reason with only setting some LC_* to C was to be able to
>  > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
>  > systems.
> 
>  > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
> 
> Seems unfortunate to lose localized error messages.  (Although in my
> en_US.UTF-8 case, all I get is non-ASCII quote characters)
> 
> This all started because of the awk invocation in arch/x86/lib.  Maybe
> the best idea would be to confine the locale monkeying to that one
> place?
> 

It is also possible that setting only LC_COLLATE will solve the most
fundamental problem, which is the one of character ranges.  LC_COLLATE
probably will interfere less with LC_MESSAGES than the setting of LC_CTYPE.

It's still bloody broken that glibc malfunctions like that for an
LC_MESSAGES/LC_CTYPE intentional mismatch, but, sigh, that's glibc for you.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.



* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
  2009-12-26 20:04     ` H. Peter Anvin
@ 2010-01-04 14:44       ` Michal Marek
  0 siblings, 0 replies; 7+ messages in thread
From: Michal Marek @ 2010-01-04 14:44 UTC
  To: H. Peter Anvin
  Cc: Roland Dreier, Sergei Trofimovich, Linus Torvalds, linux-kernel,
	Sergei Trofimovich

On 26.12.2009 21:04, H. Peter Anvin wrote:
> On 12/25/2009 05:17 PM, Roland Dreier wrote:
>>
>>  > The whole reason with only setting some LC_* to C was to be able to
>>  > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
>>  > systems.
>>
>>  > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
>>
>> Seems unfortunate to lose localized error messages.  (Although in my
>> en_US.UTF-8 case, all I get is non-ASCII quote characters)
>>
>> This all started because of the awk invocation in arch/x86/lib.  Maybe
>> the best idea would be to confine the locale monkeying to that one
>> place?
>>
> 
> It is also possible that setting only LC_COLLATE will solve the most
> fundamental problem, which is the one of character ranges.  LC_COLLATE
> probably will interfere less with LC_MESSAGES than the setting of LC_CTYPE.

We need LC_COLLATE=C so that [a-z] really means lowercase ASCII letters
and nothing else (most importantly not uppercase letters) in awk, sed
and the shell. If we stay with LC_CTYPE=$userdefined, the meaning of
[[:classes:]] becomes nondeterministic and so does the mapping of
lowercase and uppercase characters:

$ echo iI | LC_CTYPE=tr_TR.UTF-8 awk '{ print $0 " " toupper($0) " " tolower($0) }'
iI İI iı

Character classes are probably not a big issue (modulo the fact that
mawk doesn't seem to support them), because the input is ASCII text
anyway. Regarding the tolower()/toupper() functions, I found one
potential troublemaker:

$ git grep -E 'to(lower|upper)' | grep -v '\.[ch]:'
arch/sh/tools/gen-mach-types:            tolower(mach[i]), mach[i]);

Maybe this awk script should be run with LC_ALL=C, people mostly care
about (localized) messages from gcc, not from awk.

Michal
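
For comparison with the tr_TR.UTF-8 run above, here is a minimal sketch
of the same awk program run with LC_ALL=C, as suggested for the
gen-mach-types script. awk_c_locale is a hypothetical wrapper name used
for illustration; it is not something Kbuild provides.

```shell
#!/bin/sh
# Hypothetical wrapper: run awk with every locale category forced to C,
# so that toupper()/tolower() only map ASCII letters.
awk_c_locale() {
    LC_ALL=C awk "$@"
}

# With the C locale, i/I map to each other and no dotted or dotless
# variants appear.
echo iI | awk_c_locale '{ print $0 " " toupper($0) " " tolower($0) }'
# prints: iI II ii
```

Only awk's own diagnostics lose localization this way; messages from
gcc are unaffected because the override is scoped to the awk process.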

