* [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
From: Sergei Trofimovich @ 2009-12-25 17:13 UTC
To: linux-kernel; +Cc: Michal Marek, Sergei Trofimovich
We restricted LC_CTYPE to ASCII recently but not messages from, say,
gcc. So instead of nice warnings I get '???? ??????? ???????'
(ru_RU.UTF-8 locale) as a gcc warning, which is not nice. So, set
LC_MESSAGES=C too.
Signed-off-by: Sergei Trofimovich <slyfox@inbox.ru>
---
Makefile | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/Makefile b/Makefile
index c628a5c..67bc799 100644
--- a/Makefile
+++ b/Makefile
@@ -21,7 +21,8 @@ unexport LC_ALL
 LC_CTYPE=C
 LC_COLLATE=C
 LC_NUMERIC=C
-export LC_CTYPE LC_COLLATE LC_NUMERIC
+LC_MESSAGES=C
+export LC_CTYPE LC_COLLATE LC_NUMERIC LC_MESSAGES
 
 # We are using a recursive build, so we need to do a little thinking
 # to get the ordering right.
--
1.6.4.4
* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
From: H. Peter Anvin @ 2009-12-25 23:36 UTC
To: Sergei Trofimovich, Linus Torvalds
Cc: linux-kernel, Michal Marek, Sergei Trofimovich
On 12/25/2009 09:13 AM, Sergei Trofimovich wrote:
> We restricted LC_CTYPE to ASCII recently but not messages from, say,
> gcc. So instead of nice warnings I get '???? ??????? ???????'
> (ru_RU.UTF-8 locale) as a gcc warning, which is not nice. So, set
> LC_MESSAGES=C too.
The whole reason with only setting some LC_* to C was to be able to
leave LC_MESSAGES intact, but it seems it breaks on too many real-life
systems.
As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
[-- Attachment #2: 0001-Makefile-Set-LC_ALL-C.patch --]
[-- Type: text/x-patch, Size: 1017 bytes --]
From 633dcb9167582064ec5d2d832450e93768cfe376 Mon Sep 17 00:00:00 2001
From: H. Peter Anvin <hpa@zytor.com>
Date: Fri, 25 Dec 2009 15:34:33 -0800
Subject: [PATCH] Makefile: Set LC_ALL=C
We were setting LC_CTYPE, LC_COLLATE and LC_NUMERIC to the C locale,
with the intent that LC_MESSAGES would still be localized.
Unfortunately, that doesn't seem to actually work in real life, so
just be done with it and set LC_ALL=C.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
Makefile | 7 ++-----
1 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/Makefile b/Makefile
index c628a5c..a801d1d 100644
--- a/Makefile
+++ b/Makefile
@@ -17,11 +17,8 @@ NAME = Man-Eating Seals of Antiquity
 MAKEFLAGS += -rR --no-print-directory
 
 # Avoid funny character set dependencies
-unexport LC_ALL
-LC_CTYPE=C
-LC_COLLATE=C
-LC_NUMERIC=C
-export LC_CTYPE LC_COLLATE LC_NUMERIC
+LC_ALL=C
+export LC_ALL
 
 # We are using a recursive build, so we need to do a little thinking
 # to get the ordering right.
--
1.6.2.5
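For readers following the two patches: POSIX resolves each locale category from LC_ALL first, then from the category's own LC_* variable, then from LANG. That is why the original Makefile had to `unexport LC_ALL` for its per-category settings to take effect, and why a single `LC_ALL=C` covers LC_MESSAGES as well. The sketch below only mimics that lookup order in plain shell; the function name `effective_locale` is made up for illustration and is not part of Kbuild or of any libc.

```shell
# Print the value that would govern one locale category, using the POSIX
# precedence: LC_ALL, then LC_<category>, then LANG, then the default "C".
# Empty values count as unset, as POSIX specifies.
effective_locale() {
    category=$1                       # e.g. LC_MESSAGES (hypothetical helper argument)
    eval "specific=\${$category:-}"   # value of the category's own variable
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# Old Makefile behaviour: LC_ALL removed, three categories pinned to C.
# LC_MESSAGES still follows the user's LANG, giving the CTYPE/MESSAGES mismatch.
( unset LC_ALL LC_MESSAGES; LANG=ru_RU.UTF-8; LC_CTYPE=C
  effective_locale LC_CTYPE       # prints C
  effective_locale LC_MESSAGES )  # prints ru_RU.UTF-8

# Behaviour with LC_ALL=C: every category resolves to C.
( LANG=ru_RU.UTF-8; LC_ALL=C
  effective_locale LC_MESSAGES )  # prints C
```

The subshells keep the demonstration from changing the caller's environment.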
* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
From: Roland Dreier @ 2009-12-26 1:17 UTC
To: H. Peter Anvin
Cc: Sergei Trofimovich, Linus Torvalds, linux-kernel, Michal Marek,
Sergei Trofimovich
> The whole reason with only setting some LC_* to C was to be able to
> leave LC_MESSAGES intact, but it seems it breaks on too many real-life
> systems.
> As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
Seems unfortunate to lose localized error messages. (Although in my
en_US.UTF-8 case, all I get is non-ASCII quote characters)
This all started because of the awk invocation in arch/x86/lib. Maybe
the best idea would be to confine the locale monkeying to that one
place?
- R.
* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
From: H. Peter Anvin @ 2009-12-26 1:30 UTC
To: Roland Dreier
Cc: Sergei Trofimovich, Linus Torvalds, linux-kernel, Michal Marek,
Sergei Trofimovich
On 12/25/2009 05:17 PM, Roland Dreier wrote:
>
> > The whole reason with only setting some LC_* to C was to be able to
> > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
> > systems.
>
> > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
>
> Seems unfortunate to lose localized error messages. (Although in my
> en_US.UTF-8 case, all I get is non-ASCII quote characters)
>
The whole problem is that for some people we lose *all* messages. This
all seems very strange to me, but I guess it tweaks some internal
detail inside the glibc message library, sigh.
> This all started because of the awk invocation in arch/x86/lib. Maybe
> the best idea would be to confine the locale monkeying to that one
> place?
Except that sed, etc. and even the shell itself have the same class of
problems. Perl doesn't, since it has saner rules for how regular
expressions handle ranges.
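As an illustration of the range problem in sed and the shell, here is a minimal sketch of what pinning the locale to C guarantees for `[a-z]`. The helper names `strip_lower_ascii` and `classify_char` are invented for this example and do not exist in the kernel tree. The point is only that under LC_ALL=C the range covers the 26 lowercase ASCII letters and nothing else, whereas other locales may collate uppercase letters into it.

```shell
# Delete lowercase ASCII letters from stdin. LC_ALL=C pins [a-z] to a..z,
# so uppercase letters are never caught by the range.
strip_lower_ascii() {
    LC_ALL=C sed 's/[a-z]//g'
}

# Classify one character with shell range patterns. The patterns are
# evaluated by a child shell that is started in the C locale.
classify_char() {
    LC_ALL=C sh -c 'case "$1" in
        [a-z]) echo lower ;;
        [A-Z]) echo upper ;;
        *)     echo other ;;
    esac' sh "$1"
}

printf 'aBcD\n' | strip_lower_ascii   # prints BD
classify_char B                       # prints upper
```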
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
From: Roland Dreier @ 2009-12-26 6:58 UTC
To: H. Peter Anvin
Cc: Sergei Trofimovich, Linus Torvalds, linux-kernel, Michal Marek,
Sergei Trofimovich
> > Seems unfortunate to lose localized error messages. (Although in my
> > en_US.UTF-8 case, all I get is non-ASCII quote characters)
> The whole problem is that for some people we lose *all* messages. This
> all seems very strange to me, but I guess it tweaks some internal
> detail inside the glibc message library, sigh.
I just meant that people used to be able to get localized error messages
by setting LANG or whatever. And now they're stuck with ASCII English.
> > This all started because of the awk invocation in arch/x86/lib. Maybe
> > the best idea would be to confine the locale monkeying to that one
> > place?
> Except that sed, etc. and even the shell itself have the same class of
> problems. Perl doesn't, since it has saner rules for how regular
> expressions handle ranges.
But pretty much everyone on a modern distro has had a UTF8 locale for
quite a while. And as far as I know there have been no problems caused
by collation order or anything else. So this change to always build in
the C locale is just worrying about theoretical problems.
Anyway, not a big deal I guess.
- R.
* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
From: H. Peter Anvin @ 2009-12-26 20:04 UTC
To: Roland Dreier
Cc: Sergei Trofimovich, Linus Torvalds, linux-kernel, Michal Marek,
Sergei Trofimovich
On 12/25/2009 05:17 PM, Roland Dreier wrote:
>
> > The whole reason with only setting some LC_* to C was to be able to
> > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
> > systems.
>
> > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
>
> Seems unfortunate to lose localized error messages. (Although in my
> en_US.UTF-8 case, all I get is non-ASCII quote characters)
>
> This all started because of the awk invocation in arch/x86/lib. Maybe
> the best idea would be to confine the locale monkeying to that one
> place?
>
It is also possible that setting only LC_COLLATE will solve the most
fundamental problem, which is the one of character ranges. LC_COLLATE
probably will interfere less with LC_MESSAGES than the setting of LC_CTYPE.
It's still bloody broken that glibc malfunctions like that for an
LC_MESSAGES/LC_CTYPE intentional mismatch, but, sigh, that's glibc for you.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH] Kbuild: set LC_MESSAGES=C (as LC_CTYPE=C is)
From: Michal Marek @ 2010-01-04 14:44 UTC
To: H. Peter Anvin
Cc: Roland Dreier, Sergei Trofimovich, Linus Torvalds, linux-kernel,
Sergei Trofimovich
On 26.12.2009 21:04, H. Peter Anvin wrote:
> On 12/25/2009 05:17 PM, Roland Dreier wrote:
>>
>> > The whole reason with only setting some LC_* to C was to be able to
>> > leave LC_MESSAGES intact, but it seems it breaks on too many real-life
>> > systems.
>>
>> > As such, I suggest we should set LC_ALL=C and get rid of the rest of it:
>>
>> Seems unfortunate to lose localized error messages. (Although in my
>> en_US.UTF-8 case, all I get is non-ASCII quote characters)
>>
>> This all started because of the awk invocation in arch/x86/lib. Maybe
>> the best idea would be to confine the locale monkeying to that one
>> place?
>>
>
> It is also possible that setting only LC_COLLATE will solve the most
> fundamental problem, which is the one of character ranges. LC_COLLATE
> probably will interfere less with LC_MESSAGES than the setting of LC_CTYPE.
We need LC_COLLATE=C so that [a-z] really means lowercase ASCII letters
and nothing else (most importantly not uppercase letters) in awk, sed
and the shell. If we stay with LC_CTYPE=$userdefined, the meaning of
[[:classes:]] becomes nondeterministic and so does the mapping of
lowercase and uppercase characters:
$ echo iI | LC_CTYPE=tr_TR.UTF-8 awk '{ print $0 " " toupper($0) " " tolower($0) }'
iI İI iı
Character classes are probably not a big issue (modulo the fact that
mawk doesn't seem to support them), because the input is ASCII text
anyway. Regarding the tolower()/toupper() functions, I found one
potential troublemaker:
$ git grep -E 'to(lower|upper)' | grep -v '\.[ch]:'
arch/sh/tools/gen-mach-types: tolower(mach[i]), mach[i]);
Maybe this awk script should be run with LC_ALL=C, people mostly care
about (localized) messages from gcc, not from awk.
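A minimal sketch of that idea, assuming the goal is to force the C locale for one invocation only and leave the rest of the build environment alone. The wrapper name `run_in_c_locale` and the sample input are hypothetical; a real Makefile rule would more likely just prefix the awk command with `LC_ALL=C`.

```shell
# Run a single command with LC_ALL=C. env(1) sets the variable only for the
# child process, so the caller's locale settings are left untouched.
run_in_c_locale() {
    env LC_ALL=C "$@"
}

# Stand-in for a build-time awk script that lowercases identifiers.
# In the C locale, tolower() maps I to i regardless of the user's LC_CTYPE.
printf 'MACH_TYPE_iI\n' | run_in_c_locale awk '{ print tolower($0) }'
# prints mach_type_ii
```

Because only the wrapped command sees LC_ALL=C, tools invoked elsewhere in the build (gcc, for instance) would still pick up whatever message locale the user configured.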
Michal