* [PATCH] Makefile: do not override LC_CTYPE
@ 2010-01-08 12:16 ` Michal Marek
0 siblings, 0 replies; 34+ messages in thread
From: Michal Marek @ 2010-01-08 12:16 UTC (permalink / raw)
To: H. Peter Anvin, Simon Horman
Cc: Masami Hiramatsu, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
Setting LC_CTYPE=C breaks localized messages in some setups. With only
LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
locale-dependent is character classes and tolower()/toupper(). The former
is not a big issue, because we can assume that e.g. [:alpha:] will always
include a-zA-Z and we only ever process ASCII input. The latter seems to
affect only arch/sh/tools/gen-mach-types, which we can handle separately.
So after this patch the meaning of ranges like [a-z], the behavior of
sort and join, etc. should be the same everywhere, and at the same time
gcc should be able to print localized warning and error messages.
LC_NUMERIC=C might not be necessary, but setting it doesn't hurt.
Reported-by: Simon Horman <horms@verge.net.au>
Reported-by: Sergei Trofimovich <slyfox@inbox.ru>
Signed-off-by: Michal Marek <mmarek@suse.cz>
---
Note: if this still breaks for someone, we will simply set LC_ALL=C.
Makefile | 3 +--
arch/sh/tools/Makefile | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/Makefile b/Makefile
index 09a320f..a7b4351 100644
--- a/Makefile
+++ b/Makefile
@@ -18,10 +18,9 @@ MAKEFLAGS += -rR --no-print-directory
# Avoid funny character set dependencies
unexport LC_ALL
-LC_CTYPE=C
LC_COLLATE=C
LC_NUMERIC=C
-export LC_CTYPE LC_COLLATE LC_NUMERIC
+export LC_COLLATE LC_NUMERIC
# We are using a recursive build, so we need to do a little thinking
# to get the ordering right.
diff --git a/arch/sh/tools/Makefile b/arch/sh/tools/Makefile
index 558a56b..2082af1 100644
--- a/arch/sh/tools/Makefile
+++ b/arch/sh/tools/Makefile
@@ -13,4 +13,4 @@
include/generated/machtypes.h: $(src)/gen-mach-types $(src)/mach-types
@echo ' Generating $@'
$(Q)mkdir -p $(dir $@)
- $(Q)$(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
+ $(Q)LC_ALL=C $(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
--
1.6.5.3
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-08 12:16 ` Michal Marek
@ 2010-01-08 18:50 ` H. Peter Anvin
-1 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2010-01-08 18:50 UTC (permalink / raw)
To: Michal Marek
Cc: Simon Horman, Masami Hiramatsu, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
On 01/08/2010 04:16 AM, Michal Marek wrote:
> Setting LC_CTYPE=C breaks localized messages in some setups. With only
> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
> locale-dependent is character classes and tolower()/toupper(). The former
> is not a big issue, because we can assume that e.g. [:alpha:] will always
> include a-zA-Z and we only ever process ASCII input. The latter seems to
> affect only arch/sh/tools/gen-mach-types, which we can handle separately.
>
> So after this patch the meaning of ranges like [a-z], the behavior of
> sort and join, etc. should be the same everywhere, and at the same time
> gcc should be able to print localized warning and error messages.
> LC_NUMERIC=C might not be necessary, but setting it doesn't hurt.
>
> Reported-by: Simon Horman <horms@verge.net.au>
> Reported-by: Sergei Trofimovich <slyfox@inbox.ru>
> Signed-off-by: Michal Marek <mmarek@suse.cz>
For what it's worth:
Acked-by: H. Peter Anvin <hpa@zytor.com>
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-08 12:16 ` Michal Marek
@ 2010-01-09 0:00 ` Simon Horman
-1 siblings, 0 replies; 34+ messages in thread
From: Simon Horman @ 2010-01-09 0:00 UTC (permalink / raw)
To: Michal Marek
Cc: H. Peter Anvin, Masami Hiramatsu, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
Hi Michal,
sorry for messing up your email address in one of the previous threads.
On Fri, Jan 08, 2010 at 01:16:28PM +0100, Michal Marek wrote:
> Setting LC_CTYPE=C breaks localized messages in some setups. With only
> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
> locale-dependent is character classes and tolower()/toupper(). The former
> is not a big issue, because we can assume that e.g. [:alpha:] will always
> include a-zA-Z and we only ever process ASCII input. The latter seems to
> affect only arch/sh/tools/gen-mach-types, which we can handle separately.
>
> So after this patch the meaning of ranges like [a-z], the behavior of
> sort and join, etc. should be the same everywhere, and at the same time
> gcc should be able to print localized warning and error messages.
> LC_NUMERIC=C might not be necessary, but setting it doesn't hurt.
>
> Reported-by: Simon Horman <horms@verge.net.au>
> Reported-by: Sergei Trofimovich <slyfox@inbox.ru>
> Signed-off-by: Michal Marek <mmarek@suse.cz>
Tested-by: Simon Horman <horms@verge.net.au>
> ---
>
> Note: if this still breaks for someone, we will simply set LC_ALL=C.
Personally I think it would be much better to set the locale explicitly
as needed, where needed, such as the LC_ALL=C sledgehammer that you
have inserted into arch/sh/tools. Or at a slightly higher level,
offer an awk-wrapper, as it seems to be the main (only?) cause of concern.
Surely the goal isn't to alter the user-experience - to the extent that a
build has a user-experience - but to force some tools to behave as desired.
Just an opinion. The patch below seems to work fine for me.
>
> Makefile | 3 +--
> arch/sh/tools/Makefile | 2 +-
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 09a320f..a7b4351 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -18,10 +18,9 @@ MAKEFLAGS += -rR --no-print-directory
>
> # Avoid funny character set dependencies
> unexport LC_ALL
> -LC_CTYPE=C
> LC_COLLATE=C
> LC_NUMERIC=C
> -export LC_CTYPE LC_COLLATE LC_NUMERIC
> +export LC_COLLATE LC_NUMERIC
>
> # We are using a recursive build, so we need to do a little thinking
> # to get the ordering right.
> diff --git a/arch/sh/tools/Makefile b/arch/sh/tools/Makefile
> index 558a56b..2082af1 100644
> --- a/arch/sh/tools/Makefile
> +++ b/arch/sh/tools/Makefile
> @@ -13,4 +13,4 @@
> include/generated/machtypes.h: $(src)/gen-mach-types $(src)/mach-types
> @echo ' Generating $@'
> $(Q)mkdir -p $(dir $@)
> - $(Q)$(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
> + $(Q)LC_ALL=C $(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
> --
> 1.6.5.3
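Simon's awk-wrapper idea could look roughly like the sketch below. The name c_awk is hypothetical and nothing like it is part of the patch; it only shows the locale being pinned at the call site instead of for the whole build.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): run awk in the C locale,
# whatever LANG/LC_* the caller has set.
c_awk() {
    LC_ALL=C awk "$@"
}

# In the C locale an uppercase ASCII letter does not match [a-z].
c_awk 'BEGIN { if (match("C", "[a-z]")) print "NG"; else print "OK" }'
```

As hpa notes in his reply, sed and shell scripts would need the same treatment, so a wrapper for awk alone would not cover every case.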
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-09 0:00 ` Simon Horman
@ 2010-01-09 0:07 ` H. Peter Anvin
-1 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2010-01-09 0:07 UTC (permalink / raw)
To: Simon Horman
Cc: Michal Marek, Masami Hiramatsu, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
On 01/08/2010 04:00 PM, Simon Horman wrote:
>
> Personally I think it would be much better to set the locale explicitly
> as needed, where needed, such as the LC_ALL=C sledgehammer that you
> have inserted into arch/sh/tools. Or at a slightly higher level,
> offer an awk-wrapper, as it seems to be the main (only?) cause of concern.
>
awk, sed, shell scripts, etc. all have the same problem.
-hpa
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-08 12:16 ` Michal Marek
@ 2010-01-09 0:09 ` Masami Hiramatsu
-1 siblings, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2010-01-09 0:09 UTC (permalink / raw)
To: Michal Marek
Cc: H. Peter Anvin, Simon Horman, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
Hi Michal,
Michal Marek wrote:
> Setting LC_CTYPE=C breaks localized messages in some setups. With only
> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
> locale-dependent is character classes and tolower()/toupper(). The former
> is not a big issue, because we can assume that e.g. [:alpha:] will always
> include a-zA-Z and we only ever process ASCII input. The latter seems to
> affect only arch/sh/tools/gen-mach-types, which we can handle separately.
Hmm, this also affects arch/x86/tools/gen-insn-attr-x86.awk.
Could you also wrap it?
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhiramat@redhat.com
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-09 0:09 ` Masami Hiramatsu
@ 2010-01-09 0:16 ` H. Peter Anvin
-1 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2010-01-09 0:16 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: Michal Marek, Simon Horman, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
> Hi Michal,
>
> Michal Marek wrote:
>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
>> locale-dependent is character classes and tolower()/toupper(). The former
>> is not a big issue, because we can assume that e.g. [:alpha:] will always
>> include a-zA-Z and we only ever process ASCII input. The latter seems to
>> affect only arch/sh/tools/gen-mach-types, which we can handle separately.
>
> Hmm, this also affects arch/x86/tools/gen-insn-attr-x86.awk.
> Could you also wrap it?
>
This is tolower/toupper()? Do there exist locales where tolower/toupper
on ASCII input do weird things, or are we merely hypothesizing?
-hpa
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-09 0:16 ` H. Peter Anvin
@ 2010-01-09 0:30 ` Masami Hiramatsu
-1 siblings, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2010-01-09 0:30 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Michal Marek, Simon Horman, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
H. Peter Anvin wrote:
> On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
>> Hi Michal,
>>
>> Michal Marek wrote:
>>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
>>> locale-dependent is character classes and tolower()/toupper(). The former
>>> is not a big issue, because we can assume that e.g. [:alpha:] will always
>>> include a-zA-Z and we only ever process ASCII input. The latter seems to
>>> affect only arch/sh/tools/gen-mach-types, which we can handle separately.
>>
>> Hmm, this also affects arch/x86/tools/gen-insn-attr-x86.awk.
>> Could you also wrap it?
>>
>
> This is tolower/toupper()? Do there exist locales where tolower/toupper
> on ASCII input do weird things, or are we merely hypothesizing?
Doesn't it affect [A-Z] or [a-z]? If not, the patch looks good to me too.
Thank you,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhiramat@redhat.com
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-09 0:30 ` Masami Hiramatsu
@ 2010-01-09 0:43 ` H. Peter Anvin
-1 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2010-01-09 0:43 UTC (permalink / raw)
To: Masami Hiramatsu
Cc: H. Peter Anvin, Michal Marek, Simon Horman, Roland Dreier,
Sam Ravnborg, Sergei Trofimovich, linux-kbuild, linux-kernel,
linux-sh
> H. Peter Anvin wrote:
>> On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
>>> Hi Michal,
>>>
>>> Michal Marek wrote:
>>>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>>>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
>>>> locale-dependent is character classes and tolower()/toupper(). The former
>>>> is not a big issue, because we can assume that e.g. [:alpha:] will always
>>>> include a-zA-Z and we only ever process ASCII input. The latter seems to
>>>> affect only arch/sh/tools/gen-mach-types, which we can handle separately.
>>>
>>> Hmm, this also affects arch/x86/tools/gen-insn-attr-x86.awk.
>>> Could you also wrap it?
>>>
>>
>> This is tolower/toupper()? Do there exist locales where tolower/toupper
>> on ASCII input do weird things, or are we merely hypothesizing?
>
> Doesn't it affect [A-Z] or [a-z]? If not, the patch looks good to me too.
>
[A-Z][a-z] is what LC_COLLATE is about.
-hpa
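A minimal sketch of what pinning LC_COLLATE alone does, assuming a POSIX sort is available. LC_ALL is set to the empty string so that it cannot override LC_COLLATE, which is the same trick the test script later in the thread uses.

```shell
#!/bin/sh
# With LC_COLLATE=C (and LC_ALL empty), sort compares bytes, so all
# uppercase ASCII letters come before all lowercase ones.
printf 'b\nA\na\nB\n' | LC_ALL= LC_COLLATE=C sort
```

The output order here is A, B, a, b. A locale that interleaves cases would order the same input differently, and that difference is what changes the meaning of ranges like [a-z].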
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-09 0:16 ` H. Peter Anvin
@ 2010-01-09 0:53 ` Masami Hiramatsu
-1 siblings, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2010-01-09 0:53 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Michal Marek, Simon Horman, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
H. Peter Anvin wrote:
> On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
>> Hi Michal,
>>
>> Michal Marek wrote:
>>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
>>> locale-dependent is character classes and tolower()/toupper(). The former
>>> is not a big issue, because we can assume that e.g. [:alpha:] will always
>>> include a-zA-Z and we only ever process ASCII input. The latter seems to
>>> affect only arch/sh/tools/gen-mach-types, which we can handle separately.
>>
>> Hmm, this also affects arch/x86/tools/gen-insn-attr-x86.awk.
>> Could you also wrap it?
>>
>
> This is tolower/toupper()? Do there exist locales where tolower/toupper
> on ASCII input do weird things, or are we merely hypothesizing?
Ah, sorry, I was just hypothesizing.
---
#!/bin/sh
# en_US locale sorts alphabets as AaBb...
LANG=en_US
LC_ALL=
LC_COLLATE=C
LC_NUMERIC=C
export LC_COLLATE LC_NUMERIC
awk 'BEGIN{if (match("C","[a-z]")) {print "NG"} else {print "OK"} exit;}'
---
this returns "OK". So, the patch is OK for me too.
Thanks,
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhiramat@redhat.com
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-09 0:16 ` H. Peter Anvin
@ 2010-01-11 9:52 ` Michal Marek
-1 siblings, 0 replies; 34+ messages in thread
From: Michal Marek @ 2010-01-11 9:52 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Masami Hiramatsu, Simon Horman, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
On 9.1.2010 01:16, H. Peter Anvin wrote:
> On 01/08/2010 04:09 PM, Masami Hiramatsu wrote:
>> Hi Michal,
>>
>> Michal Marek wrote:
>>> Setting LC_CTYPE=C breaks localized messages in some setups. With only
>>> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need; what stays
>>> locale-dependent is character classes and tolower()/toupper(). The former
>>> is not a big issue, because we can assume that e.g. [:alpha:] will always
>>> include a-zA-Z and we only ever process ASCII input. The latter seems to
>>> affect only arch/sh/tools/gen-mach-types, which we can handle separately.
>>
>> Hmm, this also affects arch/x86/tools/gen-insn-attr-x86.awk.
>> Could you also wrap it?
>>
>
> This is tolower/toupper()? Do there exist locales where tolower/toupper
> on ASCII input do weird things, or are we merely hypothesizing?
In Turkish, uppercase i is İ (I with dot) and lowercase I is ı (i
without dot), see http://en.wikipedia.org/wiki/Dotted_and_dotless_I.
Michal
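The per-command override that the patch adds to arch/sh/tools/Makefile can be tried in isolation. This is a minimal sketch, assuming only that awk is installed; it pins the locale for one invocation so that toupper() works on plain ASCII.

```shell
#!/bin/sh
# Force the C locale for this one command, so that toupper() maps
# ASCII 'i' to ASCII 'I' regardless of the caller's locale settings.
printf 'i\n' | LC_ALL=C awk '{ print toupper($0) }'
```

Without the LC_ALL=C prefix the result depends on the caller's LC_CTYPE, which is the Turkish-locale hazard described above.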
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-09 0:16 ` H. Peter Anvin
` (3 preceding siblings ...)
(?)
@ 2010-01-11 10:52 ` Alan Cox
2010-01-12 0:50 ` H. Peter Anvin
-1 siblings, 1 reply; 34+ messages in thread
From: Alan Cox @ 2010-01-11 10:52 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Masami Hiramatsu, Michal Marek, Simon Horman, Roland Dreier,
Sam Ravnborg, Sergei Trofimovich, linux-kbuild, linux-kernel,
linux-sh
> This is tolower/toupper()? Do there exist locales where tolower/toupper
> on ASCII input do weird things, or are we merely hypothesizing?
Turkish is the famous one for this and usually causes
internationalisation chaos. So yes, they exist, and there are worse, more
esoteric cases. There are good reasons sed and friends support classes as
well as old C locale style ranges.
Alan
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-11 10:52 ` Alan Cox
@ 2010-01-12 0:50 ` H. Peter Anvin
0 siblings, 0 replies; 34+ messages in thread
From: H. Peter Anvin @ 2010-01-12 0:50 UTC (permalink / raw)
To: Alan Cox
Cc: Masami Hiramatsu, Michal Marek, Simon Horman, Roland Dreier,
Sam Ravnborg, Sergei Trofimovich, linux-kbuild, linux-kernel,
linux-sh
On 01/11/2010 02:52 AM, Alan Cox wrote:
>> This is tolower/toupper()? Do there exist locales where tolower/toupper
>> on ASCII input do weird things, or are we merely hypothesizing?
>
> Turkish is the famous one for this and usually causes
> internationalisation chaos. So yes, they exist, and there are worse, more
> esoteric cases. There are good reasons sed and friends support classes as
> well as old C locale style ranges.
>
Ah yes, forgot about Turkish. Apparently Lithuanian and Azeri also have
special rules for the letters I and J. Sigh.
-hpa
* Re: [PATCH] Makefile: do not override LC_CTYPE
2010-01-08 12:16 ` Michal Marek
@ 2010-01-09 1:07 ` Masami Hiramatsu
-1 siblings, 0 replies; 34+ messages in thread
From: Masami Hiramatsu @ 2010-01-09 1:07 UTC (permalink / raw)
To: Michal Marek
Cc: H. Peter Anvin, Simon Horman, Roland Dreier, Sam Ravnborg,
Sergei Trofimovich, linux-kbuild, linux-kernel, linux-sh
Michal Marek wrote:
> Setting LC_CTYPE=C breaks localized messages in some setups. With only
> LC_COLLATE=C and LC_NUMERIC=C, we get almost all we need, except for not
> so defined character classes and tolower()/toupper(). The former is not
> a big issue, because we can assume that e.g. [:alpha:] will always
> include a-zA-Z and we only ever process ASCII input. The latter seems to
> only affect arch/sh/tools/gen-mach-types, which we can handle separately.
>
> So after this patch the meaning of ranges like [a-z], the behavior of
> sort and join, etc. should be the same everywhere and at the same time
> gcc should be able to print localized warning and error messages.
> LC_NUMERIC=C might not be necessary, but setting it doesn't hurt.
>
> Reported-by: Simon Horman <horms@verge.net.au>
> Reported-by: Sergei Trofimovich <slyfox@inbox.ru>
> Signed-off-by: Michal Marek <mmarek@suse.cz>
I checked that this change doesn't affect arch/x86/tools/gen-insn-attr-x86.awk.
Tested-by: Masami Hiramatsu <mhiramat@redhat.com>
Thank you!
> ---
>
> Note: if this still breaks for someone, we will simply set LC_ALL=C.
>
> Makefile | 3 +--
> arch/sh/tools/Makefile | 2 +-
> 2 files changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index 09a320f..a7b4351 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -18,10 +18,9 @@ MAKEFLAGS += -rR --no-print-directory
>
> # Avoid funny character set dependencies
> unexport LC_ALL
> -LC_CTYPE=C
> LC_COLLATE=C
> LC_NUMERIC=C
> -export LC_CTYPE LC_COLLATE LC_NUMERIC
> +export LC_COLLATE LC_NUMERIC
>
> # We are using a recursive build, so we need to do a little thinking
> # to get the ordering right.
> diff --git a/arch/sh/tools/Makefile b/arch/sh/tools/Makefile
> index 558a56b..2082af1 100644
> --- a/arch/sh/tools/Makefile
> +++ b/arch/sh/tools/Makefile
> @@ -13,4 +13,4 @@
> include/generated/machtypes.h: $(src)/gen-mach-types $(src)/mach-types
> @echo ' Generating $@'
> $(Q)mkdir -p $(dir $@)
> - $(Q)$(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
> + $(Q)LC_ALL=C $(AWK) -f $^ > $@ || { rm -f $@; /bin/false; }
--
Masami Hiramatsu
Software Engineer
Hitachi Computer Products (America), Inc.
Software Solutions Division
e-mail: mhiramat@redhat.com
^ permalink raw reply [flat|nested] 34+ messages in thread
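To make the effect of keeping LC_COLLATE=C while leaving LC_ALL unset more concrete, here is a small sketch using sort. The helper name sort_bytes is hypothetical and only mirrors the environment the top-level Makefile sets up after the patch; it is not code from the patch itself.

```shell
#!/bin/sh
# Hypothetical helper: sort stdin the way a build step would see it after
# the patch, i.e. LC_ALL unset and LC_COLLATE=C. The subshell keeps the
# unset from leaking into the caller's environment.
sort_bytes() {
    ( unset LC_ALL; LC_COLLATE=C sort )
}

# With C collation, ordering is by byte value: upper case before lower case.
printf 'b\nA\na\nB\n' | sort_bytes
# prints:
# A
# B
# a
# b
```

LC_ALL has to be unset for this to work, because a non-empty LC_ALL overrides every individual LC_* category, which is also why the Makefile unexports it.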