* [PATCH] UTF-8ifying the kernel source
@ 2004-03-04 10:05 David Eger
2004-03-04 10:19 ` Meelis Roos
` (3 more replies)
0 siblings, 4 replies; 22+ messages in thread
From: David Eger @ 2004-03-04 10:05 UTC (permalink / raw)
To: linux-kernel
http://www.yak.net/random/linux-2.6.3-utf8-cleanup-auto.diff.bz2
Here you find the first of several patches to convert the kernel
source from ISO Latin-1 to UTF-8. I'm working on the files that didn't
auto-convert easily; comments welcome ;-)
First, some statistics!
In Linux 2.6.3, there are:
15860 clean 7-bit ASCII files
274 text files are not 7-bit clean
38 of these 274 files are not auto-convertible -- either they are not ISO
Latin-1 or the high octets appear within the actual code (not comments).
This first patch applies to help files, documentation, and comments which
are trivially correct ISO Latin-1 => UTF-8 conversions. The work I have
left to do is summarized below.
--dte
Un-needed/wrong non-ASCII characters (these fixes will form patch 2)
====================================================================
drivers/video/amifb.c - +- sign?
Documentation/i2c/i2c-protocol - NBSP, but why?
arch/i386/kernel/cpu/cyrix.c - NBSP, but why?
arch/v850/kernel/as85ep1.ld - WTF? comments in some random charset...
drivers/char/ftape/lowlevel/fdc-isr.c - WTF? shit in the comments
include/asm-m68k/atarihw.h - 0x94 - "cancel character"?
include/asm-m68k/atariints.h - 0x94 - "cancel character"?
include/linux/802_11.h - why the non-standard dash?
scripts/docproc.c - why the bizarre spelling for specific?
fs/ext2/xattr.c - bad ASCII art
fs/ext3/xattr.c - bad ASCII art
fs/afs/vlclient.h - a degrees sign, but why?
Box-drawing ASCII art (these fixes will form patch 3)
=====================================================
Documentation/networking/tms380tr.txt - DOS-style ASCII art
arch/arm/nwfpe/fpopcode.h - line-drawing characters
C strings - (what to do?)
=========================
arch/ppc/platforms/proc_rtas.c - a C string containing "degrees"
arch/ppc64/kernel/rtas-proc.c - a C string containing "degrees"
drivers/macintosh/therm_adt7467.c - degrees, MODULE_PARAM_DESC(),
and a C string
drivers/mtd/chips/cfi_probe.c - C strings
drivers/net/wireless/netwave_cs.c - C strings
drivers/scsi/dc395x.c - C strings
Other - (i'd convert it, but...)
================================
drivers/pci/pci.ids - I don't know what program processes this...
drivers/ieee1394/oui.db - I don't know what program processes this...
Machine / charset specific shite - (does anything need to be done?)
===================================================================
arch/m68k/hp300/hp300map.map - maps to "char"s.. grr
drivers/char/defkeymap.map - a map file... maps to "char"s.. grr
drivers/char/qtronixmap.c_shipped - maps to "char"s.. grr
drivers/char/qtronixmap.map - maps to "char"s.. grr
drivers/tc/lk201-map.c_shipped - maps to "char"s.. grr
drivers/tc/lk201-map.map - maps to "char"s.. grr
drivers/acorn/char/defkeymap-l7200.c - maps to "char"s.. grr
arch/s390/kernel/ebcdic.c - comments on a keymap table
drivers/video/console/font_8x16.c - comments on a keymap table
drivers/video/console/font_8x8.c - comments on a keymap table
drivers/video/console/font_pearl_8x8.c - comments on a keymap table
drivers/s390/ebcdic.c - comments on a keymap table
Noise from userland (this I won't be touching)
==============================================
Documentation/networking/ethertap.txt - random crap cat'd from /dev/tap0
Documentation/s390/Debugging390.txt - weird gdb output
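
A rough sketch of how counts like the ones above might be reproduced. This is not the script used for the patch; `classify` and `latin1_to_utf8` are hypothetical helper names. Note the limit of any such check: every byte sequence is valid ISO Latin-1, so a program can tell "not ASCII" and "not UTF-8", but not whether a file really was written as Latin-1.

```python
def classify(data: bytes) -> str:
    """Sort a file's bytes into one of three rough buckets."""
    if all(b < 0x80 for b in data):
        return "ascii"      # clean 7-bit ASCII
    try:
        data.decode("utf-8")
        return "utf-8"      # already valid UTF-8
    except UnicodeDecodeError:
        return "8-bit"      # some legacy charset, Latin-1 or otherwise


def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret bytes as ISO Latin-1 and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


sample = b"Bj\xf6rn"        # "Bjorn" with o-umlaut, as Latin-1 bytes
print(classify(sample), latin1_to_utf8(sample))
```

A file in the "8-bit" bucket still needs a human to confirm the charset before `latin1_to_utf8` is applied, which is presumably why 38 files were set aside.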
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-04 10:05 [PATCH] UTF-8ifying the kernel source David Eger
@ 2004-03-04 10:19 ` Meelis Roos
2004-03-04 10:32 ` Måns Rullgård
2004-03-04 21:51 ` Alex Belits
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Meelis Roos @ 2004-03-04 10:19 UTC (permalink / raw)
To: linux-kernel, eger
DE> Here you find the first of several patches to convert the kernel
DE> source from ISO Latin-1 to UTF-8. I'm working on the files that didn't
DE> auto-convert easily; comments welcome ;-)
Why? It's just easier to use plain 8-bit text files today (with editors,
code tools, etc.) and accept their limitations than to overcome those
limitations by forcing people onto UTF-8 editors & other tools.
I am not a kernel developer but this seems a bad idea to me.
--
Meelis Roos
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-04 10:19 ` Meelis Roos
@ 2004-03-04 10:32 ` Måns Rullgård
0 siblings, 0 replies; 22+ messages in thread
From: Måns Rullgård @ 2004-03-04 10:32 UTC (permalink / raw)
To: linux-kernel
Meelis Roos <mroos@linux.ee> writes:
> DE> Here you find the first of several patches to convert the kernel
> DE> source from ISO Latin-1 to UTF-8. I'm working on the files that didn't
> DE> auto-convert easily; comments welcome ;-)
>
> Why? It's just easier to use plain 8-bit text files today (with editors,
> code tools, etc.) and accept their limitations than to overcome those
> limitations by forcing people onto UTF-8 editors & other tools.
How do you propose that editors should know which encoding a file
uses? The trend seems to be moving towards UTF-8 for everything, so
the kernel might as well do it too.
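
To illustrate the ambiguity: the bytes alone do not say which charset produced them. A small sketch, with the sample string chosen only for illustration:

```python
text = "M\u00e5ns"                   # "Mans" with a-ring

as_latin1 = text.encode("latin-1")   # b'M\xe5ns'
as_utf8 = text.encode("utf-8")       # b'M\xc3\xa5ns'

# The same characters give different bytes under the two encodings...
print(as_latin1, as_utf8)

# ...and UTF-8 bytes read by a Latin-1 tool still "decode", just wrongly,
# because every byte value is a valid Latin-1 character.
misread = as_utf8.decode("latin-1")
print(misread)
```

The reverse mistake is at least detectable: the Latin-1 bytes above are not valid UTF-8, so a UTF-8 decoder rejects them instead of silently showing the wrong text.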
--
Måns Rullgård
mru@kth.se
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-04 10:05 [PATCH] UTF-8ifying the kernel source David Eger
2004-03-04 10:19 ` Meelis Roos
@ 2004-03-04 21:51 ` Alex Belits
2004-03-05 8:26 ` Miles Bader
2004-03-05 23:24 ` David Eger
3 siblings, 0 replies; 22+ messages in thread
From: Alex Belits @ 2004-03-04 21:51 UTC (permalink / raw)
To: David Eger; +Cc: linux-kernel
On Thu, 4 Mar 2004, David Eger wrote:
> http://www.yak.net/random/linux-2.6.3-utf8-cleanup-auto.diff.bz2
>
> Here you find the first of several patches to convert the kernel
> source from ISO Latin-1 to UTF-8. I'm working on the files that didn't
> auto-convert easily; comments welcome ;-)
>
> First, some statistics!
>
> In Linux 2.6.3, there are:
> 15860 clean 7-bit ASCII files
> 274 text files are not 7-bit clean
>
> 38 of these 274 files are not auto-convertible -- either they are not ISO
> Latin-1 or the high octets appear within the actual code (not comments).
>
> This first patch applies to help files, documentation, and comments which
> are trivially correct ISO Latin-1 => UTF-8 conversions. The work I have
> left to do is summarized below.
That will be of great help to the future developers who will edit
kernel sources in Microsoft Word.
[a large collection of expletives in multiple languages and charsets is
skipped here]
--
Alex
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-04 10:05 [PATCH] UTF-8ifying the kernel source David Eger
2004-03-04 10:19 ` Meelis Roos
2004-03-04 21:51 ` Alex Belits
@ 2004-03-05 8:26 ` Miles Bader
2004-03-05 20:01 ` H. Peter Anvin
2004-03-05 23:24 ` David Eger
3 siblings, 1 reply; 22+ messages in thread
From: Miles Bader @ 2004-03-05 8:26 UTC (permalink / raw)
To: David Eger; +Cc: linux-kernel
David Eger <eger@havoc.gtf.org> writes:
> arch/v850/kernel/as85ep1.ld - WTF? comments in some random charset...
FWIW, the charset is EUC-JP.
Even other files in that same directory aren't consistent, e.g.,
as85ep1.c uses ISO-2022-JP.
[My fault, but it never really registered on my important-enough-to-fix
radar (emacs autodetects them all so I never really noticed the
discrepancy).]
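
A small sketch of how the two Japanese encodings differ at the byte level. The sample text is made up, not taken from the kernel files. It may also explain why a scan for high octets would list as85ep1.ld but not as85ep1.c, though that is an inference rather than something checked against the files.

```python
text = "\u65e5\u672c\u8a9e"    # the word "Japanese" written in kanji

euc = text.encode("euc_jp")
iso = text.encode("iso2022_jp")

# EUC-JP marks Japanese characters with bytes that have the high bit set.
print(any(b >= 0x80 for b in euc))

# ISO-2022-JP stays within 7 bits and switches character sets with
# escape sequences (ESC $ B selects JIS X 0208).
print(all(b < 0x80 for b in iso), iso[:3])
```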
-Miles
--
We are all lying in the gutter, but some of us are looking at the stars.
-Oscar Wilde
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 8:26 ` Miles Bader
@ 2004-03-05 20:01 ` H. Peter Anvin
2004-03-05 21:00 ` Mike Fedyk
0 siblings, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2004-03-05 20:01 UTC (permalink / raw)
To: linux-kernel
Followup to: <buovfljbsyl.fsf@mcspd15.ucom.lsi.nec.co.jp>
By author: Miles Bader <miles@lsi.nec.co.jp>
In newsgroup: linux.dev.kernel
>
> David Eger <eger@havoc.gtf.org> writes:
> > arch/v850/kernel/as85ep1.ld - WTF? comments in some random charset...
>
> FWIW, the charset is EUC-JP.
>
> Even other files in that same directory aren't consistent, e.g.,
> as85ep1.c uses ISO-2022-JP.
>
> [My fault, but it never really registered on my important-enough-to-fix
> radar (emacs autodetects them all so I never really noticed the
> discrepancy).]
>
OK, this is definitely a good reason to go to UTF-8 across the board.
-hpa
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 20:01 ` H. Peter Anvin
@ 2004-03-05 21:00 ` Mike Fedyk
2004-03-05 21:02 ` H. Peter Anvin
2004-03-05 21:20 ` David Eger
0 siblings, 2 replies; 22+ messages in thread
From: Mike Fedyk @ 2004-03-05 21:00 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel
H. Peter Anvin wrote:
> Followup to: <buovfljbsyl.fsf@mcspd15.ucom.lsi.nec.co.jp>
> By author: Miles Bader <miles@lsi.nec.co.jp>
> In newsgroup: linux.dev.kernel
>
>>David Eger <eger@havoc.gtf.org> writes:
>>
>>>arch/v850/kernel/as85ep1.ld - WTF? comments in some random charset...
>>
>>FWIW, the charset is EUC-JP.
>>
>>Even other files in that same directory aren't consistent, e.g.,
>>as85ep1.c uses ISO-2022-JP.
>>
>>[My fault, but it never really registered on my important-enough-to-fix
>>radar (emacs autodetects them all so I never really noticed the
>>discrepancy).]
>>
>
>
> OK, this is definitely a good reason to go to UTF-8 across the board.
So when is "less" going to support utf8? Right now, it just shows
escape codes... :(
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 21:00 ` Mike Fedyk
@ 2004-03-05 21:02 ` H. Peter Anvin
2004-03-05 21:17 ` Måns Rullgård
2004-03-05 21:20 ` David Eger
1 sibling, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2004-03-05 21:02 UTC (permalink / raw)
To: Mike Fedyk; +Cc: linux-kernel
Mike Fedyk wrote:
>>
>> OK, this is definitely a good reason to go to UTF-8 across the board.
>
> So when is "less" going to support utf8? Right now, it just shows
> escape codes... :(
>
Why don't you ask the "less" maintainer about that?
Right now, "less" seems to insist on showing ampersands for *any*
non-ASCII character for me...
-hpa
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 21:02 ` H. Peter Anvin
@ 2004-03-05 21:17 ` Måns Rullgård
2004-03-05 21:26 ` Charles Cazabon
0 siblings, 1 reply; 22+ messages in thread
From: Måns Rullgård @ 2004-03-05 21:17 UTC (permalink / raw)
To: linux-kernel
"H. Peter Anvin" <hpa@zytor.com> writes:
> Mike Fedyk wrote:
>>>
>>> OK, this is definitely a good reason to go to UTF-8 across the board.
>>
>> So when is "less" going to support utf8? Right now, it just shows
>> escape codes... :(
>>
>
> Why don't you ask the "less" maintainer about that?
>
> Right now, "less" seems to insist on showing ampersands for *any*
> non-ASCII character for me...
Less version 381 is working fine here with UTF-8. I have LANG and
LC_CTYPE set to en_US.UTF-8.
--
Måns Rullgård
mru@kth.se
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 21:00 ` Mike Fedyk
2004-03-05 21:02 ` H. Peter Anvin
@ 2004-03-05 21:20 ` David Eger
1 sibling, 0 replies; 22+ messages in thread
From: David Eger @ 2004-03-05 21:20 UTC (permalink / raw)
To: Mike Fedyk; +Cc: linux-kernel
On Fri, Mar 05, 2004 at 01:00:55PM -0800, Mike Fedyk wrote:
>
> So when is "less" going to support utf8? Right now, it just shows
> escape codes... :(
bash user? try:
$ export LESSCHARSET="utf-8"
$ less myfavoritefile.c
-dte ;-)
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 21:17 ` Måns Rullgård
@ 2004-03-05 21:26 ` Charles Cazabon
0 siblings, 0 replies; 22+ messages in thread
From: Charles Cazabon @ 2004-03-05 21:26 UTC (permalink / raw)
To: linux-kernel
Måns Rullgård <mru@kth.se> wrote:
> >
> > Right now, "less" seems to insist on showing ampersands for *any*
> > non-ASCII character for me...
>
> Less version 381 is working fine here with UTF-8. I have LANG and
> LC_CTYPE set to en_US.UTF-8.
less 340 works fine here with the same settings.
Charles
--
-----------------------------------------------------------------------
Charles Cazabon <linux@discworld.dyndns.org>
GPL'ed software available at: http://www.qcc.ca/~charlesc/software/
-----------------------------------------------------------------------
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-04 10:05 [PATCH] UTF-8ifying the kernel source David Eger
` (2 preceding siblings ...)
2004-03-05 8:26 ` Miles Bader
@ 2004-03-05 23:24 ` David Eger
2004-03-05 23:33 ` H. Peter Anvin
` (2 more replies)
3 siblings, 3 replies; 22+ messages in thread
From: David Eger @ 2004-03-05 23:24 UTC (permalink / raw)
To: linux-kernel
There are now three patches available, and some work left to go.
The first patch hasn't changed, still the trivial ISO Latin-1 => UTF-8.
The second patch takes care of a lot of wrong and/or unneeded non-ASCII.
The third patch concerns 8-bit characters embedded in C strings.
These are almost always output to devfs or proc. The characters used are
the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
I do not want to make a value judgement on what the kernel outputs
to userspace, so I leave the strings the same. However, C99 makes it
implementation-defined how the source character set is translated to
the character set in the compiled binary... Therefore, I've taken the
raw octets and converted them in the source file to octal constants in
the strings, just to make sure cc doesn't mangle things if you set your
locale differently...
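
A minimal sketch of the octet-to-octal rewriting described above, written in Python only to show the byte arithmetic. `to_c_octal` is a hypothetical helper, not the tool used for the patch, and the sample strings are invented rather than copied from the drivers.

```python
def to_c_octal(raw: bytes) -> str:
    """Render bytes as the body of a C string literal, escaping 8-bit octets."""
    out = []
    for b in raw:
        if b >= 0x80 or b < 0x20 or b == 0x7F:
            out.append("\\%03o" % b)    # always three octal digits
        elif chr(b) in '\\"':
            out.append("\\" + chr(b))   # keep the literal well-formed
        else:
            out.append(chr(b))
    return "".join(out)


# Degree sign is 0xB0 and micro sign is 0xB5 in ISO Latin-1.
print(to_c_octal(b"25\xb0C"))    # 25\260C
print(to_c_octal(b"10 \xb5s"))   # 10 \265s
```

Always emitting three digits matters: a C octal escape consumes at most three octal digits, so a digit that follows the escape is not absorbed into it.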
http://www.yak.net/random/linux-2.6.3-utf8-cleanup-auto.diff.bz2
http://www.yak.net/random/linux-2.6.3-utf8-cleanup-wrong.diff
http://www.yak.net/random/linux-2.6.3-utf8-cleanup-cstrings.diff
-dte
Un-needed/wrong non-ASCII characters (patch 2)
==============================================
drivers/video/amifb.c - +- sign (NOTE: X's .ttf files just don't have it)
Documentation/i2c/i2c-protocol - NBSP, but why? (made regular space)
arch/i386/kernel/cpu/cyrix.c - NBSP, but why? (made regular space)
include/linux/802_11.h - why the non-standard dash? (made regular dash)
scripts/docproc.c - why the bizarre spelling for specific? (fixed)
fs/ext2/xattr.c - bad ASCII art (made regular pipe - fixed)
fs/ext3/xattr.c - bad ASCII art (made regular pipe - fixed)
arch/arm/nwfpe/fpopcode.h - line-drawing characters (fixed)
include/asm-m68k/atarihw.h - 0x94? no, it's an ö, for Björn
include/asm-m68k/atariints.h - 0x94? no, it's an ö, for Björn
C strings - (patch 3)
=====================
arch/ppc/platforms/proc_rtas.c - a C string w/"degrees": exports to proc
arch/ppc64/kernel/rtas-proc.c - a C string w/"degrees": exports to proc
drivers/macintosh/therm_adt7467.c - temperature reporting (degrees sign)
- several printk's, output to a devfs interface, MODULE_PARAM_DESC()
drivers/mtd/chips/cfi_probe.c - time reporting (micro sign)
- printk's in the DEBUG code
drivers/net/wireless/netwave_cs.c - module version string
(author's name - but it doesn't seem to be *used* for anything...)
BELOW HERE not fixed...
(was going to be fixed w/ patch, but, umm, huh?)
==================================================
arch/v850/kernel/as85ep1.ld - according to Miles Bader,
it's EUC-JP in the comments, and e.g. as85ep1.c uses ISO-2022-JP...
drivers/char/ftape/lowlevel/fdc-isr.c - WTF? shit in the comments
fs/afs/vlclient.h - a degrees sign, but why? (author says he'll get it)
drivers/scsi/dc395x.c - C debug strings... is this Traditional Chinese?
Documentation/networking/tms380tr.txt - DOS-style ASCII art
Other - (i'd convert it, but...)
================================
drivers/pci/pci.ids - I don't know what program processes this...
drivers/ieee1394/oui.db - I don't know what program processes this...
Machine / charset specific shite - (does anything need to be done?)
===================================================================
arch/m68k/hp300/hp300map.map - maps to "char"s.. grr
drivers/char/defkeymap.map - a map file... maps to "char"s.. grr
drivers/char/qtronixmap.c_shipped - maps to "char"s.. grr
drivers/char/qtronixmap.map - maps to "char"s.. grr
drivers/tc/lk201-map.c_shipped - maps to "char"s.. grr
drivers/tc/lk201-map.map - maps to "char"s.. grr
drivers/acorn/char/defkeymap-l7200.c - maps to "char"s.. grr
arch/s390/kernel/ebcdic.c - comments on a keymap table
drivers/video/console/font_8x16.c - comments on a keymap table
drivers/video/console/font_8x8.c - comments on a keymap table
drivers/video/console/font_pearl_8x8.c - comments on a keymap table
drivers/s390/ebcdic.c - comments on a keymap table
Noise from userland (this I won't be touching)
==============================================
Documentation/networking/ethertap.txt - random crap cat'd from /dev/tap0
Documentation/s390/Debugging390.txt - weird gdb output
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 23:24 ` David Eger
@ 2004-03-05 23:33 ` H. Peter Anvin
2004-03-06 11:08 ` Xavier Bestel
2004-03-06 13:33 ` Other bizarre thing... backspaces? David Eger
2004-03-09 12:19 ` [PATCH] UTF-8ifying the kernel source Geert Uytterhoeven
2 siblings, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2004-03-05 23:33 UTC (permalink / raw)
To: linux-kernel
Followup to: <20040305232425.GA6239@havoc.gtf.org>
By author: David Eger <eger@havoc.gtf.org>
In newsgroup: linux.dev.kernel
> The third patch concerns 8-bit characters embedded in C strings.
> These are almost always output to devfs or proc. The characters used are
> the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
> I do not want to make a value judgement on what the kernel outputs
> to userspace, so I leave the strings the same. However, C99 makes it
> implementation-defined how the source character set is translated to
> the character set in the compiled binary... Therefore, I've taken the
> raw octets and converted them in the source file to octal constants in
> the strings, just to make sure cc doesn't mangle things if you set your
> locale differently...
>
I would highly vote for making those UTF-8 unless it breaks protocol.
Plain ASCII would be better, though.
-hpa
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 23:33 ` H. Peter Anvin
@ 2004-03-06 11:08 ` Xavier Bestel
2004-03-06 11:14 ` Måns Rullgård
2004-03-09 0:30 ` H. Peter Anvin
0 siblings, 2 replies; 22+ messages in thread
From: Xavier Bestel @ 2004-03-06 11:08 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Linux Kernel Mailing List
On Sat, 06/03/2004 at 00:33, H. Peter Anvin wrote:
> Followup to: <20040305232425.GA6239@havoc.gtf.org>
> By author: David Eger <eger@havoc.gtf.org>
> In newsgroup: linux.dev.kernel
>
> > The third patch concerns 8-bit characters embedded in C strings.
> > These are almost always output to devfs or proc. The characters used are
> > the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
>
> I would highly vote for making those UTF-8 unless it breaks protocol.
ISO-8859-1 characters are mostly the same in UTF-8.
Xav
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-06 11:08 ` Xavier Bestel
@ 2004-03-06 11:14 ` Måns Rullgård
2004-03-09 0:30 ` H. Peter Anvin
1 sibling, 0 replies; 22+ messages in thread
From: Måns Rullgård @ 2004-03-06 11:14 UTC (permalink / raw)
To: linux-kernel
Xavier Bestel <xavier.bestel@free.fr> writes:
> On Sat, 06/03/2004 at 00:33, H. Peter Anvin wrote:
>> Followup to: <20040305232425.GA6239@havoc.gtf.org>
>> By author: David Eger <eger@havoc.gtf.org>
>> In newsgroup: linux.dev.kernel
>>
>> > The third patch concerns 8-bit characters embedded in C strings.
>> > These are almost always output to devfs or proc. The characters used are
>> > the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
>>
>> I would highly vote for making those UTF-8 unless it breaks protocol.
>
> ISO-8859-1 characters are mostly the same in UTF-8.
The 7-bit ones are the same. The 8-bit ones are all different.
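
This is easy to check mechanically. A short sketch that compares the Latin-1 byte of every code point 0..255 with its UTF-8 encoding:

```python
same, different = 0, 0
for code in range(256):
    latin1 = bytes([code])              # Latin-1: one byte, value == code point
    utf8 = chr(code).encode("utf-8")    # UTF-8: one byte below 0x80, else two
    if latin1 == utf8:
        same += 1
    else:
        different += 1

print(same, different)   # 128 128
```

Only the 128 ASCII values encode identically; each of the 128 high values becomes a two-byte sequence in UTF-8.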
--
Måns Rullgård
mru@kth.se
* Other bizarre thing... backspaces?
2004-03-05 23:24 ` David Eger
2004-03-05 23:33 ` H. Peter Anvin
@ 2004-03-06 13:33 ` David Eger
2004-03-06 14:04 ` Måns Rullgård
2004-03-09 12:19 ` [PATCH] UTF-8ifying the kernel source Geert Uytterhoeven
2 siblings, 1 reply; 22+ messages in thread
From: David Eger @ 2004-03-06 13:33 UTC (permalink / raw)
To: linux-kernel
There are five files with embedded backspace octets in them.... ;-)
fs/hfs/FAQ.txt
fs/hfs/HFS.txt
fs/hfs/INSTALL.txt
Documentation/filesystems/coda.txt
Documentation/uml/UserModeLinux-HOWTO.txt
-dte
* Re: Other bizarre thing... backspaces?
2004-03-06 13:33 ` Other bizarre thing... backspaces? David Eger
@ 2004-03-06 14:04 ` Måns Rullgård
2004-03-14 16:25 ` Petr Baudis
0 siblings, 1 reply; 22+ messages in thread
From: Måns Rullgård @ 2004-03-06 14:04 UTC (permalink / raw)
To: linux-kernel
David Eger <eger@havoc.gtf.org> writes:
> There are five files with embedded backspace octets in them.... ;-)
That's an old way to do underlining and bold face and it seems like at
least coda.txt is doing that. If I could choose I'd probably just
remove them.
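
For reference, the overstrike convention writes underline as "_", backspace, character and bold as character, backspace, same character. A minimal sketch of stripping it, assuming only those two patterns occur; `strip_overstrike` is a hypothetical name:

```python
import re


def strip_overstrike(text: str) -> str:
    """Drop each 'char + backspace' pair, keeping the character typed last."""
    return re.sub(r".\x08", "", text)


underlined = "_\bH_\bF_\bS"
bold = "b\bbo\bol\bld\bd"
print(strip_overstrike(underlined), strip_overstrike(bold))
```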
--
Måns Rullgård
mru@kth.se
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-06 11:08 ` Xavier Bestel
2004-03-06 11:14 ` Måns Rullgård
@ 2004-03-09 0:30 ` H. Peter Anvin
2004-03-09 9:49 ` Xavier Bestel
1 sibling, 1 reply; 22+ messages in thread
From: H. Peter Anvin @ 2004-03-09 0:30 UTC (permalink / raw)
To: linux-kernel
Followup to: <1078571331.963.3.camel@bip.parateam.prv>
By author: Xavier Bestel <xavier.bestel@free.fr>
In newsgroup: linux.dev.kernel
>
> On Sat, 06/03/2004 at 00:33, H. Peter Anvin wrote:
> > Followup to: <20040305232425.GA6239@havoc.gtf.org>
> > By author: David Eger <eger@havoc.gtf.org>
> > In newsgroup: linux.dev.kernel
> >
> > > The third patch concerns 8-bit characters embedded in C strings.
> > > These are almost always output to devfs or proc. The characters used are
> > > the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
> >
> > I would highly vote for making those UTF-8 unless it breaks protocol.
>
> ISO-8859-1 characters are mostly the same in UTF-8.
>
Unicode, yes. UTF-8, no. The ISO-8859-1 character "Å" (0xC5) does
indeed correspond to Unicode character U+00C5, but it's encoded 0xC3
0x85 in UTF-8.
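
The arithmetic behind that example, as a small sketch. `utf8_two_byte` is a hypothetical helper covering only the two-byte range U+0080..U+07FF, whose bit layout is 110xxxxx 10xxxxxx:

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("only the two-byte range is handled in this sketch")
    lead = 0xC0 | (code_point >> 6)       # 110xxxxx: the top five bits
    trail = 0x80 | (code_point & 0x3F)    # 10xxxxxx: the low six bits
    return bytes([lead, trail])


# U+00C5 is binary 00011 000101, giving 0xC0|0x03 = 0xC3 and 0x80|0x05 = 0x85.
print(utf8_two_byte(0xC5).hex())   # c385
```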
-hpa
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-09 0:30 ` H. Peter Anvin
@ 2004-03-09 9:49 ` Xavier Bestel
0 siblings, 0 replies; 22+ messages in thread
From: Xavier Bestel @ 2004-03-09 9:49 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel
On Tue, 2004-03-09 at 00:30 +0000, H. Peter Anvin wrote:
> Followup to: <1078571331.963.3.camel@bip.parateam.prv>
> By author: Xavier Bestel <xavier.bestel@free.fr>
> > ISO-8859-1 characters are mostly the same in UTF-8.
> >
>
> Unicode, yes. UTF-8, no. The ISO-8859-1 character "Å" (0xC5) does
> indeed correspond to Unicode character U+00C5, but it's encoded 0xC3
> 0x85 in UTF-8.
Yeah, that's what I realized, after posting of course.
While utf-8ying the sources is certainly a good thing, I have mixed
feelings about kernel strings. It will render poorly in some
environments.
Maybe the all-ASCII route is better for strings?
Xav
* Re: [PATCH] UTF-8ifying the kernel source
2004-03-05 23:24 ` David Eger
2004-03-05 23:33 ` H. Peter Anvin
2004-03-06 13:33 ` Other bizarre thing... backspaces? David Eger
@ 2004-03-09 12:19 ` Geert Uytterhoeven
2 siblings, 0 replies; 22+ messages in thread
From: Geert Uytterhoeven @ 2004-03-09 12:19 UTC (permalink / raw)
To: David Eger; +Cc: Linux Kernel Development
On Fri, 5 Mar 2004, David Eger wrote:
> Un-needed/wrong non-ASCII characters (patch 2)
> ==============================================
> drivers/video/amifb.c - +- sign (NOTE: X's .ttf files just don't have it)
do_blank is either 0 (do nothing), -1 (unblank), or +1 (blank).
You can replace it by `+/-1' if you want.
> include/asm-m68k/atarihw.h - 0x94? no, it's an ö, for Björn
> include/asm-m68k/atariints.h - 0x94? no, it's an ö, for Björn
Yep.
> Machine / charset specific shite - (does anything need to be done?)
> ===================================================================
> arch/m68k/hp300/hp300map.map - maps to "char"s.. grr
> drivers/char/defkeymap.map - a map file... maps to "char"s.. grr
> drivers/char/qtronixmap.c_shipped - maps to "char"s.. grr
> drivers/char/qtronixmap.map - maps to "char"s.. grr
> drivers/tc/lk201-map.c_shipped - maps to "char"s.. grr
> drivers/tc/lk201-map.map - maps to "char"s.. grr
> drivers/acorn/char/defkeymap-l7200.c - maps to "char"s.. grr
If you want the keyboard to generate UTF-8, I think you should change these
(not sure, please test).
> drivers/video/console/font_8x16.c - comments on a keymap table
> drivers/video/console/font_8x8.c - comments on a keymap table
> drivers/video/console/font_pearl_8x8.c - comments on a keymap table
These fonts have the box-drawing ASCII art.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
* Re: Re: Other bizarre thing... backspaces?
2004-03-06 14:04 ` Måns Rullgård
@ 2004-03-14 16:25 ` Petr Baudis
0 siblings, 0 replies; 22+ messages in thread
From: Petr Baudis @ 2004-03-14 16:25 UTC (permalink / raw)
To: Måns Rullgård; +Cc: linux-kernel
Dear diary, on Sat, Mar 06, 2004 at 03:04:35PM CET, I got a letter,
where Måns Rullgård <mru@kth.se> told me, that...
> David Eger <eger@havoc.gtf.org> writes:
>
> > There are five files with embedded backspace octets in them.... ;-)
>
> That's an old way to do underlining and bold face and it seems like at
> least coda.txt is doing that. If I could choose I'd probably just
> remove them.
Well, what's the "new way" for ASCII documents? At least less produces the
desired result.
Kind regards,
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw
* Re: [PATCH] UTF-8ifying the kernel source
@ 2004-03-05 13:21 paolo ciarrocchi
0 siblings, 0 replies; 22+ messages in thread
From: paolo ciarrocchi @ 2004-03-05 13:21 UTC (permalink / raw)
To: linux-kernel
Sorry to jump in to this thread without providing any useful information...
I'm looking for docs and/or links to info regarding UTF-8 and iso-*.
Any hints?
Thanks in advance.
Ciao,
Paolo