linux-kernel.vger.kernel.org archive mirror
* [PATCH] UTF-8ifying the kernel source
@ 2004-03-04 10:05 David Eger
  2004-03-04 10:19 ` Meelis Roos
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: David Eger @ 2004-03-04 10:05 UTC (permalink / raw)
  To: linux-kernel



http://www.yak.net/random/linux-2.6.3-utf8-cleanup-auto.diff.bz2

Here you find the first of several patches to convert the kernel
source from ISO Latin-1 to UTF-8.  I'm working on the files that didn't
auto-convert easily; comments welcome ;-)

First, some statistics!

In Linux 2.6.3, there are:
15860 clean 7-bit ASCII files
274 text files are not 7-bit clean

38 of these 274 files are not auto-convertible -- either they are not ISO
Latin-1 or the high octets appear within the actual code (not comments).

This first patch applies to help files, documentation, and comments which
are trivially correct ISO Latin-1 => UTF-8 conversions.  The work I have
left to do is summarized below.
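
A minimal sketch of how such a split could be computed, not the script
actually used for this patch. The names `classify` and `latin1_to_utf8`
are hypothetical, and the check looks only at raw bytes, so it cannot
tell comments from code.

```python
def classify(data: bytes) -> str:
    """Return 'ascii', 'utf-8', or 'high-octets' for a file's raw bytes."""
    if all(b < 0x80 for b in data):
        return "ascii"          # clean 7-bit file
    try:
        data.decode("utf-8")
        return "utf-8"          # already valid UTF-8
    except UnicodeDecodeError:
        return "high-octets"    # Latin-1 or some other 8-bit charset


def latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret bytes as ISO Latin-1 and re-encode them as UTF-8."""
    # Every byte value is a valid Latin-1 character, so this never raises.
    return data.decode("latin-1").encode("utf-8")


sample = b"Bj\xf6rn"                     # "Björn" in ISO Latin-1
print(classify(sample))                  # high-octets
print(classify(latin1_to_utf8(sample)))  # utf-8
```

Because the Latin-1 decode accepts any byte, the conversion itself cannot
detect a file that was really in another charset; that still takes a
human look.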

--dte


Un-needed/wrong non-ASCII characters (these fixes will form patch 2)
====================================================================
drivers/video/amifb.c	- +- sign?
Documentation/i2c/i2c-protocol	- NBSP, but why?
arch/i386/kernel/cpu/cyrix.c	- NBSP, but why?
arch/v850/kernel/as85ep1.ld	- WTF? comments in some random charset...
drivers/char/ftape/lowlevel/fdc-isr.c	- WTF? shit in the comments
include/asm-m68k/atarihw.h	- 0x94 - "cancel character"?
include/asm-m68k/atariints.h	- 0x94 - "cancel character"?
include/linux/802_11.h - why the non-standard dash?
scripts/docproc.c	- why the bizarre spelling for specific?
fs/ext2/xattr.c	- bad ASCII art
fs/ext3/xattr.c	- bad ASCII art
fs/afs/vlclient.h	- a degrees sign, but why?

Box-drawing ASCII art (these fixes will form patch 3)
=====================================================
Documentation/networking/tms380tr.txt	- DOS-style ASCII art
arch/arm/nwfpe/fpopcode.h	- line-drawing characters

C strings - (what to do?)
=========================
arch/ppc/platforms/proc_rtas.c	-  a C string containing "degrees"
arch/ppc64/kernel/rtas-proc.c	-  a C string containing "degrees"
drivers/macintosh/therm_adt7467.c	- degrees, MODULE_PARAM_DESC(), 
					  and a C string
drivers/mtd/chips/cfi_probe.c	- C strings
drivers/net/wireless/netwave_cs.c	- C strings	
drivers/scsi/dc395x.c	- C strings

Other - (i'd convert it, but...)
================================
drivers/pci/pci.ids	- I don't know what program processes this...
drivers/ieee1394/oui.db	- I don't know what program processes this...

Machine / charset specific shite - (does anything need to be done?)
===================================================================
arch/m68k/hp300/hp300map.map	- maps to "char"s.. grr
drivers/char/defkeymap.map	- a map file... maps to "char"s.. grr
drivers/char/qtronixmap.c_shipped	- maps to "char"s.. grr
drivers/char/qtronixmap.map	- maps to "char"s.. grr
drivers/tc/lk201-map.c_shipped	- maps to "char"s.. grr
drivers/tc/lk201-map.map	- maps to "char"s.. grr
drivers/acorn/char/defkeymap-l7200.c	- maps to "char"s.. grr
arch/s390/kernel/ebcdic.c	- comments on a keymap table
drivers/video/console/font_8x16.c	- comments on a keymap table 
drivers/video/console/font_8x8.c	- comments on a keymap table 
drivers/video/console/font_pearl_8x8.c	- comments on a keymap table 
drivers/s390/ebcdic.c	- comments on a keymap table

Noise from userland (this I won't be touching)
==============================================
Documentation/networking/ethertap.txt	- random crap cat'd from /dev/tap0
Documentation/s390/Debugging390.txt	- weird gdb output


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-04 10:05 [PATCH] UTF-8ifying the kernel source David Eger
@ 2004-03-04 10:19 ` Meelis Roos
  2004-03-04 10:32   ` Måns Rullgård
  2004-03-04 21:51 ` Alex Belits
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 21+ messages in thread
From: Meelis Roos @ 2004-03-04 10:19 UTC (permalink / raw)
  To: linux-kernel, eger

DE> Here you find the first of several patches to convert the kernel
DE> source from ISO Latin-1 to UTF-8.  I'm working on the files that didn't
DE> auto-convert easily; comments welcome ;-)

Why? It's just easier to use plain 8-bit text files today (with editors,
code tools etc) and accept the limitations of it than to overcome the
limitations by forcing people to UTF-8 editors & other tools.

I am not a kernel developer but this seems a bad idea to me.

-- 
Meelis Roos


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-04 10:19 ` Meelis Roos
@ 2004-03-04 10:32   ` Måns Rullgård
  0 siblings, 0 replies; 21+ messages in thread
From: Måns Rullgård @ 2004-03-04 10:32 UTC (permalink / raw)
  To: linux-kernel

Meelis Roos <mroos@linux.ee> writes:

> DE> Here you find the first of several patches to convert the kernel
> DE> source from ISO Latin-1 to UTF-8.  I'm working on the files that didn't
> DE> auto-convert easily; comments welcome ;-)
>
> Why? It's just easier to use plain 8-bit text files today (with editors,
> code tools etc) and accept the limitations of it than to overcome the
> limitations by forcing people to UTF-8 editors & other tools.

How do you propose that editors should know which encoding a file
uses?  The trend seems to be moving towards UTF-8 for everything, so
the kernel might as well do it too.

-- 
Måns Rullgård
mru@kth.se



* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-04 10:05 [PATCH] UTF-8ifying the kernel source David Eger
  2004-03-04 10:19 ` Meelis Roos
@ 2004-03-04 21:51 ` Alex Belits
  2004-03-05  8:26 ` Miles Bader
  2004-03-05 23:24 ` David Eger
  3 siblings, 0 replies; 21+ messages in thread
From: Alex Belits @ 2004-03-04 21:51 UTC (permalink / raw)
  To: David Eger; +Cc: linux-kernel

On Thu, 4 Mar 2004, David Eger wrote:

> http://www.yak.net/random/linux-2.6.3-utf8-cleanup-auto.diff.bz2
>
> Here you find the first of several patches to convert the kernel
> source from ISO Latin-1 to UTF-8.  I'm working on the files that didn't
> auto-convert easily; comments welcome ;-)
>
> First, some statistics!
>
> In Linux 2.6.3, there are:
> 15860 clean 7-bit ASCII files
> 274 text files are not 7-bit clean
>
> 38 of these 274 files are not auto-convertible -- either they are not ISO
> Latin-1 or the high octets appear within the actual code (not comments).
>
> This first patch applies to help files, documentation, and comments which
> are trivially correct ISO Latin-1 => UTF-8 conversions.  The work I have
> left to do is summarized below.

  That will be of great help to the future developers who will edit
kernel sources in Microsoft Word.

[a large collection of expletives in multiple languages and charsets is
skipped here]

-- 
Alex


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-04 10:05 [PATCH] UTF-8ifying the kernel source David Eger
  2004-03-04 10:19 ` Meelis Roos
  2004-03-04 21:51 ` Alex Belits
@ 2004-03-05  8:26 ` Miles Bader
  2004-03-05 20:01   ` H. Peter Anvin
  2004-03-05 23:24 ` David Eger
  3 siblings, 1 reply; 21+ messages in thread
From: Miles Bader @ 2004-03-05  8:26 UTC (permalink / raw)
  To: David Eger; +Cc: linux-kernel

David Eger <eger@havoc.gtf.org> writes:
> arch/v850/kernel/as85ep1.ld	- WTF? comments in some random charset...

FWIW, the charset is EUC-JP.

Even other files in that same directory aren't consistent, e.g.,
as85ep1.c uses ISO-2022-JP.

[My fault, but it never really registered on my important-enough-to fix
radar (emacs autodetects them all so I never really noticed the
discrepancy).]
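
The difference between the two encodings is visible at the byte level.
A small sketch, using an arbitrary sample string rather than the actual
comments from those files: EUC-JP uses octets with the high bit set,
while ISO-2022-JP stays within 7 bits and switches character sets with
escape sequences. The helper name `has_high_octets` is hypothetical.

```python
def has_high_octets(data: bytes) -> bool:
    """True if any byte has its top bit set (i.e. is not 7-bit clean)."""
    return any(b >= 0x80 for b in data)


text = "日本語"                      # arbitrary Japanese sample text
euc = text.encode("euc_jp")
jis = text.encode("iso2022_jp")

print(has_high_octets(euc))   # True: EUC-JP uses 8-bit octets
print(has_high_octets(jis))   # False: ISO-2022-JP is 7-bit only
print(b"\x1b" in jis)         # True: it relies on ESC sequences instead
```

One consequence, assuming a scan that only looks for high octets: a file
in ISO-2022-JP would count as 7-bit clean even though it is not plain
ASCII.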

-Miles
-- 
We are all lying in the gutter, but some of us are looking at the stars.
-Oscar Wilde


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05  8:26 ` Miles Bader
@ 2004-03-05 20:01   ` H. Peter Anvin
  2004-03-05 21:00     ` Mike Fedyk
  0 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2004-03-05 20:01 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <buovfljbsyl.fsf@mcspd15.ucom.lsi.nec.co.jp>
By author:    Miles Bader <miles@lsi.nec.co.jp>
In newsgroup: linux.dev.kernel
>
> David Eger <eger@havoc.gtf.org> writes:
> > arch/v850/kernel/as85ep1.ld	- WTF? comments in some random charset...
> 
> FWIW, the charset is EUC-JP.
> 
> Even other files in that same directory aren't consistent, e.g.,
> as85ep1.c uses ISO-2022-JP.
> 
> [My fault, but it never really registered on my important-enough-to fix
> radar (emacs autodetects them all so I never really noticed the
> discrepancy).]
> 

OK, this is definitely a good reason to go to UTF-8 across the board.

	-hpa


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05 20:01   ` H. Peter Anvin
@ 2004-03-05 21:00     ` Mike Fedyk
  2004-03-05 21:02       ` H. Peter Anvin
  2004-03-05 21:20       ` David Eger
  0 siblings, 2 replies; 21+ messages in thread
From: Mike Fedyk @ 2004-03-05 21:00 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

H. Peter Anvin wrote:
> Followup to:  <buovfljbsyl.fsf@mcspd15.ucom.lsi.nec.co.jp>
> By author:    Miles Bader <miles@lsi.nec.co.jp>
> In newsgroup: linux.dev.kernel
> 
>>David Eger <eger@havoc.gtf.org> writes:
>>
>>>arch/v850/kernel/as85ep1.ld	- WTF? comments in some random charset...
>>
>>FWIW, the charset is EUC-JP.
>>
>>Even other files in that same directory aren't consistent, e.g.,
>>as85ep1.c uses ISO-2022-JP.
>>
>>[My fault, but it never really registered on my important-enough-to fix
>>radar (emacs autodetects them all so I never really noticed the
>>discrepancy).]
>>
> 
> 
> OK, this is definitely a good reason to go to UTF-8 across the board.

So when is "less" going to support utf8?  Right now, it just shows 
escape codes... :(


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05 21:00     ` Mike Fedyk
@ 2004-03-05 21:02       ` H. Peter Anvin
  2004-03-05 21:17         ` Måns Rullgård
  2004-03-05 21:20       ` David Eger
  1 sibling, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2004-03-05 21:02 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

Mike Fedyk wrote:
>>
>> OK, this is definitely a good reason to go to UTF-8 across the board.
> 
> So when is "less" going to support utf8?  Right now, it just shows
> escape codes... :(
>

Why don't you ask the "less" maintainer about that?

Right now, "less" seems to insist on showing ampersands for *any*
non-ASCII character for me...

	-hpa



* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05 21:02       ` H. Peter Anvin
@ 2004-03-05 21:17         ` Måns Rullgård
  2004-03-05 21:26           ` Charles Cazabon
  0 siblings, 1 reply; 21+ messages in thread
From: Måns Rullgård @ 2004-03-05 21:17 UTC (permalink / raw)
  To: linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Mike Fedyk wrote:
>>>
>>> OK, this is definitely a good reason to go to UTF-8 across the board.
>> 
>> So when is "less" going to support utf8?  Right now, it just shows
>> escape codes... :(
>>
>
> Why don't you ask the "less" maintainer about that?
>
> Right now, "less" seems to insist on showing ampersands for *any*
> non-ASCII character for me...

Less version 381 is working fine here with UTF-8.  I have LANG and
LC_CTYPE set to en_US.UTF-8.

-- 
Måns Rullgård
mru@kth.se



* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05 21:00     ` Mike Fedyk
  2004-03-05 21:02       ` H. Peter Anvin
@ 2004-03-05 21:20       ` David Eger
  1 sibling, 0 replies; 21+ messages in thread
From: David Eger @ 2004-03-05 21:20 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

On Fri, Mar 05, 2004 at 01:00:55PM -0800, Mike Fedyk wrote:
> 
> So when is "less" going to support utf8?  Right now, it just shows 
> escape codes... :(

bash user? try:
$ export LESSCHARSET="utf-8"
$ less myfavoritefile.c

-dte ;-)


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05 21:17         ` Måns Rullgård
@ 2004-03-05 21:26           ` Charles Cazabon
  0 siblings, 0 replies; 21+ messages in thread
From: Charles Cazabon @ 2004-03-05 21:26 UTC (permalink / raw)
  To: linux-kernel

Måns Rullgård <mru@kth.se> wrote:
> >
> > Right now, "less" seems to insist on showing ampersands for *any*
> > non-ASCII character for me...
> 
> Less version 381 is working fine here with UTF-8.  I have LANG and
> LC_CTYPE set to en_US.UTF-8.

less 340 works fine here with the same settings.

Charles
-- 
-----------------------------------------------------------------------
Charles Cazabon                            <linux@discworld.dyndns.org>
GPL'ed software available at:     http://www.qcc.ca/~charlesc/software/
-----------------------------------------------------------------------


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-04 10:05 [PATCH] UTF-8ifying the kernel source David Eger
                   ` (2 preceding siblings ...)
  2004-03-05  8:26 ` Miles Bader
@ 2004-03-05 23:24 ` David Eger
  2004-03-05 23:33   ` H. Peter Anvin
                     ` (2 more replies)
  3 siblings, 3 replies; 21+ messages in thread
From: David Eger @ 2004-03-05 23:24 UTC (permalink / raw)
  To: linux-kernel

There are now three patches available, and some work left to go.

The first patch hasn't changed, still the trivial ISO Latin-1 => UTF-8.

The second patch takes care of a lot of wrong and/or unneeded non-ASCII.

The third patch concerns 8-bit characters embedded in C strings.
These are almost always output to devfs or proc.  The characters used are
the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
I do not want to make a value judgement on what the kernel outputs
to userspace, so I leave the strings the same.  However, C99 makes it
implementation defined how the source character set is translated to
the character set in the compiled binary...  Therefore, I've taken the
raw octets and converted them in the source file to octal constants in
the strings, just to make sure cc doesn't mangle things if you set your
locale differently...
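
A sketch of that rewriting step, written in Python for illustration
only; it is not the tool used to produce the patch, and the function
name `escape_high_octets` is hypothetical. Each octet with the high bit
set becomes a three-digit octal escape, and everything else is copied
through unchanged.

```python
def escape_high_octets(data: bytes) -> str:
    """Rewrite the body of a C string so that it is pure 7-bit ASCII."""
    out = []
    for b in data:
        if b >= 0x80:
            out.append("\\%03o" % b)   # e.g. 0xB0 -> \260
        else:
            out.append(chr(b))         # ASCII is kept as-is
    return "".join(out)


print(escape_high_octets(b"%d\xb0C"))  # %d\260C  (degrees sign, Latin-1)
print(escape_high_octets(b"\xb5s"))    # \265s    (micro sign, Latin-1)
```

Octal is a reasonable choice here because a C octal escape takes at most
three digits, whereas a `\x` escape keeps consuming hex digits, so
"\xb0C" would be read as a single out-of-range escape.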

http://www.yak.net/random/linux-2.6.3-utf8-cleanup-auto.diff.bz2
http://www.yak.net/random/linux-2.6.3-utf8-cleanup-wrong.diff
http://www.yak.net/random/linux-2.6.3-utf8-cleanup-cstrings.diff

-dte


Un-needed/wrong non-ASCII characters (patch 2)
==============================================
drivers/video/amifb.c	- +- sign (NOTE: X's .ttf files just don't have it)
Documentation/i2c/i2c-protocol	- NBSP, but why? (made regular space)
arch/i386/kernel/cpu/cyrix.c	- NBSP, but why? (made regular space)
include/linux/802_11.h - why the non-standard dash? (made regular dash)
scripts/docproc.c	- why the bizarre spelling for specific? (fixed)
fs/ext2/xattr.c	- bad ASCII art (made regular pipe - fixed)
fs/ext3/xattr.c	- bad ASCII art (made regular pipe - fixed)
arch/arm/nwfpe/fpopcode.h	- line-drawing characters (fixed)
include/asm-m68k/atarihw.h	- 0x94? no, it's an ö, for Björn
include/asm-m68k/atariints.h	- 0x94? no, it's an ö, for Björn

C strings - (patch 3)
=====================
arch/ppc/platforms/proc_rtas.c	-  a C string w/"degrees": exports to proc
arch/ppc64/kernel/rtas-proc.c	-  a C string w/"degrees": exports to proc
drivers/macintosh/therm_adt7467.c	- temperature reporting (degrees sign)
	- several printk's, output to a devfs interface, and MODULE_PARAM_DESC()
drivers/mtd/chips/cfi_probe.c	- time reporting (micro sign) 
	- printk's in the DEBUG code
drivers/net/wireless/netwave_cs.c	- module version string 
   (author's name - but it doesn't seem to be *used* for anything...)

BELOW HERE not fixed...

(was going to be fixed w/ patch, but, umm, huh?)
==================================================
arch/v850/kernel/as85ep1.ld	- according to Miles Bader, 
	it's EUC-JP in the comments, and e.g. as85ep1.c uses ISO-2022-JP...
drivers/char/ftape/lowlevel/fdc-isr.c	- WTF? shit in the comments
fs/afs/vlclient.h	- a degrees sign, but why? (author says he'll get it)
drivers/scsi/dc395x.c	- C debug strings... is this chinese traditional?
Documentation/networking/tms380tr.txt	- DOS-style ASCII art 

Other - (i'd convert it, but...)
================================
drivers/pci/pci.ids	- I don't know what program processes this...
drivers/ieee1394/oui.db	- I don't know what program processes this...

Machine / charset specific shite - (does anything need to be done?)
===================================================================
arch/m68k/hp300/hp300map.map	- maps to "char"s.. grr
drivers/char/defkeymap.map	- a map file... maps to "char"s.. grr
drivers/char/qtronixmap.c_shipped	- maps to "char"s.. grr
drivers/char/qtronixmap.map	- maps to "char"s.. grr
drivers/tc/lk201-map.c_shipped	- maps to "char"s.. grr
drivers/tc/lk201-map.map	- maps to "char"s.. grr
drivers/acorn/char/defkeymap-l7200.c	- maps to "char"s.. grr
arch/s390/kernel/ebcdic.c	- comments on a keymap table
drivers/video/console/font_8x16.c	- comments on a keymap table 
drivers/video/console/font_8x8.c	- comments on a keymap table 
drivers/video/console/font_pearl_8x8.c	- comments on a keymap table 
drivers/s390/ebcdic.c	- comments on a keymap table

Noise from userland (this I won't be touching)
==============================================
Documentation/networking/ethertap.txt	- random crap cat'd from /dev/tap0
Documentation/s390/Debugging390.txt	- weird gdb output



* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05 23:24 ` David Eger
@ 2004-03-05 23:33   ` H. Peter Anvin
  2004-03-06 11:08     ` Xavier Bestel
  2004-03-06 13:33   ` Other bizarre thing... backspaces? David Eger
  2004-03-09 12:19   ` [PATCH] UTF-8ifying the kernel source Geert Uytterhoeven
  2 siblings, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2004-03-05 23:33 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20040305232425.GA6239@havoc.gtf.org>
By author:    David Eger <eger@havoc.gtf.org>
In newsgroup: linux.dev.kernel

> The third patch concerns 8-bit characters embedded in C strings.
> These are almost always output to devfs or proc.  The characters used are
> the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
> I do not want to make a value judgement on what the kernel outputs
> to userspace, so I leave the strings the same.  However, C99 makes it
> implementation defined how the source character set is translated to
> the character set in the compiled binary...  Therefore, I've taken the
> raw octets and converted them in the source file to octal constants in
> the strings, just to make sure cc doesn't mangle things if you set your
> locale differently...
> 

I would highly vote for making those UTF-8 unless it breaks protocol.

Plain ASCII would be better, though.

	-hpa


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05 23:33   ` H. Peter Anvin
@ 2004-03-06 11:08     ` Xavier Bestel
  2004-03-06 11:14       ` Måns Rullgård
  2004-03-09  0:30       ` H. Peter Anvin
  0 siblings, 2 replies; 21+ messages in thread
From: Xavier Bestel @ 2004-03-06 11:08 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linux Kernel Mailing List

Le sam 06/03/2004 à 00:33, H. Peter Anvin a écrit :
> Followup to:  <20040305232425.GA6239@havoc.gtf.org>
> By author:    David Eger <eger@havoc.gtf.org>
> In newsgroup: linux.dev.kernel
> 
> > The third patch concerns 8-bit characters embedded in C strings.
> > These are almost always output to devfs or proc.  The characters used are
> > the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
>
> I would highly vote for making those UTF-8 unless it breaks protocol.

ISO-8859-1 characters are mostly the same in UTF-8.

	Xav



* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-06 11:08     ` Xavier Bestel
@ 2004-03-06 11:14       ` Måns Rullgård
  2004-03-09  0:30       ` H. Peter Anvin
  1 sibling, 0 replies; 21+ messages in thread
From: Måns Rullgård @ 2004-03-06 11:14 UTC (permalink / raw)
  To: linux-kernel

Xavier Bestel <xavier.bestel@free.fr> writes:

> Le sam 06/03/2004 à 00:33, H. Peter Anvin a écrit :
>> Followup to:  <20040305232425.GA6239@havoc.gtf.org>
>> By author:    David Eger <eger@havoc.gtf.org>
>> In newsgroup: linux.dev.kernel
>> 
>> > The third patch concerns 8-bit characters embedded in C strings.
>> > These are almost always output to devfs or proc.  The characters used are
>> > the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
>>
>> I would highly vote for making those UTF-8 unless it breaks protocol.
>
> ISO-8859-1 characters are mostly the same in UTF-8.

The 7-bit ones are the same.  The 8-bit ones are all different.
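
This can be checked exhaustively. The sketch below takes each of the 256
possible ISO-8859-1 octets, re-encodes it as UTF-8, and records whether
the byte sequence changed; the variable names are illustrative.

```python
same = []        # octets whose UTF-8 encoding is the identical single byte
different = []   # octets whose UTF-8 encoding differs

for value in range(256):
    latin1 = bytes([value])
    utf8 = latin1.decode("latin-1").encode("utf-8")
    if utf8 == latin1:
        same.append(value)
    else:
        different.append(value)

print(same == list(range(0x00, 0x80)))        # True
print(different == list(range(0x80, 0x100)))  # True
```

Every octet from 0x80 upwards turns into a two-byte sequence in UTF-8.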

-- 
Måns Rullgård
mru@kth.se



* Other bizarre thing... backspaces?
  2004-03-05 23:24 ` David Eger
  2004-03-05 23:33   ` H. Peter Anvin
@ 2004-03-06 13:33   ` David Eger
  2004-03-06 14:04     ` Måns Rullgård
  2004-03-09 12:19   ` [PATCH] UTF-8ifying the kernel source Geert Uytterhoeven
  2 siblings, 1 reply; 21+ messages in thread
From: David Eger @ 2004-03-06 13:33 UTC (permalink / raw)
  To: linux-kernel

There are five files with embedded backspace octets in them.... ;-)

fs/hfs/FAQ.txt
fs/hfs/HFS.txt
fs/hfs/INSTALL.txt
Documentation/filesystems/coda.txt 
Documentation/uml/UserModeLinux-HOWTO.txt 

-dte



* Re: Other bizarre thing... backspaces?
  2004-03-06 13:33   ` Other bizarre thing... backspaces? David Eger
@ 2004-03-06 14:04     ` Måns Rullgård
  2004-03-14 16:25       ` Petr Baudis
  0 siblings, 1 reply; 21+ messages in thread
From: Måns Rullgård @ 2004-03-06 14:04 UTC (permalink / raw)
  To: linux-kernel

David Eger <eger@havoc.gtf.org> writes:

> There are five files with embedded backspace octets in them.... ;-)

That's an old way to do underlining and bold face and it seems like at
least coda.txt is doing that.  If I could choose I'd probably just
remove them.
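
For reference, the convention is "X, backspace, X" for bold and
"_, backspace, X" for underline. A minimal sketch of removing it,
assuming every backspace is preceded by the character it overstrikes;
the name `strip_overstrikes` is hypothetical and the sample strings are
made up, not taken from coda.txt.

```python
import re

# Any single character followed by a backspace (0x08) is dropped,
# leaving only the character that was printed on top of it.
OVERSTRIKE = re.compile(r".\x08")


def strip_overstrikes(text: str) -> str:
    return OVERSTRIKE.sub("", text)


print(strip_overstrikes("_\x08C_\x08o_\x08d_\x08a"))  # Coda
print(strip_overstrikes("B\x08Bo\x08ol\x08ld\x08d"))  # Bold
```

A backspace at the very start of a line has nothing before it to match
and would be left in place by this pattern.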

-- 
Måns Rullgård
mru@kth.se



* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-06 11:08     ` Xavier Bestel
  2004-03-06 11:14       ` Måns Rullgård
@ 2004-03-09  0:30       ` H. Peter Anvin
  2004-03-09  9:49         ` Xavier Bestel
  1 sibling, 1 reply; 21+ messages in thread
From: H. Peter Anvin @ 2004-03-09  0:30 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <1078571331.963.3.camel@bip.parateam.prv>
By author:    Xavier Bestel <xavier.bestel@free.fr>
In newsgroup: linux.dev.kernel
>
> Le sam 06/03/2004 à 00:33, H. Peter Anvin a écrit :
> > Followup to:  <20040305232425.GA6239@havoc.gtf.org>
> > By author:    David Eger <eger@havoc.gtf.org>
> > In newsgroup: linux.dev.kernel
> > 
> > > The third patch concerns 8-bit characters embedded in C strings.
> > > These are almost always output to devfs or proc.  The characters used are
> > > the degrees symbol (for ppc temp. sensors) and mu (for micro-seconds).
> >
> > I would highly vote for making those UTF-8 unless it breaks protocol.
> 
> ISO-8859-1 characters are mostly the same in UTF-8.
> 

Unicode, yes.  UTF-8, no.  The ISO-8859-1 character "Å" (0xC5) does,
indeed correspond to Unicode character U+00C5, but it's encoded 0xC3
0x85 in UTF-8.
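
The two bytes follow from the UTF-8 bit layout for code points U+0080
through U+07FF, which is 110xxxxx 10xxxxxx. A small sketch that builds
them by hand and compares against Python's own encoder; the function
name `utf8_two_byte` is illustrative.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("not a two-byte UTF-8 code point")
    lead = 0b11000000 | (code_point >> 6)          # top 5 bits
    cont = 0b10000000 | (code_point & 0b00111111)  # low 6 bits
    return bytes([lead, cont])


code_point = ord(b"\xc5".decode("latin-1"))   # Latin-1 0xC5 -> U+00C5
print(hex(code_point))                        # 0xc5
print(utf8_two_byte(code_point).hex())        # c385
print("\u00c5".encode("utf-8").hex())         # c385
```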

	-hpa


* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-09  0:30       ` H. Peter Anvin
@ 2004-03-09  9:49         ` Xavier Bestel
  0 siblings, 0 replies; 21+ messages in thread
From: Xavier Bestel @ 2004-03-09  9:49 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On Tue, 2004-03-09 at 00:30 +0000, H. Peter Anvin wrote:

> Followup to:  <1078571331.963.3.camel@bip.parateam.prv>
> By author:    Xavier Bestel <xavier.bestel@free.fr>
> > ISO-8859-1 characters are mostly the same in UTF-8.
> > 
> 
> Unicode, yes.  UTF-8, no.  The ISO-8859-1 character "Å" (0xC5) does,
> indeed correspond to Unicode character U+00C5, but it's encoded 0xC3
> 0x85 in UTF-8.

Yeah, that's what I realized, after posting of course.
While utf-8ying the sources is certainly a good thing, I have mixed
feelings about kernel strings. It will render poorly in some
environments.
Maybe the all-ascii route is better for strings ?

	Xav



* Re: [PATCH] UTF-8ifying the kernel source
  2004-03-05 23:24 ` David Eger
  2004-03-05 23:33   ` H. Peter Anvin
  2004-03-06 13:33   ` Other bizarre thing... backspaces? David Eger
@ 2004-03-09 12:19   ` Geert Uytterhoeven
  2 siblings, 0 replies; 21+ messages in thread
From: Geert Uytterhoeven @ 2004-03-09 12:19 UTC (permalink / raw)
  To: David Eger; +Cc: Linux Kernel Development

On Fri, 5 Mar 2004, David Eger wrote:
> Un-needed/wrong non-ASCII characters (patch 2)
> ==============================================
> drivers/video/amifb.c	- +- sign (NOTE: X's .ttf files just don't have it)

do_blank is either 0 (do nothing), -1 (unblank), or +1 (blank).

You can replace it by `+/-1' if you want.

> include/asm-m68k/atarihw.h	- 0x94? no, it's an ö, for Björn
> include/asm-m68k/atariints.h	- 0x94? no, it's an ö, for Björn

Yep.

> Machine / charset specific shite - (does anything need to be done?)
> ===================================================================
> arch/m68k/hp300/hp300map.map	- maps to "char"s.. grr
> drivers/char/defkeymap.map	- a map file... maps to "char"s.. grr
> drivers/char/qtronixmap.c_shipped	- maps to "char"s.. grr
> drivers/char/qtronixmap.map	- maps to "char"s.. grr
> drivers/tc/lk201-map.c_shipped	- maps to "char"s.. grr
> drivers/tc/lk201-map.map	- maps to "char"s.. grr
> drivers/acorn/char/defkeymap-l7200.c	- maps to "char"s.. grr

If you want the keyboard to generate UTF-8, I think you should change these
(not sure, please test).

> drivers/video/console/font_8x16.c	- comments on a keymap table
> drivers/video/console/font_8x8.c	- comments on a keymap table
> drivers/video/console/font_pearl_8x8.c	- comments on a keymap table

These fonts have the box-drawing ASCII art.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


* Re: Re: Other bizarre thing... backspaces?
  2004-03-06 14:04     ` Måns Rullgård
@ 2004-03-14 16:25       ` Petr Baudis
  0 siblings, 0 replies; 21+ messages in thread
From: Petr Baudis @ 2004-03-14 16:25 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: linux-kernel

Dear diary, on Sat, Mar 06, 2004 at 03:04:35PM CET, I got a letter,
where Måns Rullgård <mru@kth.se> told me, that...
> David Eger <eger@havoc.gtf.org> writes:
> 
> > There are five files with embedded backspace octets in them.... ;-)
> 
> That's an old way to do underlining and bold face and it seems like at
> least coda.txt is doing that.  If I could choose I'd probably just
> remove them.

Well, what's the "new way" for ASCII documents? At least less produces a
desired result.

Kind regards,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man. -- George Bernard Shaw


