linux-kernel.vger.kernel.org archive mirror
* A Great Idea (tm) about reimplementing NLS.
@ 2005-06-13 10:38 Alexey Zaytsev
  2005-06-13 10:49 ` Stefan Smietanowski
                   ` (3 more replies)
  0 siblings, 4 replies; 70+ messages in thread
From: Alexey Zaytsev @ 2005-06-13 10:38 UTC (permalink / raw)
  To: linux-kernel

Hello.

I have a Great Idea about improving NLS in the Linux kernel, and I want
somebody with kernel experience to consider whether it's good or not, so
that I don't waste time writing code that will be rejected.

First of all, here is why I think the current NLS implementation isn't good enough.

Let's look at a situation. I'm using utf-8 as my default system
charset, and my friend Vasiliy Pupkin, who uses koi8-r, wants to plug
his flash drive (ext3) into my computer. It should work, except all
non-us-ascii filenames will be totally unreadable. The problem is even
bigger if I have another friend's hard drive with reiserfs and
cp1251-encoded filenames on it. The problem is not limited to Russian,
for which we have at least 3 common encodings. Everyone who uses
non-us-ascii letters can face the same problem, since there are at
least 2 encodings for their language: utf-8 and another one used
before utf-8.
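
A minimal sketch of the scenario above, using Python's standard codecs
purely for illustration. The file name is made up; only the encodings
come from the description.

```python
# Hypothetical file name, as a KOI8-R user would have created it.
name = "Привет.txt"
on_disk = name.encode("koi8_r")          # the bytes the filesystem stores

# A UTF-8 system sees the same bytes but cannot interpret them.
try:
    on_disk.decode("utf-8")
    readable_as_utf8 = True
except UnicodeDecodeError:
    readable_as_utf8 = False

# Guessing the wrong 8-bit charset "works" but yields different letters.
as_cp1251 = on_disk.decode("cp1251")

print(readable_as_utf8)                  # False
print(as_cp1251 == name)                 # False
print(on_disk.decode("koi8_r") == name)  # True
```

Only a reader who knows the bytes are KOI8-R gets the original name back.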

Some would suggest not using non-ascii file names at all, some would
say that I should temporarily change my locale, some could even offer me
a perl script they wrote when they faced the same problem. All these
solutions are inconvenient and conflict with fundamental VFS concepts.

Instead of adding NLS support to filesystems that don't have it yet, I
think there should be a global NLS layer that converts file names from
any encoding to any other, independent of the file system and
transparently to the user.

So what do you think? Is it all nonsense, or should I try to implement it?

Please CC me, I'm not subscribed.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 10:38 A Great Idea (tm) about reimplementing NLS Alexey Zaytsev
@ 2005-06-13 10:49 ` Stefan Smietanowski
  2005-06-13 18:01   ` Islam Amer
  2005-06-13 12:05 ` Bernd Petrovitsch
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 70+ messages in thread
From: Stefan Smietanowski @ 2005-06-13 10:49 UTC (permalink / raw)
  To: Alexey Zaytsev; +Cc: linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi.

> I have a Great Idea about improving NLS in the linux kernel and I want
> somebody with kernel experience consider if it's good or not, just not
> to waste time on writing code that will be rejected.
> 
> First of all, why do I think the current NLS implementation isn't good enough.
> 
> Let's look at a situation. I'm using utf-8 as my default system
> charset, and my friend Vasiliy Pupkin, who uses koi8-r, wants to plug
> his flash drive (ext3) into my computer. It should work, except all
> non-us-ascii filenames will be totally unreadable. The problem is even
> bigger if I have an other friend's hard drive with reiserfs and cp1251
> encoded filenames on it. The problem is not only with Russian language
> for which we have at least 3 common encodings. Everyone who uses
> non-us-ascii letters can face the same problem, since there are at
> least 2 encodings for theyr language - utf-8 and an other one used
> before utf.
> 
> Some would suggest not to use non-ascii file names at all, some would
> say that I should temporary change my locale, some could even offer me
> a perl script they wrote when faced the same problem. All these
> solutions are inconvenient and conflict with fundamental VFS concepts.
> 
> Instead of adding NLS support to filesystems who don't have it yet, I
> think there should be a global NLS layer, to convert file names from
> any to any encoding, independent of file system and transparently to
> the user.
> 
> So what do you think? Is it all nonsense or maybe I should try to implement it?

What do you do when a charset doesn't contain a char that another one
does?

Compare the two very similar charsets ISO-8859-1 and ISO-8859-15 and
have the Euro-sign using ISO-8859-15. Then try to make that into
something sane.

Not knocking you or anything, you just have to think about these
pitfalls.

// Stefan
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (MingW32)

iD8DBQFCrWSeBrn2kJu9P78RAkZNAKCjRkxx4EnZT+C8wblPB/AH63xz2ACfS4m6
IrVy4TwcwWH2Wm1Va+SN0XI=
=SYdz
-----END PGP SIGNATURE-----
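
The Euro-sign pitfall raised above can be reproduced in a few lines. This
is only a sketch with Python's standard codecs; the variable names are
illustrative.

```python
euro = "\u20ac"                          # EURO SIGN

# ISO-8859-15 has a slot for it: a single byte.
as_latin9 = euro.encode("iso8859_15")

# ISO-8859-1 assigns that same byte to a different character...
misread = as_latin9.decode("iso8859_1")

# ...and has no way to represent the Euro sign at all.
try:
    euro.encode("iso8859_1")
    representable = True
except UnicodeEncodeError:
    representable = False

print(as_latin9)       # b'\xa4'
print(misread)         # the generic currency sign, U+00A4
print(representable)   # False
```

Any conversion layer has to decide what to do in the last case: fail,
substitute, or escape.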

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 10:38 A Great Idea (tm) about reimplementing NLS Alexey Zaytsev
  2005-06-13 10:49 ` Stefan Smietanowski
@ 2005-06-13 12:05 ` Bernd Petrovitsch
  2005-06-13 13:54   ` Alexey Zaytsev
  2005-06-13 13:35 ` Alan Cox
  2005-06-15  9:13 ` Denis Vlasenko
  3 siblings, 1 reply; 70+ messages in thread
From: Bernd Petrovitsch @ 2005-06-13 12:05 UTC (permalink / raw)
  To: Alexey Zaytsev; +Cc: linux-kernel

On Mon, 2005-06-13 at 14:38 +0400, Alexey Zaytsev wrote:
[ Filenames with another encoding ]
> Some would suggest not to use non-ascii file names at all, some would
> say that I should temporary change my locale, some could even offer me
> a perl script they wrote when faced the same problem. All these
> solutions are inconvenient and conflict with fundamental VFS concepts.
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In what way?
Basically you just rename the files. How can this conflict with
"fundamental VFS concepts" (and with which ones)?

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 10:38 A Great Idea (tm) about reimplementing NLS Alexey Zaytsev
  2005-06-13 10:49 ` Stefan Smietanowski
  2005-06-13 12:05 ` Bernd Petrovitsch
@ 2005-06-13 13:35 ` Alan Cox
  2005-06-13 17:20   ` Alexey Zaytsev
  2005-06-15  9:13 ` Denis Vlasenko
  3 siblings, 1 reply; 70+ messages in thread
From: Alan Cox @ 2005-06-13 13:35 UTC (permalink / raw)
  To: Alexey Zaytsev; +Cc: Linux Kernel Mailing List

On Llu, 2005-06-13 at 11:38, Alexey Zaytsev wrote:
> Instead of adding NLS support to filesystems who don't have it yet, I
> think there should be a global NLS layer, to convert file names from
> any to any encoding, independent of file system and transparently to
> the user.

That's essentially what we have. The core OS is UTF-8; the fat and one or
two other legacy file systems support mapping old and/or inferior
encodings into utf-8 (and some other stuff).

Alan


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 12:05 ` Bernd Petrovitsch
@ 2005-06-13 13:54   ` Alexey Zaytsev
  2005-06-13 14:32     ` Bernd Petrovitsch
  0 siblings, 1 reply; 70+ messages in thread
From: Alexey Zaytsev @ 2005-06-13 13:54 UTC (permalink / raw)
  To: Bernd Petrovitsch; +Cc: linux-kernel

On 13/06/05, Bernd Petrovitsch <bernd@firmix.at> wrote:
> On Mon, 2005-06-13 at 14:38 +0400, Alexey Zaytsev wrote:
> [ Filenames with another encoding ]
> > Some would suggest not to use non-ascii file names at all, some would
> > say that I should temporary change my locale, some could even offer me
> > a perl script they wrote when faced the same problem. All these
> > solutions are inconvenient and conflict with fundamental VFS concepts.
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> In what way?
> Basically you just rename the files. How can this conflict with
> "fundamental VFS concepts" (and with which).

I can't rename files on Pupkin's drive because he won't like it. ;)
In the case of a flash drive I can copy all the files to my computer
and rename them, but I can't do that with bigger media like a hard disk.

The main idea of VFS is that you can access your files in the same way
on any supported file system. But in practice you can't simply access
differently encoded non-ascii file names on a filesystem that has no NLS,
like ext or reiser.
 
>        Bernd

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 13:54   ` Alexey Zaytsev
@ 2005-06-13 14:32     ` Bernd Petrovitsch
  2005-06-13 17:38       ` Alexey Zaytsev
  0 siblings, 1 reply; 70+ messages in thread
From: Bernd Petrovitsch @ 2005-06-13 14:32 UTC (permalink / raw)
  To: Alexey Zaytsev; +Cc: linux-kernel

On Mon, 2005-06-13 at 17:54 +0400, Alexey Zaytsev wrote:
> On 13/06/05, Bernd Petrovitsch <bernd@firmix.at> wrote:
> > On Mon, 2005-06-13 at 14:38 +0400, Alexey Zaytsev wrote:
> > [ Filenames with another encoding ]
> > > Some would suggest not to use non-ascii file names at all, some would
> > > say that I should temporary change my locale, some could even offer me
> > > a perl script they wrote when faced the same problem. All these
> > > solutions are inconvenient and conflict with fundamental VFS concepts.
> >                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > In what way?
> > Basically you just rename the files. How can this conflict with
> > "fundamental VFS concepts" (and with which).
> 
> I can't rename files on Pupkin's drive because he won't like it. ;)

.... which has IMHO nothing to do with the VFS (or concepts behind it).

> In the case with a flash drive I can copy all the files to my computer
> and rename them, but I can't do it with a bigger media like hard disk.

You forgot CDs/DVDs and other inherently read-only media with such
strange filenames.

> The main idea of VFS is that you can access your files in the same way
> on any supported file system. But actually you can't simple access
> different-encoded non-ascii files on a filesystem that has no NLS,
> like ext or reiser.

I don't think that any filesystem knows about the encoding of every
filename - after all it is up to the user which encoding he uses for a
given file (and no, no one forces me to use the same encoding for the
names of all of "my" files).
IOW given a FAT filesystem on a USB stick, which codepage should be
used?

Perhaps it makes sense to start a prototype with a FUSE (or similar)
module. You could use standard libs to convert without messing around in
the kernel (and I don't think anyone wants to have an encoding
conversion layer in the kernel).

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 13:35 ` Alan Cox
@ 2005-06-13 17:20   ` Alexey Zaytsev
  2005-06-13 19:20     ` Alan Cox
  0 siblings, 1 reply; 70+ messages in thread
From: Alexey Zaytsev @ 2005-06-13 17:20 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On 13/06/05, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> On Llu, 2005-06-13 at 11:38, Alexey Zaytsev wrote:
> > Instead of adding NLS support to filesystems who don't have it yet, I
> > think there should be a global NLS layer, to convert file names from
> > any to any encoding, independent of file system and transparently to
> > the user.
> 
> Thats essentially what we have. The core OS is UTF-8, the fat and one or
> two other legacy file systems support mapping old and/or inferior
> encodings into utf-8 (and some other stuff).

Yes, that's how it works, but if I want ext or reiser or whatever to
have NLS, I'll have to make them support it (btw, if I do so, won't it
be rejected?). I want to move the NLS one level up so the
filesystem implementations won't have to worry about it any more. I
don't have much kernel experience, and none in the fs area, so I can't
explain it any better, but I hope you get the idea.

> Alan
 Thank you for answering.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 14:32     ` Bernd Petrovitsch
@ 2005-06-13 17:38       ` Alexey Zaytsev
  2005-06-13 18:58         ` Måns Rullgård
  0 siblings, 1 reply; 70+ messages in thread
From: Alexey Zaytsev @ 2005-06-13 17:38 UTC (permalink / raw)
  To: Bernd Petrovitsch; +Cc: linux-kernel

On 13/06/05, Bernd Petrovitsch <bernd@firmix.at> wrote:
> > The main idea of VFS is that you can access your files in the same way
> > on any supported file system. But actually you can't simple access
> > different-encoded non-ascii files on a filesystem that has no NLS,
> > like ext or reiser.
> 
> I don't think that any filesystem knows about the encoding of every
> filename - after all it is up to the user which encoding he uses for a
> given file (and no, no one forces me to use the same encoding on the
> names of all of "mine" files).
> IOW given a FAT filesystem on an USB stick, which codepage should be
> used?
 
Yes, most if not all filesystems don't have any information about file
name encoding, but the user can often guess it. Having files with
differently encoded names on the same filesystem is nonsense, which
could only appear because of the current NLS misfeatures.
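
One simple way to "guess" is to try candidate encodings in order. The
sketch below assumes exactly that; the function name and the candidate
list are made up for illustration, and the heuristic is deliberately naive.

```python
def guess_decode(raw, candidates=("utf-8", "koi8_r")):
    """Return (text, encoding) for the first candidate that decodes raw.

    Strict encodings such as UTF-8 should come first, because an 8-bit
    charset like KOI8-R accepts almost any byte sequence.
    """
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits")


print(guess_decode("Привет".encode("utf-8"))[1])    # utf-8
print(guess_decode("Привет".encode("koi8_r"))[1])   # koi8_r

# The limit of guessing: cp1251 bytes are accepted as KOI8-R too,
# but the resulting text is wrong.
text, enc = guess_decode("Привет".encode("cp1251"))
print(enc, text == "Привет")                        # koi8_r False
```

The last case shows why the encoding usually has to be given explicitly,
for example as a mount option.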

> Perhaps it makes sense to start a prototype with a FUSE (or similar)
> module. You could use standard libs to convert without messing around in
> the kernel (and I don't think someone wants to have an encoding
> conversion layer in the kernel).

We already have it in the kernel, it's called nls and it's up to the
file system implementation to decide whether to use it or not. I'm not
inventing something new, I just want the existing system to work a bit
differently.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 10:49 ` Stefan Smietanowski
@ 2005-06-13 18:01   ` Islam Amer
  2005-06-14  9:32     ` Islam Amer
  0 siblings, 1 reply; 70+ messages in thread
From: Islam Amer @ 2005-06-13 18:01 UTC (permalink / raw)
  To: Stefan Smietanowski; +Cc: linux-kernel

Hi.
A related issue I had is that some codepages don't have an nls module
yet.
For example I once or twice needed to mount a vfat filesystem with the
cp1256 (arabic windows) charset. There is no such nls module. I
couldn't find any documentation about how to create the module (maybe I
didn't look hard enough).



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 17:38       ` Alexey Zaytsev
@ 2005-06-13 18:58         ` Måns Rullgård
  2005-06-14  8:04           ` Alexander E. Patrakov
  0 siblings, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-13 18:58 UTC (permalink / raw)
  To: linux-kernel

Alexey Zaytsev <alexey.zaytsev@gmail.com> writes:

> On 13/06/05, Bernd Petrovitsch <bernd@firmix.at> wrote:
>> > The main idea of VFS is that you can access your files in the same way
>> > on any supported file system. But actually you can't simple access
>> > different-encoded non-ascii files on a filesystem that has no NLS,
>> > like ext or reiser.
>> 
>> I don't think that any filesystem knows about the encoding of every
>> filename - after all it is up to the user which encoding he uses for a
>> given file (and no, no one forces me to use the same encoding on the
>> names of all of "mine" files).
>> IOW given a FAT filesystem on an USB stick, which codepage should be
>> used?
>
> Yes, most if not all filesystems don't have any information about file
> names encoding, but the user can often guess it. Hawing files with
> differently-encoded names on the same filesystem is nonsense, which
> could only appear because of the current NLS misfeatures.

Different users of the same system may have perfectly valid reasons to
use different locale settings, and thus different filename encodings.
Forcing one thing or another is just a useless restriction, and
probably not POSIX compliant.

-- 
Måns Rullgård
mru@inprovide.com


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 17:20   ` Alexey Zaytsev
@ 2005-06-13 19:20     ` Alan Cox
  2005-06-13 19:38       ` Måns Rullgård
                         ` (3 more replies)
  0 siblings, 4 replies; 70+ messages in thread
From: Alan Cox @ 2005-06-13 19:20 UTC (permalink / raw)
  To: Alexey Zaytsev; +Cc: Linux Kernel Mailing List

On Llu, 2005-06-13 at 18:20, Alexey Zaytsev wrote:
> Yes, that's how it works, but if I want ext or reiser or whatever to
> have NLS, I'll have to make them support it (btw, if I do so, wont it
> be rejected?). I want to move the NLS one level upper so the
> filesystem imlementations won't have to worry about it any more. I
> don't have much kernel experience, and none in the fs area, so I can't
> explain it any better, but hope you get the idea.

An ext3fs is always utf-8. People might have chosen to put other
encodings on it but that's "not our fault" ;)

There are some good technical reasons too:

Encodings don't map 1:1 - two names may cease to be unique

Encodings vary in length - imagine a file name that is longer than the
allowed maximum on your system with your encoding choice - that could
occur with KOI8-R to UTF-8 I believe

That said it ought to be possible to use the stackable fs work (FUSE
etc) to write a layer you can mount over any fs that does NLS
translation.
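
Both technical reasons can be shown with a short sketch. It assumes the
usual 255-byte per-name limit (NAME_MAX) found on common Linux
filesystems; the file names are made up.

```python
NAME_MAX = 255  # assumed per-component limit, in bytes

# Length: every Cyrillic letter is 1 byte in KOI8-R but 2 bytes in UTF-8.
long_name = "я" * 200
koi8_len = len(long_name.encode("koi8_r"))
utf8_len = len(long_name.encode("utf-8"))
print(koi8_len, koi8_len <= NAME_MAX)   # 200 True
print(utf8_len, utf8_len <= NAME_MAX)   # 400 False

# Uniqueness: two distinct names collapse into one when the target
# charset lacks the characters and a substitute has to be used.
a = "\u20ac.txt"    # EURO SIGN
b = "\u0152.txt"    # LATIN CAPITAL LIGATURE OE
a_latin1 = a.encode("iso8859_1", errors="replace")
b_latin1 = b.encode("iso8859_1", errors="replace")
print(a_latin1, b_latin1, a_latin1 == b_latin1)  # b'?.txt' b'?.txt' True
```

A translation layer therefore needs a policy for over-long names and for
collisions, not just a character table.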


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 19:20     ` Alan Cox
@ 2005-06-13 19:38       ` Måns Rullgård
  2005-06-13 20:31       ` Rutger Nijlunsing
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 70+ messages in thread
From: Måns Rullgård @ 2005-06-13 19:38 UTC (permalink / raw)
  To: linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> writes:

> On Llu, 2005-06-13 at 18:20, Alexey Zaytsev wrote:
>> Yes, that's how it works, but if I want ext or reiser or whatever to
>> have NLS, I'll have to make them support it (btw, if I do so, wont it
>> be rejected?). I want to move the NLS one level upper so the
>> filesystem imlementations won't have to worry about it any more. I
>> don't have much kernel experience, and none in the fs area, so I can't
>> explain it any better, but hope you get the idea.
>
> An ext3fs is always utf-8. People might have chosen to put other
> encodings on it but thats "not our fault" ;)

I was of the impression that most filesystems (ext3 included) treated
file names as a sequence of bytes, and didn't care about encoding.
Please correct me if I am wrong.
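
The byte-sequence behaviour is easy to observe from userspace. The sketch
below assumes a Linux system and a byte-transparent filesystem under the
temporary directory; other platforms may reject the name.

```python
import os
import tempfile

# A KOI8-R encoded name: perfectly good bytes, but not valid UTF-8.
raw_name = "Привет.txt".encode("koi8_r")

with tempfile.TemporaryDirectory() as d:
    d_bytes = os.fsencode(d)
    # Passing bytes paths makes Python hand them to the kernel unchanged.
    with open(os.path.join(d_bytes, raw_name), "wb"):
        pass
    listing = os.listdir(d_bytes)   # bytes in, bytes out

print(raw_name in listing)
```

The name is stored and returned byte for byte, with no encoding check on
the way.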

-- 
Måns Rullgård
mru@inprovide.com


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 19:20     ` Alan Cox
  2005-06-13 19:38       ` Måns Rullgård
@ 2005-06-13 20:31       ` Rutger Nijlunsing
  2005-06-15 20:50       ` Alexey Zaytsev
  2005-06-16  1:49       ` Patrick McFarland
  3 siblings, 0 replies; 70+ messages in thread
From: Rutger Nijlunsing @ 2005-06-13 20:31 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alexey Zaytsev, Linux Kernel Mailing List

On Mon, Jun 13, 2005 at 08:20:53PM +0100, Alan Cox wrote:
> On Llu, 2005-06-13 at 18:20, Alexey Zaytsev wrote:
> > Yes, that's how it works, but if I want ext or reiser or whatever to
> > have NLS, I'll have to make them support it (btw, if I do so, wont it
> > be rejected?). I want to move the NLS one level upper so the
> > filesystem imlementations won't have to worry about it any more. I
> > don't have much kernel experience, and none in the fs area, so I can't
> > explain it any better, but hope you get the idea.
> 
> An ext3fs is always utf-8. People might have chosen to put other
> encodings on it but thats "not our fault" ;)
> 
> There are some good technical reasons too
> 
> Encodings don't map 1:1 - two names may cease to be unique
> 
> Encodings vary in length - image a file name that is longer than the
> allowed maximum on your system with your encoding choice - that could
> occur with KOI8-R to UTF-8 I believe
> 
> That said it ought to be possible to use the stackable fs work (FUSE
> etc) to write a layer you can mount over any fs that does NLS
> translation.

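Or just make a shadow FS of symbolic links with translated filenames
(UNTESTED):

cd /tmp
cp -rs /mnt/problem_dir .
# -depth renames children before their parent directories
find problem_dir -depth -exec sh -c 'for p; do
  b=$(basename "$p"); n=$(printf %s "$b" | iconv -f KOI8-R -t UTF-8)
  [ "$b" = "$n" ] || mv -- "$p" "$(dirname "$p")/$n"; done' sh {} +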

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 18:58         ` Måns Rullgård
@ 2005-06-14  8:04           ` Alexander E. Patrakov
  2005-06-14  9:05             ` Måns Rullgård
  0 siblings, 1 reply; 70+ messages in thread
From: Alexander E. Patrakov @ 2005-06-14  8:04 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: linux-kernel

Måns Rullgård wrote:
> Different users of the same system may have perfectly valid reasons to
> use different locale settings, and thus different filename encodings.
> Forcing one thing or another is just a useless restriction, and
> probably not POSIX compliant.

I agree.

Although some people (like glib2 developers) try to say that filenames should 
be in UTF-8, this doesn't work, just because the "ls" command assumes that 
they are in the locale charset. Please fix glibc and/or coreutils and all 
other programs first.

-- 
Alexander E. Patrakov

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-14  8:04           ` Alexander E. Patrakov
@ 2005-06-14  9:05             ` Måns Rullgård
  2005-06-15  8:26               ` Lukasz Stelmach
  0 siblings, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-14  9:05 UTC (permalink / raw)
  To: Alexander E. Patrakov; +Cc: linux-kernel

"Alexander E. Patrakov" <patrakov@ums.usu.ru> writes:

> Måns Rullgård wrote:
>> Different users of the same system may have perfectly valid reasons to
>> use different locale settings, and thus different filename encodings.
>> Forcing one thing or another is just a useless restriction, and
>> probably not POSIX compliant.
>
> I agree.
>
> Although some people (like glib2 developers) try to say that
> filenames should be in UTF-8, this doesn't work, just because the

IMHO, the glib developers are clueless, and the GNOME crew even more
so.  I remember when the gtk file selection dialog stopped displaying
files with "bad" names, unless I set some wacky undocumented
environment variable first.

> "ls" command assumes that they are in the locale charset. Please fix
> glibc and/or coreutils and all other programs first.

I use utf-8 exclusively for my filenames (the few that are not 7-bit
ascii).  Forcing others who use the system to do the same would cause
them a lot of trouble, as they must transfer files to and from Windows
machines that use anything but utf-8.  The result is that some
filenames are in utf-8, some iso-8859-1, and some euc-kr.  As long as
these stay in each user's home directory, it all works quite well,
though.

-- 
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 18:01   ` Islam Amer
@ 2005-06-14  9:32     ` Islam Amer
  2005-06-14 10:10       ` Måns Rullgård
  0 siblings, 1 reply; 70+ messages in thread
From: Islam Amer @ 2005-06-14  9:32 UTC (permalink / raw)
  To: linux-kernel

Hi.
A related issue I had is that some codepages don't have an nls module
yet.
For example I once or twice needed to mount a vfat filesystem with the
cp1256 (arabic windows) charset. There is no such nls module. I
couldn't find any documentation about how to create the module (maybe I
didn't look hard enough).
Therefore a filesystem written by old windows versions with arabic names
is unusable.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-14  9:32     ` Islam Amer
@ 2005-06-14 10:10       ` Måns Rullgård
  2005-06-14 15:28         ` Islam Amer
  0 siblings, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-14 10:10 UTC (permalink / raw)
  To: linux-kernel

Islam Amer <pharon@gmail.com> writes:

> Hi.
> A related issue I had is that some codepages don't have a nls module
> yet. 
> For example I once or twice needed to mount a vfat filesystem with
> cp1256 ( arabic windows ) charset. There is no such nls module. I
> couldn't find any documentation about how to create the module ( maybe I
> didn't look hard enough ).
> Therefore a filesystem used by old windows versions having arabic names
> is unusable.

Did you try looking at some of the already existing ones?

-- 
Måns Rullgård
mru@inprovide.com


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-14 10:10       ` Måns Rullgård
@ 2005-06-14 15:28         ` Islam Amer
  0 siblings, 0 replies; 70+ messages in thread
From: Islam Amer @ 2005-06-14 15:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: mru


Yes, but I couldn't figure out how they were generated; I'm a newbie here.
I read somewhere that the character tables get generated automatically
from a UTF-8 reference, but I can't find that article now.

I am ready to invest time in trying to create (shudder) the module and
testing if it fixes the problem I describe, if someone could give me a
pointer.
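
For what it's worth, a byte-to-Unicode table for a single-byte codepage
can be derived mechanically from any reference mapping. The sketch below
uses Python's built-in cp1256 codec as that reference, which is an
assumption made for illustration; it does not claim to be how the
existing kernel tables were produced, and the function name is made up.

```python
def build_charset2uni(codec):
    """Return a 256-entry list mapping each byte value to a code point.

    Bytes the codec leaves undefined are mapped to 0.
    """
    table = []
    for byte in range(256):
        try:
            table.append(ord(bytes([byte]).decode(codec)))
        except UnicodeDecodeError:
            table.append(0)
    return table


table = build_charset2uni("cp1256")

# Print the upper half as rows of hex constants, 8 per line.
for row in range(0x80, 0x100, 8):
    print(", ".join(f"0x{cp:04x}" for cp in table[row:row + 8]))
```

The reverse direction (Unicode to byte) can be built from the same list by
inverting it.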

Thanks.


	


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-14  9:05             ` Måns Rullgård
@ 2005-06-15  8:26               ` Lukasz Stelmach
  2005-06-15  8:54                 ` Patrick McFarland
  0 siblings, 1 reply; 70+ messages in thread
From: Lukasz Stelmach @ 2005-06-15  8:26 UTC (permalink / raw)
  To: Måns Rullgård; +Cc: Alexander E. Patrakov, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 518 bytes --]

Måns Rullgård wrote:

> I use utf-8 exclusively for my filenames (the few that are not 7-bit
> ascii).  Forcing others who use the system to do the same would cause
> them a lot of trouble, as they must transfer files to and from Windows
> machines that use anything but utf-8.

But VFAT (and NTFS???) use unicode, i.e. UTF-16 (???). AFAIK

-- 
Było mi bardzo miło.                    Trzecia pospolita klęska, [...]
>Łukasz<                      Już nie katolicka lecz złodziejska.  (c)PP


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15  8:26               ` Lukasz Stelmach
@ 2005-06-15  8:54                 ` Patrick McFarland
  2005-06-15  9:14                   ` Lukasz Stelmach
  0 siblings, 1 reply; 70+ messages in thread
From: Patrick McFarland @ 2005-06-15  8:54 UTC (permalink / raw)
  To: Lukasz Stelmach
  Cc: Måns Rullgård, Alexander E. Patrakov, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1671 bytes --]

On Wednesday 15 June 2005 04:26 am, Lukasz Stelmach wrote:
> Måns Rullgård napisał(a):
> > I use utf-8 exclusively for my filenames (the few that are not 7-bit
> > ascii).  Forcing others who use the system to do the same would cause
> > them a lot of trouble, as they must transfer files to and from Windows
> > machines that use anything but utf-8.
>
> But VFAT (and NTFS???) use unicode, i.e. UTF-16 (???). AFAIK

No, VFAT and NTFS use an 8-bit encoding, and I think it's only NTFS5 that is 
forced to use UTF-8 as the encoding (vfat can use any 8-bit encoding, 
and NTFS4 afaik can as well, and I don't think NTFS5 uses UTF-16 internally).

Forcing people to use unicode isn't a bad thing btw, especially since it is a 
culture-agnostic encoding that can represent wide characters (eg. from Asian 
languages) in a uniform manner*, and it allows using multiple languages (eg. 
Chinese and Japanese) at once without needing to switch encodings.

This also means it fixes the 'bug' where you have multiple encodings for the 
same language (ie. JIS, SJIS, and EUC_JP; similarly, simplified Chinese has 5 
popular encoding methods, and traditional Chinese has three), which allows 
easier sharing of data between users without needing to muck with encoding.

* Unicode can do so in UTF-8 as well as UTF-16 to describe the entire 20-bit 
space.

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 10:38 A Great Idea (tm) about reimplementing NLS Alexey Zaytsev
                   ` (2 preceding siblings ...)
  2005-06-13 13:35 ` Alan Cox
@ 2005-06-15  9:13 ` Denis Vlasenko
  2005-06-16  1:55   ` Patrick McFarland
  3 siblings, 1 reply; 70+ messages in thread
From: Denis Vlasenko @ 2005-06-15  9:13 UTC (permalink / raw)
  To: Alexey Zaytsev, linux-kernel

On Monday 13 June 2005 13:38, Alexey Zaytsev wrote:
> Instead of adding NLS support to filesystems who don't have it yet, I
> think there should be a global NLS layer, to convert file names from
> any to any encoding, independent of file system and transparently to
> the user.
> 
> So what do you think? Is it all nonsense or maybe I should try to implement it?

I do not understand how this is going to look from a userspace perspective.
Can you give examples of how this will work?
--
vda


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15  8:54                 ` Patrick McFarland
@ 2005-06-15  9:14                   ` Lukasz Stelmach
  2005-06-15  9:41                     ` Måns Rullgård
  0 siblings, 1 reply; 70+ messages in thread
From: Lukasz Stelmach @ 2005-06-15  9:14 UTC (permalink / raw)
  To: Patrick McFarland
  Cc: Måns Rullgård, Alexander E. Patrakov, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1785 bytes --]

Patrick McFarland wrote:
> On Wednesday 15 June 2005 04:26 am, Lukasz Stelmach wrote:
> 
>>Måns Rullgård napisał(a):
>>
>>>I use utf-8 exclusively for my filenames (the few that are not 7-bit
>>>ascii).  Forcing others who use the system to do the same would cause
>>>them a lot of trouble, as they must transfer files to and from Windows
>>>machines that use anything but utf-8.
>>
>>But VFAT (and NTFS???) use unicode, i.e. UTF-16 (???). AFAIK
> 
> No, VFAT and NTFS use an 8-bit encoding,

I meant that they don't use utf-8 but it is still Unicode. I am not
sure I've made myself clear.

> Forcing people to use unicode isn't a bad thing btw, especially since
> it is a culture agnostic encoding that can represent wide characters
> (eg. from Asian languages) in a uniform manner*, and allowing to use
> multiple languages (eg. Chinese and Japanese) at once without needing
> to switch encodings.

Yes. I also think UTF-8 is a good idea, however it is not an ideal one.
It *prefers* Roman encodings since some Asian characters need as many as
four bytes.
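
The size difference is easy to measure. A small sketch, with sample
characters picked arbitrarily for illustration:

```python
samples = {
    "a": "Latin letter",
    "\u0142": "Polish l with stroke",
    "\u044f": "Cyrillic ya",
    "\u4e2d": "CJK ideograph",
    "\U00020000": "CJK ideograph outside the BMP",
}

sizes = {}
for ch, label in samples.items():
    utf8 = len(ch.encode("utf-8"))
    utf16 = len(ch.encode("utf-16-le"))   # -le: no byte order mark
    sizes[ch] = (utf8, utf16)
    print(f"U+{ord(ch):05X} {label}: utf-8={utf8} utf-16={utf16}")
```

ASCII costs one byte in UTF-8, Polish and Cyrillic letters two, common CJK
ideographs three, and characters beyond the Basic Multilingual Plane four
in both encodings.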

IMHO for *every* filesystem there needs to be an *option* to:

1. store filenames in utf-8 (that is quite possible today) or any other
unicode form.
2. convert them to/from a desired iocharset. I prefer using ISO-8859-2
on my system because not every tool supports utf-8 yet (hopefully that
will change).

Of course if a user wishes to store filenames in some other encoding
she should be *able* to do so (that is why I like linux).

Generally, IMHO VFAT is a good example of how character encoding needs to
be handled.
Best regards.
-- 
Było mi bardzo miło.                    Trzecia pospolita klęska, [...]
>Łukasz<                      Już nie katolicka lecz złodziejska.  (c)PP


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15  9:14                   ` Lukasz Stelmach
@ 2005-06-15  9:41                     ` Måns Rullgård
  2005-06-15 14:52                       ` Lukasz Stelmach
  0 siblings, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-15  9:41 UTC (permalink / raw)
  To: Lukasz Stelmach; +Cc: Patrick McFarland, Alexander E. Patrakov, linux-kernel

Lukasz Stelmach <stlman@poczta.fm> writes:

> Patrick McFarland wrote:
>> On Wednesday 15 June 2005 04:26 am, Lukasz Stelmach wrote:
>> 
>>>Måns Rullgård wrote:
>>>
>>>>I use utf-8 exclusively for my filenames (the few that are not 7-bit
>>>>ascii).  Forcing others who use the system to do the same would cause
>>>>them a lot of trouble, as they must transfer files to and from Windows
>>>>machines that use anything but utf-8.
>>>
>>>But VFAT (and NTFS???) use unicode, i.e. UTF-16 (???). AFAIK
>> 
>> No, VFAT and NTFS use an 8-bit encoding,
>
> I meant that they don't use UTF-8, but it is still Unicode. I am not
> sure I've made myself clear.
>
>> Forcing people to use unicode isn't a bad thing btw, especially since
>> it is a culture agnostic encoding that can represent wide characters
>> (eg. from Asian languages) in a uniform manner*, and allowing to use
>> multiple languages (eg. Chinese and Japanese) at once without needing
>> to switch encodings.
>
> Yes. I also think UTF-8 is a good idea; however, it is not an ideal one.
> It *prefers* Roman scripts, since some Asian characters need as many as
> four bytes.

That's a simple consequence of having an alphabet with thousands of
characters.  Besides, fewer of the Asian characters are required to
represent the same meaning, so the byte count should come out about
the same.

> IMHO for *every* filesystem there needs to be an *option* to:
>
> 1. store filenames in UTF-8 (that is quite possible today) or any other
> Unicode form.

export LC_CTYPE=whatever.utf-8

> 2. convert them to/from a desired iocharset. I prefer using ISO-8859-2
> on my system because not every tool supports UTF-8 today (hopefully
> they will yet).

man iconv

> Of course, if a user wishes to store filenames in some other encoding,
> she should be *able* to do so (that is why I like Linux).

That's the current situation.

> In general, IMHO VFAT is a good example of how character encoding needs
> to be handled.

IMHO, VFAT is only a good example of bad design.

-- 
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15  9:41                     ` Måns Rullgård
@ 2005-06-15 14:52                       ` Lukasz Stelmach
  2005-06-15 21:28                         ` Lennart Sorensen
  0 siblings, 1 reply; 70+ messages in thread
From: Lukasz Stelmach @ 2005-06-15 14:52 UTC (permalink / raw)
  To: mru; +Cc: Patrick McFarland, Alexander E. Patrakov, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1741 bytes --]

Måns Rullgård wrote:

>>IMHO for *every* filesystem there needs to be an *option* to:
>>
>>1. store filenames in UTF-8 (that is quite possible today) or any other
>>Unicode form.
> 
> export LC_CTYPE=whatever.utf-8

Translate and store them in UTF-8 at the kernel level, the same way VFAT
does when mounted with the iocharset option.

>>2. convert them to/from a desired iocharset. I prefer using ISO-8859-2
>>on my system because not every tool supports UTF-8 today (hopefully
>>they will yet).
> 
> man iconv

There are far more programs than just iconv. First of all, the readline
library is kind of broken because it counts (or at least it did a year
ago) bytes instead of characters. I won't use UTF-8 nor force anybody
else to do so until readline handles it properly.


>>Of course, if a user wishes to store filenames in some other encoding,
>>she should be *able* to do so (that is why I like Linux).
> 
> That's the current situation.

And it is good in a way; however, I think kernel-level translation
should also be possible, done either by code in each filesystem or by
some layer above it.

>>In general, IMHO VFAT is a good example of how character encoding needs
>>to be handled.
> 
> IMHO, VFAT is only a good example of bad design.

It depends on what it is used for. It is a very good fs for removable
media. None of the Linux native filesystems is good for this because of
different uids on different machines. Since VFAT uses Unicode, it is
possible to see the filenames properly on systems using different
codepages for the same language (1:1 is possible).
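
The codepage point can be shown with a short Python sketch. The file
name is an invented example; koi8_r and cp1251 are the standard Python
codec names for the two Russian encodings mentioned in this thread.

```python
# A name written by a KOI8-R system, then read on a cp1251 system.
name = "\u043f\u0440\u0438\u0432\u0435\u0442.txt"  # "privet.txt" in Cyrillic

on_disk = name.encode("koi8_r")

# Reading the raw bytes under the wrong codepage yields different letters.
misread = on_disk.decode("cp1251")

# Going through Unicode (what VFAT long names allow) converts cleanly,
# because both codepages contain every character in this name.
converted = on_disk.decode("koi8_r").encode("cp1251")

print(repr(misread))
print(converted.decode("cp1251") == name)
```

The conversion is only 1:1 when both codepages cover the same
characters, as these two do for Russian.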

-- 
Było mi bardzo miło.                    Trzecia pospolita klęska, [...]
>Łukasz<                      Już nie katolicka lecz złodziejska.  (c)PP


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 19:20     ` Alan Cox
  2005-06-13 19:38       ` Måns Rullgård
  2005-06-13 20:31       ` Rutger Nijlunsing
@ 2005-06-15 20:50       ` Alexey Zaytsev
  2005-06-16  1:52         ` Patrick McFarland
  2005-06-16  1:49       ` Patrick McFarland
  3 siblings, 1 reply; 70+ messages in thread
From: Alexey Zaytsev @ 2005-06-15 20:50 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linux Kernel Mailing List

On 13/06/05, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> On Llu, 2005-06-13 at 18:20, Alexey Zaytsev wrote:
> > Yes, that's how it works, but if I want ext or reiser or whatever to
> > have NLS, I'll have to make them support it (btw, if I do so, won't it
> > be rejected?). I want to move the NLS one level up so the
> > filesystem implementations won't have to worry about it any more. I
> > don't have much kernel experience, and none in the fs area, so I can't
> > explain it any better, but hope you get the idea.
> 
> An ext3fs is always utf-8. People might have chosen to put other
> encodings on it but thats "not our fault" ;)
> 
> There are some good technical reasons too
> 
> Encodings don't map 1:1 - two names may cease to be unique
> 
> Encodings vary in length - imagine a file name that is longer than the
> allowed maximum on your system with your encoding choice - that could
> occur with KOI8-R to UTF-8 I believe

> 
> That said it ought to be possible to use the stackable fs work (FUSE
> etc) to write a layer you can mount over any fs that does NLS
> translation.

Now I quite agree that it isn't a Great Idea to do such conversion in
the kernel, but the problem still remains and there is no other place
we can do it. I believe that it should be done now and removed after
the world finishes moving to UTF. Maybe it should not be applied to
the main kernel tree, but I'm sure that at least Russian Linux
distributions will like it.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15 14:52                       ` Lukasz Stelmach
@ 2005-06-15 21:28                         ` Lennart Sorensen
  2005-06-15 23:34                           ` Lukasz Stelmach
  2005-06-16  1:42                           ` Patrick McFarland
  0 siblings, 2 replies; 70+ messages in thread
From: Lennart Sorensen @ 2005-06-15 21:28 UTC (permalink / raw)
  To: Lukasz Stelmach
  Cc: mru, Patrick McFarland, Alexander E. Patrakov, linux-kernel

On Wed, Jun 15, 2005 at 04:52:00PM +0200, Lukasz Stelmach wrote:
> There are far more programs than just iconv. First of all, the readline
> library is kind of broken because it counts (or at least it did a year
> ago) bytes instead of characters. I won't use UTF-8 nor force anybody
> else to do so until readline handles it properly.

Well utf8 would sure be nice if everything used it, and getting nice
fonts for it was simple.

> And it is good in a way; however, I think kernel-level translation
> should also be possible, done either by code in each filesystem or by
> some layer above it.

What do you do if the underlying filesystem can not store some unicode
characters that are allowed on others?

> It depends on what it is used for. It is a very good fs for removable
> media. None of the Linux native filesystems is good for this because of
> different uids on different machines. Since VFAT uses Unicode, it is
> possible to see the filenames properly on systems using different
> codepages for the same language (1:1 is possible).

VFAT uses Unicode?  I thought it used the same codepage silliness as FAT
did, since after all it was just supposed to be a long filename
extension to FAT.  Do they use Unicode in the long filenames only?

I think UDF is a better filesystem for many types of media since it is
able to be more gentle to the sectors storing the metadata than VFAT
ever will be.

Len Sorensen

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15 21:28                         ` Lennart Sorensen
@ 2005-06-15 23:34                           ` Lukasz Stelmach
  2005-06-16  1:44                             ` Patrick McFarland
                                               ` (2 more replies)
  2005-06-16  1:42                           ` Patrick McFarland
  1 sibling, 3 replies; 70+ messages in thread
From: Lukasz Stelmach @ 2005-06-15 23:34 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: mru, Patrick McFarland, Alexander E. Patrakov, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1856 bytes --]

Lennart Sorensen wrote:

>>And it is good in a way; however, I think kernel-level translation
>>should also be possible, done either by code in each filesystem or by
>>some layer above it.
> 
> What do you do if the underlying filesystem can not store some unicode
> characters that are allowed on others?

That's why UTF-8 is suggested. UTF-8 was developed to "fool"
software that need not be aware of the Unicode nature of the text it
manages, so that it handles the text without any hiccups *and* the text
still carries information about multibyte characters. Which characters
exactly do you mean? NUL? There is no NUL byte in any UTF-8 string
except the one which terminates it.
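
The NUL claim can be checked with a few lines of Python. This is a
sketch; the helper name has_reserved_bytes and the sample names are
made up for illustration. It tests for the two bytes a Unix filename
may not contain, NUL (0x00) and '/' (0x2F).

```python
def has_reserved_bytes(name):
    """True if the UTF-8 form of name contains a NUL or '/' byte."""
    encoded = name.encode("utf-8")
    return 0x00 in encoded or 0x2F in encoded


# Non-ASCII names never produce those bytes in UTF-8, because every byte
# of a multibyte sequence has its high bit set.
for sample in ["\uae40", "\u0142\u00f3d\u017a.txt", "\U0002000b"]:
    encoded = sample.encode("utf-8")
    print(encoded.hex(), has_reserved_bytes(sample))

# UTF-16, by contrast, puts a NUL byte into every ASCII character.
print("a".encode("utf-16-le"))
```

This is why byte-oriented code that only looks for NUL and '/' can
carry UTF-8 names without knowing anything about Unicode.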

> VFAT uses Unicode?  I thought it used the same codepage silliness as FAT
> did, since after all it was just supposed to be a long filename
> extension to FAT.  Do they use Unicode in the long filenames only?

Yes, it uses Unicode, and DOS codepages in the short ones. To prove this,
take a VFAT floppy and mount it. touch(1) a file on it that has some
non-latin1 characters. Unmount the floppy, then do dd if=/dev/fd0
of=/tmp/floppy bs=1024 count=512. When it's done, take some hex
editor/viewer and look for the latin1-compliant part of the filename
in the floppy file (search for the uppercase string). Right above the
short filename you'll find the multibyte long one.

> I think UDF is a better filesystem for many types of media since it is
> able to be more gentle to the sectors storing the metadata than VFAT
> ever will be.

I've tried CD packet writing with UDF and it gives an insane overhead of
about 20%. What metadata would you like to store, for example, on your
flash drive or a floppy disk?

-- 
Było mi bardzo miło.                    Trzecia pospolita klęska, [...]
>Łukasz<                      Już nie katolicka lecz złodziejska.  (c)PP


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15 21:28                         ` Lennart Sorensen
  2005-06-15 23:34                           ` Lukasz Stelmach
@ 2005-06-16  1:42                           ` Patrick McFarland
  1 sibling, 0 replies; 70+ messages in thread
From: Patrick McFarland @ 2005-06-16  1:42 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Lukasz Stelmach, mru, Alexander E. Patrakov, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1597 bytes --]

On Wednesday 15 June 2005 05:28 pm, Lennart Sorensen wrote:
> What do you do if the underlying filesystem can not store some unicode
> characters that are allowed on others?

Um, that's impossible, unless you're implying something like the file system
not being 8-bit safe. The only thing UTF-8 does is store data in bytes; it
doesn't need any real support from the file system.

> > It depends on what it is used for. It is a very good fs for removable
> > media. None of the Linux native filesystems is good for this because of
> > different uids on different machines. Since VFAT uses Unicode, it is
> > possible to see the filenames properly on systems using different
> > codepages for the same language (1:1 is possible).

> VFAT uses Unicode?  I thought it used the same codepage silliness as FAT
> did, since after all it was just supposed to be a long filename
> extension to FAT.  Do they use Unicode in the long filenames only?

I mentioned earlier that VFAT uses 8-bit encodings, none of them (supported by 
Windows, at least) are Unicode.

> I think UDF is a better filesystem for many types of media since it is
> able to be more gentle to the sectors storing the metadata than VFAT
> ever will be.

I agree. UDF is the true successor to the portable media throne.

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15 23:34                           ` Lukasz Stelmach
@ 2005-06-16  1:44                             ` Patrick McFarland
  2005-06-16 10:38                               ` Måns Rullgård
                                                 ` (2 more replies)
  2005-06-16  9:40                             ` Måns Rullgård
  2005-06-16 13:39                             ` Lennart Sorensen
  2 siblings, 3 replies; 70+ messages in thread
From: Patrick McFarland @ 2005-06-16  1:44 UTC (permalink / raw)
  To: Lukasz Stelmach
  Cc: Lennart Sorensen, mru, Alexander E. Patrakov, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1190 bytes --]

On Wednesday 15 June 2005 07:34 pm, Lukasz Stelmach wrote:
> That's why UTF-8 is suggested. UTF-8 was developed to "fool"
> software that need not be aware of the Unicode nature of the text it
> manages, so that it handles the text without any hiccups *and* the text
> still carries information about multibyte characters. Which characters
> exactly do you mean? NUL? There is no NUL byte in any UTF-8 string
> except the one which terminates it.

Bingo. Only the operating system itself and software displaying filenames
need to understand Unicode; the file system implementation itself just knows
it's a string of bytes and nothing else.

> I've tried CD packet writing with UDF and it gives an insane overhead of
> about 20%. What metadata would you like to store, for example, on your
> flash drive or a floppy disk?

Uh, 20%? That sounds awfully high. You sure you didn't do something wrong?

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-13 19:20     ` Alan Cox
                         ` (2 preceding siblings ...)
  2005-06-15 20:50       ` Alexey Zaytsev
@ 2005-06-16  1:49       ` Patrick McFarland
  2005-06-16  2:36         ` Theodore Ts'o
  3 siblings, 1 reply; 70+ messages in thread
From: Patrick McFarland @ 2005-06-16  1:49 UTC (permalink / raw)
  To: Alan Cox; +Cc: Alexey Zaytsev, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1230 bytes --]

On Monday 13 June 2005 03:20 pm, Alan Cox wrote:
> An ext3fs is always utf-8. People might have chosen to put other
> encodings on it but thats "not our fault" ;)

What happens if you 'field upgrade' ext2 to ext3 by adding a journal? That 
doesn't magically convert !utf-8 to utf-8.

> There are some good technical reasons too
>
> Encodings don't map 1:1 - two names may cease to be unique

Hold up. Unless the original encoding is 'wrong' and has two mapped characters
that, in reality, are the same character, no such loss of uniqueness should
occur. (This implies that the encoding we switched to 'fixed' said 'bug'.)

> Encodings vary in length - imagine a file name that is longer than the
> allowed maximum on your system with your encoding choice - that could
> occur with KOI8-R to UTF-8 I believe

That's a fault of the file system design, not of the encoding. File systems
should not have very short filenames.
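
For reference, the length problem Alan describes looks like this in a
small Python sketch. The 255-byte limit is an assumption here (it is the
usual per-name limit on ext2/ext3), and the 200-letter name is an
artificial example.

```python
# Assumed per-name limit in bytes; adjust for the filesystem in question.
NAME_MAX = 255


def fits(encoded_name, limit=NAME_MAX):
    """True if the encoded name fits within the byte limit."""
    return len(encoded_name) <= limit


# A 200-letter Cyrillic name: one byte per letter in KOI8-R,
# two bytes per letter in UTF-8.
name = "\u044f" * 200
as_koi8 = name.encode("koi8_r")
as_utf8 = name.encode("utf-8")

print(len(as_koi8), fits(as_koi8))
print(len(as_utf8), fits(as_utf8))
```

A name that is legal on the KOI8-R side therefore has no legal UTF-8
form under the same limit, so any translating layer has to refuse or
truncate it.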

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15 20:50       ` Alexey Zaytsev
@ 2005-06-16  1:52         ` Patrick McFarland
  2005-06-16 10:14           ` Alexey Zaytsev
  0 siblings, 1 reply; 70+ messages in thread
From: Patrick McFarland @ 2005-06-16  1:52 UTC (permalink / raw)
  To: Alexey Zaytsev; +Cc: Alan Cox, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1342 bytes --]

On Wednesday 15 June 2005 04:50 pm, Alexey Zaytsev wrote:
> On 13/06/05, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > That said it ought to be possible to use the stackable fs work (FUSE
> > etc) to write a layer you can mount over any fs that does NLS
> > translation.
>
> Now I quite agree that it isn't a Great Idea to do such conversion in
> the kernel, but the problem still remains and there is no other place
> we can do it. I believe that it should be done now and removed after
> the world finishes moving to UTF. Maybe it should not be applied to
> the main kernel tree, but I'm sure that at least Russian Linux
> distributions will like it.

I partially agree. I think no userland application should have access to the 
un-'fixed' file names; they should be fed only Unicode to prevent the spread 
and acceptance of out of date encodings.

Forcing users to do smart things is often the only way to make them do smart
things, and given the lack of acceptance of Unicode on Linux in the wild, it
seems to be the only way.

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15  9:13 ` Denis Vlasenko
@ 2005-06-16  1:55   ` Patrick McFarland
  2005-06-16  3:59     ` [RFC] Filesystem name storage (Was: A Great Idea (tm) about reimplementing NLS.) Kyle Moffett
  0 siblings, 1 reply; 70+ messages in thread
From: Patrick McFarland @ 2005-06-16  1:55 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: Alexey Zaytsev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 858 bytes --]

On Wednesday 15 June 2005 05:13 am, Denis Vlasenko wrote:
> I do not understand how this is going to look from userspace perspective.
> Can you give examples how this will work?

IMHO, he means that the userspace would only see Unicode filenames, and the 
userspace could only give Unicode names back to the kernel. The kernel, using 
this global NLS layer would translate back and forth, and the userland 
wouldn't know about it.

It's basically the only sane way to approach the problem of getting the entire
Linux community to convert to Unicode.

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  1:49       ` Patrick McFarland
@ 2005-06-16  2:36         ` Theodore Ts'o
  2005-06-16  2:59           ` Patrick McFarland
  2005-06-16  4:33           ` Jeremy Maitin-Shepard
  0 siblings, 2 replies; 70+ messages in thread
From: Theodore Ts'o @ 2005-06-16  2:36 UTC (permalink / raw)
  To: Patrick McFarland; +Cc: Alan Cox, Alexey Zaytsev, Linux Kernel Mailing List

On Wed, Jun 15, 2005 at 09:49:05PM -0400, Patrick McFarland wrote:
> On Monday 13 June 2005 03:20 pm, Alan Cox wrote:
> > An ext3fs is always utf-8. People might have chosen to put other
> > encodings on it but thats "not our fault" ;)
> 
> What happens if you 'field upgrade' ext2 to ext3 by adding a journal? That 
> doesn't magically convert !utf-8 to utf-8.

Ext2/3's encoding has always been utf-8.  Period.

There have been some people who have chosen to do something else
locally, but that was about as valid as the people who violated SMTP
standards by Just Sending 8-bits instead of using MIME.

							- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  2:36         ` Theodore Ts'o
@ 2005-06-16  2:59           ` Patrick McFarland
  2005-06-16  4:33           ` Jeremy Maitin-Shepard
  1 sibling, 0 replies; 70+ messages in thread
From: Patrick McFarland @ 2005-06-16  2:59 UTC (permalink / raw)
  To: Theodore Ts'o; +Cc: Alan Cox, Alexey Zaytsev, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 723 bytes --]

On Wednesday 15 June 2005 10:36 pm, Theodore Ts'o wrote:
> Ext2/3's encoding has always been utf-8.  Period.
>
> There have been some people who have chosen to do something else
> locally, but that was about as valid as the people who violated SMTP
> standards by Just Sending 8-bits instead of using MIME.

Ahh. Whoever made that choice way back at the beginning of Ext2 development 
rocks. Whoever you are, Thanks.

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [RFC] Filesystem name storage (Was: A Great Idea (tm) about reimplementing NLS.)
  2005-06-16  1:55   ` Patrick McFarland
@ 2005-06-16  3:59     ` Kyle Moffett
  2005-06-18 15:24       ` Lukasz Stelmach
  0 siblings, 1 reply; 70+ messages in thread
From: Kyle Moffett @ 2005-06-16  3:59 UTC (permalink / raw)
  To: Patrick McFarland; +Cc: Denis Vlasenko, Alexey Zaytsev, linux-kernel

On Jun 15, 2005, at 21:55:04, Patrick McFarland wrote:
> On Wednesday 15 June 2005 05:13 am, Denis Vlasenko wrote:
>> I do not understand how this is going to look from userspace
>> perspective.  Can you give examples how this will work?
>
> IMHO, he means that the userspace would only see Unicode filenames,
> and the userspace could only give Unicode names back to the kernel.
> The kernel, using this global NLS layer would translate back and
> forth, and the userland wouldn't know about it.
>
> It's basically the only sane way to approach the problem of getting
> the entire Linux community to convert to Unicode.

Would the following system for filenames resolve most of the issues
people are raising:

First load charset tables into the kernel.  These would be stored in
files in userspace and could be easily updated, renamed, deleted, etc.
Such a table would always be a translation from Unicode <=> Charset.  A
kernel with this system built in would understand natively "raw",
"utf8", "utf16", and "utf32"; anything else would need loaded charset
tables.
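
A userspace sketch of such a Unicode <=> Charset table, in Python, might
look like the following. The function names build_tables and translate
are hypothetical, and the tables are derived from Python's own codecs
rather than from any kernel file format, which this proposal does not
define.

```python
def build_tables(codec):
    """Build byte->char and char->byte tables for a single-byte codec."""
    to_unicode = {}
    for byte in range(256):
        try:
            to_unicode[byte] = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the charset
    from_unicode = {ch: byte for byte, ch in to_unicode.items()}
    return to_unicode, from_unicode


def translate(name, src_to_unicode, dst_from_unicode, fallback=ord("?")):
    """Translate a byte string between charsets by way of Unicode."""
    out = bytearray()
    for byte in name:
        ch = src_to_unicode.get(byte)
        out.append(dst_from_unicode.get(ch, fallback))
    return bytes(out)


koi8_to_uni, _ = build_tables("koi8_r")
_, uni_to_cp1251 = build_tables("cp1251")

disk_name = "\u043f\u0440\u0438\u0432\u0435\u0442".encode("koi8_r")
print(translate(disk_name, koi8_to_uni, uni_to_cp1251))
```

Two 256-entry tables per charset are enough for the single-byte
encodings discussed in this thread; multibyte legacy charsets would
need a different structure.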

The following mount options would be available:
   nls_raw=(0|1)  [default 1]:
     This would cause Linux to pass all chars through unmolested.  This
     mode works well on multiuser systems where users want to use their
     own NLS tools, or where the whole system uses UTF-8, including the
     filesystems.  This is backwards compatible with the way Linux
     currently presents most (all?) filesystems.  If the options
     "nls_disk" or "nls_user" are used, then this option is forced to be
     zero.
   nls_disk=<string-charset>
     This specifies the underlying charset which should be used on the
     disk or filesystem itself.  This may be "negotiate" for any
     filesystems which support NLS *and* can identify which charset is
     in use.  Built in options are "utf8", "utf16", and "utf32".
     Defaults to "negotiate" if available, otherwise "utf8", but only
     defaults if "nls_raw" is 0.
   nls_user=<string-charset>
     This specifies the charset which should be presented to the user.
     This may be used to allow backwards compatibility (i.e., a program
     wants ISO8859-1, but the admin wants the underlying filesystem to
     use UTF-8).  Built in options are "utf8", "utf16", and "utf32".
     Defaults to "utf8" if "nls_raw" is 0.

The end result is that specifying either nls_disk or nls_user will turn
on automatic NLS conversion, with the unspecified nls_ option being utf8.

If these options are used on bind mounts, they should override the
underlying filesystem's mount options (instead of stacking).  This will
allow the admin to specify:

# mount -t ext3 -o nls_disk=utf8,nls_user=utf8 /dev/hdb /mnt
# mount --bind  -o nls_disk=utf8,nls_user=iso8859-1 /mnt/mail /var/spool/mail

if he/she wants to provide backwards compatibility with a legacy mail
spooling program.  Note: A part of each translation table would be an
entry for "Unspecified character", such that any UTF-8 character not
mapped in the table could be translated to a sane default, such as '?'.
If names collide under such translation, the kernel would need a way to
keep track of the collisions (Appended numbers?) and properly re-resolve
them when asked.
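
The collision case can be sketched in Python as follows. The helper
names to_legacy and disambiguate, the "~N" suffix format, and the two
sample names are all assumptions for illustration; the proposal above
only says "appended numbers".

```python
def to_legacy(name, charset="iso-8859-1"):
    """Encode name, replacing every unmappable character with '?'."""
    return name.encode(charset, errors="replace")


def disambiguate(names, charset="iso-8859-1"):
    """Map each name to a legacy form, numbering any collisions."""
    seen = {}
    result = {}
    for name in names:
        base = to_legacy(name, charset)
        count = seen.get(base, 0)
        seen[base] = count + 1
        if count == 0:
            result[name] = base
        else:
            result[name] = base + b"~" + str(count).encode("ascii")
    return result


# Two distinct five-letter Russian names with the same extension.
names = [
    "\u043e\u0442\u0447\u0451\u0442.txt",
    "\u043e\u0442\u0432\u0435\u0442.txt",
]
for original, legacy in disambiguate(names).items():
    print(ascii(original), legacy)
```

Note that the numbering depends on the order in which names are seen,
so the kernel would have to keep that mapping stable across lookups for
the suffixed names to resolve back to the right files.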

Cheers,
Kyle Moffett

-----BEGIN GEEK CODE BLOCK-----
Version: 3.12
GCM/CS/IT/U d- s++: a18 C++++>$ UB/L/X/*++++(+)>$ P+++(++++)>$
L++++(+++) E W++(+) N+++(++) o? K? w--- O? M++ V? PS+() PE+(-) Y+
PGP+++ t+(+++) 5 X R? tv-(--) b++++(++) DI+ D+ G e->++++$ h!*()>++$  
r  !y?(-)
------END GEEK CODE BLOCK------


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  2:36         ` Theodore Ts'o
  2005-06-16  2:59           ` Patrick McFarland
@ 2005-06-16  4:33           ` Jeremy Maitin-Shepard
  2005-06-16 14:37             ` Theodore Ts'o
  1 sibling, 1 reply; 70+ messages in thread
From: Jeremy Maitin-Shepard @ 2005-06-16  4:33 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: Patrick McFarland, Alan Cox, Alexey Zaytsev, Linux Kernel Mailing List

"Theodore Ts'o" <tytso@mit.edu> writes:

> [snip]

> Ext2/3's encoding has always been utf-8.  Period.

In what way does Ext2/3 know or care about file name encoding?  Doesn't
it just store an arbitrary 8-bit string?  Couldn't someone claim that
from the start it was designed to use iso8859-1 just as easily as you
can claim it was designed to use utf-8?

-- 
Jeremy Maitin-Shepard

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15 23:34                           ` Lukasz Stelmach
  2005-06-16  1:44                             ` Patrick McFarland
@ 2005-06-16  9:40                             ` Måns Rullgård
  2005-06-18 14:48                               ` Lukasz Stelmach
  2005-06-16 13:39                             ` Lennart Sorensen
  2 siblings, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-16  9:40 UTC (permalink / raw)
  To: Lukasz Stelmach
  Cc: Lennart Sorensen, Patrick McFarland, Alexander E. Patrakov, linux-kernel

Lukasz Stelmach <stlman@poczta.fm> writes:

> Lennart Sorensen wrote:
>
>>>And it is good in a way; however, I think kernel-level translation
>>>should also be possible, done either by code in each filesystem or by
>>>some layer above it.
>> 
>> What do you do if the underlying filesystem can not store some unicode
>> characters that are allowed on others?
>
> That's why UTF-8 is suggested. UTF-8 was developed to "fool"
> software that need not be aware of the Unicode nature of the text it
> manages, so that it handles the text without any hiccups *and* the text
> still carries information about multibyte characters. Which characters
> exactly do you mean? NUL? There is no NUL byte in any UTF-8 string
> except the one which terminates it.

That's exactly how ext3, reiserfs, xfs, jfs, etc. all work.  A few
filesystems are tagged as using some specific encoding.  If your
filesystem is marked for iso-8859-1, what should a kernel with a
conversion mechanism do if a user tries to name a file 김?

>> I think UDF is a better filesystem for many types of media since it is
>> able to be more gentle to the sectors storing the metadata than VFAT
>> ever will be.
>
> I've tried CD packet writing with UDF and it gives an insane overhead of
> about 20%. What metadata would you like to store, for example, on your
> flash drive or a floppy disk?

Filename, timestamps, all the usual.

-- 
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  1:52         ` Patrick McFarland
@ 2005-06-16 10:14           ` Alexey Zaytsev
  0 siblings, 0 replies; 70+ messages in thread
From: Alexey Zaytsev @ 2005-06-16 10:14 UTC (permalink / raw)
  To: Patrick McFarland; +Cc: Alan Cox, Linux Kernel Mailing List

On 16/06/05, Patrick McFarland <pmcfarland@downeast.net> wrote:
> > Now I quite agree that it isn't a Great Idea to do such conversion in
> > the kernel, but the problem still remains and there is no other place
> > we can do it. I believe that it should be done now and removed after
> > the world finishes moving to UTF. Maybe it should not be applied to
> > the main kernel tree, but I'm sure that at least Russian Linux
> > distributions will like it.
> 
> I partially agree. I think no userland application should have access to the
> un-'fixed' file names; they should be fed only Unicode to prevent the spread
> and acceptance of out of date encodings.
> 
> Forcing users to do smart things is often the only way to make them do smart
> things, and given the lack of acceptance of Unicode on Linux in the wild, it
> seems to be the only way.

I'm not going to force anybody to do anything. There are a number of
reasons to use Unicode, but if somebody finds them unconvincing, he
is free to use any other encoding. If somebody uses koi8-r as his
primary encoding and wants to mount a cp1251 or Unicode-encoded file
system, he is free to do it, although he should be ready to lose some
characters.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  1:44                             ` Patrick McFarland
@ 2005-06-16 10:38                               ` Måns Rullgård
  2005-06-16 11:36                               ` Bernd Eckenfels
  2005-06-16 20:41                               ` Rob Sims
  2 siblings, 0 replies; 70+ messages in thread
From: Måns Rullgård @ 2005-06-16 10:38 UTC (permalink / raw)
  To: linux-kernel

Patrick McFarland <pmcfarland@downeast.net> writes:

> On Wednesday 15 June 2005 07:34 pm, Lukasz Stelmach wrote:
>> That's why UTF-8 is suggested. UTF-8 has been developed to "fool" the
>> software that need not be aware of unicodeness of the text it manages
>> to handle it without any hiccups *and* to store in the text information
>> about multibyte characters. What characters exactly do you mean? NULL?
>> There is no NULL byte in any UTF-8 string except the one which
>> terminates it.
>
> Bingo. Only the operating system itself and software displaying
> filenames need to understand Unicode; the file system
> implementation itself just knows it's a string of bytes and nothing
> else.

Not even the OS needs to know what the bytes mean.  Only applications
displaying the names have reason to interpret the bytes they are
composed of in any specific manner.

-- 
Måns Rullgård
mru@inprovide.com


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  1:44                             ` Patrick McFarland
  2005-06-16 10:38                               ` Måns Rullgård
@ 2005-06-16 11:36                               ` Bernd Eckenfels
  2005-06-16 20:41                               ` Rob Sims
  2 siblings, 0 replies; 70+ messages in thread
From: Bernd Eckenfels @ 2005-06-16 11:36 UTC (permalink / raw)
  To: linux-kernel

In article <200506152144.56540.pmcfarland@downeast.net> you wrote:
> Bingo. Only the operating system itself and software displaying filenames
> need to understand Unicode; the file system implementation itself just knows
> it's a string of bytes and nothing else.

The filesystem needs to understand and translate the path names if the
filesystem specification mandates a different encoding for file names (for
example UTF-16). That's why you frequently see translations in legacy
filesystems which have a national encoding in the on-disk format (think FAT).

And if the filesystem uses unicode, you also need to enforce the naming of
files in UTF-8 to do reliable translation. Currently user mode may also send
you non-UTF-8 national chars.
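
To illustrate the "enforce UTF-8" point, here is a rough userland sketch in
Python (not kernel code) of what such a check amounts to: strictly decode the
raw name and reject it on failure. The helper name is_valid_utf8 is made up
for this example.

```python
def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw filename bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same Russian word as raw bytes in two encodings, the way two
# differently configured user programs might hand it to the filesystem.
utf8_name = "привет".encode("utf-8")
koi8_name = "привет".encode("koi8_r")

print(is_valid_utf8(utf8_name))   # name produced under a UTF-8 locale
print(is_valid_utf8(koi8_name))   # name produced under a KOI8-R locale
```

The KOI8-R bytes fail the check because every one of them looks like a UTF-8
lead byte that is never followed by a continuation byte.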

Bernd

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-15 23:34                           ` Lukasz Stelmach
  2005-06-16  1:44                             ` Patrick McFarland
  2005-06-16  9:40                             ` Måns Rullgård
@ 2005-06-16 13:39                             ` Lennart Sorensen
  2005-06-16 14:44                               ` Richard B. Johnson
  2 siblings, 1 reply; 70+ messages in thread
From: Lennart Sorensen @ 2005-06-16 13:39 UTC (permalink / raw)
  To: Lukasz Stelmach
  Cc: mru, Patrick McFarland, Alexander E. Patrakov, linux-kernel

On Thu, Jun 16, 2005 at 01:34:13AM +0200, Lukasz Stelmach wrote:
> That's why UTF-8 is suggested. UTF-8 has been developed to "fool" the
> software that need not be aware of unicodeness of the text it manages
> to handle it without any hiccups *and* to store in the text information
> about multibyte characters. What characters exactly do you mean? NULL?
> There is no NULL byte in any UTF-8 string except the one which
> terminates it.

That is true.  UTF-8 wouldn't cause any more problems than ascii already
does, such as some filesystems not allowing : and * in filenames among
other characters.

> Yes, it uses unicode. And dos codepages in short ones. To prove this
> take a vfat floppy and mount it. touch(1) a file on it that has some
> non latin1 characters. Unmount the floppy then do dd if=/dev/fd0
> of=/tmp/floppy bs=1024 count=512. When it's done take some hex
> editor/viewer and seek the latin1-compliant part of the filename
> in the floppy file (search for uppercase string). Right above the short
> filename you'll find the multibyte long one.

Well at least that seems like they did something right when they
extended FAT with VFAT.  Doesn't make FAT a good filesystem, but it does
make the filename extension pretty nice, much as it is an ugly hack too.

> I've tried cd packet writing with UDF and it gives insane overhead of
> about 20%. What metadata you'd like to store for example on your
> flashdrive or a floppy disk?

The constant rewriting of the same sectors that store the FAT is really
bad for some types of flash and other removable media (like DVD-RAM).

I hadn't noticed any big overhead in UDF, although packet writing may
add some overhead itself (I never used packet writing).

Len Sorensen

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  4:33           ` Jeremy Maitin-Shepard
@ 2005-06-16 14:37             ` Theodore Ts'o
  2005-06-19 17:55               ` Pavel Machek
  0 siblings, 1 reply; 70+ messages in thread
From: Theodore Ts'o @ 2005-06-16 14:37 UTC (permalink / raw)
  To: Jeremy Maitin-Shepard
  Cc: Patrick McFarland, Alan Cox, Alexey Zaytsev, Linux Kernel Mailing List

On Thu, Jun 16, 2005 at 12:33:16AM -0400, Jeremy Maitin-Shepard wrote:
> > Ext2/3's encoding has always been utf-8.  Period.
> 
> In what way does Ext2/3 know or care about file name encoding?  Doesn't
> it just store an arbitrary 8-bit string?  Couldn't someone claim that
> from the start it was designed to use iso8859-1 just as easily as you
> can claim it was designed to use utf-8?

Because we've had this discussion^H^H^H^H^H^H^H^H^H^H^H flame war
years ago, and despite people from Russia whining that it took 3
bytes to encode each Cyrillic character in UTF-8, it's where we came out.  

The bottom line, though, is that if someone files a bug report with ext3
because one user on the system is creating filenames in Japanese,
and another user on the same time-sharing system is creating filenames
in German, and they fail to interoperate, and they were doing so in
their local language, we would laugh at them --- just as people
writing mail programs would laugh at people who complained that they
were running into problems Just Sending 8-bits instead of using MIME,
and could you please fix this business-critical bug?  

Or as more and more desktop programs start interpreting the filenames
as UTF-8, and people with local variations get screwed, that is their
problem, and Not Ours.

So no, we can't prevent anyone from shooting themselves in the foot.
However, if they *do* take the gun, aim it straight downwards, and
pull the trigger, we aren't obligated to help.

						- Ted

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16 13:39                             ` Lennart Sorensen
@ 2005-06-16 14:44                               ` Richard B. Johnson
  2005-06-16 15:04                                 ` Lennart Sorensen
  0 siblings, 1 reply; 70+ messages in thread
From: Richard B. Johnson @ 2005-06-16 14:44 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Lukasz Stelmach, mru, Patrick McFarland, Alexander E. Patrakov,
	linux-kernel

On Thu, 16 Jun 2005, Lennart Sorensen wrote:

> On Thu, Jun 16, 2005 at 01:34:13AM +0200, Lukasz Stelmach wrote:
>> That's why UTF-8 is suggested. UTF-8 has been developed to "fool" the
>> software that need not be aware of unicodeness of the text it manages
>> to handle it without any hiccups *and* to store in the text information
>> about multibyte characters. What characters exactly do you mean? NULL?
>> There is no NULL byte in any UTF-8 string except the one which
>> terminates it.
>
> That is true.  UTF-8 wouldn't cause any more problems than ascii already
> does, such as some filesystems not allowing : and * in filenames among
> other characters.
>
>> Yes, it uses unicode. And dos codepages in short ones. To prove this
>> take a vfat floppy and mount it. touch(1) a file on it that has some
>> non latin1 characters. Unmount the floppy then do dd if=/dev/fd0
>> of=/tmp/floppy bs=1024 count=512. When it's done take some hex
>> editor/viewer and seek the latin1-compliant part of the filename
>> in the floppy file (search for uppercase string). Right above the short
>> filename you'll find the multibyte long one.
>
[SNIPPED...]

>
> Len Sorensen

You know this problem was "solved" over 20 years ago when it was
discovered that file-names could never be long enough. The solution
was a container-file which contained as much stuff as necessary to
identify the contents of the file that it was associated with. Using
this technique, the "real" file didn't need any ASCII identifiers. The
real file didn't show up in some directory program, just the contents
of the container-file. This same technique could be used for any
arbitrary file-identification including characters that haven't been
invented yet.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11.9 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached for review by Dictator Bush.
                  98.36% of all statistics are fiction.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16 14:44                               ` Richard B. Johnson
@ 2005-06-16 15:04                                 ` Lennart Sorensen
  2005-06-17  1:18                                   ` Patrick McFarland
  2005-06-18 22:30                                   ` Bernd Eckenfels
  0 siblings, 2 replies; 70+ messages in thread
From: Lennart Sorensen @ 2005-06-16 15:04 UTC (permalink / raw)
  To: Richard B. Johnson
  Cc: Lukasz Stelmach, mru, Patrick McFarland, Alexander E. Patrakov,
	linux-kernel

On Thu, Jun 16, 2005 at 10:44:52AM -0400, Richard B. Johnson wrote:
> You know this problem was "solved" over 20 years ago when it was
> discovered that file-names could never be long enough. The solution
> was a container-file which contained as much stuff as necessary to
> identify the contents of the file that it was associated with. Using
> this technique, the "real" file didn't need any ASCII identifiers. The
> real file didn't show up in some directory program, just the contents
> of the container-file. This same technique could be used for any
> arbitrary file-identification including characters that haven't been
> invented yet.

Why am I suddenly reminded of Apple's idiotic filesystem forks for
resources and data?  Such a pain when trying to transfer files to other
types of filesystems.  Modifying the files themselves also doesn't seem
like the right solution.

As for filenames never being long enough, I don't think that is true.
Filenames CAN be too long, but I don't see very many people thinking 250
characters makes for a useful filename.  Most people seem happy with 50
or so being a good limit even though many systems support much longer.
8 wasn't enough, and 25 or 30 was sometimes a bit short, but usually
enough.  Not having enough filename length doesn't seem to be a problem
in need of a solution on most systems anymore.

Len Sorensen

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  1:44                             ` Patrick McFarland
  2005-06-16 10:38                               ` Måns Rullgård
  2005-06-16 11:36                               ` Bernd Eckenfels
@ 2005-06-16 20:41                               ` Rob Sims
  2 siblings, 0 replies; 70+ messages in thread
From: Rob Sims @ 2005-06-16 20:41 UTC (permalink / raw)
  To: linux-kernel

On Wed, Jun 15, 2005 at 09:44:55PM -0400, Patrick McFarland wrote:
> > I've tried cd packet writing with UDF and it gives insane overhead of
> > about 20%. What metadata you'd like to store for example on your
> > flashdrive or a floppy disk?
> 
> Uh, 20%? That sounds awfully high. You sure you didn't do something wrong?

Fixed packet recording under UDF records data in 32/39 of sectors; this
alone is 18%; I can easily see 2% in UDF itself.

More specifically, each packet has a seven sector overhead.  UDF sets
the packet size to 32.
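
The arithmetic behind those figures, as a small Python sketch. The two
constants are taken from the numbers above; nothing here is read from an
actual disc or driver.

```python
from fractions import Fraction

DATA_SECTORS = 32      # user-data sectors per fixed packet (figure quoted above)
OVERHEAD_SECTORS = 7   # extra sectors written per packet (figure quoted above)

total = DATA_SECTORS + OVERHEAD_SECTORS   # sectors consumed on the medium per packet
usable = Fraction(DATA_SECTORS, total)    # share of the medium holding user data
lost = 1 - usable                         # share lost to packet overhead

print(f"usable: {float(usable):.1%}, lost to packet overhead: {float(lost):.1%}")
```

So roughly 18% of the medium goes to packet overhead before UDF's own
structures are counted.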
-- 
Rob

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16 15:04                                 ` Lennart Sorensen
@ 2005-06-17  1:18                                   ` Patrick McFarland
  2005-06-17  8:21                                     ` Måns Rullgård
  2005-06-17 12:56                                     ` Lennart Sorensen
  2005-06-18 22:30                                   ` Bernd Eckenfels
  1 sibling, 2 replies; 70+ messages in thread
From: Patrick McFarland @ 2005-06-17  1:18 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Richard B. Johnson, Lukasz Stelmach, mru, Alexander E. Patrakov,
	linux-kernel

On Thursday 16 June 2005 11:04 am, Lennart Sorensen wrote:
>  Most people seem happy with 50 or so being a good limit even though many
>  systems support much longer. 

50 characters or 50 bytes? Because in the case of UTF-8, if you do a lot of 
three byte characters (which require four bytes to encode), 50 bytes is very 
short.

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17  1:18                                   ` Patrick McFarland
@ 2005-06-17  8:21                                     ` Måns Rullgård
  2005-06-17  8:49                                       ` Patrick McFarland
  2005-06-17 12:56                                     ` Lennart Sorensen
  1 sibling, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-17  8:21 UTC (permalink / raw)
  To: Patrick McFarland
  Cc: Lennart Sorensen, Richard B. Johnson, Lukasz Stelmach,
	Alexander E. Patrakov, linux-kernel

Patrick McFarland <pmcfarland@downeast.net> writes:

> On Thursday 16 June 2005 11:04 am, Lennart Sorensen wrote:
>>  Most people seem happy with 50 or so being a good limit even though many
>>  systems support much longer. 
>
> 50 characters or 50 bytes? Because in the case of UTF-8, if you do a lot of 
> three byte characters (which require four bytes to encode), 50 bytes is very 
> short.

What do you mean by three-byte characters requiring four bytes to
encode?  Is a three-byte character not a character encoded using three
bytes?

As for 50 bytes being too short, many of the multibyte characters are
equivalent to several English characters, so fewer of them are
required.  You have a point, though.

-- 
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17  8:21                                     ` Måns Rullgård
@ 2005-06-17  8:49                                       ` Patrick McFarland
  2005-06-17  9:17                                         ` Måns Rullgård
                                                           ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Patrick McFarland @ 2005-06-17  8:49 UTC (permalink / raw)
  To: Måns Rullgård
  Cc: Lennart Sorensen, Richard B. Johnson, Lukasz Stelmach,
	Alexander E. Patrakov, linux-kernel

On Friday 17 June 2005 04:21 am, Måns Rullgård wrote:
> Patrick McFarland <pmcfarland@downeast.net> writes:
> > On Thursday 16 June 2005 11:04 am, Lennart Sorensen wrote:
> >>  Most people seem happy with 50 or so being a good limit even though
> >> many systems support much longer.
> >
> > 50 characters or 50 bytes? Because in the case of UTF-8, if you do a lot
> > of three byte characters (which require four bytes to encode), 50 bytes
> > is very short.
>
> What do you mean by three-byte characters requiring four bytes to
> encode?  Is a three-byte character not a character encoded using three
> bytes?

(implication of utf8 and not utf16 goes here)

Very few Unicode characters require three bytes, instead of the usual one or 
two.

For one byte you just have the byte. 

For two bytes, you really have three: a control code stating "the following 
two bytes are a two byte character", and then the two bytes. 

For three bytes, you really have four bytes: a control code stating "the 
following three bytes are a three byte character" and then the three bytes.

Unless I've completely misunderstood the Unicode specification, this is what 
is going on.

> As for 50 bytes being too short, many of the multibyte characters are
> equivalent to several English characters, so fewer of them are
> required.  You have a point, though.

Any English characters (ie, the first 127 ascii characters) map directly to 
the first 127 Unicode characters (if that's what you meant).

-- 
Patrick "Diablo-D3" McFarland || pmcfarland@downeast.net
"Computer games don't affect kids; I mean if Pac-Man affected us as kids, we'd 
all be running around in darkened rooms, munching magic pills and listening to
repetitive electronic music." -- Kristian Wilson, Nintendo, Inc, 1989

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17  8:49                                       ` Patrick McFarland
@ 2005-06-17  9:17                                         ` Måns Rullgård
  2005-06-17  9:37                                           ` Måns Rullgård
  2005-06-17  9:41                                         ` Bernd Eckenfels
  2005-06-17 13:09                                         ` Lennart Sorensen
  2 siblings, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-17  9:17 UTC (permalink / raw)
  To: Patrick McFarland
  Cc: Lennart Sorensen, Richard B. Johnson, Lukasz Stelmach,
	Alexander E. Patrakov, linux-kernel

Patrick McFarland <pmcfarland@downeast.net> writes:

> On Friday 17 June 2005 04:21 am, Måns Rullgård wrote:
>> Patrick McFarland <pmcfarland@downeast.net> writes:
>> > On Thursday 16 June 2005 11:04 am, Lennart Sorensen wrote:
>> >>  Most people seem happy with 50 or so being a good limit even though
>> >> many systems support much longer.
>> >
>> > 50 characters or 50 bytes? Because in the case of UTF-8, if you do a lot
>> > of three byte characters (which require four bytes to encode), 50 bytes
>> > is very short.
>>
>> What do you mean by three-byte characters requiring four bytes to
>> encode?  Is a three-byte character not a character encoded using three
>> bytes?
>
> (implication of utf8 and not utf16 goes here)
>
> Very few Unicode characters require three bytes, instead of the
> usual one or two.

I wouldn't call the Chinese, Japanese, and Korean characters "very few",
and those all require (at least) three bytes.

> For one byte you just have the byte. 

Correct.

> For two bytes, you really have three: a control code stating "the
> following two bytes are a two byte character", and then the two
> bytes.
>
> For three bytes, you really have four bytes: a control code stating
> "the following three bytes are a three byte character" and then the
> three bytes.

Wrong.  The first byte indicates the total size of the character, but
it also contains data, like this:

  0xxxxxxx
  110xxxxx 10xxxxxx
  1110xxxx 10xxxxxx 10xxxxxx
  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

Refer to the Unicode standard, section 3.9 for the full details.
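
For anyone who wants to see the bit layout in action, here is a hand-rolled
encoder in Python, written only as an illustration (the name encode_utf8 is
made up, and it skips details such as rejecting surrogates). It packs the
code point's bits into the x positions shown above.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one code point by hand, following the bit layout above."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp < 0x80:                      # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                 # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of Unicode range")


# 'A', a-ring, the euro sign, Hangul GIM, and a CJK Extension B ideograph.
for cp in (0x41, 0xE5, 0x20AC, 0xAE40, 0x20000):
    print(f"U+{cp:04X} -> {encode_utf8(cp).hex(' ')}")
```

Note that the lead byte carries payload bits itself, so a "three-byte
character" really is three bytes in total, with no separate control byte.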

>> As for 50 bytes being too short, many of the multibyte characters are
>> equivalent to several English characters, so fewer of them are
>> required.  You have a point, though.
>
> Any English characters (ie, the first 127 ascii characters) map
> directly to the first 127 Unicode characters (if that's what you
> meant).

Let me clarify with an example.  The common Korean name Kim consists
of three ascii characters.  The Hangul spelling, ~, is encoded in
utf-8 using three bytes.  Even though a three-byte character was used,
the number of bytes is the same.

-- 
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17  9:17                                         ` Måns Rullgård
@ 2005-06-17  9:37                                           ` Måns Rullgård
  0 siblings, 0 replies; 70+ messages in thread
From: Måns Rullgård @ 2005-06-17  9:37 UTC (permalink / raw)
  To: linux-kernel


Looks like something ate the Hangul.  I'll try again, without any
other non-ascii characters.

>>> As for 50 bytes being too short, many of the multibyte characters are
>>> equivalent to several English characters, so fewer of them are
>>> required.  You have a point, though.
>>
>> Any English characters (ie, the first 127 ascii characters) map
>> directly to the first 127 Unicode characters (if that's what you
>> meant).
>
> Let me clarify with an example.  The common Korean name Kim consists
> of three ascii characters.  The Hangul spelling, 김, is encoded in
> utf-8 using three bytes.  Even though a three-byte character was used,
> the number of bytes is the same.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17  8:49                                       ` Patrick McFarland
  2005-06-17  9:17                                         ` Måns Rullgård
@ 2005-06-17  9:41                                         ` Bernd Eckenfels
  2005-06-17 13:09                                         ` Lennart Sorensen
  2 siblings, 0 replies; 70+ messages in thread
From: Bernd Eckenfels @ 2005-06-17  9:41 UTC (permalink / raw)
  To: linux-kernel

In article <200506170450.12943.pmcfarland@downeast.net> you wrote:
> (implication of utf8 and not utf16 goes here)
> 
> Very few Unicode characters require three bytes, instead of the usual one or 
> two.

UTF-8 2 bytes end with U+07ff which covers only Latin, Cyrillic, Hebrew and
Arabic.

All CJK Unified Ideographs (U+4E00-) and Extension A (U+3400-) have 3 byte
encodings with UTF-8. The Extension B ideographs (U+20000-) even use 4 bytes.
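
These boundaries are easy to check from Python by letting the built-in codec
do the encoding and counting the bytes. The helper utf8_len and the choice of
sample code points are mine, just for illustration.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes the UTF-8 encoding of code point cp occupies."""
    return len(chr(cp).encode("utf-8"))


samples = [
    (0x007F, "last ASCII code point"),
    (0x0080, "first two-byte code point"),
    (0x07FF, "last two-byte code point"),
    (0x0800, "first three-byte code point"),
    (0x3400, "start of CJK Extension A"),
    (0x4E00, "start of CJK Unified Ideographs"),
    (0x20000, "start of CJK Extension B"),
]

for cp, label in samples:
    print(f"U+{cp:04X}  {utf8_len(cp)} byte(s)  {label}")
```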

> For one byte you just have the byte. 

For ASCII you have one byte.

> For two bytes, you really have three: a control code stating "the following 
> two bytes are a two byte character", and then the two bytes. 

Umm, that's a bit misleading. UTF-8 works with bit prefixes, not byte prefixes.
Unicode code points are integers and, depending on the encoding, are represented
as one or more code units, which in turn are stored as bytes.

> Unless I've completely misunderstood the Unicode specification, this is what 
> is going on.

You might want to look up Joel's Tutorial or just browse the Unihan Database:
http://www.joelonsoftware.com/articles/Unicode.html
http://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=3400
http://www.unicode.org/cgi-bin/UnihanGrid.pl?codepoint=U+07F1&useutf8=false

Greetings
Bernd

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17  1:18                                   ` Patrick McFarland
  2005-06-17  8:21                                     ` Måns Rullgård
@ 2005-06-17 12:56                                     ` Lennart Sorensen
  1 sibling, 0 replies; 70+ messages in thread
From: Lennart Sorensen @ 2005-06-17 12:56 UTC (permalink / raw)
  To: Patrick McFarland
  Cc: Richard B. Johnson, Lukasz Stelmach, mru, Alexander E. Patrakov,
	linux-kernel

On Thu, Jun 16, 2005 at 09:18:06PM -0400, Patrick McFarland wrote:
> On Thursday 16 June 2005 11:04 am, Lennart Sorensen wrote:
> >  Most people seem happy with 50 or so being a good limit even though many
> >  systems support much longer. 
> 
> 50 characters or 50 bytes? Because in the case of UTF-8, if you do a lot of 
> three byte characters (which require four bytes to encode), 50 bytes is very 
> short.

I would think most languages that need 2 or 3 bytes per character would
need a lot fewer characters, although I think I can think of a few cases
where that isn't true.

Well how about making it '50 characters seems plenty for most people to
be happy'.  If you can handle a couple hundred bytes that should be ok,
and I think most filesystems can.

Len Sorensen

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17  8:49                                       ` Patrick McFarland
  2005-06-17  9:17                                         ` Måns Rullgård
  2005-06-17  9:41                                         ` Bernd Eckenfels
@ 2005-06-17 13:09                                         ` Lennart Sorensen
  2005-06-17 13:23                                           ` Måns Rullgård
  2 siblings, 1 reply; 70+ messages in thread
From: Lennart Sorensen @ 2005-06-17 13:09 UTC (permalink / raw)
  To: Patrick McFarland
  Cc: Måns Rullgård, Richard B. Johnson, Lukasz Stelmach,
	Alexander E. Patrakov, linux-kernel

On Fri, Jun 17, 2005 at 04:49:33AM -0400, Patrick McFarland wrote:
> (implication of utf8 and not utf16 goes here)
> 
> Very few Unicode characters require three bytes, instead of the usual one or 
> two.
> 
> For one byte you just have the byte. 
> 
> For two bytes, you really have three: a control code stating "the following 
> two bytes are a two byte character", and then the two bytes. 
> 
> For three bytes, you really have four bytes: a control code stating "the 
> following three bytes are a three byte character" and then the three bytes.
> 
> Unless I've completely misunderstood the Unicode specification, this is what 
> is going on.

You have probably slightly misunderstood UTF8 at least.  UTF8 tries very
hard to make sure you can't mistake the characters for ascii, so it
makes the first byte contain some 1's followed by one zero.  The number
of 1's indicates how many bytes the character contains; after the 0 the
remaining bits are used to store bits for the character.  The remaining
bytes are all 10xxxxxx, each of which stores another 6 bits of the character code.
One is required to use the shortest form of utf8 that can store the
character you are encoding.

x's are where the bits for the character number go:
0xxxxxxx encodes character 0-127
110xxxxx 10xxxxxx encodes character 128-2047
1110xxxx 10xxxxxx 10xxxxxx encodes characters 2048-65535
etc up to
1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx encodes characters
67108864-2147483647

As far as I know, unicode doesn't currently define anything past 20bits
or so, so probably 4bytes is the most you will see in normal use, with 3
bytes covering quite a large number of the characters.

> Any English characters (ie, the first 127 ascii characters) map directly to 
> the first 127 Unicode characters (if thats what you meant).

Well utf8 also is backwards compatible with ascii to make handling text
files simpler.  You could encode the ascii characters using the other
part of UTF8 except that would violate the rule of using the shortest
form possible.
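
A quick way to see the shortest-form rule enforced is to feed a strict decoder
an over-long sequence. The sketch below uses Python's built-in UTF-8 codec;
the helper name decodes is made up. The two bytes follow the 110xxxxx 10xxxxxx
pattern but carry only the seven bits of 'A' (0x41).

```python
def decodes(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the byte sequence."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


shortest_a = bytes([0b01000001])              # 0xxxxxxx form of 'A'
overlong_a = bytes([0b11000001, 0b10000001])  # same bits padded into the two-byte form

print(decodes(shortest_a))   # the one-byte form is the only legal encoding of 'A'
print(decodes(overlong_a))   # the padded form violates the shortest-form rule
```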

Len Sorensen

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17 13:09                                         ` Lennart Sorensen
@ 2005-06-17 13:23                                           ` Måns Rullgård
  2005-06-18 16:04                                             ` Robin Rosenberg
  0 siblings, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-17 13:23 UTC (permalink / raw)
  To: Lennart Sorensen
  Cc: Patrick McFarland, Richard B. Johnson, Lukasz Stelmach,
	Alexander E. Patrakov, linux-kernel

lsorense@csclub.uwaterloo.ca (Lennart Sorensen) writes:
> You have probably slightly misunderstood UTF8 at least.  UTF8 tries very
> hard to make sure you can't mistake the characters for ascii, so it
> makes the first byte contain some 1's followed by one zero.  The number
> of 1's indicates how many bytes the character contains; after the 0 the
> remaining bits are used to store bits for the character.  The remaining
> bytes are all 10xxxxxx, each of which stores another 6 bits of the character code.
> One is required to use the shortest form of utf8 that can store the
> character you are encoding.

Some characters can be encoded in several equally short ways.  For
instance, characters with multiple diacritics can have these applied
in different orders.  One of these is designated the canonical
encoding, and should be used in favor of the others.  Those things,
among others, are what makes unicode difficult to deal with.
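
A small Python sketch of that situation, using the standard unicodedata
module. Strictly speaking this is Unicode normalization of code point
sequences rather than a property of UTF-8 itself; the sample string (an 'a'
with a dot below and an acute accent) is my own choice.

```python
import unicodedata

# The same visible character, with its two combining marks in different orders.
s1 = "a\u0323\u0301"   # a + COMBINING DOT BELOW + COMBINING ACUTE ACCENT
s2 = "a\u0301\u0323"   # a + COMBINING ACUTE ACCENT + COMBINING DOT BELOW

# As raw strings (and therefore as UTF-8 bytes) the two differ.
print(s1 == s2, s1.encode("utf-8").hex(" "), s2.encode("utf-8").hex(" "))

# After normalization both collapse to one canonical sequence.
nfc1 = unicodedata.normalize("NFC", s1)
nfc2 = unicodedata.normalize("NFC", s2)
print(nfc1 == nfc2, [f"U+{ord(c):04X}" for c in nfc1])
```

A filesystem that treats names as opaque bytes would consider s1 and s2 two
different files unless something normalizes them first.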

-- 
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16  9:40                             ` Måns Rullgård
@ 2005-06-18 14:48                               ` Lukasz Stelmach
  2005-06-18 23:22                                 ` Måns Rullgård
  0 siblings, 1 reply; 70+ messages in thread
From: Lukasz Stelmach @ 2005-06-18 14:48 UTC (permalink / raw)
  To: mru; +Cc: linux-kernel

Måns Rullgård wrote:
>>>What do you do if the underlying filesystem can not store some unicode
>>>characters that are allowed on others?
>>
>>That's why UTF-8 is suggested. UTF-8 has been developed to "fool" the
>>software that need not be aware of unicodeness of the text it manages
>>to handle it without any hiccups *and* to store in the text information
>>about multibyte characters. What characters exactly do you mean? NULL?
>>There is no NULL byte in any UTF-8 string except the one which
>>terminates it.
> 
> That's exactly how ext3, reiserfs, xfs, jfs, etc. all work.  A few
> filesystems are tagged as using some specific encoding.  If your
> filesystem is marked for iso-8859-1, what should a kernel with a
> conversion mechanism do if a user tries to name a file 김?

Return -ENOENT? I am guessing. But please tell me what userland software
should do if it runs with the locale set to something.iso-8859-2 and
finds 김 in the directory? That is the same problem. And for now ISO
8-bit encodings are far more popular and useful with contemporary tools
than UTF-8. That is why I think the suggestion of a layer in the kernel that
would translate filenames from the utf-8 stored on the media to e.g. latin2
used by tools is quite reasonable. Especially when there is more than
one encoding for a particular language (think Russian, Polish). Even
more, with such a facility the transition would be much more graceful, since
you could have a utf-8 filesystem first and worry about the other
tools later. The filesystem would already be populated with UTF-8 names.
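
To make the problem case concrete, here is a userland Python sketch of the
translation step. It only models the idea: to_legacy is a hypothetical
stand-in for the proposed layer, not an existing kernel or libc interface.

```python
def to_legacy(name: str, charset: str) -> bytes:
    """Hypothetical stand-in for the translation layer: UTF-8 name -> legacy bytes."""
    # Raises UnicodeEncodeError when a character has no slot in the target charset.
    return name.encode(charset)


results = {}
for name in ("żółw.txt", "김.txt"):
    try:
        results[name] = to_legacy(name, "iso8859_2")
    except UnicodeEncodeError:
        # No latin2 representation: a real layer would have to return an
        # error or substitute a placeholder here.
        results[name] = None

for name, raw in results.items():
    print(ascii(name), "->", raw)

# The lossy alternative: substitute '?' for anything unrepresentable.
print("김.txt".encode("iso8859_2", errors="replace"))
```

The Polish name converts cleanly, while the Hangul one either fails or
degrades to a placeholder, which is the choice such a layer would face.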

>>>I think UDF is a better filesystem for many types of media since it is
>>>able to be more gentle on the sectors storing the meta data than VFAT
>>>ever will be.
>>I've tried cd packet writing with UDF and it gives insane overhead of
>>about 20%. What metadata you'd like to store for example on your
>>flashdrive or a floppy disk?
> 
> Filename, timestamps, all the usual.

That's why IMHO FAT is quite enough for this purpose.

-- 
Było mi bardzo miło.                    Trzecia pospolita klęska, [...]
>Łukasz<                      Już nie katolicka lecz złodziejska.  (c)PP


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC] Filesystem name storage (Was: A Great Idea (tm) about reimplementing NLS.)
  2005-06-16  3:59     ` [RFC] Filesystem name storage (Was: A Great Idea (tm) about reimplementing NLS.) Kyle Moffett
@ 2005-06-18 15:24       ` Lukasz Stelmach
  0 siblings, 0 replies; 70+ messages in thread
From: Lukasz Stelmach @ 2005-06-18 15:24 UTC (permalink / raw)
  To: Kyle Moffett
  Cc: Patrick McFarland, Denis Vlasenko, Alexey Zaytsev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 621 bytes --]

Kyle Moffett wrote:

> Would the following system for filenames resolve most of the issues people
> are raising?
> 
> First load charset tables into the kernel.  These would be stored in 
> files in userspace
[...]

Just like keyboard maps?

> The following mount options would be available:
>   nls_raw=(0|1)  [default 1]:
[...]
>   nls_disk=<string-charset>
[...]
>   nls_user=<string-charset>

It sounds quite reasonable to me :-)


-- 
Było mi bardzo miło.                    Trzecia pospolita klęska, [...]
>Łukasz<                      Już nie katolicka lecz złodziejska.  (c)PP


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-17 13:23                                           ` Måns Rullgård
@ 2005-06-18 16:04                                             ` Robin Rosenberg
  2005-06-18 18:06                                               ` Kari Hurtta
  2005-06-18 19:09                                               ` Bernd Eckenfels
  0 siblings, 2 replies; 70+ messages in thread
From: Robin Rosenberg @ 2005-06-18 16:04 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: Måns Rullgård, Lennart Sorensen, Patrick McFarland,
	Richard B. Johnson, Lukasz Stelmach, Alexander E. Patrakov

On Friday 17 June 2005 15.23, Måns Rullgård wrote:
> Some characters can be encoded in several equally shortest ways.

No, they cannot. How to encode characters is explicitly and well defined. If
you don't follow the rules, you are simply not producing UTF-8, but something
else.

Every Unicode character has exactly one UTF-8 representation.

-- robin


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-18 16:04                                             ` Robin Rosenberg
@ 2005-06-18 18:06                                               ` Kari Hurtta
  2005-06-18 21:57                                                 ` Robin Rosenberg
  2005-06-18 19:09                                               ` Bernd Eckenfels
  1 sibling, 1 reply; 70+ messages in thread
From: Kari Hurtta @ 2005-06-18 18:06 UTC (permalink / raw)
  To: Robin Rosenberg
  Cc: Kernel Mailing List, Måns Rullgård, Lennart Sorensen,
	Patrick McFarland, Richard B. Johnson, Lukasz Stelmach,
	Alexander E. Patrakov

> On Friday 17 June 2005 15.23, Måns Rullgård wrote:
> > Some characters can be encoded in several equally shortest ways.
>
> No, they cannot. How to encode characters is explicitly and well defined. If
> you don't follow the rules, you are simply not producing UTF-8, but something
> else.
>
> Every Unicode character has exactly one UTF-8 representation.
> 
> -- robin

You are confusing Unicode characters with Unicode codepoints.

Every Unicode codepoint has exactly one UTF-8 representation.

Unicode characters may use one or more Unicode codepoints.

Some characters also have a representation with one codepoint, but not all.

For example

	LATIN CAPITAL LETTER A WITH ACUTE

has the representation	0041 0301

That is two Unicode codepoints.  That character
also has another (precomposed) representation

that is			00C1


But consider the (somewhat imaginary) character

	LATIN CAPITAL LETTER A WITH GRAVE AND CIRCUMFLEX

that has the representation	0041 0300 0302

but it also has the representation	0041 0302 0300


Both representations are equally short.
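
The A WITH ACUTE case above can be checked with a short Python sketch.
It uses only the standard unicodedata module, and the variable names are
purely illustrative. It shows that each codepoint sequence has one UTF-8
encoding, that the two encodings differ byte for byte, and that
normalization (NFC is picked here as an assumption, any one form would
do) maps both spellings to the same text.

```python
import unicodedata

# LATIN CAPITAL LETTER A WITH ACUTE, spelled two ways.
precomposed = "\u00C1"        # a single codepoint
decomposed = "\u0041\u0301"   # 'A' followed by COMBINING ACUTE ACCENT

# Every codepoint sequence has exactly one UTF-8 byte sequence ...
precomposed_bytes = precomposed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")

# ... but the two spellings of the same character give different bytes.
same_bytes = precomposed_bytes == decomposed_bytes

# After normalization both spellings compare equal.
same_after_nfc = (unicodedata.normalize("NFC", precomposed)
                  == unicodedata.normalize("NFC", decomposed))

print(precomposed_bytes, decomposed_bytes, same_bytes, same_after_nfc)
```

Whether normalization also unifies two different orderings of stacked
accents depends on the combining classes of the marks involved; the
sketch does not try to cover that case.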



/ Kari Hurtta	



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-18 16:04                                             ` Robin Rosenberg
  2005-06-18 18:06                                               ` Kari Hurtta
@ 2005-06-18 19:09                                               ` Bernd Eckenfels
  1 sibling, 0 replies; 70+ messages in thread
From: Bernd Eckenfels @ 2005-06-18 19:09 UTC (permalink / raw)
  To: linux-kernel

In article <200506181804.21366.robin.rosenberg.lists@dewire.com> you wrote:
> Every unicode character has exactly one UTF-8 representation.

Every Unicode code point has exactly one UTF-8 representation; however, for
a few glyphs there are multiple code points. And this is not only a problem
because of homoglyphs, which look alike or similar, but also because of
combining characters vs. legacy characters. However, that's more an issue of
the user interface (think IDN exploits).

Personally I think the on-disk filesystem format should be required to be
UTF-8, and it's an open discussion whether the syscalls accept UTF-8 or
locale byte encodings. Currently it's a mess. We can learn from Windows here:)

Greetings
Bernd

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-18 18:06                                               ` Kari Hurtta
@ 2005-06-18 21:57                                                 ` Robin Rosenberg
  0 siblings, 0 replies; 70+ messages in thread
From: Robin Rosenberg @ 2005-06-18 21:57 UTC (permalink / raw)
  To: Kernel Mailing List
  Cc: Kari Hurtta, Måns Rullgård, Lennart Sorensen,
	Patrick McFarland, Richard B. Johnson, Lukasz Stelmach,
	Alexander E. Patrakov

On Saturday 18 June 2005 20.06, Kari Hurtta wrote:
> > On Friday 17 June 2005 15.23, Måns Rullgård wrote:
> > > Some characters can be encoded in several equally shortest ways.
> >
> > No, they cannot. How to encode characters is explicitly and well defined.
> > If you don't follow the rules, you are simply not producing UTF-8, but
> > something else.
> >
> > Every Unicode character has exactly one UTF-8 representation.
> >
> > -- robin
>
> You are confusing Unicode characters with Unicode codepoints.

Yes, I forgot about them. Thank you for reminding me.

-- robin

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16 15:04                                 ` Lennart Sorensen
  2005-06-17  1:18                                   ` Patrick McFarland
@ 2005-06-18 22:30                                   ` Bernd Eckenfels
  1 sibling, 0 replies; 70+ messages in thread
From: Bernd Eckenfels @ 2005-06-18 22:30 UTC (permalink / raw)
  To: linux-kernel

In article <20050616150419.GY23488@csclub.uwaterloo.ca> you wrote:
> As for filenames never being long enough, I don't think that is true.
> Filenames CAN be too long, but I don't see very many people think 250
> characters makes for a useful filename.  Most people seem happy with 50
> or so being a good limit even though many systems support much longer.
> 8 wasn't enough, and 25 or 30 was sometimes a bit short, but usually
> enough.  Not having enough filename length doesn't seem to be a problem
> in need of a solution on most systems anymore.

Pathname length, however, is a bigger issue.  255 bytes is pretty short,
especially if you use distributed file systems with deeply nested mountpoints.

Greetings
Bernd

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-18 14:48                               ` Lukasz Stelmach
@ 2005-06-18 23:22                                 ` Måns Rullgård
  2005-06-22  8:41                                   ` Lukasz Stelmach
  0 siblings, 1 reply; 70+ messages in thread
From: Måns Rullgård @ 2005-06-18 23:22 UTC (permalink / raw)
  To: Lukasz Stelmach; +Cc: linux-kernel

Lukasz Stelmach <stlman@poczta.fm> writes:

> Måns Rullgård wrote:
>>>>What do you do if the underlying filesystem can not store some unicode
>>>>characters that are allowed on others?
>>>
>>>That's why UTF-8 is suggested. UTF-8 has been developed to "fool" the
>>>software that need not to be aware of unicodeness of the text it manages
>>>to handle it without any hiccups *and* to store in the text information
>>>about multibyte characters. What characters exactly do you mean? NULL?
>>>There is no NULL byte in any UTF-8 string except the one which
>>>terminates it.
>> 
>> That's exactly how ext3, reiserfs, xfs, jfs, etc. all work.  A few
>> filesystems are tagged as using some specific encoding.  If your
>> filesystem is marked for iso-8859-1, what should a kernel with a
>> conversion mechanism do if a user tries to name a file 김?
>
> Return -ENOENT? I am guessing.

Doesn't seem very friendly.

> But please tell me what should do userland software if it runs with
> locale set to something.iso-8859-2 and finds 김 in the directory?

I suppose it will display ęš (0x80 doesn't seem to be a printable
iso-8859-2 character).  You told it to use iso-8859-2 in the first
place, so what do you expect?
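
A minimal Python sketch of that reinterpretation, for anyone who wants
to check it. It assumes the name is stored as UTF-8 and simply decodes
the same bytes with Python's iso8859_2 codec, which is only a stand-in
for whatever a latin2 tool would do with them.

```python
name = "\uAE40"  # HANGUL SYLLABLE GIM, the character from the question

# The three bytes that a UTF-8 filesystem would hold for this name.
utf8_bytes = name.encode("utf-8")

# The same bytes as seen by a tool that believes they are iso-8859-2.
as_latin2 = utf8_bytes.decode("iso8859_2")

print(" ".join(f"{b:02x}" for b in utf8_bytes))
print([f"U+{ord(c):04X}" for c in as_latin2])
print([c.isprintable() for c in as_latin2])
```

The first two bytes map to printable latin2 letters, while the third
lands on a control character, so a tool has to escape it or replace it.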

> That is the same problem. And for now ISO 8-bit encodings are far
> more popular and useful with contemporary tools than UTF-8.

ISO 8-bit encodings are more common with characters they can
represent.  These are a small minority of all characters commonly
used.

> That is why I think the suggestion of a layer in the kernel that would
> translate filenames from utf-8 stored on the media to e.g. latin2
> used by tools is quite reasonable. Especially when there is more
> than one encoding for a particular language (think Russian,
> Polish). Even more, with such a facility the transition would be much
> more graceful, since you could have a utf-8 filesystem first and
> worry about the tools later. The filesystem would already be
> populated with UTF-8 names.

How is the kernel to know what to translate to/from?

>>>>I think UDF is a better filesystem for many types of media since it is
>>>>able to be more gentle to the sectors storing the meta data than VFAT
>>>>ever will be.
>>>I've tried CD packet writing with UDF and it gives an insane overhead of
>>>about 20%. What metadata would you like to store, for example, on your
>>>flash drive or a floppy disk?
>> 
>> Filename, timestamps, all the usual.
>
> That's why IMHO FAT is quite enough for this purpose.

FAT has a bad habit of constantly hammering the same sectors over and
over.  This can wear out cheap flash media in no time at all.

-- 
Måns Rullgård
mru@inprovide.com

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-16 14:37             ` Theodore Ts'o
@ 2005-06-19 17:55               ` Pavel Machek
  2005-06-20 18:38                 ` Alan Cox
  0 siblings, 1 reply; 70+ messages in thread
From: Pavel Machek @ 2005-06-19 17:55 UTC (permalink / raw)
  To: Theodore Ts'o, Jeremy Maitin-Shepard, Patrick McFarland,
	Alan Cox, Alexey Zaytsev, Linux Kernel Mailing List

Hi!

> > > Ext2/3's encoding has always been utf-8.  Period.
> > 
> > In what way does Ext2/3 know or care about file name encoding?  Doesn't
> > it just store an arbitrary 8-bit string?  Couldn't someone claim that
> > from the start it was designed to use iso8859-1 just as easily as you
> > can claim it was designed to use utf-8?
> 
> Because we've had this discussion^H^H^H^H^H^H^H^H^H^H^H flame war
> years ago, and despite people from Russia whining that it took 3
> bytes to encode each Cyrillic character in UTF-8, it's where we came out.  
> 
> The bottom-line though is that if someone files a bug report with ext3
> because one user on the system is creating filenames in Japanese,
> and another user on the same time-sharing system is creating filenames
> in German, and they fail to interoperate, and they were doing so in
> their local language, we would laugh at them --- just as people
> writing mail programs would laugh at people who complained that they
> were running into problems Just Sending 8-bits instead of using MIME,
> and could you please fix this business-critical bug?  
> 
> Or as more and more desktop programs start interpreting the filenames
> as UTF-8, and people with local variations get screwed, that is their
> problem, and Not Ours.

Actually the day we have rm utf-8-ed, we have a problem. Someone will
create two files that have the same UTF-8 name, encoded differently, and
will be in trouble. Remember the old > \* "hack"? UTF-8 makes variations
possible...

If we are serious about UTF-8 support in ext3, we should return
-EINVAL if someone passes a non-canonical UTF-8 string.
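
As a rough userspace illustration of such a check (not kernel code), here
is a Python sketch. It assumes "canonical" means well-formed UTF-8 that is
already in NFC; that reading, the choice of NFC and the function name
is_canonical_utf8_name are all assumptions made for the example.

```python
import unicodedata


def is_canonical_utf8_name(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 and already in NFC."""
    try:
        # The strict decoder rejects invalid and overlong sequences.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # A name is accepted only if normalizing it changes nothing.
    return unicodedata.normalize("NFC", text) == text


print(is_canonical_utf8_name("\u00C1".encode("utf-8")))   # precomposed
print(is_canonical_utf8_name("A\u0301".encode("utf-8")))  # decomposed
print(is_canonical_utf8_name(b"\xc0\xaf"))                # overlong '/'
```

A caller that wanted the behaviour described above would refuse any name
for which the function returns False.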

								Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-19 17:55               ` Pavel Machek
@ 2005-06-20 18:38                 ` Alan Cox
  2005-06-20 22:19                   ` Pavel Machek
  2005-08-26  0:00                   ` Daniel B.
  0 siblings, 2 replies; 70+ messages in thread
From: Alan Cox @ 2005-06-20 18:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Theodore Ts'o, Jeremy Maitin-Shepard, Patrick McFarland,
	Alexey Zaytsev, Linux Kernel Mailing List

On Sun, 2005-06-19 at 18:55, Pavel Machek wrote:
> Actually the day we have rm utf-8-ed, we have a problem. Someone will
> create two files that have same utf name, encoded differently, and
> will be in trouble. Remember old > \* "hack"? utf-8 makes variation
> possible...

They are different to POSIX as they are different byte sequences
> 
> If we are serious about utf-8 support in ext3, we should return
> -EINVAL if someone passes non-canonical utf-8 string.

That would ironically not be standards compliant


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-20 18:38                 ` Alan Cox
@ 2005-06-20 22:19                   ` Pavel Machek
  2005-06-20 23:38                     ` Andreas Schwab
  2005-08-26  0:00                   ` Daniel B.
  1 sibling, 1 reply; 70+ messages in thread
From: Pavel Machek @ 2005-06-20 22:19 UTC (permalink / raw)
  To: Alan Cox
  Cc: Theodore Ts'o, Jeremy Maitin-Shepard, Patrick McFarland,
	Alexey Zaytsev, Linux Kernel Mailing List

Hi!

> > Actually the day we have rm utf-8-ed, we have a problem. Someone will
> > create two files that have same utf name, encoded differently, and
> > will be in trouble. Remember old > \* "hack"? utf-8 makes variation
> > possible...
> 
> They are different to POSIX as they are different byte sequences

Does POSIX really say that all weird characters must be accepted in
path name?

> > If we are serious about utf-8 support in ext3, we should return
> > -EINVAL if someone passes non-canonical utf-8 string.
> 
> That would ironically not be standards compliant

I don't see how we can claim ext3 is utf-8 then. If application
vendors believed us and accepted that ext3 filenames are in utf-8,
they'd do the wrong thing, because the kernel is perfectly willing to
feed them non-utf-8 things.

								Pavel
-- 
teflon -- maybe it is a trademark, but it should not be.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-20 22:19                   ` Pavel Machek
@ 2005-06-20 23:38                     ` Andreas Schwab
  0 siblings, 0 replies; 70+ messages in thread
From: Andreas Schwab @ 2005-06-20 23:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Alan Cox, Theodore Ts'o, Jeremy Maitin-Shepard,
	Patrick McFarland, Alexey Zaytsev, Linux Kernel Mailing List

Pavel Machek <pavel@suse.cz> writes:

> Hi!
>
>> > Actually the day we have rm utf-8-ed, we have a problem. Someone will
>> > create two files that have same utf name, encoded differently, and
>> > will be in trouble. Remember old > \* "hack"? utf-8 makes variation
>> > possible...
>> 
>> They are different to POSIX as they are different byte sequences
>
> Does POSIX really say that all weird characters must be accepted in
> path name?

POSIX only requires [A-Za-z0-9._-].
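
For illustration, a small Python check against exactly that character
set. The names PORTABLE_NAME and is_portable_name are made up for the
sketch, and it tests only the characters, nothing else about the name.

```python
import re

# The character set quoted above: letters, digits, period, underscore, hyphen.
PORTABLE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_portable_name(name: str) -> bool:
    """Return True if every character of name is in the portable set."""
    return PORTABLE_NAME.fullmatch(name) is not None


print(is_portable_name("README.txt"))
print(is_portable_name("my file"))
print(is_portable_name("\uAE40"))
```

Anything outside that set, including a space or any non-ASCII character,
is beyond what the minimal requirement promises.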

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-18 23:22                                 ` Måns Rullgård
@ 2005-06-22  8:41                                   ` Lukasz Stelmach
  0 siblings, 0 replies; 70+ messages in thread
From: Lukasz Stelmach @ 2005-06-22  8:41 UTC (permalink / raw)
  To: mru; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2824 bytes --]

Måns Rullgård wrote:

>>>That's exactly how ext3, reiserfs, xfs, jfs, etc. all work.  A few
>>>filesystems are tagged as using some specific encoding.  If your
>>>filesystem is marked for iso-8859-1, what should a kernel with a
>>>conversion mechanism do if a user tries to name a file 김?
>>
>>Return -ENOENT? I am guessing.
> 
> 
> Doesn't seem very friendly.

Well, if a user marks her fs as iso-8859-1, that means she doesn't
want it to contain filenames unrepresentable in this particular
codepage. Alexey began the whole thread because in Russia there are
several equally popular encodings for the same alphabet. And
in this context his proposal is quite good: develop a general,
fs-independent NLS layer.

>>But please tell me what should do userland software if it runs with
>>locale set to something.iso-8859-2 and finds 김 in the directory?
> 
> 
> I suppose it will display ęš (0x80 doesn't seem to be a printable
> iso-8859-2 character).  You told it to use iso-8859-2 in the first
> place, so what do you expect?

ls(1) displays either \0nnn or ?. Or maybe some other mangling could be
done, however, octal representation seems to be ok.


>>That is the same problem. And for now ISO 8-bit encodings are far
>>more popular and useful with contemporary tools than UTF-8.
> 
> 
> ISO 8-bit encodings are more common with characters they can
> represent.  These are a small minority of all characters commonly
> used.

OK. Let me be more general: fixed char width encodings. AFAIK Japanese
encodings use 16 bits, yet they are still fixed width.

>>That is why I think the suggestion of a layer in the kernel that would
>>translate filenames from utf-8 stored on the media to e.g. latin2
>>used by tools is quite reasonable. Especially when there is more
>>than one encoding for a particular language (think Russian,
>>Polish). Even more, with such a facility the transition would be much
>>more graceful, since you could have a utf-8 filesystem first and
>>worry about the tools later. The filesystem would already be
>>populated with UTF-8 names.
> 
> 
> How is the kernel to know what to translate to/from?

Mount options. See the letter from Kyle Moffett
<C960854D-7EA5-4DD7-8F2B-7021092CE3EB@mac.com>
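
To make the idea concrete, here is a userspace Python sketch of what a
translation driven by two such options might do. The parameter names
nls_user and nls_disk are borrowed from the proposed mount options; the
functions user_to_disk and disk_to_user are hypothetical, and nothing
here reflects an existing kernel interface.

```python
def user_to_disk(name: bytes, nls_user: str, nls_disk: str) -> bytes:
    """Re-encode a name from the charset userland uses to the on-disk one."""
    return name.decode(nls_user).encode(nls_disk)


def disk_to_user(name: bytes, nls_disk: str, nls_user: str) -> bytes:
    """Re-encode a name read from disk into the charset userland expects."""
    return name.decode(nls_disk).encode(nls_user)


# A koi8-r user creates a Cyrillic name on a filesystem storing UTF-8.
cyrillic = "\u0412\u0430\u0441\u044f"
typed = cyrillic.encode("koi8_r")
stored = user_to_disk(typed, "koi8_r", "utf-8")
print(stored == cyrillic.encode("utf-8"))

# A name with no representation in the on-disk charset cannot be
# converted; the caller has to turn this into some error.
try:
    user_to_disk("\uAE40".encode("utf-8"), "utf-8", "iso8859_1")
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)
```

The second case is the one argued about earlier in the thread: the
conversion fails, and which error to report is left open here.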


[ good filesystem for portable media ]
>>That's why IMHO FAT is quite enough for this purpose.
> 
> FAT has a bad habit of constantly hammering the same sectors over and
> over.  This can wear out cheap flash media in no time at all.

Maybe. I don't think that digital cameras or audio players will support
UDF though. But that is something completely different.

-- 
Było mi bardzo miło.                    Trzecia pospolita klęska, [...]
>Łukasz<                      Już nie katolicka lecz złodziejska.  (c)PP


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 256 bytes --]

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-06-20 18:38                 ` Alan Cox
  2005-06-20 22:19                   ` Pavel Machek
@ 2005-08-26  0:00                   ` Daniel B.
  2005-08-26  8:34                     ` Bernd Petrovitsch
  2005-08-26 14:07                     ` Alan Cox
  1 sibling, 2 replies; 70+ messages in thread
From: Daniel B. @ 2005-08-26  0:00 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Alan Cox

Alan Cox wrote:
> 
> On Sun, 2005-06-19 at 18:55, Pavel Machek wrote:
> ...
> >
> > If we are serious about utf-8 support in ext3, we should return
> > -EINVAL if someone passes non-canonical utf-8 string.
> 
> That would ironically not be standards compliant

Which standards?

The standards I've read (mostly XML- and web-related specs)
do say that non-standard UTF-8 octet sequences should be rejected.


Daniel
-- 
Daniel Barclay
dsb@smart.net

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-08-26  0:00                   ` Daniel B.
@ 2005-08-26  8:34                     ` Bernd Petrovitsch
  2005-08-26 14:07                     ` Alan Cox
  1 sibling, 0 replies; 70+ messages in thread
From: Bernd Petrovitsch @ 2005-08-26  8:34 UTC (permalink / raw)
  To: Daniel B.; +Cc: Linux Kernel Mailing List, Alan Cox

On Thu, 2005-08-25 at 20:00 -0400, Daniel B. wrote:
> Alan Cox wrote:
> > On Sun, 2005-06-19 at 18:55, Pavel Machek wrote:
[...]
> > > If we are serious about utf-8 support in ext3, we should return
> > > -EINVAL if someone passes non-canonical utf-8 string.
> > 
> > That would ironically not be standards compliant
> 
> Which standards?

Probably POSIX, SUSv3 and similar.

> The standards I've read (mostly XML- and web-related specs)
> do say that non-standard UTF-8 octet sequences should be rejected.

There you have basically text files with some structure in them and the
definition/requirement that it is UTF-8.
At the kernel level these are also just byte streams, and the kernel doesn't
know or care which charset, encoding, file format, font etc. the data is
used with or how it is interpreted several layers higher, e.g. for
presenting to the user. And the same holds for filenames.

	Bernd
-- 
Firmix Software GmbH                   http://www.firmix.at/
mobil: +43 664 4416156                 fax: +43 1 7890849-55
          Embedded Linux Development and Services


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: A Great Idea (tm) about reimplementing NLS.
  2005-08-26  0:00                   ` Daniel B.
  2005-08-26  8:34                     ` Bernd Petrovitsch
@ 2005-08-26 14:07                     ` Alan Cox
  1 sibling, 0 replies; 70+ messages in thread
From: Alan Cox @ 2005-08-26 14:07 UTC (permalink / raw)
  To: Daniel B.; +Cc: Linux Kernel Mailing List

On Thu, 2005-08-25 at 20:00 -0400, Daniel B. wrote:
> Which standards?

The traditional unix namespace is a sequence of bytes with '/' as a
separator and \0 as a terminator. There are no other restrictions. UTF-8
is essentially a retrofit onto that.
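
Stated as a Python sketch, with is_valid_component as a hypothetical
helper name, the traditional rule for a single name between separators
looks like this. It deliberately knows nothing about character sets.

```python
def is_valid_component(raw: bytes) -> bool:
    """Traditional rule: a non-empty byte string without '/' or NUL."""
    return len(raw) > 0 and b"/" not in raw and b"\0" not in raw


# Two spellings of the same accented letter are two different names.
nfc_name = "\u00C1".encode("utf-8")
nfd_name = "A\u0301".encode("utf-8")

print(is_valid_component(nfc_name), is_valid_component(nfd_name))
print(nfc_name != nfd_name)
print(is_valid_component(b"\xff\xfe"))  # not UTF-8 at all, still accepted
print(is_valid_component(b"a/b"))
```

Under this rule both spellings are acceptable and distinct, and so are
byte strings that are not valid UTF-8.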

> The standards I've read (mostly XML- and web-related specs)
> do say that non-standard UTF-8 octet sequences should be rejected.

If you follow the thread further, various people pointed out that POSIX
and other standards documents are actually more restrictive in their
guarantees, so my belief is wrong by a strict reading of the standards.
It would still break a few apps if you enforced it (lots if you took the
minimal POSIX requirement).



^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2005-08-26 13:38 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-06-13 10:38 A Great Idea (tm) about reimplementing NLS Alexey Zaytsev
2005-06-13 10:49 ` Stefan Smietanowski
2005-06-13 18:01   ` Islam Amer
2005-06-14  9:32     ` Islam Amer
2005-06-14 10:10       ` Måns Rullgård
2005-06-14 15:28         ` Islam Amer
2005-06-13 12:05 ` Bernd Petrovitsch
2005-06-13 13:54   ` Alexey Zaytsev
2005-06-13 14:32     ` Bernd Petrovitsch
2005-06-13 17:38       ` Alexey Zaytsev
2005-06-13 18:58         ` Måns Rullgård
2005-06-14  8:04           ` Alexander E. Patrakov
2005-06-14  9:05             ` Måns Rullgård
2005-06-15  8:26               ` Lukasz Stelmach
2005-06-15  8:54                 ` Patrick McFarland
2005-06-15  9:14                   ` Lukasz Stelmach
2005-06-15  9:41                     ` Måns Rullgård
2005-06-15 14:52                       ` Lukasz Stelmach
2005-06-15 21:28                         ` Lennart Sorensen
2005-06-15 23:34                           ` Lukasz Stelmach
2005-06-16  1:44                             ` Patrick McFarland
2005-06-16 10:38                               ` Måns Rullgård
2005-06-16 11:36                               ` Bernd Eckenfels
2005-06-16 20:41                               ` Rob Sims
2005-06-16  9:40                             ` Måns Rullgård
2005-06-18 14:48                               ` Lukasz Stelmach
2005-06-18 23:22                                 ` Måns Rullgård
2005-06-22  8:41                                   ` Lukasz Stelmach
2005-06-16 13:39                             ` Lennart Sorensen
2005-06-16 14:44                               ` Richard B. Johnson
2005-06-16 15:04                                 ` Lennart Sorensen
2005-06-17  1:18                                   ` Patrick McFarland
2005-06-17  8:21                                     ` Måns Rullgård
2005-06-17  8:49                                       ` Patrick McFarland
2005-06-17  9:17                                         ` Måns Rullgård
2005-06-17  9:37                                           ` Måns Rullgård
2005-06-17  9:41                                         ` Bernd Eckenfels
2005-06-17 13:09                                         ` Lennart Sorensen
2005-06-17 13:23                                           ` Måns Rullgård
2005-06-18 16:04                                             ` Robin Rosenberg
2005-06-18 18:06                                               ` Kari Hurtta
2005-06-18 21:57                                                 ` Robin Rosenberg
2005-06-18 19:09                                               ` Bernd Eckenfels
2005-06-17 12:56                                     ` Lennart Sorensen
2005-06-18 22:30                                   ` Bernd Eckenfels
2005-06-16  1:42                           ` Patrick McFarland
2005-06-13 13:35 ` Alan Cox
2005-06-13 17:20   ` Alexey Zaytsev
2005-06-13 19:20     ` Alan Cox
2005-06-13 19:38       ` Måns Rullgård
2005-06-13 20:31       ` Rutger Nijlunsing
2005-06-15 20:50       ` Alexey Zaytsev
2005-06-16  1:52         ` Patrick McFarland
2005-06-16 10:14           ` Alexey Zaytsev
2005-06-16  1:49       ` Patrick McFarland
2005-06-16  2:36         ` Theodore Ts'o
2005-06-16  2:59           ` Patrick McFarland
2005-06-16  4:33           ` Jeremy Maitin-Shepard
2005-06-16 14:37             ` Theodore Ts'o
2005-06-19 17:55               ` Pavel Machek
2005-06-20 18:38                 ` Alan Cox
2005-06-20 22:19                   ` Pavel Machek
2005-06-20 23:38                     ` Andreas Schwab
2005-08-26  0:00                   ` Daniel B.
2005-08-26  8:34                     ` Bernd Petrovitsch
2005-08-26 14:07                     ` Alan Cox
2005-06-15  9:13 ` Denis Vlasenko
2005-06-16  1:55   ` Patrick McFarland
2005-06-16  3:59     ` [RFC] Filesystem name storage (Was: A Great Idea (tm) about reimplementing NLS.) Kyle Moffett
2005-06-18 15:24       ` Lukasz Stelmach
