linux-kernel.vger.kernel.org archive mirror
* [PATCH] Russian encoding support for MacHFS
@ 2005-01-11  8:58 Pavel Fedin
  0 siblings, 0 replies; 7+ messages in thread
From: Pavel Fedin @ 2005-01-11  8:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sven Luther

[-- Attachment #1: Type: text/plain, Size: 1161 bytes --]

 Hello, guys! I'd like to present my second kernel project to the community. This patch adds support for Russian characters on MacHFS volumes if you use the KOI8-R encoding on Linux (the common case in Russia).
 The implementation is probably not very good because it uses its own tables instead of NLS modules. I consider using NLS modules impossible because, due to the nature of MacHFS (or at least of the current implementation), names must be supplied in the MacOS encoding for searching to work properly. This means that you must be able to reverse-translate all names from the Linux encoding to the Mac encoding. Using NLS loses characters: if a requested character does not exist in the table, it is substituted by '?'. Macintosh disks often contain Mac-specific characters in file names (the "Folder" character, for example), which would be lost in this case.
 If someone has an idea how to fix this, you're welcome. I currently don't see a way to improve it because I don't know the internal HFS structure. Using UTF-8 as the host encoding would probably solve the problem, but it's not commonly used in Russia.

-- 
Best regards,
Pavel Fedin,									mailto:sonic_amiga@rambler.ru

[-- Attachment #2: hfs-koi8r.diff.bz2 --]
[-- Type: application/x-bzip2, Size: 4460 bytes --]


* Re: [PATCH] Russian encoding support for MacHFS
  2005-01-25  9:35   ` Pavel Fedin
  2005-01-25 15:34     ` Roman Zippel
@ 2005-01-27  8:29     ` Alex Riesen
  1 sibling, 0 replies; 7+ messages in thread
From: Alex Riesen @ 2005-01-27  8:29 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: linux-kernel

On Tue, 25 Jan 2005 12:35:16 +0300, Pavel Fedin <sonic_amiga@rambler.ru> wrote:
> > How about just leaving the characters unchanged? (Remap them to the same
> > codes in Unicode.)
> 
>  But what do I do when I convert them from Unicode to the 8-bit iocharset? Several characters in the Mac charset can end up converted to the same character in the Linux charset. That loses information, and the name will not be reverse-translatable.
>  To describe it better: I have an 8-bit Mac encoding and an 8-bit target encoding (iocharset). I need to convert from (1) to (2) and be able to convert back. I tried a one-way conversion, as in other filesystems, but it didn't work.
>  NLS tables can probably be used when iocharset is UTF-8. If you wish, I can try to implement that after some time.

Remap Unicode characters missing from the filesystem codepage to something
like '?'. I believe this is what the NLS routines do if the converter returns
-1 (error). You'd lose the new characters, right. But you'd lose them anyway,
as they have no place in Mac software.

> > Unicode, and its encoding UTF-8, IS commonly used everywhere.
> > And Russia can (and often does) use it just as well.
> 
>  Many people say a lot of software is not UTF-8-ready yet. In any case, I had problems when I tried to use it. Many Russian plain-text documents use an 8-bit encoding, so I need to be able to deal with them. A lot of software assumes that 1 byte is 1 character.

Just fix that software instead of polluting the kernel.

And besides: software which _does_ work with Unicode
can make good use of an NLS module for HFS.


* Re: [PATCH] Russian encoding support for MacHFS
  2005-01-25 15:34     ` Roman Zippel
@ 2005-01-26  6:30       ` Pavel Fedin
  0 siblings, 0 replies; 7+ messages in thread
From: Pavel Fedin @ 2005-01-26  6:30 UTC (permalink / raw)
  To: Roman Zippel; +Cc: linux-kernel

On Tue, 25 Jan 2005 16:34:57 +0100 (CET)
Roman Zippel <zippel@linux-m68k.org> wrote:

> I'm not quite sure what problem you're trying to solve here.

 I am trying to implement character set conversion for MacHFS. I have some CDs with Russian file names. Currently they are not displayed properly because Linux uses the KOI8-R character set for Russian letters and the Macintosh uses its own character set, called Mac-Cyrillic or codepage 10007.
 At first I tried to implement character set conversion using NLS tables. It was done using "iocharset" and "codepage" arguments. "iocharset" specified Linux's local character set and "codepage" specified HFS's character set. So to convert a character I needed to process it twice: convert from "codepage" to Unicode and then convert from Unicode to "iocharset".
 The problem with this is that some characters are lost during the conversion. Not all characters from the source ("codepage") charset are present in the destination ("iocharset") charset table (for example, the "Folder" sign). But for the dir.c/hfs_lookup() function to work properly, we need to be able to convert the name back from KOI8-R to CP10007; otherwise the search algorithm will fail. As a result, we won't be able to operate on any file whose name contains such characters.
 The solution was to use my own conversion table, which ensures that no characters are lost during conversion in either direction. Every unique source character is translated to a unique destination character. Of course Mac-specific characters are not displayed properly, but they're not lost either. The "codepage" argument was omitted for simplicity because a specific "iocharset" implies a specific "codepage" (for example, if iocharset is koi8-r then we can assume that the Macintosh codepage is mac-cyrillic). But some people said that this patch can't be approved because not using NLS is a bad solution. So I'd like to talk to you; maybe we'll find a better solution (because you know HFS better than I do), or we can come to the conclusion that there really is no solution and push the patch upstream.
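
 The difference between the two approaches described above can be sketched in user space. This is only an illustration under stated assumptions: Python's "mac_cyrillic" and "koi8_r" codecs stand in for the kernel NLS tables, the helper names are hypothetical, and the fill-in rule for unmatched bytes is one possible choice, not necessarily the table the patch uses (the patch itself is only available as a compressed attachment).

```python
# Illustrative sketch only: Python's "mac_cyrillic" and "koi8_r" codecs stand in
# for the kernel tables; none of these names come from the patch.
SRC, DST = "mac_cyrillic", "koi8_r"


def two_step(b):
    """codepage -> Unicode -> iocharset, with '?' for anything unmappable."""
    ch = bytes([b]).decode(SRC, errors="replace")
    return ch.encode(DST, errors="replace")[0]


def build_bijection():
    """Return (forward, backward) 256-entry tables that are exact inverses."""
    forward = [None] * 256
    used = set()
    # Pass 1: keep every character that both charsets share.
    for b in range(256):
        try:
            t = bytes([b]).decode(SRC).encode(DST)[0]
        except UnicodeError:
            continue  # no counterpart in the target charset
        if t not in used:
            forward[b] = t
            used.add(t)
    # Pass 2: park the leftover source bytes on the unused target bytes.
    free = (t for t in range(256) if t not in used)
    for b in range(256):
        if forward[b] is None:
            forward[b] = next(free)
    backward = [0] * 256
    for b, t in enumerate(forward):
        backward[t] = b
    return forward, backward


forward, backward = build_bijection()

# Source bytes other than '?' itself that the two-step path turns into '?'.
collapsed = [b for b in range(256) if b != 0x3F and two_step(b) == 0x3F]
print(len(collapsed), "Mac bytes collapse to '?' in the two-step conversion")

# Check that the table-based mapping survives a round trip for every byte.
print(all(backward[forward[b]] == b for b in range(256)))
```

 Bytes that collapse to '?' cannot be mapped back to a unique Mac byte, which is why a lookup by name would fail for them. The bijective table avoids that at the cost of displaying Mac-only characters as unrelated KOI8-R glyphs.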

> If you want to store Unicode characters, use HFS+. I plan to implement NLS 
> support for it real soon (especially because I also need to fix the missing 
> decomposition support). 

 That would be nice. I also thought about it, but I have no HFS+ disks with Russian names, so I can't test it. And I decided not to do a "blind" implementation, so as not to break the filesystem. Currently my patch also adds the "iocharset" argument to HFS+ (so that I can specify both filesystems in one /etc/fstab line, which is useful for CD-ROMs), but it is ignored there.

-- 
Best regards,
Pavel Fedin,									mailto:sonic_amiga@rambler.ru


* Re: [PATCH] Russian encoding support for MacHFS
  2005-01-25  9:35   ` Pavel Fedin
@ 2005-01-25 15:34     ` Roman Zippel
  2005-01-26  6:30       ` Pavel Fedin
  2005-01-27  8:29     ` Alex Riesen
  1 sibling, 1 reply; 7+ messages in thread
From: Roman Zippel @ 2005-01-25 15:34 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: Alex Riesen, linux-kernel

Hi,

On Tue, 25 Jan 2005, Pavel Fedin wrote:

> > How about just leaving the characters unchanged? (Remap them to the same
> > codes in Unicode.)
> 
>  But what do I do when I convert them from Unicode to the 8-bit iocharset? 
> Several characters in the Mac charset can end up converted to the same 
> character in the Linux charset. That loses information, and the name will 
> not be reverse-translatable.
>  To describe it better: I have an 8-bit Mac encoding and an 8-bit 
> target encoding (iocharset). I need to convert from (1) to (2) and be 
> able to convert back. I tried a one-way conversion, as in other 
> filesystems, but it didn't work.
>  NLS tables can probably be used when iocharset is UTF-8. If you wish, I 
> can try to implement that after some time.

I'm not quite sure what problem you're trying to solve here. NLS is used 
to convert from a local encoding to Unicode. HFS has only 8-bit 
characters, so there isn't much space to store Unicode characters in. 
If you want to use UTF-8, you can do so without changing HFS. All 
filesystems which don't use NLS (that includes e.g. ext3) store the 
filename in the local encoding.
If you want to store Unicode characters, use HFS+. I plan to implement NLS 
support for it real soon (especially because I also need to fix the missing 
decomposition support). 

bye, Roman


* Re: [PATCH] Russian encoding support for MacHFS
  2005-01-24 18:46 ` Alex Riesen
@ 2005-01-25  9:35   ` Pavel Fedin
  2005-01-25 15:34     ` Roman Zippel
  2005-01-27  8:29     ` Alex Riesen
  0 siblings, 2 replies; 7+ messages in thread
From: Pavel Fedin @ 2005-01-25  9:35 UTC (permalink / raw)
  To: Alex Riesen; +Cc: linux-kernel

On Mon, 24 Jan 2005 19:46:18 +0100
Alex Riesen <raa.lkml@gmail.com> wrote:

> How about just leaving the characters unchanged? (Remap them to the same
> codes in Unicode.)

 But what do I do when I convert them from Unicode to the 8-bit iocharset? Several characters in the Mac charset can end up converted to the same character in the Linux charset. That loses information, and the name will not be reverse-translatable.
 To describe it better: I have an 8-bit Mac encoding and an 8-bit target encoding (iocharset). I need to convert from (1) to (2) and be able to convert back. I tried a one-way conversion, as in other filesystems, but it didn't work.
 NLS tables can probably be used when iocharset is UTF-8. If you wish, I can try to implement that after some time.
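
 A quick user-space check of the UTF-8 case mentioned above might look like the following. It is only a sketch: Python's "mac_cyrillic" codec is assumed to be a reasonable stand-in for a Mac-Cyrillic NLS table, and the function name is hypothetical.

```python
# Illustrative only: Python's "mac_cyrillic" codec stands in for an NLS table.
def roundtrip_via_utf8(b):
    """Mac byte -> Unicode -> UTF-8 name -> Unicode -> Mac byte."""
    utf8_name = bytes([b]).decode("mac_cyrillic").encode("utf-8")
    return utf8_name.decode("utf-8").encode("mac_cyrillic")[0]


failures = []
for b in range(256):
    try:
        if roundtrip_via_utf8(b) != b:
            failures.append(b)
    except UnicodeError:
        # Byte not defined by this codec; nothing to round-trip.
        continue

print("bytes that fail the round trip:", failures)
```

 Because UTF-8 can represent every Unicode character, the second conversion step has nothing to drop, so the name can be translated back for lookup.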

> Unicode, and its encoding UTF-8, IS commonly used everywhere.
> And Russia can (and often does) use it just as well.

 Many people say a lot of software is not UTF-8-ready yet. In any case, I had problems when I tried to use it. Many Russian plain-text documents use an 8-bit encoding, so I need to be able to deal with them. A lot of software assumes that 1 byte is 1 character.

> P.S. Read Documentation/SubmittingPatches.

 OK. Sorry for the violations.

> What kernel is the patch against?

 2.6.8.

-- 
Best regards,
Pavel Fedin,									mailto:sonic_amiga@rambler.ru


* Re: [PATCH] Russian encoding support for MacHFS
  2005-01-24  9:57 Pavel Fedin
@ 2005-01-24 18:46 ` Alex Riesen
  2005-01-25  9:35   ` Pavel Fedin
  0 siblings, 1 reply; 7+ messages in thread
From: Alex Riesen @ 2005-01-24 18:46 UTC (permalink / raw)
  To: Pavel Fedin; +Cc: linux-kernel

(Could you please use shorter lines? Around 80 characters is good. Long lines
are difficult to read.)

On Mon, 24 Jan 2005 12:57:56 +0300, Pavel Fedin <sonic_amiga@rambler.ru> wrote:
> ... This means that you must be able to reverse-translate all names from
> the Linux encoding to the Mac encoding. Using NLS loses characters: if a
> requested character does not exist in the table, it is substituted by '?'.
> Macintosh disks often contain Mac-specific characters in file names
> (the "Folder" character, for example), which would be lost in this case.

How about just leaving the characters unchanged? (Remap them to the same
codes in Unicode.)

> Using UTF-8 as the host encoding would probably solve the problem, but it's
> not commonly used in Russia.

Unicode, and its encoding UTF-8, IS commonly used everywhere.
And Russia can (and often does) use it just as well.

P.S. Read Documentation/SubmittingPatches.
What kernel is the patch against?


* [PATCH] Russian encoding support for MacHFS
@ 2005-01-24  9:57 Pavel Fedin
  2005-01-24 18:46 ` Alex Riesen
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Fedin @ 2005-01-24  9:57 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1162 bytes --]

 Hello, guys! I'd like to present my second kernel project to the community. This patch adds support for Russian characters on MacHFS volumes if you use the KOI8-R encoding on Linux (the common case in Russia).
 The implementation is probably not very good because it uses its own tables instead of NLS modules. I consider using NLS modules impossible because, due to the nature of MacHFS (or at least of the current implementation), names must be supplied in the MacOS encoding for searching to work properly. This means that you must be able to reverse-translate all names from the Linux encoding to the Mac encoding. Using NLS loses characters: if a requested character does not exist in the table, it is substituted by '?'. Macintosh disks often contain Mac-specific characters in file names (the "Folder" character, for example), which would be lost in this case.
 If someone has an idea how to fix this, you're welcome. I currently don't see a way to improve it because I don't know the internal HFS structure. Using UTF-8 as the host encoding would probably solve the problem, but it's not commonly used in Russia.

-- 
Best regards,
Pavel Fedin,									mailto:sonic_amiga@rambler.ru


[-- Attachment #2: hfs-koi8r.diff.bz2 --]
[-- Type: application/x-bzip2, Size: 4460 bytes --]


Thread overview: 7+ messages
2005-01-11  8:58 [PATCH] Russian encoding support for MacHFS Pavel Fedin
2005-01-24  9:57 Pavel Fedin
2005-01-24 18:46 ` Alex Riesen
2005-01-25  9:35   ` Pavel Fedin
2005-01-25 15:34     ` Roman Zippel
2005-01-26  6:30       ` Pavel Fedin
2005-01-27  8:29     ` Alex Riesen
