linux-kernel.vger.kernel.org archive mirror
* [PATCH] It's UTF-8
@ 2006-01-08 20:38 Alexey Dobriyan
  2006-01-08 21:46 ` Jan Engelhardt
  2006-01-09  8:28 ` Alexander E. Patrakov
  0 siblings, 2 replies; 11+ messages in thread
From: Alexey Dobriyan @ 2006-01-08 20:38 UTC
  To: Andrew Morton; +Cc: linux-kernel

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 Documentation/filesystems/isofs.txt |    4 ++--
 Documentation/filesystems/jfs.txt   |    2 +-
 Documentation/filesystems/vfat.txt  |    6 +++---
 fs/befs/linuxvfs.c                  |    2 +-
 fs/cifs/CHANGES                     |    2 +-
 fs/fat/dir.c                        |    2 +-
 fs/fat/inode.c                      |    2 +-
 fs/isofs/joliet.c                   |    2 +-
 fs/nls/Kconfig                      |    2 +-
 include/asm-mips/termbits.h         |    2 +-
 include/linux/msdos_fs.h            |    2 +-
 11 files changed, 14 insertions(+), 14 deletions(-)

--- a/Documentation/filesystems/isofs.txt
+++ b/Documentation/filesystems/isofs.txt
@@ -9,9 +9,9 @@ when using discs encoded using Microsoft
   iocharset=name Character set to use for converting from Unicode to
 		ASCII.  Joliet filenames are stored in Unicode format, but
 		Unix for the most part doesn't know how to deal with Unicode.
-		There is also an option of doing UTF8 translations with the
+		There is also an option of doing UTF-8 translations with the
 		utf8 option.
-  utf8          Encode Unicode names in UTF8 format. Default is no.
+  utf8          Encode Unicode names in UTF-8 format. Default is no.
 
 Mount options unique to the isofs filesystem.
   block=512     Set the block size for the disk to 512 bytes
--- a/Documentation/filesystems/jfs.txt
+++ b/Documentation/filesystems/jfs.txt
@@ -6,7 +6,7 @@ The following mount options are supporte
 
 iocharset=name	Character set to use for converting from Unicode to
 		ASCII.  The default is to do no conversion.  Use
-		iocharset=utf8 for UTF8 translations.  This requires
+		iocharset=utf8 for UTF-8 translations.  This requires
 		CONFIG_NLS_UTF8 to be set in the kernel .config file.
 		iocharset=none specifies the default behavior explicitly.
 
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -28,16 +28,16 @@ iocharset=name -- Character set to use f
 		 know how to deal with Unicode.
 		 By default, FAT_DEFAULT_IOCHARSET setting is used.
 
-		 There is also an option of doing UTF8 translations
+		 There is also an option of doing UTF-8 translations
 		 with the utf8 option.
 
 		 NOTE: "iocharset=utf8" is not recommended. If unsure,
 		 you should consider the following option instead.
 
-utf8=<bool>   -- UTF8 is the filesystem safe version of Unicode that
+utf8=<bool>   -- UTF-8 is the filesystem safe version of Unicode that
 		 is used by the console.  It can be be enabled for the
 		 filesystem with this option. If 'uni_xlate' gets set,
-		 UTF8 gets disabled.
+		 UTF-8 gets disabled.
 
 uni_xlate=<bool> -- Translate unhandled Unicode characters to special
 		 escaped sequences.  This would let you backup and
--- a/fs/befs/linuxvfs.c
+++ b/fs/befs/linuxvfs.c
@@ -561,7 +561,7 @@ befs_utf2nls(struct super_block *sb, con
  * @sb: Superblock
  * @src: Input string buffer in NLS format
  * @srclen: Length of input string in bytes
- * @dest: The output string in UTF8 format
+ * @dest: The output string in UTF-8 format
  * @destlen: Length of the output buffer
  * 
  * Converts input string @src, which is in the format of the loaded NLS map,
--- a/fs/cifs/CHANGES
+++ b/fs/cifs/CHANGES
@@ -150,7 +150,7 @@ improperly zeroed buffer in CIFS Unix ex
 Version 1.25
 ------------
 Fix internationalization problem in cifs readdir with filenames that map to 
-longer UTF8 strings than the string on the wire was in Unicode.  Add workaround
+longer UTF-8 strings than the string on the wire was in Unicode.  Add workaround
 for readdir to netapp servers. Fix search rewind (seek into readdir to return 
 non-consecutive entries).  Do not do readdir when server negotiates 
 buffer size to small to fit filename. Add support for reading POSIX ACLs from
--- a/fs/fat/dir.c
+++ b/fs/fat/dir.c
@@ -114,7 +114,7 @@ static inline int fat_get_entry(struct i
 }
 
 /*
- * Convert Unicode 16 to UTF8, translated Unicode, or ASCII.
+ * Convert Unicode 16 to UTF-8, translated Unicode, or ASCII.
  * If uni_xlate is enabled and we can't get a 1:1 conversion, use a
  * colon as an escape character since it is normally invalid on the vfat
  * filesystem. The following four characters are the hexadecimal digits
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -1016,7 +1016,7 @@ static int parse_options(char *options, 
 			return -EINVAL;
 		}
 	}
-	/* UTF8 doesn't provide FAT semantics */
+	/* UTF-8 doesn't provide FAT semantics */
 	if (!strcmp(opts->iocharset, "utf8")) {
 		printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
 		       " for FAT filesystems, filesystem will be case sensitive!\n");
--- a/fs/isofs/joliet.c
+++ b/fs/isofs/joliet.c
@@ -11,7 +11,7 @@
 #include "isofs.h"
 
 /*
- * Convert Unicode 16 to UTF8 or ASCII.
+ * Convert Unicode 16 to UTF-8 or ASCII.
  */
 static int
 uni16_to_x8(unsigned char *ascii, u16 *uni, int len, struct nls_table *nls)
--- a/fs/nls/Kconfig
+++ b/fs/nls/Kconfig
@@ -491,7 +491,7 @@ config NLS_KOI8_U
 	  (koi8-u) and Belarusian (koi8-ru) character sets.
 
 config NLS_UTF8
-	tristate "NLS UTF8"
+	tristate "NLS UTF-8"
 	depends on NLS
 	help
 	  If you want to display filenames with native language characters
--- a/include/asm-mips/termbits.h
+++ b/include/asm-mips/termbits.h
@@ -77,7 +77,7 @@ struct termios {
 #define IXANY	0004000		/* Any character will restart after stop.  */
 #define IXOFF	0010000		/* Enable start/stop input control.  */
 #define IMAXBEL	0020000		/* Ring bell when input queue is full.  */
-#define IUTF8	0040000		/* Input is UTF8 */
+#define IUTF8	0040000		/* Input is UTF-8 */
 
 /* c_oflag bits */
 #define OPOST	0000001		/* Perform output processing.  */
--- a/include/linux/msdos_fs.h
+++ b/include/linux/msdos_fs.h
@@ -199,7 +199,7 @@ struct fat_mount_options {
 		 sys_immutable:1, /* set = system files are immutable */
 		 dotsOK:1,        /* set = hidden and system files are named '.filename' */
 		 isvfat:1,        /* 0=no vfat long filename support, 1=vfat support */
-		 utf8:1,	  /* Use of UTF8 character set (Default) */
+		 utf8:1,	  /* Use of UTF-8 character set (Default) */
 		 unicode_xlate:1, /* create escape sequences for unhandled Unicode */
 		 numtail:1,       /* Does first alias have a numeric '~1' type tail? */
 		 atari:1,         /* Use Atari GEMDOS variation of MS-DOS fs */



* Re: [PATCH] It's UTF-8
  2006-01-08 20:38 [PATCH] It's UTF-8 Alexey Dobriyan
@ 2006-01-08 21:46 ` Jan Engelhardt
  2006-01-08 22:09   ` Måns Rullgård
                     ` (3 more replies)
  2006-01-09  8:28 ` Alexander E. Patrakov
  1 sibling, 4 replies; 11+ messages in thread
From: Jan Engelhardt @ 2006-01-08 21:46 UTC
  To: Alexey Dobriyan; +Cc: Andrew Morton, linux-kernel


>Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>

I'd say ACK. However,

> iocharset=name	Character set to use for converting from Unicode to
> 		ASCII.  The default is to do no conversion.  Use
>-		iocharset=utf8 for UTF8 translations.  This requires
>+		iocharset=utf8 for UTF-8 translations.  This requires
> 		CONFIG_NLS_UTF8 to be set in the kernel .config file.

If you are really nitpicky about the "-", then it should also be 
"iocharset=utf-8" (and wherever else). Or what's the real purpose of
adding the dashes in only half of the places, then?



Jan Engelhardt
-- 
| Alphagate Systems, http://alphagate.hopto.org/
| jengelh's site, http://jengelh.hopto.org/


* Re: [PATCH] It's UTF-8
  2006-01-08 21:46 ` Jan Engelhardt
@ 2006-01-08 22:09   ` Måns Rullgård
  2006-01-08 22:10   ` Alistair John Strachan
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Måns Rullgård @ 2006-01-08 22:09 UTC
  To: linux-kernel

Jan Engelhardt <jengelh@linux01.gwdg.de> writes:

>>Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
>
> I'd say ACK. However,
>
>> iocharset=name	Character set to use for converting from Unicode to
>> 		ASCII.  The default is to do no conversion.  Use
>>-		iocharset=utf8 for UTF8 translations.  This requires
>>+		iocharset=utf8 for UTF-8 translations.  This requires
>> 		CONFIG_NLS_UTF8 to be set in the kernel .config file.
>
> If you are really nitpicky about the "-", then it should also be 
> "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> adding the dashes in only half of the places, then?

The patch only changes documentation/comments.  Changing other things
would break compatibility, and that's usually not a good idea for
cosmetic changes.
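
To make the compatibility point concrete: if the hyphenated spelling were
ever wanted as an option value, it would have to be added as an alias next
to the existing one, never substituted for it. A minimal sketch, assuming a
hypothetical helper is_utf8_charset_name() (this is not existing kernel
code, and nothing in the patch proposes it):

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not kernel code: accept "utf-8" as an alias for
 * the existing "utf8" option value, so that "iocharset=utf8" entries
 * already present in /etc/fstab keep working unchanged.
 */
static bool is_utf8_charset_name(const char *name)
{
	return strcmp(name, "utf8") == 0 || strcmp(name, "utf-8") == 0;
}
```

The old spelling stays valid; only a new one is recognised in addition.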

-- 
Måns Rullgård
mru@inprovide.com



* Re: [PATCH] It's UTF-8
  2006-01-08 21:46 ` Jan Engelhardt
  2006-01-08 22:09   ` Måns Rullgård
@ 2006-01-08 22:10   ` Alistair John Strachan
  2006-01-09  9:04     ` Vojtech Pavlik
  2006-01-08 22:25   ` Alexey Dobriyan
  2006-01-09 12:48   ` Kalin KOZHUHAROV
  3 siblings, 1 reply; 11+ messages in thread
From: Alistair John Strachan @ 2006-01-08 22:10 UTC
  To: Jan Engelhardt; +Cc: Alexey Dobriyan, Andrew Morton, linux-kernel

On Sunday 08 January 2006 21:46, Jan Engelhardt wrote:
> >Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
>
> I'd say ACK. However,
>
> > iocharset=name	Character set to use for converting from Unicode to
> > 		ASCII.  The default is to do no conversion.  Use
> >-		iocharset=utf8 for UTF8 translations.  This requires
> >+		iocharset=utf8 for UTF-8 translations.  This requires
> > 		CONFIG_NLS_UTF8 to be set in the kernel .config file.
>
> If you are really nitpicky about the "-", then it should also be
> "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> adding the dashes in only half of the places, then?

Also, what is "Unicode 16", as used in several places in the kernel? Surely
this should be changed to UTF-16, which is the _encoding_ for the Unicode
character space.

-- 
Cheers,
Alistair.

'No sense being pessimistic, it probably wouldn't work anyway.'
Third year Computer Science undergraduate.
1F2 55 South Clerk Street, Edinburgh, UK.


* Re: [PATCH] It's UTF-8
  2006-01-08 21:46 ` Jan Engelhardt
  2006-01-08 22:09   ` Måns Rullgård
  2006-01-08 22:10   ` Alistair John Strachan
@ 2006-01-08 22:25   ` Alexey Dobriyan
  2006-01-09 12:48   ` Kalin KOZHUHAROV
  3 siblings, 0 replies; 11+ messages in thread
From: Alexey Dobriyan @ 2006-01-08 22:25 UTC
  To: Jan Engelhardt; +Cc: Andrew Morton, linux-kernel

On Sun, Jan 08, 2006 at 10:46:22PM +0100, Jan Engelhardt wrote:
> > iocharset=name	Character set to use for converting from Unicode to
> > 		ASCII.  The default is to do no conversion.  Use
> >-		iocharset=utf8 for UTF8 translations.  This requires
> >+		iocharset=utf8 for UTF-8 translations.  This requires
> > 		CONFIG_NLS_UTF8 to be set in the kernel .config file.
>
> If you are really nitpicky about the "-", then it should also be
> "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> adding the dashes in only half of the places, then?

I don't want to be shot by everyone who has "iocharset=utf8" in
/etc/fstab.



* Re: [PATCH] It's UTF-8
  2006-01-08 20:38 [PATCH] It's UTF-8 Alexey Dobriyan
  2006-01-08 21:46 ` Jan Engelhardt
@ 2006-01-09  8:28 ` Alexander E. Patrakov
  2006-01-09 11:38   ` Krzysztof Halasa
  1 sibling, 1 reply; 11+ messages in thread
From: Alexander E. Patrakov @ 2006-01-09  8:28 UTC
  To: Alexey Dobriyan; +Cc: linux-kernel

Alexey Dobriyan wrote:

>  	if (!strcmp(opts->iocharset, "utf8")) {
>  		printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
>  		       " for FAT filesystems, filesystem will be case sensitive!\n");

This warning would read better as:

FAT: this is not the recommended filesystem for use with UTF-8 filenames.

Reason: the utf8 IO charset is the only IO charset that displays 
filenames properly in UTF-8 locales. So the choice is really between 
case-sensitive filenames (iocharset=utf8) and completely unreadable 
filenames (everything else).
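
To illustrate the case-sensitivity half of that trade-off, here is a
minimal sketch of why simple case folding stops working once names are
UTF-8. It assumes a per-byte, ASCII-only fold; the function names are made
up for the example and this is not the FAT driver's actual code:

```c
#include <stddef.h>

/* Fold one byte with ASCII rules only; bytes >= 0x80 pass through. */
static unsigned char ascii_fold(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Return 1 if a and b are equal after per-byte ASCII folding, else 0. */
static int names_equal_ascii_fold(const char *a, const char *b)
{
	while (*a && *b) {
		if (ascii_fold((unsigned char)*a) != ascii_fold((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	/* Equal only if both strings ended at the same position. */
	return *a == *b;
}
```

With this fold, "README" and "readme" compare equal, but the UTF-8 byte
sequences for "Été" and "été" do not, because the accented letters are
multi-byte sequences that a per-byte fold leaves untouched.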

-- 
Alexander E. Patrakov


* Re: [PATCH] It's UTF-8
  2006-01-08 22:10   ` Alistair John Strachan
@ 2006-01-09  9:04     ` Vojtech Pavlik
  0 siblings, 0 replies; 11+ messages in thread
From: Vojtech Pavlik @ 2006-01-09  9:04 UTC
  To: Alistair John Strachan
  Cc: Jan Engelhardt, Alexey Dobriyan, Andrew Morton, linux-kernel

On Sun, Jan 08, 2006 at 10:10:09PM +0000, Alistair John Strachan wrote:

> On Sunday 08 January 2006 21:46, Jan Engelhardt wrote:
> > >Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
> >
> > I'd say ACK. However,
> >
> > > iocharset=name	Character set to use for converting from Unicode to
> > > 		ASCII.  The default is to do no conversion.  Use
> > >-		iocharset=utf8 for UTF8 translations.  This requires
> > >+		iocharset=utf8 for UTF-8 translations.  This requires
> > > 		CONFIG_NLS_UTF8 to be set in the kernel .config file.
> >
> > If you are really nitpicky about the "-", then it should also be
> > "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> > adding the dashes in only half of the places, then?
> 
> Also, what is "Unicode 16", as used in several places in the kernel? Surely
> this should be changed to UTF-16, which is the _encoding_ for the Unicode
> character space.
 
It might also be UCS-2 and not UTF-16 in some places. They do differ.
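
The difference shows up only above U+FFFF: UTF-16 encodes those code
points as a surrogate pair of two 16-bit units, while UCS-2 is fixed-width
and cannot represent them at all. A minimal sketch of the distinction (the
function names are illustrative, not taken from the kernel):

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as UTF-16 code units.
 * Returns the number of 16-bit units written (1 or 2), or 0 if the
 * code point cannot be encoded (a surrogate value or above U+10FFFF).
 */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;
	if (cp > 0x10FFFF)
		return 0;
	if (cp < 0x10000) {
		/* Basic Multilingual Plane: identical in UCS-2 and UTF-16. */
		out[0] = (uint16_t)cp;
		return 1;
	}
	/* Supplementary planes: split 20 bits across a surrogate pair. */
	cp -= 0x10000;
	out[0] = (uint16_t)(0xD800 | (cp >> 10));
	out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
	return 2;
}

/* UCS-2 has exactly one 16-bit unit per character, so no surrogates. */
static int ucs2_can_represent(uint32_t cp)
{
	return cp < 0x10000 && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

For example, U+00E9 is the single unit 0x00E9 in both, whereas U+1D11E
becomes the pair 0xD834 0xDD1E in UTF-16 and has no UCS-2 form.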

-- 
Vojtech Pavlik
SuSE Labs, SuSE CR


* Re: [PATCH] It's UTF-8
  2006-01-09  8:28 ` Alexander E. Patrakov
@ 2006-01-09 11:38   ` Krzysztof Halasa
  2006-01-09 18:44     ` Xavier Bestel
  0 siblings, 1 reply; 11+ messages in thread
From: Krzysztof Halasa @ 2006-01-09 11:38 UTC
  To: Alexander E. Patrakov; +Cc: Alexey Dobriyan, linux-kernel

"Alexander E. Patrakov" <patrakov@gmail.com> writes:

> Alexey Dobriyan wrote:
>
>>  	if (!strcmp(opts->iocharset, "utf8")) {
>>  		printk(KERN_ERR "FAT: utf8 is not a recommended IO charset"
>>  		       " for FAT filesystems, filesystem will be case sensitive!\n");
>
> This warning would read better as:
>
> FAT: this is not the recommended filesystem for use with UTF-8 filenames.
>
> Reason: the utf8 IO charset is the only IO charset that displays
> filenames properly in UTF-8 locales. So the choice is really between
> case-sensitive filenames (iocharset=utf8) and completely unreadable
> filenames (everything else).

And a UTF-8 locale seems to be the only really sane one today. I'd kill the
whole warning.
-- 
Krzysztof Halasa


* Re: [PATCH] It's UTF-8
  2006-01-08 21:46 ` Jan Engelhardt
                     ` (2 preceding siblings ...)
  2006-01-08 22:25   ` Alexey Dobriyan
@ 2006-01-09 12:48   ` Kalin KOZHUHAROV
  3 siblings, 0 replies; 11+ messages in thread
From: Kalin KOZHUHAROV @ 2006-01-09 12:48 UTC
  To: linux-kernel

Jan Engelhardt wrote:
>>Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
> 
> 
> I'd say ACK. However,
> 
> 
>>iocharset=name	Character set to use for converting from Unicode to
>>		ASCII.  The default is to do no conversion.  Use
>>-		iocharset=utf8 for UTF8 translations.  This requires
>>+		iocharset=utf8 for UTF-8 translations.  This requires
>>		CONFIG_NLS_UTF8 to be set in the kernel .config file.
> 
> 
> If you are really nitpicky about the "-", then it should also be 
> "iocharset=utf-8" (and wherever else). Or what's the real purpose of
> adding the dashes in only half of the places, then?

glibc started it, AFAIR. So both utf8 and UTF-8 are generally accepted, but utf-8 is not that
widespread.

Kalin.

-- 
|[ ~~~~~~~~~~~~~~~~~~~~~~ ]|
+-> http://ThinRope.net/ <-+
|[ ______________________ ]|



* Re: [PATCH] It's UTF-8
  2006-01-09 11:38   ` Krzysztof Halasa
@ 2006-01-09 18:44     ` Xavier Bestel
  2006-01-10  0:12       ` Krzysztof Halasa
  0 siblings, 1 reply; 11+ messages in thread
From: Xavier Bestel @ 2006-01-09 18:44 UTC
  To: Krzysztof Halasa; +Cc: Alexander E. Patrakov, Alexey Dobriyan, linux-kernel

On Monday 09 January 2006 at 12:38 +0100, Krzysztof Halasa wrote:
> "Alexander E. Patrakov" <patrakov@gmail.com> writes:

> > FAT: this is not the recommended filesystem for use with UTF-8 filenames.
> >
> > Reason: the utf8 IO charset is the only IO charset that displays
> > filenames properly in UTF-8 locales. So the choice is really between
> > case-sensitive filenames (iocharset=utf8) and completely unreadable
> > filenames (everything else).
> 
> And a UTF-8 locale seems to be the only really sane one today. I'd kill the
> whole warning.

... on Unix. But FAT is a sort of lingua franca of filesystems, and is
the only one understandable by every (embedded) OS. So you'd better stay
compatible with everyone else.

	Xav




* Re: [PATCH] It's UTF-8
  2006-01-09 18:44     ` Xavier Bestel
@ 2006-01-10  0:12       ` Krzysztof Halasa
  0 siblings, 0 replies; 11+ messages in thread
From: Krzysztof Halasa @ 2006-01-10  0:12 UTC
  To: Xavier Bestel; +Cc: Alexander E. Patrakov, Alexey Dobriyan, linux-kernel

Xavier Bestel <xavier.bestel@free.fr> writes:

>> And a UTF-8 locale seems to be the only really sane one today. I'd kill the
>> whole warning.
>
> ... on Unix. But FAT is a sort of lingua franca of filesystems, and is
> the only one understandable by every (embedded) OS. So you'd better stay
> compatible with everyone else.

You stay compatible. And you can even read files with national
characters in names.
-- 
Krzysztof Halasa



Thread overview: 11+ messages
2006-01-08 20:38 [PATCH] It's UTF-8 Alexey Dobriyan
2006-01-08 21:46 ` Jan Engelhardt
2006-01-08 22:09   ` Måns Rullgård
2006-01-08 22:10   ` Alistair John Strachan
2006-01-09  9:04     ` Vojtech Pavlik
2006-01-08 22:25   ` Alexey Dobriyan
2006-01-09 12:48   ` Kalin KOZHUHAROV
2006-01-09  8:28 ` Alexander E. Patrakov
2006-01-09 11:38   ` Krzysztof Halasa
2006-01-09 18:44     ` Xavier Bestel
2006-01-10  0:12       ` Krzysztof Halasa
