From: "Pali Rohár" <pali.rohar@gmail.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
	Namjae Jeon <linkinjeon@gmail.com>,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: vfat: Broken case-insensitive support for UTF-8
Date: Mon, 20 Jan 2020 00:33:48 +0100
Message-ID: <20200119233348.es5m63kapdvyesal@pali>
In-Reply-To: <20200119230809.GW8904@ZenIV.linux.org.uk>

On Sunday 19 January 2020 23:08:09 Al Viro wrote:
> On Sun, Jan 19, 2020 at 11:14:55PM +0100, Pali Rohár wrote:
> 
> > So when UTF-8 on VFS for VFAT is enabled, then for VFS <--> VFAT
> > conversion are used utf16s_to_utf8s() and utf8s_to_utf16s() functions.
> > But in fat_name_match(), vfat_hashi() and vfat_cmpi() functions is used
> > NLS table (default iso8859-1) with nls_strnicmp() and nls_tolower().
> > 
> > Which means that fat_name_match(), vfat_hashi() and vfat_cmpi() are
> > broken for vfat in UTF-8 mode.
> > 
> > I was thinking how to fix it, and the only possible way is to write a
> > uni_tolower() function which takes one Unicode code point and returns
> > lowercase of input's Unicode code point. We cannot do any Unicode
> > normalization as VFAT specification does not say anything about it and
> > MS reference fastfat.sys implementation does not do it neither.
> 
> Then how can that possibly be broken?  If it matches the native behaviour,
> that's it.

VFAT is case insensitive.

> > As you can see lowercase 'd' and uppercase 'D' are same, but lowercase
> > 'č' and uppercase 'Č' are not same. This is because 'č' is two bytes
> > 0xc4 0x8d sequence and comparing is done by Latin1 table. 0xc4 is in
> > Latin 'Ä' which is already in uppercase. 0x8d is control char so is not
> > changed by tolower/toupper function.
> 
> Again, who the hell cares?

All users who also use non-Linux FAT implementations.

> Does the behaviour match how Windows handles that thing?

Linux behavior does not match Windows behavior.

On Windows, FAT32 (fastfat.sys) is case insensitive, and the file names
"č" and "Č" are treated as the same file. Windows does not allow you to
create both files; it says that the file already exists.
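
To illustrate the mismatch, here is a minimal userspace sketch in C of a
byte-wise Latin-1 comparison applied to UTF-8 bytes. latin1_tolower() and
latin1_strnicmp() are hypothetical stand-ins written for this example; they
are not the kernel's nls_tolower() or nls_strnicmp().

```c
#include <stddef.h>

/* Hypothetical stand-in for an ISO 8859-1 lowercase lookup (not kernel code). */
static unsigned char latin1_tolower(unsigned char c)
{
	if (c >= 'A' && c <= 'Z')
		return (unsigned char)(c + 32);
	/* 0xC0..0xDE are uppercase letters in Latin-1, except 0xD7 (multiplication sign). */
	if (c >= 0xC0 && c <= 0xDE && c != 0xD7)
		return (unsigned char)(c + 32);
	return c;	/* everything else, including C1 controls 0x80..0x9F, is unchanged */
}

/* Byte-wise case-insensitive compare of len bytes; returns 0 when equal. */
static int latin1_strnicmp(const unsigned char *a, const unsigned char *b,
			   size_t len)
{
	for (size_t i = 0; i < len; i++) {
		unsigned char x = latin1_tolower(a[i]);
		unsigned char y = latin1_tolower(b[i]);

		if (x != y)
			return x < y ? -1 : 1;
	}
	return 0;
}

/* UTF-8 encodings of U+010D (č) and U+010C (Č). */
static const unsigned char utf8_c_caron_lower[] = { 0xC4, 0x8D };
static const unsigned char utf8_c_caron_upper[] = { 0xC4, 0x8C };
```

With these definitions, "d" and "D" compare as equal, but the two-byte
sequences for "č" and "Č" do not: both start with 0xC4, and the second
bytes 0x8D and 0x8C are control characters in Latin-1, so case mapping
leaves them different.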

> "Case" is not something well-defined; the only definition
> is "whatever weird crap does the native implementation choose to do".

You are right that case is not well-defined in general, but Unicode
also provides a language-independent and basically well-defined case
conversion.

And because VFAT is a Unicode fs (internally UTF-16), it makes sense
to use that well-defined Unicode folding.
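
A rough sketch in C of what a per-code-point comparison on UTF-16 names
could look like. sketch_uni_tolower() and utf16_names_equal_ci() are
hypothetical names for this example, and the mapping covers only ASCII,
Latin-1 and U+0100..U+012F; a real implementation would need a full
case table and is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, deliberately tiny lowercase mapping for one UTF-16 code unit.
 * Assumption: only ASCII, Latin-1 and U+0100..U+012F are handled; everything
 * else (including surrogates) is returned unchanged.
 */
static uint16_t sketch_uni_tolower(uint16_t u)
{
	if (u >= 'A' && u <= 'Z')
		return (uint16_t)(u + 32);
	if (u >= 0x00C0 && u <= 0x00DE && u != 0x00D7)
		return (uint16_t)(u + 32);
	/* In U+0100..U+012F, even code points are uppercase, the next odd one is lowercase. */
	if (u >= 0x0100 && u <= 0x012F && (u & 1) == 0)
		return (uint16_t)(u + 1);
	return u;
}

/* Returns 1 when both UTF-16 names are equal after lowercasing each unit, else 0. */
static int utf16_names_equal_ci(const uint16_t *a, size_t alen,
				const uint16_t *b, size_t blen)
{
	if (alen != blen)
		return 0;
	for (size_t i = 0; i < alen; i++) {
		if (sketch_uni_tolower(a[i]) != sketch_uni_tolower(b[i]))
			return 0;
	}
	return 1;
}
```

Because the comparison works on code points instead of UTF-8 bytes,
U+010C (Č) and U+010D (č) fold to the same value and the two names match.
No normalization is attempted.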

> That's the only reason to support that garbage at all...

What do you mean by garbage? Where? All the filenames I specified are
valid UTF-8 sequences and valid Unicode code points, and therefore have
a valid UTF-16 representation stored in the VFAT fs.

Sorry, but I did not understand your comment.

-- 
Pali Rohár
pali.rohar@gmail.com
