linux-fsdevel.vger.kernel.org archive mirror
From: Al Viro <viro@zeniv.linux.org.uk>
To: David Laight <David.Laight@aculab.com>
Cc: "'Pali Rohár'" <pali.rohar@gmail.com>,
	"OGAWA Hirofumi" <hirofumi@mail.parknet.co.jp>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	"Namjae Jeon" <linkinjeon@gmail.com>,
	"Gabriel Krisman Bertazi" <krisman@collabora.com>
Subject: Re: vfat: Broken case-insensitive support for UTF-8
Date: Mon, 20 Jan 2020 16:12:06 +0000
Message-ID: <20200120161206.GC8904@ZenIV.linux.org.uk>
In-Reply-To: <1a4c545dc7f14e33b7e59321a0aab868@AcuMS.aculab.com>

On Mon, Jan 20, 2020 at 03:47:22PM +0000, David Laight wrote:
> From: Pali Rohár
> > Sent: 20 January 2020 15:20
> ...
> > This is not possible. There is 1:1 mapping between UTF-8 sequence and
> > Unicode code point. wchar_t in kernel represent either one Unicode code
> > point (limited up to U+FFFF in NLS framework functions) or 2bytes in
> > UTF-16 sequence (only in utf8s_to_utf16s() and utf16s_to_utf8s()
> > functions).
> 
> Unfortunately there is neither a 1:1 mapping of all possible byte sequences
> to wchar_t (or unicode code points), nor a 1:1 mapping of all possible
> wchar_t values to UTF-8.
> Really both need to be defined - even for otherwise 'invalid' sequences.

Who.  Cares?

A filename is a sequence of octets, not codepoints.  Its interpretation is
entirely up to userland.

Same goes for the notion of "case" (locale-dependent, etc.); some
filesystems impose their (arbitrary) restrictions on the possible
octet sequences (and equally arbitrary equivalence relations between
them) that can be approximated in terms of upper/lower case in some
locale.  It does not matter how arbitrary those are, or what stands
behind them:
	* don't do that for any new filesystem designs
	* for existing filesystem types, the actual behaviour of the
	  native implementation IS THE ONE AND ONLY AUTHORITY.  It does not
	  matter what misguided thought process it has come from;
	  the absolute requirement is that if you mount a filesystem valid
	  from the native implementation POV, you must leave it in a state
	  that would be valid from the native implementation POV.  That's
	  it.

Any talk about normalization, etc. is completely pointless -
for any sane uses it's an opaque stream of octets that the filesystem
and VFS should leave the fuck alone.  Codepoints, encodings, etc.
come into the game only to the extent they are useful to describe
the weird rules a given filesystem might have.  And they are just
that - tools to describe externally imposed mappings.
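
To make the "opaque stream of octets" point concrete, here is a minimal
userland sketch in Python. It is an illustration only, not kernel code and
not any filesystem's actual lookup routine; the helper same_name() and the
example names are hypothetical. It shows two names that would typically
render identically yet are different octet sequences, and a name that is
not valid UTF-8 at all but is still a perfectly comparable byte string.

```python
import unicodedata

# Two spellings of a name that would typically be displayed as "é.txt".
# NFC uses the precomposed U+00E9; NFD uses "e" followed by U+0301.
nfc_name = unicodedata.normalize("NFC", "\u00e9.txt").encode("utf-8")
nfd_name = unicodedata.normalize("NFD", "\u00e9.txt").encode("utf-8")

# A name that is not valid UTF-8; it is still just a sequence of octets.
raw_name = b"\xff\xfe.txt"


def same_name(a: bytes, b: bytes) -> bool:
    """Hypothetical helper: compare names octet for octet.

    No decoding, no case folding, no normalization.
    """
    return a == b


print(nfc_name)                       # b'\xc3\xa9.txt'
print(nfd_name)                       # b'e\xcc\x81.txt'
print(same_name(nfc_name, nfd_name))  # False
print(same_name(raw_name, raw_name))  # True
```

Anything beyond the byte comparison (folding case, treating the two
spellings as equal) would be an externally imposed mapping of the kind
described above, and would have to match what the native implementation
of the filesystem in question does.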

