linux-kernel.vger.kernel.org archive mirror
From: "Pali Rohár" <pali.rohar@gmail.com>
To: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	"Theodore Y. Ts'o" <tytso@mit.edu>,
	Namjae Jeon <linkinjeon@gmail.com>,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: vfat: Broken case-insensitive support for UTF-8
Date: Mon, 20 Jan 2020 12:04:38 +0100
Message-ID: <20200120110438.ak7jpyy66clx5v6x@pali>
In-Reply-To: <87sgkan57p.fsf@mail.parknet.co.jp>

On Monday 20 January 2020 13:04:42 OGAWA Hirofumi wrote:
> Pali Rohár <pali.rohar@gmail.com> writes:
> 
> > Which means that fat_name_match(), vfat_hashi() and vfat_cmpi() are
> > broken for vfat in UTF-8 mode.
> 
> Right. It is a known issue.

Could this issue be better documented? E.g. in the mount(8) manpage,
where the mount options for vfat are described? I think people should
be aware of this issue when they use the "utf8=1" mount option.

> > I was thinking how to fix it, and the only possible way is to write a
> > uni_tolower() function which takes one Unicode code point and returns
> > lowercase of input's Unicode code point. We cannot do any Unicode
> > normalization as VFAT specification does not say anything about it and
> > MS reference fastfat.sys implementation does not do it neither.
> >
> > So, what would be the best option for implementing that function?
> >
> >   unicode_t uni_tolower(unicode_t u);
> >
> > Could a new fs/unicode code help with it? Or it is too tied with NFD
> > normalization and therefore cannot be easily used or extended?
> 
> To be perfect, the table would have to emulate what Windows use. It can
> be unicode standard, or something other.

The Windows FAT32 implementation (fastfat.sys) is open source, so it
should be possible to inspect the code and figure out how it works.

I will try to look at it.

> And other fs can use different what Windows use.
> 
> So the table would have to be switchable in perfect world (if there is
> no consensus to use 1 table).  If we use switchable table, I think it
> would be better to put in userspace, and loadable like firmware data.
> 
> Well, so then it would not be simple work (especially, to be perfect).

A switchable table is not really simple, and I think that as a first
step it would be enough to have one (hardcoded) table for UTF-8, like
we have for all other encodings.
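
To illustrate what a single hardcoded table could look like, here is a
minimal userspace sketch of uni_tolower() with the signature proposed
above. The struct name case_range and the handful of table entries are
assumptions made up for illustration; they are not taken from
fastfat.sys, from the exFAT default table or from any kernel code, and
the table that actually matches Windows behaviour would still have to
be determined.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t unicode_t;

/* Hypothetical layout: one entry covers a contiguous run of upper-case
 * code points that all map to lower case by adding the same offset. */
struct case_range {
	unicode_t first;	/* first upper-case code point of the run */
	unicode_t last;		/* last upper-case code point of the run */
	int32_t delta;		/* added to get the lower-case code point */
};

/* Illustrative excerpt only, sorted by 'first'. A real table would be
 * much larger and would have to follow whatever Windows does. */
static const struct case_range lower_ranges[] = {
	{ 0x0041, 0x005A, 32 },	/* LATIN CAPITAL A..Z */
	{ 0x00C0, 0x00D6, 32 },	/* LATIN-1 capitals before U+00D7 */
	{ 0x00D8, 0x00DE, 32 },	/* LATIN-1 capitals after U+00D7 */
	{ 0x0391, 0x03A1, 32 },	/* GREEK CAPITAL ALPHA..RHO */
	{ 0x03A3, 0x03A9, 32 },	/* GREEK CAPITAL SIGMA..OMEGA */
	{ 0x0410, 0x042F, 32 },	/* CYRILLIC CAPITAL A..YA */
};

/* Map one code point to lower case; no normalization, one-to-one only. */
unicode_t uni_tolower(unicode_t u)
{
	size_t lo = 0;
	size_t hi = sizeof(lower_ranges) / sizeof(lower_ranges[0]);

	/* Binary search over the sorted, non-overlapping ranges. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct case_range *r = &lower_ranges[mid];

		if (u < r->first)
			hi = mid;
		else if (u > r->last)
			lo = mid + 1;
		else
			return (unicode_t)((int32_t)u + r->delta);
	}
	return u;	/* no mapping known: return the input unchanged */
}
```

With this excerpt, uni_tolower(0x00C9) gives 0x00E9, while U+00D7
(multiplication sign) and anything outside the listed ranges is
returned unchanged.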

> Also, not directly same issue though. There is related issue for
> case-insensitive. Even if we use some sort of internal wide char
> (e.g. in nls, 16bits), dcache is holding name in user's encode
> (e.g. utf8). So inefficient to convert cached name to wide char for each
> access.

Yes, this is true. But the exFAT implementation already does this
conversion. I think we have no other choice if we want a
Windows-compatible implementation.
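
As a rough idea of the cost, the conversion does not need a temporary
wide char buffer: both names can be decoded and folded one code point
at a time. Below is a minimal userspace sketch under that assumption.
The names utf8_decode(), sketch_tolower() and utf8_name_cmpi() are
hypothetical, and sketch_tolower() is only a placeholder covering ASCII
and Latin-1, not the table Windows uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder fold (assumption): ASCII and Latin-1 capitals only. */
static uint32_t sketch_tolower(uint32_t u)
{
	if (u >= 0x41 && u <= 0x5A)
		return u + 32;
	if (u >= 0xC0 && u <= 0xDE && u != 0xD7)
		return u + 32;
	return u;
}

/* Decode one code point from s[0..len). Returns the number of bytes
 * consumed, or 0 for truncated, overlong or otherwise invalid input. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
	size_t n, i;
	uint32_t cp, min;

	if (len == 0)
		return 0;
	if (s[0] < 0x80) {
		*out = s[0];
		return 1;
	}
	if ((s[0] & 0xE0) == 0xC0) {
		n = 2; cp = s[0] & 0x1F; min = 0x80;
	} else if ((s[0] & 0xF0) == 0xE0) {
		n = 3; cp = s[0] & 0x0F; min = 0x800;
	} else if ((s[0] & 0xF8) == 0xF0) {
		n = 4; cp = s[0] & 0x07; min = 0x10000;
	} else {
		return 0;
	}
	if (len < n)
		return 0;
	for (i = 1; i < n; i++) {
		if ((s[i] & 0xC0) != 0x80)
			return 0;
		cp = (cp << 6) | (s[i] & 0x3F);
	}
	/* Reject overlong forms, surrogates and values above U+10FFFF. */
	if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
		return 0;
	*out = cp;
	return n;
}

/* Returns 0 if both names are valid UTF-8 and equal after per code
 * point folding, nonzero otherwise. Invalid input never matches. */
int utf8_name_cmpi(const char *a, size_t alen, const char *b, size_t blen)
{
	const unsigned char *pa = (const unsigned char *)a;
	const unsigned char *pb = (const unsigned char *)b;

	while (alen > 0 && blen > 0) {
		uint32_t ca, cb;
		size_t na = utf8_decode(pa, alen, &ca);
		size_t nb = utf8_decode(pb, blen, &cb);

		if (na == 0 || nb == 0)
			return 1;
		if (sketch_tolower(ca) != sketch_tolower(cb))
			return 1;
		pa += na; alen -= na;
		pb += nb; blen -= nb;
	}
	return alen != 0 || blen != 0;
}
```

This still decodes both names on every comparison, so it only avoids
the allocation, not the per-access work mentioned above.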

> Relatively recent EXT4 case-insensitive may tackled this though, I'm not
> checking it yet.
> 
> > New exfat code which is under review and hopefully would be merged,
> > contains own unicode upcase table (as defined by exfat specification) so
> > as exfat is similar to FAT32, maybe reusing it would be a better option?
> 
> exfat just put a case conversion table in fs. So I don't think it helps
> fatfs.

exfat has a fallback conversion table (hardcoded in the driver) which
is used when the fs itself does not have a conversion table. This is
mandated by the exFAT specification, and that default conversion table
is part of the specification.

I was thinking... as both VFAT and exFAT are MS standards and exFAT is
just an evolved FAT32, we could use that exFAT default conversion table
(which is present in that exfat driver).

-- 
Pali Rohár
pali.rohar@gmail.com
