From: Gabriel Krisman Bertazi <krisman@collabora.com>
To: "Pali Rohár" <pali.rohar@gmail.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
"Theodore Y. Ts'o" <tytso@mit.edu>,
Namjae Jeon <linkinjeon@gmail.com>
Subject: Re: vfat: Broken case-insensitive support for UTF-8
Date: Tue, 21 Jan 2020 19:25:18 -0500
Message-ID: <85wo9knxqp.fsf@collabora.com>
In-Reply-To: <20200120214046.f6uq7rlih7diqahz@pali> ("Pali Rohár"'s message of "Mon, 20 Jan 2020 22:40:46 +0100")

Pali Rohár <pali.rohar@gmail.com> writes:
> On Monday 20 January 2020 21:07:12 OGAWA Hirofumi wrote:
>> Pali Rohár <pali.rohar@gmail.com> writes:
>>
>> >> To be perfect, the table would have to emulate what Windows uses. It can
>> >> be the Unicode standard, or something else.
>> >
>> > The Windows FAT32 implementation (fastfat.sys) is open source. So it should
>> > be possible to inspect the code and figure out how it works.
>> >
>> > I will try to look at it.
>>
>> I don't think the conversion library is in the fs driver, though;
>> checking the implementation itself would be good.
>
> Ok, I did some research. It took me longer than I thought, as a lot of
> stuff is undocumented and it is hard to find all the relevant information.
>
> So... fastfat.sys uses the ntos function RtlUpcaseUnicodeString(), which
> takes a UTF-16 string and returns an upper case UTF-16 string. There is no
> mapping table in the fastfat.sys driver itself.
>
> RtlUpcaseUnicodeString() is an ntos kernel function and, from my research,
> it seems that this function uses only the conversion table stored in
> the file l_intl.nls (from c:\windows\system32).
>
> The Wine project describes this file as "unicode casing tables" and seems
> to be able to parse this file format. It even distributes its own
> version of this file, which appears to be generated from the official
> Unicode UnicodeData.txt via the Perl script make_unicode (part of Wine).
>
> So the question is... how much does MS change the l_intl.nls file between
> released Windows versions?
>
> I will try to decode the format of the l_intl.nls file and
> compare its data across some Windows versions.
>
> Can we reuse the upper case mapping table from that file?

Regarding fs/unicode, we have some infrastructure to parse UCD files,
handle Unicode versioning, and store the data in a more compact
structure. See the mkutf8data script.

Right now, we only store the mapping of each code point to its NFD + full
casefold, but it would be possible to extend the parsing script to store
the un-normalized uppercase version in the data structure. So, if
l_intl.nls is generated from UnicodeData.txt, you might consider
extending fs/unicode to store it. We store the code points in a format
optimized for decoding UTF-8, but the infrastructure is halfway there
already.
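
To make the idea concrete, here is a rough sketch (plain Python, not
mkutf8data and not kernel code) of pulling the simple uppercase mapping
out of UnicodeData.txt. It assumes the standard semicolon-separated UCD
layout, where field 0 is the code point and field 12 is the simple
uppercase mapping. The names parse_simple_upcase(), upcase() and SAMPLE
are made up for illustration, and SAMPLE only holds a few lines in that
layout. Whether the resulting table matches l_intl.nls is exactly the
open question above.

```python
# Illustrative only: extract the simple (1:1, un-normalized) uppercase
# mapping from UnicodeData.txt-style lines. Hypothetical helper names.

SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00DF;LATIN SMALL LETTER SHARP S;Ll;0;L;;;;;N;;;;;
00E1;LATIN SMALL LETTER A WITH ACUTE;Ll;0;L;0061 0301;;;;N;LATIN SMALL LETTER A ACUTE;;00C1;;00C1
"""


def parse_simple_upcase(lines):
    """Return {code point: simple uppercase code point} for mapped entries."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        # Field 12 is empty when the character has no simple uppercase form.
        if len(fields) < 15 or not fields[12]:
            continue
        table[int(fields[0], 16)] = int(fields[12], 16)
    return table


def upcase(name, table):
    """Uppercase per code point; unmapped characters pass through unchanged."""
    return "".join(chr(table.get(ord(ch), ord(ch))) for ch in name)


if __name__ == "__main__":
    table = parse_simple_upcase(SAMPLE.splitlines())
    for cp, up in sorted(table.items()):
        print(f"U+{cp:04X} -> U+{up:04X}")
    print(upcase("aáAß", table))
```

Note that this works per code point, while RtlUpcaseUnicodeString()
operates on UTF-16 strings, so a real comparison would have to account
for that difference.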
--
Gabriel Krisman Bertazi