linux-kernel.vger.kernel.org archive mirror
From: "Pali Rohár" <pali.rohar@gmail.com>
To: Namjae Jeon <namjae.jeon@samsung.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	gregkh@linuxfoundation.org, valdis.kletnieks@vt.edu, hch@lst.de,
	sj1557.seo@samsung.com, linkinjeon@gmail.com, tytso@mit.edu,
	Gabriel Krisman Bertazi <krisman@collabora.com>
Subject: Re: [PATCH v9 10/13] exfat: add nls operations
Date: Sun, 5 Jan 2020 17:51:15 +0100
Message-ID: <20200105165115.37dyrcwtgf6zgc6r@pali>
In-Reply-To: <20200102082036.29643-11-namjae.jeon@samsung.com>


On Thursday 02 January 2020 16:20:33 Namjae Jeon wrote:
> This adds the implementation of nls operations for exfat.

Hello! Throughout the patch series, different naming conventions are
used for nls/Unicode related terms, e.g. uni16s, utf16s, nls, vfsname, ...

Could this be fixed, so that they are named unambiguously? The "uni16s"
name is misleading, as Unicode does not fit into a 16-bit type.

Based on what is in nls.h, I would propose the following names:

* unicode_t *utf32s always for strings in UTF-32/UCS-4 encoding (host
  endianness) (or "unicode_t *unis", as this is the fixed-width encoding
  for all Unicode code points)

* wchar_t *utf16s always for strings in UTF-16 encoding (host endianness)

* u8 *utf8s always for strings in UTF-8 encoding

* wchar_t *ucs2s always for strings in UCS-2 encoding (host endianness)

Plus, in case you need to work with UTF-16 or UCS-2 in little endian,
add appropriate naming suffixes.
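
To illustrate why these encodings deserve distinct names, here is a
minimal userspace sketch in Python (not kernel code). The helper names
utf16_code_units() and fits_in_ucs2() are made up for this example. It
shows one code point that fits in a single 16-bit unit and one that
needs a surrogate pair in UTF-16 and cannot be expressed in UCS-2 at all.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def fits_in_ucs2(text):
    """True if every code point is one non-surrogate 16-bit value (valid UCS-2)."""
    return all(ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF for ch in text)


bmp_name = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral_name = "\U0001F600"  # U+1F600, outside the Basic Multilingual Plane

for name in (bmp_name, astral_name):
    print("utf32s:", [hex(ord(ch)) for ch in name])            # one unit per code point
    print("utf16s:", [hex(u) for u in utf16_code_units(name)])  # one or two units
    print("utf8s: ", name.encode("utf-8").hex())                # one to four bytes
    print("fits in ucs2s:", fits_in_ucs2(name))
```

For U+1F600 the UTF-16 form is the surrogate pair 0xD83D, 0xDE00, so a
buffer of 16-bit units holding it is UTF-16, not "16-bit Unicode".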

And use e.g. "vfsname" (char * OR unsigned char * OR u8 *), like you
already have in some places, for strings in iocharset= encoding.


Looking at the whole code and the exfat specification, the usage is:

Kernel NLS functions convert between UCS-2 and the iocharset= encoding.
The exfat upcase table has definitions only for UCS-2 characters.
All exfat string structures are stored in UTF-16LE, except the upcase
table, which is in UCS-2LE.

The specification is a great mess here, especially when it talks about a
Unicode upcase table for case insensitivity, which is limited to code
points up to U+FFFF and does not say anything about Unicode
normalization and normal forms.
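
A rough userspace sketch in Python of what that U+FFFF limit means in
practice. The function upcase_unit() is only a hypothetical stand-in for
a 16-bit upcase table lookup: it borrows Python's own case mapping and
is not the table from the exfat specification or from this patch.

```python
def to_utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def upcase_unit(unit):
    """Hypothetical stand-in for a lookup in a table indexed by 16-bit values."""
    if 0xD800 <= unit <= 0xDFFF:
        return unit  # a surrogate half has no case mapping of its own
    upper = chr(unit).upper()
    if len(upper) == 1 and ord(upper) <= 0xFFFF:
        return ord(upper)
    return unit  # no single 16-bit uppercase form, keep as is


def upcase_name(text):
    """Upcase a name one UTF-16 code unit at a time."""
    return [upcase_unit(u) for u in to_utf16_units(text)]


small = "\U00010428"  # DESERET SMALL LETTER LONG I, outside the BMP
print([hex(u) for u in upcase_name("\u00e9")])               # BMP: mapped to U+00C9
print([hex(u) for u in upcase_name(small)])                  # surrogate pair unchanged
print([hex(u) for u in to_utf16_units(small.upper())])       # full Unicode uppercase
```

The last two lines differ: a table keyed by 16-bit values has no way to
express the case mapping of a code point above U+FFFF.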

=======================================================================

And this opens a new question: what should the kernel do if userspace
asks to create these 4 files? (Assume iocharset=utf8 for full Unicode
support.)

1. U+00e9
2. U+0065, U+0301
3. U+00c9
4. U+0045, U+0301

According to the Unicode uppercase algorithm, all 4 filenames result in
the same grapheme, "LATIN CAPITAL LETTER E WITH ACUTE".

But with the current exfat implementation, the first and third are
treated as the same, and the second and fourth are treated as the same.
Therefore the first and fourth are treated as different filenames,
despite the fact that they represent the same grapheme, only one in
upper case and one in lower case.

To prevent this, we need to use some kind of Unicode normalization
form here.
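
The grouping of the four names can be reproduced in userspace with a
short Python sketch. It assumes NFC as the normalization form purely
for illustration (NFD would group the names the same way), and both
key functions are hypothetical, not what the driver does.

```python
import unicodedata

names = [
    "\u00e9",    # 1. U+00e9
    "e\u0301",   # 2. U+0065, U+0301
    "\u00c9",    # 3. U+00c9
    "E\u0301",   # 4. U+0045, U+0301
]


def upcase_only_key(name):
    """Compare key built by per-code-point upcasing, with no normalization."""
    return "".join(ch.upper() for ch in name)


def normalized_key(name):
    """Compare key built by normalizing to NFC first, then upcasing."""
    return upcase_only_key(unicodedata.normalize("NFC", name))


print(len({upcase_only_key(n) for n in names}))  # distinct keys without normalization
print(len({normalized_key(n) for n in names}))   # distinct keys with normalization
```

Without normalization the names fall into two groups (1 with 3, and
2 with 4); with normalization all four map to the same key.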

What do you think the kernel's exfat driver should do in this case?

CCing Gabriel, as he was implementing some Unicode normalization for the
ext4 driver and may be able to shed some light on the new exfat driver too.

-- 
Pali Rohár
pali.rohar@gmail.com


