linux-ext4.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	kernel@collabora.com, linux-ext4@vger.kernel.org,
	krisman@collabora.com
Subject: Re: [PATCH v4 00/23] Ext4 Encoding and Case-insensitive support
Date: Sun, 9 Dec 2018 12:54:38 -0800
Message-ID: <CAHk-=wh9CXVF6VZ8ZN5aRoRZyPb5ZME3LqNspPNd3LwQFHJT0Q@mail.gmail.com>
In-Reply-To: <20181209201043.GA1840@mit.edu>

On Sun, Dec 9, 2018 at 12:10 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> Gabriel added the Unicode tables for case folding to the fs/nls
> directory.  If you'd prefer that we put them somewhere else, we
> can; do you have a preference?

I have a really hard time judging, since I haven't seen the code, just
a random diffstat and shortlog.

First off, there is no such thing as "one" unicode table for case
folding. There are lots and lots of tables, and it's not clear to me
which table this is all about.

For example, both OS X and Windows do some form of case folding on
unicode. They don't do the *same* folding, though.

There are also various locale variations to case folding. This is
where I thought your nls choice came from, but then you tried to imply
that there are no locale issues and that directories can just have a
single flag to enable/disable the folding.

In some locales, "SS" and "ß" (perhaps "SZ" too) will compare the same
in case-insensitivity. Crazy in general, and afaik modern unicode even
has a real upper-case "ß" so it's arguably legacy, but...
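
As an illustration of the "SS" / "ß" point (not code from the patch series), here is a minimal Python sketch. It assumes Python's built-in str methods, which apply the locale-independent default Unicode mappings rather than any filesystem's or locale's rules; the helper names same_lowercased and same_casefolded are hypothetical.

```python
# Illustrative only: Python's str methods use the default Unicode case
# mappings. They are not the rules of NTFS, HFS+, or the ext4 patches.
sharp_s = "\u00df"          # LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def same_lowercased(a: str, b: str) -> bool:
    """Compare two names after lowercasing both."""
    return a.lower() == b.lower()


def same_casefolded(a: str, b: str) -> bool:
    """Compare two names after full case folding (length may change)."""
    return a.casefold() == b.casefold()


if __name__ == "__main__":
    upper_name = "STRASSE"
    lower_name = "stra" + sharp_s + "e"
    # The two comparison rules disagree about whether these names match.
    print(same_lowercased(upper_name, lower_name))
    print(same_casefolded(upper_name, lower_name))
    # Upper-casing the small sharp s yields two letters, while the capital
    # sharp s lower-cases back to a single code point.
    print(sharp_s.upper(), capital_sharp_s.lower() == sharp_s)
```

Which of the two answers is "right" depends on the table you pick, which is exactly why the choice of table needs to be stated.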

And that's all entirely independent of the issues with all the
combining characters, modifier letters, white-space, overlong utf8
questions, etc etc.
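
To make the combining-character issue concrete, here is a small Python sketch, assuming the standard unicodedata module. The key function is hypothetical; it follows the generic Unicode "canonical caseless match" recipe (NFD, case fold, NFD again) and is not claimed to be what the patches implement.

```python
import unicodedata

precomposed = "caf\u00e9"  # "e with acute" as a single code point
decomposed = "cafe\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def canonical_caseless_key(name: str) -> str:
    """Hypothetical lookup key: NFD, then case fold, then NFD again."""
    decomposed_name = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed_name.casefold())


if __name__ == "__main__":
    # The two spellings render the same but are different byte strings.
    print(precomposed.encode("utf-8") == decomposed.encode("utf-8"))
    # After normalization and folding they compare equal.
    print(canonical_caseless_key(precomposed) == canonical_caseless_key(decomposed))
```

A byte-wise comparison treats the two spellings as different names, so whether a filesystem normalizes, and to which form, is a separate decision from case folding.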

It's also easy to generate overlong utf-8 that decodes to '/', for
example. Some broken systems might consider that identical to a real
'/' and it matters for path lookup.
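
The overlong case can be sketched in a few lines of Python. The decoder below is deliberately naive and hypothetical (naive_decode_2byte is not a real library function); it stands in for a "broken system" that skips the shortest-form check. Python's own strict UTF-8 codec is used for contrast.

```python
def naive_decode_2byte(data: bytes) -> str:
    """Decode one 2-byte UTF-8 sequence WITHOUT rejecting overlong forms."""
    lead, cont = data  # unpacking bytes of length 2 yields two ints
    if lead & 0xE0 != 0xC0 or cont & 0xC0 != 0x80:
        raise ValueError("not a 2-byte UTF-8 sequence")
    # A correct decoder must also reject results below 0x80 (overlong).
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))


def strict_decoder_rejects(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 codec refuses the input."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


overlong_slash = b"\xc0\xaf"  # overlong 2-byte encoding of U+002F

if __name__ == "__main__":
    print(repr(naive_decode_2byte(overlong_slash)))
    print(strict_decoder_rejects(overlong_slash))
```

If the lookup code and the comparison code disagree on this (one strict, one naive), a name containing these two bytes can be seen as containing a path separator by one layer and not by the other.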

So what's the actual code? What rules did you happen to pick? Did you
take the Windows rules as-is (I _think_ they may be documented) since
the primary target apparently is just samba performance?

And even if the answer is "we follow NTFS rules", which *version* of
NTFS folding rules are you using if you're trying to speed up samba,
for example? Because afaik they have changed over time.

Is the *only* target samba? Are you never interested in local loads
like "oh, people want to run Wine and might need it", or in the
application testing side?

All of these matter.

For example, if it's some "ext4 special case just for samba", then
perhaps the logical place to put all this is just in fs/ext4/ and not
bother anybody else about it.

But if it might be useful as some generic "NTFS hashing" library, then
make it that.

                   Linus
