linux-ext4.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	kernel@collabora.com, linux-ext4@vger.kernel.org,
	krisman@collabora.com
Subject: Re: [PATCH v4 00/23] Ext4 Encoding and Case-insensitive support
Date: Sat, 8 Dec 2018 13:48:54 -0800
Message-ID: <CAHk-=wg2JvjXfdZ8K5Tv3vm6+bKRedotF5cr5AwVZVBypVfdAQ@mail.gmail.com>
In-Reply-To: <20181208194128.GE20708@thunk.org>

On Sat, Dec 8, 2018 at 12:22 PM Theodore Y. Ts'o <tytso@mit.edu> wrote:
>
> There's a patch series that's been baking for a while that will likely
> go upstream either in the next upcoming merge window, or the one after
> that.  Since it adds support for Unicode case-folding, it involves a
> non-trivial number of changes to fs/nls.  As near as I can tell, no
> one is really maintaining fs/nls.

Christ.

Why do people want to do this? We know it's a crazy and stupid thing
to do. And we know that, exactly because people have done it, and it
has always been a mistake.

It causes actual and very subtle security issues.

It breaks things subtly even when they supposedly "know" about case
folding because different things will do it differently (ie user space
vs kernel space not having the *exact* same rules due to using
different tables, for example).

It doesn't work with locales, because people often want different
locales at the same time.

And it slows things down enormously because you can't do hashing well,
and comparisons get hugely more expensive.
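
To illustrate the hashing point: any two names that compare equal must also hash equal, so the hash has to run over folded input rather than the raw bytes. The following is only a minimal sketch under stated assumptions: folding is ASCII-only (real Unicode folding needs tables and can change the byte length), and `fold_ascii()` and `name_hash_icase()` are hypothetical names, not kernel functions. The hash itself is plain 32-bit FNV-1a, picked for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: ASCII-only folding, for illustration. Not a Unicode fold. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: 32-bit FNV-1a over the folded bytes of a name.
 * Every byte goes through the fold before it is mixed in, which is the
 * extra per-byte work a case-sensitive hash does not pay. */
static uint32_t name_hash_icase(const char *name, size_t len)
{
    uint32_t h = 2166136261u;              /* FNV-1a offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= fold_ascii((unsigned char)name[i]);
        h *= 16777619u;                    /* FNV-1a prime */
    }
    return h;
}
```

With this, "README" and "readme" land in the same bucket, at the price of a fold on every byte of every name hashed.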

And to add insult to injury, people always implement it so *horribly*
badly that it's not even funny.

For example, the usual way that people do it is to case-fold two
strings, and then compare the end results. And that's *incredibly*
stupid and slow and generates extra temporary allocations etc.
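
As a rough sketch of the alternative, the comparison can fold each byte as it is read, so no folded copy of either name is ever allocated. This assumes ASCII-only folding (which preserves length, so the early length check is valid); a real Unicode fold would need tables and could not rely on equal lengths. `fold_ascii()` and `name_eq_icase()` are hypothetical names used only for this illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* ASSUMPTION: ASCII-only folding, for illustration. Not a Unicode fold. */
static unsigned char fold_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char)(c + ('a' - 'A')) : c;
}

/* Hypothetical helper: compare two length-delimited names, folding on
 * the fly. No temporary buffer is built for either side. */
static bool name_eq_icase(const char *a, size_t alen,
                          const char *b, size_t blen)
{
    if (alen != blen)       /* only valid because ASCII folding keeps length */
        return false;

    for (size_t i = 0; i < alen; i++) {
        if (a[i] == b[i])   /* fast path: identical bytes need no folding */
            continue;
        if (fold_ascii((unsigned char)a[i]) != fold_ascii((unsigned char)b[i]))
            return false;
    }
    return true;
}
```

The fast path matters: most lookups hit names that already match byte for byte, and those never touch the fold at all.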

Or people do it character-by-character instead, and don't understand
utf-8 (which is literally designed to be easy to see character
boundaries *without* having to do a full decode!), and do *that*
incredibly badly instead.
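
The boundary property referred to here is that every UTF-8 continuation byte has the bit pattern 10xxxxxx, so any other byte starts a character. A minimal sketch of using that without decoding anything, assuming the input is already valid UTF-8 (the function names are made up for this example):

```c
#include <stdbool.h>
#include <stddef.h>

/* A byte starts a character unless it is a continuation byte (10xxxxxx). */
static bool utf8_is_char_start(unsigned char byte)
{
    return (byte & 0xC0) != 0x80;
}

/* Count characters (code points) by counting start bytes.
 * ASSUMPTION: s[0..len) is valid UTF-8; nothing is validated here. */
static size_t utf8_count_chars(const char *s, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (utf8_is_char_start((unsigned char)s[i]))
            n++;
    return n;
}

/* From an arbitrary byte offset, step back to the first byte of the
 * character that contains it. */
static size_t utf8_char_start_before(const char *s, size_t pos)
{
    while (pos > 0 && !utf8_is_char_start((unsigned char)s[pos]))
        pos--;
    return pos;
}
```

For the 6-byte string "na\xc3\xafve" ("naïve"), the count is 5, and offset 3 (the second byte of the "ï") steps back to offset 2.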

And when you create a file with an ambiguous name, what does readdir
report? Does it report the name you used, some normalized thing, or
what?

Finally, people then invariably do it in ways that preclude any
concurrent sane uses.

For example, they make it a single mount-time flag for the whole
filesystem, so now if you are (for example) wanting to do emulation of
bad system decisions, you now force the *host* to buy into the whole
mistake too.

And they make it a whole-filesystem flag, instead of (for example)
allowing just the emulated environment to do case-insensitive
filesystem operations on an operation-by-operation basis, and possibly
only within a particular subdirectory structure (or bind mount).

So the first thing I want to know is who really needs it, *why* they
need it, and what the design is for.

Because I can almost guarantee that the design is horrible, and the
reasons are really really bad.

And what *are* the case insensitivity rules, and how do you co-exist
when there are two *different* folding rules at the same time? For
example, OS X has some truly horrendously bad rules, that take the
badness that Windows did to a whole different level. What if you're a
file server (or emulation environment) and you want to expose the same
filesystem to both of those environments?

Because it would quite possibly be a whole lot better to allow
per-operation flags, so that you can do

    fd = openat(dir, path, O_RDONLY | O_ICASE);

so that you can allow *one* process to treat a filesystem as if it was
case insensitive (think "Wine with a ~/.wine/C directory"), without
forcing the whole filesystem to be icase.

Yes, allowing concurrent use then generates whole new "interesting"
questions, like "what happens if a case _sensitive_ user creates two
files with names that are identical to a case-insensitive user", but they
aren't necessarily any worse than the issues you face *not* allowing
that.

> Given your recent comments about not wanting to see pull requests for
> things outside of fs/xfs as part of the xfs pull, do you have any
> opinions about how to manage this feature going upstream?  My
> original plan was to send them through the ext4 tree, since I very
> much doubt Al cares much about nls issues, and they will only impact
> ext4.

I really want to know what is driving this insanity, and what the
actual use-case is.

You have a diffstat, but not a git tree to look at what the heck is going on.

Seriously, case insensitivity is *such* a horrendously bad idea that
people need to think about it deeply, and nobody seems to ever do
that.

And yes, we have d_hash() and some rudimentary support for it in the
VFS layer, but that VFS layer bit was always meant purely for
interoperability filesystems that nobody really cared about as a real
filesystem for Linux. Notably FAT and its ilk.

If we have a major native filesystem doing it, I think we need to
actively think about the big picture and do it *right*. None of the
crazy "ok, you can't even look things up in the dcache directly at
all" stuff that we have as a hack to just allow _bad_ filesystems to
do their thing.

So I think this is a bigger deal than that diffstat of yours implies.
I don't think people understand just how *bad* case insensitivity is.

The old DOS/Mac people thought case insensitivity was a "helpful"
idea, and that was understandable - but wrong - even back in the 80's.
They are still living with the end result of that horrendously bad
decision decades later. They've _tried_ to fix their bad decisions,
and have never been able to (except, apparently, in iOS where somebody
finally had a glimmer of a clue).

                 Linus

Thread overview: 39+ messages
2018-12-06 23:08 [PATCH v4 00/23] Ext4 Encoding and Case-insensitive support Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 01/23] nls: Wrap uni2char/char2uni callers Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 02/23] nls: Wrap charset field access Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 03/23] nls: Wrap charset hooks in ops structure Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 04/23] nls: Split default charset from NLS core Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 05/23] nls: Split struct nls_charset from struct nls_table Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 06/23] nls: Add support for multiple versions of an encoding Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 07/23] nls: Implement NLS_STRICT_MODE flag Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 08/23] nls: Let charsets define the behavior of tolower/toupper Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 09/23] nls: Add new interface for string comparisons Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 10/23] nls: Add optional normalization and casefold hooks Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 11/23] nls: ascii: Support validation and normalization operations Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 12/23] nls: utf8: Add unicode character database files Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 13/23] scripts: add trie generator for UTF-8 Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 14/23] nls: utf8: Move nls-utf8{,-core}.c Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 15/23] nls: utf8: Introduce code for UTF-8 normalization Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 16/23] nls: utf8n: reduce the size of utf8data[] Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 17/23] nls: utf8: Integrate utf8 normalization code with utf8 charset Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 18/23] nls: utf8: Introduce test module for normalized utf8 implementation Gabriel Krisman Bertazi
2018-12-06 23:08 ` [PATCH v4 19/23] ext4: Reserve superblock fields for encoding information Gabriel Krisman Bertazi
2018-12-06 23:09 ` [PATCH v4 20/23] ext4: Include encoding information in the superblock Gabriel Krisman Bertazi
2018-12-06 23:09 ` [PATCH v4 21/23] ext4: Support encoding-aware file name lookups Gabriel Krisman Bertazi
2018-12-06 23:09 ` [PATCH v4 22/23] ext4: Implement EXT4_CASEFOLD_FL flag Gabriel Krisman Bertazi
2018-12-06 23:09 ` [PATCH v4 23/23] docs: ext4.rst: Document encoding and case-insensitive Gabriel Krisman Bertazi
2018-12-07 18:41 ` [PATCH v4 00/23] Ext4 Encoding and Case-insensitive support Randy Dunlap
     [not found] ` <20181208194128.GE20708@thunk.org>
2018-12-08 21:48   ` Linus Torvalds [this message]
2018-12-08 21:58     ` Linus Torvalds
2018-12-08 22:59       ` Linus Torvalds
2018-12-09  0:46         ` Andreas Dilger
     [not found]       ` <20181209050326.GA28659@mit.edu>
2018-12-09 17:41         ` Linus Torvalds
2018-12-09 20:10           ` Theodore Y. Ts'o
2018-12-09 20:54             ` Linus Torvalds
2018-12-10  0:08               ` Theodore Y. Ts'o
2018-12-10 19:35                 ` Linus Torvalds
2018-12-09 20:53           ` Gabriel Krisman Bertazi
2018-12-09 21:05             ` Linus Torvalds