linux-fsdevel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: "Theodore Ts'o" <tytso@mit.edu>
Cc: Christian Brauner <brauner@kernel.org>,
	Gabriel Krisman Bertazi <krisman@suse.de>,
	viro@zeniv.linux.org.uk,  linux-f2fs-devel@lists.sourceforge.net,
	ebiggers@kernel.org,  linux-fsdevel@vger.kernel.org,
	jaegeuk@kernel.org, linux-ext4@vger.kernel.org
Subject: Re: [f2fs-dev] [PATCH v6 0/9] Support negative dentries on case-insensitive ext4 and f2fs
Date: Mon, 20 Nov 2023 18:29:05 -0800
Message-ID: <CAHk-=whb80quGmmgVcsq51cXw9dQ9EfNMi9otL9eh34jVZaD2g@mail.gmail.com>
In-Reply-To: <20231121020254.GB291888@mit.edu>

On Mon, 20 Nov 2023 at 18:03, Theodore Ts'o <tytso@mit.edu> wrote:
>
> On Mon, Nov 20, 2023 at 10:07:51AM -0800, Linus Torvalds wrote:
> >     I'm looking at things like
> > generic_ci_d_compare(), and it hurts to see the mindless "let's do
> > lookups and compares one utf8 character at a time". What a disgrace.
> > Somebody either *really* didn't care, or was a Unicode person who
> > didn't understand the point of UTF-8.
>
> This isn't because of case-folding brain damage, but rather Unicode
> brain damage.

No, it really is just stupidity and horribleness.

The thing is, when you check two strings for equality, the FIRST THING
you should do is to just compare them for exactly that: equality.

And no, the way you do that is not by checking each unicode character
one by one.

You do it by just doing a regular memcmp. In fact, you can do even
better than that: while at it, check whether
 (a) all bytes are equal in everything but bit #5
 (b) none of the bytes have the high bit set
and you have now narrowed down things in a big way. You can do these
things trivially one whole word at a time, and you'll handle 99% of
all input without EVER doing any Unicode garbage AT ALL.
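
A minimal sketch of what the word-at-a-time check for (a) and (b) could
look like, for two equal-length buffers. This is not the kernel's code:
the name ascii_case_candidate() and both masks are hypothetical, chosen
for illustration, and the words are loaded with memcpy() to sidestep
alignment issues in portable C.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical masks: the high bit and bit #5 of every byte in a word. */
#define HIGH_BITS 0x8080808080808080ULL
#define CASE_BITS 0x2020202020202020ULL

/*
 * Returns nonzero when both buffers are pure ASCII (no high bit set)
 * and differ in nothing but bit #5. A zero return means either a real
 * difference or a non-ASCII byte; the caller has to tell those apart.
 */
static int ascii_case_candidate(const void *a, const void *b, size_t len)
{
	const unsigned char *pa = a, *pb = b;
	uint64_t diff = 0, high = 0;

	/* Eight bytes per iteration: accumulate differences and high bits. */
	while (len >= 8) {
		uint64_t wa, wb;

		memcpy(&wa, pa, 8);
		memcpy(&wb, pb, 8);
		diff |= wa ^ wb;
		high |= wa | wb;
		pa += 8;
		pb += 8;
		len -= 8;
	}

	/* Leftover tail, one byte at a time. */
	while (len--) {
		diff |= (uint64_t)(*pa ^ *pb);
		high |= (uint64_t)(*pa | *pb);
		pa++;
		pb++;
	}

	return !(diff & ~CASE_BITS) && !(high & HIGH_BITS);
}
```

The masks are identical in every byte position, so the result does not
depend on endianness. A nonzero return only narrows things down: it
says the names are candidates for a case-insensitive match, not that
they match.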

Yes, yes, if you actually have complex characters, you end up having
to deal with that mess. But no, that is *not* an excuse for saying
"all characters are complex".

So no. There is absolutely zero excuse for doing stupid things, except
for "nobody has ever cared, because case folding is so stupid to begin
with that people just expect it to perform horribly badly".

End result:

 - generic_ci_d_compare() should *not* consider the memcmp() to be a
"fall back to this for non-casefolded". You should start with that,
and if the bytes are equal then the strings are equal. End of story.

 - if the bytes are not equal, then the strings *might* still compare
equal if it's a casefolded directory.

 - but EVEN THEN you shouldn't fall back to actually doing UTF-8
decoding unless you saw the high bit being set at some point.

 - and if they differ in anything but bit #5 and you didn't see the
high bit, you know they are different.

It's a bit complicated, yes. But no, doing things one unicode
character at a time is just bad bad bad.
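
The decision order in the list above can be sketched roughly as
follows. This is an illustration, not a patch to
generic_ci_d_compare(): the enum, the function name ci_fast_compare()
and the tri-state return are hypothetical, and the scan is done a byte
at a time for readability. It also assumes that a name with no high
bit set folds only A-Z to a-z and needs no normalization. One extra
step is added on top of the list: bytes such as '@' and '`' differ
only in bit #5 without being case variants, so differing bytes are
confirmed to be letters before reporting a match.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tri-state result for the fast path. */
enum ci_result { CI_EQUAL, CI_DIFFERENT, CI_NEED_UTF8 };

static enum ci_result ci_fast_compare(const unsigned char *a, size_t alen,
				      const unsigned char *b, size_t blen)
{
	unsigned char diff = 0, high = 0;
	size_t i;

	/* Start with plain byte equality: equal bytes, equal names. */
	if (alen == blen && memcmp(a, b, alen) == 0)
		return CI_EQUAL;

	/* Only a high bit anywhere justifies the UTF-8 slow path. */
	for (i = 0; i < alen; i++)
		high |= a[i];
	for (i = 0; i < blen; i++)
		high |= b[i];
	if (high & 0x80)
		return CI_NEED_UTF8;

	/* Pure ASCII from here on (assumption: folding keeps the length). */
	if (alen != blen)
		return CI_DIFFERENT;

	/* Any difference outside bit #5 means the names differ. */
	for (i = 0; i < alen; i++)
		diff |= a[i] ^ b[i];
	if (diff & ~0x20)
		return CI_DIFFERENT;

	/* Differing bytes must be letters, not pairs like '@' and '`'. */
	for (i = 0; i < alen; i++) {
		unsigned char c = a[i] | 0x20;

		if (a[i] != b[i] && (c < 'a' || c > 'z'))
			return CI_DIFFERENT;
	}
	return CI_EQUAL;
}
```

A caller on a casefolded directory would run the full UTF-8 comparison
only on CI_NEED_UTF8; on a directory without casefolding, the initial
memcmp() result alone decides.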

               Linus
