linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Eric Biggers <ebiggers@kernel.org>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Andreas Dilger <adilger@dilger.ca>,
	Eric Whitney <enwlinux@gmail.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Masahiro Yamada <masahiroy@kernel.org>
Subject: Re: Detecting default signedness of char in ext4 (despite -funsigned-char)
Date: Wed, 18 Jan 2023 07:48:06 -0800	[thread overview]
Message-ID: <CAHk-=wg7SkJZeAJ-KMKxsA7m9cs7MJoSDpu0aYKVm=bAwhcqjA@mail.gmail.com> (raw)
In-Reply-To: <Y8eAJIKikCTJrlcr@sol.localdomain>

On Tue, Jan 17, 2023 at 9:14 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Well, reading the code more carefully, the on-disk ext4 superblock can contain
> EXT2_FLAGS_SIGNED_HASH, EXT2_FLAGS_UNSIGNED_HASH, or neither.  "Neither" is the
> legacy case.  The above existing code in ext4 is handling the "neither" case by
> setting the flag corresponding to the default signedness of char.  So yes, that
> migration code was always broken if you moved the disk from a platform with
> signed char (e.g. x86) to a platform with unsigned char (e.g. arm).  But,
> -funsigned-char breaks it whenever the disk stays on a platform with signed
> char.  That seems much worse.  Though, it's also a migration for legacy
> filesystems, so maybe that code isn't needed often anymore anyway...

The xattr hash is also broken if it stays on one single machine, but
is accessed two different ways.

Example: about half our architectures are mainly tested inside qemu,
so if you ever end up using the same disk image across two emulated
environments, you'll hit the exact same thing.

Basically, any filesystem that depends on host byte order, or on
host-specific data sizes - or on host signedness rules - is simply
completely and utterly broken (unless it's something like 'tmpfs' that
doesn't have any existence outside of that machine).

So ext4 has been broken from day one when it comes to xattr hashing.

And nobody ever noticed, because very few people use xattrs to begin
with, and when they do it tends to be very limited. And they seldom
mix architectures.

But "nobody noticed" doesn't mean it wasn't broken. It was always
completely and unambiguously buggy.

> >  (a) just admit that ext4 was buggy, and say "char is now unsigned",
> > and know that generic/454 will fail when you switch from a buggy
> > kernel to a new one that no longer has this signedness bug.
>
> It seems kind of crazy to intentionally break xattrs with non-ASCII names upon a
> kernel upgrade...

I really don't think they happen very much, and if we can fix a bug
without doing anything about it, and nobody notices, that would be
fine by me.

But:

> I think that what your patch does is allow filesystems to contain both signed
> and unsigned xattr hashes, and write out new ones as unsigned.

Right. Nobody seems to actually care about the hash, as far as I can tell.

It's used for that corruption check. And it is used by
ext4_xattr_block_cache_find() to basically reuse a cached entry, but
it has no actual semantic meaning.

>  That might work,
> though e2fsprogs would need to be fixed too, and old versions of e2fsck would
> corrupt xattrs unless a new ext4 filesystem feature flag was added.

The thing is, ef2progs NEEDS TO BE FIXED REGARDLESS!

You don't seem to realize that this is a fundamental filesystem bug.

It was not introduced by "-funsigned-char". It's been there for decades.

Re-introducing the "let's try to hide this bug" logic like your patch
does is disgusting and actively wrong.

This bug needs to be *fixed*.

And since we don't seem to have a "this filesystem uses stupid signed
hash arithmetic" flag (the EXT2_FLAGS_SIGNED_HASH only covers the
filename case), and since nobody actually cares, the best option seems
to be to just do what the code should have done originally, ie not
rely on 'char' being sign-extended.

A simpler patch would be to actually just entirely remove the check of
the e_hash value entiely, ie just this:

-               if (e_hash != entry->e_hash)
-                       return -EFSCORRUPTED;

and just say that the hash was always broken and the test for a random
value is not worth it.

              Linus

  reply	other threads:[~2023-01-18 15:50 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17 18:31 generic/454 regression in 6.2-rc1 Eric Whitney
2023-01-18  0:10 ` Andreas Dilger
2023-01-18  3:55   ` Detecting default signedness of char in ext4 (despite -funsigned-char) Eric Biggers
2023-01-18  4:21     ` Eric Biggers
2023-01-18  4:27     ` Linus Torvalds
2023-01-18  5:14       ` Eric Biggers
2023-01-18 15:48         ` Linus Torvalds [this message]
2023-01-18 19:14           ` Eric Biggers
2023-01-18 19:39             ` Linus Torvalds
2023-01-18 20:18         ` Theodore Ts'o
     [not found]           ` <CAHk-=wiGdxWtHRZftcqyPf8WbenyjniesKyZ=o73UyxfK9BL-A@mail.gmail.com>
2023-01-18 21:49             ` Theodore Ts'o
2023-01-18 22:20               ` Andreas Dilger
2023-01-19  7:19                 ` Eric Biggers
2023-01-19 18:24                 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wg7SkJZeAJ-KMKxsA7m9cs7MJoSDpu0aYKVm=bAwhcqjA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Jason@zx2c4.com \
    --cc=adilger@dilger.ca \
    --cc=ebiggers@kernel.org \
    --cc=enwlinux@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=masahiroy@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).