Re: Detecting default signedness of char in ext4 (despite -funsigned-char)

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Eric Biggers <ebiggers@kernel.org>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Andreas Dilger <adilger@dilger.ca>,
	Eric Whitney <enwlinux@gmail.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Masahiro Yamada <masahiroy@kernel.org>
Subject: Re: Detecting default signedness of char in ext4 (despite -funsigned-char)
Date: Wed, 18 Jan 2023 07:48:06 -0800
Message-ID: <CAHk-=wg7SkJZeAJ-KMKxsA7m9cs7MJoSDpu0aYKVm=bAwhcqjA@mail.gmail.com>
In-Reply-To: <Y8eAJIKikCTJrlcr@sol.localdomain>

On Tue, Jan 17, 2023 at 9:14 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> Well, reading the code more carefully, the on-disk ext4 superblock can contain
> EXT2_FLAGS_SIGNED_HASH, EXT2_FLAGS_UNSIGNED_HASH, or neither.  "Neither" is the
> legacy case.  The above existing code in ext4 is handling the "neither" case by
> setting the flag corresponding to the default signedness of char.  So yes, that
> migration code was always broken if you moved the disk from a platform with
> signed char (e.g. x86) to a platform with unsigned char (e.g. arm).  But,
> -funsigned-char breaks it whenever the disk stays on a platform with signed
> char.  That seems much worse.  Though, it's also a migration for legacy
> filesystems, so maybe that code isn't needed often anymore anyway...

The xattr hash is also broken if it stays on one single machine, but
is accessed two different ways.

Example: about half our architectures are mainly tested inside qemu,
so if you ever end up using the same disk image across two emulated
environments, you'll hit the exact same thing.

Basically, any filesystem that depends on host byte order, or on
host-specific data sizes - or on host signedness rules - is simply
completely and utterly broken (unless it's something like 'tmpfs' that
doesn't have any existence outside of that machine).
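
As an illustration of the byte order and data size point, here is a
minimal sketch of writing a field in a fixed-width, fixed-endian form
instead of copying host memory to disk. The helper names put_le32() and
get_le32() are made up for this example; the kernel has its own
cpu_to_le32()/le32_to_cpu() helpers for the same job.

```c
#include <stdint.h>

/* Store a 32-bit value as four little-endian bytes, whatever the host is. */
static void put_le32(uint8_t *buf, uint32_t v)
{
	buf[0] = (uint8_t)(v);
	buf[1] = (uint8_t)(v >> 8);
	buf[2] = (uint8_t)(v >> 16);
	buf[3] = (uint8_t)(v >> 24);
}

/* Read the value back; the result does not depend on host byte order. */
static uint32_t get_le32(const uint8_t *buf)
{
	return (uint32_t)buf[0] |
	       ((uint32_t)buf[1] << 8) |
	       ((uint32_t)buf[2] << 16) |
	       ((uint32_t)buf[3] << 24);
}
```

The byte layout comes from the shifts, not from how the host stores an
integer, so the same image reads back identically on any machine.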

So ext4 has been broken from day one when it comes to xattr hashing.

And nobody ever noticed, because very few people use xattrs to begin
with, and when they do it tends to be very limited. And they seldom
mix architectures.

But "nobody noticed" doesn't mean it wasn't broken. It was always
completely and unambiguously buggy.

> >  (a) just admit that ext4 was buggy, and say "char is now unsigned",
> > and know that generic/454 will fail when you switch from a buggy
> > kernel to a new one that no longer has this signedness bug.
>
> It seems kind of crazy to intentionally break xattrs with non-ASCII names upon a
> kernel upgrade...

I really don't think they happen very much, and if we can fix a bug
without doing anything about it, and nobody notices, that would be
fine by me.

But:

> I think that what your patch does is allow filesystems to contain both signed
> and unsigned xattr hashes, and write out new ones as unsigned.

Right. Nobody seems to actually care about the hash, as far as I can tell.

It's used for that corruption check. And it is used by
ext4_xattr_block_cache_find() to basically reuse a cached entry, but
it has no actual semantic meaning.
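
To make the "both signed and unsigned hashes" idea concrete, here is a
rough sketch. It is not the ext4 code: toy_name_hash(), toy_hash_ok() and
TOY_HASH_SHIFT are hypothetical names, and the rotate-and-xor loop is a
simplified stand-in for the real hash. It only shows how sign extension
of bytes >= 0x80 changes the result, and how a check could accept either
form while new hashes get written unsigned.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HASH_SHIFT 5	/* rotate amount, chosen for illustration */

/*
 * Rotate-and-xor hash over the name bytes.  With sign_extend set, bytes
 * >= 0x80 are widened as negative values, which is what plain 'char'
 * does on a signed-char platform.
 */
static uint32_t toy_name_hash(const void *name, size_t len, bool sign_extend)
{
	const uint8_t *p = name;
	uint32_t hash = 0;

	while (len--) {
		uint32_t c = *p++;

		if (sign_extend && c >= 0x80)
			c |= 0xffffff00u;	/* same bits as (int)(signed char)c */
		hash = (hash << TOY_HASH_SHIFT) ^
		       (hash >> (32 - TOY_HASH_SHIFT)) ^ c;
	}
	return hash;
}

/* Accept a stored hash computed either way; new ones would be unsigned. */
static bool toy_hash_ok(uint32_t stored, const void *name, size_t len)
{
	return stored == toy_name_hash(name, len, false) ||
	       stored == toy_name_hash(name, len, true);
}
```

For a pure ASCII name the two variants agree, so nothing changes there.
They only diverge once a name contains a byte with the high bit set, such
as the UTF-8 sequence "\xc3\xa9".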

>  That might work,
> though e2fsprogs would need to be fixed too, and old versions of e2fsck would
> corrupt xattrs unless a new ext4 filesystem feature flag was added.

The thing is, e2fsprogs NEEDS TO BE FIXED REGARDLESS!

You don't seem to realize that this is a fundamental filesystem bug.

It was not introduced by "-funsigned-char". It's been there for decades.

Re-introducing the "let's try to hide this bug" logic like your patch
does is disgusting and actively wrong.

This bug needs to be *fixed*.

And since we don't seem to have a "this filesystem uses stupid signed
hash arithmetic" flag (the EXT2_FLAGS_SIGNED_HASH only covers the
filename case), and since nobody actually cares, the best option seems
to be to just do what the code should have done originally, ie not
rely on 'char' being sign-extended.

A simpler patch would be to just remove the check of the e_hash
value entirely, ie just this:

-               if (e_hash != entry->e_hash)
-                       return -EFSCORRUPTED;

and just say that the hash was always broken and the test for a random
value is not worth it.

              Linus