Re: Detecting default signedness of char in ext4 (despite -funsigned-char)

From: Eric Biggers <ebiggers@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Andreas Dilger <adilger@dilger.ca>,
	Eric Whitney <enwlinux@gmail.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Masahiro Yamada <masahiroy@kernel.org>
Subject: Re: Detecting default signedness of char in ext4 (despite -funsigned-char)
Date: Wed, 18 Jan 2023 11:14:02 -0800
Message-ID: <Y8hE+uwHkilxThDT@sol.localdomain>
In-Reply-To: <CAHk-=wg7SkJZeAJ-KMKxsA7m9cs7MJoSDpu0aYKVm=bAwhcqjA@mail.gmail.com>

On Wed, Jan 18, 2023 at 07:48:06AM -0800, Linus Torvalds wrote:
> On Tue, Jan 17, 2023 at 9:14 PM Eric Biggers <ebiggers@kernel.org> wrote:
> >
> > Well, reading the code more carefully, the on-disk ext4 superblock can contain
> > EXT2_FLAGS_SIGNED_HASH, EXT2_FLAGS_UNSIGNED_HASH, or neither.  "Neither" is the
> > legacy case.  The above existing code in ext4 is handling the "neither" case by
> > setting the flag corresponding to the default signedness of char.  So yes, that
> > migration code was always broken if you moved the disk from a platform with
> > signed char (e.g. x86) to a platform with unsigned char (e.g. arm).  But,
> > -funsigned-char breaks it whenever the disk stays on a platform with signed
> > char.  That seems much worse.  Though, it's also a migration for legacy
> > filesystems, so maybe that code isn't needed often anymore anyway...
> 
> The xattr hash is also broken if it stays on one single machine, but
> is accessed two different ways.
> 
> Example: about half our architectures are mainly tested inside qemu,
> so if you ever end up using the same disk image across two emulated
> environments, you'll hit the exact same thing.
> 
> Basically, any filesystem that depends on host byte order, or on
> host-specific data sizes - or on host signedness rules - is simply
> completely and utterly broken (unless it's something like 'tmpfs' that
> doesn't have any existence outside of that machine).
> 
> So ext4 has been broken from day one when it comes to xattr hashing.
> 
> And nobody ever noticed, because very few people use xattrs to begin
> with, and when they do it tends to be very limited. And they seldom
> mix architectures.
> 
> But "nobody noticed" doesn't mean it wasn't broken. It was always
> completely and unambiguously buggy.
> 
> > >  (a) just admit that ext4 was buggy, and say "char is now unsigned",
> > > and know that generic/454 will fail when you switch from a buggy
> > > kernel to a new one that no longer has this signedness bug.
> >
> > It seems kind of crazy to intentionally break xattrs with non-ASCII names upon a
> > kernel upgrade...
> 
> I really don't think they happen very much, and if we can fix a bug
> without doing anything about it, and nobody notices, that would be
> fine by me.
> 
> But:
> 
> > I think that what your patch does is allow filesystems to contain both signed
> > and unsigned xattr hashes, and write out new ones as unsigned.
> 
> Right. Nobody seems to actually care about the hash, as far as I can tell.
> 
> It's used for that corruption check. And it is used by
> ext4_xattr_block_cache_find() to basically reuse a cached entry, but
> it has no actual semantic meaning.
> 
> >  That might work,
> > though e2fsprogs would need to be fixed too, and old versions of e2fsck would
> > corrupt xattrs unless a new ext4 filesystem feature flag was added.
> 
> The thing is, e2fsprogs NEEDS TO BE FIXED REGARDLESS!
> 
> You don't seem to realize that this is a fundamental filesystem bug.
> 
> It was not introduced by "-funsigned-char". It's been there for decades.
> 
> Re-introducing the "let's try to hide this bug" logic like your patch
> does is disgusting and actively wrong.
> 
> This bug needs to be *fixed*.
> 
> And since we don't seem to have a "this filesystem uses stupid signed
> hash arithmetic" flag (the EXT2_FLAGS_SIGNED_HASH only covers the
> filename case), and since nobody actually cares, the best option seems
> to be to just do what the code should have done originally, ie not
> rely on 'char' being sign-extended.
> 
> A simpler patch would be to actually just remove the check of
> the e_hash value entirely, ie just this:
> 
> -               if (e_hash != entry->e_hash)
> -                       return -EFSCORRUPTED;
> 
> and just say that the hash was always broken and the test for a random
> value is not worth it.
> 

Of course I understand there's a fundamental filesystem bug.  The question is
what to *do* about it.  The patches I suggested were *only* intended as a
workaround to make the ext4 code work the way it did in v6.1, and to start a
discussion about how to detect the platform's default signedness of char.  It
does seem that's still going to be needed regardless, even though of course it
never should have been needed in the first place.  I'm glad that you're
interested in helping with a more fundamental fix as well.
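
To make the detection problem concrete, here is a minimal userspace sketch of
the usual ways to ask "is plain char signed?".  The helper names are
hypothetical.  Note that both checks report the signedness seen by the
translation unit being compiled, so in a build that uses -funsigned-char they
report "unsigned" even on x86.  That is why they cannot recover the platform's
default on their own.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Hypothetical helper: reports whether plain char is signed in this
 * translation unit.  With -funsigned-char this is false everywhere,
 * regardless of the platform ABI default.
 */
static bool char_is_signed(void)
{
	return CHAR_MIN < 0;
}

/*
 * Hypothetical helper: the same question asked via a cast.  (char)-1 is
 * -1 when char is signed and 255 (for 8-bit char) when it is unsigned.
 */
static bool char_is_signed_by_cast(void)
{
	return (char)-1 < 0;
}
```

Both helpers always agree with each other; neither can see past the compiler
flag to the ABI default.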

Now, the options for that are:

For the dirhash:

1a.) When mounting a filesystem that doesn't have the signedness of the dirhash
     explicitly stored, assume it's the platform's default signedness, and
     explicitly store that.  (Behavior in v6.1 and earlier.)

1b.) When mounting a filesystem that doesn't have the signedness of the dirhash
     explicitly stored, assume it's unsigned, and explicitly store that on-disk.
     (Behavior in v6.2-rc4.)
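
The difference between (1a) and (1b) can be sketched as a single choice made
when neither flag is present.  This is only an illustration, not the ext4
mount code: the helper name and its `legacy_default_signed` parameter are
hypothetical, and the flag values are assumed to match the ext4 headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the on-disk superblock flag values. */
#define EXT2_FLAGS_SIGNED_HASH		0x0001
#define EXT2_FLAGS_UNSIGNED_HASH	0x0002

/*
 * Hypothetical helper: returns the superblock flags to use (and persist)
 * after mount.  If either hash flag is already stored, nothing changes.
 * Otherwise the legacy case is resolved using legacy_default_signed:
 *   (1a) passes the platform's default char signedness,
 *   (1b) always passes false.
 */
static uint32_t resolve_dirhash_flags(uint32_t s_flags,
				      bool legacy_default_signed)
{
	if (s_flags & (EXT2_FLAGS_SIGNED_HASH | EXT2_FLAGS_UNSIGNED_HASH))
		return s_flags;

	return s_flags | (legacy_default_signed ? EXT2_FLAGS_SIGNED_HASH
						: EXT2_FLAGS_UNSIGNED_HASH);
}
```

Under (1b), a legacy filesystem whose directories were hashed on a signed-char
platform would be labeled unsigned, which is the case being argued about.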

For the xattr hash:

2a.) When mounting a filesystem that doesn't have the signedness of the xattr
     hash explicitly stored, assume it's the platform's default signedness, and
     explicitly store that.  (Like how the dirhash worked.)

2b.) Change the xattr hash to always be unsigned.  (Behavior in v6.2-rc4.)

2c.) Write new xattr hashes as unsigned, and allow the filesystem to contain
     both unsigned and signed xattr hashes, without any explicit indication.  If
     the hash fails to verify as unsigned, try verifying it as signed too.

(1a) and (2a) would be the least likely to break users.  [(2a) instead of (2c),
since (2c) would make old versions of e2fsck break filesystems on platforms with
signed char.]  And those solutions require being able to detect the platform's
default signedness, however much we hate it.
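
For reference, option (2c) could be sketched roughly as follows.  This is a
simplified userspace model, not the ext4 implementation: it covers only the
name part of the hash (the real entry hash also mixes in the value), the
rotate-and-xor loop is modeled on the ext4 xattr code as I understand it, and
all function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_HASH_SHIFT 5

/* Name hash with each byte treated as unsigned (0..255). */
static uint32_t name_hash_unsigned(const char *name, size_t len)
{
	uint32_t hash = 0;

	while (len--) {
		hash = (hash << NAME_HASH_SHIFT) ^
		       (hash >> (32 - NAME_HASH_SHIFT)) ^
		       (unsigned char)*name++;
	}
	return hash;
}

/* Same hash with each byte sign-extended, as on signed-char platforms. */
static uint32_t name_hash_signed(const char *name, size_t len)
{
	uint32_t hash = 0;

	while (len--) {
		hash = (hash << NAME_HASH_SHIFT) ^
		       (hash >> (32 - NAME_HASH_SHIFT)) ^
		       (uint32_t)(int32_t)(signed char)*name++;
	}
	return hash;
}

/* (2c): accept a stored hash if it verifies either way. */
static bool name_hash_matches(uint32_t stored, const char *name, size_t len)
{
	if (stored == name_hash_unsigned(name, len))
		return true;
	return stored == name_hash_signed(name, len);
}
```

The two variants agree for pure-ASCII names and differ as soon as a byte has
its high bit set.  For the single byte 0xe9, the unsigned hash is 0x000000e9
and the signed hash is 0xffffffe9.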

Now, we seem to have gotten the "let's break userspace, lol" version of Linus
today, not the "SHUT THE FUCK UP, WE DO NOT BREAK USERSPACE" version of Linus
(https://lore.kernel.org/r/CA+55aFy98A+LJK4+GWMcbzaa1zsPBRo76q+ioEjbx-uaMKH6Uw@mail.gmail.com).
So sure, if we're extremely confident that no one, or at least no one we care
about, is mounting very old filesystems that haven't been mounted in a long
time, or using non-ASCII xattr names, then we could break those cases.
These cases were indeed already broken if a filesystem moved between platforms
with different char signedness, so that could be a reason not to care, although
"moving between platforms" is *much* less common than "same platform".

- Eric