All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Eric Biggers <ebiggers@kernel.org>
Cc: Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Andreas Dilger <adilger@dilger.ca>,
	Eric Whitney <enwlinux@gmail.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Masahiro Yamada <masahiroy@kernel.org>
Subject: Re: Detecting default signedness of char in ext4 (despite -funsigned-char)
Date: Tue, 17 Jan 2023 20:27:23 -0800	[thread overview]
Message-ID: <CAHk-=whUNjwqZXa-MH9KMmc_CpQpoFKFjAB9ZKHuu=TbsouT4A@mail.gmail.com> (raw)
In-Reply-To: <Y8dtze3ZLGaUi8pi@sol.localdomain>

[-- Attachment #1: Type: text/plain, Size: 3395 bytes --]

On Tue, Jan 17, 2023 at 7:56 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> [Added some Cc's, and updated subject to reflect what this is really about]

Hmm.

I really hate this.

> On Tue, Jan 17, 2023 at 05:10:55PM -0700, Andreas Dilger wrote:
> > >
> > > My 6.2-rc1 regression run on the current x86-64 test appliance revealed a new
> > > failure for generic/454 on the 4k file system configuration and all other
> > > configurations using a 4k block size.  This failure reproduces with 100%
> > > reliability and continues to appear as of 6.2-rc4.
> > >
> > > The test output indicates that the file system under test is inconsistent.
> >
> > There is actually support in the superblock for both signed and unsigned char
> > hash calculations, exactly because there was a bug like this in the past.
> > It looks like the ext4 code/build is still using the signed hash functions:

So clearly ext4 is completely buggy in this respect, but this is
exactly what would happen if you just mount a disk that was written to
on (old, pre-funsigned-char) x86, and then mount it on, say, an arm
machine that has always been unsigned-char.

That was always supposed to work.

So switching to a new kernel really should just be *exactly* the same
as moving the disk to a different machine.

And that's still "supposed to just work".

And this whole "let's get it wrong, and make x86 act as if 'char' is
signed, even when it isn't" seems entirely the wrong way around.

So the bug here is that __ext4_fill_super() seems to not look at the
actual on-disk thing, but instead do

> > static int __ext4_fill_super(struct fs_context *fc, struct super_block *sb)
> > {
...
> >                 else if ((i & EXT2_FLAGS_SIGNED_HASH) == 0) {
> > #ifdef __CHAR_UNSIGNED__

but at the same time this code is *exactly* the code that is trying to
deal with "oh, you moved the disk from a signed architecture to an
unsigned one".

So the above code is literally what should fix up that movement -
taking the *actual* new signedness (or lack thereof, in this case)
into account.

Now, it apparently doesn't work very well, and I suspect the reason it
doesn't work is that the xattr code doesn't actually test these
EXT2_FLAGS_SIGNED_HASH bits (and the s_hash_unsigned field value that
goes along with it).

But we should *fix* that.

Instead, the patch is self-admittedly very ugly:

> Below is one very ugly solution.  It seems to work [..]

but I really don't think it works. It just perpetuates the bug that
you can't move a filesystem from one architecture to another.

So I really think that the solution is either

 (a) just admit that ext4 was buggy, and say "char is now unsigned",
and know that generic/454 will fail when you switch from a buggy
kernel to a new one that no longer has this signedness bug.

 (b) fix ext4_xattr_hash_entry() to actually see "oh, this filesystem
was created with signed chars, and so we'll use that algorithm even
though our chars are always unsigned".

Honestly, the only actual case of breakage I have heard of is that
test, so I was hoping that (a) is simply the acceptable and simplest
solution. It basically says "nobody really cares, we're now always
unsigned, real people didn't use non-ASCII xattr names".

Anyway, here's a TOTALLY UNTESTED patch to do (b). Maybe it's entirely
broken, but I think you can see what I'm aiming for.

Comments?

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/x-patch, Size: 2164 bytes --]

 fs/ext4/xattr.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7decaaf27e82..69a1b8c6a2ec 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -81,6 +81,8 @@ ext4_xattr_block_cache_find(struct inode *, struct ext4_xattr_header *,
 			    struct mb_cache_entry **);
 static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
 				    size_t value_count);
+static __le32 ext4_xattr_hash_entry_signed(char *name, size_t name_len, __le32 *value,
+				    size_t value_count);
 static void ext4_xattr_rehash(struct ext4_xattr_header *);
 
 static const struct xattr_handler * const ext4_xattr_handler_map[] = {
@@ -470,8 +472,21 @@ ext4_xattr_inode_verify_hashes(struct inode *ea_inode,
 		tmp_data = cpu_to_le32(hash);
 		e_hash = ext4_xattr_hash_entry(entry->e_name, entry->e_name_len,
 					       &tmp_data, 1);
-		if (e_hash != entry->e_hash)
-			return -EFSCORRUPTED;
+		/* All good? */
+		if (e_hash == entry->e_hash)
+			return 0;
+
+		/*
+		 * Not good. Maybe the entry hash was calculated
+		 * using the buggy signed char version?
+		 */
+		e_hash = ext4_xattr_hash_entry_signed(entry->e_name, entry->e_name_len,
+							&tmp_data, 1);
+		if (e_hash == entry->e_hash)
+			return 0;
+
+		/* Still no match - bad */
+		return -EFSCORRUPTED;
 	}
 	return 0;
 }
@@ -3091,6 +3106,28 @@ static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
 	return cpu_to_le32(hash);
 }
 
+/*
+ * ext4_xattr_hash_entry_signed()
+ *
+ * Compute the hash of an extended attribute incorrectly.
+ */
+static __le32 ext4_xattr_hash_entry_signed(char *name, size_t name_len, __le32 *value, size_t value_count)
+{
+	__u32 hash = 0;
+
+	while (name_len--) {
+		hash = (hash << NAME_HASH_SHIFT) ^
+		       (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
+		       (signed char)*name++;
+	}
+	while (value_count--) {
+		hash = (hash << VALUE_HASH_SHIFT) ^
+		       (hash >> (8*sizeof(hash) - VALUE_HASH_SHIFT)) ^
+		       le32_to_cpu(*value++);
+	}
+	return cpu_to_le32(hash);
+}
+
 #undef NAME_HASH_SHIFT
 #undef VALUE_HASH_SHIFT
 

  parent reply	other threads:[~2023-01-18  4:27 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17 18:31 generic/454 regression in 6.2-rc1 Eric Whitney
2023-01-18  0:10 ` Andreas Dilger
2023-01-18  3:55   ` Detecting default signedness of char in ext4 (despite -funsigned-char) Eric Biggers
2023-01-18  4:21     ` Eric Biggers
2023-01-18  4:27     ` Linus Torvalds [this message]
2023-01-18  5:14       ` Eric Biggers
2023-01-18 15:48         ` Linus Torvalds
2023-01-18 19:14           ` Eric Biggers
2023-01-18 19:39             ` Linus Torvalds
2023-01-18 20:18         ` Theodore Ts'o
     [not found]           ` <CAHk-=wiGdxWtHRZftcqyPf8WbenyjniesKyZ=o73UyxfK9BL-A@mail.gmail.com>
2023-01-18 21:49             ` Theodore Ts'o
2023-01-18 22:20               ` Andreas Dilger
2023-01-19  7:19                 ` Eric Biggers
2023-01-19 18:24                 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whUNjwqZXa-MH9KMmc_CpQpoFKFjAB9ZKHuu=TbsouT4A@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Jason@zx2c4.com \
    --cc=adilger@dilger.ca \
    --cc=ebiggers@kernel.org \
    --cc=enwlinux@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=masahiroy@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.