linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Andreas Dilger <adilger@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>,
	Eric Biggers <ebiggers@kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Eric Whitney <enwlinux@gmail.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Masahiro Yamada <masahiroy@kernel.org>
Subject: Re: Detecting default signedness of char in ext4 (despite -funsigned-char)
Date: Thu, 19 Jan 2023 10:24:18 -0800	[thread overview]
Message-ID: <CAHk-=whAMYsprEG8idgtvmzsx_Si5r3Hzetp39s3Rkkm=6+eSA@mail.gmail.com> (raw)
In-Reply-To: <3A9E6D2E-F98F-461C-834D-D4E269CC737F@dilger.ca>

[-- Attachment #1: Type: text/plain, Size: 4195 bytes --]

On Wed, Jan 18, 2023 at 2:20 PM Andreas Dilger <adilger@dilger.ca> wrote:
>
> It makes sense to use the existing UNSIGNED/SIGNED flag in the superblock
> for the dir hash also for the xattr hash.

No. As Eric says, the existing flag for this - while related to a very
similar issue - just doesn't work.

It's for a different hash, and the flag has been set (or not) based on
previous history of that particular filesystem that isn't necessarily
at all relevant. The "signedness" of the xattrs hash simply doesn't
necessarily match the signedness of the existing flag.

Also, honestly, it's just entirely POINTLESS.

Here's the deal: these kinds of flags simply SHOULD NOT EXIST. A
filesystem that has dynamic byte order flags is an INCOMPETENT
filesystem. Yes, I know they exist. The people involved should be
ashamed of them.

For things like byte order flags, it's simply unambiguously better to
just have an unconditional on-disk byte order, even when that byte
order doesn't match the "native" byte order of the machine. Doing an
unconditional 'bswap' operation is literally smaller, simpler and
faster than conditionally doing nothing, and makes for much easier
verification (because you can use static typing rules, like we use for
"__le32" and types like that).

The EXACT same thing is true of architecture-specific signedness.

Yes, you can add flags to dynamically choose one or the other, but
that's actively *worse* then just picking one statically and saying
"this is the correct one".

And honestly, the unsigned version was always the correct one. It's
not even half-way ambiguous like the whole little-endian vs big-endian
discussion used to be. In this very thread Ted made clear that he
hadn't even realized that this signed case could happen and would
matter.  The signed case is surprising and unintentional.

So, given that
 (a) it's just a *fact* that the static choice is better
 (b) of the two choices, one is clearly correct
I'm just going to put my foot down and say that right way forward is
clearly to just make that be the de-facto rule.

And honestly, it's the simpler patch too, since "-funsigned-char" has
already done all the actual "pick the right hash" for us.

We have four levels of "how much do we care" and simplicity:

 (1) do nothing, see if anybody actually ever notices outside of generic/454

 (2) just remove the hash check entirely, it's not doing anything
semantically meaningful

 (3) keep the hash check, replace the "return -EFSCORRUPTED" with a
"pr_warn_once()" and returning zero

 (4) the "keep the hash check, if it fails, test it the other way,
always write the correct new hash" patch I already posted

but the thing I refuse to do is to either just maintain the bug as-is
with extra complexity (Eric's patch) or add a new flag (or erroneously
re-use an existing flag) to make it dynamic.

The e2fsck argument is bugus, because *every* case requires e2fsck
changes. Even if we maintain the completely bogus historical behavior
of "the on-disk filesystem representation depends on architecture",
e2fsck had better start warning about it.

And again, I guarantee that the "fix the signedness" is the simpler
thing even for e2fsck, since at least then there is one unambiguously
correct version and one incorrect one, instead of some wishy-washy
"oh, this filesystem is correct for *THESE* architectures".

I'm attaching my suggested patch here again - this time with a commit
message - because while I'm ok with any of options 1-4 above, I
already wrote the patch for the most complicated case, and it's not
*that* complex.

We could possibly add a "pr_warn_once()" so that we have a combination
of (3) and (4).

In the longer term, we should

 (a) remove the #ifdef __CHAR_UNSIGNED__ in ext4 as always true
(happily, I grepped - it's the only case in the whole source tree)

 (b) probably plan on _eventually_ just remove my disgusting
"calculate signed version" thing, once we think it's all blown over
and we have never seen one of those pr_warn_on() messages outside of
generic/454

but I *really* don't want to add some new filesystem flag to deal with
this situation when a non-flag version is just simpler.

Hmm?

              Linus

[-- Attachment #2: 0001-ext4-deal-with-legacy-signed-xattr-name-hash-values.patch --]
[-- Type: text/x-patch, Size: 3564 bytes --]

From a9c0ae2863875446fa3d00edae2ccad4da75feb5 Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Thu, 19 Jan 2023 09:50:38 -0800
Subject: [PATCH] ext4: deal with legacy signed xattr name hash values

We potentially have old hashes of the xattr names generated on systems
with signed 'char' types.  Now that everybody uses '-funsigned-char',
those hashes will no longer match.

This only happens if you use xattrs names that have the high bit set,
which probably doesn't happen in practice, but the xfstest generic/454
shows it.

Instead of adding a new "signed xattr hash filesystem" bit and having to
deal with all the possible combinations, just calculate the hash both
ways if the first one fails, and always generate new hashes with the
proper unsigned char version.

Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/oe-lkp/202212291509.704a11c9-oliver.sang@intel.com
Link: https://lore.kernel.org/all/CAHk-=whUNjwqZXa-MH9KMmc_CpQpoFKFjAB9ZKHuu=TbsouT4A@mail.gmail.com/
Exposed-by: 3bc753c06dd0 ("kbuild: treat char as always unsigned")
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Theodore Ts'o <tytso@mit.edu>,
Cc: Jason Donenfeld <Jason@zx2c4.com>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
---
 fs/ext4/xattr.c | 41 +++++++++++++++++++++++++++++++++++++++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 7decaaf27e82..69a1b8c6a2ec 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -81,6 +81,8 @@ ext4_xattr_block_cache_find(struct inode *, struct ext4_xattr_header *,
 			    struct mb_cache_entry **);
 static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
 				    size_t value_count);
+static __le32 ext4_xattr_hash_entry_signed(char *name, size_t name_len, __le32 *value,
+				    size_t value_count);
 static void ext4_xattr_rehash(struct ext4_xattr_header *);
 
 static const struct xattr_handler * const ext4_xattr_handler_map[] = {
@@ -470,8 +472,21 @@ ext4_xattr_inode_verify_hashes(struct inode *ea_inode,
 		tmp_data = cpu_to_le32(hash);
 		e_hash = ext4_xattr_hash_entry(entry->e_name, entry->e_name_len,
 					       &tmp_data, 1);
-		if (e_hash != entry->e_hash)
-			return -EFSCORRUPTED;
+		/* All good? */
+		if (e_hash == entry->e_hash)
+			return 0;
+
+		/*
+		 * Not good. Maybe the entry hash was calculated
+		 * using the buggy signed char version?
+		 */
+		e_hash = ext4_xattr_hash_entry_signed(entry->e_name, entry->e_name_len,
+							&tmp_data, 1);
+		if (e_hash == entry->e_hash)
+			return 0;
+
+		/* Still no match - bad */
+		return -EFSCORRUPTED;
 	}
 	return 0;
 }
@@ -3091,6 +3106,28 @@ static __le32 ext4_xattr_hash_entry(char *name, size_t name_len, __le32 *value,
 	return cpu_to_le32(hash);
 }
 
+/*
+ * ext4_xattr_hash_entry_signed()
+ *
+ * Compute the hash of an extended attribute incorrectly.
+ */
+static __le32 ext4_xattr_hash_entry_signed(char *name, size_t name_len, __le32 *value, size_t value_count)
+{
+	__u32 hash = 0;
+
+	while (name_len--) {
+		hash = (hash << NAME_HASH_SHIFT) ^
+		       (hash >> (8*sizeof(hash) - NAME_HASH_SHIFT)) ^
+		       (signed char)*name++;
+	}
+	while (value_count--) {
+		hash = (hash << VALUE_HASH_SHIFT) ^
+		       (hash >> (8*sizeof(hash) - VALUE_HASH_SHIFT)) ^
+		       le32_to_cpu(*value++);
+	}
+	return cpu_to_le32(hash);
+}
+
 #undef NAME_HASH_SHIFT
 #undef VALUE_HASH_SHIFT
 
-- 
2.39.0.rc2.4.g1435931e43


      parent reply	other threads:[~2023-01-19 18:24 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17 18:31 generic/454 regression in 6.2-rc1 Eric Whitney
2023-01-18  0:10 ` Andreas Dilger
2023-01-18  3:55   ` Detecting default signedness of char in ext4 (despite -funsigned-char) Eric Biggers
2023-01-18  4:21     ` Eric Biggers
2023-01-18  4:27     ` Linus Torvalds
2023-01-18  5:14       ` Eric Biggers
2023-01-18 15:48         ` Linus Torvalds
2023-01-18 19:14           ` Eric Biggers
2023-01-18 19:39             ` Linus Torvalds
2023-01-18 20:18         ` Theodore Ts'o
     [not found]           ` <CAHk-=wiGdxWtHRZftcqyPf8WbenyjniesKyZ=o73UyxfK9BL-A@mail.gmail.com>
2023-01-18 21:49             ` Theodore Ts'o
2023-01-18 22:20               ` Andreas Dilger
2023-01-19  7:19                 ` Eric Biggers
2023-01-19 18:24                 ` Linus Torvalds [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=whAMYsprEG8idgtvmzsx_Si5r3Hzetp39s3Rkkm=6+eSA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Jason@zx2c4.com \
    --cc=adilger@dilger.ca \
    --cc=ebiggers@kernel.org \
    --cc=enwlinux@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=masahiroy@kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).