linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andreas Dilger <adilger@dilger.ca>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Eric Biggers <ebiggers@kernel.org>,
	Ext4 Developers List <linux-ext4@vger.kernel.org>,
	Eric Whitney <enwlinux@gmail.com>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	Masahiro Yamada <masahiroy@kernel.org>
Subject: Re: Detecting default signedness of char in ext4 (despite -funsigned-char)
Date: Wed, 18 Jan 2023 15:20:03 -0700	[thread overview]
Message-ID: <3A9E6D2E-F98F-461C-834D-D4E269CC737F@dilger.ca> (raw)
In-Reply-To: <Y8hpZRmHJwdutRr2@mit.edu>

[-- Attachment #1: Type: text/plain, Size: 3397 bytes --]


> On Jan 18, 2023, at 12:39 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>> Note that the reason I'm so laissez-faire about it is that "broken
>> test case" is something very different from "actually broken user
>> space".
>> 
>> I haven't actually seen anybody _report_ this as a problem, I've only
>> seen the generic/454 xfstest failures.

That is likely because the number of 6.2+ kernel users who are using
Unicode xattr names is small, but they would likely come out of the
woodwork once Ubuntu/RHEL start using those kernels, and by then it
would be too late to fix this in a compatible manner.

> On Wed, Jan 18, 2023 at 03:14:04PM -0600, Linus Torvalds wrote:
>> You're missing the fact that 'char' gets expanded to 'int', and in the
>> process but #7 gets copied to bits 8-31 if it is signed.
>> 
>> Then the xor and the later shifting will move those bits around..
> 
> I agree with your analysis that in actual practice, almost no one
> actually uses non-ASCII characters for xattr names.  (Filenames, yes,
> but in general xattr names are set by programs, not by users.)  So
> besides xfstests generic/454, how likely is it that people would be
> using things like Octopus emoji's or Unicode characters such as <GREEK
> UPSILON WITH ACUTE AND HOOK SYMBOL>?  Very unlikely, I'd argue.  I
> might be a bit more worried about userspace applications written for,
> say, Red Flag Linux in China using chinese characters in xattrs, but
> I'd argue even there it's much more likely that this would be in the
> xattr values as opposed to the name.
> 
> In terms of what should we do for next steps, if we only pick signed,
> then it's possible if there are some edge case users who actually did
> use non-ASCII characters in the xattr name on PowerPC, ARM, or S/390,
> they would be broken.  That's simpler, and if we think there are
> darned few of them, I guess we could do that.
> 
> That being said, it's not that much more work to use a flag in the
> superblock to indicate whether or not we should be explicitly casting
> *name to either a signed or unsigned char, and then setting that flag
> automagically to avoid problems on people who started the file system
> on say, x86 before the signed to unsigned char transition, and who
> started natively on a PowerPC, ARM, or S/390.
> 
> The one bit which makes this a bit more complex is either way, we need
> to change both the kernel and e2fsprogs, which is why if we do the
> signed/unsigned xattr hash flag, it's important to set the flag value
> to be whatever the "default" signeded would be on that architecture
> pre 6.2-rc1.

It makes sense to use the existing UNSIGNED/SIGNED flag in the superblock
for the dir hash also for the xattr hash.  That would give the historical
value for the xattr hashes prior to the 6.2 unsigned char change, and is
correct for all filesystems and xattrs *except* non-ASCII xattr names
created on 6.2+ kernels (which should indeed be relatively few cases).

e2fsck could do the same, and would again be correct for all xattrs names
except those created with kernel 6.2+.  It could check both the signed
and unsigned forms and correct those few that are reversed compared to
the superblock flag, which should be rare, but isn't much work and avoids
incorrectly clearing the "corrupted" xattrs.

Cheers, Andreas






[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

  reply	other threads:[~2023-01-18 22:20 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-17 18:31 generic/454 regression in 6.2-rc1 Eric Whitney
2023-01-18  0:10 ` Andreas Dilger
2023-01-18  3:55   ` Detecting default signedness of char in ext4 (despite -funsigned-char) Eric Biggers
2023-01-18  4:21     ` Eric Biggers
2023-01-18  4:27     ` Linus Torvalds
2023-01-18  5:14       ` Eric Biggers
2023-01-18 15:48         ` Linus Torvalds
2023-01-18 19:14           ` Eric Biggers
2023-01-18 19:39             ` Linus Torvalds
2023-01-18 20:18         ` Theodore Ts'o
     [not found]           ` <CAHk-=wiGdxWtHRZftcqyPf8WbenyjniesKyZ=o73UyxfK9BL-A@mail.gmail.com>
2023-01-18 21:49             ` Theodore Ts'o
2023-01-18 22:20               ` Andreas Dilger [this message]
2023-01-19  7:19                 ` Eric Biggers
2023-01-19 18:24                 ` Linus Torvalds

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3A9E6D2E-F98F-461C-834D-D4E269CC737F@dilger.ca \
    --to=adilger@dilger.ca \
    --cc=Jason@zx2c4.com \
    --cc=ebiggers@kernel.org \
    --cc=enwlinux@gmail.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=masahiroy@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).