linux-toolchains.vger.kernel.org archive mirror
* [PATCH] kbuild: treat char as always signed
@ 2022-10-19 16:26 Jason A. Donenfeld
  2022-10-19 16:54 ` Segher Boessenkool
                   ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-19 16:26 UTC (permalink / raw)
  To: linux-kernel, linux-kbuild, linux-arch, linux-toolchains
  Cc: Jason A. Donenfeld, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linus Torvalds, Andy Shevchenko, Greg Kroah-Hartman

Recently, some compile-time checking I added to the clamp_t family of
functions triggered a build error when a poorly written driver was
compiled on ARM, because the driver assumed that the naked `char` type
is signed, but ARM treats it as unsigned, and the C standard says it's
architecture-dependent.

I doubt this particular driver is the only instance in which
unsuspecting authors assume that `char` with no `signed` or `unsigned`
designation is signed, because that's how the other types work. We were
lucky enough this time that that driver used `clamp_t(char,
negative_value, positive_value)`, so the new checking code found it, and
I've sent a patch to fix it, but there are likely other places lurking
that won't be so easily unearthed.

So let's just eliminate this particular variety of heisensigned bugs
entirely. Set `-fsigned-char` globally, so that gcc makes the type
signed on all architectures.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/lkml/202210190108.ESC3pc3D-lkp@intel.com/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index f41ec8c8426b..f1abcaf7110e 100644
--- a/Makefile
+++ b/Makefile
@@ -562,7 +562,7 @@ KBUILD_AFLAGS   := -D__ASSEMBLY__ -fno-PIE
 KBUILD_CFLAGS   := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \
 		   -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \
 		   -Werror=implicit-function-declaration -Werror=implicit-int \
-		   -Werror=return-type -Wno-format-security \
+		   -Werror=return-type -Wno-format-security -fsigned-char \
 		   -std=gnu11
 KBUILD_CPPFLAGS := -D__KERNEL__
 KBUILD_RUSTFLAGS := $(rust_common_flags) \
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 16:26 [PATCH] kbuild: treat char as always signed Jason A. Donenfeld
@ 2022-10-19 16:54 ` Segher Boessenkool
  2022-10-19 17:14   ` Linus Torvalds
  2022-10-19 19:54 ` Linus Torvalds
       [not found] ` <202210201618.8XhEGsLd-lkp@intel.com>
  2 siblings, 1 reply; 70+ messages in thread
From: Segher Boessenkool @ 2022-10-19 16:54 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 10:26:48AM -0600, Jason A. Donenfeld wrote:
> Recently, some compile-time checking I added to the clamp_t family of
> functions triggered a build error when a poorly written driver was
> compiled on ARM, because the driver assumed that the naked `char` type
> is signed, but ARM treats it as unsigned, and the C standard says it's
> architecture-dependent.

> So let's just eliminate this particular variety of heisensigned bugs
> entirely. Set `-fsigned-char` globally, so that gcc makes the type
> signed on all architectures.

This is an ABI change.  It is also hugely detrimental to generated
code quality on architectures that make the saner choice (that is, have
most instructions zero-extend byte quantities).

Instead, don't actively disable the compiler warnings that catch such
cases?  So start with removing footguns like

  # disable pointer signed / unsigned warnings in gcc 4.0
  KBUILD_CFLAGS += -Wno-pointer-sign


Segher

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 16:54 ` Segher Boessenkool
@ 2022-10-19 17:14   ` Linus Torvalds
  2022-10-19 17:26     ` Linus Torvalds
  2022-10-19 17:43     ` Segher Boessenkool
  0 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 17:14 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 9:57 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> This is an ABI change.  It is also hugely detrimental to generated
> code quality on architectures that make the saner choice (that is, have
> most instructions zero-extend byte quantities).

Yeah, I agree. We should just accept the standard wording, and be
aware that 'char' has indeterminate signedness.

But:

> Instead, don't actively disable the compiler warnings that catch such
> cases?  So start with removing footguns like
>
>   # disable pointer signed / unsigned warnings in gcc 4.0
>   KBUILD_CFLAGS += -Wno-pointer-sign

Nope, that won't fly.

The pointer-sign thing doesn't actually help (ie it won't find places
where you actually compare a char), and it causes untold damage in
doing completely insane things.

For example, it suddenly starts warning if you *are* being careful,
and explicitly use 'unsigned char array[]' things to avoid any sign
issues, and then want to do simple and straightforward things with
said array (like doing a 'strcmp()' on it).

Seriously, -Wpointer-sign is not just useless, it's actively _evil_.
The fact that you suggest that clearly means that you've never used
it.

                      Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 17:14   ` Linus Torvalds
@ 2022-10-19 17:26     ` Linus Torvalds
  2022-10-19 18:10       ` Nick Desaulniers
  2022-10-19 17:43     ` Segher Boessenkool
  1 sibling, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 17:26 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 10:14 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The pointer-sign thing doesn't actually help (ie it won't find places
> where you actually compare a char), and it causes untold damage in
> doing completely insane things.

Side note: several years ago I tried to make up some sane rules to
have 'sparse' actually be able to warn when a 'char' was used in a
context where the sign mattered.

I failed miserably.

You actually can see some signs (heh) of that in the sparse sources,
in that the type system actually has a bit for explicitly signed types
("MOD_EXPLICITLY_SIGNED"), but it ends up being almost entirely
unused.

That bit does still have one particular use: the "bitfield is
dubiously signed" thing where sparse will complain about bitfields
that are implicitly (but not explicitly) signed. Because people really
expect 'int a:1' to have values 0/1, not 0/-1.

But the original intent was to find code where people used a 'char'
that wasn't explicitly signed, and that then had architecture-defined
behavior.

I just could not come up with any even remotely sane warning
heuristics that didn't have a metric buttload of false positives.

I still have this feeling that it *should* be possible to warn about
the situation where you end up doing an implicit type widening (ie the
normal C "arithmetic is always done in at least 'int'") that then does
not get narrowed down again without the upper bits ever mattering.

But it needs somebody smarter than me, I'm afraid.

And the fact that I don't think any other compiler has that warning
either makes me just wonder if my feeling that it should be possible
is just wrong.

                   Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 17:14   ` Linus Torvalds
  2022-10-19 17:26     ` Linus Torvalds
@ 2022-10-19 17:43     ` Segher Boessenkool
  2022-10-19 18:11       ` Linus Torvalds
  1 sibling, 1 reply; 70+ messages in thread
From: Segher Boessenkool @ 2022-10-19 17:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

Hi!

On Wed, Oct 19, 2022 at 10:14:20AM -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2022 at 9:57 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > This is an ABI change.  It is also hugely detrimental to generated
> > code quality on architectures that make the saner choice (that is, have
> > most instructions zero-extend byte quantities).
> 
> Yeah, I agree. We should just accept the standard wording, and be
> aware that 'char' has indeterminate signedness.

And plain "char" is a separate type from "signed char" and "unsigned
char" both.

> But:
> 
> > Instead, don't actively disable the compiler warnings that catch such
> > cases?  So start with removing footguns like
> >
> >   # disable pointer signed / unsigned warnings in gcc 4.0
> >   KBUILD_CFLAGS += -Wno-pointer-sign
> 
> Nope, that won't fly.
> 
> The pointer-sign thing doesn't actually help (ie it won't find places
> where you actually compare a char), and it causes untold damage in
> doing completely insane things.

When I did this more than a decade ago there indeed was a LOT of noise,
mostly caused by dubious code.  I do agree many cases detected are not
very important, but it also revealed cases where a filesystem's disk
format changed (atarifs or amigafs or such iirc).  In many cases it is
annoying to be reminded of sloppy code, but in some cases it detects
crucial problems.

> Seriously, -Wpointer-sign is not just useless, it's actively _evil_.

Then suggest something better?  Or suggest improvements to the existing
warning?

This warning is part of -Wall, most people must not have problems with
it (or people are so apathetic about this that they have not complained
about it).

It is easy to improve your code when the compiler detects problems like
this.  Of course after such a long time of lax code sanity enforcement
you get all warnings at once :-/

> The fact that you suggest that clearly means that you've never used
> it.

Ah, ad hominems.  Great.


Segher

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 17:26     ` Linus Torvalds
@ 2022-10-19 18:10       ` Nick Desaulniers
  2022-10-19 18:35         ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Nick Desaulniers @ 2022-10-19 18:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 10:26 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Oct 19, 2022 at 10:14 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > The pointer-sign thing doesn't actually help (ie it won't find places
> > where you actually compare a char), and it causes untold damage in
> > doing completely insane things.
>
> Side note: several years ago I tried to make up some sane rules to
> have 'sparse' actually be able to warn when a 'char' was used in a
> context where the sign mattered.

Do you have examples? Maybe we could turn this into a compiler feature
request.  Having prior art on the problem would be a boon.

>
> I failed miserably.
>
> You actually can see some signs (heh) of that in the sparse sources,
> in that the type system actually has a bit for explicitly signed types
> ("MOD_EXPLICITLY_SIGNED"), but it ends up being almost entirely
> unused.
>
> That bit does still have one particular use: the "bitfield is
> dubiously signed" thing where sparse will complain about bitfields
> that are implicitly (but not explicitly) signed. Because people really
> expect 'int a:1' to have values 0/1, not 0/-1.

Clang's -Wbitfield-constant-conversion can catch that.
commit 5c5c2baad2b5 ("ASoC: mchp-spdiftx: Fix clang
-Wbitfield-constant-conversion")
commit eab9100d9898 ("ASoC: mchp-spdiftx: Fix clang
-Wbitfield-constant-conversion")
commit 37209783c73a ("thunderbolt: Make priority unsigned in struct tb_path")

>
> But the original intent was to find code where people used a 'char'
> that wasn't explicitly signed, and that then had architecture-defined
> behavior.
>
> I just could not come up with any even remotely sane warning
> heuristics that didn't have a metric buttload of false positives.
>
> I still have this feeling that it *should* be possible to warn about
> the situation where you end up doing an implicit type widening (ie the
> normal C "arithmetic is always done in at least 'int'") that then does
> not get narrowed down again without the upper bits ever mattering.
>
> But it needs somebody smarter than me, I'm afraid.
>
> And the fact that I don't think any other compiler has that warning
> either makes me just wonder if my feeling that it should be possible
> is just wrong.
>
>                    Linus



-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 17:43     ` Segher Boessenkool
@ 2022-10-19 18:11       ` Linus Torvalds
  2022-10-19 18:20         ` Nick Desaulniers
                           ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 18:11 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 10:45 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> When I did this more than a decade ago there indeed was a LOT of noise,
> mostly caused by dubious code.

It really happens with explicitly *not* dubious code.

Using 'unsigned char[]' is very common in code that actually does
anything where you care about the actual byte values. Things like
utf-8 handling, things like compression, lots and lots of cases.

But a number of those cases are still dealing with *strings*. UTF-8 is
still a perfectly valid C string format, and using 'strlen()' on a
buffer that contains UTF-8 is neither unusual nor wrong. It is still
the proper way to get the byte length of the thing. It's how UTF-8 is
literally designed.

And -Wpointer-sign will complain about that, unless you start doing
explicit casting, which is just a worse fix than the disease.

Explicit casts are bad (unless, of course, you are explicitly trying
to violate the type system, when they are both required, and a great
way to say "look, I'm doing something dangerous").

So people who say "just cast it", don't understand that casts *should*
be seen as "this code is doing something special, tread carefully". If
you just randomly add casts to shut up a warning, the casts become
normalized and don't raise the kind of warning signs that they
*should* raise.

And it's really annoying, because the code ends up using 'unsigned
char' exactly _because_ it's trying to be careful and explicit about
signs, and then the warning makes that carefully written code worse.

> Then suggest something better?  Or suggest improvements to the existing
> warning?

As I mentioned in the next email, I tried to come up with something
better in sparse, which wasn't based on the pointer type comparison,
but on the actual 'char' itself.

My (admittedly only ever half-implemented) thing actually worked fine
for the simple cases (where simplification would end up just undoing
all the "expand char to int" because the end use was just assigned to
another char, or it was masked for other reasons).

But while sparse does a lot of basic optimizations, it still left
enough "look, you're doing sign-extensions on a 'char'" on the table
that it warned about perfectly valid stuff.

And maybe that's fundamentally hard.

The "-Wpointer-sign" thing could probably be fairly easily improved,
by just recognizing that things like 'strlen()' and friends do not
care about the sign of 'char', and neither does a 'strcmp()' that only
checks for equality (but if you check the *sign* of strcmp, it does
matter).

It's been some time since I last tried it, but at least from memory,
it really was mostly the standard C string functions that caused
almost all problems.  Your *own* functions you can just make sure the
signedness is right, but it's really really annoying when you try to
be careful about the byte signs, and the compiler starts complaining
just because you want to use the bog-standard 'strlen()' function.

And no, something like 'ustrlen()' with a hidden cast is just noise
for a warning that really shouldn't exist.

So some way to say 'this function really doesn't care about the sign
of this pointer' (and having the compiler know that for the string
functions it already knows about anyway) would probably make almost
all problems with -Wpointer-sign go away.

Put another way: 'char *' is so fundamental and inherent in C, that
you can't just warn when people use it in contexts where sign really
doesn't matter.

                 Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 18:11       ` Linus Torvalds
@ 2022-10-19 18:20         ` Nick Desaulniers
  2022-10-19 18:56           ` Linus Torvalds
  2022-10-19 21:07         ` David Laight
  2022-10-20 10:41         ` Gabriel Paubert
  2 siblings, 1 reply; 70+ messages in thread
From: Nick Desaulniers @ 2022-10-19 18:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 11:11 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But while sparse does a lot of basic optimizations, it still left
> enough "look, you're doing sign-extensions on a 'char'" on the table
> that it warned about perfectly valid stuff.
>
> And maybe that's fundamentally hard.
>
> The "-Wpointer-sign" thing could probably be fairly easily improved,
> by just recognizing that things like 'strlen()' and friends do not
> care about the sign of 'char', and neither does a 'strcmp()' that only
> checks for equality (but if you check the *sign* of strcmp, it does
> matter).
>
> It's been some time since I last tried it, but at least from memory,
> it really was mostly the standard C string functions that caused
> almost all problems.  Your *own* functions you can just make sure the
> signedness is right, but it's really really annoying when you try to
> be careful about the byte signs, and the compiler starts complaining
> just because you want to use the bog-standard 'strlen()' function.
>
> And no, something like 'ustrlen()' with a hidden cast is just noise
> for a warning that really shouldn't exist.
>
> So some way to say 'this function really doesn't care about the sign
> of this pointer' (and having the compiler know that for the string
> functions it already knows about anyway) would probably make almost
> all problems with -Wpointer-sign go away.
>
> Put another way: 'char *' is so fundamental and inherent in C, that
> you can't just warn when people use it in contexts where sign really
> doesn't matter.

A few times in the past, we've split a warning flag into a group so
that we could be more specific about distinct cases. Perhaps if
-Wpointer-sign was a group that implied -Wpointer-char-sign, then the
kernel could use -Wpointer-sign -Wno-pointer-char-sign.

I don't know if that's the right granularity though.
-- 
Thanks,
~Nick Desaulniers

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 18:10       ` Nick Desaulniers
@ 2022-10-19 18:35         ` Linus Torvalds
  2022-10-19 19:23           ` Andy Shevchenko
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 18:35 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 11:10 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Wed, Oct 19, 2022 at 10:26 AM Linus Torvalds
> >
> > Side note: several years ago I tried to make up some sane rules to
> > have 'sparse' actually be able to warn when a 'char' was used in a
> > context where the sign mattered.
>
> Do you have examples? Maybe we could turn this into a compiler feature
> request.  Having prior art on the problem would be a boon.

It's been over a decade since I seriously worked on sparse (Hmm.
Probably two, actually).  And I never got the 'char' logic to work
well enough for it to have ever made it into the kernel.

I'm also fairly sure I did it wrong - if I recall correctly, I did it
on the type system level, and the logic was in the tree
simplification, which was always much too weak.

Sparse does *some* expression simplification as it builds up the parse
tree and does all the type evaluations ("evaluate.c" in sparse), but
most of the real optimization work is done on the SSA format.

So what I probably *should* have done was to have a special "BEXT"
opcode (for "byte extend", the same way sparse has ZEXT and SEXT for
well-defined integer zero extend and sign extend), and linearized it
with all the simplifications that we do on the SSA level, and then if
the BEXT opcode still exists after all our optimization work, we'd
warn about it, because that means that the signedness ends up
mattering.

But sparse originally did almost everything just based on the type
system, which was the original intent of sparse (ie the whole "extend
the pointer types to have different address spaces" was really what
sparse was all about).

> Clang's -Wbitfield-constant-conversion can catch that.

Yeah, so bitfield signedness is really trivial, and works all on the
type system.

It's very easy to say: "you defined this member as an implicitly
signed bitfield, did you *really* mean to do that?" because signed
bitfields simply do not exist in the kernel.

So that warning is trivial, and the fix is basically universally to
change 'int a:1' to 'unsigned a:1', because even *if* you do want
signed bitfields, it's just better to make that very very explicit,
and write it as 'signed int x:10'.

We do have a couple of signed bitfields in the kernel, but they are
unusual enough that it's actually a good thing that sparse just made
people be explicit about it.

Do

        git grep '\<signed\>.*:[1-9]'

to see the (few) examples and a few false positives that trigger in
the trace docs.

So sparse doesn't actually have to be clever about bitfield signs. It
can literally just say "did you really mean to do that", and that's
it. Very simple. Not at all the complexity that 'char' has, where
every single use technically tends to cause a sign-extension (due to
the integer conversion), but that doesn't mean that it *matters* in
the end.

            Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 18:20         ` Nick Desaulniers
@ 2022-10-19 18:56           ` Linus Torvalds
  2022-10-19 19:11             ` Kees Cook
  2022-10-19 20:15             ` Segher Boessenkool
  0 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 18:56 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 11:21 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> A few times in the past, we've split a warning flag into a group so
> that we could be more specific about distinct cases. Perhaps if
> -Wpointer-sign was a group that implied -Wpointer-char-sign, then the
> kernel could use -Wpointer-sign -Wno-pointer-char-sign.

That might be interesting, just to see how much of the kernel is about
'char *' and how much is other noise.

Just for fun (for some definition of "fun") I tried to remove the
-Wno-pointer-sign thing, and started building a kernel.

After fixing fortify-string.h to not complain (which was indeed about
strlen() signedness), it turns out a lot were still about 'char', but
not necessarily the <string.h> functions.

We use 'unsigned char *' for our dentry data, for example, and then you get

     warning: pointer targets in initialization of ‘const unsigned
char *’ from ‘char *’ differ in signedness

when you do something like

    QSTR_INIT(NULL_FILE_NAME,

which is simply doing a regular initializer assignment, and wants to
assign a constant string (in this case the constant string "null") to
that "const unsigned char *name".

That's certainly another example of "why the heck did the compiler
warn about that thing".

You can literally try to compile this one-liner with gcc:

     const unsigned char *c = "p";

and it will complain. What a hugely pointless warning.

BUT.

It turns out we have a lot of non-char warnings too.

The kernel does all these "generic functions" that are based on size, like

        atomic_try_cmpxchg_acquire()

which are basically defined to be about "int sized object", but with
unspecified sign.

And the sign is basically pointless. Some people want "unsigned int",
others might want a signed int.

So from a quick grep, we do have a lot of strlen/strcpy cases, but we
also do have a lot of other cases.

Hundreds and hundreds of that atomic_try_cmpxchg_acquire(), for
example. And they might be trivial to fix (it might be similar to the
fortify-string.h one where it's just a header file that generates most
of them in one single place), but with all the ones that are just
clearly the compiler being silly, they aren't really even worth
looking at.

                    Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 18:56           ` Linus Torvalds
@ 2022-10-19 19:11             ` Kees Cook
  2022-10-19 19:30               ` Linus Torvalds
  2022-10-19 20:15             ` Segher Boessenkool
  1 sibling, 1 reply; 70+ messages in thread
From: Kees Cook @ 2022-10-19 19:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Desaulniers, Segher Boessenkool, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 11:56:00AM -0700, Linus Torvalds wrote:
> Hundreds and hundreds of that atomic_try_cmpxchg_acquire(), for
> example. And they might be trivial to fix (it might be similar to the
> fortify-string.h one where it's just a header file that generates most
> of them in one single place), but with all the ones that are just
> clearly the compiler being silly, they aren't really even worth
> looking at.

Yeah, I've had to fight these casts in fortify-string.h from time to
time. I'd love to see the patch you used -- I bet it would keep future
problems at bay.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 18:35         ` Linus Torvalds
@ 2022-10-19 19:23           ` Andy Shevchenko
  2022-10-19 19:36             ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Andy Shevchenko @ 2022-10-19 19:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Desaulniers, Segher Boessenkool, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 11:35:50AM -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2022 at 11:10 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:

...

> We do have a couple of signed bitfields in the kernel, but they are
> unusual enough that it's actually a good thing that sparse just made
> people be explicit about it.

At least drivers/media/usb/msi2500/msi2500.c:289 can be converted
to use sign_extend32() I believe.

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 19:11             ` Kees Cook
@ 2022-10-19 19:30               ` Linus Torvalds
  2022-10-19 20:35                 ` Jason A. Donenfeld
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 19:30 UTC (permalink / raw)
  To: Kees Cook
  Cc: Nick Desaulniers, Segher Boessenkool, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 12:11 PM Kees Cook <keescook@chromium.org> wrote:
> Yeah, I've had to fight these casts in fortify-string.h from time to
> time. I'd love to see the patch you used -- I bet it would keep future
> problems at bay.

Heh. The fortify-source patch was just literally

  -       unsigned char *__p = (unsigned char *)(p);              \
  +       char *__p = (char *)(p);                                \

in __compiletime_strlen(), just to make the later

        __ret = __builtin_strlen(__p);

happy.

I didn't see any reason that code was using 'unsigned char *', but I
didn't look very closely.

But honestly, while fixing that was just a local thing, a lot of other
cases most definitely weren't.

The crypto code uses 'unsigned char *' a lot - which makes a lot of
sense, since the crypto code really does work basically with a "byte
array", and 'unsigned char *' tends to really be a good way to do
that.

But then a lot of the *users* of the crypto code may have other ideas,
ie they may have strings as the source, where 'char *' is a lot more
natural.

And as mentioned, some of it really is just fairly fundamental
compiler confusion. The fact that you can't use a regular string
literals with 'unsigned char' is just crazy. There's no *advantage* to
that, it's literally just an annoyance.

(And yes, there's u8"hello world", and yes, that's actually
"unsigned char" compatible as of C23, but not because the 'u' is
'unsigned', but because the 'u8' stands for 'utf8', and it seems that
the C standard people finally decided that 'unsigned char[]' was the
right type for UTF8. But in C11, it's actually still just 'char *',
and I think that you get that completely broken sign warning unless
you do an explicit cast).

No sane person should think that any of this is reasonable, and C23
actually makes things *WORSE* - not because C23 made the right choice,
but because it just makes the whole signedness even messier.

IOW, signedness in C is such a mess that -Wpointer-sign is actively
detrimental as things are right now. And look above - it's not even
getting better, it's just getting even more confusing and odd.

              Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 19:23           ` Andy Shevchenko
@ 2022-10-19 19:36             ` Linus Torvalds
  0 siblings, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 19:36 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Nick Desaulniers, Segher Boessenkool, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 12:23 PM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:
>
> > We do have a couple of signed bitfields in the kernel, but they are
> > unusual enough that it's actually a good thing that sparse just made
> > people be explicit about it.
>
> At least drivers/media/usb/msi2500/msi2500.c:289 can be converted
> to use sign_extend32() I believe.

Heh. I didn't even look at that one - I did check that yeah, the MIPS
ones made sense (I say "ones", because while my grep pattern only
finds one, there are several others that have spacing that just made
my grep miss them).

You're right, that msi2500 use is a very odd use of bitfields for just
sign extension.

That's hilariously odd code, but not exactly wrong. And using "signed
int x:14" does make it very explicit that the bitfield wants that
sign.

And that code does actually have a fair number of comments to explain
each step, so I think it's all ok. Strange, but ok.

                  Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 16:26 [PATCH] kbuild: treat char as always signed Jason A. Donenfeld
  2022-10-19 16:54 ` Segher Boessenkool
@ 2022-10-19 19:54 ` Linus Torvalds
  2022-10-19 20:23   ` Jason A. Donenfeld
                     ` (2 more replies)
       [not found] ` <202210201618.8XhEGsLd-lkp@intel.com>
  2 siblings, 3 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 19:54 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 9:27 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> So let's just eliminate this particular variety of heisensigned bugs
> entirely. Set `-fsigned-char` globally, so that gcc makes the type
> signed on all architectures.

Btw, I do wonder if we might actually be better off doing this - but
doing it the other way around.

IOW, make 'char' always UNsigned. Unlike the signed char thing, it
shouldn't generate any worse code on any common architecture.

And I do think that having odd architecture differences is generally a
bad idea, and making the language rules stricter to avoid differences
is a good thing.

Now, you did '-fsigned-char', because that's the "common default" in
an x86-centric world.

You are also right that people might think that "char" works like
"int", and that if you don't specify the sign, it's signed.

But those people are obviously wrong anyway, so it's not a very strong argument.

And from a kernel perspective, I do think that "treat char as a byte"
and making it be unsigned is in many ways the saner model. There's a
reason we use 'unsigned char' in a fair number of places.

So using '-funsigned-char' might not be a bad idea.

Hilariously (and by "hilariously", I obviously mean "NOT
hilariously"), it doesn't actually fix the warning for

   const unsigned char *c = "p";

which still complains about

   warning: pointer targets in initialization of ‘const unsigned char
*’ from ‘char *’ differ in signedness

even when you've specified that 'char' should be unsigned with -funsigned-char.

Because gcc actually tries to be helpful, and has (reasonably, from a
"type sanity" standpoint) decided that

   "The type char is always a distinct type from each of signed char
or unsigned char, even though its behavior is always just like one of
those two"

so using "-funsigned-char" gives us well-defined *behavior*, but
doesn't really help us with cleaning up our code.

I understand why gcc would want to make it clear that despite any
behavioral issues, "char" is *not* the same as "[un]signed char" in
general. But in this kind of use case, that warning is just pointless
and annoying.

Oh well. You *really* can't win this thing. The game is rigged like
some geeky carnival game.

              Linus
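
The point about plain 'char' being a distinct type regardless of -fsigned-char or -funsigned-char can be checked with C11 _Generic. This is a minimal sketch; the macro and function names are made up for illustration and do not come from the kernel.

```c
#include <assert.h>
#include <string.h>

/*
 * Name the type of an expression. Listing all three character types in
 * one _Generic is only legal because they are three distinct types.
 */
#define CHAR_KIND(x) _Generic((x),              \
        char:          "char",                  \
        signed char:   "signed char",           \
        unsigned char: "unsigned char",         \
        default:       "other")

const char *kind_of_plain_char(void)
{
        char c = 'p';
        return CHAR_KIND(c);
}

const char *kind_of_signed_char(void)
{
        signed char c = 'p';
        return CHAR_KIND(c);
}

const char *kind_of_unsigned_char(void)
{
        unsigned char c = 'p';
        return CHAR_KIND(c);
}
```

Whichever signedness flag is used, kind_of_plain_char() selects the "char" branch, which is why the pointer-sign warning survives -funsigned-char.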

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 18:56           ` Linus Torvalds
  2022-10-19 19:11             ` Kees Cook
@ 2022-10-19 20:15             ` Segher Boessenkool
  1 sibling, 0 replies; 70+ messages in thread
From: Segher Boessenkool @ 2022-10-19 20:15 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Nick Desaulniers, Jason A. Donenfeld, linux-kernel, linux-kbuild,
	linux-arch, linux-toolchains, Masahiro Yamada, Kees Cook,
	Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 11:56:00AM -0700, Linus Torvalds wrote:
> After fixing fortify-string.h to not complain (which was indeed about
> strlen() signedness), it turns out a lot were still about 'char', but
> not necessarily the <string.h> functions.
> 
> We use 'unsigned char *' for our dentry data, for example, and then you get
> 
>      warning: pointer targets in initialization of ‘const unsigned
> char *’ from ‘char *’ differ in signedness
> 
> when you do something like
> 
>     QSTR_INIT(NULL_FILE_NAME,
> 
> which is simply doing a regular initializer assignment, and wants to
> assign a constant string (in this case the constant string "null") to
> that "const unsigned char *name".

It cannot see that all users of this are okay with ignoring the
difference.

> That's certainly another example of "why the heck did the compiler
> warn about that thing".

Because this is a simple warning.  It did exactly what it is supposed
to -- you are mixing "char" and "unsigned char" here, and in some cases
that matters hugely.

> You can literally try to compile this one-liner with gcc:
> 
>      const unsigned char *c = "p";
> 
> and it will complain. What a hugely pointless warning.

Yes, there are corner cases like this.  Please open a PR if you want
this fixed.

It is UB to (try to) modify string literals (since they can be shared
for example), but still they have type "array of (plain) char".  This is
historical :-/


Segher
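
As a side note to the string literal discussion, C does allow a string literal to initialize an *array* of any character type; only the pointer form trips -Wpointer-sign. The sketch below assumes nothing beyond standard C; the variable and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Array form: a string literal may initialize any character array. */
static const unsigned char null_name[] = "null";

/* Pointer form: an explicit cast is needed to keep -Wpointer-sign quiet. */
static const unsigned char *null_name_ptr = (const unsigned char *)"null";

/* Count bytes up to the terminating NUL without calling strlen(). */
size_t count_bytes(const unsigned char *s)
{
        size_t n = 0;

        while (s[n])
                n++;
        return n;
}

/* Compare both spellings byte for byte, including the NUL. */
int same_bytes(void)
{
        return memcmp(null_name, null_name_ptr, sizeof(null_name)) == 0;
}
```

The array form copies the literal rather than pointing at it, so it is not a drop-in replacement for a 'const unsigned char *name' member.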

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 19:54 ` Linus Torvalds
@ 2022-10-19 20:23   ` Jason A. Donenfeld
  2022-10-19 20:30     ` [PATCH v2] kbuild: treat char as always unsigned Jason A. Donenfeld
  2022-10-19 20:58   ` [PATCH] kbuild: treat char as always signed David Laight
  2022-10-26  0:10   ` make ctype ascii only? (was [PATCH] kbuild: treat char as always signed) Rasmus Villemoes
  2 siblings, 1 reply; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-19 20:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

Hi Linus,

On Wed, Oct 19, 2022 at 12:54:06PM -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2022 at 9:27 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > So let's just eliminate this particular variety of heisensigned bugs
> > entirely. Set `-fsigned-char` globally, so that gcc makes the type
> > signed on all architectures.
> 
> Btw, I do wonder if we might actually be better off doing this - but
> doing it the other way around.

That could work. The argument here would be that most people indeed
treat char as a byte. I'll send a v2 doing this.

This will probably break some things, though, on drivers that are
already broken on e.g. ARM. For example, the wifi driver I fixed that
started this whole thing would now be broken on x86 too. But also, we're
barely past rc1, so maybe this is something to do now, and then we'll
spend the rest of the 6.1 cycle fixing issues as they pop up? Sounds fine
to me if you think that's doable.

Either way, I'll send a patch and you can do with it what you think is
best.

Jason
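
To illustrate the kind of breakage being discussed, here is a small sketch of what happens to a negative value stored in a byte-sized variable. It uses explicit 'signed char' and 'unsigned char' to model the two possible behaviours of plain 'char', and assumes 8-bit bytes; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

static_assert(CHAR_BIT == 8, "sketch assumes 8-bit bytes");

/* Models plain char where it is signed (the x86 default). */
int store_as_signed_char(int v)
{
        signed char c = (signed char)v;

        return c;
}

/* Models plain char where it is unsigned (ARM, or with -funsigned-char). */
int store_as_unsigned_char(int v)
{
        unsigned char c = (unsigned char)v;

        return c;
}
```

A driver that stores -5 and later tests for a negative value sees -5 in the first model and 251 in the second, so the test silently stops firing.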

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH v2] kbuild: treat char as always unsigned
  2022-10-19 20:23   ` Jason A. Donenfeld
@ 2022-10-19 20:30     ` Jason A. Donenfeld
  2022-10-19 23:56       ` Linus Torvalds
                         ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-19 20:30 UTC (permalink / raw)
  To: linux-kernel, linux-kbuild, linux-arch, linux-toolchains
  Cc: Jason A. Donenfeld, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linus Torvalds, Andy Shevchenko, Greg Kroah-Hartman

Recently, some compile-time checking I added to the clamp_t family of
functions triggered a build error when a poorly written driver was
compiled on ARM, because the driver assumed that the naked `char` type
is signed, but ARM treats it as unsigned, and the C standard says it's
architecture-dependent.

I doubt this particular driver is the only instance in which
unsuspecting authors make assumptions about `char` with no `signed` or
`unsigned` specifier. We were lucky enough this time that that driver
used `clamp_t(char, negative_value, positive_value)`, so the new
checking code found it, and I've sent a patch to fix it, but there are
likely other places lurking that won't be so easily unearthed.

So let's just eliminate this particular variety of heisensign bugs
entirely. Set `-funsigned-char` globally, so that gcc makes the type
unsigned on all architectures.

This will break things in some places and fix things in others, so this
will likely cause a bit of churn while reconciling the type misuse.

Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://lore.kernel.org/lkml/202210190108.ESC3pc3D-lkp@intel.com/
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index f41ec8c8426b..bbf376931899 100644
--- a/Makefile
+++ b/Makefile
@@ -562,7 +562,7 @@ KBUILD_AFLAGS   := -D__ASSEMBLY__ -fno-PIE
 KBUILD_CFLAGS   := -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs \
 		   -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE \
 		   -Werror=implicit-function-declaration -Werror=implicit-int \
-		   -Werror=return-type -Wno-format-security \
+		   -Werror=return-type -Wno-format-security -funsigned-char \
 		   -std=gnu11
 KBUILD_CPPFLAGS := -D__KERNEL__
 KBUILD_RUSTFLAGS := $(rust_common_flags) \
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 19:30               ` Linus Torvalds
@ 2022-10-19 20:35                 ` Jason A. Donenfeld
  2022-10-20  0:10                   ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-19 20:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Nick Desaulniers, Segher Boessenkool, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 12:30:59PM -0700, Linus Torvalds wrote:
> The crypto code uses 'unsigned char *' a lot - which makes a lot of
> sense, since the crypto code really does work basically with a "byte
> array", and 'unsigned char *' tends to really be a good way to do
> that.

I wish folks would use `u8 *` when they mean "byte array".

Maybe the attitude should just be -- use u8 for bytes, s8 for signed
bytes, and char for characters/strings. Declare any use of char for
something non-stringy forbidden, and call it a day. Yes, obviously u8
and s8 are just typedefs, but they're a lot more explicit of intent.

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* RE: [PATCH] kbuild: treat char as always signed
  2022-10-19 19:54 ` Linus Torvalds
  2022-10-19 20:23   ` Jason A. Donenfeld
@ 2022-10-19 20:58   ` David Laight
  2022-10-26  0:10   ` make ctype ascii only? (was [PATCH] kbuild: treat char as always signed) Rasmus Villemoes
  2 siblings, 0 replies; 70+ messages in thread
From: David Laight @ 2022-10-19 20:58 UTC (permalink / raw)
  To: 'Linus Torvalds', Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

From: Linus Torvalds
> Sent: 19 October 2022 20:54
> 
> On Wed, Oct 19, 2022 at 9:27 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > So let's just eliminate this particular variety of heisensigned bugs
> > entirely. Set `-fsigned-char` globally, so that gcc makes the type
> > signed on all architectures.
> 
> Btw, I do wonder if we might actually be better off doing this - but
> doing it the other way around.
> 
> IOW, make 'char' always UNsigned. Unlike the signed char thing, it
> shouldn't generate any worse code on any common architecture.
> 
> And I do think that having odd architecture differences is generally a
> bad idea, and making the language rules stricter to avoid differences
> is a good thing.
> 
> Now, you did '-fsigned-char', because that's the "common default" in
> an x86-centric world.

I'm pretty sure char is signed because the pdp11 only had
sign-extending byte loads.

> You are also right that people might think that "char" works like
> "int", and that if you don't specify the sign, it's signed.

But even 'unsigned char' works like int.
The values are promoted to int (thanks to the brain-dead ANSI-C
committee) rather than unsigned int (which I think was in K&R C).
(There is an exception, int, short and char can all be the same size.
In which case unsigned char promotes to unsigned int.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
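
The promotion rule mentioned above can be observed directly. This is a minimal sketch, assuming the usual case where 'int' is wider than 'char'; the helper names are illustrative only.

```c
#include <assert.h>
#include <limits.h>

static_assert(UCHAR_MAX < INT_MAX, "sketch assumes int is wider than char");

/* Evaluates to 1 when the expression has type int. */
#define IS_INT(x) _Generic((x), int: 1, default: 0)

/* unsigned char operands are promoted to (signed) int before the add. */
int sum_has_type_int(void)
{
        unsigned char a = 200, b = 100;

        return IS_INT(a + b);
}

/* The sum is computed in int, so it does not wrap at 256. */
int sum_value(void)
{
        unsigned char a = 200, b = 100;

        return a + b;
}

/* Negating an unsigned char yields a negative int, not a large unsigned. */
int negation_is_negative(void)
{
        unsigned char a = 1;

        return -a < 0;
}
```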

^ permalink raw reply	[flat|nested] 70+ messages in thread

* RE: [PATCH] kbuild: treat char as always signed
  2022-10-19 18:11       ` Linus Torvalds
  2022-10-19 18:20         ` Nick Desaulniers
@ 2022-10-19 21:07         ` David Laight
  2022-10-19 21:26           ` Segher Boessenkool
  2022-10-20 10:41         ` Gabriel Paubert
  2 siblings, 1 reply; 70+ messages in thread
From: David Laight @ 2022-10-19 21:07 UTC (permalink / raw)
  To: 'Linus Torvalds', Segher Boessenkool
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

From: Linus Torvalds
> Sent: 19 October 2022 19:11
...
> Explicit casts are bad (unless, of course, you are explicitly trying
> to violate the type system, when they are both required, and a great
> way to say "look, I'm doing something dangerous").

The worst ones in the kernel are the __force ones for sparse.
They really ought to be a function (#define) so that they
are not seen by the compiler at all.
Otherwise they can hide a multitude of sins.

There are also the casts to convert integer values to/from unsigned,
and to different sized integers.
They all happen far too often and can hide things.
A '+ 0u' will convert int to unsigned int without a cast.
Casts really ought to be rare.
Even the casts to/from (void *) (for 'buffers') can usually be
made implicit in a function call argument.

	David
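
The '+ 0u' idiom relies on the usual arithmetic conversions: adding an unsigned int zero converts the int operand to unsigned int. A small sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <limits.h>

/* Evaluates to 1 when the expression has type unsigned int. */
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* Convert int to unsigned int without writing a cast. */
unsigned int to_unsigned(int v)
{
        return v + 0u;
}

int plus_0u_is_unsigned(void)
{
        int v = -1;

        return IS_UNSIGNED_INT(v + 0u);
}

int plus_0_is_unsigned(void)
{
        int v = -1;

        return IS_UNSIGNED_INT(v + 0);
}
```

Unlike a cast, the idiom cannot accidentally narrow a wider type: applied to a 'long' on an LP64 target, the result stays 'long'.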


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 21:07         ` David Laight
@ 2022-10-19 21:26           ` Segher Boessenkool
  0 siblings, 0 replies; 70+ messages in thread
From: Segher Boessenkool @ 2022-10-19 21:26 UTC (permalink / raw)
  To: David Laight
  Cc: 'Linus Torvalds',
	Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 09:07:01PM +0000, David Laight wrote:
> From: Linus Torvalds
> > Sent: 19 October 2022 19:11
> > Explicit casts are bad (unless, of course, you are explicitly trying
> > to violate the type system, when they are both required, and a great
> > way to say "look, I'm doing something dangerous").

> Casts really ought to be rare.

Sometimes you need casts for *data*, like where you write  (u32)smth
because you really want the low 32 bits of that something.  That only
happens in some kinds of code -- multi-precision integer, some crypto,
serialisation primitives.

You often want casts for varargs, too.  The alternative is to make very
certain some other way that the actual arguments will have the correct
type, but that is often awkward to do, and not as clear to read.

Pointer casts are almost always a mistake.  If you think you want one
you are almost always wrong.


Segher
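
The "cast for data" case can be sketched as follows. The example uses the standard <stdint.h> types in place of the kernel's u32/u64 typedefs, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Deliberately keep only the low 32 bits of a 64-bit value. */
uint32_t low32(uint64_t v)
{
        return (uint32_t)v;
}

/* The high half needs a shift first; the cast then drops nothing. */
uint32_t high32(uint64_t v)
{
        return (uint32_t)(v >> 32);
}

/* Reassemble the two halves, as a serialisation primitive might. */
uint64_t join32(uint32_t hi, uint32_t lo)
{
        return ((uint64_t)hi << 32) | lo;
}
```

Here the casts document intent: the truncation in low32() is the whole point of the function, not an accident to be silenced.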

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-19 20:30     ` [PATCH v2] kbuild: treat char as always unsigned Jason A. Donenfeld
@ 2022-10-19 23:56       ` Linus Torvalds
  2022-10-20  0:02         ` Jason A. Donenfeld
  2022-10-20 20:24         ` Segher Boessenkool
  2022-10-24  9:24       ` Dan Carpenter
  2022-12-21 14:53       ` Guenter Roeck
  2 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-19 23:56 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 1:30 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> So let's just eliminate this particular variety of heisensign bugs
> entirely. Set `-funsigned-char` globally, so that gcc makes the type
> unsigned on all architectures.
>
> This will break things in some places and fix things in others, so this
> will likely cause a bit of churn while reconciling the type misuse.

Yeah, if we were still in the merge window, I'd probably apply this,
but as things stand, I think it should go into linux-next and cook
there for the next merge window.

Anybody willing to put this in their -next trees?

Any breakage it causes is likely going to be fairly subtle, and in
some random driver that isn't used on architectures that already have
an unsigned 'char' type.

I think the architectures with an unsigned 'char' are arm, powerpc and
s390, in all their variations (ie both 32- and 64-bit).

So all *core* code should be fine with this, but that still leaves a
lot of drivers that have likely never been tested on anything but x86,
and could just stop working.

I don't think breakage is very *likely*, but I suspect it exists.

                       Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-19 23:56       ` Linus Torvalds
@ 2022-10-20  0:02         ` Jason A. Donenfeld
  2022-10-20  0:38           ` Linus Torvalds
  2022-10-20 20:24         ` Segher Boessenkool
  1 sibling, 1 reply; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20  0:02 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 04:56:03PM -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2022 at 1:30 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > So let's just eliminate this particular variety of heisensign bugs
> > entirely. Set `-funsigned-char` globally, so that gcc makes the type
> > unsigned on all architectures.
> >
> > This will break things in some places and fix things in others, so this
> > will likely cause a bit of churn while reconciling the type misuse.
> 
> Yeah, if we were still in the merge window, I'd probably apply this,
> but as things stand, I think it should go into linux-next and cook
> there for the next merge window.
> 
> Anybody willing to put this in their -next trees?

Sure, happy to take it.


> 
> Any breakage it causes is likely going to be fairly subtle, and in
> some random driver that isn't used on architectures that already have
> an unsigned 'char' type.
> 
> I think the architectures with an unsigned 'char' are arm, powerpc and
> s390, in all their variations (ie both 32- and 64-bit).
> 
> So all *core* code should be fine with this, but that still leaves a
> lot of drivers that have likely never been tested on anything but x86,
> and could just stop working.
> 
> I don't think breakage is very *likely*, but I suspect it exists.

Given I've started with cleaning up one driver already, I'll keep my eye
on further breakage.

Jason

> 
>                        Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 20:35                 ` Jason A. Donenfeld
@ 2022-10-20  0:10                   ` Linus Torvalds
  2022-10-20  3:11                     ` Jason A. Donenfeld
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2022-10-20  0:10 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Kees Cook, Nick Desaulniers, Segher Boessenkool, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 1:35 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> I wish folks would use `u8 *` when they mean "byte array".

Together with '-funsigned-char', we could typedef 'u8' to just 'char'
(just for __KERNEL__ code, though!), and then we really could just use
'strlen()' and friends on said kind of arrays without any warnings.

But we do have a *lot* of 'unsigned char' users, so it would be a huge
amount of churn to do this kind of thing.

And as mentioned, right now we definitely have a lot of other "ignore
sign" code.

Much of it is probably simply because we haven't been able to ever use
that warning flag, so it's just accumulated and might be trivial to
fix. But I wouldn't be surprised at all if some of it ends up somewhat
fundamental.

              Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-20  0:02         ` Jason A. Donenfeld
@ 2022-10-20  0:38           ` Linus Torvalds
  2022-10-20  2:59             ` Jason A. Donenfeld
  2022-10-20 18:41             ` Kees Cook
  0 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-20  0:38 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 5:02 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Given I've started with cleaning up one driver already, I'll keep my eye
> on further breakage.

I wonder if we could just check for code generation differences some way.

I tested a couple of files, and was able to find differences, eg

  # kernel/sched/core.c:8861: pr_info("task:%-15.15s state:%c",
p->comm, task_state_to_char(p));
 - movzbl state_char.149(%rax), %edx # state_char[_60], state_char[_60]
 + movsbl state_char.149(%rax), %edx # state_char[_60], state_char[_60]
   call _printk #

because the 'char' for the '%c' is passed as an integer. And the
tracing code has the

        .is_signed = is_signed_type(_type)

initializers, which obviously change when the type is 'char'.

But I also checked a number of other files that didn't have that
pattern at all, and there was zero code generation difference, even
when the "readable asm" output itself had some changes in some of the
internal label names.

That was what my old 'sparse' trial thing was actually *hoping* (but
failed) to do, ie notice when the signedness of a char actually
affects code generation. And it does in fact seem fairly rare.

Having some scripting automation that just notices "this changes code
generation in function X" might actually be interesting, and judging
by my quick tests might not be *too* verbose.

             Linus
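
The movzbl/movsbl difference comes from widening a byte to 'int', for instance when a 'char' is passed through varargs for '%c'. The sketch below models both cases with explicit types; the instruction names in the comments describe what x86-64 compilers typically emit and are not guaranteed.

```c
#include <assert.h>

/* Sign-extending widen: typically a movsbl on x86-64. */
int widen_signed(signed char c)
{
        return c;
}

/* Zero-extending widen: typically a movzbl on x86-64. */
int widen_unsigned(unsigned char c)
{
        return c;
}
```

For values in the range 0..127 the two functions return the same result, which is consistent with the observation that the signedness of 'char' rarely changes behaviour in practice.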

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-20  0:38           ` Linus Torvalds
@ 2022-10-20  2:59             ` Jason A. Donenfeld
  2022-10-20 18:41             ` Kees Cook
  1 sibling, 0 replies; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20  2:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 05:38:55PM -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2022 at 5:02 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > Given I've started with cleaning up one driver already, I'll keep my eye
> > on further breakage.
> 
> I wonder if we could just check for code generation differences some way.
> Having some scripting automation that just notices "this changes code
> generation in function X" might actually be interesting, and judging
> by my quick tests might not be *too* verbose.

Or even just some allyesconfig diffing. 

> I tested a couple of files, and was able to find differences, eg
> 
>   # kernel/sched/core.c:8861: pr_info("task:%-15.15s state:%c",
> p->comm, task_state_to_char(p));
>  - movzbl state_char.149(%rax), %edx # state_char[_60], state_char[_60]
>  + movsbl state_char.149(%rax), %edx # state_char[_60], state_char[_60]
>    call _printk #
> 
> because the 'char' for the '%c' is passed as an integer. And the

Seems harmless though.

> tracing code has the
> 
>         .is_signed = is_signed_type(_type)
> 
> initializers, which obviously change when the type is 'char'.

And likewise, looking at the types of initializers that's used with.
Actually, for the array one, unsigned is probably more sensible anyway.

The thing is, anyhow, that most code that works without -funsigned-char
*will* work with it, because the core of the kernel obviously works fine
on ARM already. The problematic areas will be x86-specific drivers that
have never been tested on other archs. i915 comes to mind -- as a
general rule, it already does all manner of insane things. But there's
obviously a lot of other hardware that's only ever run on Intel. So I'm
much more concerned about that than I am about code in, say, kernel/sched.

Jason
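
The is_signed_type() initializers change because the check is evaluated on the type itself. The macro below is a sketch modelled on that idea, not a copy of the kernel's definition; IS_SIGNED_TYPE and the helper function are names chosen for this example.

```c
#include <assert.h>
#include <limits.h>

/* A type is signed if converting -1 to it gives a value below 1. */
#define IS_SIGNED_TYPE(type) (((type)(-1)) < (type)1)

/* Result depends on the target ABI and on -fsigned-char/-funsigned-char. */
int plain_char_is_signed(void)
{
        return IS_SIGNED_TYPE(char);
}

int signed_char_is_signed(void)
{
        return IS_SIGNED_TYPE(signed char);
}

int unsigned_char_is_signed(void)
{
        return IS_SIGNED_TYPE(unsigned char);
}
```

Only plain_char_is_signed() can differ between builds, which matches the observation that such initializers flip when the flag is changed.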

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-20  0:10                   ` Linus Torvalds
@ 2022-10-20  3:11                     ` Jason A. Donenfeld
  0 siblings, 0 replies; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20  3:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Kees Cook, Nick Desaulniers, Segher Boessenkool, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman,
	Sultan Alsawaf

On Wed, Oct 19, 2022 at 6:11 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Oct 19, 2022 at 1:35 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > I wish folks would use `u8 *` when they mean "byte array".
>
> Together with '-funsigned-char', we could typedef 'u8' to just 'char'
> (just for __KERNEL__ code, though!), and then we really could just use
> 'strlen()' and friends on said kind of arrays without any warnings.
>
> But we do have a *lot* of 'unsigned char' users, so it would be a huge
> amount of churn to do this kind of thing.

I think, though, there's an argument to be made that every use of
`unsigned char` is much better off as a `u8`. We don't have any C23
fancy unicode strings. As far as I can tell, the only usage of
`unsigned char` ought to be "treat this as a byte array", and that's
what u8 is for. Yea, that'd be churn. But technically, it wouldn't
really be difficult churn: If naive-sed mangles that, I'm sure
Coccinelle would be up to the task. If you think that's a wise
direction, I can play with it and see how miserable it is to do.

(As a sidebar, Sultan and I were discussing today... I find the
radical extension of this idea to its logical end somewhat attractive:
exclusively using u64, s64, u32, s32, u16, s16, u8, s8, uword (native
size), sword (native size), char (string/character). It'd hardly look
like C any more, though, and the very mention of the idea is probably
triggering for some. So I'm not actually suggesting we do that in
earnest. But there is some appeal.)

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-19 18:11       ` Linus Torvalds
  2022-10-19 18:20         ` Nick Desaulniers
  2022-10-19 21:07         ` David Laight
@ 2022-10-20 10:41         ` Gabriel Paubert
  2022-10-21 22:46           ` Linus Torvalds
  2 siblings, 1 reply; 70+ messages in thread
From: Gabriel Paubert @ 2022-10-20 10:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 11:11:16AM -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2022 at 10:45 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> >
> > When I did this more than a decade ago there indeed was a LOT of noise,
> > mostly caused by dubious code.
> 
> It really happens with explicitly *not* dubious code.

Indeed.

[snip]
> The "-Wpointer-sign" thing could probably be fairly easily improved,
> by just recognizing that things like 'strlen()' and friends do not
> care about the sign of 'char', and neither does a 'strcmp()' that only
> checks for equality (but if you check the *sign* of strcmp, it does
> matter).

I must be missing something, the strcmp man page says:

"The comparison is done using unsigned characters."

But it's not for this that I wrote this message. Has anybody considered
using transparent unions?

They've been heavily used by userland networking code to pass pointer to
sockets, and they work reasonably well in that context IMHO.

So a very wild idea might be to make string handling functions accept
transparent union of "char *" and "unsigned char *".

I've not even tried to write any code in this direction, so it's very
likely that this idea won't fly, and it clearly does not solve all
problems. It also probably needs a lot of surgery to avoid clashing with
GCC builtins and unfortunately lose some optimizations.

	Gabriel

> 
> It's been some time since I last tried it, but at least from memory,
> it really was mostly the standard C string functions that caused
> almost all problems.  Your *own* functions you can just make sure the
> signedness is right, but it's really really annoying when you try to
> be careful about the byte signs, and the compiler starts complaining
> just because you want to use the bog-standard 'strlen()' function.
> 
> And no, something like 'ustrlen()' with a hidden cast is just noise
> for a warning that really shouldn't exist.
> 
> So some way to say 'this function really doesn't care about the sign
> of this pointer' (and having the compiler know that for the string
> functions it already knows about anyway) would probably make almost
> all problems with -Wpointer-sign go away.
> 
> Put another way: 'char *' is so fundamental and inherent in C, that
> you can't just warn when people use it in contexts where sign really
> doesn't matter.
> 
>                  Linus
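
The transparent union idea could look roughly like the sketch below. It relies on the GCC/Clang '__transparent_union__' attribute, and the type and function names (any_str, any_strlen) are hypothetical; nothing like this exists in the kernel as far as this thread shows.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * A parameter of this type accepts either pointer flavour without a
 * cast at the call site (GCC/Clang extension, not standard C).
 */
typedef union {
        const char *c;
        const unsigned char *u;
} __attribute__((__transparent_union__)) any_str;

/* The length does not depend on the sign of the bytes. */
size_t any_strlen(any_str s)
{
        return strlen(s.c);
}

/* Called with a plain string literal. */
size_t len_of_plain(void)
{
        return any_strlen("null");
}

/* Called with an unsigned char array, no cast needed. */
size_t len_of_unsigned(void)
{
        static const unsigned char name[] = "null";

        return any_strlen(name);
}
```

As noted in the message, a wrapper like this would sit in front of the compiler's builtin strlen(), so whether the usual builtin optimizations survive is an open question.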
 


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
       [not found] ` <202210201618.8XhEGsLd-lkp@intel.com>
@ 2022-10-20 16:33   ` Jason A. Donenfeld
  0 siblings, 0 replies; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-20 16:33 UTC (permalink / raw)
  To: kernel test robot
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	kbuild-all, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linux Memory Management List, Andy Shevchenko,
	Greg Kroah-Hartman

On Thu, Oct 20, 2022 at 2:40 AM kernel test robot <lkp@intel.com> wrote:
> >> drivers/s390/block/dasd.c:1912:9: warning: case label value exceeds maximum value for type [-Wswitch-outside-range]
>     1912 |         case DASD_CQR_ERROR:

Just to save other readers the momentary "huh?" that I experienced,
this warning/error is from the -fsigned-char patch. We ultimately went
with (or are trying to go with) the -funsigned-char approach instead.
So you can safely ignore this kernel test robot error, as it applies to v1
rather than the v2 here:
https://lore.kernel.org/lkml/20221019203034.3795710-1-Jason@zx2c4.com/

Jason


* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-20  0:38           ` Linus Torvalds
  2022-10-20  2:59             ` Jason A. Donenfeld
@ 2022-10-20 18:41             ` Kees Cook
  2022-10-21  1:01               ` Jason A. Donenfeld
  1 sibling, 1 reply; 70+ messages in thread
From: Kees Cook @ 2022-10-20 18:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 05:38:55PM -0700, Linus Torvalds wrote:
> Having some scripting automation that just notices "this changes code
> generation in function X" might actually be interesting, and judging
> by my quick tests might not be *too* verbose.

On the reproducible build comparison system[1] we use for checking a lot
of the KSPP work for .text deltas, an allmodconfig finds a fair bit for
this change. Out of 33900 .o files, 1005 have changes.

Spot checking matches a lot of what you found already...

        u64 flags = how->flags;
	...
fs/open.c:1123:
        int acc_mode = ACC_MODE(flags);
-    1c86:      movsbl 0x0(%rdx),%edx
+    1c86:      movzbl 0x0(%rdx),%edx

#define ACC_MODE(x) ("\004\002\006\006"[(x)&O_ACCMODE])

Ignoring those, it goes down to 625, and spot checking those is more
difficult, but looks to be mostly register selection changes dominating
the delta. The resulting vmlinux sizes are identical, though.
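
For readers following along, here is a minimal sketch of why this
particular site changes. Indexing a string literal yields a plain char,
so widening it to int is a sign-extending load (movsbl) when char is
signed and a zero-extending one (movzbl) when it is unsigned. The table
bytes are all below 0x80, so the value is the same either way and only
the instruction differs. DEMO_ACCMODE and the function names are made up
for illustration (DEMO_ACCMODE is assumed to mirror O_ACCMODE as a
two-bit mask); the explicit signed/unsigned helpers only make both
behaviours visible without depending on the compiler flag.

```c
#include <assert.h>

/* Hypothetical stand-in for O_ACCMODE: assumed to be a two-bit mask. */
#define DEMO_ACCMODE 3

/* Same shape as ACC_MODE(): index a string literal, widen the char to int. */
static int demo_acc_mode(unsigned long long flags)
{
	return "\004\002\006\006"[flags & DEMO_ACCMODE];
}

/* Widening a signed byte sign-extends (the movsbl case on x86). */
static int widen_signed(const signed char *table, int i)
{
	return table[i];
}

/* Widening an unsigned byte zero-extends (the movzbl case on x86). */
static int widen_unsigned(const unsigned char *table, int i)
{
	return table[i];
}

/* Self-check: same value for small bytes, different once bit 7 is set. */
static void demo_acc_mode_check(void)
{
	assert(demo_acc_mode(0) == 4);
	assert(demo_acc_mode(2) == 6);
	assert(widen_signed((const signed char *)"\004", 0) ==
	       widen_unsigned((const unsigned char *)"\004", 0));
	assert(widen_signed((const signed char *)"\200", 0) == -128);
	assert(widen_unsigned((const unsigned char *)"\200", 0) == 128);
}
```

Only tables containing bytes of 0x80 and above would actually change
behaviour, which is presumably why these hits can be filtered out as
harmless.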

-Kees

[1] A fancier version of:
    https://outflux.net/blog/archives/2022/06/24/finding-binary-differences/

-- 
Kees Cook


* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-19 23:56       ` Linus Torvalds
  2022-10-20  0:02         ` Jason A. Donenfeld
@ 2022-10-20 20:24         ` Segher Boessenkool
  1 sibling, 0 replies; 70+ messages in thread
From: Segher Boessenkool @ 2022-10-20 20:24 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 04:56:03PM -0700, Linus Torvalds wrote:
> I think the architectures with an unsigned 'char' are arm, powerpc and
> s390, in all their variations (ie both 32- and 64-bit).

xtensa and most MIPS configurations as well, fwiw.
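
A quick way to see which camp a given compiler and target fall into,
without relying on anybody's list, is to ask <limits.h>. This is only a
sketch and the function names are made up.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 if plain char is signed for this compiler/target, else 0. */
static int plain_char_is_signed(void)
{
	return CHAR_MIN < 0;
}

/* Cross-check against what an actual char holding -1 widens to. */
static void plain_char_check(void)
{
	char c = (char)-1;
	int widened = c;

	assert(plain_char_is_signed() == (widened < 0));
}
```

Building the same file once with -fsigned-char and once with
-funsigned-char flips the result, which is the knob the two versions of
the patch turn.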


Segher


* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-20 18:41             ` Kees Cook
@ 2022-10-21  1:01               ` Jason A. Donenfeld
  0 siblings, 0 replies; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-21  1:01 UTC (permalink / raw)
  To: Kees Cook
  Cc: Linus Torvalds, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman, ndesaulniers

On Thu, Oct 20, 2022 at 11:41:29AM -0700, Kees Cook wrote:
> On Wed, Oct 19, 2022 at 05:38:55PM -0700, Linus Torvalds wrote:
> > Having some scripting automation that just notices "this changes code
> > generation in function X" might actually be interesting, and judging
> > by my quick tests might not be *too* verbose.
> 
> On the reproducible build comparison system[1] we use for checking a lot
> of the KSPP work for .text deltas, an allmodconfig finds a fair bit for
> this change. Out of 33900 .o files, 1005 have changes.
> 
> Spot checking matches a lot of what you found already...
> 
>         u64 flags = how->flags;
> 	...
> fs/open.c:1123:
>         int acc_mode = ACC_MODE(flags);
> -    1c86:      movsbl 0x0(%rdx),%edx
> +    1c86:      movzbl 0x0(%rdx),%edx
> 
> #define ACC_MODE(x) ("\004\002\006\006"[(x)&O_ACCMODE])
> 
> Ignoring those, it goes down to 625, and spot checking those is more
> difficult, but looks to be mostly register selection changes dominating
> the delta. The resulting vmlinux sizes are identical, though.
> 
> -Kees
> 
> [1] A fancier version of:
>     https://outflux.net/blog/archives/2022/06/24/finding-binary-differences/

Say, don't we have some way of outputting LLVM IR? I saw some
-fno-discard-value-names floating through a few days ago. Apparently you
can do `make LLVM=1 fs/select.ll`? This might have less noise in it.
I'll play on the airplane tomorrow.

Jason


* Re: [PATCH] kbuild: treat char as always signed
  2022-10-20 10:41         ` Gabriel Paubert
@ 2022-10-21 22:46           ` Linus Torvalds
  2022-10-22  6:06             ` Gabriel Paubert
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2022-10-21 22:46 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Thu, Oct 20, 2022 at 3:41 AM Gabriel Paubert <paubert@iram.es> wrote:
>
> I must miss something, the strcmp man page says:
>
> "The comparison is done using unsigned characters."

You're not missing anything, I just hadn't looked at strcmp() in forever.

Yeah, strcmp clearly doesn't care about the signedness of 'char', and
arguably an unsigned char argument makes more sense considering the
semantics of the function.

> But it's not for this that I wrote this message. Has anybody considered
> using transparent unions?

I don't love the transparent union-as-argument syntax, but you're
right, that would fix the warning.

Except it then doesn't actually *work* very well.

Try this:

        #include <sys/types.h>

        #if USE_UNION
        typedef union {
                const char *a;
                const signed char *b;
                const unsigned char *c;
        } conststring_arg __attribute__ ((__transparent_union__));
        size_t strlen(conststring_arg);
        #else
        size_t strlen(const char *);
        #endif

        int test(char *a, unsigned char *b)
        {
                return strlen(a)+strlen(b);
        }

        int test2(void)
        {
                return strlen("hello");
        }

and now compile it both ways with

        gcc -DUSE_UNION -Wall -O2 -S t.c
        gcc -Wall -O2 -S t.c

and notice how yes, the "-DUSE_UNION" one silences the warning about
using 'unsigned char *' for strlen. So it seems to work fine.

But then look at the code it generates for 'test2()' in the two cases.

The transparent union version actually generates a function call to an
external 'strlen()' function.

The regular version uses the compiler builtin, and just compiles
test2() to return the constant value 5.

So playing games with transparent union arguments ends up also disabling
all the compiler optimizations we do want, because apparently gcc then
decides "ok, I'm not going to warn about you declaring this
differently, but I'm also not going to use the regular one because you
declared it differently".

This, btw, is also the reason why we don't use -ffreestanding in the
kernel. We do want the basic <string.h> things to just DTRT.

For the sockaddr_in games, the above isn't an issue. For strlen() and
friends, it very much is.

                       Linus


* Re: [PATCH] kbuild: treat char as always signed
  2022-10-21 22:46           ` Linus Torvalds
@ 2022-10-22  6:06             ` Gabriel Paubert
  2022-10-22 18:16               ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Gabriel Paubert @ 2022-10-22  6:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Fri, Oct 21, 2022 at 03:46:01PM -0700, Linus Torvalds wrote:
> On Thu, Oct 20, 2022 at 3:41 AM Gabriel Paubert <paubert@iram.es> wrote:
> >
> > I must miss something, the strcmp man page says:
> >
> > "The comparison is done using unsigned characters."
> 
> You're not missing anything, I just hadn't looked at strcmp() in forever.
> 
> Yeah, strcmp clearly doesn't care about the signedness of 'char', and
> arguably an unsigned char argument makes more sense considering the
> semantics of the function.
> 
> > But it's not for this that I wrote this message. Has anybody considered
> > using transparent unions?
> 
> I don't love the transparent union-as-argument syntax, but you're
> right, that would fix the warning.

I'm not in love with the syntax either.

> 
> Except it then doesn't actually *work* very well.
> 
> Try this:
> 
>         #include <sys/types.h>
> 
>         #if USE_UNION
>         typedef union {
>                 const char *a;
>                 const signed char *b;
>                 const unsigned char *c;
>         } conststring_arg __attribute__ ((__transparent_union__));
>         size_t strlen(conststring_arg);
>         #else
>         size_t strlen(const char *);
>         #endif
> 
>         int test(char *a, unsigned char *b)
>         {
>                 return strlen(a)+strlen(b);
>         }
> 
>         int test2(void)
>         {
>                 return strlen("hello");
>         }
> 
> and now compile it both ways with
> 
>         gcc -DUSE_UNION -Wall -O2 -S t.c
>         gcc -Wall -O2 -S t.c
> 

OK, I've just tried it, except that I had something slightly different in
mind, and perhaps I should have been clearer in my first post.

I have changed your code to the following:


#include <sys/types.h>

#if USE_UNION
typedef union {
	const char *a;
	const signed char *b;
	const unsigned char *c;
} conststring_arg __attribute__ ((__transparent_union__));
static inline size_t strlen(conststring_arg p)
{
	return __builtin_strlen(p.a);
}
#else
size_t strlen(const char *);
#endif

int test(char *a, unsigned char *b)
{
	return strlen(a)+strlen(b);
}

int test2(void)
{
	return strlen("hello");
}

> and notice how yes, the "-DUSE_UNION" one silences the warning about
> using 'unsigned char *' for strlen. So it seems to work fine.
> 
> But then look at the code it generates for 'test2()' in the two cases.

Now test2 looks properly optimized.

This is exploiting a bit of a compiler loophole: it calls an external
function which has been defined with the same name!

Depending on how you look at it, it's either disgusting or clever.

I don't have clang installed, so I don't know whether it would swallow
this code or react with a strong allergy.

	Gabriel
> 
> The transparent union version actually generates a function call to an
> external 'strlen()' function.
> 
> The regular version uses the compiler builtin, and just compiles
> test2() to return the constant value 5.
> 
> So playing games with transparent union arguments ends up also disabling
> all the compiler optimizations we do want, because apparently gcc then
> decides "ok, I'm not going to warn about you declaring this
> differently, but I'm also not going to use the regular one because you
> declared it differently".
> 
> This, btw, is also the reason why we don't use -ffreestanding in the
> kernel. We do want the basic <string.h> things to just DTRT.
> 
> For the sockaddr_in games, the above isn't an issue. For strlen() and
> friends, it very much is.
> 
>                        Linus


 



* Re: [PATCH] kbuild: treat char as always signed
  2022-10-22  6:06             ` Gabriel Paubert
@ 2022-10-22 18:16               ` Linus Torvalds
  2022-10-23 20:23                 ` Gabriel Paubert
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2022-10-22 18:16 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Fri, Oct 21, 2022 at 11:06 PM Gabriel Paubert <paubert@iram.es> wrote:
>
> OK, I've just tried it, except that I had something slightly different in
> mind, and perhaps I should have been clearer in my first post.
>
> I have changed your code to the following:

I actually tested that, but using a slightly different version, and my
non-union test case ended up like

        size_t strlen(const char *p)
        {
                return __builtin_strlen(p);
        }

and then gcc actually complains about

    warning: infinite recursion detected

and I (incorrectly) thought this was unworkable. But your version
seems to work fine.

So yeah, for the kernel I think we could do something like this. It's
ugly, but it gets rid of the crazy warning.

Practically speaking this might be a bit painful, because we've got
several different variations of this all due to all the things like
our debugging versions (see <linux/fortify-string.h> for example), so
some of our code is this crazy jungle of "with this config, use this
wrapper".

But if somebody wants to deal with the '-Wpointer-sign' warnings,
there does seem to be a way out. Maybe with another set of helper
macros, creating those odd __transparent_union__ wrappers might even
end up reasonable.

It's not like we don't have crazy macros for function wrappers
elsewhere (the SYSCALL macros come to mind - shudder). The macros
themselves may be a nasty horror, but when done right the _use_ point
of said macros can be nice and clean.
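
To give a rough idea of what such helper macros might look like, here is
a sketch under stated assumptions: the macro names and the tu_ prefix
are invented for illustration, and only one-string and two-string
builtins are covered. In the kernel the wrappers would presumably take
the plain names, as in Gabriel's version; the prefix here only keeps the
sketch from clashing with libc declarations in a hosted build.

```c
#include <assert.h>
#include <stddef.h>

/* Same union as in the test case earlier in the thread. */
typedef union {
	const char *c;
	const signed char *sc;
	const unsigned char *uc;
} conststring_arg __attribute__((__transparent_union__));

/* Hypothetical helper: wrap a builtin taking one string. */
#define DEFINE_STR_WRAPPER1(ret, name) \
	static inline ret tu_##name(conststring_arg s) \
	{ \
		return __builtin_##name(s.c); \
	}

/* Hypothetical helper: wrap a builtin taking two strings. */
#define DEFINE_STR_WRAPPER2(ret, name) \
	static inline ret tu_##name(conststring_arg a, conststring_arg b) \
	{ \
		return __builtin_##name(a.c, b.c); \
	}

DEFINE_STR_WRAPPER1(size_t, strlen)
DEFINE_STR_WRAPPER2(int, strcmp)

/* Use sites stay clean: no casts, whatever the pointer's signedness. */
static void tu_wrapper_check(void)
{
	char plain[] = "hello";
	unsigned char bytes[] = "hello";

	assert(tu_strlen(plain) == 5);
	assert(tu_strlen(bytes) == 5);
	assert(tu_strcmp(plain, bytes) == 0);
}
```

The fortified and debugging variants mentioned above are not handled
here at all; they are where the real work would be.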

                  Linus


* Re: [PATCH] kbuild: treat char as always signed
  2022-10-22 18:16               ` Linus Torvalds
@ 2022-10-23 20:23                 ` Gabriel Paubert
  2022-10-25 23:00                   ` Kees Cook
  0 siblings, 1 reply; 70+ messages in thread
From: Gabriel Paubert @ 2022-10-23 20:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Segher Boessenkool, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman

On Sat, Oct 22, 2022 at 11:16:33AM -0700, Linus Torvalds wrote:
> On Fri, Oct 21, 2022 at 11:06 PM Gabriel Paubert <paubert@iram.es> wrote:
> >
> > OK, I've just tried it, except that I had something slightly different in
> > mind, and perhaps I should have been clearer in my first post.
> >
> > I have changed your code to the following:
> 
> I actually tested that, but using a slightly different version, and my
> non-union test case ended up like
> 
>         size_t strlen(const char *p)
>         {
>                 return __builtin_strlen(p);
>         }
> 
> and then gcc actually complains about
> 
>     warning: infinite recursion detected
> 
> and I (incorrectly) thought this was unworkable. But your version
> seems to work fine.

Incidentally, it also gives exactly the same code with -ffreestanding.

> 
> So yeah, for the kernel I think we could do something like this. It's
> ugly, but it gets rid of the crazy warning.

Not as ugly as casts IMO, and it's localized in a few header files.

However, it does not solve the problem of assigning a constant string to
a u8 *; I have no idea how to fix that.
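
To make the remaining gap concrete: a wrapper only helps at call sites,
while an initialization or assignment from a string literal is checked
against the pointer type directly, so -Wpointer-sign still fires there.
A sketch with u8 typedef'd locally and made-up names; the two variants
shown are just the usual workarounds, not a proposed fix.

```c
#include <assert.h>

typedef unsigned char u8;	/* local stand-in for the kernel's u8 */

/*
 * const u8 *tag = "abc"; would warn under -Wpointer-sign, because the
 * literal is an array of plain char, which is a distinct type from
 * unsigned char whatever its signedness.
 */

/* Workaround 1: an explicit cast at the point of use. */
static const u8 *demo_tag_cast(void)
{
	return (const u8 *)"abc";
}

/*
 * Workaround 2: initialize an array rather than a pointer. C allows an
 * array of any character type to be initialized from a string literal.
 */
static const u8 demo_tag_array[] = "abc";

static void demo_tag_check(void)
{
	assert(demo_tag_cast()[0] == 'a');
	assert(demo_tag_array[2] == 'c');
	assert(sizeof(demo_tag_array) == 4);	/* includes the NUL */
}
```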

> 
> Practically speaking this might be a bit painful, because we've got
> several different variations of this all due to all the things like
> our debugging versions (see <linux/fortify-string.h> for example), so
> some of our code is this crazy jungle of "with this config, use this
> wrapper".

I've just had a look at that code, and I don't want to touch it with a
10 foot pole. If someone else wants to get their hands dirty...

	Gabriel

> 
> But if somebody wants to deal with the '-Wpointer-sign' warnings,
> there does seem to be a way out. Maybe with another set of helper
> macros, creating those odd __transparent_union__ wrappers might even
> end up reasonable.
> 
> It's not like we don't have crazy macros for function wrappers
> elsewhere (the SYSCALL macros come to mind - shudder). The macros
> themselves may be a nasty horror, but when done right the _use_ point
> of said macros can be nice and clean.
> 
>                   Linus
 



* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-19 20:30     ` [PATCH v2] kbuild: treat char as always unsigned Jason A. Donenfeld
  2022-10-19 23:56       ` Linus Torvalds
@ 2022-10-24  9:24       ` Dan Carpenter
  2022-10-24  9:30         ` Dan Carpenter
  2022-10-24 15:17         ` Jason A. Donenfeld
  2022-12-21 14:53       ` Guenter Roeck
  2 siblings, 2 replies; 70+ messages in thread
From: Dan Carpenter @ 2022-10-24  9:24 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman

On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> Recently, some compile-time checking I added to the clamp_t family of
> functions triggered a build error when a poorly written driver was
> compiled on ARM, because the driver assumed that the naked `char` type
> is signed, but ARM treats it as unsigned, and the C standard says it's
> architecture-dependent.
> 
> I doubt this particular driver is the only instance in which
> unsuspecting authors make assumptions about `char` with no `signed` or
> `unsigned` specifier. We were lucky enough this time that that driver
> used `clamp_t(char, negative_value, positive_value)`, so the new
> checking code found it, and I've sent a patch to fix it, but there are
> likely other places lurking that won't be so easily unearthed.
> 
> So let's just eliminate this particular variety of heisensign bugs
> entirely. Set `-funsigned-char` globally, so that gcc makes the type
> unsigned on all architectures.
> 
> This will break things in some places and fix things in others, so this
> will likely cause a bit of churn while reconciling the type misuse.
> 

This is a very daring change and obviously is going to introduce bugs.
It might be better to create a static checker rule that says "char"
without explicit signedness can only be used for strings.

arch/parisc/kernel/drivers.c:337 print_hwpath() warn: impossible condition '(path->bc[i] == -1) => (0-255 == (-1))'
arch/parisc/kernel/drivers.c:410 setup_bus_id() warn: impossible condition '(path.bc[i] == -1) => (0-255 == (-1))'
arch/parisc/kernel/drivers.c:486 create_parisc_device() warn: impossible condition '(modpath->bc[i] == -1) => (0-255 == (-1))'
arch/parisc/kernel/drivers.c:759 hwpath_to_device() warn: impossible condition '(modpath->bc[i] == -1) => (0-255 == (-1))'
drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: assigning (-9) to unsigned variable 'tm'
drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: we never enter this loop
drivers/misc/sgi-gru/grumain.c:711 gru_check_chiplet_assignment() warn: 'gts->ts_user_chiplet_id' is unsigned
drivers/net/wireless/cisco/airo.c:5316 proc_wepkey_on_close() warn: assigning (-16) to unsigned variable 'key[i / 3]'
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9415 rt2800_iq_search() warn: assigning (-32) to unsigned variable 'idx0'
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9470 rt2800_iq_search() warn: assigning (-32) to unsigned variable 'perr'
drivers/video/fbdev/sis/init301.c:3549 SiS_GetCRT2Data301() warn: 'SiS_Pr->SiS_EModeIDTable[ModeIdIndex]->ROMMODEIDX661' is unsigned
sound/pci/au88x0/au88x0_core.c:2029 vortex_adb_checkinout() warn: signedness bug returning '(-22)'
sound/pci/au88x0/au88x0_core.c:2046 vortex_adb_checkinout() warn: signedness bug returning '(-12)'
sound/pci/au88x0/au88x0_core.c:2125 vortex_adb_allocroute() warn: 'vortex_adb_checkinout(vortex, (0), en, 0)' is unsigned
sound/pci/au88x0/au88x0_core.c:2170 vortex_adb_allocroute() warn: 'vortex_adb_checkinout(vortex, stream->resources, en, 4)' is unsigned
sound/pci/rme9652/hdsp.c:3953 hdsp_channel_buffer_location() warn: 'hdsp->channel_map[channel]' is unsigned
sound/pci/rme9652/rme9652.c:1833 rme9652_channel_buffer_location() warn: 'rme9652->channel_map[channel]' is unsigned
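
For anyone not used to reading these: most of the warnings above boil
down to two patterns, a plain char compared against a negative sentinel
and a plain char used as a loop counter that starts below zero. Here is
a sketch with made-up names (this is not the actual driver code), with
the signedness spelled out explicitly so both behaviours can be seen
side by side regardless of compiler flags.

```c
#include <assert.h>

/*
 * What a plain-char field does once char is unsigned: -1 is stored as
 * 255, so after promotion to int it can never equal the sentinel -1.
 */
static int matches_unsigned(unsigned char field, int sentinel)
{
	return field == sentinel;
}

/* With an explicitly signed field the sentinel survives the round trip. */
static int matches_signed(signed char field, int sentinel)
{
	return field == sentinel;
}

/* Counts iterations of a loop meant to run from -9 up to 6. */
static int count_signed(void)
{
	int n = 0;
	signed char tm;

	for (tm = -9; tm < 7; tm++)
		n++;
	return n;
}

/* Same loop with an unsigned counter: it starts at 247 and never runs. */
static int count_unsigned(void)
{
	int n = 0;
	unsigned char tm;

	for (tm = (unsigned char)-9; tm < 7; tm++)
		n++;
	return n;
}

static void signedness_check(void)
{
	assert(matches_signed(-1, -1) == 1);
	assert(matches_unsigned((unsigned char)-1, -1) == 0);
	assert(count_signed() == 16);
	assert(count_unsigned() == 0);
}
```

In both patterns, declaring the variable as signed char (or s8) states
the intent and behaves the same on every architecture.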

I did not know that ARM had unsigned chars.  I only knew about PPC and
on that arch they use char aggressively so that no one forgets that char
is unsigned.  Changing char to signed would have made people very
annoyed.  :P

regards,
dan carpenter


* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-24  9:24       ` Dan Carpenter
@ 2022-10-24  9:30         ` Dan Carpenter
  2022-10-24 16:33           ` Jason A. Donenfeld
  2022-10-24 15:17         ` Jason A. Donenfeld
  1 sibling, 1 reply; 70+ messages in thread
From: Dan Carpenter @ 2022-10-24  9:30 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman

On Mon, Oct 24, 2022 at 12:24:24PM +0300, Dan Carpenter wrote:
> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> > Recently, some compile-time checking I added to the clamp_t family of
> > functions triggered a build error when a poorly written driver was
> > compiled on ARM, because the driver assumed that the naked `char` type
> > is signed, but ARM treats it as unsigned, and the C standard says it's
> > architecture-dependent.
> > 
> > I doubt this particular driver is the only instance in which
> > unsuspecting authors make assumptions about `char` with no `signed` or
> > `unsigned` specifier. We were lucky enough this time that that driver
> > used `clamp_t(char, negative_value, positive_value)`, so the new
> > checking code found it, and I've sent a patch to fix it, but there are
> > likely other places lurking that won't be so easily unearthed.
> > 
> > So let's just eliminate this particular variety of heisensign bugs
> > entirely. Set `-funsigned-char` globally, so that gcc makes the type
> > unsigned on all architectures.
> > 
> > This will break things in some places and fix things in others, so this
> > will likely cause a bit of churn while reconciling the type misuse.
> > 
> 
> This is a very daring change and obviously is going to introduce bugs.
> It might be better to create a static checker rule that says "char"
> without explicit signedness can only be used for strings.
> 
> arch/parisc/kernel/drivers.c:337 print_hwpath() warn: impossible condition '(path->bc[i] == -1) => (0-255 == (-1))'
> arch/parisc/kernel/drivers.c:410 setup_bus_id() warn: impossible condition '(path.bc[i] == -1) => (0-255 == (-1))'
> arch/parisc/kernel/drivers.c:486 create_parisc_device() warn: impossible condition '(modpath->bc[i] == -1) => (0-255 == (-1))'
> arch/parisc/kernel/drivers.c:759 hwpath_to_device() warn: impossible condition '(modpath->bc[i] == -1) => (0-255 == (-1))'
> drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: assigning (-9) to unsigned variable 'tm'
> drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: we never enter this loop
> drivers/misc/sgi-gru/grumain.c:711 gru_check_chiplet_assignment() warn: 'gts->ts_user_chiplet_id' is unsigned
> drivers/net/wireless/cisco/airo.c:5316 proc_wepkey_on_close() warn: assigning (-16) to unsigned variable 'key[i / 3]'
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9415 rt2800_iq_search() warn: assigning (-32) to unsigned variable 'idx0'
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9470 rt2800_iq_search() warn: assigning (-32) to unsigned variable 'perr'
> drivers/video/fbdev/sis/init301.c:3549 SiS_GetCRT2Data301() warn: 'SiS_Pr->SiS_EModeIDTable[ModeIdIndex]->ROMMODEIDX661' is unsigned
> sound/pci/au88x0/au88x0_core.c:2029 vortex_adb_checkinout() warn: signedness bug returning '(-22)'
> sound/pci/au88x0/au88x0_core.c:2046 vortex_adb_checkinout() warn: signedness bug returning '(-12)'
> sound/pci/au88x0/au88x0_core.c:2125 vortex_adb_allocroute() warn: 'vortex_adb_checkinout(vortex, (0), en, 0)' is unsigned
> sound/pci/au88x0/au88x0_core.c:2170 vortex_adb_allocroute() warn: 'vortex_adb_checkinout(vortex, stream->resources, en, 4)' is unsigned
> sound/pci/rme9652/hdsp.c:3953 hdsp_channel_buffer_location() warn: 'hdsp->channel_map[channel]' is unsigned
> sound/pci/rme9652/rme9652.c:1833 rme9652_channel_buffer_location() warn: 'rme9652->channel_map[channel]' is unsigned

Here are some more:

drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9472 rt2800_iq_search() warn: impossible condition '(gerr < -7) => (0-255 < (-7))'
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9476 rt2800_iq_search() warn: impossible condition '(perr < -31) => (0-255 < (-31))'
drivers/staging/rtl8192e/rtllib_softmac_wx.c:459 rtllib_wx_set_essid() warn: impossible condition '(extra[i] < 0) => (0-255 < 0)'
sound/pci/rme9652/hdsp.c:4153 snd_hdsp_channel_info() warn: impossible condition '(hdsp->channel_map[channel] < 0) => (0-255 < 0)'

This might be interesting for backports if everyone starts to rely on
the fact that char is unsigned as the PPC people currently do.

regards,
dan carpenter



* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-24  9:24       ` Dan Carpenter
  2022-10-24  9:30         ` Dan Carpenter
@ 2022-10-24 15:17         ` Jason A. Donenfeld
  1 sibling, 0 replies; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-24 15:17 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman

On Mon, Oct 24, 2022 at 12:24:24PM +0300, Dan Carpenter wrote:
> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> > Recently, some compile-time checking I added to the clamp_t family of
> > functions triggered a build error when a poorly written driver was
> > compiled on ARM, because the driver assumed that the naked `char` type
> > is signed, but ARM treats it as unsigned, and the C standard says it's
> > architecture-dependent.
> > 
> > I doubt this particular driver is the only instance in which
> > unsuspecting authors make assumptions about `char` with no `signed` or
> > `unsigned` specifier. We were lucky enough this time that that driver
> > used `clamp_t(char, negative_value, positive_value)`, so the new
> > checking code found it, and I've sent a patch to fix it, but there are
> > likely other places lurking that won't be so easily unearthed.
> > 
> > So let's just eliminate this particular variety of heisensign bugs
> > entirely. Set `-funsigned-char` globally, so that gcc makes the type
> > unsigned on all architectures.
> > 
> > This will break things in some places and fix things in others, so this
> > will likely cause a bit of churn while reconciling the type misuse.
> > 
> 
> This is a very daring change and obviously is going to introduce bugs.
> It might be better to create a static checker rule that says "char"
> without explicit signedness can only be used for strings.

Indeed this would be great.

> 
> arch/parisc/kernel/drivers.c:337 print_hwpath() warn: impossible condition '(path->bc[i] == -1) => (0-255 == (-1))'
> arch/parisc/kernel/drivers.c:410 setup_bus_id() warn: impossible condition '(path.bc[i] == -1) => (0-255 == (-1))'
> arch/parisc/kernel/drivers.c:486 create_parisc_device() warn: impossible condition '(modpath->bc[i] == -1) => (0-255 == (-1))'
> arch/parisc/kernel/drivers.c:759 hwpath_to_device() warn: impossible condition '(modpath->bc[i] == -1) => (0-255 == (-1))'
> drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: assigning (-9) to unsigned variable 'tm'
> drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: we never enter this loop
> drivers/misc/sgi-gru/grumain.c:711 gru_check_chiplet_assignment() warn: 'gts->ts_user_chiplet_id' is unsigned
> drivers/net/wireless/cisco/airo.c:5316 proc_wepkey_on_close() warn: assigning (-16) to unsigned variable 'key[i / 3]'
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9415 rt2800_iq_search() warn: assigning (-32) to unsigned variable 'idx0'
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9470 rt2800_iq_search() warn: assigning (-32) to unsigned variable 'perr'
> drivers/video/fbdev/sis/init301.c:3549 SiS_GetCRT2Data301() warn: 'SiS_Pr->SiS_EModeIDTable[ModeIdIndex]->ROMMODEIDX661' is unsigned
> sound/pci/au88x0/au88x0_core.c:2029 vortex_adb_checkinout() warn: signedness bug returning '(-22)'
> sound/pci/au88x0/au88x0_core.c:2046 vortex_adb_checkinout() warn: signedness bug returning '(-12)'
> sound/pci/au88x0/au88x0_core.c:2125 vortex_adb_allocroute() warn: 'vortex_adb_checkinout(vortex, (0), en, 0)' is unsigned
> sound/pci/au88x0/au88x0_core.c:2170 vortex_adb_allocroute() warn: 'vortex_adb_checkinout(vortex, stream->resources, en, 4)' is unsigned
> sound/pci/rme9652/hdsp.c:3953 hdsp_channel_buffer_location() warn: 'hdsp->channel_map[channel]' is unsigned
> sound/pci/rme9652/rme9652.c:1833 rme9652_channel_buffer_location() warn: 'rme9652->channel_map[channel]' is unsigned


Thanks. I'll fix these up.

Jason


* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-24  9:30         ` Dan Carpenter
@ 2022-10-24 16:33           ` Jason A. Donenfeld
  2022-10-24 17:10             ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-24 16:33 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman

On Mon, Oct 24, 2022 at 12:30:11PM +0300, Dan Carpenter wrote:
> On Mon, Oct 24, 2022 at 12:24:24PM +0300, Dan Carpenter wrote:
> > On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> > > Recently, some compile-time checking I added to the clamp_t family of
> > > functions triggered a build error when a poorly written driver was
> > > compiled on ARM, because the driver assumed that the naked `char` type
> > > is signed, but ARM treats it as unsigned, and the C standard says it's
> > > architecture-dependent.
> > > 
> > > I doubt this particular driver is the only instance in which
> > > unsuspecting authors make assumptions about `char` with no `signed` or
> > > `unsigned` specifier. We were lucky enough this time that that driver
> > > used `clamp_t(char, negative_value, positive_value)`, so the new
> > > checking code found it, and I've sent a patch to fix it, but there are
> > > likely other places lurking that won't be so easily unearthed.
> > > 
> > > So let's just eliminate this particular variety of heisensign bugs
> > > entirely. Set `-funsigned-char` globally, so that gcc makes the type
> > > unsigned on all architectures.
> > > 
> > > This will break things in some places and fix things in others, so this
> > > will likely cause a bit of churn while reconciling the type misuse.
> > > 
> > 
> > This is a very daring change and obviously is going to introduce bugs.
> > It might be better to create a static checker rule that says "char"
> > without explicit signedness can only be used for strings.
> > 
> > arch/parisc/kernel/drivers.c:337 print_hwpath() warn: impossible condition '(path->bc[i] == -1) => (0-255 == (-1))'
> > arch/parisc/kernel/drivers.c:410 setup_bus_id() warn: impossible condition '(path.bc[i] == -1) => (0-255 == (-1))'
> > arch/parisc/kernel/drivers.c:486 create_parisc_device() warn: impossible condition '(modpath->bc[i] == -1) => (0-255 == (-1))'
> > arch/parisc/kernel/drivers.c:759 hwpath_to_device() warn: impossible condition '(modpath->bc[i] == -1) => (0-255 == (-1))'
> > drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: assigning (-9) to unsigned variable 'tm'
> > drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: we never enter this loop
> > drivers/misc/sgi-gru/grumain.c:711 gru_check_chiplet_assignment() warn: 'gts->ts_user_chiplet_id' is unsigned
> > drivers/net/wireless/cisco/airo.c:5316 proc_wepkey_on_close() warn: assigning (-16) to unsigned variable 'key[i / 3]'
> > drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9415 rt2800_iq_search() warn: assigning (-32) to unsigned variable 'idx0'
> > drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9470 rt2800_iq_search() warn: assigning (-32) to unsigned variable 'perr'
> > drivers/video/fbdev/sis/init301.c:3549 SiS_GetCRT2Data301() warn: 'SiS_Pr->SiS_EModeIDTable[ModeIdIndex]->ROMMODEIDX661' is unsigned
> > sound/pci/au88x0/au88x0_core.c:2029 vortex_adb_checkinout() warn: signedness bug returning '(-22)'
> > sound/pci/au88x0/au88x0_core.c:2046 vortex_adb_checkinout() warn: signedness bug returning '(-12)'
> > sound/pci/au88x0/au88x0_core.c:2125 vortex_adb_allocroute() warn: 'vortex_adb_checkinout(vortex, (0), en, 0)' is unsigned
> > sound/pci/au88x0/au88x0_core.c:2170 vortex_adb_allocroute() warn: 'vortex_adb_checkinout(vortex, stream->resources, en, 4)' is unsigned
> > sound/pci/rme9652/hdsp.c:3953 hdsp_channel_buffer_location() warn: 'hdsp->channel_map[channel]' is unsigned
> > sound/pci/rme9652/rme9652.c:1833 rme9652_channel_buffer_location() warn: 'rme9652->channel_map[channel]' is unsigned
> 
> Here are some more:
> 
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9472 rt2800_iq_search() warn: impossible condition '(gerr < -7) => (0-255 < (-7))'
> drivers/net/wireless/ralink/rt2x00/rt2800lib.c:9476 rt2800_iq_search() warn: impossible condition '(perr < -31) => (0-255 < (-31))'
> drivers/staging/rtl8192e/rtllib_softmac_wx.c:459 rtllib_wx_set_essid() warn: impossible condition '(extra[i] < 0) => (0-255 < 0)'
> sound/pci/rme9652/hdsp.c:4153 snd_hdsp_channel_info() warn: impossible condition '(hdsp->channel_map[channel] < 0) => (0-255 < 0)'
> 
> This might be interesting for backports if everyone starts to rely on
> the fact that char is unsigned as the PPC people currently do.

Give these a minute to hit Lore, but patches just submitted to various
maintainers as fixes (for 6.1), since these are already broken on some
architecture.

https://lore.kernel.org/all/20221024163005.536097-1-Jason@zx2c4.com
https://lore.kernel.org/all/20221024162947.536060-1-Jason@zx2c4.com
https://lore.kernel.org/all/20221024162929.536004-1-Jason@zx2c4.com
https://lore.kernel.org/all/20221024162901.535972-1-Jason@zx2c4.com
https://lore.kernel.org/all/20221024162843.535921-1-Jason@zx2c4.com
https://lore.kernel.org/all/20221024162823.535884-1-Jason@zx2c4.com
https://lore.kernel.org/all/20221024162756.535776-1-Jason@zx2c4.com
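
For readers following along, the "impossible condition" warnings quoted
above come down to integer promotion. The sketch below is not taken from
any of the linked patches; the helper names are hypothetical and use
explicit signedness so the behaviour does not depend on compiler flags.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a field that ends up unsigned. */
static bool matches_minus_one_unsigned(unsigned char c)
{
	/* c is promoted to int in the range 0..255, so this is never true. */
	return c == -1;
}

/* Hypothetical stand-in for the same field when it is signed. */
static bool matches_minus_one_signed(signed char c)
{
	/* c is promoted to int in the range -128..127, so -1 compares equal. */
	return c == -1;
}
```

Storing -1 into the unsigned variant yields 255, which no longer compares
equal to -1, which is the pattern the checker flags in lines such as
'(path->bc[i] == -1)'.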

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-24 16:33           ` Jason A. Donenfeld
@ 2022-10-24 17:10             ` Linus Torvalds
  2022-10-24 17:17               ` Jason A. Donenfeld
  2022-10-25 10:16               ` David Laight
  0 siblings, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-24 17:10 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Dan Carpenter, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Mon, Oct 24, 2022 at 9:34 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Give these a minute to hit Lore, but patches just submitted to various
> maintainers as fixes (for 6.1), since these are already broken on some
> architecture.

Hold up a minute.

Some of those may need more thought. For example, that first one:

> https://lore.kernel.org/all/20221024163005.536097-1-Jason@zx2c4.com

looks just *strange*. As far as I can tell, no other wireless drivers
do any sign checks at all.

Now, I didn't really look around a lot, but looking at a few other
SIOCSIWESSID users, most don't even seem to treat it as a string at
all, but as just a byte dump (so memcpy() instead of strncpy())

As far as I know, there are no actual rules for SSID character sets,
and while using utf-8 or something else might cause interoperability
problems, this driver seems to be just confused. If you want to check
for "printable characters", that check is still wrong.

So I don't think this is an "assume char is signed" issue. I think this
is a "driver is confused" issue.

IOW, I don't think these are 6.1 material as some kind of obvious
fixes, at least not without driver author acks.

                Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-24 17:10             ` Linus Torvalds
@ 2022-10-24 17:17               ` Jason A. Donenfeld
  2022-10-25 19:22                 ` Kalle Valo
  2022-10-25 10:16               ` David Laight
  1 sibling, 1 reply; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-24 17:17 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dan Carpenter, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman, Kalle Valo, linux-wireless,
	Larry Finger, mikem, wlanfae

Hi Linus,

On Mon, Oct 24, 2022 at 7:11 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> IOW, I don't think these are 6.1 material as some kind of obvious
> fixes, at least not without driver author acks.

Right, these are posted to the authors and maintainers to look at.
Maybe they punt them until 6.2 which would be fine too.

> On Mon, Oct 24, 2022 at 9:34 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Some of those may need more thought. For example, that first one:
>
> > https://lore.kernel.org/all/20221024163005.536097-1-Jason@zx2c4.com
>
> looks just *strange*. As far as I can tell, no other wireless drivers
> do any sign checks at all.
>
> Now, I didn't really look around a lot, but looking at a few other
> SIOCSIWESSID users, most don't even seem to treat it as a string at
> all, but as just a byte dump (so memcpy() instead of strncpy())
>
> As far as I know, there are no actual rules for SSID character sets,
> and while using utf-8 or something else might cause interoperability
> problems, this driver seems to be just confused. If you want to check
> for "printable characters", that check is still wrong.
>
> So I don't think this is an "assume char is signed" issue. I think this
> is a "driver is confused" issue.

Yea I had a few versions of this. In one of them, I changed `char
*extra` throughout the wireless stack into `s8 *extra` and in another
`u8 *extra`, after realizing they're mostly just bags of bits. But
that seemed pretty invasive when, indeed, this staging driver is just
a little screwy.

So perhaps the right fix is to just kill that whole snippet? Kalle - opinions?

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* RE: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-24 17:10             ` Linus Torvalds
  2022-10-24 17:17               ` Jason A. Donenfeld
@ 2022-10-25 10:16               ` David Laight
  1 sibling, 0 replies; 70+ messages in thread
From: David Laight @ 2022-10-25 10:16 UTC (permalink / raw)
  To: 'Linus Torvalds', Jason A. Donenfeld
  Cc: Dan Carpenter, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

From: Linus Torvalds
> Sent: 24 October 2022 18:11
...
> 
> As far as I know, there are no actual rules for SSID character sets,
> and while using utf-8 or something else might cause interoperability
> problems, this driver seems to be just confused. If you want to check
> for "printable characters", that check is still wrong.

Are SSID even required to be printable at all?
While most systems only let you configure 'strings' I don't
remember that actually being a requirement.
(I'm sure I read up on this years ago.)

The frame format will be using an explicit length.
So even embedded zeros may be valid!

	David


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-24 17:17               ` Jason A. Donenfeld
@ 2022-10-25 19:22                 ` Kalle Valo
  0 siblings, 0 replies; 70+ messages in thread
From: Kalle Valo @ 2022-10-25 19:22 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Linus Torvalds, Dan Carpenter, linux-kernel, linux-kbuild,
	linux-arch, linux-toolchains, Masahiro Yamada, Kees Cook,
	Andrew Morton, Andy Shevchenko, Greg Kroah-Hartman,
	linux-wireless, Larry Finger, mikem, wlanfae

"Jason A. Donenfeld" <Jason@zx2c4.com> writes:

> Hi Linus,
>
> On Mon, Oct 24, 2022 at 7:11 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>> IOW, I don't think these are 6.1 material as some kind of obvious
>> fixes, at least not without driver author acks.
>
> Right, these are posted to the authors and maintainers to look at.
> Maybe they punt them until 6.2 which would be fine too.
>
>> On Mon, Oct 24, 2022 at 9:34 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>> Some of those may need more thought. For example, that first one:
>>
>> > https://lore.kernel.org/all/20221024163005.536097-1-Jason@zx2c4.com
>>
>> looks just *strange*. As far as I can tell, no other wireless drivers
>> do any sign checks at all.
>>
>> Now, I didn't really look around a lot, but looking at a few other
>> SIOCSIWESSID users, most don't even seem to treat it as a string at
>> all, but as just a byte dump (so memcpy() instead of strncpy())

Yes, SSID should be handled as a byte array with a specified length.
Back in the day some badly written code treated it as string but luckily
it's rare now.

>> As far as I know, there are no actual rules for SSID character sets,
>> and while using utf-8 or something else might cause interoperability
>> problems, this driver seems to be just confused. If you want to check
>> for "printable characters", that check is still wrong.
>>
>> So I don't think this is an "assume char is signed" issue. I think this
>> is a "driver is confused" issue.
>
> Yea I had a few versions of this. In one of them, I changed `char
> *extra` throughout the wireless stack into `s8 *extra` and in another
> `u8 *extra`, after realizing they're mostly just bags of bits. But
> that seemed pretty invasive when, indeed, this staging driver is just
> a little screwy.
>
> So perhaps the right fix is to just kill that whole snippet? Kalle - opinions?

I would also remove the whole 'extra[i] < 0', seems like a pointless
check to me. And I see that you already submitted v2, good.
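
A minimal sketch of the "byte array with a specified length" handling
described above, assuming a hypothetical container type (struct ssid_buf
and ssid_set() are illustration-only names, not an existing kernel API).
The 32-byte limit is the maximum SSID length in 802.11.

```c
#include <stddef.h>
#include <string.h>

#define SSID_MAX_LEN 32	/* maximum SSID length in 802.11 */

/* Hypothetical container; the names are for illustration only. */
struct ssid_buf {
	unsigned char data[SSID_MAX_LEN];
	size_t len;
};

/* Copy an SSID as raw bytes; returns 0 on success, -1 if it is too long. */
static int ssid_set(struct ssid_buf *dst, const void *src, size_t len)
{
	if (len > SSID_MAX_LEN)
		return -1;
	memcpy(dst->data, src, len);	/* no NUL or sign assumptions */
	dst->len = len;
	return 0;
}
```

Because the length travels with the data, embedded zeros and bytes above
0x7f are preserved, and no sign check on individual bytes is needed.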

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-23 20:23                 ` Gabriel Paubert
@ 2022-10-25 23:00                   ` Kees Cook
  2022-10-26  0:04                     ` Jason A. Donenfeld
  0 siblings, 1 reply; 70+ messages in thread
From: Kees Cook @ 2022-10-25 23:00 UTC (permalink / raw)
  To: Gabriel Paubert
  Cc: Linus Torvalds, Segher Boessenkool, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Sun, Oct 23, 2022 at 10:23:56PM +0200, Gabriel Paubert wrote:
> On Sat, Oct 22, 2022 at 11:16:33AM -0700, Linus Torvalds wrote:
> > Practically speaking this might be a bit painful, because we've got
> > several different variations of this all due to all the things like
> > our debugging versions (see <linux/fortify-string.h> for example), so
> > some of our code is this crazy jungle of "with this config, use this
> > wrapper".
> 
> I've just had a look at that code, and I don't want to touch it with a
> 10 foot pole. If someone else wants to get his hands dirty... 

Heh. Yes, fortify-string.h is a twisty maze. I've tried to keep it as
regular as possible, but I admit it is weird. On my list is to split
compile-time from run-time logic (as suggested by Linus a while back),
but I've worried it would end up spilling some of the ugly back into
string.h, which should probably not happen. As such, I've tried to keep
it all contained in fortify-string.h.

Regardless, I think I'd rather avoid yet more special cases in the
fortify code, so I'd like to avoid using transparent union if we can. It
seems like -funsigned-char and associated fixes will be sufficient,
though, yes?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-25 23:00                   ` Kees Cook
@ 2022-10-26  0:04                     ` Jason A. Donenfeld
  2022-10-26 15:41                       ` Kees Cook
  0 siblings, 1 reply; 70+ messages in thread
From: Jason A. Donenfeld @ 2022-10-26  0:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: Gabriel Paubert, Linus Torvalds, Segher Boessenkool,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Tue, Oct 25, 2022 at 04:00:30PM -0700, Kees Cook wrote:
> On Sun, Oct 23, 2022 at 10:23:56PM +0200, Gabriel Paubert wrote:
> > On Sat, Oct 22, 2022 at 11:16:33AM -0700, Linus Torvalds wrote:
> > > Practically speaking this might be a bit painful, because we've got
> > > several different variations of this all due to all the things like
> > > our debugging versions (see <linux/fortify-string.h> for example), so
> > > some of our code is this crazy jungle of "with this config, use this
> > > wrapper".
> > 
> > I've just had a look at that code, and I don't want to touch it with a
> > 10 foot pole. If someone else wants to get his hands dirty... 
> 
> Heh. Yes, fortify-string.h is a twisty maze. I've tried to keep it as
> regular as possible, but I admit it is weird. On my list is to split
> compile-time from run-time logic (as suggested by Linus a while back),
> but I've worried it would end up spilling some of the ugly back into
> string.h, which should probably not happen. As such, I've tried to keep
> it all contained in fortify-string.h.
> 
> Regardless, I think I'd rather avoid yet more special cases in the
> fortify code, so I'd like to avoid using transparent union if we can. It
> seems like -funsigned-char and associated fixes will be sufficient,
> though, yes?

I thought some of the motivation behind the transparent union was that
gcc still treats `char` as a distinct type from `unsigned char`, so
gcc's checker can still get upset and warn when passing a u8[] to a
string handling function that expects a char[]. (Once the
-funsigned-char changes go in, though, we should probably decide that
s8[] is never a valid string.)
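
To illustrate the "distinct type" point, here is a small sketch using
C11 _Generic. CHAR_KIND is a made-up macro for this example only. In C,
`char`, `signed char` and `unsigned char` are three separate types
whatever signedness plain `char` happens to have, so all three may
appear as associations in one selection.

```c
/* Hypothetical helper: name the character type of an expression. */
#define CHAR_KIND(x) _Generic((x),		\
	char: "char",				\
	signed char: "signed char",		\
	unsigned char: "unsigned char",		\
	default: "other")
```

With `char c; unsigned char u;` in scope, CHAR_KIND(c) selects the
"char" branch and CHAR_KIND(u) selects "unsigned char", even when plain
char is unsigned.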

Jason

^ permalink raw reply	[flat|nested] 70+ messages in thread

* make ctype ascii only? (was [PATCH] kbuild: treat char as always signed)
  2022-10-19 19:54 ` Linus Torvalds
  2022-10-19 20:23   ` Jason A. Donenfeld
  2022-10-19 20:58   ` [PATCH] kbuild: treat char as always signed David Laight
@ 2022-10-26  0:10   ` Rasmus Villemoes
  2022-10-26 18:10     ` Linus Torvalds
  2 siblings, 1 reply; 70+ messages in thread
From: Rasmus Villemoes @ 2022-10-26  0:10 UTC (permalink / raw)
  To: Linus Torvalds, Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On 19/10/2022 21.54, Linus Torvalds wrote:
> On Wed, Oct 19, 2022 at 9:27 AM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> So let's just eliminate this particular variety of heisensigned bugs
>> entirely. Set `-fsigned-char` globally, so that gcc makes the type
>> signed on all architectures.
> 
> Btw, I do wonder if we might actually be better off doing this - but
> doing it the other way around.

Only very tangentially related (because it has to do with chars...): Can
we switch our ctype to be ASCII only, just as it was back in the good'ol
mid 90s [i.e. before
https://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux-fullhistory.git/commit/lib/ctype.c?id=036b97b05489161be06e63be77c5fad9247d23ff].

It bugs me that it's almost-but-not-quite-latin1, that toupper() isn't
idempotent, and that one can hit an isalpha() with toupper() and get
something that isn't isalpha().

Rasmus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH] kbuild: treat char as always signed
  2022-10-26  0:04                     ` Jason A. Donenfeld
@ 2022-10-26 15:41                       ` Kees Cook
  0 siblings, 0 replies; 70+ messages in thread
From: Kees Cook @ 2022-10-26 15:41 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Gabriel Paubert, Linus Torvalds, Segher Boessenkool,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman

On Wed, Oct 26, 2022 at 02:04:30AM +0200, Jason A. Donenfeld wrote:
> ... (Once the
> -funsigned-char changes go in, though, we should probably decide that
> s8[] is never a valid string.)

Yeah, that's my goal too.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: make ctype ascii only? (was [PATCH] kbuild: treat char as always signed)
  2022-10-26  0:10   ` make ctype ascii only? (was [PATCH] kbuild: treat char as always signed) Rasmus Villemoes
@ 2022-10-26 18:10     ` Linus Torvalds
  2022-10-27  7:59       ` Rasmus Villemoes
  0 siblings, 1 reply; 70+ messages in thread
From: Linus Torvalds @ 2022-10-26 18:10 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Tue, Oct 25, 2022 at 5:10 PM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> Only very tangentially related (because it has to do with chars...): Can
> we switch our ctype to be ASCII only, just as it was back in the good'ol
> mid 90s

Those US-ASCII days weren't really very "good" old days, but I forget
why we did this (it's attributed to me, but that's from the
pre-BK/pre-git days before we actually tracked things all that well,
so..)

Anyway, I think anybody using ctype.h on 8-bit chars gets what they
deserve, and I think Latin1 (or something close to it) is better than
US-ASCII, in that it's at least the same as Unicode in the low 8
bits.

So no, I'm disinclined to go back in time to what I think is an even
worse situation. Latin1 isn't great, but it sure beats US-ASCII. And
if you really want just US-ASCII, then don't use the high bit, and make
your disgusting 7-bit code be *explicitly* 7-bit.

Now, if there are errors in that table wrt Latin1 / "first 256
codepoints of Unicode" too, then we can fix those.

Not that anybody has apparently cared since 2.0.1 was released back in
July of 1996 (btw, it's sad how none of the old linux git archive
creations seem to have tried to import the dates, so you have to look
those up separately)

And if nobody has cared since 1996, I don't really think it matters.

But fundamentally, I think anybody calling US-ASCII "good" is either
very very very confused, or is comparing it to EBCDIC.

                 Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: make ctype ascii only? (was [PATCH] kbuild: treat char as always signed)
  2022-10-26 18:10     ` Linus Torvalds
@ 2022-10-27  7:59       ` Rasmus Villemoes
  2022-10-27 18:28         ` Linus Torvalds
  0 siblings, 1 reply; 70+ messages in thread
From: Rasmus Villemoes @ 2022-10-27  7:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On 26/10/2022 20.10, Linus Torvalds wrote:
> On Tue, Oct 25, 2022 at 5:10 PM Rasmus Villemoes
> <linux@rasmusvillemoes.dk> wrote:
>>
>> Only very tangentially related (because it has to do with chars...): Can
>> we switch our ctype to be ASCII only, just as it was back in the good'ol
>> mid 90s
> 
> Those US-ASCII days weren't really very "good" old days, but I forget
> why we did this (it's attributed to me, but that's from the
> pre-BK/pre-git days before we actually tracked things all that well,
> so..)
> 
> Anyway, I think anybody using ctype.h on 8-bit chars gets what they
> deserve, and I think Latin1 (or something close to it) is better than
> US-ASCII, in that it's at least the same as Unicode in the low 8
> bits.

My concern is that it's currently somewhat ill specified what our ctype
actually represents, and that would be a lot easier to specify if we
just said ASCII, everything above 0x7f is neither punct or ctrl or alpha
or anything else.

For example, people may do stuff like isprint(c) ? c : '.' in a printk()
call, but most likely the consumer (somebody doing dmesg) would, at
least these days, use utf-8, so that just results in a broken utf-8
sequence. Now I see that a lot of callers actually do "isascii(c) &&
isprint(c)", so they already know about this, but there are also many
instances where isprint() is used by itself.

There's also stuff like fs/afs/cell.c and other places that use
isprint/isalnum/... to make decisions on what is allowed on the wire
and/or in a disk format, where it's then hard to reason about just
exactly what is accepted. And places that use toupper() on their strings
to normalize them; that's broken when toupper() isn't idempotent.

> So no, I'm disinclined to go back in time to what I think is an even
> worse situation. Latin1 isn't great, but it sure beats US-ASCII. And
> if you really want just US-ASCII, then don't use the high bit, and make
> your disgusting 7-bit code be *explicitly* 7-bit.
> 
> Now, if there are errors in that table wrt Latin1 / "first 256
> codepoints of Unicode" too, then we can fix those.

AFAICT, the differences are:

- 0xaa (FEMININE ORDINAL INDICATOR), 0xb5 (MICRO SIGN), 0xba (MASCULINE
ORDINAL INDICATOR) should be lower (hence alpha and alnum), not punct.

- depending a little on just exactly what one wants latin1 to mean, but
if it does mean "first 256 codepoints of Unicode", 0x80-0x9f should be cntrl

- for some reason at least glibc seems to classify 0xa0 as punctuation
and not space (hence also as isgraph)

- 0xdf and 0xff are correctly classified as lower, but since they don't
have upper-case versions (at least not any that are representable in
latin1), correct toupper() behaviour is to return them unchanged, but we
just subtract 0x20, so 0xff becomes 0xdf which isn't isupper() and 0xdf
becomes something that isn't even isalpha().

Fixing the first would create more instances of the last, and I think
the only sane way to fix that would be a 256 byte lookup table to use by
toupper().
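
A rough sketch of the toupper() problem, using a simplified model
rather than the real table: sketch_islower() and sketch_toupper() are
hypothetical names, and the classification of the high range
(0xdf..0xff lower, except 0xf7) is an assumption based on the
description above.

```c
#include <stdbool.h>

/*
 * Simplified model of a Latin-1-style classification: ASCII a-z plus
 * 0xdf..0xff count as lower case, except 0xf7 (division sign).
 */
static bool sketch_islower(unsigned char c)
{
	if (c >= 'a' && c <= 'z')
		return true;
	return c >= 0xdf && c != 0xf7;
}

/* Upper-casing by blindly subtracting 0x20 from anything lower case. */
static unsigned char sketch_toupper(unsigned char c)
{
	if (sketch_islower(c))
		c -= 'a' - 'A';
	return c;
}
```

In this model sketch_toupper(0xff) gives 0xdf, which is still lower
case, and applying it again gives 0xbf, which is not a letter at all, so
the function is not idempotent.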

> Not that anybody has apparently cared since 2.0.1 was released back in
> July of 1996 (btw, it's sad how none of the old linux git archive
> creations seem to have tried to import the dates, so you have to look
> those up separately)

Huh? That commit has 1996 as the author date, while its commit date is
indeed 2007. The very first line says:

author	linus1 <torvalds@linuxfoundation.org>	1996-07-02 11:00:00 -0600

> And if nobody has cared since 1996, I don't really think it matters.

Indeed, I don't think it's a huge problem in practice. But it still
bothers me that such a simple (and usually overlooked) corner of the
kernel's C library is ill-defined and arguably a little buggy.

Rasmus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: make ctype ascii only? (was [PATCH] kbuild: treat char as always signed)
  2022-10-27  7:59       ` Rasmus Villemoes
@ 2022-10-27 18:28         ` Linus Torvalds
  0 siblings, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-10-27 18:28 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman

On Thu, Oct 27, 2022 at 12:59 AM Rasmus Villemoes
<linux@rasmusvillemoes.dk> wrote:
>
> AFAICT, the differences are:
>
> - 0xaa (FEMININE ORDINAL INDICATOR), 0xb5 (MICRO SIGN), 0xba (MASCULINE
> ORDINAL INDICATOR) should be lower (hence alpha and alnum), not punct.
>
> - depending a little on just exactly what one wants latin1 to mean, but
> if it does mean "first 256 codepoints of Unicode", 0x80-0x9f should be cntrl
>
> - for some reason at least glibc seems to classify 0xa0 as punctuation
> and not space (hence also as isgraph)
>
> - 0xdf and 0xff are correctly classified as lower, but since they don't
> have upper-case versions (at least not any that are representable in
> latin1), correct toupper() behaviour is to return them unchanged, but we
> just subtract 0x20, so 0xff becomes 0xdf which isn't isupper() and 0xdf
> becomes something that isn't even isalpha().

Heh.

Honestly, I don't think we should care at all.

For the byte range 128-255, anybody who uses ctype on them gets what
they get. In the kernel, the most likely use of it is for 'isprint()',
and if those care, they can (and some do) use 'isascii()' in addition.

I don't know if you realize, but the kernel already says "screw libc",
and makes all the isxyz() things just cast the argument to 'unsigned
char', and doesn't care about EOF.

And for the rest, let's just call it the "kernel locale", and just
admit that the kernel locale is entirely historical.

Boom - problem solved, and it's entirely standards conformant (apart
possibly from the EOF case, I think that is marked as a "lower case
character" right now ;)

Looking through

    https://pubs.opengroup.org/onlinepubs/9699919799/

I'm not actually seeing anything that says that we don't do *exactly*
what the standard requires.

You thinking that the kernel locale is US-ASCII is just wrong.

              Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-10-19 20:30     ` [PATCH v2] kbuild: treat char as always unsigned Jason A. Donenfeld
  2022-10-19 23:56       ` Linus Torvalds
  2022-10-24  9:24       ` Dan Carpenter
@ 2022-12-21 14:53       ` Guenter Roeck
  2022-12-21 15:05         ` Geert Uytterhoeven
  2 siblings, 1 reply; 70+ messages in thread
From: Guenter Roeck @ 2022-12-21 14:53 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman, Geert Uytterhoeven,
	linux-m68k

On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> Recently, some compile-time checking I added to the clamp_t family of
> functions triggered a build error when a poorly written driver was
> compiled on ARM, because the driver assumed that the naked `char` type
> is signed, but ARM treats it as unsigned, and the C standard says it's
> architecture-dependent.
> 
> I doubt this particular driver is the only instance in which
> unsuspecting authors make assumptions about `char` with no `signed` or
> `unsigned` specifier. We were lucky enough this time that that driver
> used `clamp_t(char, negative_value, positive_value)`, so the new
> checking code found it, and I've sent a patch to fix it, but there are
> likely other places lurking that won't be so easily unearthed.
> 
> So let's just eliminate this particular variety of heisensign bugs
> entirely. Set `-funsigned-char` globally, so that gcc makes the type
> unsigned on all architectures.
> 
> This will break things in some places and fix things in others, so this
> will likely cause a bit of churn while reconciling the type misuse.
> 

There is an interesting fallout: When running the m68k:q800 qemu emulation,
there are lots of warning backtraces.

WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
------------[ cut here ]------------
WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'

and so on for pretty much every entry in the alg_test_descs[] array.

Bisect points to this patch, and reverting it fixes the problem.

It looks like the problem is that arch/m68k/include/asm/string.h
uses "char res" to store the result of strcmp(), and char is now
unsigned - meaning strcmp() will now never return a value < 0.
Effectively that means that strcmp() is broken on m68k if
CONFIG_COLDFIRE=n.

The fix is probably quite simple.

diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
index f759d944c449..b8f4ae19e8f6 100644
--- a/arch/m68k/include/asm/string.h
+++ b/arch/m68k/include/asm/string.h
@@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
 #define __HAVE_ARCH_STRCMP
 static inline int strcmp(const char *cs, const char *ct)
 {
-       char res;
+       signed char res;

        asm ("\n"
                "1:     move.b  (%0)+,%2\n"     /* get *cs */

Does that make sense ? If so I can send a patch.
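
To make the failure mode concrete, here is a small sketch of just the
final step of a byte-wise compare. It is not the m68k code; both helpers
are hypothetical, and the narrowing to signed char relies on gcc's
documented wrap-around behaviour for out-of-range conversions.

```c
/* Hypothetical helpers modelling the last step of a byte-wise strcmp(). */
static int diff_via_unsigned_char(unsigned char a, unsigned char b)
{
	unsigned char res = a - b;	/* difference wraps into 0..255 */

	return res;			/* never negative */
}

static int diff_via_signed_char(unsigned char a, unsigned char b)
{
	signed char res = a - b;	/* wraps into -128..127 on gcc */

	return res;			/* sign is preserved */
}
```

Comparing the '1' in "xchacha12" with the '2' in "xchacha20" gives 255
through the unsigned variant and -1 through the signed one, which would
explain why every sorted pair is reported as being in the wrong order.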

Guenter

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 14:53       ` Guenter Roeck
@ 2022-12-21 15:05         ` Geert Uytterhoeven
  2022-12-21 15:23           ` Guenter Roeck
  2022-12-21 15:29           ` Rasmus Villemoes
  0 siblings, 2 replies; 70+ messages in thread
From: Geert Uytterhoeven @ 2022-12-21 15:05 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linus Torvalds, Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

Hi Günter,

On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> > Recently, some compile-time checking I added to the clamp_t family of
> > functions triggered a build error when a poorly written driver was
> > compiled on ARM, because the driver assumed that the naked `char` type
> > is signed, but ARM treats it as unsigned, and the C standard says it's
> > architecture-dependent.
> >
> > I doubt this particular driver is the only instance in which
> > unsuspecting authors make assumptions about `char` with no `signed` or
> > `unsigned` specifier. We were lucky enough this time that that driver
> > used `clamp_t(char, negative_value, positive_value)`, so the new
> > checking code found it, and I've sent a patch to fix it, but there are
> > likely other places lurking that won't be so easily unearthed.
> >
> > So let's just eliminate this particular variety of heisensign bugs
> > entirely. Set `-funsigned-char` globally, so that gcc makes the type
> > unsigned on all architectures.
> >
> > This will break things in some places and fix things in others, so this
> > will likely cause a bit of churn while reconciling the type misuse.
> >
>
> There is an interesting fallout: When running the m68k:q800 qemu emulation,
> there are lots of warning backtraces.
>
> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
>
> and so on for pretty much every entry in the alg_test_descs[] array.
>
> Bisect points to this patch, and reverting it fixes the problem.
>
> It looks like the problem is that arch/m68k/include/asm/string.h
> uses "char res" to store the result of strcmp(), and char is now
> unsigned - meaning strcmp() will now never return a value < 0.
> Effectively that means that strcmp() is broken on m68k if
> CONFIG_COLDFIRE=n.
>
> The fix is probably quite simple.
>
> diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
> index f759d944c449..b8f4ae19e8f6 100644
> --- a/arch/m68k/include/asm/string.h
> +++ b/arch/m68k/include/asm/string.h
> @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
>  #define __HAVE_ARCH_STRCMP
>  static inline int strcmp(const char *cs, const char *ct)
>  {
> -       char res;
> +       signed char res;
>
>         asm ("\n"
>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
>
> Does that make sense ? If so I can send a patch.

Thanks, been there, done that
https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org

Note that we detected other issues with the m68k strcmp(), so
probably that patch wouldn't go in as-is.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:05         ` Geert Uytterhoeven
@ 2022-12-21 15:23           ` Guenter Roeck
  2022-12-21 15:29           ` Rasmus Villemoes
  1 sibling, 0 replies; 70+ messages in thread
From: Guenter Roeck @ 2022-12-21 15:23 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linus Torvalds, Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 04:05:45PM +0100, Geert Uytterhoeven wrote:
> Hi Günter,
> 
> On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> > > Recently, some compile-time checking I added to the clamp_t family of
> > > functions triggered a build error when a poorly written driver was
> > > compiled on ARM, because the driver assumed that the naked `char` type
> > > is signed, but ARM treats it as unsigned, and the C standard says it's
> > > architecture-dependent.
> > >
> > > I doubt this particular driver is the only instance in which
> > > unsuspecting authors make assumptions about `char` with no `signed` or
> > > `unsigned` specifier. We were lucky enough this time that that driver
> > > used `clamp_t(char, negative_value, positive_value)`, so the new
> > > checking code found it, and I've sent a patch to fix it, but there are
> > > likely other places lurking that won't be so easily unearthed.
> > >
> > > So let's just eliminate this particular variety of heisensign bugs
> > > entirely. Set `-funsigned-char` globally, so that gcc makes the type
> > > unsigned on all architectures.
> > >
> > > This will break things in some places and fix things in others, so this
> > > will likely cause a bit of churn while reconciling the type misuse.
> > >
> >
> > There is an interesting fallout: When running the m68k:q800 qemu emulation,
> > there are lots of warning backtraces.
> >
> > WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> > testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> > testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
> >
> > and so on for pretty much every entry in the alg_test_descs[] array.
> >
> > Bisect points to this patch, and reverting it fixes the problem.
> >
> > It looks like the problem is that arch/m68k/include/asm/string.h
> > uses "char res" to store the result of strcmp(), and char is now
> > unsigned - meaning strcmp() will now never return a value < 0.
> > Effectively that means that strcmp() is broken on m68k if
> > CONFIG_COLDFIRE=n.
> >
> > The fix is probably quite simple.
> >
> > diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
> > index f759d944c449..b8f4ae19e8f6 100644
> > --- a/arch/m68k/include/asm/string.h
> > +++ b/arch/m68k/include/asm/string.h
> > @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
> >  #define __HAVE_ARCH_STRCMP
> >  static inline int strcmp(const char *cs, const char *ct)
> >  {
> > -       char res;
> > +       signed char res;
> >
> >         asm ("\n"
> >                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
> >
> > Does that make sense ? If so I can send a patch.
> 
> Thanks, been there, done that
> https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org
> 
> Note that we detected other issues with the m68k strcmp(), so
> probably that patch wouldn't go in as-is.
> 

So anything non-Coldfire is and will remain broken on m68k for the time
being? Wouldn't it be better to fix the acute problem now and address
the long-standing problem(s) separately?

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:05         ` Geert Uytterhoeven
  2022-12-21 15:23           ` Guenter Roeck
@ 2022-12-21 15:29           ` Rasmus Villemoes
  2022-12-21 15:56             ` Guenter Roeck
  2022-12-21 16:57             ` Geert Uytterhoeven
  1 sibling, 2 replies; 70+ messages in thread
From: Rasmus Villemoes @ 2022-12-21 15:29 UTC (permalink / raw)
  To: Geert Uytterhoeven, Guenter Roeck
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linus Torvalds, Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

On 21/12/2022 16.05, Geert Uytterhoeven wrote:
> Hi Günter,
> 
> On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
>> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
>>> Recently, some compile-time checking I added to the clamp_t family of
>>> functions triggered a build error when a poorly written driver was
>>> compiled on ARM, because the driver assumed that the naked `char` type
>>> is signed, but ARM treats it as unsigned, and the C standard says it's
>>> architecture-dependent.
>>>
>>> I doubt this particular driver is the only instance in which
>>> unsuspecting authors make assumptions about `char` with no `signed` or
>>> `unsigned` specifier. We were lucky enough this time that that driver
>>> used `clamp_t(char, negative_value, positive_value)`, so the new
>>> checking code found it, and I've sent a patch to fix it, but there are
>>> likely other places lurking that won't be so easily unearthed.
>>>
>>> So let's just eliminate this particular variety of heisensign bugs
>>> entirely. Set `-funsigned-char` globally, so that gcc makes the type
>>> unsigned on all architectures.
>>>
>>> This will break things in some places and fix things in others, so this
>>> will likely cause a bit of churn while reconciling the type misuse.
>>>
>>
>> There is an interesting fallout: When running the m68k:q800 qemu emulation,
>> there are lots of warning backtraces.
>>
>> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
>> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
>> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
>>
>> and so on for pretty much every entry in the alg_test_descs[] array.
>>
>> Bisect points to this patch, and reverting it fixes the problem.
>>
>> It looks like the problem is that arch/m68k/include/asm/string.h
>> uses "char res" to store the result of strcmp(), and char is now
>> unsigned - meaning strcmp() will now never return a value < 0.
>> Effectively that means that strcmp() is broken on m68k if
>> CONFIG_COLDFIRE=n.
>>
>> The fix is probably quite simple.
>>
>> diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
>> index f759d944c449..b8f4ae19e8f6 100644
>> --- a/arch/m68k/include/asm/string.h
>> +++ b/arch/m68k/include/asm/string.h
>> @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
>>  #define __HAVE_ARCH_STRCMP
>>  static inline int strcmp(const char *cs, const char *ct)
>>  {
>> -       char res;
>> +       signed char res;
>>
>>         asm ("\n"
>>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
>>
>> Does that make sense ? If so I can send a patch.
> 
> Thanks, been there, done that
> https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org

Well, it looks like that would still leave strcmp() buggy: you can't
represent all possible differences between two char values (signed or
not) in an 8-bit quantity. So any implementation based on returning the
first non-zero value of *a - *b must store that intermediate value in
something wider. Otherwise you'll get -128 from strcmp("\x40", "\xc0"),
but _also_ -128 when you do strcmp("\xc0", "\x40"), which is obviously
bogus.

I recently fixed that long-standing bug in U-Boot's strcmp() and a
similar one in nolibc in the linux tree. I wonder how many more
instances exist.
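
To illustrate the truncation described above, here is a minimal sketch in
plain C. It is not the m68k assembly and not the U-Boot or nolibc code; the
names narrow_strcmp() and wide_strcmp() are hypothetical and exist only for
this example. The narrowing cast assumes the usual two's-complement wrap
that gcc and clang provide.

```c
#include <stdint.h>

/*
 * Hypothetical model of a strcmp() that keeps the byte difference in an
 * 8-bit signed value. The conversion to int8_t wraps modulo 256 on
 * gcc/clang (implementation-defined before C23).
 */
int narrow_strcmp(const char *cs, const char *ct)
{
	for (;;) {
		unsigned char a = (unsigned char)*cs++;
		unsigned char b = (unsigned char)*ct++;
		int8_t res = (int8_t)(uint8_t)(a - b);	/* only 8 bits survive */

		if (res != 0 || a == 0)
			return res;
	}
}

/* Same loop, but the difference is kept in an int, so the sign survives. */
int wide_strcmp(const char *cs, const char *ct)
{
	for (;;) {
		unsigned char a = (unsigned char)*cs++;
		unsigned char b = (unsigned char)*ct++;
		int res = (int)a - (int)b;

		if (res != 0 || a == 0)
			return res;
	}
}
```

With these definitions, narrow_strcmp("\x40", "\xc0") and
narrow_strcmp("\xc0", "\x40") both yield -128, while wide_strcmp() returns
a negative value for the first call and a positive value for the second.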

Rasmus


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:29           ` Rasmus Villemoes
@ 2022-12-21 15:56             ` Guenter Roeck
  2022-12-21 17:06               ` Linus Torvalds
  2022-12-21 17:49               ` Andreas Schwab
  2022-12-21 16:57             ` Geert Uytterhoeven
  1 sibling, 2 replies; 70+ messages in thread
From: Guenter Roeck @ 2022-12-21 15:56 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Geert Uytterhoeven, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Linus Torvalds, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 04:29:11PM +0100, Rasmus Villemoes wrote:
> On 21/12/2022 16.05, Geert Uytterhoeven wrote:
> > Hi Günter,
> > 
> > On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> >>> Recently, some compile-time checking I added to the clamp_t family of
> >>> functions triggered a build error when a poorly written driver was
> >>> compiled on ARM, because the driver assumed that the naked `char` type
> >>> is signed, but ARM treats it as unsigned, and the C standard says it's
> >>> architecture-dependent.
> >>>
> >>> I doubt this particular driver is the only instance in which
> >>> unsuspecting authors make assumptions about `char` with no `signed` or
> >>> `unsigned` specifier. We were lucky enough this time that that driver
> >>> used `clamp_t(char, negative_value, positive_value)`, so the new
> >>> checking code found it, and I've sent a patch to fix it, but there are
> >>> likely other places lurking that won't be so easily unearthed.
> >>>
> >>> So let's just eliminate this particular variety of heisensign bugs
> >>> entirely. Set `-funsigned-char` globally, so that gcc makes the type
> >>> unsigned on all architectures.
> >>>
> >>> This will break things in some places and fix things in others, so this
> >>> will likely cause a bit of churn while reconciling the type misuse.
> >>>
> >>
> >> There is an interesting fallout: When running the m68k:q800 qemu emulation,
> >> there are lots of warning backtraces.
> >>
> >> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> >> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> >> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
> >>
> >> and so on for pretty much every entry in the alg_test_descs[] array.
> >>
> >> Bisect points to this patch, and reverting it fixes the problem.
> >>
> >> It looks like the problem is that arch/m68k/include/asm/string.h
> >> uses "char res" to store the result of strcmp(), and char is now
> >> unsigned - meaning strcmp() will now never return a value < 0.
> >> Effectively that means that strcmp() is broken on m68k if
> >> CONFIG_COLDFIRE=n.
> >>
> >> The fix is probably quite simple.
> >>
> >> diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
> >> index f759d944c449..b8f4ae19e8f6 100644
> >> --- a/arch/m68k/include/asm/string.h
> >> +++ b/arch/m68k/include/asm/string.h
> >> @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
> >>  #define __HAVE_ARCH_STRCMP
> >>  static inline int strcmp(const char *cs, const char *ct)
> >>  {
> >> -       char res;
> >> +       signed char res;
> >>
> >>         asm ("\n"
> >>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
> >>
> >> Does that make sense ? If so I can send a patch.
> > 
> > Thanks, been there, done that
> > https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org
> 
> Well, looks like that would still leave strcmp() buggy, you can't
> represent all possible differences between two char values (signed or
> not) in an 8-bit quantity. So any implementation based on returning the
> first non-zero value of *a - *b must store that intermediate value in
> something wider. Otherwise you'll get -128 from strcmp("\x40", "\xc0"),
> but _also_ -128 when you do strcmp("\xc0", "\x40"), which is obviously
> bogus.
> 

The above assumes an unsigned char as input to strcmp(). I consider that
a hypothetical problem because "comparing" strings with upper bits
set doesn't really make sense in practice (how does one compare Günter
against Gunter? And how about Gǖnter?). On the other hand, the problem
observed here is real and immediate.

Guenter

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:29           ` Rasmus Villemoes
  2022-12-21 15:56             ` Guenter Roeck
@ 2022-12-21 16:57             ` Geert Uytterhoeven
  1 sibling, 0 replies; 70+ messages in thread
From: Geert Uytterhoeven @ 2022-12-21 16:57 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Guenter Roeck, Jason A. Donenfeld, linux-kernel, linux-kbuild,
	linux-arch, linux-toolchains, Masahiro Yamada, Kees Cook,
	Andrew Morton, Linus Torvalds, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

Hi Rasmus,

On Wed, Dec 21, 2022 at 4:29 PM Rasmus Villemoes
<rasmus.villemoes@prevas.dk> wrote:
> On 21/12/2022 16.05, Geert Uytterhoeven wrote:
> > On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> >>> Recently, some compile-time checking I added to the clamp_t family of
> >>> functions triggered a build error when a poorly written driver was
> >>> compiled on ARM, because the driver assumed that the naked `char` type
> >>> is signed, but ARM treats it as unsigned, and the C standard says it's
> >>> architecture-dependent.
> >>>
> >>> I doubt this particular driver is the only instance in which
> >>> unsuspecting authors make assumptions about `char` with no `signed` or
> >>> `unsigned` specifier. We were lucky enough this time that that driver
> >>> used `clamp_t(char, negative_value, positive_value)`, so the new
> >>> checking code found it, and I've sent a patch to fix it, but there are
> >>> likely other places lurking that won't be so easily unearthed.
> >>>
> >>> So let's just eliminate this particular variety of heisensign bugs
> >>> entirely. Set `-funsigned-char` globally, so that gcc makes the type
> >>> unsigned on all architectures.
> >>>
> >>> This will break things in some places and fix things in others, so this
> >>> will likely cause a bit of churn while reconciling the type misuse.
> >>>
> >>
> >> There is an interesting fallout: When running the m68k:q800 qemu emulation,
> >> there are lots of warning backtraces.
> >>
> >> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> >> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> >> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
> >>
> >> and so on for pretty much every entry in the alg_test_descs[] array.
> >>
> >> Bisect points to this patch, and reverting it fixes the problem.
> >>
> >> It looks like the problem is that arch/m68k/include/asm/string.h
> >> uses "char res" to store the result of strcmp(), and char is now
> >> unsigned - meaning strcmp() will now never return a value < 0.
> >> Effectively that means that strcmp() is broken on m68k if
> >> CONFIG_COLDFIRE=n.
> >>
> >> The fix is probably quite simple.
> >>
> >> diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
> >> index f759d944c449..b8f4ae19e8f6 100644
> >> --- a/arch/m68k/include/asm/string.h
> >> +++ b/arch/m68k/include/asm/string.h
> >> @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
> >>  #define __HAVE_ARCH_STRCMP
> >>  static inline int strcmp(const char *cs, const char *ct)
> >>  {
> >> -       char res;
> >> +       signed char res;
> >>
> >>         asm ("\n"
> >>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
> >>
> >> Does that make sense ? If so I can send a patch.
> >
> > Thanks, been there, done that
> > https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org
>
> Well, looks like that would still leave strcmp() buggy, you can't
> represent all possible differences between two char values (signed or
> not) in an 8-bit quantity. So any implementation based on returning the
> first non-zero value of *a - *b must store that intermediate value in
> something wider. Otherwise you'll get -128 from strcmp("\x40", "\xc0"),
> but _also_ -128 when you do strcmp("\xc0", "\x40"), which is obviously
> bogus.

So we have https://lore.kernel.org/all/87bko3ia88.fsf@igel.home ;-)

And the other issue is m68k strcmp() calls being dropped by the
optimizer, cf. the discussion in
https://lore.kernel.org/all/b673f98db7d14d53a6e1a1957ef81741@AcuMS.aculab.com

> I recently fixed that long-standing bug in U-Boot's strcmp() and a
> similar one in nolibc in the linux tree. I wonder how many more
> instances exist.

Thanks, that is commit fb63362c63c7aeac ("lib: fix buggy strcmp and
strncmp") in v2023.01-rc1, which is not yet in a released version
(and it is in plain C, not in asm ;-)

Gr{oetje,eeting}s,

                        Geert


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:56             ` Guenter Roeck
@ 2022-12-21 17:06               ` Linus Torvalds
  2022-12-21 17:19                 ` Guenter Roeck
  2022-12-22 10:41                 ` David Laight
  2022-12-21 17:49               ` Andreas Schwab
  1 sibling, 2 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-12-21 17:06 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 7:56 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> The above assumes an unsigned char as input to strcmp(). I consider that
> a hypothetical problem because "comparing" strings with upper bits
> set doesn't really make sense in practice (How does one compare Günter
> against Gunter ? And how about Gǖnter ?). On the other side, the problem
> observed here is real and immediate.

POSIX does actually specify "Günter" vs "Gunter".

The way strcmp is supposed to work is to return the sign of the
difference between the byte values ("unsigned char").

But that sign has to be computed in 'int', not in 'signed char'.

So yes, the m68k implementation is broken regardless, but with a
signed char it just happened to work for the US-ASCII case that the
crypto case tested.

I think the real fix is to just remove that broken implementation
entirely, and rely on the generic one.

I'll commit that, and see what happens.
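
As a sketch of the rule stated above (bytes are compared as unsigned char
and only the sign of the comparison is reported), something along the
following lines would behave as described. This is written for illustration
and is not claimed to be the kernel's generic implementation; the name
sketch_strcmp() is hypothetical.

```c
/*
 * Illustrative only: compare two strings byte by byte, interpreting each
 * byte as unsigned char, and report the ordering as -1, 0 or 1 in an int.
 */
int sketch_strcmp(const char *cs, const char *ct)
{
	for (;;) {
		unsigned char c1 = (unsigned char)*cs++;
		unsigned char c2 = (unsigned char)*ct++;

		if (c1 != c2)
			return c1 < c2 ? -1 : 1;	/* sign only, held in int */
		if (c1 == 0)
			return 0;			/* both strings ended */
	}
}
```

For the UTF-8 encoded names from the discussion,
sketch_strcmp("G\xc3\xbcnter", "Gunter") is positive, because the byte 0xc3
compares greater than 'u' (0x75) when both are treated as unsigned char.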

               Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 17:06               ` Linus Torvalds
@ 2022-12-21 17:19                 ` Guenter Roeck
  2022-12-21 18:46                   ` Linus Torvalds
  2022-12-22 10:41                 ` David Laight
  1 sibling, 1 reply; 70+ messages in thread
From: Guenter Roeck @ 2022-12-21 17:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 09:06:41AM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2022 at 7:56 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > The above assumes an unsigned char as input to strcmp(). I consider that
> > a hypothetical problem because "comparing" strings with upper bits
> > set doesn't really make sense in practice (How does one compare Günter
> > against Gunter ? And how about Gǖnter ?). On the other side, the problem
> > observed here is real and immediate.
> 
> POSIX does actually specify "Günter" vs "Gunter".
> 
> The way strcmp is supposed to work is to return the sign of the
> difference between the byte values ("unsigned char").
> 
> But that sign has to be computed in 'int', not in 'signed char'.
> 
> So yes, the m68k implementation is broken regardless, but with a
> signed char it just happened to work for the US-ASCII case that the
> crypto case tested.
> 

I understand. I just prefer a known limited breakage to completely
broken code.

> I think the real fix is to just remove that broken implementation
> entirely, and rely on the generic one.

Perfectly fine with me.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:56             ` Guenter Roeck
  2022-12-21 17:06               ` Linus Torvalds
@ 2022-12-21 17:49               ` Andreas Schwab
  1 sibling, 0 replies; 70+ messages in thread
From: Andreas Schwab @ 2022-12-21 17:49 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

On Dec 21 2022, Guenter Roeck wrote:

> The above assumes an unsigned char as input to strcmp().

That's how strcmp is defined.

See <https://lore.kernel.org/all/87bko3ia88.fsf@igel.home>

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 17:19                 ` Guenter Roeck
@ 2022-12-21 18:46                   ` Linus Torvalds
  2022-12-21 19:08                     ` Linus Torvalds
                                       ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-12-21 18:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 9:19 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Wed, Dec 21, 2022 at 09:06:41AM -0800, Linus Torvalds wrote:
> >
> > I think the real fix is to just remove that broken implementation
> > entirely, and rely on the generic one.
>
> Perfectly fine with me.

That got pushed out as commit 7c0846125358 ("m68k: remove broken
strcmp implementation") but it's obviously entirely untested. I don't
do m68k cross-compiles, much less boot tests.

Just FYI for everybody - I may have screwed something up for some very
non-obvious reason.

But it looked very obvious indeed, and I hate having buggy code that
is architecture-specific when we have generic code that isn't buggy.

                   Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 18:46                   ` Linus Torvalds
@ 2022-12-21 19:08                     ` Linus Torvalds
  2022-12-21 21:01                     ` Guenter Roeck
  2022-12-22 13:05                     ` Geert Uytterhoeven
  2 siblings, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2022-12-21 19:08 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 10:46 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But it looked very obvious indeed, and I hate having buggy code that
> is architecture-specific when we have generic code that isn't buggy.

Side note: we have an x86-64 implementation that looks fine (but not
really noticeably better than the generic one) that is based on the
'return subtraction' model. But it seems to get it right.

And we have a 32-bit x86 assembly thing that is based on 'rep scasb',
that then uses the carry bit to also get things right.

That 32-bit asm goes back to Linux 0.01 (with some changes since to
use "sbbl+or" instead of a conditional neg). I was playing around a
lot with the 'rep' instructions back when, since it was all part of
"learn the instruction set" for me.

Both of them should probably be removed as pointless too, but they
don't seem actively buggy.
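
For readers unfamiliar with the "sbbl+or" idiom mentioned above, the idea
can be rendered in C roughly as follows. This is an assumption-laden sketch
of the technique, not the kernel's x86 assembly: the unsigned comparison
plays the role of the carry (borrow) flag, negating it mimics "sbb" of a
register with itself (0 or -1), and OR-ing in 1 turns that into +1 or -1.
The names sign_from_borrow() and borrow_strcmp() are hypothetical.

```c
/*
 * Illustrative C model of the carry-based result: "borrow" is 1 when c1 is
 * below c2 as unsigned bytes, like the carry flag after an unsigned compare.
 */
int sign_from_borrow(unsigned char c1, unsigned char c2)
{
	int borrow;

	if (c1 == c2)
		return 0;
	borrow = c1 < c2;	/* 1 if c1 is below c2, else 0 */
	return -borrow | 1;	/* -1 | 1 == -1, 0 | 1 == 1 */
}

/* Walk both strings and classify the first differing byte. */
int borrow_strcmp(const char *cs, const char *ct)
{
	for (;;) {
		unsigned char c1 = (unsigned char)*cs++;
		unsigned char c2 = (unsigned char)*ct++;

		if (c1 != c2 || c1 == 0)
			return sign_from_borrow(c1, c2);
	}
}
```

Because the result is derived from an unsigned comparison rather than from
a truncated subtraction, bytes with the top bit set are ordered correctly.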

               Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 18:46                   ` Linus Torvalds
  2022-12-21 19:08                     ` Linus Torvalds
@ 2022-12-21 21:01                     ` Guenter Roeck
  2022-12-22 13:05                     ` Geert Uytterhoeven
  2 siblings, 0 replies; 70+ messages in thread
From: Guenter Roeck @ 2022-12-21 21:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 10:46:08AM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2022 at 9:19 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On Wed, Dec 21, 2022 at 09:06:41AM -0800, Linus Torvalds wrote:
> > >
> > > I think the real fix is to just remove that broken implementation
> > > entirely, and rely on the generic one.
> >
> > Perfectly fine with me.
> 
> That got pushed out as commit 7c0846125358 ("m68k: remove broken
> strcmp implementation") but it's obviously entirely untested. I don't
> do m68k cross-compiles, much less boot tests.
> 
> Just FYI for everybody - I may have screwed something up for some very
> non-obvious reason.
> 
No worries:

Build reference: msi-fixes-6.2-1-2644-g0a924817d2ed

Building mcf5208evb:m5208:m5208evb_defconfig:initrd ... running ..... passed
Building q800:m68040:mac_defconfig:initrd ... running ..... passed
Building q800:m68040:mac_defconfig:rootfs ... running ..... passed

Guenter

^ permalink raw reply	[flat|nested] 70+ messages in thread

* RE: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 17:06               ` Linus Torvalds
  2022-12-21 17:19                 ` Guenter Roeck
@ 2022-12-22 10:41                 ` David Laight
       [not found]                   ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
  1 sibling, 1 reply; 70+ messages in thread
From: David Laight @ 2022-12-22 10:41 UTC (permalink / raw)
  To: 'Linus Torvalds', Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

From: Linus Torvalds
> Sent: 21 December 2022 17:07
> 
> On Wed, Dec 21, 2022 at 7:56 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > The above assumes an unsigned char as input to strcmp(). I consider that
> > a hypothetical problem because "comparing" strings with upper bits
> > set doesn't really make sense in practice (How does one compare Günter
> > against Gunter ? And how about Gǖnter ?). On the other side, the problem
> > observed here is real and immediate.
> 
> POSIX does actually specify "Günter" vs "Gunter".
> 
> The way strcmp is supposed to work is to return the sign of the
> difference between the byte values ("unsigned char").
> 
> But that sign has to be computed in 'int', not in 'signed char'.
> 
> So yes, the m68k implementation is broken regardless, but with a
> signed char it just happened to work for the US-ASCII case that the
> crypto case tested.
> 
> I think the real fix is to just remove that broken implementation
> entirely, and rely on the generic one.

I wonder how much slower it is - m68k is likely to be microcoded
and I don't think instruction timings are actually available.
The fastest version probably uses subx (with carry) to generate
0/-1 and leaves +delta for the other result - but getting the
compares and branches in the right order is hard.

I believe some of the other m68k asm functions are also missing
the "memory" 'clobber' and so could get mis-optimised.
While I can write (or rather have written) m68k asm I don't have
a compiler.

I also suspect that any x86 code that uses 'rep scas' is going
to be slow on anything modern.

	David


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 18:46                   ` Linus Torvalds
  2022-12-21 19:08                     ` Linus Torvalds
  2022-12-21 21:01                     ` Guenter Roeck
@ 2022-12-22 13:05                     ` Geert Uytterhoeven
  2 siblings, 0 replies; 70+ messages in thread
From: Geert Uytterhoeven @ 2022-12-22 13:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Rasmus Villemoes, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

Hi Linus,

On Wed, Dec 21, 2022 at 7:46 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Dec 21, 2022 at 9:19 AM Guenter Roeck <linux@roeck-us.net> wrote:
> > On Wed, Dec 21, 2022 at 09:06:41AM -0800, Linus Torvalds wrote:
> > > I think the real fix is to just remove that broken implementation
> > > entirely, and rely on the generic one.
> >
> > Perfectly fine with me.
>
> That got pushed out as commit 7c0846125358 ("m68k: remove broken
> strcmp implementation") but it's obviously entirely untested. I don't
> do m68k cross-compiles, much less boot tests.
>
> Just FYI for everybody - I may have screwed something up for some very
> non-obvious reason.
>
> But it looked very obvious indeed, and I hate having buggy code that
> is architecture-specific when we have generic code that isn't buggy.

Thank you for being proactive!
It works fine (and slightly reduced kernel size, too ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [PATCH v2] kbuild: treat char as always unsigned
       [not found]                   ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
@ 2022-12-24  9:47                     ` Geert Uytterhoeven
  2022-12-30 11:39                     ` David Laight
  1 sibling, 0 replies; 70+ messages in thread
From: Geert Uytterhoeven @ 2022-12-24  9:47 UTC (permalink / raw)
  To: Holger Lubitz
  Cc: David Laight, Linus Torvalds, Guenter Roeck, Rasmus Villemoes,
	Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

Hi Holger,

On Sat, Dec 24, 2022 at 10:34 AM Holger Lubitz
<holger.lubitz@t-online.de> wrote:
> On Thu, 2022-12-22 at 10:41 +0000, David Laight wrote:
> > I wonder how much slower it is - m68k is likely to be microcoded
> > and I don't think instruction timings are actually available.
>
> Not sure if these are in any way official, but
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68030timing.HTML
>
> (There's also
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68000timing.HTML
> but that is probably only interesting to demo coders by now)

Yes, instruction timings are available.  Unlike for e.g. x86, there
is only a very limited number of parts to consider.

> > I believe some of the other m68k asm functions are also missing
> > the "memory" 'clobber' and so could get mis-optimised.
>
> In which case would that happen? This function doesn't clobber memory
> and its result does get used. If gcc mistakenly thinks the parameters
> haven't changed and uses a previously cached result, wouldn't that
> apply to a C function too?

For a pure C inline function, the compiler knows exactly what it does.

For an external C function, the compiler assumes all bets are off.

For inline asm, the compiler doesn't know what happens with (the data
pointed to by) the pointers, unless that's described in the constraints.
We do have some inline asm that has "*ptr" in the constraints, but
that applies to a single value, not to an array.  And in case of
strings, the size of the array is not known without looking for the
zero-terminator.
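
A minimal sketch of the "memory" clobber, assuming GCC or Clang extended asm. The helper name is hypothetical and the asm template is deliberately empty so that it builds on any architecture; in real code the template would hold the string-scanning instructions. Only the operand and clobber lists matter here.

```c
#include <stddef.h>

/* Hypothetical helper: the empty template stands in for real asm. */
static inline size_t asm_strlen_sketch(const char *s)
{
	const char *p = s;

	/*
	 * "+r" (p) tells the compiler the asm reads and may change the
	 * pointer itself.  It says nothing about the bytes p points to.
	 * The "memory" clobber covers those: the compiler must assume
	 * the asm may access any memory, so pending stores to the string
	 * are completed before the asm and nothing cached from memory is
	 * reused across it.
	 */
	__asm__ ("" : "+r" (p) : : "memory");

	while (*p)
		p++;
	return (size_t)(p - s);
}
```

The clobber is a blunt instrument, but for a zero-terminated string there is no fixed-size object that a single "m" operand could describe.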

Gr{oetje,eeting}s,

                        Geert


* RE: [PATCH v2] kbuild: treat char as always unsigned
       [not found]                   ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
  2022-12-24  9:47                     ` Geert Uytterhoeven
@ 2022-12-30 11:39                     ` David Laight
  2022-12-30 13:13                       ` David Laight
  2023-01-02  8:29                       ` Geert Uytterhoeven
  1 sibling, 2 replies; 70+ messages in thread
From: David Laight @ 2022-12-30 11:39 UTC (permalink / raw)
  To: 'Holger Lubitz', 'Linus Torvalds', Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

From: Holger Lubitz
> Sent: 24 December 2022 09:34
> 
> On Thu, 2022-12-22 at 10:41 +0000, David Laight wrote:
> > I wonder how much slower it is - m68k is likely to be microcoded
> > and I don't think instruction timings are actually available.
> 
> Not sure if these are in any way official, but
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68030timing.HTML

I thought about that some more and remember seeing memory timings
on a logic analyser - and getting timings that (more or less)
implied sequential execution limited by the obvious memory (cache)
accesses.

The microcoding is more apparent in the large mid-instruction
interrupt stack frames - eg for page faults.


> 
> (There's also
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68000timing.HTML
> but that is probably only interesting to demo coders by now)
> 
> > The fastest version probably uses subx (with carry) to generate
> > 0/-1 and leaves +delta for the other result - but getting the
> > compares and branches in the right order is hard.
> 
> Guess it must have been over 20 years since I wrote any 68k asm, but
> now I actually ended up installing Debian on qemu to experiment.
> 
> There are two interesting differences between 68k and x86 that can be
> useful here: Unlike x86, MOV on 68k sets the flags. And also, subx
> differs from sbb in that it resets the zero flag on a non-zero result,
> but does not set it on a zero result. So if it is set, it must have
> been set before.
> 
> Here are the two functions I came up with (tested only stand-alone, not
> in a kernel build. Also no benchmarks because this 68040 is only
> emulated)
> #1 (optimized for minimum instruction count in loop,
>     68k + Coldfire ISA_B)
> 
> int strcmp1(const char *cs, const char *ct)
> {
>         int res;
> 
>         asm ("\n"
>                 "1: move.b  (%0)+,%2\n"  /* get *cs */
>                 "   jeq     2f\n"        /* end of first string? */
>                 "   cmp.b   (%1)+,%2\n"  /* compare *ct */
>                 "   jeq     1b\n"        /* if equal, continue */
>                 "   jra     3f\n"        /* else skip to tail */
>                 "2: cmp.b   (%1)+,%2\n"  /* compare one last byte */
>                 "3: subx.l  %2, %2\n"    /* -1 if borrow, 0 if not */
>                 "   jls     4f\n"        /* if set, z is from sub.b */

The subx will set Z unless C was set.
So that doesn't seem right.

>                 "   moveq.l #1, %2\n"    /* 1 if !borrow  */
>                 "4:"
>                 : "+a" (cs), "+a" (ct), "=d" (res));
>         return res;
> }

I think this should work:
(But the jc might need to be jnc.)

                 "   moveq.l #0,%2\n"     /* zero high bits of result */
                 "1: move.b  (%1)+,%2\n"  /* get *ct */
                 "   jeq     2f\n"        /* end of second string? */
                 "   cmp.b   (%0)+,%2\n"  /* compare *cs */
                 "   jeq     1b\n"        /* if equal, continue */
                 "   jc      4f\n"        /* return +ve */
                 "   moveq.l #-1, %2\n"   /* return -ve */
                 "   jra     4f\n"
                 "2: move.b  (%0),%2\n"   /* check for matching strings */
                 "4:"


> #2 (optimized for minimum code size,
>     Coldfire ISA_A compatible)
> 
> int strcmp2(const char *cs, const char *ct)
> {
>         int res = 0, tmp = 0;
> 
>         asm ("\n"
>                 "1: move.b (%0)+,%2\n" /* get *cs */
>                 "   move.b (%1)+,%3\n" /* get *ct */
>                 "   subx.l %3,%2\n"    /* compare a byte */
>                 "   jeq    2f\n"       /* both inputs were zero */

That doesn't seem right.
Z will be set if either *ct is zero or the bytes match.

>                 "   tst.l  %2\n"       /* check result */

This only sets Z when it was already set by the subx.

>                 "   jeq    1b\n"       /* if zero, continue */
>                 "2:"
>                 : "+a" (cs), "+a" (ct), "+d" (res), "+d" (tmp));
>         return res;
> }
> 
> However, this one needs res and tmp to be set to zero, because we read
> only bytes (no automatic zero-extend on 68k), but then do a long
> operation on them. Coldfire ISA_A dropped cmpb, it only reappeared in
> ISA_B.
> 
> So the real instruction count is likely to be two more, unless gcc
> happens to have one or two zeros it can reuse.
> 
> > I believe some of the other m68k asm functions are also missing
> > the "memory" 'clobber' and so could get mis-optimised.
> 
> In which case would that happen? This function doesn't clobber memory
> and its result does get used. If gcc mistakenly thinks the parameters
> haven't changed and uses a previously cached result, wouldn't that
> apply to a C function too?

You need a memory 'clobber' on anything that READS memory as well
as writes it.

> > While I can write (or rather have written) m68k asm I don't have
> > a compiler.
> 
> Well, I now have an emulated Quadra 800 running Debian 68k. (Getting the
> emulated networking to work reliably was a bit problematic, though. But
> now it runs Kernel 6.0) qemu could emulate Coldfire too, but I am not
> sure where I would find a distribution for that.
> 
> I did not attach a patch because it seems already to be decided that
> the function is gone. But should anyone still want to include one (or
> both) of these functions, just give credit to me and I'm fine.

Thinking further the fastest strcmp() probably uses big-endian word compares
with a check for a zero byte.
Especially on 64 bit systems that support misaligned loads.
But I'd need to think hard about the actual details.
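
A rough sketch in C of that idea, to show the two ingredients: a zero-byte test on a whole word and a big-endian load so that word order matches byte order. All names are made up. It ASSUMES both buffers can be read in whole 8-byte chunks up to and including the chunk holding the terminator, which real code has to guarantee by alignment or some other means.

```c
#include <stdint.h>

#define ONES_SKETCH  0x0101010101010101ULL
#define HIGHS_SKETCH 0x8080808080808080ULL

/* Non-zero result if and only if some byte of x is zero. */
static int has_zero_byte_sketch(uint64_t x)
{
	return ((x - ONES_SKETCH) & ~x & HIGHS_SKETCH) != 0;
}

/* Portable big-endian load; a big-endian CPU would use a plain load. */
static uint64_t load_be64_sketch(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* ASSUMPTION: both buffers are readable in whole 8-byte chunks. */
int strcmp_words_sketch(const char *cs, const char *ct)
{
	const unsigned char *a = (const unsigned char *)cs;
	const unsigned char *b = (const unsigned char *)ct;
	int i;

	for (;; a += 8, b += 8) {
		uint64_t wa = load_be64_sketch(a);
		uint64_t wb = load_be64_sketch(b);

		/* Fast path: eight equal bytes, none of them zero. */
		if (wa == wb && !has_zero_byte_sketch(wa))
			continue;

		/* This chunk has a difference or a terminator: go bytewise. */
		for (i = 0; i < 8; i++) {
			if (a[i] != b[i])
				return a[i] < b[i] ? -1 : 1;
			if (a[i] == 0)
				return 0;
		}
	}
}
```

The bytewise tail keeps the sketch simple. An optimised version would derive the result from the two words directly, which is where the hard thinking comes in.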

	David


* RE: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-30 11:39                     ` David Laight
@ 2022-12-30 13:13                       ` David Laight
  2023-01-02  8:29                       ` Geert Uytterhoeven
  1 sibling, 0 replies; 70+ messages in thread
From: David Laight @ 2022-12-30 13:13 UTC (permalink / raw)
  To: David Laight, 'Holger Lubitz', 'Linus Torvalds',
	Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

....
> > int strcmp1(const char *cs, const char *ct)
> > {
> >         int res;
> >
> >         asm ("\n"
> >                 "1: move.b  (%0)+,%2\n"  /* get *cs */
> >                 "   jeq     2f\n"        /* end of first string? */
> >                 "   cmp.b   (%1)+,%2\n"  /* compare *ct */
> >                 "   jeq     1b\n"        /* if equal, continue */
> >                 "   jra     3f\n"        /* else skip to tail */
> >                 "2: cmp.b   (%1)+,%2\n"  /* compare one last byte */
> >                 "3: subx.l  %2, %2\n"    /* -1 if borrow, 0 if not */
> >                 "   jls     4f\n"        /* if set, z is from sub.b */
> 
> The subx will set Z unless C was set.
> So that doesn't seem right.

Clearly my brain was asleep earlier.
subx will clear Z not set it.
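
For reference, a small C model of the flag behaviour in question. The struct and function names are hypothetical, and only X, C and Z are modelled (N and V are left out). It follows the documented rule that subx clears Z on a non-zero result and leaves it unchanged otherwise.

```c
#include <stdint.h>

/* Hypothetical model of part of the m68k condition code register. */
struct ccr_sketch {
	unsigned x;	/* extend flag, used as borrow-in */
	unsigned c;	/* carry flag, borrow-out */
	unsigned z;	/* zero flag */
};

/* Models "subx.l src,dst": dst = dst - src - X. */
static uint32_t subx_l_sketch(uint32_t dst, uint32_t src,
			      struct ccr_sketch *f)
{
	uint64_t wide = (uint64_t)dst - (uint64_t)src - (uint64_t)f->x;
	uint32_t res = (uint32_t)wide;
	unsigned borrow = (unsigned)((wide >> 32) & 1);

	f->x = borrow;
	f->c = borrow;
	/* Z is only ever cleared here, never set. */
	if (res != 0)
		f->z = 0;
	return res;
}
```

With equal operands, as in "subx.l %2,%2", the result is 0 with Z untouched when X was clear, and -1 with Z cleared when X was set.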

	David


* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-30 11:39                     ` David Laight
  2022-12-30 13:13                       ` David Laight
@ 2023-01-02  8:29                       ` Geert Uytterhoeven
  1 sibling, 0 replies; 70+ messages in thread
From: Geert Uytterhoeven @ 2023-01-02  8:29 UTC (permalink / raw)
  To: David Laight
  Cc: Holger Lubitz, Linus Torvalds, Guenter Roeck, Rasmus Villemoes,
	Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

Hi David,

On Fri, Dec 30, 2022 at 12:39 PM David Laight <David.Laight@aculab.com> wrote:
> Thinking further the fastest strcmp() probably uses big-endian word compares
> with a check for a zero byte.
> Especially on 64 bit systems that support misaligned loads.
> But I'd need to think hard about the actual details.

arch/arc/lib/strcmp-archs.S
arch/csky/abiv2/strcmp.S

Gr{oetje,eeting}s,

                        Geert


end of thread, other threads:[~2023-01-02  8:30 UTC | newest]

Thread overview: 70+ messages
2022-10-19 16:26 [PATCH] kbuild: treat char as always signed Jason A. Donenfeld
2022-10-19 16:54 ` Segher Boessenkool
2022-10-19 17:14   ` Linus Torvalds
2022-10-19 17:26     ` Linus Torvalds
2022-10-19 18:10       ` Nick Desaulniers
2022-10-19 18:35         ` Linus Torvalds
2022-10-19 19:23           ` Andy Shevchenko
2022-10-19 19:36             ` Linus Torvalds
2022-10-19 17:43     ` Segher Boessenkool
2022-10-19 18:11       ` Linus Torvalds
2022-10-19 18:20         ` Nick Desaulniers
2022-10-19 18:56           ` Linus Torvalds
2022-10-19 19:11             ` Kees Cook
2022-10-19 19:30               ` Linus Torvalds
2022-10-19 20:35                 ` Jason A. Donenfeld
2022-10-20  0:10                   ` Linus Torvalds
2022-10-20  3:11                     ` Jason A. Donenfeld
2022-10-19 20:15             ` Segher Boessenkool
2022-10-19 21:07         ` David Laight
2022-10-19 21:26           ` Segher Boessenkool
2022-10-20 10:41         ` Gabriel Paubert
2022-10-21 22:46           ` Linus Torvalds
2022-10-22  6:06             ` Gabriel Paubert
2022-10-22 18:16               ` Linus Torvalds
2022-10-23 20:23                 ` Gabriel Paubert
2022-10-25 23:00                   ` Kees Cook
2022-10-26  0:04                     ` Jason A. Donenfeld
2022-10-26 15:41                       ` Kees Cook
2022-10-19 19:54 ` Linus Torvalds
2022-10-19 20:23   ` Jason A. Donenfeld
2022-10-19 20:30     ` [PATCH v2] kbuild: treat char as always unsigned Jason A. Donenfeld
2022-10-19 23:56       ` Linus Torvalds
2022-10-20  0:02         ` Jason A. Donenfeld
2022-10-20  0:38           ` Linus Torvalds
2022-10-20  2:59             ` Jason A. Donenfeld
2022-10-20 18:41             ` Kees Cook
2022-10-21  1:01               ` Jason A. Donenfeld
2022-10-20 20:24         ` Segher Boessenkool
2022-10-24  9:24       ` Dan Carpenter
2022-10-24  9:30         ` Dan Carpenter
2022-10-24 16:33           ` Jason A. Donenfeld
2022-10-24 17:10             ` Linus Torvalds
2022-10-24 17:17               ` Jason A. Donenfeld
2022-10-25 19:22                 ` Kalle Valo
2022-10-25 10:16               ` David Laight
2022-10-24 15:17         ` Jason A. Donenfeld
2022-12-21 14:53       ` Guenter Roeck
2022-12-21 15:05         ` Geert Uytterhoeven
2022-12-21 15:23           ` Guenter Roeck
2022-12-21 15:29           ` Rasmus Villemoes
2022-12-21 15:56             ` Guenter Roeck
2022-12-21 17:06               ` Linus Torvalds
2022-12-21 17:19                 ` Guenter Roeck
2022-12-21 18:46                   ` Linus Torvalds
2022-12-21 19:08                     ` Linus Torvalds
2022-12-21 21:01                     ` Guenter Roeck
2022-12-22 13:05                     ` Geert Uytterhoeven
2022-12-22 10:41                 ` David Laight
     [not found]                   ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
2022-12-24  9:47                     ` Geert Uytterhoeven
2022-12-30 11:39                     ` David Laight
2022-12-30 13:13                       ` David Laight
2023-01-02  8:29                       ` Geert Uytterhoeven
2022-12-21 17:49               ` Andreas Schwab
2022-12-21 16:57             ` Geert Uytterhoeven
2022-10-19 20:58   ` [PATCH] kbuild: treat char as always signed David Laight
2022-10-26  0:10   ` make ctype ascii only? (was [PATCH] kbuild: treat char as always signed) Rasmus Villemoes
2022-10-26 18:10     ` Linus Torvalds
2022-10-27  7:59       ` Rasmus Villemoes
2022-10-27 18:28         ` Linus Torvalds
     [not found] ` <202210201618.8XhEGsLd-lkp@intel.com>
2022-10-20 16:33   ` [PATCH] kbuild: treat char as always signed Jason A. Donenfeld
