linux-m68k.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2] kbuild: treat char as always unsigned
       [not found] ` <20221019203034.3795710-1-Jason@zx2c4.com>
@ 2022-12-21 14:53   ` Guenter Roeck
  2022-12-21 15:05     ` Geert Uytterhoeven
  0 siblings, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2022-12-21 14:53 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman, Geert Uytterhoeven,
	linux-m68k

On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> Recently, some compile-time checking I added to the clamp_t family of
> functions triggered a build error when a poorly written driver was
> compiled on ARM, because the driver assumed that the naked `char` type
> is signed, but ARM treats it as unsigned, and the C standard says it's
> architecture-dependent.
> 
> I doubt this particular driver is the only instance in which
> unsuspecting authors make assumptions about `char` with no `signed` or
> `unsigned` specifier. We were lucky enough this time that that driver
> used `clamp_t(char, negative_value, positive_value)`, so the new
> checking code found it, and I've sent a patch to fix it, but there are
> likely other places lurking that won't be so easily unearthed.
> 
> So let's just eliminate this particular variety of heisensign bugs
> entirely. Set `-funsigned-char` globally, so that gcc makes the type
> unsigned on all architectures.
> 
> This will break things in some places and fix things in others, so this
> will likely cause a bit of churn while reconciling the type misuse.
> 

There is an interesting fallout: When running the m68k:q800 qemu emulation,
there are lots of warning backtraces.

WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
------------[ cut here ]------------
WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'

and so on for pretty much every entry in the alg_test_descs[] array.

Bisect points to this patch, and reverting it fixes the problem.

It looks like the problem is that arch/m68k/include/asm/string.h
uses "char res" to store the result of strcmp(), and char is now
unsigned - meaning strcmp() will now never return a value < 0.
Effectively that means that strcmp() is broken on m68k if
CONFIG_COLDFIRE=n.

The fix is probably quite simple.

diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
index f759d944c449..b8f4ae19e8f6 100644
--- a/arch/m68k/include/asm/string.h
+++ b/arch/m68k/include/asm/string.h
@@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
 #define __HAVE_ARCH_STRCMP
 static inline int strcmp(const char *cs, const char *ct)
 {
-       char res;
+       signed char res;

        asm ("\n"
                "1:     move.b  (%0)+,%2\n"     /* get *cs */

Does that make sense ? If so I can send a patch.

Guenter

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 14:53   ` [PATCH v2] kbuild: treat char as always unsigned Guenter Roeck
@ 2022-12-21 15:05     ` Geert Uytterhoeven
  2022-12-21 15:23       ` Guenter Roeck
  2022-12-21 15:29       ` Rasmus Villemoes
  0 siblings, 2 replies; 18+ messages in thread
From: Geert Uytterhoeven @ 2022-12-21 15:05 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linus Torvalds, Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

Hi Günter,

On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> > Recently, some compile-time checking I added to the clamp_t family of
> > functions triggered a build error when a poorly written driver was
> > compiled on ARM, because the driver assumed that the naked `char` type
> > is signed, but ARM treats it as unsigned, and the C standard says it's
> > architecture-dependent.
> >
> > I doubt this particular driver is the only instance in which
> > unsuspecting authors make assumptions about `char` with no `signed` or
> > `unsigned` specifier. We were lucky enough this time that that driver
> > used `clamp_t(char, negative_value, positive_value)`, so the new
> > checking code found it, and I've sent a patch to fix it, but there are
> > likely other places lurking that won't be so easily unearthed.
> >
> > So let's just eliminate this particular variety of heisensign bugs
> > entirely. Set `-funsigned-char` globally, so that gcc makes the type
> > unsigned on all architectures.
> >
> > This will break things in some places and fix things in others, so this
> > will likely cause a bit of churn while reconciling the type misuse.
> >
>
> There is an interesting fallout: When running the m68k:q800 qemu emulation,
> there are lots of warning backtraces.
>
> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
>
> and so on for pretty much every entry in the alg_test_descs[] array.
>
> Bisect points to this patch, and reverting it fixes the problem.
>
> It looks like the problem is that arch/m68k/include/asm/string.h
> uses "char res" to store the result of strcmp(), and char is now
> unsigned - meaning strcmp() will now never return a value < 0.
> Effectively that means that strcmp() is broken on m68k if
> CONFIG_COLDFIRE=n.
>
> The fix is probably quite simple.
>
> diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
> index f759d944c449..b8f4ae19e8f6 100644
> --- a/arch/m68k/include/asm/string.h
> +++ b/arch/m68k/include/asm/string.h
> @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
>  #define __HAVE_ARCH_STRCMP
>  static inline int strcmp(const char *cs, const char *ct)
>  {
> -       char res;
> +       signed char res;
>
>         asm ("\n"
>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
>
> Does that make sense ? If so I can send a patch.

Thanks, been there, done that
https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org

Note that we detected other issues with the m68k strcmp(), so
probably that patch wouldn't go in as-is.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:05     ` Geert Uytterhoeven
@ 2022-12-21 15:23       ` Guenter Roeck
  2022-12-21 15:29       ` Rasmus Villemoes
  1 sibling, 0 replies; 18+ messages in thread
From: Guenter Roeck @ 2022-12-21 15:23 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linus Torvalds, Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 04:05:45PM +0100, Geert Uytterhoeven wrote:
> Hi Günter,
> 
> On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
> > On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> > > Recently, some compile-time checking I added to the clamp_t family of
> > > functions triggered a build error when a poorly written driver was
> > > compiled on ARM, because the driver assumed that the naked `char` type
> > > is signed, but ARM treats it as unsigned, and the C standard says it's
> > > architecture-dependent.
> > >
> > > I doubt this particular driver is the only instance in which
> > > unsuspecting authors make assumptions about `char` with no `signed` or
> > > `unsigned` specifier. We were lucky enough this time that that driver
> > > used `clamp_t(char, negative_value, positive_value)`, so the new
> > > checking code found it, and I've sent a patch to fix it, but there are
> > > likely other places lurking that won't be so easily unearthed.
> > >
> > > So let's just eliminate this particular variety of heisensign bugs
> > > entirely. Set `-funsigned-char` globally, so that gcc makes the type
> > > unsigned on all architectures.
> > >
> > > This will break things in some places and fix things in others, so this
> > > will likely cause a bit of churn while reconciling the type misuse.
> > >
> >
> > There is an interesting fallout: When running the m68k:q800 qemu emulation,
> > there are lots of warning backtraces.
> >
> > WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> > testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
> > ------------[ cut here ]------------
> > WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> > testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
> >
> > and so on for pretty much every entry in the alg_test_descs[] array.
> >
> > Bisect points to this patch, and reverting it fixes the problem.
> >
> > It looks like the problem is that arch/m68k/include/asm/string.h
> > uses "char res" to store the result of strcmp(), and char is now
> > unsigned - meaning strcmp() will now never return a value < 0.
> > Effectively that means that strcmp() is broken on m68k if
> > CONFIG_COLDFIRE=n.
> >
> > The fix is probably quite simple.
> >
> > diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
> > index f759d944c449..b8f4ae19e8f6 100644
> > --- a/arch/m68k/include/asm/string.h
> > +++ b/arch/m68k/include/asm/string.h
> > @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
> >  #define __HAVE_ARCH_STRCMP
> >  static inline int strcmp(const char *cs, const char *ct)
> >  {
> > -       char res;
> > +       signed char res;
> >
> >         asm ("\n"
> >                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
> >
> > Does that make sense ? If so I can send a patch.
> 
> Thanks, been there, done that
> https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org
> 
> Note that we detected other issues with the m68k strcmp(), so
> probably that patch wouldn't go in as-is.
> 

So anything non-Coldfire is and will remain broken on m68k for the time
being ? Wouldn't it be better to fix the acute problem now and address
the long-standing problem(s) separately ? 

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:05     ` Geert Uytterhoeven
  2022-12-21 15:23       ` Guenter Roeck
@ 2022-12-21 15:29       ` Rasmus Villemoes
  2022-12-21 15:56         ` Guenter Roeck
  2022-12-21 16:57         ` Geert Uytterhoeven
  1 sibling, 2 replies; 18+ messages in thread
From: Rasmus Villemoes @ 2022-12-21 15:29 UTC (permalink / raw)
  To: Geert Uytterhoeven, Guenter Roeck
  Cc: Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Linus Torvalds, Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

On 21/12/2022 16.05, Geert Uytterhoeven wrote:
> Hi Günter,
> 
> On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
>> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
>>> Recently, some compile-time checking I added to the clamp_t family of
>>> functions triggered a build error when a poorly written driver was
>>> compiled on ARM, because the driver assumed that the naked `char` type
>>> is signed, but ARM treats it as unsigned, and the C standard says it's
>>> architecture-dependent.
>>>
>>> I doubt this particular driver is the only instance in which
>>> unsuspecting authors make assumptions about `char` with no `signed` or
>>> `unsigned` specifier. We were lucky enough this time that that driver
>>> used `clamp_t(char, negative_value, positive_value)`, so the new
>>> checking code found it, and I've sent a patch to fix it, but there are
>>> likely other places lurking that won't be so easily unearthed.
>>>
>>> So let's just eliminate this particular variety of heisensign bugs
>>> entirely. Set `-funsigned-char` globally, so that gcc makes the type
>>> unsigned on all architectures.
>>>
>>> This will break things in some places and fix things in others, so this
>>> will likely cause a bit of churn while reconciling the type misuse.
>>>
>>
>> There is an interesting fallout: When running the m68k:q800 qemu emulation,
>> there are lots of warning backtraces.
>>
>> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
>> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
>> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
>>
>> and so on for pretty much every entry in the alg_test_descs[] array.
>>
>> Bisect points to this patch, and reverting it fixes the problem.
>>
>> It looks like the problem is that arch/m68k/include/asm/string.h
>> uses "char res" to store the result of strcmp(), and char is now
>> unsigned - meaning strcmp() will now never return a value < 0.
>> Effectively that means that strcmp() is broken on m68k if
>> CONFIG_COLDFIRE=n.
>>
>> The fix is probably quite simple.
>>
>> diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
>> index f759d944c449..b8f4ae19e8f6 100644
>> --- a/arch/m68k/include/asm/string.h
>> +++ b/arch/m68k/include/asm/string.h
>> @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
>>  #define __HAVE_ARCH_STRCMP
>>  static inline int strcmp(const char *cs, const char *ct)
>>  {
>> -       char res;
>> +       signed char res;
>>
>>         asm ("\n"
>>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
>>
>> Does that make sense ? If so I can send a patch.
> 
> Thanks, been there, done that
> https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org

Well, looks like that would still leave strcmp() buggy, you can't
represent all possible differences between two char values (signed or
not) in an 8-bit quantity. So any implementation based on returning the
first non-zero value of *a - *b must store that intermediate value in
something wider. Otherwise you'll get -128 from strcmp("\x40", "\xc0"),
but _also_ -128 when you do strcmp("\xc0", "\x40"), which is obviously
bogus.

I recently fixed that long-standing bug in U-Boot's strcmp() and a
similar one in nolibc in the linux tree. I wonder how many more
instances exist.

Rasmus


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:29       ` Rasmus Villemoes
@ 2022-12-21 15:56         ` Guenter Roeck
  2022-12-21 17:06           ` Linus Torvalds
  2022-12-21 17:49           ` Andreas Schwab
  2022-12-21 16:57         ` Geert Uytterhoeven
  1 sibling, 2 replies; 18+ messages in thread
From: Guenter Roeck @ 2022-12-21 15:56 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Geert Uytterhoeven, Jason A. Donenfeld, linux-kernel,
	linux-kbuild, linux-arch, linux-toolchains, Masahiro Yamada,
	Kees Cook, Andrew Morton, Linus Torvalds, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 04:29:11PM +0100, Rasmus Villemoes wrote:
> On 21/12/2022 16.05, Geert Uytterhoeven wrote:
> > Hi Günter,
> > 
> > On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> >>> Recently, some compile-time checking I added to the clamp_t family of
> >>> functions triggered a build error when a poorly written driver was
> >>> compiled on ARM, because the driver assumed that the naked `char` type
> >>> is signed, but ARM treats it as unsigned, and the C standard says it's
> >>> architecture-dependent.
> >>>
> >>> I doubt this particular driver is the only instance in which
> >>> unsuspecting authors make assumptions about `char` with no `signed` or
> >>> `unsigned` specifier. We were lucky enough this time that that driver
> >>> used `clamp_t(char, negative_value, positive_value)`, so the new
> >>> checking code found it, and I've sent a patch to fix it, but there are
> >>> likely other places lurking that won't be so easily unearthed.
> >>>
> >>> So let's just eliminate this particular variety of heisensign bugs
> >>> entirely. Set `-funsigned-char` globally, so that gcc makes the type
> >>> unsigned on all architectures.
> >>>
> >>> This will break things in some places and fix things in others, so this
> >>> will likely cause a bit of churn while reconciling the type misuse.
> >>>
> >>
> >> There is an interesting fallout: When running the m68k:q800 qemu emulation,
> >> there are lots of warning backtraces.
> >>
> >> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> >> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> >> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
> >>
> >> and so on for pretty much every entry in the alg_test_descs[] array.
> >>
> >> Bisect points to this patch, and reverting it fixes the problem.
> >>
> >> It looks like the problem is that arch/m68k/include/asm/string.h
> >> uses "char res" to store the result of strcmp(), and char is now
> >> unsigned - meaning strcmp() will now never return a value < 0.
> >> Effectively that means that strcmp() is broken on m68k if
> >> CONFIG_COLDFIRE=n.
> >>
> >> The fix is probably quite simple.
> >>
> >> diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
> >> index f759d944c449..b8f4ae19e8f6 100644
> >> --- a/arch/m68k/include/asm/string.h
> >> +++ b/arch/m68k/include/asm/string.h
> >> @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
> >>  #define __HAVE_ARCH_STRCMP
> >>  static inline int strcmp(const char *cs, const char *ct)
> >>  {
> >> -       char res;
> >> +       signed char res;
> >>
> >>         asm ("\n"
> >>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
> >>
> >> Does that make sense ? If so I can send a patch.
> > 
> > Thanks, been there, done that
> > https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org
> 
> Well, looks like that would still leave strcmp() buggy, you can't
> represent all possible differences between two char values (signed or
> not) in an 8-bit quantity. So any implementation based on returning the
> first non-zero value of *a - *b must store that intermediate value in
> something wider. Otherwise you'll get -128 from strcmp("\x40", "\xc0"),
> but _also_ -128 when you do strcmp("\xc0", "\x40"), which is obviously
> bogus.
> 

The above assumes an unsigned char as input to strcmp(). I consider that
a hypothetical problem because "comparing" strings with upper bits
set doesn't really make sense in practice (How does one compare Günter
against Gunter ? And how about Gǖnter ?). On the other side, the problem
observed here is real and immediate.

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:29       ` Rasmus Villemoes
  2022-12-21 15:56         ` Guenter Roeck
@ 2022-12-21 16:57         ` Geert Uytterhoeven
  1 sibling, 0 replies; 18+ messages in thread
From: Geert Uytterhoeven @ 2022-12-21 16:57 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Guenter Roeck, Jason A. Donenfeld, linux-kernel, linux-kbuild,
	linux-arch, linux-toolchains, Masahiro Yamada, Kees Cook,
	Andrew Morton, Linus Torvalds, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

Hi Rasmus,

On Wed, Dec 21, 2022 at 4:29 PM Rasmus Villemoes
<rasmus.villemoes@prevas.dk> wrote:
> On 21/12/2022 16.05, Geert Uytterhoeven wrote:
> > On Wed, Dec 21, 2022 at 3:54 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >> On Wed, Oct 19, 2022 at 02:30:34PM -0600, Jason A. Donenfeld wrote:
> >>> Recently, some compile-time checking I added to the clamp_t family of
> >>> functions triggered a build error when a poorly written driver was
> >>> compiled on ARM, because the driver assumed that the naked `char` type
> >>> is signed, but ARM treats it as unsigned, and the C standard says it's
> >>> architecture-dependent.
> >>>
> >>> I doubt this particular driver is the only instance in which
> >>> unsuspecting authors make assumptions about `char` with no `signed` or
> >>> `unsigned` specifier. We were lucky enough this time that that driver
> >>> used `clamp_t(char, negative_value, positive_value)`, so the new
> >>> checking code found it, and I've sent a patch to fix it, but there are
> >>> likely other places lurking that won't be so easily unearthed.
> >>>
> >>> So let's just eliminate this particular variety of heisensign bugs
> >>> entirely. Set `-funsigned-char` globally, so that gcc makes the type
> >>> unsigned on all architectures.
> >>>
> >>> This will break things in some places and fix things in others, so this
> >>> will likely cause a bit of churn while reconciling the type misuse.
> >>>
> >>
> >> There is an interesting fallout: When running the m68k:q800 qemu emulation,
> >> there are lots of warning backtraces.
> >>
> >> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> >> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha12,aes)' before 'adiantum(xchacha20,aes)'
> >> ------------[ cut here ]------------
> >> WARNING: CPU: 0 PID: 23 at crypto/testmgr.c:5724 alg_test.part.0+0x7c/0x326
> >> testmgr: alg_test_descs entries in wrong order: 'adiantum(xchacha20,aes)' before 'aegis128'
> >>
> >> and so on for pretty much every entry in the alg_test_descs[] array.
> >>
> >> Bisect points to this patch, and reverting it fixes the problem.
> >>
> >> It looks like the problem is that arch/m68k/include/asm/string.h
> >> uses "char res" to store the result of strcmp(), and char is now
> >> unsigned - meaning strcmp() will now never return a value < 0.
> >> Effectively that means that strcmp() is broken on m68k if
> >> CONFIG_COLDFIRE=n.
> >>
> >> The fix is probably quite simple.
> >>
> >> diff --git a/arch/m68k/include/asm/string.h b/arch/m68k/include/asm/string.h
> >> index f759d944c449..b8f4ae19e8f6 100644
> >> --- a/arch/m68k/include/asm/string.h
> >> +++ b/arch/m68k/include/asm/string.h
> >> @@ -42,7 +42,7 @@ static inline char *strncpy(char *dest, const char *src, size_t n)
> >>  #define __HAVE_ARCH_STRCMP
> >>  static inline int strcmp(const char *cs, const char *ct)
> >>  {
> >> -       char res;
> >> +       signed char res;
> >>
> >>         asm ("\n"
> >>                 "1:     move.b  (%0)+,%2\n"     /* get *cs */
> >>
> >> Does that make sense ? If so I can send a patch.
> >
> > Thanks, been there, done that
> > https://lore.kernel.org/all/bce014e60d7b1a3d1c60009fc3572e2f72591f21.1671110959.git.geert@linux-m68k.org
>
> Well, looks like that would still leave strcmp() buggy, you can't
> represent all possible differences between two char values (signed or
> not) in an 8-bit quantity. So any implementation based on returning the
> first non-zero value of *a - *b must store that intermediate value in
> something wider. Otherwise you'll get -128 from strcmp("\x40", "\xc0"),
> but _also_ -128 when you do strcmp("\xc0", "\x40"), which is obviously
> bogus.

So we have https://lore.kernel.org/all/87bko3ia88.fsf@igel.home ;-)

And the other issue is m68k strcmp() calls being dropped by the
optimizer, cfr. the discussion in
https://lore.kernel.org/all/b673f98db7d14d53a6e1a1957ef81741@AcuMS.aculab.com

> I recently fixed that long-standing bug in U-Boot's strcmp() and a
> similar one in nolibc in the linux tree. I wonder how many more
> instances exist.

Thanks, commit fb63362c63c7aeac ("lib: fix buggy strcmp and strncmp") in
v2023.01-rc1, which is not yet in a released version.
(and in plain C, not in asm ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:56         ` Guenter Roeck
@ 2022-12-21 17:06           ` Linus Torvalds
  2022-12-21 17:19             ` Guenter Roeck
  2022-12-22 10:41             ` David Laight
  2022-12-21 17:49           ` Andreas Schwab
  1 sibling, 2 replies; 18+ messages in thread
From: Linus Torvalds @ 2022-12-21 17:06 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 7:56 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> The above assumes an unsigned char as input to strcmp(). I consider that
> a hypothetical problem because "comparing" strings with upper bits
> set doesn't really make sense in practice (How does one compare Günter
> against Gunter ? And how about Gǖnter ?). On the other side, the problem
> observed here is real and immediate.

POSIX does actually specify "Günter" vs "Gunter".

The way strcmp is supposed to work is to return the sign of the
difference between the byte values ("unsigned char").

But that sign has to be computed in 'int', not in 'signed char'.

So yes, the m68k implementation is broken regardless, but with a
signed char it just happened to work for the US-ASCII case that the
crypto case tested.

I think the real fix is to just remove that broken implementation
entirely, and rely on the generic one.

I'll commit that, and see what happens.

               Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 17:06           ` Linus Torvalds
@ 2022-12-21 17:19             ` Guenter Roeck
  2022-12-21 18:46               ` Linus Torvalds
  2022-12-22 10:41             ` David Laight
  1 sibling, 1 reply; 18+ messages in thread
From: Guenter Roeck @ 2022-12-21 17:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 09:06:41AM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2022 at 7:56 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > The above assumes an unsigned char as input to strcmp(). I consider that
> > a hypothetical problem because "comparing" strings with upper bits
> > set doesn't really make sense in practice (How does one compare Günter
> > against Gunter ? And how about Gǖnter ?). On the other side, the problem
> > observed here is real and immediate.
> 
> POSIX does actually specify "Günter" vs "Gunter".
> 
> The way strcmp is supposed to work is to return the sign of the
> difference between the byte values ("unsigned char").
> 
> But that sign has to be computed in 'int', not in 'signed char'.
> 
> So yes, the m68k implementation is broken regardless, but with a
> signed char it just happened to work for the US-ASCII case that the
> crypto case tested.
> 

I understand. I just prefer a known limited breakage to completely
broken code.

> I think the real fix is to just remove that broken implementation
> entirely, and rely on the generic one.

Perfectly fine with me.

Thanks,
Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 15:56         ` Guenter Roeck
  2022-12-21 17:06           ` Linus Torvalds
@ 2022-12-21 17:49           ` Andreas Schwab
  1 sibling, 0 replies; 18+ messages in thread
From: Andreas Schwab @ 2022-12-21 17:49 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Linus Torvalds,
	Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

On Dez 21 2022, Guenter Roeck wrote:

> The above assumes an unsigned char as input to strcmp().

That's how strcmp is defined.

See <https://lore.kernel.org/all/87bko3ia88.fsf@igel.home>

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 17:19             ` Guenter Roeck
@ 2022-12-21 18:46               ` Linus Torvalds
  2022-12-21 19:08                 ` Linus Torvalds
                                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Linus Torvalds @ 2022-12-21 18:46 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 9:19 AM Guenter Roeck <linux@roeck-us.net> wrote:
>
> On Wed, Dec 21, 2022 at 09:06:41AM -0800, Linus Torvalds wrote:
> >
> > I think the real fix is to just remove that broken implementation
> > entirely, and rely on the generic one.
>
> Perfectly fine with me.

That got pushed out as commit 7c0846125358 ("m68k: remove broken
strcmp implementation") but it's obviously entirely untested. I don't
do m68k cross-compiles, much less boot tests.

Just FYI for everybody - I may have screwed something up for some very
non-obvious reason.

But it looked very obvious indeed, and I hate having buggy code that
is architecture-specific when we have generic code that isn't buggy.

                   Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 18:46               ` Linus Torvalds
@ 2022-12-21 19:08                 ` Linus Torvalds
  2022-12-21 21:01                 ` Guenter Roeck
  2022-12-22 13:05                 ` Geert Uytterhoeven
  2 siblings, 0 replies; 18+ messages in thread
From: Linus Torvalds @ 2022-12-21 19:08 UTC (permalink / raw)
  To: Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 10:46 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But it looked very obvious indeed, and I hate having buggy code that
> is architecture-specific when we have generic code that isn't buggy.

Side note: we have an x86-64 implementation that looks fine (but not
really noticeably better than the generic one) that is based on the
'return subtraction' model. But it seems to get it right.

And we have a 32-bit x86 assembly thing that is based on 'rep scasb',
that then uses the carry bit to also get things right.

That 32-bit asm goes back to Linux 0.01 (with some changes since to
use "sbbl+or" instead of a conditional neg). I was playing around a
lot with the 'rep' instructions back when, since it was all part of
"learn the instruction set" for me.

Both of them should probably be removed as pointless too, but they
don't seem actively buggy.

               Linus

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 18:46               ` Linus Torvalds
  2022-12-21 19:08                 ` Linus Torvalds
@ 2022-12-21 21:01                 ` Guenter Roeck
  2022-12-22 13:05                 ` Geert Uytterhoeven
  2 siblings, 0 replies; 18+ messages in thread
From: Guenter Roeck @ 2022-12-21 21:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

On Wed, Dec 21, 2022 at 10:46:08AM -0800, Linus Torvalds wrote:
> On Wed, Dec 21, 2022 at 9:19 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > On Wed, Dec 21, 2022 at 09:06:41AM -0800, Linus Torvalds wrote:
> > >
> > > I think the real fix is to just remove that broken implementation
> > > entirely, and rely on the generic one.
> >
> > Perfectly fine with me.
> 
> That got pushed out as commit 7c0846125358 ("m68k: remove broken
> strcmp implementation") but it's obviously entirely untested. I don't
> do m68k cross-compiles, much less boot tests.
> 
> Just FYI for everybody - I may have screwed something up for some very
> non-obvious reason.
> 
No worries:

Build reference: msi-fixes-6.2-1-2644-g0a924817d2ed

Building mcf5208evb:m5208:m5208evb_defconfig:initrd ... running ..... passed
Building q800:m68040:mac_defconfig:initrd ... running ..... passed
Building q800:m68040:mac_defconfig:rootfs ... running ..... passed

Guenter

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 17:06           ` Linus Torvalds
  2022-12-21 17:19             ` Guenter Roeck
@ 2022-12-22 10:41             ` David Laight
       [not found]               ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
  1 sibling, 1 reply; 18+ messages in thread
From: David Laight @ 2022-12-22 10:41 UTC (permalink / raw)
  To: 'Linus Torvalds', Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

From: Linus Torvalds
> Sent: 21 December 2022 17:07
> 
> On Wed, Dec 21, 2022 at 7:56 AM Guenter Roeck <linux@roeck-us.net> wrote:
> >
> > The above assumes an unsigned char as input to strcmp(). I consider that
> > a hypothetical problem because "comparing" strings with upper bits
> > set doesn't really make sense in practice (How does one compare Günter
> > against Gunter ? And how about Gǖnter ?). On the other side, the problem
> > observed here is real and immediate.
> 
> POSIX does actually specify "Günter" vs "Gunter".
> 
> The way strcmp is supposed to work is to return the sign of the
> difference between the byte values ("unsigned char").
> 
> But that sign has to be computed in 'int', not in 'signed char'.
> 
> So yes, the m68k implementation is broken regardless, but with a
> signed char it just happened to work for the US-ASCII case that the
> crypto case tested.
> 
> I think the real fix is to just remove that broken implementation
> entirely, and rely on the generic one.

I wonder how much slower it is - m68k is likely to be microcoded
and I don't think instruction timings are actually available.
The fastest version probably uses subx (with carry) to generate
0/-1 and leaves +delta for the other result - but getting the
compares and branches in the right order is hard.

I believe some of the other m68k asm functions are also missing
the "memory" 'clobber' and so could get mis-optimised.
While I can write (or rather have written) m68k asm I don't have
a compiler.

I also suspect that any x86 code that uses 'rep scas' is going
to be slow on anything modern.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-21 18:46               ` Linus Torvalds
  2022-12-21 19:08                 ` Linus Torvalds
  2022-12-21 21:01                 ` Guenter Roeck
@ 2022-12-22 13:05                 ` Geert Uytterhoeven
  2 siblings, 0 replies; 18+ messages in thread
From: Geert Uytterhoeven @ 2022-12-22 13:05 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Guenter Roeck, Rasmus Villemoes, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

Hi Linus,

On Wed, Dec 21, 2022 at 7:46 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Dec 21, 2022 at 9:19 AM Guenter Roeck <linux@roeck-us.net> wrote:
> > On Wed, Dec 21, 2022 at 09:06:41AM -0800, Linus Torvalds wrote:
> > > I think the real fix is to just remove that broken implementation
> > > entirely, and rely on the generic one.
> >
> > Perfectly fine with me.
>
> That got pushed out as commit 7c0846125358 ("m68k: remove broken
> strcmp implementation") but it's obviously entirely untested. I don't
> do m68k cross-compiles, much less boot tests.
>
> Just FYI for everybody - I may have screwed something up for some very
> non-obvious reason.
>
> But it looked very obvious indeed, and I hate having buggy code that
> is architecture-specific when we have generic code that isn't buggy.

Thank you for being proactive!
It works fine (and slightly reduced kernel size, too ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
       [not found]               ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
@ 2022-12-24  9:47                 ` Geert Uytterhoeven
  2022-12-30 11:39                 ` David Laight
  1 sibling, 0 replies; 18+ messages in thread
From: Geert Uytterhoeven @ 2022-12-24  9:47 UTC (permalink / raw)
  To: Holger Lubitz
  Cc: David Laight, Linus Torvalds, Guenter Roeck, Rasmus Villemoes,
	Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

Hi Holger,

On Sat, Dec 24, 2022 at 10:34 AM Holger Lubitz
<holger.lubitz@t-online.de> wrote:
> On Thu, 2022-12-22 at 10:41 +0000, David Laight wrote:
> > I wonder how much slower it is - m68k is likely to be microcoded
> > and I don't think instruction timings are actually available.
>
> Not sure if these are in any way official, but
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68030timing.HTML
>
> (There's also
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68000timing.HTML
> but that is probably only interesting to demo coders by now)

Yes, instruction timings are available.  Unlike for e.g. x86, there
is only a very limited number of parts to consider.

> > I believe some of the other m68k asm functions are also missing
> > the "memory" 'clobber' and so could get mis-optimised.
>
> In which case would that happen? This function doesn't clobber memory
> and its result does get used. If gcc mistakenly thinks the parameters
> haven't changed and uses a previously cached result, wouldn't that
> apply to a C function too?

For a pure C inline function, the compiler knows exactly what it does.

For an external C function, the compiler assumes all odds are off.

For inline asm, the compiler doesn't know what happens with (the data
pointed to by) the pointers, unless that's described in the constraints.
We do have some inline asm that has "*ptr" in the constraints, but
that applies to a single value, not to an array.  And in case of
strings, the size of the array is not known without looking for the
zero-terminator.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v2] kbuild: treat char as always unsigned
       [not found]               ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
  2022-12-24  9:47                 ` Geert Uytterhoeven
@ 2022-12-30 11:39                 ` David Laight
  2022-12-30 13:13                   ` David Laight
  2023-01-02  8:29                   ` Geert Uytterhoeven
  1 sibling, 2 replies; 18+ messages in thread
From: David Laight @ 2022-12-30 11:39 UTC (permalink / raw)
  To: 'Holger Lubitz', 'Linus Torvalds', Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

From: Holger Lubitz
> Sent: 24 December 2022 09:34
> 
> On Thu, 2022-12-22 at 10:41 +0000, David Laight wrote:
> > I wonder how much slower it is - m68k is likely to be microcoded
> > and I don't think instruction timings are actually available.
> 
> Not sure if these are in any way official, but
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68030timing.HTML

I thought about that some more and remember seeing memory timings
on a logic analyser - and getting timings that (more or less)
implied sequential execution limited by the obvious memory (cache)
accesses.

The microcoding is more apparent in the large mid-instruction
interrupt stack frames - eg for page faults.


> 
> (There's also
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68000timing.HTML
> but that is probably only interesting to demo coders by now)
> 
> > The fastest version probably uses subx (with carry) to generate
> > 0/-1 and leaves +delta for the other result - but getting the
> > compares and branches in the right order is hard.
> 
> Guess it must have been over 20 years since I wrote any 68k asm, but
> now I actually ended up installing Debian on qemu to experiment.
> 
> There are two interesting differences between 68k and x86 that can be
> useful here: Unlike x86, MOV on 68k sets the flags. And also, subx
> differs from sbb in that it resets the zero flag on a non-zero result,
> but does not set it on a zero result. So if it is set, it must have
> been set before.
> 
> Here are the two functions I came up with (tested only stand-alone, not
> in a kernel build. Also no benchmarks because this 68040 is only
> emulated)
> #1 (optimized for minimum instruction count in loop,
>     68k + Coldfire ISA_B)
> 
> int strcmp1(const char *cs, const char *ct)
> {
>         int res;
> 
>         asm ("\n"
>                 "1: move.b  (%0)+,%2\n"  /* get *cs */
>                 "   jeq     2f\n"        /* end of first string? */
>                 "   cmp.b   (%1)+,%2\n"  /* compare *ct */
>                 "   jeq     1b\n"        /* if equal, continue */
>                 "   jra     3f\n"        /* else skip to tail */
>                 "2: cmp.b   (%1)+,%2\n"  /* compare one last byte */
>                 "3: subx.l  %2, %2\n"    /* -1 if borrow, 0 if not */
>                 "   jls     4f\n"        /* if set, z is from sub.b */

The subx will set Z unless C was set.
So that doesn't seem right.

>                 "   moveq.l #1, %2\n"    /* 1 if !borrow  */
>                 "4:"
>                 : "+a" (cs), "+a" (ct), "=d" (res));
>         return res;
> }

I think this should work:
(But the jc might need to be jnc.)

                 "   moveq.l #0,%2\n"     /* zero high bits of result */
                 "1: move.b  (%1)+,%2\n"  /* get *ct */
                 "   jeq     2f\n"        /* end of second string? */
                 "   cmp.b   (%0)+,%2\n"  /* compare *cs */
                 "   jeq     1b\n"        /* if equal, continue */
                 "   jc      4f           /* return +ve */
                 "   moveq.l #-1, %2\n"   /* return -ve */
                 "   jra     4f\n"
                 "2: move.b  (%0),%2\n"   /* check for matching strings */
                 "4:"


> #2 (optimized for minimum code size,
>     Coldfire ISA_A compatible)
> 
> int strcmp2(const char *cs, const char *ct)
> {
>         int res = 0, tmp = 0;
> 
>         asm ("\n"
>                 "1: move.b (%0)+,%2\n" /* get *cs */
>                 "   move.b (%1)+,%3\n" /* get *ct */
>                 "   subx.l %3,%2\n"    /* compare a byte */
>                 "   jeq    2f\n"       /* both inputs were zero */

That doesn't seem right.
Z will be set if either *ct is zero or the bytes match.

>                 "   tst.l  %2\n"       /* check result */

This only sets Z when it was already set by the subx.

>                 "   jeq    1b\n"       /* if zero, continue */
>                 "2:"
>                 : "+a" (cs), "+a" (ct), "+d" (res), "+d" (tmp));
>         return res;
> }
> 
> However, this one needs res and tmp to be set to zero, because we read
> only bytes (no automatic zero-extend on 68k), but then do a long
> operation on them. Coldfire ISA_A dropped cmpb, it only reappeared in
> ISA_B.
> 
> So the real instruction count is likely to be two more, unless gcc
> happens to have one or two zeros it can reuse.
> 
> > I believe some of the other m68k asm functions are also missing
> > the "memory" 'clobber' and so could get mis-optimised.
> 
> In which case would that happen? This function doesn't clobber memory
> and its result does get used. If gcc mistakenly thinks the parameters
> haven't changed and uses a previously cached result, wouldn't that
> apply to a C function too?

You need a memory 'clobber' on anything that READS memory as well
as writes it.

> > While I can write (or rather have written) m68k asm I don't have
> > a compiler.
> 
> Well, I now have an emulated Quadra 800 running Debian 68k.(Getting the
> emulated networking to work reliably was a bit problematic, though. But
> now it runs Kernel 6.0) qemu could emulate Coldfire too, but I am not
> sure where I would find a distribution for that.
> 
> I did not attach a patch because it seems already to be decided that
> the function is gone. But should anyone still want to include one (or
> both) of these functions, just give credit to me and I'm fine.

Thinking further the fastest strcmp() probably uses big-endian word compares
with a check for a zero byte.
Especially on 64 bit systems that support misaligned loads.
But I'd need to think hard about the actual details.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-30 11:39                 ` David Laight
@ 2022-12-30 13:13                   ` David Laight
  2023-01-02  8:29                   ` Geert Uytterhoeven
  1 sibling, 0 replies; 18+ messages in thread
From: David Laight @ 2022-12-30 13:13 UTC (permalink / raw)
  To: David Laight, 'Holger Lubitz', 'Linus Torvalds',
	Guenter Roeck
  Cc: Rasmus Villemoes, Geert Uytterhoeven, Jason A. Donenfeld,
	linux-kernel, linux-kbuild, linux-arch, linux-toolchains,
	Masahiro Yamada, Kees Cook, Andrew Morton, Andy Shevchenko,
	Greg Kroah-Hartman, linux-m68k

....
> > int strcmp1(const char *cs, const char *ct)
> > {
> >         int res;
> >
> >         asm ("\n"
> >                 "1: move.b  (%0)+,%2\n"  /* get *cs */
> >                 "   jeq     2f\n"        /* end of first string? */
> >                 "   cmp.b   (%1)+,%2\n"  /* compare *ct */
> >                 "   jeq     1b\n"        /* if equal, continue */
> >                 "   jra     3f\n"        /* else skip to tail */
> >                 "2: cmp.b   (%1)+,%2\n"  /* compare one last byte */
> >                 "3: subx.l  %2, %2\n"    /* -1 if borrow, 0 if not */
> >                 "   jls     4f\n"        /* if set, z is from sub.b */
> 
> The subx will set Z unless C was set.
> So that doesn't seem right.

Clearly my brain was asleep earlier.
subx will clear Z not set it.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] kbuild: treat char as always unsigned
  2022-12-30 11:39                 ` David Laight
  2022-12-30 13:13                   ` David Laight
@ 2023-01-02  8:29                   ` Geert Uytterhoeven
  1 sibling, 0 replies; 18+ messages in thread
From: Geert Uytterhoeven @ 2023-01-02  8:29 UTC (permalink / raw)
  To: David Laight
  Cc: Holger Lubitz, Linus Torvalds, Guenter Roeck, Rasmus Villemoes,
	Jason A. Donenfeld, linux-kernel, linux-kbuild, linux-arch,
	linux-toolchains, Masahiro Yamada, Kees Cook, Andrew Morton,
	Andy Shevchenko, Greg Kroah-Hartman, linux-m68k

Hi David,

On Fri, Dec 30, 2022 at 12:39 PM David Laight <David.Laight@aculab.com> wrote:
> Thinking further the fastest strcmp() probably uses big-endian word compares
> with a check for a zero byte.
> Especially on 64 bit systems that support misaligned loads.
> But I'd need to think hard about the actual details.

arch/arc/lib/strcmp-archs.S
arch/csky/abiv2/strcmp.S

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2023-01-02  8:30 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Y1BcpXAjR4tmV6RQ@zx2c4.com>
     [not found] ` <20221019203034.3795710-1-Jason@zx2c4.com>
2022-12-21 14:53   ` [PATCH v2] kbuild: treat char as always unsigned Guenter Roeck
2022-12-21 15:05     ` Geert Uytterhoeven
2022-12-21 15:23       ` Guenter Roeck
2022-12-21 15:29       ` Rasmus Villemoes
2022-12-21 15:56         ` Guenter Roeck
2022-12-21 17:06           ` Linus Torvalds
2022-12-21 17:19             ` Guenter Roeck
2022-12-21 18:46               ` Linus Torvalds
2022-12-21 19:08                 ` Linus Torvalds
2022-12-21 21:01                 ` Guenter Roeck
2022-12-22 13:05                 ` Geert Uytterhoeven
2022-12-22 10:41             ` David Laight
     [not found]               ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
2022-12-24  9:47                 ` Geert Uytterhoeven
2022-12-30 11:39                 ` David Laight
2022-12-30 13:13                   ` David Laight
2023-01-02  8:29                   ` Geert Uytterhoeven
2022-12-21 17:49           ` Andreas Schwab
2022-12-21 16:57         ` Geert Uytterhoeven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).