linux-toolchains.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Laight <David.Laight@ACULAB.COM>
To: 'Holger Lubitz' <holger.lubitz@t-online.de>,
	'Linus Torvalds' <torvalds@linux-foundation.org>,
	Guenter Roeck <linux@roeck-us.net>
Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-kbuild@vger.kernel.org" <linux-kbuild@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-toolchains@vger.kernel.org"
	<linux-toolchains@vger.kernel.org>,
	Masahiro Yamada <masahiroy@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"linux-m68k@lists.linux-m68k.org"
	<linux-m68k@lists.linux-m68k.org>
Subject: RE: [PATCH v2] kbuild: treat char as always unsigned
Date: Fri, 30 Dec 2022 11:39:35 +0000	[thread overview]
Message-ID: <357cbd67260040e4bcf17d519aaafdcb@AcuMS.aculab.com> (raw)
In-Reply-To: <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>

From: Holger Lubitz
> Sent: 24 December 2022 09:34
> 
> On Thu, 2022-12-22 at 10:41 +0000, David Laight wrote:
> > I wonder how much slower it is - m68k is likely to be microcoded
> > and I don't think instruction timings are actually available.
> 
> Not sure if these are in any way official, but
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68030timing.HTML

I thought about that some more and remember seeing memory timings
on a logic analyser - and getting timings that (more or less)
implied sequential execution limited by the obvious memory (cache)
accesses.

The microcoding is more apparent in the large mid-instruction
interrupt stack frames - eg for page faults.


> 
> (There's also
> http://oldwww.nvg.ntnu.no/amiga/MC680x0_Sections/mc68000timing.HTML
> but that is probably only interesting to demo coders by now)
> 
> > The fastest version probably uses subx (with carry) to generate
> > 0/-1 and leaves +delta for the other result - but getting the
> > compares and branches in the right order is hard.
> 
> Guess it must have been over 20 years since I wrote any 68k asm, but
> now I actually ended up installing Debian on qemu to experiment.
> 
> There are two interesting differences between 68k and x86 that can be
> useful here: Unlike x86, MOV on 68k sets the flags. And also, subx
> differs from sbb in that it resets the zero flag on a non-zero result,
> but does not set it on a zero result. So if it is set, it must have
> been set before.
> 
> Here are the two functions I came up with (tested only stand-alone, not
> in a kernel build. Also no benchmarks because this 68040 is only
> emulated)
> #1 (optimized for minimum instruction count in loop,
>     68k + Coldfire ISA_B)
> 
> int strcmp1(const char *cs, const char *ct)
> {
>         int res;
> 
>         asm ("\n"
>                 "1: move.b  (%0)+,%2\n"  /* get *cs */
>                 "   jeq     2f\n"        /* end of first string? */
>                 "   cmp.b   (%1)+,%2\n"  /* compare *ct */
>                 "   jeq     1b\n"        /* if equal, continue */
>                 "   jra     3f\n"        /* else skip to tail */
>                 "2: cmp.b   (%1)+,%2\n"  /* compare one last byte */
>                 "3: subx.l  %2, %2\n"    /* -1 if borrow, 0 if not */
>                 "   jls     4f\n"        /* if set, z is from sub.b */

The subx will set Z unless C was set.
So that doesn't seem right.

>                 "   moveq.l #1, %2\n"    /* 1 if !borrow  */
>                 "4:"
>                 : "+a" (cs), "+a" (ct), "=d" (res));
>         return res;
> }

I think this should work:
(But the jc might need to be jnc.)

                 "   moveq.l #0,%2\n"     /* zero high bits of result */
                 "1: move.b  (%1)+,%2\n"  /* get *ct */
                 "   jeq     2f\n"        /* end of second string? */
                 "   cmp.b   (%0)+,%2\n"  /* compare *cs */
                 "   jeq     1b\n"        /* if equal, continue */
                 "   jc      4f           /* return +ve */
                 "   moveq.l #-1, %2\n"   /* return -ve */
                 "   jra     4f\n"
                 "2: move.b  (%0),%2\n"   /* check for matching strings */
                 "4:"


> #2 (optimized for minimum code size,
>     Coldfire ISA_A compatible)
> 
> int strcmp2(const char *cs, const char *ct)
> {
>         int res = 0, tmp = 0;
> 
>         asm ("\n"
>                 "1: move.b (%0)+,%2\n" /* get *cs */
>                 "   move.b (%1)+,%3\n" /* get *ct */
>                 "   subx.l %3,%2\n"    /* compare a byte */
>                 "   jeq    2f\n"       /* both inputs were zero */

That doesn't seem right.
Z will be set if either *ct is zero or the bytes match.

>                 "   tst.l  %2\n"       /* check result */

This only sets Z when it was already set by the subx.

>                 "   jeq    1b\n"       /* if zero, continue */
>                 "2:"
>                 : "+a" (cs), "+a" (ct), "+d" (res), "+d" (tmp));
>         return res;
> }
> 
> However, this one needs res and tmp to be set to zero, because we read
> only bytes (no automatic zero-extend on 68k), but then do a long
> operation on them. Coldfire ISA_A dropped cmpb, it only reappeared in
> ISA_B.
> 
> So the real instruction count is likely to be two more, unless gcc
> happens to have one or two zeros it can reuse.
> 
> > I believe some of the other m68k asm functions are also missing
> > the "memory" 'clobber' and so could get mis-optimised.
> 
> In which case would that happen? This function doesn't clobber memory
> and its result does get used. If gcc mistakenly thinks the parameters
> haven't changed and uses a previously cached result, wouldn't that
> apply to a C function too?

You need a memory 'clobber' on anything that READS memory as well
as writes it.

> > While I can write (or rather have written) m68k asm I don't have
> > a compiler.
> 
> Well, I now have an emulated Quadra 800 running Debian 68k.(Getting the
> emulated networking to work reliably was a bit problematic, though. But
> now it runs Kernel 6.0) qemu could emulate Coldfire too, but I am not
> sure where I would find a distribution for that.
> 
> I did not attach a patch because it seems already to be decided that
> the function is gone. But should anyone still want to include one (or
> both) of these functions, just give credit to me and I'm fine.

Thinking further the fastest strcmp() probably uses big-endian word compares
with a check for a zero byte.
Especially on 64 bit systems that support misaligned loads.
But I'd need to think hard about the actual details.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

  parent reply	other threads:[~2022-12-30 11:39 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-19 16:26 [PATCH] kbuild: treat char as always signed Jason A. Donenfeld
2022-10-19 16:54 ` Segher Boessenkool
2022-10-19 17:14   ` Linus Torvalds
2022-10-19 17:26     ` Linus Torvalds
2022-10-19 18:10       ` Nick Desaulniers
2022-10-19 18:35         ` Linus Torvalds
2022-10-19 19:23           ` Andy Shevchenko
2022-10-19 19:36             ` Linus Torvalds
2022-10-19 17:43     ` Segher Boessenkool
2022-10-19 18:11       ` Linus Torvalds
2022-10-19 18:20         ` Nick Desaulniers
2022-10-19 18:56           ` Linus Torvalds
2022-10-19 19:11             ` Kees Cook
2022-10-19 19:30               ` Linus Torvalds
2022-10-19 20:35                 ` Jason A. Donenfeld
2022-10-20  0:10                   ` Linus Torvalds
2022-10-20  3:11                     ` Jason A. Donenfeld
2022-10-19 20:15             ` Segher Boessenkool
2022-10-19 21:07         ` David Laight
2022-10-19 21:26           ` Segher Boessenkool
2022-10-20 10:41         ` Gabriel Paubert
2022-10-21 22:46           ` Linus Torvalds
2022-10-22  6:06             ` Gabriel Paubert
2022-10-22 18:16               ` Linus Torvalds
2022-10-23 20:23                 ` Gabriel Paubert
2022-10-25 23:00                   ` Kees Cook
2022-10-26  0:04                     ` Jason A. Donenfeld
2022-10-26 15:41                       ` Kees Cook
2022-10-19 19:54 ` Linus Torvalds
2022-10-19 20:23   ` Jason A. Donenfeld
2022-10-19 20:30     ` [PATCH v2] kbuild: treat char as always unsigned Jason A. Donenfeld
2022-10-19 23:56       ` Linus Torvalds
2022-10-20  0:02         ` Jason A. Donenfeld
2022-10-20  0:38           ` Linus Torvalds
2022-10-20  2:59             ` Jason A. Donenfeld
2022-10-20 18:41             ` Kees Cook
2022-10-21  1:01               ` Jason A. Donenfeld
2022-10-20 20:24         ` Segher Boessenkool
2022-10-24  9:24       ` Dan Carpenter
2022-10-24  9:30         ` Dan Carpenter
2022-10-24 16:33           ` Jason A. Donenfeld
2022-10-24 17:10             ` Linus Torvalds
2022-10-24 17:17               ` Jason A. Donenfeld
2022-10-25 19:22                 ` Kalle Valo
2022-10-25 10:16               ` David Laight
2022-10-24 15:17         ` Jason A. Donenfeld
2022-12-21 14:53       ` Guenter Roeck
2022-12-21 15:05         ` Geert Uytterhoeven
2022-12-21 15:23           ` Guenter Roeck
2022-12-21 15:29           ` Rasmus Villemoes
2022-12-21 15:56             ` Guenter Roeck
2022-12-21 17:06               ` Linus Torvalds
2022-12-21 17:19                 ` Guenter Roeck
2022-12-21 18:46                   ` Linus Torvalds
2022-12-21 19:08                     ` Linus Torvalds
2022-12-21 21:01                     ` Guenter Roeck
2022-12-22 13:05                     ` Geert Uytterhoeven
2022-12-22 10:41                 ` David Laight
     [not found]                   ` <f02e0ac7f2d805020a7ba66803aaff3e31b5eeff.camel@t-online.de>
2022-12-24  9:47                     ` Geert Uytterhoeven
2022-12-30 11:39                     ` David Laight [this message]
2022-12-30 13:13                       ` David Laight
2023-01-02  8:29                       ` Geert Uytterhoeven
2022-12-21 17:49               ` Andreas Schwab
2022-12-21 16:57             ` Geert Uytterhoeven
2022-10-19 20:58   ` [PATCH] kbuild: treat char as always signed David Laight
2022-10-26  0:10   ` make ctype ascii only? (was [PATCH] kbuild: treat char as always signed) Rasmus Villemoes
2022-10-26 18:10     ` Linus Torvalds
2022-10-27  7:59       ` Rasmus Villemoes
2022-10-27 18:28         ` Linus Torvalds
     [not found] ` <202210201618.8XhEGsLd-lkp@intel.com>
2022-10-20 16:33   ` [PATCH] kbuild: treat char as always signed Jason A. Donenfeld

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=357cbd67260040e4bcf17d519aaafdcb@AcuMS.aculab.com \
    --to=david.laight@aculab.com \
    --cc=Jason@zx2c4.com \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=geert@linux-m68k.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=holger.lubitz@t-online.de \
    --cc=keescook@chromium.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kbuild@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=linux-toolchains@vger.kernel.org \
    --cc=linux@roeck-us.net \
    --cc=masahiroy@kernel.org \
    --cc=rasmus.villemoes@prevas.dk \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).