From: Matteo Croce <mcroce@linux.microsoft.com>
To: David Laight <David.Laight@aculab.com>
Cc: Nick Kossifidis <mick@ics.forth.gr>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Albert Ou <aou@eecs.berkeley.edu>,
	Atish Patra <atish.patra@wdc.com>,
	Emil Renner Berthing <kernel@esmil.dk>,
	Akira Tsukamoto <akira.tsukamoto@gmail.com>,
	Drew Fustini <drew@beagleboard.org>,
	Bin Meng <bmeng.cn@gmail.com>, Guo Ren <guoren@kernel.org>
Subject: Re: [PATCH v3 3/3] riscv: optimized memset
Date: Wed, 23 Jun 2021 03:14:36 +0200
Message-ID: <CAFnufp1XeKM-N1MdWsNpU6NnF-dYUgGXL1W9r_DDWazTMyRHVA@mail.gmail.com>
In-Reply-To: <d0f11655f21243ad983bd24381cdc245@AcuMS.aculab.com>

On Tue, Jun 22, 2021 at 10:38 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Nick Kossifidis
> > Sent: 22 June 2021 02:08
> >
> > On 2021-06-17 18:27, Matteo Croce wrote:
> > > +
> > > +void *__memset(void *s, int c, size_t count)
> > > +{
> > > +   union types dest = { .u8 = s };
> > > +
> > > +   if (count >= MIN_THRESHOLD) {
> > > +           const int bytes_long = BITS_PER_LONG / 8;
> >
> > You could make 'const int bytes_long = BITS_PER_LONG / 8;'
>
> What is wrong with sizeof (long) ?
> ...

Nothing. I guess BITS_PER_LONG is just (sizeof(long) * 8) anyway.
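
A minimal userspace sketch of that equivalence, assuming 8-bit bytes.
SKETCH_BITS_PER_LONG and the sketch_* helpers are hypothetical names
used only for illustration; they are not the kernel's BITS_PER_LONG
macro, which is unavailable outside the kernel tree.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG (hypothetical name). */
#define SKETCH_BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Number of bytes in a long, derived from the bit count as in the patch. */
static size_t sketch_bytes_long(void)
{
	return SKETCH_BITS_PER_LONG / 8;
}

/* Sanity check: with 8-bit bytes, both spellings give the same value. */
static void sketch_check_bytes_long(void)
{
	assert(CHAR_BIT == 8);
	assert(sketch_bytes_long() == sizeof(long));
}
```

If the two agree on every supported configuration, sizeof(long) is
simply the shorter spelling.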

> > > +           unsigned long cu = (unsigned long)c;
> > > +
> > > +           /* Compose an ulong with 'c' repeated 4/8 times */
> > > +           cu |= cu << 8;
> > > +           cu |= cu << 16;
> > > +#if BITS_PER_LONG == 64
> > > +           cu |= cu << 32;
> > > +#endif
> > > +
> >
> > You don't have to create cu here, you'll fill dest buffer with 'c'
> > anyway so after filling up enough 'c's to be able to grab an aligned
> > word full of them from dest, you can just grab that word and keep
> > filling up dest with it.
>
> That will be a lot slower - especially if run on something like x86.
> A write-read of the same size is optimised by the store-load forwarder.
> But the byte write, word read will have to go via the cache.
>
> You can just write:
>         cu = (unsigned long)c * 0x0101010101010101ull;
> and let the compiler sort out the best way to generate the constant.
>

Interesting. I see that most compilers emit an integer multiplication
for this. Is that faster than three shifts and three ORs?

clang on riscv generates even more instructions to create the immediate:

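/* Replicate the low byte of c into every byte of a 64-bit long
 * using three shift/OR pairs (assumes 0 <= c <= 255).
 */
unsigned long repeat_shift(int c)
{
  unsigned long cu = (unsigned long)c;
  cu |= cu << 8;
  cu |= cu << 16;
  cu |= cu << 32;

  return cu;
}

/* Same result with a single multiply by 0x0101010101010101. */
unsigned long repeat_mul(int c)
{
  return (unsigned long)c * 0x0101010101010101ull;
}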

repeat_shift:
  slli a1, a0, 8
  or a0, a0, a1
  slli a1, a0, 16
  or a0, a0, a1
  slli a1, a0, 32
  or a0, a0, a1
  ret

repeat_mul:
  lui a1, 4112
  addiw a1, a1, 257
  slli a1, a1, 16
  addi a1, a1, 257
  slli a1, a1, 16
  addi a1, a1, 257
  mul a0, a0, a1
  ret
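
A small userspace sketch that checks the two variants above produce the
same word for every byte value. It is illustrative only: the sketch_*
names are hypothetical, it uses uint64_t so the 32-bit shift is always
defined, and it truncates c to unsigned char first (an assumption the
functions above do not make).

```c
#include <assert.h>
#include <stdint.h>

/* Replicate the low byte of c across a 64-bit word with shifts and ORs. */
static uint64_t sketch_repeat_shift(int c)
{
	uint64_t cu = (unsigned char)c;	/* keep only the low byte */

	cu |= cu << 8;
	cu |= cu << 16;
	cu |= cu << 32;
	return cu;
}

/* Same word via one multiply by a constant with 0x01 in every byte. */
static uint64_t sketch_repeat_mul(int c)
{
	return (uint64_t)(unsigned char)c * UINT64_C(0x0101010101010101);
}

/* Returns 1 if both variants agree for all 256 byte values, else 0. */
static int sketch_repeat_variants_agree(void)
{
	for (int c = 0; c < 256; c++)
		if (sketch_repeat_shift(c) != sketch_repeat_mul(c))
			return 0;
	return 1;
}
```

The multiply cannot carry between bytes because each partial product is
below 256, so the results match; which one is faster depends on the
core's multiplier latency and on the cost of materializing the constant.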

> >
> > > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > > +           /* Fill the buffer one byte at a time until the destination
> > > +            * is aligned on a 32/64 bit boundary.
> > > +            */
> > > +           for (; count && dest.uptr % bytes_long; count--)
> >
> > You could reuse & mask here instead of % bytes_long.
> >
> > > +                   *dest.u8++ = c;
> > > +#endif
> >
> > I noticed you also used CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on your
> > memcpy patch, is it worth it here ? To begin with riscv doesn't set it
> > and even if it did we are talking about a loop that will run just a few
> > times to reach the alignment boundary (worst case scenario it'll run 7
> > times), I don't think we gain much here, even for archs that have
> > efficient unaligned access.
>
> With CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS it probably isn't worth
> even checking the alignment.
> While aligning the copy will be quicker for an unaligned buffer they
> almost certainly don't happen often enough to worry about.
> In any case you'd want to do a misaligned word write to the start
> of the buffer - not separate byte writes.
> Provided the buffer is long enough you can also do a misaligned write
> to the end of the buffer before filling from the start.
>

I don't understand this one. A misaligned write here is ~30x slower
than an aligned one, because it gets trapped and emulated in SBI.
How can that be a win?
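
For reference, my reading of the suggestion, as a hedged userspace
sketch: one possibly misaligned word store at each end of the buffer,
then aligned word stores for the interior, using "& (w - 1)" rather
than "%". sketch_memset_overlap is a hypothetical name and this is not
the patch's code. The word stores go through memcpy() so the C stays
well defined; on a strict-alignment target the compiler may lower those
to byte stores, so this only pays off where unaligned access is cheap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill n bytes at s with the low byte of c. */
static void *sketch_memset_overlap(void *s, int c, size_t n)
{
	unsigned char *p = s;
	const size_t w = sizeof(unsigned long);
	/* ~0UL / 0xff is 0x01 repeated in every byte, for any long width. */
	unsigned long cu = (unsigned char)c * (~0UL / 0xff);

	/* Too short for a whole word: plain byte stores. */
	if (n < w) {
		while (n--)
			*p++ = (unsigned char)c;
		return s;
	}

	/* Possibly misaligned word stores covering the first and last w bytes. */
	memcpy(p, &cu, w);
	memcpy(p + n - w, &cu, w);

	/* First word-aligned address after p; it lies within the head store. */
	unsigned char *q = p + (w - ((uintptr_t)p & (w - 1)));
	unsigned char *tail = p + n - w;

	/* Aligned interior stores; the tail store already covers the rest. */
	for (; q < tail; q += w)
		memcpy(q, &cu, w);

	return s;
}
```

The head and tail stores overlap the interior, so no byte loop is needed
for buffers of at least one word.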

> I suspect you may need either barrier() or use a ptr to packed
> to avoid the perverted 'undefined behaviour' fubar.
>

Which UB are you referring to?

Regards,
--
per aspera ad upstream

