From: Matteo Croce <mcroce@linux.microsoft.com>
To: David Laight <David.Laight@aculab.com>
Cc: Nick Kossifidis <mick@ics.forth.gr>,
"linux-riscv@lists.infradead.org"
<linux-riscv@lists.infradead.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Atish Patra <atish.patra@wdc.com>,
Emil Renner Berthing <kernel@esmil.dk>,
Akira Tsukamoto <akira.tsukamoto@gmail.com>,
Drew Fustini <drew@beagleboard.org>,
Bin Meng <bmeng.cn@gmail.com>, Guo Ren <guoren@kernel.org>
Subject: Re: [PATCH v3 3/3] riscv: optimized memset
Date: Wed, 23 Jun 2021 03:14:36 +0200
Message-ID: <CAFnufp1XeKM-N1MdWsNpU6NnF-dYUgGXL1W9r_DDWazTMyRHVA@mail.gmail.com>
In-Reply-To: <d0f11655f21243ad983bd24381cdc245@AcuMS.aculab.com>
On Tue, Jun 22, 2021 at 10:38 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Nick Kossifidis
> > Sent: 22 June 2021 02:08
> >
> > On 2021-06-17 18:27, Matteo Croce wrote:
> > > +
> > > +void *__memset(void *s, int c, size_t count)
> > > +{
> > > + union types dest = { .u8 = s };
> > > +
> > > + if (count >= MIN_THRESHOLD) {
> > > + const int bytes_long = BITS_PER_LONG / 8;
> >
> > You could make 'const int bytes_long = BITS_PER_LONG / 8;'
>
> What is wrong with sizeof (long) ?
> ...
Nothing, I guess BITS_PER_LONG is just (sizeof(long) * 8) anyway.
> > > + unsigned long cu = (unsigned long)c;
> > > +
> > > + /* Compose an ulong with 'c' repeated 4/8 times */
> > > + cu |= cu << 8;
> > > + cu |= cu << 16;
> > > +#if BITS_PER_LONG == 64
> > > + cu |= cu << 32;
> > > +#endif
> > > +
> >
> > You don't have to create cu here, you'll fill dest buffer with 'c'
> > anyway so after filling up enough 'c's to be able to grab an aligned
> > word full of them from dest, you can just grab that word and keep
> > filling up dest with it.
>
> That will be a lot slower - especially if run on something like x86.
> A write-read of the same size is optimised by the store-load forwarder.
> But the byte write, word read will have to go via the cache.
>
> You can just write:
> cu = (unsigned long)c * 0x0101010101010101ull;
> and let the compiler sort out the best way to generate the constant.
>
Interesting. I see that most compilers emit an integer multiplication;
is that faster than three shifts and three ORs?
clang on riscv generates even more instructions just to build the immediate:

unsigned long repeat_shift(int c)
{
        unsigned long cu = (unsigned long)c;

        cu |= cu << 8;
        cu |= cu << 16;
        cu |= cu << 32;
        return cu;
}

unsigned long repeat_mul(int c)
{
        return (unsigned long)c * 0x0101010101010101ull;
}

repeat_shift:
        slli    a1, a0, 8
        or      a0, a0, a1
        slli    a1, a0, 16
        or      a0, a0, a1
        slli    a1, a0, 32
        or      a0, a0, a1
        ret

repeat_mul:
        lui     a1, 4112
        addiw   a1, a1, 257
        slli    a1, a1, 16
        addi    a1, a1, 257
        slli    a1, a1, 16
        addi    a1, a1, 257
        mul     a0, a0, a1
        ret

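
As a side note, both helpers only replicate the byte correctly when c is
already in the 0..255 range. memset() is specified to use c converted to
unsigned char, so anything above the low byte has to be dropped first.
A minimal userspace sketch of that, which also checks that the two forms
agree. The (unsigned char) casts, the ULONG_MAX / 0xff constant and
check_repeat() are assumptions of this sketch, not part of the patch:

```c
#include <assert.h>
#include <limits.h>

/* Shift/OR form; the cast keeps only the low byte of c. */
static unsigned long repeat_shift(int c)
{
        unsigned long cu = (unsigned char)c;

        cu |= cu << 8;
        cu |= cu << 16;
#if ULONG_MAX > 0xffffffffUL
        cu |= cu << 32;         /* only when long is wider than 32 bits */
#endif
        return cu;
}

/* Multiply form; ULONG_MAX / 0xff is 0x01...01 for 32- and 64-bit long. */
static unsigned long repeat_mul(int c)
{
        return (unsigned char)c * (ULONG_MAX / 0xff);
}

/* Both forms must agree for every possible byte value. */
static void check_repeat(void)
{
        for (int c = 0; c < 256; c++)
                assert(repeat_shift(c) == repeat_mul(c));
}
```

Calling check_repeat() once from a test program is enough to confirm the
equivalence; which form is faster still depends on the core's multiplier.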
> >
> > > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > > + /* Fill the buffer one byte at time until the destination
> > > + * is aligned on a 32/64 bit boundary.
> > > + */
> > > + for (; count && dest.uptr % bytes_long; count--)
> >
> > You could reuse & mask here instead of % bytes_long.
> >
> > > + *dest.u8++ = c;
> > > +#endif
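
A minimal sketch of the mask variant, assuming the word size is a power of
two. WORD_MASK, is_word_aligned(), bytes_to_alignment() and the self-check
are illustrative names, not taken from the patch. For an unsigned operand
and a constant power-of-two divisor, compilers normally emit the same AND
for '%' anyway, so this is mostly about readability:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low address bits that must be zero for a word-aligned pointer. */
#define WORD_MASK ((uintptr_t)sizeof(long) - 1)

/* Non-zero when p sits on a long-sized boundary. */
static int is_word_aligned(const void *p)
{
        return ((uintptr_t)p & WORD_MASK) == 0;
}

/* Number of byte stores needed before p reaches the next boundary. */
static size_t bytes_to_alignment(const void *p)
{
        return (size_t)(-(uintptr_t)p & WORD_MASK);
}

/* Self-check against a buffer that is known to be word aligned. */
static void check_alignment_helpers(void)
{
        unsigned long words[2];
        unsigned char *p = (unsigned char *)words;

        assert(is_word_aligned(p));
        assert(!is_word_aligned(p + 1));
        assert(bytes_to_alignment(p) == 0);
        assert(bytes_to_alignment(p + 1) == sizeof(long) - 1);
}
```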
> >
> > I noticed you also used CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on your
> > memcpy patch, is it worth it here ? To begin with riscv doesn't set it
> > and even if it did we are talking about a loop that will run just a few
> > times to reach the alignment boundary (worst case scenario it'll run 7
> > times), I don't think we gain much here, even for archs that have
> > efficient unaligned access.
>
> With CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS it probably isn't worth
> even checking the alignment.
> While aligning the copy will be quicker for an unaligned buffer they
> almost certainly don't happen often enough to worry about.
> In any case you'd want to do a misaligned word write to the start
> of the buffer - not separate byte writes.
> Provided the buffer is long enough you can also do a misaligned write
> to the end of the buffer before filling from the start.
>
I don't understand this one: a misaligned write here is ~30x slower
than an aligned one, because it gets trapped and emulated in SBI.
How can this be worthwhile?
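
For reference, my reading of the suggestion is roughly the following. It is
only a sketch under assumptions: memset_head_tail() and store_word() are
made-up names, memcpy() stands in for the word store so the C stays well
defined at any address, and it can only pay off where unaligned stores are
cheap in hardware, which is not the case when they trap to SBI:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store one word at any address, aligned or not. */
static void store_word(unsigned char *p, unsigned long v)
{
        memcpy(p, &v, sizeof(v));
}

static void *memset_head_tail(void *s, int c, size_t count)
{
        const size_t word = sizeof(unsigned long);
        const unsigned long cu = (unsigned char)c * (ULONG_MAX / 0xff);
        unsigned char *p = s;
        unsigned char *end;

        assert(s != NULL || count == 0);
        if (count < word) {
                /* Too short for the trick: plain byte stores. */
                while (count--)
                        *p++ = (unsigned char)c;
                return s;
        }
        end = p + count;

        /* One possibly misaligned word at each end covers the ragged edges. */
        store_word(p, cu);
        store_word(end - word, cu);

        /* Round up to the next boundary, then fill the middle with aligned
         * words. Bytes skipped here or left over at the end were already
         * written by the two stores above.
         */
        p = (unsigned char *)(((uintptr_t)p + word - 1) & ~(uintptr_t)(word - 1));
        for (; (size_t)(end - p) >= word; p += word)
                store_word(p, cu);

        return s;
}
```

The byte loop that aligns the destination disappears, at the cost of up to
two unaligned stores per call.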
> I suspect you may need either barrier() or use a ptr to packed
> to avoid the perverted 'undefined behaviour' fubar.
>
Which UB are you referring to?
Regards,
--
per aspera ad upstream