From: Matteo Croce <mcroce@linux.microsoft.com>
To: Nick Kossifidis <mick@ics.forth.gr>
Cc: linux-riscv <linux-riscv@lists.infradead.org>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	 Paul Walmsley <paul.walmsley@sifive.com>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	 Albert Ou <aou@eecs.berkeley.edu>,
	Atish Patra <atish.patra@wdc.com>,
	 Emil Renner Berthing <kernel@esmil.dk>,
	Akira Tsukamoto <akira.tsukamoto@gmail.com>,
	 Drew Fustini <drew@beagleboard.org>,
	Bin Meng <bmeng.cn@gmail.com>,
	 David Laight <David.Laight@aculab.com>,
	Guo Ren <guoren@kernel.org>
Subject: Re: [PATCH v3 3/3] riscv: optimized memset
Date: Wed, 23 Jun 2021 02:08:30 +0200
Message-ID: <CAFnufp2w1TGtaBjfTtsBpDatgAtATRZbB4MURV3tLh1fi-W1JQ@mail.gmail.com>
In-Reply-To: <17cd289430f08f2b75b7f04242c646f6@mailhost.ics.forth.gr>

On Tue, Jun 22, 2021 at 3:07 AM Nick Kossifidis <mick@ics.forth.gr> wrote:
>
> On 2021-06-17 18:27, Matteo Croce wrote:
> > +
> > +void *__memset(void *s, int c, size_t count)
> > +{
> > +     union types dest = { .u8 = s };
> > +
> > +     if (count >= MIN_THRESHOLD) {
> > +             const int bytes_long = BITS_PER_LONG / 8;
>
> You could make 'const int bytes_long = BITS_PER_LONG / 8;' and 'const
> int mask = bytes_long - 1;' from your memcpy patch visible to memset as
> well (static const...) and use them here (mask would make more sense to
> be named as word_mask).
>

Will do.

> > +             unsigned long cu = (unsigned long)c;
> > +
> > +             /* Compose an ulong with 'c' repeated 4/8 times */
> > +             cu |= cu << 8;
> > +             cu |= cu << 16;
> > +#if BITS_PER_LONG == 64
> > +             cu |= cu << 32;
> > +#endif
> > +
>
> You don't have to create cu here, you'll fill dest buffer with 'c'
> anyway so after filling up enough 'c's to be able to grab an aligned
> word full of them from dest, you can just grab that word and keep
> filling up dest with it.
>

I tried that, but then I have to fill 8 more bytes one at a time before
the word-wide fill can start.
Also, the machine code needed to generate 'cu' is just 6 instructions on riscv:

slli a5,a0,8
or a5,a5,a0
slli a0,a5,16
or a0,a0,a5
slli a5,a0,32
or a0,a5,a0

so it's probably not worth it.
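
For reference, the pattern construction discussed above can be sketched as a
standalone userspace helper. This is only an illustration, not the patch
itself: the name repeat_byte_ulong() is hypothetical, ULONG_MAX stands in for
the kernel's BITS_PER_LONG check, and narrowing 'c' to unsigned char first is
my assumption about the intended semantics.

```c
#include <limits.h>

/*
 * Hypothetical helper (not from the patch): replicate the low byte of 'c'
 * into every byte of an unsigned long.
 */
static unsigned long repeat_byte_ulong(int c)
{
	/* Assumption: like memset(), only the low 8 bits of 'c' matter. */
	unsigned long cu = (unsigned char)c;

	cu |= cu << 8;   /* 2 copies of the byte */
	cu |= cu << 16;  /* 4 copies of the byte */
#if ULONG_MAX > 0xffffffffUL
	/* Only valid when long is wider than 32 bits. */
	cu |= cu << 32;  /* 8 copies of the byte */
#endif
	return cu;
}
```

Each shift-and-or step doubles the number of copies, which is why three steps
are enough on a 64-bit long.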

> > +#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> > +             /* Fill the buffer one byte at time until the destination
> > +              * is aligned on a 32/64 bit boundary.
> > +              */
> > +             for (; count && dest.uptr % bytes_long; count--)
>
> You could reuse & mask here instead of % bytes_long.
>

Sure, although the generated machine code will be the same.
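
A small sketch of why the two forms are interchangeable: when the divisor is a
power of two, the remainder of an unsigned value equals the value masked with
divisor - 1. The names BYTES_LONG, WORD_MASK, misalign_mod() and
misalign_and() are hypothetical and only serve this example.

```c
#include <stdint.h>

/* Hypothetical stand-ins for bytes_long and word_mask from the review. */
#define BYTES_LONG ((uintptr_t)sizeof(unsigned long))
#define WORD_MASK  (BYTES_LONG - 1)

/* Misalignment of an address computed with the modulo operator. */
static uintptr_t misalign_mod(uintptr_t addr)
{
	return addr % BYTES_LONG;
}

/* Same value computed with a mask; valid because BYTES_LONG is 2^n. */
static uintptr_t misalign_and(uintptr_t addr)
{
	return addr & WORD_MASK;
}
```

Both functions return 0 for a word-aligned address and the number of bytes
past the previous word boundary otherwise.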

> > +                     *dest.u8++ = c;
> > +#endif
>
> I noticed you also used CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS on your
> memcpy patch, is it worth it here ? To begin with riscv doesn't set it
> and even if it did we are talking about a loop that will run just a few
> times to reach the alignment boundary (worst case scenario it'll run 7
> times), I don't think we gain much here, even for archs that have
> efficient unaligned access.

It doesn't _now_, but maybe in the future we will have a CPU which
handles unaligned accesses correctly!
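
To make the trade-off concrete, here is a minimal userspace sketch of the
overall shape being discussed: an optional byte-wise head loop up to the
alignment boundary, word-wide stores, then a byte-wise tail. It is not the
patch: sketch_memset() and the SKETCH_EFFICIENT_UNALIGNED macro are
hypothetical, the MIN_THRESHOLD check is left out, and memcpy() is used for
the word store only to keep the example free of aliasing problems outside the
kernel.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, not the kernel implementation. */
static void *sketch_memset(void *s, int c, size_t count)
{
	const size_t bytes_long = sizeof(unsigned long);
	unsigned char *p = s;
	unsigned long cu = (unsigned char)c;

	/* Repeat the byte across the whole word. */
	cu |= cu << 8;
	cu |= cu << 16;
#if ULONG_MAX > 0xffffffffUL
	cu |= cu << 32;
#endif

#ifndef SKETCH_EFFICIENT_UNALIGNED
	/* Head: at most bytes_long - 1 byte stores to reach alignment. */
	for (; count && ((uintptr_t)p & (bytes_long - 1)); count--)
		*p++ = (unsigned char)c;
#endif

	/* Body: one word per iteration. */
	for (; count >= bytes_long; count -= bytes_long, p += bytes_long)
		memcpy(p, &cu, bytes_long);

	/* Tail: remaining bytes, fewer than one word. */
	while (count--)
		*p++ = (unsigned char)c;

	return s;
}
```

Defining SKETCH_EFFICIENT_UNALIGNED only removes the head loop, so the saving
is bounded by those few byte stores.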

-- 
per aspera ad upstream

