All of lore.kernel.org
 help / color / mirror / Atom feed
From: Gary Guo <gary@garyguo.net>
To: David Laight <David.Laight@ACULAB.COM>
Cc: 'Palmer Dabbelt' <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>,
	"nickhu@andestech.com" <nickhu@andestech.com>,
	"nylon7@andestech.com" <nylon7@andestech.com>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] riscv: fix memmove and optimise memcpy when misalign
Date: Tue, 25 May 2021 15:34:32 +0100	[thread overview]
Message-ID: <20210525153431.0000508d@garyguo.net> (raw)
In-Reply-To: <17637b10e71b41b89126cbb1b2fa61cf@AcuMS.aculab.com>

On Sun, 23 May 2021 17:12:23 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> From: Palmer Dabbelt
> > Sent: 23 May 2021 02:47  
> ...
> > IMO the right way to go here is to just move to C-based string
> > routines, at least until we get to the point where we're seriously
> > optimizing for specific processors.  We went with the C-based
> > string rountines in glibc as part of the upstreaming process and
> > found only some small performance differences when compared to the
> > hand-written assembly, and they're way easier to maintain.

I prefer C versions as well, and actually before commit 04091d6 we are
indeed using the generic C version. The issue is that 04091d6
introduces an assembly version that's very broken. It does not offer
and performance improvement to the C version, and breaks all processors
without hardware misalignment support (yes, firmware is expected to
trap and handle these, but they are painfully slow).

I noticed the issue because I ran Linux on my own firmware and found
that kernel couldn't boot. I didn't implement misalignment emulation at
that time (and just send the trap to the supervisor).

Because 04091d6 is accepted, my assumption is that we need an assembly
version. So I spent some time writing, testing and optimising the
assembly.

> > 
> > IIRC Linux only has trivial C string routines in lib, I think the
> > best way to go about that would be to higher performance versions
> > in there. That will allow other ports to use them.  
> 
> I certainly wonder how much benefit these massively unrolled
> loops have on modern superscaler processors - especially those
> with any form of 'out of order' execution.
> 
> It is often easy to write assembler where all the loop
> control instructions happen in parallel with the memory
> accesses - which cannot be avoided.
> Loop unrolling is so 1970s.
> 
> Sometimes you need to unroll once.
> And maybe interleave the loads and stores.
> But after that you can just be trashing the i-cache.

I didn't introduce the loop unrolling though. The loop unrolled
assembly is there before this patch, and I didn't even change the
unroll factor. I only added a path to handle misaligned case.

There are a lot of diffs because I did made some changes to the
register allocation so that the code is more optimal. I also made a few
cleanups and added a few comments. It might be easier to review if you
apply the patch locally and just look at the file.

- Gary

WARNING: multiple messages have this Message-ID (diff)
From: Gary Guo <gary@garyguo.net>
To: David Laight <David.Laight@ACULAB.COM>
Cc: 'Palmer Dabbelt' <palmer@dabbelt.com>,
	Paul Walmsley <paul.walmsley@sifive.com>,
	"aou@eecs.berkeley.edu" <aou@eecs.berkeley.edu>,
	"nickhu@andestech.com" <nickhu@andestech.com>,
	"nylon7@andestech.com" <nylon7@andestech.com>,
	"linux-riscv@lists.infradead.org"
	<linux-riscv@lists.infradead.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] riscv: fix memmove and optimise memcpy when misalign
Date: Tue, 25 May 2021 15:34:32 +0100	[thread overview]
Message-ID: <20210525153431.0000508d@garyguo.net> (raw)
In-Reply-To: <17637b10e71b41b89126cbb1b2fa61cf@AcuMS.aculab.com>

On Sun, 23 May 2021 17:12:23 +0000
David Laight <David.Laight@ACULAB.COM> wrote:

> From: Palmer Dabbelt
> > Sent: 23 May 2021 02:47  
> ...
> > IMO the right way to go here is to just move to C-based string
> > routines, at least until we get to the point where we're seriously
> > optimizing for specific processors.  We went with the C-based
> > string rountines in glibc as part of the upstreaming process and
> > found only some small performance differences when compared to the
> > hand-written assembly, and they're way easier to maintain.

I prefer C versions as well, and actually before commit 04091d6 we are
indeed using the generic C version. The issue is that 04091d6
introduces an assembly version that's very broken. It does not offer
and performance improvement to the C version, and breaks all processors
without hardware misalignment support (yes, firmware is expected to
trap and handle these, but they are painfully slow).

I noticed the issue because I ran Linux on my own firmware and found
that kernel couldn't boot. I didn't implement misalignment emulation at
that time (and just send the trap to the supervisor).

Because 04091d6 is accepted, my assumption is that we need an assembly
version. So I spent some time writing, testing and optimising the
assembly.

> > 
> > IIRC Linux only has trivial C string routines in lib, I think the
> > best way to go about that would be to higher performance versions
> > in there. That will allow other ports to use them.  
> 
> I certainly wonder how much benefit these massively unrolled
> loops have on modern superscaler processors - especially those
> with any form of 'out of order' execution.
> 
> It is often easy to write assembler where all the loop
> control instructions happen in parallel with the memory
> accesses - which cannot be avoided.
> Loop unrolling is so 1970s.
> 
> Sometimes you need to unroll once.
> And maybe interleave the loads and stores.
> But after that you can just be trashing the i-cache.

I didn't introduce the loop unrolling though. The loop unrolled
assembly is there before this patch, and I didn't even change the
unroll factor. I only added a path to handle misaligned case.

There are a lot of diffs because I did made some changes to the
register allocation so that the code is more optimal. I also made a few
cleanups and added a few comments. It might be easier to review if you
apply the patch locally and just look at the file.

- Gary

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

  reply	other threads:[~2021-05-25 14:34 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-16 22:55 [PATCH] riscv: fix memmove and optimise memcpy when misalign Gary Guo
2021-02-16 22:55 ` Gary Guo
2021-05-13  8:13 ` Bin Meng
2021-05-13  8:13   ` Bin Meng
2021-05-22 22:22 ` Gary Guo
2021-05-22 22:22   ` Gary Guo
2021-05-23  1:47   ` Palmer Dabbelt
2021-05-23  1:47     ` Palmer Dabbelt
2021-05-23 17:12     ` David Laight
2021-05-23 17:12       ` David Laight
2021-05-25 14:34       ` Gary Guo [this message]
2021-05-25 14:34         ` Gary Guo
2021-06-15 13:40         ` Bin Meng
2021-06-15 13:40           ` Bin Meng
2021-06-15 14:08           ` David Laight
2021-06-15 14:08             ` David Laight

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210525153431.0000508d@garyguo.net \
    --to=gary@garyguo.net \
    --cc=David.Laight@ACULAB.COM \
    --cc=aou@eecs.berkeley.edu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-riscv@lists.infradead.org \
    --cc=nickhu@andestech.com \
    --cc=nylon7@andestech.com \
    --cc=palmer@dabbelt.com \
    --cc=paul.walmsley@sifive.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.