From: Akira Tsukamoto <akira.tsukamoto@gmail.com>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>,
	Albert Ou <aou@eecs.berkeley.edu>, Gary Guo <gary@garyguo.net>,
	Nick Hu <nickhu@andestech.com>, Nylon Chen <nylon7@andestech.com>,
	linux-riscv@lists.infradead.org,
	Linux kernel mailing list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 0/1] riscv: better network performance with memcpy, uaccess
Date: Sat, 5 Jun 2021 17:02:44 +0900	[thread overview]
Message-ID: <CACuRN0MV4zNj1rBTnppoSudy98aOj2Pj6Ld1+D8mz0fn8kxGtg@mail.gmail.com> (raw)
In-Reply-To: <mhng-a3a53753-73e5-4676-93d3-33c4b8760283@palmerdabbelt-glaptop>

On Sat, Jun 5, 2021 at 1:19 AM Palmer Dabbelt <palmer@dabbelt.com> wrote:
>
> On Fri, 04 Jun 2021 02:53:33 PDT (-0700), akira.tsukamoto@gmail.com wrote:
> > I am adding a cover letter to explain the history and details, since the
> > improvement comes from combining this with Gary's memcpy patch [1].
> >
> > A comparison of iperf3 benchmark results with Gary's memcpy patch and my
> > uaccess optimization patch applied. All results are from the same base
> > kernel, the same rootfs and the same BeagleV beta board.
> >
> > First (left) column : BeagleV 5.13-rc4 kernel [2]
> > Second column       : same kernel + Palmer's memcpy in C + my uaccess patch [3]
> > Third column        : same kernel + Gary's memcpy + my uaccess patch [4]
> >
> > --- TCP recv ---
> > 686 Mbits/sec  |  700 Mbits/sec  |  904 Mbits/sec
> > 683 Mbits/sec  |  701 Mbits/sec  |  898 Mbits/sec
> > 695 Mbits/sec  |  702 Mbits/sec  |  905 Mbits/sec
> >
> > --- TCP send ---
> > 383 Mbits/sec  |  390 Mbits/sec  |  393 Mbits/sec
> > 384 Mbits/sec  |  393 Mbits/sec  |  392 Mbits/sec
> >
> > --- UDP send ---
> > 307 Mbits/sec  |  358 Mbits/sec  |  402 Mbits/sec
> > 307 Mbits/sec  |  359 Mbits/sec  |  402 Mbits/sec
> >
> > --- UDP recv ---
> > 630 Mbits/sec  |  799 Mbits/sec  |  875 Mbits/sec
> > 730 Mbits/sec  |  796 Mbits/sec  |  873 Mbits/sec
> >
> >
> > The uaccess patch reduces pipeline stalls caused by read-after-write (RAW)
> > hazards by unrolling the loads and stores.
> > The main reason for keeping assembly inside uaccess.S is that
> > __asm_copy_to/from_user() must handle page faults manually inside the
> > functions.
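
For illustration, a minimal C sketch of the unrolling idea (this is not the
actual uaccess.S assembly, which additionally attaches an exception-table
fixup to every user-space access):

#include <stddef.h>
#include <stdint.h>

/*
 * 4-way unrolled word copy: all four loads are issued before the dependent
 * stores, so an in-order pipeline has time to complete each load before its
 * value is consumed, instead of stalling on every load/store pair.
 */
static void copy_words_unrolled(uint64_t *dst, const uint64_t *src,
				size_t nwords)
{
	size_t i;

	for (i = 0; i + 4 <= nwords; i += 4) {
		uint64_t a = src[i + 0];
		uint64_t b = src[i + 1];
		uint64_t c = src[i + 2];
		uint64_t d = src[i + 3];

		dst[i + 0] = a;
		dst[i + 1] = b;
		dst[i + 2] = c;
		dst[i + 3] = d;
	}
	for (; i < nwords; i++)		/* remaining tail words */
		dst[i] = src[i];
}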
> >
> > The result above combines Gary's memcpy, which speeds things up by
> > reducing the S-mode and M-mode switching, with my uaccess change, which
> > reduces pipeline stalls when user space makes syscalls with large amounts
> > of data.
> >
> > We had a discussion with Palmer about improving network performance on the
> > BeagleV beta board.
> >
> > Palmer suggested using C-based string routines that check for unaligned
> > addresses, use an 8-byte aligned copy when both src and dest are aligned,
> > and otherwise fall back to the current copy function.
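
Roughly, such a routine might look like the following C sketch (hypothetical
function name, not the actual proposal):

#include <stddef.h>
#include <stdint.h>

/*
 * Copy 8 bytes at a time only when both src and dest are 8-byte aligned;
 * otherwise, and for the tail, fall back to the plain byte copy.
 */
static void *memcpy_aligned_sketch(void *dest, const void *src, size_t n)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	if ((((uintptr_t)d | (uintptr_t)s) & 7) == 0) {
		while (n >= 8) {
			*(uint64_t *)d = *(const uint64_t *)s;
			d += 8;
			s += 8;
			n -= 8;
		}
	}
	while (n--)
		*d++ = *s++;

	return dest;
}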
> >
> > Gary's assembly version of memcpy improves performance by never making
> > unaligned accesses across a 64-bit boundary; instead it reads with aligned
> > accesses at an offset and shifts the data into place. This matters because
> > every misaligned access is trapped and handed to OpenSBI in M-mode, so the
> > main speedup comes from avoiding the S-mode (kernel) / M-mode (OpenSBI)
> > switching.
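
The core of that technique, as a rough C sketch (little-endian, assuming the
source is misaligned by 0 < off < 8 bytes, and ignoring the boundary handling
the real assembly has to do):

#include <stddef.h>
#include <stdint.h>

static void copy_shifted_sketch(uint64_t *dst, const unsigned char *src,
				size_t nwords)
{
	size_t off = (uintptr_t)src & 7;			/* 1..7 */
	const uint64_t *p = (const uint64_t *)(src - off);	/* aligned base */
	unsigned int lo = 8 * off, hi = 64 - lo;
	uint64_t cur = *p++;
	size_t i;

	for (i = 0; i < nwords; i++) {
		uint64_t next = *p++;		/* aligned load only */

		/* rebuild one destination word from two aligned reads */
		dst[i] = (cur >> lo) | (next << hi);
		cur = next;
	}
}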
> >
> > Processing network packets requires many unaligned accesses for the packet
> > headers, and the header format cannot be redesigned to be aligned.
> > In addition, user applications pass large packet buffers to send/recv() and
> > sendto()/recvfrom() so that fewer calls are needed to read and write the
> > data, which is where these optimizations help.
>
> Makes sense.  I'm still not opposed to moving to a C version, but it'd
> need to be a fairly complicated one.  I think having a fast C memcpy
> would likely benefit a handful of architectures, as everything we're
> talking about is an algorithmic improvement that can be expressed in C.
>
> Given that the simple memcpy doesn't perform well for your workload, I'm
> fine taking the assembly version.

Thanks for merging them.

I agree that having a fast C memcpy would benefit many architectures.
I will prepare patches for lib/string.c that extend your memcpy and send them
after I finish my other priorities. The current functions in lib/string.c use
a byte-at-a-time copy, while most Linux-capable CPUs have long since moved to
64 bits.
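
For reference, the generic fallback in lib/string.c is essentially a
byte-at-a-time loop along these lines (simplified):

#include <stddef.h>

void *memcpy(void *dest, const void *src, size_t count)
{
	char *tmp = dest;
	const char *s = src;

	while (count--)
		*tmp++ = *s++;
	return dest;
}

A word-based version along the lines of the sketch above should help any
architecture that still falls back to this.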

Akira

>
> Thanks!
>
> >
> > Akira
> >
> > [1] https://lkml.org/lkml/2021/2/16/778
> > [2] https://github.com/mcd500/linux-jh7100/tree/starlight-sdimproved
> > [3] https://github.com/mcd500/linux-jh7100/tree/starlight-sd-palmer-string
> > [4] https://github.com/mcd500/linux-jh7100/tree/starlight-sd-gary
> >
> > Akira Tsukamoto (1):
> >   riscv: prevent pipeline stall in __asm_to/copy_from_user
> >
> >  arch/riscv/lib/uaccess.S | 106 +++++++++++++++++++++++++++------------
> >  1 file changed, 73 insertions(+), 33 deletions(-)


Thread overview: 18+ messages
2021-06-04  9:53 [PATCH 0/1] riscv: better network performance with memcpy, uaccess Akira Tsukamoto
2021-06-04  9:56 ` [PATCH 1/1] riscv: prevent pipeline stall in __asm_to/copy_from_user Akira Tsukamoto
2021-06-08 11:31   ` David Laight
2021-06-12  4:05     ` Palmer Dabbelt
2021-06-12 12:17       ` David Laight
2021-06-16 10:24         ` Akira Tsukamoto
2021-06-16 10:08       ` Akira Tsukamoto
2021-06-04 16:19 ` [PATCH 0/1] riscv: better network performance with memcpy, uaccess Palmer Dabbelt
2021-06-05  8:02   ` Akira Tsukamoto [this message]
