From: Palmer Dabbelt <palmer@dabbelt.com>
To: akira.tsukamoto@gmail.com
Cc: Paul Walmsley <paul.walmsley@sifive.com>,
linux@roeck-us.net, geert@linux-m68k.org,
qiuwenbo@kylinos.com.cn, aou@eecs.berkeley.edu,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] riscv: __asm_copy_to-from_user: Improve using word copy if size < 9*SZREG
Date: Mon, 16 Aug 2021 11:09:45 -0700 (PDT)
Message-ID: <mhng-f83b1d51-c006-4b01-830a-0f827f0c56a1@palmerdabbelt-glaptop>
In-Reply-To: <e3e9fb3a-40b1-50f3-23cc-50bfa53baa8d@gmail.com>
On Fri, 30 Jul 2021 06:52:44 PDT (-0700), akira.tsukamoto@gmail.com wrote:
> Reduce the number of slow byte_copy when the size is in between
> 2*SZREG to 9*SZREG by using none unrolled word_copy.
>
> Without it any size smaller than 9*SZREG will be using slow byte_copy
> instead of none unrolled word_copy.
>
> Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
> ---
> arch/riscv/lib/uaccess.S | 46 ++++++++++++++++++++++++++++++++++++----
> 1 file changed, 42 insertions(+), 4 deletions(-)
>
> diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
> index 63bc691cff91..6a80d5517afc 100644
> --- a/arch/riscv/lib/uaccess.S
> +++ b/arch/riscv/lib/uaccess.S
> @@ -34,8 +34,10 @@ ENTRY(__asm_copy_from_user)
> /*
> * Use byte copy only if too small.
> * SZREG holds 4 for RV32 and 8 for RV64
> + * a3 - 2*SZREG is minimum size for word_copy
> + * 1*SZREG for aligning dst + 1*SZREG for word_copy
> */
> - li a3, 9*SZREG /* size must be larger than size in word_copy */
> + li a3, 2*SZREG
> bltu a2, a3, .Lbyte_copy_tail
>
> /*
> @@ -66,9 +68,40 @@ ENTRY(__asm_copy_from_user)
> andi a3, a1, SZREG-1
> bnez a3, .Lshift_copy
>
> +.Lcheck_size_bulk:
> + /*
> + * Evaluate the size if possible to use unrolled.
> + * The word_copy_unlrolled requires larger than 8*SZREG
> + */
> + li a3, 8*SZREG
> + add a4, a0, a3
> + bltu a4, t0, .Lword_copy_unlrolled
> +
> .Lword_copy:
> - /*
> - * Both src and dst are aligned, unrolled word copy
> + /*
> + * Both src and dst are aligned
> + * None unrolled word copy with every 1*SZREG iteration
> + *
> + * a0 - start of aligned dst
> + * a1 - start of aligned src
> + * t0 - end of aligned dst
> + */
> + bgeu a0, t0, .Lbyte_copy_tail /* check if end of copy */
> + addi t0, t0, -(SZREG) /* not to over run */
> +1:
> + REG_L a5, 0(a1)
> + addi a1, a1, SZREG
> + REG_S a5, 0(a0)
> + addi a0, a0, SZREG
> + bltu a0, t0, 1b
> +
> + addi t0, t0, SZREG /* revert to original value */
> + j .Lbyte_copy_tail
> +
> +.Lword_copy_unlrolled:
> + /*
> + * Both src and dst are aligned
> + * Unrolled word copy with every 8*SZREG iteration
> *
> * a0 - start of aligned dst
> * a1 - start of aligned src
> @@ -97,7 +130,12 @@ ENTRY(__asm_copy_from_user)
> bltu a0, t0, 2b
>
> addi t0, t0, 8*SZREG /* revert to original value */
> - j .Lbyte_copy_tail
> +
> + /*
> + * Remaining might large enough for word_copy to reduce slow byte
> + * copy
> + */
> + j .Lcheck_size_bulk
>
> .Lshift_copy:
I'm still not convinced that going all the way to such a large unrolling
factor is a net win, but this at least provides a much smoother cost
curve.
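To make the "cost curve" point concrete, here is a rough model (a sketch for illustration, not code from the patch) of how many bytes end up in the slow byte loop before and after the threshold change. The function names and the `head` parameter (bytes needed to align dst, 0 to SZREG-1) are hypothetical. It assumes src and dst are mutually aligned, so the shift-copy path is not modeled, and it assumes each word loop consumes every full chunk that fits, which is simpler than the strict comparisons in the real loops.

```c
#include <stddef.h>

/*
 * Simplified cost model, NOT a transcription of uaccess.S.
 * size  - total bytes to copy
 * head  - bytes copied one at a time to align dst (0..szreg-1)
 * szreg - register size in bytes: 4 on RV32, 8 on RV64
 * Returns the number of bytes handled by the byte loop.
 */

/* Before the patch: byte copy below 9*SZREG, otherwise 8*SZREG chunks. */
static size_t byte_loop_bytes_before(size_t size, size_t head, size_t szreg)
{
	if (size < 9 * szreg)
		return size;	/* everything goes through byte copy */
	/* alignment head plus whatever does not fill an unrolled chunk */
	return head + (size - head) % (8 * szreg);
}

/* After the patch: byte copy below 2*SZREG, otherwise whole words. */
static size_t byte_loop_bytes_after(size_t size, size_t head, size_t szreg)
{
	if (size < 2 * szreg)
		return size;	/* still too small for word copy */
	/* alignment head plus the sub-word remainder */
	return head + (size - head) % szreg;
}
```

Under this model, with SZREG = 8 and an already aligned dst, a 71-byte copy drops from 71 bytes in the byte loop to 7, which is the cliff just below 9*SZREG that the patch removes.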
That said, this is causing my 32-bit configs to hang. There were a few
conflicts so I may have messed something up, but nothing is jumping out
at me. I've put what I ended up with on a branch. If you have time to
look, that'd be great; if not, I'll take another shot at this when I get
back around to it.
https://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux.git/commit/?h=wip-word_user_copy
Here's the backtrace, though that's probably not all that useful:
[ 0.703694] Unable to handle kernel NULL pointer dereference at virtual address 000005a8
[ 0.704194] Oops [#1]
[ 0.704301] Modules linked in:
[ 0.704463] CPU: 2 PID: 1 Comm: init Not tainted 5.14.0-rc1-00016-g59461ddb9dbd #5
[ 0.704660] Hardware name: riscv-virtio,qemu (DT)
[ 0.704802] epc : walk_stackframe+0xac/0xc2
[ 0.704941] ra : dump_backtrace+0x1a/0x22
[ 0.705074] epc : c0004558 ra : c0004588 sp : c1c5fe10
[ 0.705216] gp : c18b41c8 tp : c1cd8000 t0 : 00000000
[ 0.705357] t1 : ffffffff t2 : 00000000 s0 : c1c5fe40
[ 0.705506] s1 : c11313dc a0 : 00000000 a1 : 00000000
[ 0.705647] a2 : c06fd2c2 a3 : c11313dc a4 : c084292d
[ 0.705787] a5 : 00000000 a6 : c1864cb8 a7 : 3fffffff
[ 0.705926] s2 : 00000000 s3 : c1123e88 s4 : 00000000
[ 0.706066] s5 : c11313dc s6 : c06fd2c2 s7 : 00000001
[ 0.706206] s8 : 00000000 s9 : 95af6e28 s10: 00000000
[ 0.706345] s11: 00000001 t3 : 00000000 t4 : 00000000
[ 0.706482] t5 : 00000001 t6 : 00000000
[ 0.706594] status: 00000100 badaddr: 000005a8 cause: 0000000d
[ 0.706809] [<c0004558>] walk_stackframe+0xac/0xc2
[ 0.707019] [<c0004588>] dump_backtrace+0x1a/0x22
[ 0.707149] [<c06fd312>] show_stack+0x2c/0x38
[ 0.707271] [<c06ffba4>] dump_stack_lvl+0x40/0x58
[ 0.707400] [<c06ffbce>] dump_stack+0x12/0x1a
[ 0.707521] [<c06fd4f6>] panic+0xfa/0x2a6
[ 0.707632] [<c000e2f4>] do_exit+0x7a8/0x7ac
[ 0.707749] [<c000eefa>] do_group_exit+0x2a/0x7e
[ 0.707872] [<c000ef60>] __wake_up_parent+0x0/0x20
[ 0.707999] [<c0003020>] ret_from_syscall+0x0/0x2
[ 0.708385] ---[ end trace 260976561a3770d1 ]---