From: Andreas Schwab <schwab@linux-m68k.org>
To: Palmer Dabbelt <palmer@dabbelt.com>
Cc: akira.tsukamoto@gmail.com,
Paul Walmsley <paul.walmsley@sifive.com>,
linux@roeck-us.net, geert@linux-m68k.org,
qiuwenbo@kylinos.com.cn, aou@eecs.berkeley.edu,
linux-riscv@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] riscv: __asm_copy_to-from_user: Improve using word copy if size < 9*SZREG
Date: Mon, 16 Aug 2021 21:00:16 +0200
Message-ID: <87zgthjjun.fsf@igel.home>
In-Reply-To: <mhng-f83b1d51-c006-4b01-830a-0f827f0c56a1@palmerdabbelt-glaptop> (Palmer Dabbelt's message of "Mon, 16 Aug 2021 11:09:45 -0700 (PDT)")

On Aug 16 2021, Palmer Dabbelt wrote:

> On Fri, 30 Jul 2021 06:52:44 PDT (-0700), akira.tsukamoto@gmail.com wrote:
>> Reduce the number of slow byte_copy when the size is in between
>> 2*SZREG to 9*SZREG by using none unrolled word_copy.
>>
>> Without it any size smaller than 9*SZREG will be using slow byte_copy
>> instead of none unrolled word_copy.
>>
>> Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
>> ---
>> arch/riscv/lib/uaccess.S | 46 ++++++++++++++++++++++++++++++++++++----
>> 1 file changed, 42 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
>> index 63bc691cff91..6a80d5517afc 100644
>> --- a/arch/riscv/lib/uaccess.S
>> +++ b/arch/riscv/lib/uaccess.S
>> @@ -34,8 +34,10 @@ ENTRY(__asm_copy_from_user)
>> /*
>> * Use byte copy only if too small.
>> * SZREG holds 4 for RV32 and 8 for RV64
>> + * a3 - 2*SZREG is minimum size for word_copy
>> + * 1*SZREG for aligning dst + 1*SZREG for word_copy
>> */
>> - li a3, 9*SZREG /* size must be larger than size in word_copy */
>> + li a3, 2*SZREG
>> bltu a2, a3, .Lbyte_copy_tail
>>
>> /*
>> @@ -66,9 +68,40 @@ ENTRY(__asm_copy_from_user)
>> andi a3, a1, SZREG-1
>> bnez a3, .Lshift_copy
>>
>> +.Lcheck_size_bulk:
>> + /*
>> + * Evaluate the size if possible to use unrolled.
>> + * The word_copy_unlrolled requires larger than 8*SZREG
>> + */
>> + li a3, 8*SZREG
>> + add a4, a0, a3
>> + bltu a4, t0, .Lword_copy_unlrolled
>> +
>> .Lword_copy:
>> - /*
>> - * Both src and dst are aligned, unrolled word copy
>> + /*
>> + * Both src and dst are aligned
>> + * None unrolled word copy with every 1*SZREG iteration
>> + *
>> + * a0 - start of aligned dst
>> + * a1 - start of aligned src
>> + * t0 - end of aligned dst
>> + */
>> + bgeu a0, t0, .Lbyte_copy_tail /* check if end of copy */
>> + addi t0, t0, -(SZREG) /* not to over run */
>> +1:
>> + REG_L a5, 0(a1)
>> + addi a1, a1, SZREG
>> + REG_S a5, 0(a0)
>> + addi a0, a0, SZREG
>> + bltu a0, t0, 1b
>> +
>> + addi t0, t0, SZREG /* revert to original value */
>> + j .Lbyte_copy_tail
>> +
>> +.Lword_copy_unlrolled:
>> + /*
>> + * Both src and dst are aligned
>> + * Unrolled word copy with every 8*SZREG iteration
>> *
>> * a0 - start of aligned dst
>> * a1 - start of aligned src
>> @@ -97,7 +130,12 @@ ENTRY(__asm_copy_from_user)
>> bltu a0, t0, 2b
>>
>> addi t0, t0, 8*SZREG /* revert to original value */
>> - j .Lbyte_copy_tail
>> +
>> + /*
>> + * Remaining might large enough for word_copy to reduce slow byte
>> + * copy
>> + */
>> + j .Lcheck_size_bulk
>>
>> .Lshift_copy:
>
> I'm still not convinced that going all the way to such a large unrolling
> factor is a net win, but this at least provides a much smoother cost
> curve.
>
> That said, this is causing my 32-bit configs to hang.

It's missing fixups for the load and store in the new word copy loop.

diff --git a/arch/riscv/lib/uaccess.S b/arch/riscv/lib/uaccess.S
index a835df6bd68f..12ed1f76bd1f 100644
--- a/arch/riscv/lib/uaccess.S
+++ b/arch/riscv/lib/uaccess.S
@@ -89,9 +89,9 @@ ENTRY(__asm_copy_from_user)
 	bgeu a0, t0, .Lbyte_copy_tail /* check if end of copy */
 	addi t0, t0, -(SZREG) /* not to over run */
 1:
-	REG_L a5, 0(a1)
+	fixup REG_L a5, 0(a1), 10f
 	addi a1, a1, SZREG
-	REG_S a5, 0(a0)
+	fixup REG_S a5, 0(a0), 10f
 	addi a0, a0, SZREG
 	bltu a0, t0, 1b
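
For anyone not familiar with what the fixup annotation buys us, here is a rough user-space model in C. It is only a sketch under stated assumptions, not the kernel code: word_copy_model, readable and has_fixup are made-up names, the "fault" is simulated by a readable-length limit, and the real mechanism is an exception table entry per annotated access rather than a flag. The point it illustrates is that an annotated access that faults turns into an early return with a non-zero remaining count, while an unannotated one has no recovery path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Outcome of the modelled copy (hypothetical names, not kernel API). */
enum copy_status { COPY_OK, COPY_FIXED_UP, COPY_UNHANDLED_FAULT };

struct copy_result {
    enum copy_status status;
    size_t remaining;   /* bytes not copied by the word loop */
};

/*
 * Model of a word-at-a-time copy loop.
 *
 * readable:  assumption standing in for a page boundary; any word read
 *            that would extend past src + readable "faults".
 * has_fixup: assumption standing in for the exception table entry that
 *            the fixup annotation registers for the access.
 */
static struct copy_result word_copy_model(unsigned char *dst,
                                          const unsigned char *src,
                                          size_t len, size_t readable,
                                          int has_fixup)
{
    const size_t word = sizeof(uintptr_t);  /* plays the role of SZREG */
    size_t done = 0;

    assert(dst != NULL && src != NULL);

    while (len - done >= word) {
        if (done + word > readable) {
            /* The load would fault: recover only if annotated. */
            struct copy_result r;
            r.status = has_fixup ? COPY_FIXED_UP : COPY_UNHANDLED_FAULT;
            r.remaining = len - done;
            return r;
        }
        memcpy(dst + done, src + done, word);
        done += word;
    }

    /* On success, any remaining bytes would go to the byte copy tail. */
    return (struct copy_result){ COPY_OK, len - done };
}
```

With has_fixup set to 0, a fault in the loop ends in COPY_UNHANDLED_FAULT, which is the model's stand-in for the failure reported above.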

Andreas.

--
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."