From: Palmer Dabbelt <palmer@dabbelt.com>
To: maochenxi@eswin.com
Cc: aou@eecs.berkeley.edu, kernel@esmil.dk, wangqiang1@eswin.com,
Paul Walmsley <paul.walmsley@sifive.com>,
linux-kernel@vger.kernel.org, chenxi.mao2013@gmail.com,
linux-riscv@lists.infradead.org
Subject: Re: [PATCH 1/1] riscv: Enable ARCH_HAS_FAST_MULTIPLIER for RV64I
Date: Wed, 22 Jul 2020 19:13:08 -0700 (PDT) [thread overview]
Message-ID: <mhng-f0599c86-b976-4116-9228-7da783085905@palmerdabbelt-glaptop1> (raw)
In-Reply-To: <ca14f182-21e4-adb2-d6cf-da3e1cd7b104@eswin.com>
On Wed, 22 Jul 2020 18:59:12 PDT (-0700), maochenxi@eswin.com wrote:
> Hi Palmer and Emil:
>
> As Emil mentioned in previous E-mail loop, I did the same test on my kernel as well.
Sorry, I guess I crossed up my emails. I think it's best to just drop this for
now, as it doesn't actually seem to generate better code for our current
target.
>
> My kernel is based on Linux 5.8-RC6 with GCC-10.1. (ISA C extension enabled)
>
> The disassembly code as below:
>
> CONFIG_ARCH_HAS_FAST_MULTIPLIER enabled:
>
> 0000000000000000 <__sw_hweight32>:
> 0: 555557b7 lui a5,0x55555
> 4: 0015571b srliw a4,a0,0x1
> 8: 55578793 addi a5,a5,1365 # 55555555 <.LASF5+0x5555509d>
> c: 8ff9 and a5,a5,a4
> e: 9d1d subw a0,a0,a5
>
> 0000000000000010 <.LVL1>:
> 10: 333337b7 lui a5,0x33333
> 14: 33378793 addi a5,a5,819 # 33333333 <.LASF5+0x33332e7b>
> 18: 0025571b srliw a4,a0,0x2
> 1c: 8d7d and a0,a0,a5
> 1e: 8ff9 and a5,a5,a4
> 20: 9fa9 addw a5,a5,a0
> 22: 0047d51b srliw a0,a5,0x4
> 26: 9fa9 addw a5,a5,a0
> 28: 0f0f1537 lui a0,0xf0f1
> 2c: 1141 addi sp,sp,-16
> 2e: f0f50513 addi a0,a0,-241 # f0f0f0f <.LASF5+0xf0f0a57>
> 32: e422 sd s0,8(sp)
> 34: 8fe9 and a5,a5,a0
> 36: 0800 addi s0,sp,16
> 38: 0087951b slliw a0,a5,0x8
> 3c: 6422 ld s0,8(sp)
> 3e: 9d3d addw a0,a0,a5
> 40: 0105179b slliw a5,a0,0x10
> 44: 9d3d addw a0,a0,a5
> 46: 0185551b srliw a0,a0,0x18
> 4a: 0141 addi sp,sp,16
> 4c: 8082 ret
>
> CONFIG_ARCH_HAS_FAST_MULTIPLIER disabled:
>
> 000000000000004e <__sw_hweight32_default>:
> 4e: 55555737 lui a4,0x55555
> 52: 0015579b srliw a5,a0,0x1
> 56: 55570713 addi a4,a4,1365 # 55555555 <.LASF5+0x5555509d>
> 5a: 8ff9 and a5,a5,a4
> 5c: 9d1d subw a0,a0,a5
>
> 000000000000005e <.LVL3>:
> 5e: 333337b7 lui a5,0x33333
> 62: 33378793 addi a5,a5,819 # 33333333 <.LASF5+0x33332e7b>
> 66: 0025571b srliw a4,a0,0x2
> 6a: 8d7d and a0,a0,a5
> 6c: 8ff9 and a5,a5,a4
> 6e: 9fa9 addw a5,a5,a0
> 70: 0047d51b srliw a0,a5,0x4
> 74: 9d3d addw a0,a0,a5
> 76: 0f0f17b7 lui a5,0xf0f1
> 7a: 1141 addi sp,sp,-16
> 7c: f0f78793 addi a5,a5,-241 # f0f0f0f <.LASF5+0xf0f0a57>
> 80: e422 sd s0,8(sp)
> 82: 8fe9 and a5,a5,a0
> 84: 0800 addi s0,sp,16
> 86: 0087d51b srliw a0,a5,0x8
> 8a: 6422 ld s0,8(sp)
> 8c: 9fa9 addw a5,a5,a0
> 8e: 0107d51b srliw a0,a5,0x10
> 92: 9d3d addw a0,a0,a5
> 94: 0ff57513 andi a0,a0,255
> 98: 0141 addi sp,sp,16
> 9a: 8082 ret
>
> This 2 implementations is almost same but small differences.
>
> Especially in CONFIG_ARCH_HAS_FAST_MULTIPLIER condition, below code didn't use "mul" instructions.
>
> " return (w * 0x01010101) >> 24; "
>
> So I am trying to translate this code with inline assembly as below:
>
> //return (w * 0x01010101) >> 24;
> __asm__ (
> " mul %0, %0, %1\n"
> : "+r" (w)
> : "r" (w), "r"(0x01010101)
> :);
> return w >> 24;
>
> After above change, the disassambly as below:
> 0000000000000000 <__sw_hweight32>:
> 0: 555557b7 lui a5,0x55555
> 4: 0015571b srliw a4,a0,0x1
> 8: 55578793 addi a5,a5,1365 # 55555555 <.LASF5+0x55555119>
> c: 8ff9 and a5,a5,a4
> e: 9d1d subw a0,a0,a5
>
> 0000000000000010 <.LVL1>:
> 10: 333337b7 lui a5,0x33333
> 14: 0025571b srliw a4,a0,0x2
> 18: 33378793 addi a5,a5,819 # 33333333 <.LASF5+0x33332ef7>
> 1c: 8d7d and a0,a0,a5
> 1e: 8ff9 and a5,a5,a4
> 20: 9fa9 addw a5,a5,a0
> 22: 0047d71b srliw a4,a5,0x4
> 26: 9f3d addw a4,a4,a5
> 28: 0f0f17b7 lui a5,0xf0f1
> 2c: 1141 addi sp,sp,-16
> 2e: f0f78793 addi a5,a5,-241 # f0f0f0f <.LASF5+0xf0f0ad3>
> 32: e422 sd s0,8(sp)
> 34: 8ff9 and a5,a5,a4
> 36: 0800 addi s0,sp,16
> 38: 01010737 lui a4,0x1010
> 3c: 853e mv a0,a5
>
> 000000000000003e <.LVL2>:
> 3e: 1017071b addiw a4,a4,257
> 42: 02f50533 mul a0,a0,a5
> 46: 6422 ld s0,8(sp)
> 48: 0185551b srliw a0,a0,0x18
>
> "mul" instruction is leveraged as expectation, but 0x01010101 load waste several instructions.
>
> Based on this test, force to leverage "mul" instruction might be not faster than current compiler implementations.
>
> I am not sure above assembly is the best way to load 0x01010101? I checked the ISA manual, "lui" only
>
> load 20 bits per time, is this the best way to load instants?
>
>
> On the other hand, I try to compare ARM64 disassembly code:
>
> .....
>
> 4: 3200c3e2 mov w2, #0x1010101 // #16843009
>
> ......
>
> w = (w + (w >> 4)) & 0x0f0f0f0f;
> 20: 0b401000 add w0, w0, w0, lsr #4
> 24: 1200cc00 and w0, w0, #0xf0f0f0f
> return (w * 0x01010101) >> 24;
> 28: 1b027c00 mul w0, w0, w2
>
> Only one "mov" instructions to load 0x1010101 and one "mul" instruction for multiply.
>
>
> Let me summary as below:
>
> 1. GCC 10.1 cannot generate "mul" instruction when CONFIG_ARCH_HAS_FAST_MULTIPLIER enabled.
>
> 2. force to generate "mul" didn't get better because instants load waste instructions.
>
> 3. If GCC compiler behavior is best solution for this case, we could have below work around on Riscv.
>
> unsigned int __sw_hweight32(unsigned int w)
> {
> -#ifdef CONFIG_ARCH_HAS_FAST_MULTIPLIER
> +/*
> + * Risc-V could not generate mul(w) instruction in this case
> + */
> +#if defined(CONFIG_ARCH_HAS_FAST_MULTIPLIER) && !defined(CONFIG_RISCV)
> w -= (w >> 1) & 0x55555555;
> w = (w & 0x33333333) + ((w >> 2) & 0x33333333);
> w = (w + (w >> 4)) & 0x0f0f0f0f;
>
>
> Chenxi
>
>
> On 2020/7/21 上午9:17, Palmer Dabbelt wrote:
>> On Wed, 08 Jul 2020 22:19:22 PDT (-0700), maochenxi@eswin.com wrote:
>>> Enable ARCH_HAS_FAST_MULTIPLIER on RV64I
>>> which works fine on GCC-9.3 and GCC-10.1
>>>
>>> PS2: remove ARCH_SUPPORTS_INT128 because of RV64I already enabled.
>>>
>>> Signed-off-by: Chenxi Mao <maochenxi@eswin.com>
>>> ---
>>> arch/riscv/Kconfig | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index 128192e14ff2..84e6777fecad 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -202,6 +202,7 @@ config ARCH_RV64I
>>> bool "RV64I"
>>> select 64BIT
>>> select ARCH_SUPPORTS_INT128 if CC_HAS_INT128 && GCC_VERSION >= 50000
>>> + select ARCH_HAS_FAST_MULTIPLIER
>>> select HAVE_DYNAMIC_FTRACE if MMU
>>> select HAVE_DYNAMIC_FTRACE_WITH_REGS if HAVE_DYNAMIC_FTRACE
>>> select HAVE_FTRACE_MCOUNT_RECORD
>>
>> Ah, thanks -- this one didn't show up when I was looking at the last one. I
>> think we can put the fast multiplier on rv32 and rv64, there shouldn't be any
>> difference there. I guess in theory we should be sticking this all in some
>> sort of "platform type" optimization flags, but that's probably bit much for
>> now.
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2020-07-23 2:13 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-09 5:19 [PATCH 1/1] riscv: Enable ARCH_HAS_FAST_MULTIPLIER for RV64I Chenxi Mao
2020-07-21 1:17 ` Palmer Dabbelt
2020-07-21 1:47 ` Chenxi Mao
2020-07-21 2:38 ` Chenxi Mao
2020-07-23 1:59 ` Chenxi Mao
2020-07-23 2:13 ` Palmer Dabbelt [this message]
2020-07-23 2:57 ` Chenxi Mao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=mhng-f0599c86-b976-4116-9228-7da783085905@palmerdabbelt-glaptop1 \
--to=palmer@dabbelt.com \
--cc=aou@eecs.berkeley.edu \
--cc=chenxi.mao2013@gmail.com \
--cc=kernel@esmil.dk \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=maochenxi@eswin.com \
--cc=paul.walmsley@sifive.com \
--cc=wangqiang1@eswin.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).