All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] riscv: select ARCH_HAS_FAST_MULTIPLIER
@ 2023-11-21 14:43 ` Jisheng Zhang
  0 siblings, 0 replies; 6+ messages in thread
From: Jisheng Zhang @ 2023-11-21 14:43 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou; +Cc: linux-riscv, linux-kernel

Currently, riscv linux requires at least IMA, so all platforms have a
multiplier. And I assume the 'mul' efficiency is comparable or better
than a sequence of five or so register-dependent arithmetic
instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
speedup") for more details.

In a simple benchmark test calling hweight64() in a loop, it got:
about 14% preformance improvement on JH7110, tested on Milkv Mars.

about 23% performance improvement on TH1520 and SG2042, tested on
Sipeed LPI4A and SG2042 platform.

a slight performance drop on CV1800B, tested on milkv duo. Among all
riscv platforms in my hands, this is the only one which sees a slight
performance drop. It means the 'mul' isn't quick enough. However, the
situation exists on x86 too, for example, P4 doesn't have fast
integer multiplies as said in the above commit, x86 also selects
ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
which can benefit almost riscv platforms.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a..e4834fa76417 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -23,6 +23,7 @@ config RISCV
 	select ARCH_HAS_DEBUG_VIRTUAL if MMU
 	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEBUG_WX
+	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
-- 
2.42.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH] riscv: select ARCH_HAS_FAST_MULTIPLIER
@ 2023-11-21 14:43 ` Jisheng Zhang
  0 siblings, 0 replies; 6+ messages in thread
From: Jisheng Zhang @ 2023-11-21 14:43 UTC (permalink / raw)
  To: Paul Walmsley, Palmer Dabbelt, Albert Ou; +Cc: linux-riscv, linux-kernel

Currently, riscv linux requires at least IMA, so all platforms have a
multiplier. And I assume the 'mul' efficiency is comparable or better
than a sequence of five or so register-dependent arithmetic
instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
speedup") for more details.

In a simple benchmark test calling hweight64() in a loop, it got:
about 14% preformance improvement on JH7110, tested on Milkv Mars.

about 23% performance improvement on TH1520 and SG2042, tested on
Sipeed LPI4A and SG2042 platform.

a slight performance drop on CV1800B, tested on milkv duo. Among all
riscv platforms in my hands, this is the only one which sees a slight
performance drop. It means the 'mul' isn't quick enough. However, the
situation exists on x86 too, for example, P4 doesn't have fast
integer multiplies as said in the above commit, x86 also selects
ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
which can benefit almost riscv platforms.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 arch/riscv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 95a2a06acc6a..e4834fa76417 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -23,6 +23,7 @@ config RISCV
 	select ARCH_HAS_DEBUG_VIRTUAL if MMU
 	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEBUG_WX
+	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE
-- 
2.42.0


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] riscv: select ARCH_HAS_FAST_MULTIPLIER
  2023-11-21 14:43 ` Jisheng Zhang
@ 2023-11-21 22:20   ` Samuel Holland
  -1 siblings, 0 replies; 6+ messages in thread
From: Samuel Holland @ 2023-11-21 22:20 UTC (permalink / raw)
  To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: linux-riscv, linux-kernel

On 2023-11-21 8:43 AM, Jisheng Zhang wrote:
> Currently, riscv linux requires at least IMA, so all platforms have a
> multiplier. And I assume the 'mul' efficiency is comparable or better
> than a sequence of five or so register-dependent arithmetic
> instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
> codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
> speedup") for more details.
> 
> In a simple benchmark test calling hweight64() in a loop, it got:
> about 14% preformance improvement on JH7110, tested on Milkv Mars.

typo: performance

> about 23% performance improvement on TH1520 and SG2042, tested on
> Sipeed LPI4A and SG2042 platform.
> 
> a slight performance drop on CV1800B, tested on milkv duo. Among all
> riscv platforms in my hands, this is the only one which sees a slight
> performance drop. It means the 'mul' isn't quick enough. However, the
> situation exists on x86 too, for example, P4 doesn't have fast
> integer multiplies as said in the above commit, x86 also selects
> ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
> which can benefit almost riscv platforms.

On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for __sw_hweight64.
On D1: 8% speedup for __sw_hweight32 and 8% slowdown for __sw_hweight64.

So overall still an improvement.

> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
>  arch/riscv/Kconfig | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] riscv: select ARCH_HAS_FAST_MULTIPLIER
@ 2023-11-21 22:20   ` Samuel Holland
  0 siblings, 0 replies; 6+ messages in thread
From: Samuel Holland @ 2023-11-21 22:20 UTC (permalink / raw)
  To: Jisheng Zhang, Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: linux-riscv, linux-kernel

On 2023-11-21 8:43 AM, Jisheng Zhang wrote:
> Currently, riscv linux requires at least IMA, so all platforms have a
> multiplier. And I assume the 'mul' efficiency is comparable or better
> than a sequence of five or so register-dependent arithmetic
> instructions. Select ARCH_HAS_FAST_MULTIPLIER to get slightly nicer
> codegen. Refer to commit f9b4192923fa ("[PATCH] bitops: hweight()
> speedup") for more details.
> 
> In a simple benchmark test calling hweight64() in a loop, it got:
> about 14% preformance improvement on JH7110, tested on Milkv Mars.

typo: performance

> about 23% performance improvement on TH1520 and SG2042, tested on
> Sipeed LPI4A and SG2042 platform.
> 
> a slight performance drop on CV1800B, tested on milkv duo. Among all
> riscv platforms in my hands, this is the only one which sees a slight
> performance drop. It means the 'mul' isn't quick enough. However, the
> situation exists on x86 too, for example, P4 doesn't have fast
> integer multiplies as said in the above commit, x86 also selects
> ARCH_HAS_FAST_MULTIPLIER. So let's select ARCH_HAS_FAST_MULTIPLIER
> which can benefit almost riscv platforms.

On Unmatched: 20% speedup for __sw_hweight32 and 30% speedup for __sw_hweight64.
On D1: 8% speedup for __sw_hweight32 and 8% slowdown for __sw_hweight64.

So overall still an improvement.

> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
>  arch/riscv/Kconfig | 1 +
>  1 file changed, 1 insertion(+)

Reviewed-by: Samuel Holland <samuel.holland@sifive.com>
Tested-by: Samuel Holland <samuel.holland@sifive.com>


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] riscv: select ARCH_HAS_FAST_MULTIPLIER
  2023-11-21 14:43 ` Jisheng Zhang
@ 2023-11-22 11:16   ` David Laight
  -1 siblings, 0 replies; 6+ messages in thread
From: David Laight @ 2023-11-22 11:16 UTC (permalink / raw)
  To: 'Jisheng Zhang', Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: linux-riscv, linux-kernel

...
> However, the
> situation exists on x86 too, for example, P4 doesn't have fast
> integer multiplies ...

P4 doesn't have fast anything :-)

More interestingly does anything modern not have fast multiplies?
(that you'd consider running Linux on).

The silicon required isn't that big.
IIRC an 'n' bit multiplier requires n^2 full adders and has
the latency of 2n adders.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] riscv: select ARCH_HAS_FAST_MULTIPLIER
@ 2023-11-22 11:16   ` David Laight
  0 siblings, 0 replies; 6+ messages in thread
From: David Laight @ 2023-11-22 11:16 UTC (permalink / raw)
  To: 'Jisheng Zhang', Paul Walmsley, Palmer Dabbelt, Albert Ou
  Cc: linux-riscv, linux-kernel

...
> However, the
> situation exists on x86 too, for example, P4 doesn't have fast
> integer multiplies ...

P4 doesn't have fast anything :-)

More interestingly does anything modern not have fast multiplies?
(that you'd consider running Linux on).

The silicon required isn't that big.
IIRC an 'n' bit multiplier requires n^2 full adders and has
the latency of 2n adders.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-11-22 11:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-21 14:43 [PATCH] riscv: select ARCH_HAS_FAST_MULTIPLIER Jisheng Zhang
2023-11-21 14:43 ` Jisheng Zhang
2023-11-21 22:20 ` Samuel Holland
2023-11-21 22:20   ` Samuel Holland
2023-11-22 11:16 ` David Laight
2023-11-22 11:16   ` David Laight

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.