* [PATCH] arm64: Select ARCH_HAS_FAST_MULTIPLIER
@ 2018-04-24 15:25 Robin Murphy
2018-04-25 13:41 ` Will Deacon
2018-05-16 10:51 ` Catalin Marinas
0 siblings, 2 replies; 3+ messages in thread
From: Robin Murphy @ 2018-04-24 15:25 UTC
To: linux-arm-kernel
It is probably safe to assume that all Armv8-A implementations have a
multiplier whose efficiency is comparable to or better than a sequence of
three or so register-dependent arithmetic instructions. Select
ARCH_HAS_FAST_MULTIPLIER to get ever-so-slightly nicer codegen in the
few dusty old corners which care.
In a contrived benchmark calling hweight64() in a loop, this does indeed
turn out to be a small win overall, with no measurable impact on
Cortex-A57 but about 5% performance improvement on Cortex-A53.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
Apropos of stumbling across this option whilst digging down into some
bitmap-juggling code...
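
For readers unfamiliar with what the option changes, here is a minimal
sketch of the two usual ways to finish a software population count. It is
modelled on the generic software hweight64() fallback rather than copied
from the tree; the function names (byte_counts, hweight64_mul,
hweight64_shift) are hypothetical and chosen for illustration only. Both
variants share the same reduction to per-byte counts and differ only in the
final summation: one multiply and shift versus three dependent shift-and-add
steps.

```c
#include <stdint.h>

/* Common prefix: reduce the word to eight per-byte bit counts (each 0..8). */
static uint64_t byte_counts(uint64_t w)
{
	w -= (w >> 1) & 0x5555555555555555ULL;
	w = (w & 0x3333333333333333ULL) + ((w >> 2) & 0x3333333333333333ULL);
	return (w + (w >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
}

/*
 * Multiplier variant: multiplying by 0x0101... accumulates the sum of all
 * eight byte counts in the top byte, which the shift then extracts.
 */
static unsigned int hweight64_mul(uint64_t w)
{
	return (unsigned int)((byte_counts(w) * 0x0101010101010101ULL) >> 56);
}

/*
 * Shift-and-add variant: three dependent steps fold the byte counts
 * together, leaving the total (at most 64) in the low byte.
 */
static unsigned int hweight64_shift(uint64_t w)
{
	uint64_t res = byte_counts(w);

	res += res >> 8;
	res += res >> 16;
	res += res >> 32;
	return (unsigned int)(res & 0xff);
}
```

Both functions return the same value for every input; which one is faster
depends on the multiplier latency of the core, which is the assumption the
commit message above is making about Armv8-A implementations.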
arch/arm64/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index eb2cf4938f6d..9c850f3b398f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -12,6 +12,7 @@ config ARM64
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
 	select ARCH_HAS_ELF_RANDOMIZE
+	select ARCH_HAS_FAST_MULTIPLIER
 	select ARCH_HAS_FORTIFY_SOURCE
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_GIGANTIC_PAGE if (MEMORY_ISOLATION && COMPACTION) || CMA
--
2.17.0.dirty
* [PATCH] arm64: Select ARCH_HAS_FAST_MULTIPLIER
From: Will Deacon @ 2018-04-25 13:41 UTC
To: linux-arm-kernel
On Tue, Apr 24, 2018 at 04:25:47PM +0100, Robin Murphy wrote:
> It is probably safe to assume that all Armv8-A implementations have a
> multiplier whose efficiency is comparable to or better than a sequence of
> three or so register-dependent arithmetic instructions. Select
> ARCH_HAS_FAST_MULTIPLIER to get ever-so-slightly nicer codegen in the
> few dusty old corners which care.
>
> In a contrived benchmark calling hweight64() in a loop, this does indeed
> turn out to be a small win overall, with no measurable impact on
> Cortex-A57 but about 5% performance improvement on Cortex-A53.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
Acked-by: Will Deacon <will.deacon@arm.com>
Will
* [PATCH] arm64: Select ARCH_HAS_FAST_MULTIPLIER
From: Catalin Marinas @ 2018-05-16 10:51 UTC
To: linux-arm-kernel
On Tue, Apr 24, 2018 at 04:25:47PM +0100, Robin Murphy wrote:
> It is probably safe to assume that all Armv8-A implementations have a
> multiplier whose efficiency is comparable to or better than a sequence of
> three or so register-dependent arithmetic instructions. Select
> ARCH_HAS_FAST_MULTIPLIER to get ever-so-slightly nicer codegen in the
> few dusty old corners which care.
>
> In a contrived benchmark calling hweight64() in a loop, this does indeed
> turn out to be a small win overall, with no measurable impact on
> Cortex-A57 but about 5% performance improvement on Cortex-A53.
>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Queued for 4.18. Thanks.
--
Catalin