* [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
@ 2022-07-16 0:16 Nick Desaulniers
2022-07-16 9:46 ` Arnd Bergmann
2022-07-18 16:50 ` [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem kernel test robot
0 siblings, 2 replies; 9+ messages in thread
From: Nick Desaulniers @ 2022-07-16 0:16 UTC (permalink / raw)
Cc: llvm, Arnd Bergmann, Nathan Chancellor, Miguel Ojeda,
Ard Biesheuval, Nick Desaulniers, Gary Guo, Russell King,
linux-arm-kernel, linux-kernel
Compilers frequently need to defer 64b division to a libcall with this
symbol name. It essentially is div64_u64_rem, just with a different
signature. Kernel developers know to call div64_u64_rem, but compilers
don't.
Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/
Suggested-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
arch/arm/lib/Makefile | 1 +
arch/arm/lib/aeabi_uldivmod.c | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)
create mode 100644 arch/arm/lib/aeabi_uldivmod.c
diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 6d2ba454f25b..3fa273219312 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -29,6 +29,7 @@ endif
obj-$(CONFIG_UACCESS_WITH_MEMCPY) += uaccess_with_memcpy.o
lib-$(CONFIG_MMU) += $(mmu-y)
+lib-$(CONFIG_AEABI) += aeabi_uldivmod.o
ifeq ($(CONFIG_CPU_32v3),y)
lib-y += io-readsw-armv3.o io-writesw-armv3.o
diff --git a/arch/arm/lib/aeabi_uldivmod.c b/arch/arm/lib/aeabi_uldivmod.c
new file mode 100644
index 000000000000..310427893195
--- /dev/null
+++ b/arch/arm/lib/aeabi_uldivmod.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Unsigned 64b division with remainder, as is typically provided by libgcc or
+ * compiler-rt.
+ *
+ * Copyright (C) 2023 Google, LLC.
+ *
+ * Author: Nick Desaulniers <ndesaulniers@google.com>
+ */
+
+#include <linux/math64.h>
+
+struct result {
+ u64 quot, rem;
+};
+
+struct result __aeabi_uldivmod(u64 numerator, u64 denominator)
+{
+ struct result res;
+
+ res.quot = div64_u64_rem(numerator, denominator, &res.rem);
+ return res;
+}
--
2.37.0.170.g444d1eabd0-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
2022-07-16 0:16 [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem Nick Desaulniers
@ 2022-07-16 9:46 ` Arnd Bergmann
2022-10-10 21:23 ` Nick Desaulniers
2022-07-18 16:50 ` [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem kernel test robot
1 sibling, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2022-07-16 9:46 UTC (permalink / raw)
To: Nick Desaulniers
Cc: clang-built-linux, Nathan Chancellor, Miguel Ojeda,
Ard Biesheuval, Gary Guo, Russell King, Linux ARM,
Linux Kernel Mailing List
On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> Compilers frequently need to defer 64b division to a libcall with this
> symbol name. It essentially is div64_u64_rem, just with a different
> signature. Kernel developers know to call div64_u64_rem, but compilers
> don't.
>
> Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/
> Suggested-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
This has historically been strongly NAK'd, and I don't think that position
has changed in the meantime. A variable-argument 64-bit division is
really expensive, especially on 32-bit machines that lack a native
32-bit division instruction, and we don't want developers to accidentally
insert one in their driver code.
Explicitly calling one of the division helpers in linux/math64.h is the
established way for driver writers to declare that a particular division
cannot be turned into a cheaper operation and is never run in a
performance critical code path. The compiler of course cannot know
about either of those.
Arnd
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
2022-07-16 0:16 [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem Nick Desaulniers
2022-07-16 9:46 ` Arnd Bergmann
@ 2022-07-18 16:50 ` kernel test robot
1 sibling, 0 replies; 9+ messages in thread
From: kernel test robot @ 2022-07-18 16:50 UTC (permalink / raw)
To: Nick Desaulniers
Cc: llvm, kbuild-all, Arnd Bergmann, Nathan Chancellor, Miguel Ojeda,
Ard Biesheuval, Nick Desaulniers, Gary Guo, Russell King,
linux-arm-kernel, linux-kernel
Hi Nick,
I love your patch! Perhaps something to improve:
[auto build test WARNING on soc/for-next]
[also build test WARNING on arm/for-next linus/master v5.19-rc7 next-20220718]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Nick-Desaulniers/arm-lib-implement-aeabi_uldivmod-via-div64_u64_rem/20220716-231205
base: https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
config: arm-ixp4xx_defconfig (https://download.01.org/0day-ci/archive/20220719/202207190057.1gn0emAr-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d74b88c69dc2644bd0dc5d64e2d7413a0d4040e5)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install arm cross compiling tool for clang build
# apt-get install binutils-arm-linux-gnueabi
# https://github.com/intel-lab-lkp/linux/commit/f0441d7a00899e433705accb0da58c94f4ff8808
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Nick-Desaulniers/arm-lib-implement-aeabi_uldivmod-via-div64_u64_rem/20220716-231205
git checkout f0441d7a00899e433705accb0da58c94f4ff8808
# save the config file
mkdir build_dir && cp config build_dir/.config
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash
If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>
All warnings (new ones prefixed by >>):
>> arch/arm/lib/aeabi_uldivmod.c:17:15: warning: no previous prototype for function '__aeabi_uldivmod' [-Wmissing-prototypes]
struct result __aeabi_uldivmod(u64 numerator, u64 denominator)
^
arch/arm/lib/aeabi_uldivmod.c:17:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
struct result __aeabi_uldivmod(u64 numerator, u64 denominator)
^
static
1 warning generated.
vim +/__aeabi_uldivmod +17 arch/arm/lib/aeabi_uldivmod.c
16
> 17 struct result __aeabi_uldivmod(u64 numerator, u64 denominator)
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
2022-07-16 9:46 ` Arnd Bergmann
@ 2022-10-10 21:23 ` Nick Desaulniers
2022-10-10 22:13 ` Arnd Bergmann
0 siblings, 1 reply; 9+ messages in thread
From: Nick Desaulniers @ 2022-10-10 21:23 UTC (permalink / raw)
To: Arnd Bergmann
Cc: clang-built-linux, Nathan Chancellor, Miguel Ojeda,
Ard Biesheuval, Gary Guo, Russell King, Linux ARM,
Linux Kernel Mailing List, Craig Topper, Philip Reames, jh
On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > Compilers frequently need to defer 64b division to a libcall with this
> > symbol name. It essentially is div64_u64_rem, just with a different
> > signature. Kernel developers know to call div64_u64_rem, but compilers
> > don't.
> >
> > Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/
> > Suggested-by: Gary Guo <gary@garyguo.net>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
So the existing division by constant issues went away, and Craig was
able to improve division by double-word constants in LLVM
1. https://reviews.llvm.org/D130862
2. https://reviews.llvm.org/D135541
But we still have one instance left that's not div/rem by constant via
CONFIG_FPE_NWFPE=y that's now blocking Android's compiler upgrade.
https://github.com/ClangBuiltLinux/linux/issues/1666
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
into the loops also seems to work.
Otherwise I'd define __aeabi_uldivmod as static in
arch/arm/nwfpe/softfloat.c (with the body from this patch) only for
clang.
I see this function seems to be based on Berkeley Softfloat
http://www.jhauser.us/arithmetic/SoftFloat.html
v2. v3 looks like a total rewrite. Looking at v3e, it looks like
float64_rem() is now called f64_rem() and defined in f64_rem.c. It
doesn't look like there's anything from v3 that we could backport to
the kernel's v2 to avoid this.
Otherwise perhaps we just disable OABI_COMPAT for clang. Quite a few
defconfigs explicitly enable FPE_NWFPE though. Are there really a lot
of OABI binaries still in use?
There's also the hidden llvm flag:
`-mllvm -replexitval=never` that seems to work here, though FWICT it's
disabling 3 such loop elisions (I think all three statements in that
do while). That's probably the best way forward here...
https://reviews.llvm.org/D9800 made the decision to do such a
transformation when a loop can be fully elided ("deleted").
>
> This has historically been strongly NAK'd, and I don't think that position
> has changed in the meantime. A variable-argument 64-bit division is
> really expensive, especially on 32-bit machines that lack a native
> 32-bit division instruction, and we don't want developers to accidentally
> insert one in their driver code.
>
> Explicitly calling one of the division helpers in linux/math64.h is the
> established way for driver writers to declare that a particular division
> cannot be turned into a cheaper operation and is never run in a
> performance critical code path. The compiler of course cannot know
> about either of those.
>
> Arnd
--
Thanks,
~Nick Desaulniers
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
2022-10-10 21:23 ` Nick Desaulniers
@ 2022-10-10 22:13 ` Arnd Bergmann
2022-10-10 22:34 ` Nick Desaulniers
0 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2022-10-10 22:13 UTC (permalink / raw)
To: Nick Desaulniers
Cc: clang-built-linux, Nathan Chancellor, Miguel Ojeda,
Ard Biesheuvel, Gary Guo, Russell King, Linux ARM,
Linux Kernel Mailing List, Craig Topper, Philip Reames, jh
On Mon, Oct 10, 2022, at 11:23 PM, Nick Desaulniers wrote:
> On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote:
>> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
> Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
> bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
> into the loops also seems to work.
I was going to suggest adding a barrier() as well, should have
read on first ;-)
> I see this function seems to be based on Berkeley Softfloat
> http://www.jhauser.us/arithmetic/SoftFloat.html
> v2. v3 looks like a total rewrite. Looking at v3e, it looks like
> float64_rem() is now called f64_rem() and defined in f64_rem.c. It
> doesn't look like there's anything from v3 that we could backport to
> the kernel's v2 to avoid this.
>
> Otherwise perhaps we just disable OABI_COMPAT for clang. Quite a few
> defconfigs explicitly enable FPE_NWFPE though. Are there really a lot
> of OABI binaries still in use?
I'd do the minimal thing here, neither NWFPE nor OABI_COMPAT should
be too relevant here. Russell still uses some machines with old OABI
binaries, but I have not heard from anyone else using those in a
very long time.
Note that while gcc can still produce kernels that are OABI binaries,
gcc-4.6 marked the arm-linux-gnu target as obsolete and gcc-4.8
removed it, so any existing binaries were built with toolchains
predating that.
Arnd
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
2022-10-10 22:13 ` Arnd Bergmann
@ 2022-10-10 22:34 ` Nick Desaulniers
2022-10-10 22:53 ` [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod Nick Desaulniers
0 siblings, 1 reply; 9+ messages in thread
From: Nick Desaulniers @ 2022-10-10 22:34 UTC (permalink / raw)
To: Arnd Bergmann
Cc: clang-built-linux, Nathan Chancellor, Miguel Ojeda,
Ard Biesheuvel, Gary Guo, Russell King, Linux ARM,
Linux Kernel Mailing List, Craig Topper, Philip Reames, jh
On Mon, Oct 10, 2022 at 3:14 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Mon, Oct 10, 2022, at 11:23 PM, Nick Desaulniers wrote:
> > On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
> > Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
> > bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
> > into the loops also seems to work.
>
> I was going to suggest adding a barrier() as well, should have
> read on first ;-)
barrier() forces reloads+spills in the loop. The output with `-mllvm
-replexitval=never` is optimal (assuming the loop is faster than
__aeabi_uldivmod (which I think is unprovable).
https://godbolt.org/z/7dMabYYcM
As much I hate relying on compiler-internal flags, I think this is optimal:
```
diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile
index 303400fa2cdf..2aec85ab1e8b 100644
--- a/arch/arm/nwfpe/Makefile
+++ b/arch/arm/nwfpe/Makefile
@@ -11,3 +11,9 @@ nwfpe-y += fpa11.o
fpa11_cpdo.o fpa11_cpdt.o \
entry.o
nwfpe-$(CONFIG_FPE_NWFPE_XP) += extended_cpdo.o
+
+# Try really hard to avoid generating calls to __aeabi_uldivmod() from
+# float64_rem() due to loop elision.
+ifdef CONFIG_CC_IS_CLANG
+CFLAGS_softfloat.o += -mllvm -replexitval=never
+endif
```
Part of me is tempted to move float64_rem() to its own file for that
flag, but indvars+loop-utils isn't eliding other loops in that file
(comparing the full disassembly before+after the above diff).
Long term, it might be nice for us to have `--rtlib` recognize
`--rtlib=linux-kernel@version` or something so that we could better
describe the effective compiler runtime to the compiler. There are
already differences in compiler-rt and libgcc where we could make
better codegen decisions if we were to consider the target rtlib.
These libraries also change over time though...
--
Thanks,
~Nick Desaulniers
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod
2022-10-10 22:34 ` Nick Desaulniers
@ 2022-10-10 22:53 ` Nick Desaulniers
2022-10-11 11:01 ` Arnd Bergmann
2022-10-13 17:04 ` Nathan Chancellor
0 siblings, 2 replies; 9+ messages in thread
From: Nick Desaulniers @ 2022-10-10 22:53 UTC (permalink / raw)
To: Arnd Bergmann, Russell King
Cc: Tom Rix, linux-arm-kernel, linux-kernel, llvm, Miguel Ojeda,
Ard Biesheuvel, Gary Guo, Craig Topper, Philip Reames, jh,
Nick Desaulniers, Nathan Chancellor
clang-15's ability to elide loops completely became more aggressive when
it can deduce how a variable is being updated in a loop. Counting down
one variable by an increment of another can be replaced by a modulo
operation.
For 64b variables on 32b ARM EABI targets, this can result in the
compiler generating calls to __aeabi_uldivmod, which it does for a do
while loop in float64_rem().
For the kernel, we'd generally prefer that developers not open code 64b
division via binary / operators and instead use the more explicit
helpers from div64.h. On arm-linux-gnuabi targets, failure to do so can
result in linkage failures due to undefined references to
__aeabi_uldivmod().
While developers can avoid open coding divisions on 64b variables, the
compiler doesn't know that the Linux kernel has a partial implementation
of a compiler runtime (--rtlib) to enforce this convention.
It's also undecidable for the compiler whether the code in question
would be faster to execute the loop vs elide it and do the 64b division.
While I actively avoid using the internal -mllvm command line flags, I
think we get better code than using barrier() here, which will force
reloads+spills in the loop for all toolchains.
Link: https://github.com/ClangBuiltLinux/linux/issues/1666
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
arch/arm/nwfpe/Makefile | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile
index 303400fa2cdf..2aec85ab1e8b 100644
--- a/arch/arm/nwfpe/Makefile
+++ b/arch/arm/nwfpe/Makefile
@@ -11,3 +11,9 @@ nwfpe-y += fpa11.o fpa11_cpdo.o fpa11_cpdt.o \
entry.o
nwfpe-$(CONFIG_FPE_NWFPE_XP) += extended_cpdo.o
+
+# Try really hard to avoid generating calls to __aeabi_uldivmod() from
+# float64_rem() due to loop elision.
+ifdef CONFIG_CC_IS_CLANG
+CFLAGS_softfloat.o += -mllvm -replexitval=never
+endif
--
2.38.0.rc2.412.g84df46c1b4-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod
2022-10-10 22:53 ` [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod Nick Desaulniers
@ 2022-10-11 11:01 ` Arnd Bergmann
2022-10-13 17:04 ` Nathan Chancellor
1 sibling, 0 replies; 9+ messages in thread
From: Arnd Bergmann @ 2022-10-11 11:01 UTC (permalink / raw)
To: Nick Desaulniers, Russell King
Cc: Tom Rix, linux-arm-kernel, linux-kernel, llvm, Miguel Ojeda,
Ard Biesheuvel, Gary Guo, Craig Topper, Philip Reames, jh,
Nathan Chancellor
On Tue, Oct 11, 2022, at 12:53 AM, Nick Desaulniers wrote:
> clang-15's ability to elide loops completely became more aggressive when
> it can deduce how a variable is being updated in a loop. Counting down
> one variable by an increment of another can be replaced by a modulo
> operation.
>
> For 64b variables on 32b ARM EABI targets, this can result in the
> compiler generating calls to __aeabi_uldivmod, which it does for a do
> while loop in float64_rem().
>
> For the kernel, we'd generally prefer that developers not open code 64b
> division via binary / operators and instead use the more explicit
> helpers from div64.h. On arm-linux-gnuabi targets, failure to do so can
> result in linkage failures due to undefined references to
> __aeabi_uldivmod().
>
> While developers can avoid open coding divisions on 64b variables, the
> compiler doesn't know that the Linux kernel has a partial implementation
> of a compiler runtime (--rtlib) to enforce this convention.
>
> It's also undecidable for the compiler whether the code in question
> would be faster to execute the loop vs elide it and do the 64b division.
>
> While I actively avoid using the internal -mllvm command line flags, I
> think we get better code than using barrier() here, which will force
> reloads+spills in the loop for all toolchains.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1666
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Works for me,
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
I would have been fine with disallowing NWFPE for clang, or with
adding a barrier in the loop as well, i.e. any approach that
doesn't cause invalid behavior or a maintenance burden, given that
there is probably nobody that actually needs nwfpe on a clang built
kernel.
Arnd
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod
2022-10-10 22:53 ` [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod Nick Desaulniers
2022-10-11 11:01 ` Arnd Bergmann
@ 2022-10-13 17:04 ` Nathan Chancellor
1 sibling, 0 replies; 9+ messages in thread
From: Nathan Chancellor @ 2022-10-13 17:04 UTC (permalink / raw)
To: Nick Desaulniers
Cc: Arnd Bergmann, Russell King, Tom Rix, linux-arm-kernel,
linux-kernel, llvm, Miguel Ojeda, Ard Biesheuvel, Gary Guo,
Craig Topper, Philip Reames, jh
On Mon, Oct 10, 2022 at 03:53:42PM -0700, Nick Desaulniers wrote:
> clang-15's ability to elide loops completely became more aggressive when
> it can deduce how a variable is being updated in a loop. Counting down
> one variable by an increment of another can be replaced by a modulo
> operation.
>
> For 64b variables on 32b ARM EABI targets, this can result in the
> compiler generating calls to __aeabi_uldivmod, which it does for a do
> while loop in float64_rem().
>
> For the kernel, we'd generally prefer that developers not open code 64b
> division via binary / operators and instead use the more explicit
> helpers from div64.h. On arm-linux-gnuabi targets, failure to do so can
> result in linkage failures due to undefined references to
> __aeabi_uldivmod().
>
> While developers can avoid open coding divisions on 64b variables, the
> compiler doesn't know that the Linux kernel has a partial implementation
> of a compiler runtime (--rtlib) to enforce this convention.
>
> It's also undecidable for the compiler whether the code in question
> would be faster to execute the loop vs elide it and do the 64b division.
>
> While I actively avoid using the internal -mllvm command line flags, I
> think we get better code than using barrier() here, which will force
> reloads+spills in the loop for all toolchains.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1666
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
I built ARCH=arm allmodconfig + CONFIG_WERROR=n without this patch and
saw the link failure then applied it and the error went away. Thanks for
all the investigation done into fixing this! I think you put this in the
patch tracker already but just for posterity:
Tested-by: Nathan Chancellor <nathan@kernel.org>
> ---
> arch/arm/nwfpe/Makefile | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile
> index 303400fa2cdf..2aec85ab1e8b 100644
> --- a/arch/arm/nwfpe/Makefile
> +++ b/arch/arm/nwfpe/Makefile
> @@ -11,3 +11,9 @@ nwfpe-y += fpa11.o fpa11_cpdo.o fpa11_cpdt.o \
> entry.o
>
> nwfpe-$(CONFIG_FPE_NWFPE_XP) += extended_cpdo.o
> +
> +# Try really hard to avoid generating calls to __aeabi_uldivmod() from
> +# float64_rem() due to loop elision.
> +ifdef CONFIG_CC_IS_CLANG
> +CFLAGS_softfloat.o += -mllvm -replexitval=never
> +endif
> --
> 2.38.0.rc2.412.g84df46c1b4-goog
>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2022-10-13 17:04 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-16 0:16 [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem Nick Desaulniers
2022-07-16 9:46 ` Arnd Bergmann
2022-10-10 21:23 ` Nick Desaulniers
2022-10-10 22:13 ` Arnd Bergmann
2022-10-10 22:34 ` Nick Desaulniers
2022-10-10 22:53 ` [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod Nick Desaulniers
2022-10-11 11:01 ` Arnd Bergmann
2022-10-13 17:04 ` Nathan Chancellor
2022-07-18 16:50 ` [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem kernel test robot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).