linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
@ 2022-07-16  0:16 Nick Desaulniers
  2022-07-16  9:46 ` Arnd Bergmann
  2022-07-18 16:50 ` [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem kernel test robot
  0 siblings, 2 replies; 9+ messages in thread
From: Nick Desaulniers @ 2022-07-16  0:16 UTC (permalink / raw)
  Cc: llvm, Arnd Bergmann, Nathan Chancellor, Miguel Ojeda,
	Ard Biesheuval, Nick Desaulniers, Gary Guo, Russell King,
	linux-arm-kernel, linux-kernel

Compilers frequently need to defer 64b division to a libcall with this
symbol name. It essentially is div64_u64_rem, just with a different
signature. Kernel developers know to call div64_u64_rem, but compilers
don't.

Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/
Suggested-by: Gary Guo <gary@garyguo.net>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/arm/lib/Makefile         |  1 +
 arch/arm/lib/aeabi_uldivmod.c | 23 +++++++++++++++++++++++
 2 files changed, 24 insertions(+)
 create mode 100644 arch/arm/lib/aeabi_uldivmod.c

diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
index 6d2ba454f25b..3fa273219312 100644
--- a/arch/arm/lib/Makefile
+++ b/arch/arm/lib/Makefile
@@ -29,6 +29,7 @@ endif
 obj-$(CONFIG_UACCESS_WITH_MEMCPY) += uaccess_with_memcpy.o
 
 lib-$(CONFIG_MMU) += $(mmu-y)
+lib-$(CONFIG_AEABI) += aeabi_uldivmod.o
 
 ifeq ($(CONFIG_CPU_32v3),y)
   lib-y	+= io-readsw-armv3.o io-writesw-armv3.o
diff --git a/arch/arm/lib/aeabi_uldivmod.c b/arch/arm/lib/aeabi_uldivmod.c
new file mode 100644
index 000000000000..310427893195
--- /dev/null
+++ b/arch/arm/lib/aeabi_uldivmod.c
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Unsigned 64b division with remainder, as is typically provided by libgcc or
+ * compiler-rt.
+ *
+ * Copyright (C) 2023 Google, LLC.
+ *
+ * Author: Nick Desaulniers <ndesaulniers@google.com>
+ */
+
+#include <linux/math64.h>
+
+struct result {
+	u64 quot, rem;
+};
+
+struct result __aeabi_uldivmod(u64 numerator, u64 denominator)
+{
+	struct result res;
+
+	res.quot = div64_u64_rem(numerator, denominator, &res.rem);
+	return res;
+}
-- 
2.37.0.170.g444d1eabd0-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
  2022-07-16  0:16 [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem Nick Desaulniers
@ 2022-07-16  9:46 ` Arnd Bergmann
  2022-10-10 21:23   ` Nick Desaulniers
  2022-07-18 16:50 ` [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem kernel test robot
  1 sibling, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2022-07-16  9:46 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: clang-built-linux, Nathan Chancellor, Miguel Ojeda,
	Ard Biesheuval, Gary Guo, Russell King, Linux ARM,
	Linux Kernel Mailing List

On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> Compilers frequently need to defer 64b division to a libcall with this
> symbol name. It essentially is div64_u64_rem, just with a different
> signature. Kernel developers know to call div64_u64_rem, but compilers
> don't.
>
> Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/
> Suggested-by: Gary Guo <gary@garyguo.net>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

This has historically been strongly NAK'd, and I don't think that position
has changed in the meantime. A variable-argument 64-bit division is
really expensive, especially on 32-bit machines that lack a native
32-bit division instruction, and we don't want developers to accidentally
insert one in their driver code.

Explicitly calling one of the division helpers in linux/math64.h is the
established way for driver writers to declare that a particular division
cannot be turned into a cheaper operation and is never run in a
performance critical code path. The compiler of course cannot know
about either of those.

        Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
  2022-07-16  0:16 [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem Nick Desaulniers
  2022-07-16  9:46 ` Arnd Bergmann
@ 2022-07-18 16:50 ` kernel test robot
  1 sibling, 0 replies; 9+ messages in thread
From: kernel test robot @ 2022-07-18 16:50 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: llvm, kbuild-all, Arnd Bergmann, Nathan Chancellor, Miguel Ojeda,
	Ard Biesheuval, Nick Desaulniers, Gary Guo, Russell King,
	linux-arm-kernel, linux-kernel

Hi Nick,

I love your patch! Perhaps something to improve:

[auto build test WARNING on soc/for-next]
[also build test WARNING on arm/for-next linus/master v5.19-rc7 next-20220718]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Nick-Desaulniers/arm-lib-implement-aeabi_uldivmod-via-div64_u64_rem/20220716-231205
base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
config: arm-ixp4xx_defconfig (https://download.01.org/0day-ci/archive/20220719/202207190057.1gn0emAr-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d74b88c69dc2644bd0dc5d64e2d7413a0d4040e5)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/intel-lab-lkp/linux/commit/f0441d7a00899e433705accb0da58c94f4ff8808
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nick-Desaulniers/arm-lib-implement-aeabi_uldivmod-via-div64_u64_rem/20220716-231205
        git checkout f0441d7a00899e433705accb0da58c94f4ff8808
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/arm/lib/aeabi_uldivmod.c:17:15: warning: no previous prototype for function '__aeabi_uldivmod' [-Wmissing-prototypes]
   struct result __aeabi_uldivmod(u64 numerator, u64 denominator)
                 ^
   arch/arm/lib/aeabi_uldivmod.c:17:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   struct result __aeabi_uldivmod(u64 numerator, u64 denominator)
   ^
   static 
   1 warning generated.


vim +/__aeabi_uldivmod +17 arch/arm/lib/aeabi_uldivmod.c

    16	
  > 17	struct result __aeabi_uldivmod(u64 numerator, u64 denominator)

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
  2022-07-16  9:46 ` Arnd Bergmann
@ 2022-10-10 21:23   ` Nick Desaulniers
  2022-10-10 22:13     ` Arnd Bergmann
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Desaulniers @ 2022-10-10 21:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: clang-built-linux, Nathan Chancellor, Miguel Ojeda,
	Ard Biesheuval, Gary Guo, Russell King, Linux ARM,
	Linux Kernel Mailing List, Craig Topper, Philip Reames, jh

On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > Compilers frequently need to defer 64b division to a libcall with this
> > symbol name. It essentially is div64_u64_rem, just with a different
> > signature. Kernel developers know to call div64_u64_rem, but compilers
> > don't.
> >
> > Link: https://lore.kernel.org/lkml/20220524004156.0000790e@garyguo.net/
> > Suggested-by: Gary Guo <gary@garyguo.net>
> > Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

So the existing division by constant issues went away, and Craig was
able to improve division by double-word constants in LLVM
1. https://reviews.llvm.org/D130862
2. https://reviews.llvm.org/D135541
But we still have one instance left that's not div/rem by constant via
CONFIG_FPE_NWFPE=y that's now blocking Android's compiler upgrade.
https://github.com/ClangBuiltLinux/linux/issues/1666

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
into the loops also seems to work.

Otherwise I'd define __aeabi_uldivmod as static in
arch/arm/nwfpe/softfloat.c (with the body from this patch) only for
clang.

I see this function seems to be based on Berkeley Softfloat
http://www.jhauser.us/arithmetic/SoftFloat.html
v2.  v3 looks like a total rewrite.  Looking at v3e, it looks like
float64_rem() is now called f64_rem() and defined in f64_rem.c.  It
doesn't look like there's anything from v3 that we could backport to
the kernel's v2 to avoid this.

Otherwise perhaps we just disable OABI_COMPAT for clang. Quite a few
defconfigs explicitly enable FPE_NWFPE though.  Are there really a lot
of OABI binaries still in use?

There's also the hidden llvm flag:
`-mllvm -replexitval=never` that seems to work here, though FWICT it's
disabling 3 such loop elisions (I think all three statements in that
do while).  That's probably the best way forward here...

https://reviews.llvm.org/D9800 made the decision to do such a
transformation when a loop can be fully elided ("deleted").

>
> This has historically been strongly NAK'd, and I don't think that position
> has changed in the meantime. A variable-argument 64-bit division is
> really expensive, especially on 32-bit machines that lack a native
> 32-bit division instruction, and we don't want developers to accidentally
> insert one in their driver code.
>
> Explicitly calling one of the division helpers in linux/math64.h is the
> established way for driver writers to declare that a particular division
> cannot be turned into a cheaper operation and is never run in a
> performance critical code path. The compiler of course cannot know
> about either of those.
>
>         Arnd



-- 
Thanks,
~Nick Desaulniers

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
  2022-10-10 21:23   ` Nick Desaulniers
@ 2022-10-10 22:13     ` Arnd Bergmann
  2022-10-10 22:34       ` Nick Desaulniers
  0 siblings, 1 reply; 9+ messages in thread
From: Arnd Bergmann @ 2022-10-10 22:13 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: clang-built-linux, Nathan Chancellor, Miguel Ojeda,
	Ard Biesheuvel, Gary Guo, Russell King, Linux ARM,
	Linux Kernel Mailing List, Craig Topper, Philip Reames, jh

On Mon, Oct 10, 2022, at 11:23 PM, Nick Desaulniers wrote:
> On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote:
>> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
> Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
> bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
> into the loops also seems to work.

I was going to suggest adding a barrier() as well, should have
read on first ;-)

> I see this function seems to be based on Berkeley Softfloat
> http://www.jhauser.us/arithmetic/SoftFloat.html
> v2.  v3 looks like a total rewrite.  Looking at v3e, it looks like
> float64_rem() is now called f64_rem() and defined in f64_rem.c.  It
> doesn't look like there's anything from v3 that we could backport to
> the kernel's v2 to avoid this.
>
> Otherwise perhaps we just disable OABI_COMPAT for clang. Quite a few
> defconfigs explicitly enable FPE_NWFPE though.  Are there really a lot
> of OABI binaries still in use?

I'd do the minimal thing here, neither NWFPE nor OABI_COMPAT should
be too relevant here. Russell still uses some machines with old OABI
binaries, but I have not heard from anyone else using those in a
very long time.

Note that while gcc can still produce kernels that are OABI binaries,
gcc-4.6 marked the arm-linux-gnu target as obsolete and gcc-4.8
removed it, so any existing binaries were built with toolchains
predating that.

     Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem
  2022-10-10 22:13     ` Arnd Bergmann
@ 2022-10-10 22:34       ` Nick Desaulniers
  2022-10-10 22:53         ` [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod Nick Desaulniers
  0 siblings, 1 reply; 9+ messages in thread
From: Nick Desaulniers @ 2022-10-10 22:34 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: clang-built-linux, Nathan Chancellor, Miguel Ojeda,
	Ard Biesheuvel, Gary Guo, Russell King, Linux ARM,
	Linux Kernel Mailing List, Craig Topper, Philip Reames, jh

On Mon, Oct 10, 2022 at 3:14 PM Arnd Bergmann <arnd@kernel.org> wrote:
>
> On Mon, Oct 10, 2022, at 11:23 PM, Nick Desaulniers wrote:
> > On Sat, Jul 16, 2022 at 2:47 AM Arnd Bergmann <arnd@kernel.org> wrote:
> >> On Sat, Jul 16, 2022 at 2:16 AM Nick Desaulniers <ndesaulniers@google.com> wrote:
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/nwfpe/softfloat.c#n2312
> > Any creative ideas on how to avoid this? Perhaps putting the `aSig -=
> > bSig;` in inline asm? Inserting a `barrier()` or empty asm statement
> > into the loops also seems to work.
>
> I was going to suggest adding a barrier() as well, should have
> read on first ;-)

barrier() forces reloads+spills in the loop.  The output with `-mllvm
-replexitval=never` is optimal (assuming the loop is faster than
__aeabi_uldivmod (which I think is unprovable).
https://godbolt.org/z/7dMabYYcM

As much I hate relying on compiler-internal flags, I think this is optimal:
```
diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile
index 303400fa2cdf..2aec85ab1e8b 100644
--- a/arch/arm/nwfpe/Makefile
+++ b/arch/arm/nwfpe/Makefile
@@ -11,3 +11,9 @@ nwfpe-y                               += fpa11.o
fpa11_cpdo.o fpa11_cpdt.o \
                                   entry.o

 nwfpe-$(CONFIG_FPE_NWFPE_XP)   += extended_cpdo.o
+
+# Try really hard to avoid generating calls to __aeabi_uldivmod() from
+# float64_rem() due to loop elision.
+ifdef CONFIG_CC_IS_CLANG
+CFLAGS_softfloat.o     += -mllvm -replexitval=never
+endif
```

Part of me is tempted to move float64_rem() to its own file for that
flag, but indvars+loop-utils isn't eliding other loops in that file
(comparing the full disassembly before+after the above diff).

Long term, it might be nice for us to have `--rtlib` recognize
`--rtlib=linux-kernel@version` or something so that we could better
describe the effective compiler runtime to the compiler.  There are
already differences in compiler-rt and libgcc where we could make
better codegen decisions if we were to consider the target rtlib.
These libraries also change over time though...
-- 
Thanks,
~Nick Desaulniers

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod
  2022-10-10 22:34       ` Nick Desaulniers
@ 2022-10-10 22:53         ` Nick Desaulniers
  2022-10-11 11:01           ` Arnd Bergmann
  2022-10-13 17:04           ` Nathan Chancellor
  0 siblings, 2 replies; 9+ messages in thread
From: Nick Desaulniers @ 2022-10-10 22:53 UTC (permalink / raw)
  To: Arnd Bergmann, Russell King
  Cc: Tom Rix, linux-arm-kernel, linux-kernel, llvm, Miguel Ojeda,
	Ard Biesheuvel, Gary Guo, Craig Topper, Philip Reames, jh,
	Nick Desaulniers, Nathan Chancellor

clang-15's ability to elide loops completely became more aggressive when
it can deduce how a variable is being updated in a loop. Counting down
one variable by an increment of another can be replaced by a modulo
operation.

For 64b variables on 32b ARM EABI targets, this can result in the
compiler generating calls to __aeabi_uldivmod, which it does for a do
while loop in float64_rem().

For the kernel, we'd generally prefer that developers not open code 64b
division via binary / operators and instead use the more explicit
helpers from div64.h. On arm-linux-gnuabi targets, failure to do so can
result in linkage failures due to undefined references to
__aeabi_uldivmod().

While developers can avoid open coding divisions on 64b variables, the
compiler doesn't know that the Linux kernel has a partial implementation
of a compiler runtime (--rtlib) to enforce this convention.

It's also undecidable for the compiler whether the code in question
would be faster to execute the loop vs elide it and do the 64b division.

While I actively avoid using the internal -mllvm command line flags, I
think we get better code than using barrier() here, which will force
reloads+spills in the loop for all toolchains.

Link: https://github.com/ClangBuiltLinux/linux/issues/1666
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
 arch/arm/nwfpe/Makefile | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile
index 303400fa2cdf..2aec85ab1e8b 100644
--- a/arch/arm/nwfpe/Makefile
+++ b/arch/arm/nwfpe/Makefile
@@ -11,3 +11,9 @@ nwfpe-y				+= fpa11.o fpa11_cpdo.o fpa11_cpdt.o \
 				   entry.o
 
 nwfpe-$(CONFIG_FPE_NWFPE_XP)	+= extended_cpdo.o
+
+# Try really hard to avoid generating calls to __aeabi_uldivmod() from
+# float64_rem() due to loop elision.
+ifdef CONFIG_CC_IS_CLANG
+CFLAGS_softfloat.o	+= -mllvm -replexitval=never
+endif
-- 
2.38.0.rc2.412.g84df46c1b4-goog


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod
  2022-10-10 22:53         ` [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod Nick Desaulniers
@ 2022-10-11 11:01           ` Arnd Bergmann
  2022-10-13 17:04           ` Nathan Chancellor
  1 sibling, 0 replies; 9+ messages in thread
From: Arnd Bergmann @ 2022-10-11 11:01 UTC (permalink / raw)
  To: Nick Desaulniers, Russell King
  Cc: Tom Rix, linux-arm-kernel, linux-kernel, llvm, Miguel Ojeda,
	Ard Biesheuvel, Gary Guo, Craig Topper, Philip Reames, jh,
	Nathan Chancellor

On Tue, Oct 11, 2022, at 12:53 AM, Nick Desaulniers wrote:
> clang-15's ability to elide loops completely became more aggressive when
> it can deduce how a variable is being updated in a loop. Counting down
> one variable by an increment of another can be replaced by a modulo
> operation.
>
> For 64b variables on 32b ARM EABI targets, this can result in the
> compiler generating calls to __aeabi_uldivmod, which it does for a do
> while loop in float64_rem().
>
> For the kernel, we'd generally prefer that developers not open code 64b
> division via binary / operators and instead use the more explicit
> helpers from div64.h. On arm-linux-gnuabi targets, failure to do so can
> result in linkage failures due to undefined references to
> __aeabi_uldivmod().
>
> While developers can avoid open coding divisions on 64b variables, the
> compiler doesn't know that the Linux kernel has a partial implementation
> of a compiler runtime (--rtlib) to enforce this convention.
>
> It's also undecidable for the compiler whether the code in question
> would be faster to execute the loop vs elide it and do the 64b division.
>
> While I actively avoid using the internal -mllvm command line flags, I
> think we get better code than using barrier() here, which will force
> reloads+spills in the loop for all toolchains.
>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1666
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Works for me,

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

I would have been fine with disallowing NWFPE for clang, or with
adding a barrier in the loop as well, i.e. any approach that
doesn't cause invalid behavior or a maintenance burden, given that
there is probably nobody that actually needs nwfpe on a clang built
kernel.

    Arnd

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod
  2022-10-10 22:53         ` [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod Nick Desaulniers
  2022-10-11 11:01           ` Arnd Bergmann
@ 2022-10-13 17:04           ` Nathan Chancellor
  1 sibling, 0 replies; 9+ messages in thread
From: Nathan Chancellor @ 2022-10-13 17:04 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, Russell King, Tom Rix, linux-arm-kernel,
	linux-kernel, llvm, Miguel Ojeda, Ard Biesheuvel, Gary Guo,
	Craig Topper, Philip Reames, jh

On Mon, Oct 10, 2022 at 03:53:42PM -0700, Nick Desaulniers wrote:
> clang-15's ability to elide loops completely became more aggressive when
> it can deduce how a variable is being updated in a loop. Counting down
> one variable by an increment of another can be replaced by a modulo
> operation.
> 
> For 64b variables on 32b ARM EABI targets, this can result in the
> compiler generating calls to __aeabi_uldivmod, which it does for a do
> while loop in float64_rem().
> 
> For the kernel, we'd generally prefer that developers not open code 64b
> division via binary / operators and instead use the more explicit
> helpers from div64.h. On arm-linux-gnuabi targets, failure to do so can
> result in linkage failures due to undefined references to
> __aeabi_uldivmod().
> 
> While developers can avoid open coding divisions on 64b variables, the
> compiler doesn't know that the Linux kernel has a partial implementation
> of a compiler runtime (--rtlib) to enforce this convention.
> 
> It's also undecidable for the compiler whether the code in question
> would be faster to execute the loop vs elide it and do the 64b division.
> 
> While I actively avoid using the internal -mllvm command line flags, I
> think we get better code than using barrier() here, which will force
> reloads+spills in the loop for all toolchains.
> 
> Link: https://github.com/ClangBuiltLinux/linux/issues/1666
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

I built ARCH=arm allmodconfig + CONFIG_WERROR=n without this patch and
saw the link failure then applied it and the error went away. Thanks for
all the investigation done into fixing this! I think you put this in the
patch tracker already but just for posterity:

Tested-by: Nathan Chancellor <nathan@kernel.org>

> ---
>  arch/arm/nwfpe/Makefile | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm/nwfpe/Makefile b/arch/arm/nwfpe/Makefile
> index 303400fa2cdf..2aec85ab1e8b 100644
> --- a/arch/arm/nwfpe/Makefile
> +++ b/arch/arm/nwfpe/Makefile
> @@ -11,3 +11,9 @@ nwfpe-y				+= fpa11.o fpa11_cpdo.o fpa11_cpdt.o \
>  				   entry.o
>  
>  nwfpe-$(CONFIG_FPE_NWFPE_XP)	+= extended_cpdo.o
> +
> +# Try really hard to avoid generating calls to __aeabi_uldivmod() from
> +# float64_rem() due to loop elision.
> +ifdef CONFIG_CC_IS_CLANG
> +CFLAGS_softfloat.o	+= -mllvm -replexitval=never
> +endif
> -- 
> 2.38.0.rc2.412.g84df46c1b4-goog
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-10-13 17:05 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-16  0:16 [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem Nick Desaulniers
2022-07-16  9:46 ` Arnd Bergmann
2022-10-10 21:23   ` Nick Desaulniers
2022-10-10 22:13     ` Arnd Bergmann
2022-10-10 22:34       ` Nick Desaulniers
2022-10-10 22:53         ` [PATCH] ARM: NWFPE: avoid compiler-generated __aeabi_uldivmod Nick Desaulniers
2022-10-11 11:01           ` Arnd Bergmann
2022-10-13 17:04           ` Nathan Chancellor
2022-07-18 16:50 ` [PATCH] arm: lib: implement aeabi_uldivmod via div64_u64_rem kernel test robot

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).