* [PATCH] lib/string.c: Disable tree-loop-distribute-patterns
@ 2020-08-18 23:43 Arvind Sankar
  2020-08-19  0:44 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Arvind Sankar @ 2020-08-18 23:43 UTC
  To: Andrew Morton, Linus Torvalds
  Cc: Nick Desaulniers, linux-kernel, clang-built-linux

gcc can transform the loop in a naive implementation of memset/memcpy
etc into a call to the function itself. This optimization is enabled by
-ftree-loop-distribute-patterns.
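
Illustration (not part of the patch): a minimal sketch of the kind of
loop in question, modelled on the byte-at-a-time generic memset. The
name naive_memset is hypothetical, chosen so the sketch can be built in
userspace next to libc. If the function were itself named memset and
the compiler replaced the loop with a call to memset, the function
would call itself without ever terminating.

```c
#include <stddef.h>

/*
 * Hypothetical name: naive_memset rather than memset, so that this can
 * be compiled in userspace without clashing with libc.
 */
void *naive_memset(void *s, int c, size_t count)
{
	char *xs = s;

	/* Byte-at-a-time store loop: the pattern gcc can recognize. */
	while (count--)
		*xs++ = c;
	return s;
}
```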

This has been the case for a while (see eg [0]), but gcc-10.x enables
this option at -O2 rather than -O3 as in previous versions.

Add -ffreestanding, which implicitly disables this optimization with
gcc. It is unclear whether clang performs such optimizations, but
hopefully it will also not do so in a freestanding environment.

This by itself is insufficient for gcc if the optimization was
explicitly enabled by CFLAGS, so also add a flag to explicitly disable
it.

[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 lib/Makefile | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/Makefile b/lib/Makefile
index e290fc5707ea..80edea49613f 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -15,11 +15,18 @@ KCOV_INSTRUMENT_debugobjects.o := n
 KCOV_INSTRUMENT_dynamic_debug.o := n
 KCOV_INSTRUMENT_fault-inject.o := n
 
+# string.o implements standard library functions like memset/memcpy etc.
+# Use -ffreestanding to ensure that the compiler does not try to "optimize"
+# them into calls to themselves.
+# The optimization pass that does such transformations in gcc is
+# tree-loop-distribute-patterns. Explicitly disable it just in case.
+CFLAGS_string.o := -ffreestanding $(call cc-option,-fno-tree-loop-distribute-patterns)
+
 # Early boot use of cmdline, don't instrument it
 ifdef CONFIG_AMD_MEM_ENCRYPT
 KASAN_SANITIZE_string.o := n
 
-CFLAGS_string.o := -fno-stack-protector
+CFLAGS_string.o += -fno-stack-protector
 endif
 
 # Used by KCSAN while enabled, avoid recursion.
-- 
2.26.2



* Re: [PATCH] lib/string.c: Disable tree-loop-distribute-patterns
  2020-08-18 23:43 [PATCH] lib/string.c: Disable tree-loop-distribute-patterns Arvind Sankar
@ 2020-08-19  0:44 ` Linus Torvalds
  2020-08-19  3:04   ` Arvind Sankar
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2020-08-19  0:44 UTC
  To: Arvind Sankar
  Cc: Andrew Morton, Nick Desaulniers, Linux Kernel Mailing List,
	clang-built-linux

On Tue, Aug 18, 2020 at 4:43 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> This by itself is insufficient for gcc if the optimization was
> explicitly enabled by CFLAGS, so also add a flag to explicitly disable
> it.

Using -fno-tree-loop-distribute-patterns seems to really be a bit too
incestuous with internal compiler knowledge.

That generic memcpy implementation is horrible anyway. It should never be used.

So I'd rather see this either removed entirely, or possibly rewritten
to be a somewhat proper memcpy implementation, and in the process made
to not be recognizable by the compiler (possibly by adding a dummy
barrier() or something like that).
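
Illustration (not from the thread): a minimal sketch of what "adding a
dummy barrier()" could look like. It assumes barrier() is an empty asm
statement with a "memory" clobber, as the kernel defines it for gcc and
clang; sketch_memcpy is a hypothetical name. Whether this defeats the
pattern recognition on every compiler version is an assumption, not
something verified here, and the loop is still byte-at-a-time, so it is
not the "somewhat proper" implementation asked for above.

```c
#include <stddef.h>

/* Assumption: same shape as the kernel's barrier() for gcc/clang. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical name, so the sketch can coexist with libc's memcpy. */
void *sketch_memcpy(void *dest, const void *src, size_t count)
{
	char *d = dest;
	const char *s = src;

	while (count--) {
		*d++ = *s++;
		/* Compiler barrier: the loop body is no longer a pure copy. */
		barrier();
	}
	return dest;
}
```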

Looking at the implementation of "strscpy()" in the same file, and
then comparing that to the ludicrously simplistic "memcpy()", I
really get the feeling that that memcpy() is not worth having.

              Linus


* Re: [PATCH] lib/string.c: Disable tree-loop-distribute-patterns
  2020-08-19  0:44 ` Linus Torvalds
@ 2020-08-19  3:04   ` Arvind Sankar
  2020-08-19  3:32     ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Arvind Sankar @ 2020-08-19  3:04 UTC
  To: Linus Torvalds
  Cc: Arvind Sankar, Andrew Morton, Nick Desaulniers,
	Linux Kernel Mailing List, clang-built-linux

On Tue, Aug 18, 2020 at 05:44:03PM -0700, Linus Torvalds wrote:
> On Tue, Aug 18, 2020 at 4:43 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
> >
> > This by itself is insufficient for gcc if the optimization was
> > explicitly enabled by CFLAGS, so also add a flag to explicitly disable
> > it.
> 
> Using -fno-tree-loop-distribute-patterns seems to really be a bit too
> incestuous with internal compiler knowledge.

Fair enough -- you ok with just the -ffreestanding? That's what protects
the memset in arch/x86/boot/compressed/string.c.

I think this is worthwhile to be safe.

> 
> That generic memcpy implementation is horrible anyway. It should never be used.
> 
> So I'd rather see this either removed entirely, or possibly rewritten
> to be a somewhat proper memcpy implementation, and in the process made
> to not be recognizable by the compiler (possibly by adding a dummy
> barrier() or something like that).
> 
> Looking at the implementation of "strscpy()" in the same file, and
> then comparing that to the ludicrously simplistic "memcpy()", I
> really get the feeling that that memcpy() is not worth having.
> 
>               Linus

I don't think anything actually uses the generic memcpy, and I think
only c6x uses the generic memset. Might be worth optimizing strnlen etc
with the word-at-a-time thing though.
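
Illustration (not from the thread): the word-at-a-time approach rests
on a bit trick that tests a whole word for a zero byte in a few
operations. A minimal userspace sketch, assuming a 64-bit word;
sketch_has_zero and the two constants are hypothetical names, and the
kernel's own helpers in the per-arch word-at-a-time.h headers differ in
detail.

```c
#include <stdint.h>

#define SKETCH_ONES  0x0101010101010101ULL
#define SKETCH_HIGHS 0x8080808080808080ULL

/*
 * Returns nonzero if any of the eight bytes in v is 0x00.
 * Subtracting 1 from each byte borrows into the high bit only for
 * bytes that were zero (or had the high bit set, which "& ~v" masks
 * out).
 */
int sketch_has_zero(uint64_t v)
{
	return ((v - SKETCH_ONES) & ~v & SKETCH_HIGHS) != 0;
}
```

A strnlen built on this would load one word per iteration and fall back
to a byte loop only for the word that contains the terminator.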


* Re: [PATCH] lib/string.c: Disable tree-loop-distribute-patterns
  2020-08-19  3:04   ` Arvind Sankar
@ 2020-08-19  3:32     ` Linus Torvalds
  2020-08-19 13:16       ` Arvind Sankar
  2020-08-19 14:08       ` [PATCH v2] lib/string.c: Use freestanding environment Arvind Sankar
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2020-08-19  3:32 UTC
  To: Arvind Sankar
  Cc: Andrew Morton, Nick Desaulniers, Linux Kernel Mailing List,
	clang-built-linux

On Tue, Aug 18, 2020 at 8:04 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> On Tue, Aug 18, 2020 at 05:44:03PM -0700, Linus Torvalds wrote:
> > Using -fno-tree-loop-distribute-patterns seems to really be a bit too
> > incestuous with internal compiler knowledge.
>
> Fair enough -- you ok with just the -ffreestanding? That's what protects
> the memset in arch/x86/boot/compressed/string.c.

Yeah, I think -ffreestanding makes sense. It may not be optimal, but
it doesn't smell wrong to me.

> > Looking at the implementation of "strscpy()" in the same file, and
> > then comparing that to the ludicrously simplistic "memcpy()", I
> > really get the feeling that that memcpy() is not worth having.
>
> I don't think anything actually uses the generic memcpy, and I think
> only c6x uses the generic memset.

I do think maybe we should just remove the generic memcpy and memset
and say "hey people, you really do need to implement your own".

Even if you don't have this "recognize and recurse" issue, you end up
having other issues like just tracing etc. Yeah, we've hopefully
turned everything like that off, but considering that no major
architecture uses this, I wonder how many small details we've missed
with ftrace recursion etc?

> Might be worth optimizing strnlen etc with the word-at-a-time thing though.

Yeah, possibly. Except the kernel almost never uses strnlen for
anything bigger. At least I haven't seen it very much in the profiles.

The "strncpy_from_user()" stuff shows up like a sore thumb on some
loads (lots and lots of strings from user space for pathnames and
execve), but the kernel itself tends to seldom deal a lot with any
longer strings.  Stuff like device names etc, I guess, but any time I
see string handling in profiles, it tends to be in user space (GNU
make spends all of its time in string handling, it sometimes seems).

Of course, that may be just me looking at very particular profiles, so
maybe I've just not seen the loads where the kernel strnlen matters.

memcpy and memset? Those matter. A lot.

            Linus


* Re: [PATCH] lib/string.c: Disable tree-loop-distribute-patterns
  2020-08-19  3:32     ` Linus Torvalds
@ 2020-08-19 13:16       ` Arvind Sankar
  2020-08-19 14:08       ` [PATCH v2] lib/string.c: Use freestanding environment Arvind Sankar
  1 sibling, 0 replies; 8+ messages in thread
From: Arvind Sankar @ 2020-08-19 13:16 UTC
  To: Linus Torvalds
  Cc: Arvind Sankar, Andrew Morton, Nick Desaulniers,
	Linux Kernel Mailing List, clang-built-linux

On Tue, Aug 18, 2020 at 08:32:58PM -0700, Linus Torvalds wrote:
> On Tue, Aug 18, 2020 at 8:04 PM Arvind Sankar <nivedita@alum.mit.edu> wrote:
> 
> > Might be worth optimizing strnlen etc with the word-at-a-time thing though.
> 
> Yeah, possibly. Except the kernel almost never uses strnlen for
> anything bigger. At least I haven't seen it very much in the profiles.

strscpy could be implemented as strnlen+memcpy. I'd think that wouldn't
be much slower, especially if strnlen is optimized and the arch has a
good implementation of memcpy?
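
Illustration (not from the thread): a minimal userspace sketch of the
strnlen+memcpy idea. sketch_strscpy and sketch_strnlen are hypothetical
names; the helper uses memchr() as a stand-in for an optimized strnlen.
The return convention (length copied, or -E2BIG on truncation or zero
size) follows the kernel's strscpy(), but this is only a sketch and
leaves out the kernel version's other checks.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for an optimized strnlen(); hypothetical helper name. */
static size_t sketch_strnlen(const char *s, size_t maxlen)
{
	const char *end = memchr(s, '\0', maxlen);

	return end ? (size_t)(end - s) : maxlen;
}

/*
 * Returns the number of characters copied (excluding the NUL), or
 * -E2BIG if count is 0 or the source had to be truncated.
 */
long sketch_strscpy(char *dest, const char *src, size_t count)
{
	size_t len;

	if (count == 0)
		return -E2BIG;

	len = sketch_strnlen(src, count);
	if (len == count) {
		/* No NUL within count bytes: truncate and terminate. */
		memcpy(dest, src, count - 1);
		dest[count - 1] = '\0';
		return -E2BIG;
	}

	/* Copy the string together with its terminating NUL. */
	memcpy(dest, src, len + 1);
	return (long)len;
}
```

Note that this reads the source twice, once for the length and once for
the copy.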


* [PATCH v2] lib/string.c: Use freestanding environment
  2020-08-19  3:32     ` Linus Torvalds
  2020-08-19 13:16       ` Arvind Sankar
@ 2020-08-19 14:08       ` Arvind Sankar
  2020-08-19 18:35         ` Nick Desaulniers
  1 sibling, 1 reply; 8+ messages in thread
From: Arvind Sankar @ 2020-08-19 14:08 UTC
  To: Andrew Morton, Linus Torvalds
  Cc: Nick Desaulniers, linux-kernel, clang-built-linux

gcc can transform the loop in a naive implementation of memset/memcpy
etc into a call to the function itself. This optimization is enabled by
-ftree-loop-distribute-patterns.

This has been the case for a while (see eg [0]), but gcc-10.x enables
this option at -O2 rather than -O3 as in previous versions.

Add -ffreestanding, which implicitly disables this optimization with
gcc. It is unclear whether clang performs such optimizations, but
hopefully it will also not do so in a freestanding environment.
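
Illustration (not part of the patch): one visible effect of
-ffreestanding is that the standard macro __STDC_HOSTED__ becomes 0.
The sketch below, with the hypothetical name sketch_is_hosted, reports
which environment a translation unit was compiled for; it can be used
to check that the flag actually reached a given object file.

```c
/*
 * Returns 1 when compiled for a hosted environment and 0 when compiled
 * for a freestanding one (for example with -ffreestanding).
 */
int sketch_is_hosted(void)
{
#if defined(__STDC_HOSTED__) && __STDC_HOSTED__
	return 1;
#else
	return 0;
#endif
}
```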

[0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
---
 lib/Makefile | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/lib/Makefile b/lib/Makefile
index e290fc5707ea..a4a4c6864f51 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -15,11 +15,16 @@ KCOV_INSTRUMENT_debugobjects.o := n
 KCOV_INSTRUMENT_dynamic_debug.o := n
 KCOV_INSTRUMENT_fault-inject.o := n
 
+# string.o implements standard library functions like memset/memcpy etc.
+# Use -ffreestanding to ensure that the compiler does not try to "optimize"
+# them into calls to themselves.
+CFLAGS_string.o := -ffreestanding
+
 # Early boot use of cmdline, don't instrument it
 ifdef CONFIG_AMD_MEM_ENCRYPT
 KASAN_SANITIZE_string.o := n
 
-CFLAGS_string.o := -fno-stack-protector
+CFLAGS_string.o += -fno-stack-protector
 endif
 
 # Used by KCSAN while enabled, avoid recursion.
-- 
2.26.2



* Re: [PATCH v2] lib/string.c: Use freestanding environment
  2020-08-19 14:08       ` [PATCH v2] lib/string.c: Use freestanding environment Arvind Sankar
@ 2020-08-19 18:35         ` Nick Desaulniers
  2020-08-19 19:06           ` Arvind Sankar
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Desaulniers @ 2020-08-19 18:35 UTC
  To: Arvind Sankar; +Cc: Andrew Morton, Linus Torvalds, LKML, clang-built-linux

On Wed, Aug 19, 2020 at 7:08 AM Arvind Sankar <nivedita@alum.mit.edu> wrote:
>
> gcc can transform the loop in a naive implementation of memset/memcpy
> etc into a call to the function itself. This optimization is enabled by
> -ftree-loop-distribute-patterns.
>
> This has been the case for a while (see eg [0]), but gcc-10.x enables
> this option at -O2 rather than -O3 as in previous versions.
>
> Add -ffreestanding, which implicitly disables this optimization with
> gcc. It is unclear whether clang performs such optimizations, but
> hopefully it will also not do so in a freestanding environment.
>
> [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
>
> Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>

For Clang:
For x86_64 defconfig:
This results in no change for the code generated.

For aarch64 defconfig:
This results in calls to bcmp being replaced with calls to memcmp in
strstr and strnstr.  I plan on adding -fno-builtin-bcmp then removing
bcmp anyways.  Not a bug either way, just noting the difference in
disassembly.
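
Illustration (not from the thread): a minimal sketch of the pattern
involved, modelled on the generic strstr. The result of memcmp() is
only tested against zero, which is the case where clang, in a hosted
build, may emit a call to bcmp() instead. That description of clang's
behaviour is an assumption drawn from the observation above, and
sketch_strstr is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name, so the sketch can coexist with libc's strstr. */
char *sketch_strstr(const char *s1, const char *s2)
{
	size_t l1, l2;

	l2 = strlen(s2);
	if (!l2)
		return (char *)s1;
	l1 = strlen(s1);
	while (l1 >= l2) {
		l1--;
		/* Only equality matters here, not ordering. */
		if (!memcmp(s1, s2, l2))
			return (char *)s1;
		s1++;
	}
	return NULL;
}
```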

For arm defconfig:
This results in no change for the code generated.

I should check the other architectures we support, but my local build
doesn't have all backends enabled currently; we'll catch it once it's
being tested in -next if it's an issue, but I don't foresee it
(knocks on wood, famous last words, ...)

If it helps GCC not optimize these core functions into infinite
recursion, I'm for that, especially since I'd bet these get called
frequently and early on in boot, which is my least favorite time to
debug.

Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>

> ---
>  lib/Makefile | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Makefile b/lib/Makefile
> index e290fc5707ea..a4a4c6864f51 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -15,11 +15,16 @@ KCOV_INSTRUMENT_debugobjects.o := n
>  KCOV_INSTRUMENT_dynamic_debug.o := n
>  KCOV_INSTRUMENT_fault-inject.o := n
>
> +# string.o implements standard library functions like memset/memcpy etc.
> +# Use -ffreestanding to ensure that the compiler does not try to "optimize"
> +# them into calls to themselves.
> +CFLAGS_string.o := -ffreestanding
> +
>  # Early boot use of cmdline, don't instrument it
>  ifdef CONFIG_AMD_MEM_ENCRYPT
>  KASAN_SANITIZE_string.o := n
>
> -CFLAGS_string.o := -fno-stack-protector
> +CFLAGS_string.o += -fno-stack-protector
>  endif
>
>  # Used by KCSAN while enabled, avoid recursion.
> --
> 2.26.2
>


-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v2] lib/string.c: Use freestanding environment
  2020-08-19 18:35         ` Nick Desaulniers
@ 2020-08-19 19:06           ` Arvind Sankar
  0 siblings, 0 replies; 8+ messages in thread
From: Arvind Sankar @ 2020-08-19 19:06 UTC
  To: Nick Desaulniers
  Cc: Arvind Sankar, Andrew Morton, Linus Torvalds, LKML, clang-built-linux

On Wed, Aug 19, 2020 at 11:35:11AM -0700, Nick Desaulniers wrote:
> On Wed, Aug 19, 2020 at 7:08 AM Arvind Sankar <nivedita@alum.mit.edu> wrote:
> >
> > gcc can transform the loop in a naive implementation of memset/memcpy
> > etc into a call to the function itself. This optimization is enabled by
> > -ftree-loop-distribute-patterns.
> >
> > This has been the case for a while (see eg [0]), but gcc-10.x enables
> > this option at -O2 rather than -O3 as in previous versions.
> >
> > Add -ffreestanding, which implicitly disables this optimization with
> > gcc. It is unclear whether clang performs such optimizations, but
> > hopefully it will also not do so in a freestanding environment.
> >
> > [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888
> >
> > Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
> 
> For Clang:
> For x86_64 defconfig:
> This results in no change for the code generated.
> 
> For aarch64 defconfig:
> This results in calls to bcmp being replaced with calls to memcmp in
> > strstr and strnstr.  I plan on adding -fno-builtin-bcmp then removing
> > bcmp anyways.  Not a bug either way, just noting the difference in
> > disassembly.
> 
> For arm defconfig:
> This results in no change for the code generated.
> 
> I should check the other architectures we support, but my local build
> doesn't have all backends enabled currently; we'll catch it once it's
> > being tested in -next if it's an issue, but I don't foresee it
> (knocks on wood, famous last words, ...)
> 
> If it helps GCC not optimize these core functions into infinite
> recursion, I'm for that, especially since I'd bet these get called
> frequently and early on in boot, which is my least favorite time to
> debug.
> 
> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
> Tested-by: Nick Desaulniers <ndesaulniers@google.com>
> 

I verified that arch/c6x with gcc-10 downloaded from kernel.org has the
broken memset with CC_OPTIMIZE_FOR_PERFORMANCE and gets fixed with this
patch. The default is optimize for size though, which doesn't seem to be
busted.

Thanks.

