linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
@ 2022-10-09 15:19 Willy Tarreau
  2022-10-09 18:29 ` [PATCH v2] " Willy Tarreau
  0 siblings, 1 reply; 9+ messages in thread
From: Willy Tarreau @ 2022-10-09 15:19 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: kernel test robot, linux-kselftest, linux-kernel, Willy Tarreau

When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen()
and replaces it with a jump to strlen(), which is not defined as a symbol
and breaks compilation. Worse, when the function is called strlen(), the
function is simply replaced with a jump to itself, hence becomes an
infinite loop.

One way to avoid this is to always set -ffreestanding, but the calling
code doesn't know this and there's no way (either via attributes or
pragmas) to globally enable it from include files, effectively leaving
a painful situation for the caller.

It turns out that -fno-tree-loop-distribute-patterns disables replacement
of strlen-like loops with calls to strlen and that this option is accepted
in the optimize() function attribute. Thus at least it allows us to make
sure our local definition is not replaced with a self jump. The function
only needs to be renamed back to strlen() so that the symbol exists, which
implies that nolibc_strlen() which is used on variable strings has to be
declared as a macro that points back to it before the strlen() macro is
redifined.

It was verified to produce valid code with gcc 3.4 to 12.1 at different
optimization levels, and both with constant and variable strings.

Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210081618.754a77db-yujie.liu@intel.com
Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Fixes: 96980b833a21 ("tools/nolibc/string: do not use __builtin_strlen() at -O0")
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
 tools/include/nolibc/string.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index bef35bee9c44..5ef8778cd16f 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -125,10 +125,16 @@ char *strcpy(char *dst, const char *src)
 }
 
 /* this function is only used with arguments that are not constants or when
- * it's not known because optimizations are disabled.
+ * it's not known because optimizations are disabled. Note that gcc 12
+ * recognizes an strlen() pattern and replaces it with a jump to strlen(),
+ * thus itself, hence the optimize() attribute below that's meant to disable
+ * this confusing practice.
  */
+#if defined(__GNUC__) && (__GNUC__ >= 12)
+__attribute__((optimize("no-tree-loop-distribute-patterns")))
+#endif
 static __attribute__((unused))
-size_t nolibc_strlen(const char *str)
+size_t strlen(const char *str)
 {
 	size_t len;
 
@@ -140,13 +146,12 @@ size_t nolibc_strlen(const char *str)
  * the two branches, then will rely on an external definition of strlen().
  */
 #if defined(__OPTIMIZE__)
+#define nolibc_strlen(x) strlen(x)
 #define strlen(str) ({                          \
 	__builtin_constant_p((str)) ?           \
 		__builtin_strlen((str)) :       \
 		nolibc_strlen((str));           \
 })
-#else
-#define strlen(str) nolibc_strlen((str))
 #endif
 
 static __attribute__((unused))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2] tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
  2022-10-09 15:19 tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12 Willy Tarreau
@ 2022-10-09 18:29 ` Willy Tarreau
  2022-10-20 22:47   ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Willy Tarreau @ 2022-10-09 18:29 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Alexey Dobriyan, kernel test robot, linux-kselftest,
	linux-kernel, Willy Tarreau

When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen()
and replaces it with a jump to strlen(), which is not defined as a symbol
and breaks compilation. Worse, when the function is called strlen(), the
function is simply replaced with a jump to itself, hence becomes an
infinite loop.

One way to avoid this is to always set -ffreestanding, but the calling
code doesn't know this and there's no way (either via attributes or
pragmas) to globally enable it from include files, effectively leaving
a painful situation for the caller.

Alexey suggested to place an empty asm() statement inside the loop to
stop gcc from recognizing a well-known pattern, which happens to work
pretty fine. At least it allows us to make sure our local definition
is not replaced with a self jump.

The function only needs to be renamed back to strlen() so that the symbol
exists, which implies that nolibc_strlen() which is used on variable
strings has to be declared as a macro that points back to it before the
strlen() macro is redifined.

It was verified to produce valid code with gcc 3.4 to 12.1 at different
optimization levels, and both with constant and variable strings.

In case this problem surfaces again in the future, an alternate approach
consisting in adding an optimize("no-tree-loop-distribute-patterns")
function attribute for gcc>=12 worked as well but is less pretty.

Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/r/202210081618.754a77db-yujie.liu@intel.com
Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
Fixes: 96980b833a21 ("tools/nolibc/string: do not use __builtin_strlen() at -O0")
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
---
v2: dropped the attribute(optimize) in favor of an empty asm() statement

---
 tools/include/nolibc/string.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
index bef35bee9c44..718a405ffbc3 100644
--- a/tools/include/nolibc/string.h
+++ b/tools/include/nolibc/string.h
@@ -125,14 +125,18 @@ char *strcpy(char *dst, const char *src)
 }
 
 /* this function is only used with arguments that are not constants or when
- * it's not known because optimizations are disabled.
+ * it's not known because optimizations are disabled. Note that gcc 12
+ * recognizes an strlen() pattern and replaces it with a jump to strlen(),
+ * thus itself, hence the asm() statement below that's meant to disable this
+ * confusing practice.
  */
 static __attribute__((unused))
-size_t nolibc_strlen(const char *str)
+size_t strlen(const char *str)
 {
 	size_t len;
 
-	for (len = 0; str[len]; len++);
+	for (len = 0; str[len]; len++)
+		asm("");
 	return len;
 }
 
@@ -140,13 +144,12 @@ size_t nolibc_strlen(const char *str)
  * the two branches, then will rely on an external definition of strlen().
  */
 #if defined(__OPTIMIZE__)
+#define nolibc_strlen(x) strlen(x)
 #define strlen(str) ({                          \
 	__builtin_constant_p((str)) ?           \
 		__builtin_strlen((str)) :       \
 		nolibc_strlen((str));           \
 })
-#else
-#define strlen(str) nolibc_strlen((str))
 #endif
 
 static __attribute__((unused))
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2] tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
  2022-10-09 18:29 ` [PATCH v2] " Willy Tarreau
@ 2022-10-20 22:47   ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2022-10-20 22:47 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Alexey Dobriyan, kernel test robot, linux-kselftest, linux-kernel

On Sun, Oct 09, 2022 at 08:29:36PM +0200, Willy Tarreau wrote:
> When built at -Os, gcc-12 recognizes an strlen() pattern in nolibc_strlen()
> and replaces it with a jump to strlen(), which is not defined as a symbol
> and breaks compilation. Worse, when the function is called strlen(), the
> function is simply replaced with a jump to itself, hence becomes an
> infinite loop.
> 
> One way to avoid this is to always set -ffreestanding, but the calling
> code doesn't know this and there's no way (either via attributes or
> pragmas) to globally enable it from include files, effectively leaving
> a painful situation for the caller.
> 
> Alexey suggested to place an empty asm() statement inside the loop to
> stop gcc from recognizing a well-known pattern, which happens to work
> pretty fine. At least it allows us to make sure our local definition
> is not replaced with a self jump.
> 
> The function only needs to be renamed back to strlen() so that the symbol
> exists, which implies that nolibc_strlen() which is used on variable
> strings has to be declared as a macro that points back to it before the
> strlen() macro is redifined.
> 
> It was verified to produce valid code with gcc 3.4 to 12.1 at different
> optimization levels, and both with constant and variable strings.
> 
> In case this problem surfaces again in the future, an alternate approach
> consisting in adding an optimize("no-tree-loop-distribute-patterns")
> function attribute for gcc>=12 worked as well but is less pretty.
> 
> Reported-by: kernel test robot <yujie.liu@intel.com>
> Link: https://lore.kernel.org/r/202210081618.754a77db-yujie.liu@intel.com
> Fixes: 66b6f755ad45 ("rcutorture: Import a copy of nolibc")
> Fixes: 96980b833a21 ("tools/nolibc/string: do not use __builtin_strlen() at -O0")
> Cc: "Paul E. McKenney" <paulmck@kernel.org>
> Cc: Alexey Dobriyan <adobriyan@gmail.com>
> Signed-off-by: Willy Tarreau <w@1wt.eu>

Hearing no objections, I have pulled this one in for review and testing,
thank you!

							Thanx, Paul

> ---
> v2: dropped the attribute(optimize) in favor of an empty asm() statement
> 
> ---
>  tools/include/nolibc/string.h | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/include/nolibc/string.h b/tools/include/nolibc/string.h
> index bef35bee9c44..718a405ffbc3 100644
> --- a/tools/include/nolibc/string.h
> +++ b/tools/include/nolibc/string.h
> @@ -125,14 +125,18 @@ char *strcpy(char *dst, const char *src)
>  }
>  
>  /* this function is only used with arguments that are not constants or when
> - * it's not known because optimizations are disabled.
> + * it's not known because optimizations are disabled. Note that gcc 12
> + * recognizes an strlen() pattern and replaces it with a jump to strlen(),
> + * thus itself, hence the asm() statement below that's meant to disable this
> + * confusing practice.
>   */
>  static __attribute__((unused))
> -size_t nolibc_strlen(const char *str)
> +size_t strlen(const char *str)
>  {
>  	size_t len;
>  
> -	for (len = 0; str[len]; len++);
> +	for (len = 0; str[len]; len++)
> +		asm("");
>  	return len;
>  }
>  
> @@ -140,13 +144,12 @@ size_t nolibc_strlen(const char *str)
>   * the two branches, then will rely on an external definition of strlen().
>   */
>  #if defined(__OPTIMIZE__)
> +#define nolibc_strlen(x) strlen(x)
>  #define strlen(str) ({                          \
>  	__builtin_constant_p((str)) ?           \
>  		__builtin_strlen((str)) :       \
>  		nolibc_strlen((str));           \
>  })
> -#else
> -#define strlen(str) nolibc_strlen((str))
>  #endif
>  
>  static __attribute__((unused))
> -- 
> 2.35.3
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
  2022-10-11  6:20       ` Willy Tarreau
@ 2022-10-11 10:18         ` David Laight
  0 siblings, 0 replies; 9+ messages in thread
From: David Laight @ 2022-10-11 10:18 UTC (permalink / raw)
  To: 'Willy Tarreau'
  Cc: Alexey Dobriyan, lkp, linux-kselftest, linux-kernel, Paul E. McKenney

From: Willy Tarreau
> Sent: 11 October 2022 07:21
> 
> On Mon, Oct 10, 2022 at 10:03:53AM +0000, David Laight wrote:
> > From: Willy Tarreau <w@1wt.eu>
> > > Sent: 09 October 2022 19:36
> > ...
> > > By the way, just for the sake of completeness, the one that consistently
> > > gives me a better output is this one:
> > >
> > >   size_t strlen(const char *str)
> > >   {
> > >           const char *s0 = str--;
> > >
> > >           while (*++str)
> > >   		;
> > >           return str - s0;
> > >   }
> > >
> > > Which gives me this:
> > >
> > >
> > >   0000000000000000 <strlen>:
> > >      0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
> > >      4:   48 ff c0                inc    %rax
> > >      7:   80 38 00                cmpb   $0x0,(%rax)
> > >      a:   75 f8                   jne    4 <len+0x4>
> > >      c:   48 29 f8                sub    %rdi,%rax
> > >      f:   c3                      ret
> > >
> > > But this is totally ruined by the addition of asm() in the loop. However
> > > I suspect that the construct is difficult to match against a real strlen()
> > > since it starts on an extra character, thus placing the asm() statement
> > > before the loop could durably preserve it. It does work here (the code
> > > remains the exact same one), but for how long, that's the question. Maybe
> > > we can revisit the various loop-based functions in the future with this in
> > > mind.
> >
> > clang wilfully and persistently generates:
> >
> > strlen:                                 # @strlen
> >         movq    $-1, %rax
> > .LBB0_1:                                # =>This Inner Loop Header: Depth=1
> >         cmpb    $0, 1(%rdi,%rax)
> >         leaq    1(%rax), %rax
> >         jne     .LBB0_1
> >         retq
> >
> > But feed the C for that into gcc and it generates a 'jmp strlen'
> > at everything above -O1.
> 
> Interesting, that's not the case for me here with 12.2 from kernel.org
> on x86_64, which gives this at -O1, -O2, -O3 and -Ofast:
> 
>   0000000000000000 <strlen>:
>      0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
>      4:   0f 1f 40 00             nopl   0x0(%rax)
>      8:   48 83 c0 01             add    $0x1,%rax
>      c:   80 38 00                cmpb   $0x0,(%rax)
>      f:   75 f7                   jne    8 <strlen+0x8>
>     11:   48 29 f8                sub    %rdi,%rax
>     14:   c3                      ret
> 
> Out of curiosity what version were you using ?

Clang 12.0.0 onwards, see https://godbolt.org/z/67Gnzs8js

> > I suspect that might run with less clocks/byte than the code above.
> 
> Certainly for large strings, but not for short ones.

For short strings not needing the final sub and not having
the read depend on the increment should make the leal one faster.
(The nop to align the loop label is monumentally pointless.)

For long strings what matters is how many clocks it takes
to schedule the 4 uops in the loop.
It might be possible to get down to 2 clocks - but I think
both the loops are 3 clocks (assuming the adjacent cmp/jne fuse).

I'm not going to try to instrument the loops though!

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
  2022-10-10 10:03     ` David Laight
@ 2022-10-11  6:20       ` Willy Tarreau
  2022-10-11 10:18         ` David Laight
  0 siblings, 1 reply; 9+ messages in thread
From: Willy Tarreau @ 2022-10-11  6:20 UTC (permalink / raw)
  To: David Laight
  Cc: Alexey Dobriyan, lkp, linux-kselftest, linux-kernel, Paul E. McKenney

On Mon, Oct 10, 2022 at 10:03:53AM +0000, David Laight wrote:
> From: Willy Tarreau <w@1wt.eu>
> > Sent: 09 October 2022 19:36
> ...
> > By the way, just for the sake of completeness, the one that consistently
> > gives me a better output is this one:
> > 
> >   size_t strlen(const char *str)
> >   {
> >           const char *s0 = str--;
> > 
> >           while (*++str)
> >   		;
> >           return str - s0;
> >   }
> > 
> > Which gives me this:
> > 
> > 
> >   0000000000000000 <strlen>:
> >      0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
> >      4:   48 ff c0                inc    %rax
> >      7:   80 38 00                cmpb   $0x0,(%rax)
> >      a:   75 f8                   jne    4 <len+0x4>
> >      c:   48 29 f8                sub    %rdi,%rax
> >      f:   c3                      ret
> > 
> > But this is totally ruined by the addition of asm() in the loop. However
> > I suspect that the construct is difficult to match against a real strlen()
> > since it starts on an extra character, thus placing the asm() statement
> > before the loop could durably preserve it. It does work here (the code
> > remains the exact same one), but for how long, that's the question. Maybe
> > we can revisit the various loop-based functions in the future with this in
> > mind.
> 
> clang wilfully and persistently generates:
> 
> strlen:                                 # @strlen
>         movq    $-1, %rax
> .LBB0_1:                                # =>This Inner Loop Header: Depth=1
>         cmpb    $0, 1(%rdi,%rax)
>         leaq    1(%rax), %rax
>         jne     .LBB0_1
>         retq
> 
> But feed the C for that into gcc and it generates a 'jmp strlen'
> at everything above -O1.

Interesting, that's not the case for me here with 12.2 from kernel.org
on x86_64, which gives this at -O1, -O2, -O3 and -Ofast:

  0000000000000000 <strlen>:
     0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
     4:   0f 1f 40 00             nopl   0x0(%rax)
     8:   48 83 c0 01             add    $0x1,%rax
     c:   80 38 00                cmpb   $0x0,(%rax)
     f:   75 f7                   jne    8 <strlen+0x8>
    11:   48 29 f8                sub    %rdi,%rax
    14:   c3                      ret    

Out of curiosity what version were you using ? 

> I suspect that might run with less clocks/byte than the code above.

Certainly for large strings, but not for short ones.

> Somewhere I hate some complier pessimisations.
> Substituting a call to strlen() is typical.
> strlen() is almost certainly optimised for long strings.
> If the string is short the coded loop will be faster.

Yes, and more importantly, if a developer takes the time to explicitly
write a loop to do something that matches such a function, it's very
likely that they already considered the function and did *not* want to
use it for whatever reason. And the problem we're currently having with
compilers is that they are not willing to respect the developer's intent
because they always know better.

> The same is true (and probably more so) for memcpy.

Yes, I think that we'll eventually have to stuff a few asm() here and
there in a few such loops as compilers become less and less trustable.

Willy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
  2022-10-09 18:36   ` Willy Tarreau
@ 2022-10-10 10:03     ` David Laight
  2022-10-11  6:20       ` Willy Tarreau
  0 siblings, 1 reply; 9+ messages in thread
From: David Laight @ 2022-10-10 10:03 UTC (permalink / raw)
  To: 'Willy Tarreau', Alexey Dobriyan
  Cc: lkp, linux-kselftest, linux-kernel, Paul E. McKenney

From: Willy Tarreau <w@1wt.eu>
> Sent: 09 October 2022 19:36
...
> By the way, just for the sake of completeness, the one that consistently
> gives me a better output is this one:
> 
>   size_t strlen(const char *str)
>   {
>           const char *s0 = str--;
> 
>           while (*++str)
>   		;
>           return str - s0;
>   }
> 
> Which gives me this:
> 
> 
>   0000000000000000 <strlen>:
>      0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
>      4:   48 ff c0                inc    %rax
>      7:   80 38 00                cmpb   $0x0,(%rax)
>      a:   75 f8                   jne    4 <len+0x4>
>      c:   48 29 f8                sub    %rdi,%rax
>      f:   c3                      ret
> 
> But this is totally ruined by the addition of asm() in the loop. However
> I suspect that the construct is difficult to match against a real strlen()
> since it starts on an extra character, thus placing the asm() statement
> before the loop could durably preserve it. It does work here (the code
> remains the exact same one), but for how long, that's the question. Maybe
> we can revisit the various loop-based functions in the future with this in
> mind.

clang wilfully and persistently generates:

strlen:                                 # @strlen
        movq    $-1, %rax
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        cmpb    $0, 1(%rdi,%rax)
        leaq    1(%rax), %rax
        jne     .LBB0_1
        retq

But feed the C for that into gcc and it generates a 'jmp strlen'
at everything above -O1.
I suspect that might run with less clocks/byte than the code above.

Somewhere I hate some complier pessimisations.
Substituting a call to strlen() is typical.
strlen() is almost certainly optimised for long strings.
If the string is short the coded loop will be faster.
The same is true (and probably more so) for memcpy.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
  2022-10-09 17:59 ` Willy Tarreau
@ 2022-10-09 18:36   ` Willy Tarreau
  2022-10-10 10:03     ` David Laight
  0 siblings, 1 reply; 9+ messages in thread
From: Willy Tarreau @ 2022-10-09 18:36 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: lkp, linux-kselftest, linux-kernel, Paul E. McKenney

On Sun, Oct 09, 2022 at 07:59:20PM +0200, Willy Tarreau wrote:
> Hi Alexey,
> 
> On Sun, Oct 09, 2022 at 06:45:49PM +0300, Alexey Dobriyan wrote:
> > Willy Tarreau wrote:
> > > +#if defined(__GNUC__) && (__GNUC__ >= 12)
> > > +__attribute__((optimize("no-tree-loop-distribute-patterns")))
> > > +#endif
> > >  static __attribute__((unused))
> > > -size_t nolibc_strlen(const char *str
> > 
> > I'd suggest to use asm("") in the loop body. It worked in the past
> > to prevent folding division loop back into division instruction.
> 
> Ah excellent idea! I initially thought about using asm() to hide a
> variable provenance but didn't like it much because it undermines
> code optimization. But you're right, with an empty asm() statement
> alone, the loop will not look like an strlen() anymore. Just tried
> and it works like a charm, I'll resend a patch so that we can get
> rid of the ugly ifdef.
> 
> > Or switch to 
> > 
> > 	size_t f(const char *s)
> > 	{
> > 		const char *s0 = s;
> > 		while (*s++)
> > 			;
> > 		return s - s0 - 1;
> > 	}
> > 
> > which compiles to 1 branch, not 2.
> 
> In fact it depends. In the original code that approach was part of
> the ones I had considered, but it doesn't always in better code due
> to the prologue and epilogue being larger. It's only better at -O1,
> and -O2, but not -Os, and once you add asm() into it, only -O1
> remains better:
> 
>   $ nm --size len.o|grep O|rev|sort|rev
>   000000000000001a T len_while_O1
>   0000000000000022 T len_while_asm_O1
>   0000000000000026 T len_for_O1
>   000000000000001a T len_while_O2
>   000000000000002b T len_while_asm_O2
>   0000000000000021 T len_for_O2
>   0000000000000013 T len_while_Os
>   0000000000000015 T len_while_asm_Os
>   000000000000000e T len_for_Os
> 
> This observation seems consistent for me on x86_64, i386, arm and arm64.

By the way, just for the sake of completeness, the one that consistently
gives me a better output is this one:

  size_t strlen(const char *str)
  {
          const char *s0 = str--;
  
          while (*++str)
  		;
          return str - s0;
  }

Which gives me this:


  0000000000000000 <strlen>:
     0:   48 8d 47 ff             lea    -0x1(%rdi),%rax
     4:   48 ff c0                inc    %rax
     7:   80 38 00                cmpb   $0x0,(%rax)
     a:   75 f8                   jne    4 <len+0x4>
     c:   48 29 f8                sub    %rdi,%rax
     f:   c3                      ret    

But this is totally ruined by the addition of asm() in the loop. However
I suspect that the construct is difficult to match against a real strlen()
since it starts on an extra character, thus placing the asm() statement
before the loop could durably preserve it. It does work here (the code
remains the exact same one), but for how long, that's the question. Maybe
we can revisit the various loop-based functions in the future with this in
mind.

Cheers,
Willy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
  2022-10-09 15:45 Alexey Dobriyan
@ 2022-10-09 17:59 ` Willy Tarreau
  2022-10-09 18:36   ` Willy Tarreau
  0 siblings, 1 reply; 9+ messages in thread
From: Willy Tarreau @ 2022-10-09 17:59 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: lkp, linux-kselftest, linux-kernel, Paul E. McKenney

Hi Alexey,

On Sun, Oct 09, 2022 at 06:45:49PM +0300, Alexey Dobriyan wrote:
> Willy Tarreau wrote:
> > +#if defined(__GNUC__) && (__GNUC__ >= 12)
> > +__attribute__((optimize("no-tree-loop-distribute-patterns")))
> > +#endif
> >  static __attribute__((unused))
> > -size_t nolibc_strlen(const char *str
> 
> I'd suggest to use asm("") in the loop body. It worked in the past
> to prevent folding division loop back into division instruction.

Ah excellent idea! I initially thought about using asm() to hide a
variable provenance but didn't like it much because it undermines
code optimization. But you're right, with an empty asm() statement
alone, the loop will not look like an strlen() anymore. Just tried
and it works like a charm, I'll resend a patch so that we can get
rid of the ugly ifdef.

> Or switch to 
> 
> 	size_t f(const char *s)
> 	{
> 		const char *s0 = s;
> 		while (*s++)
> 			;
> 		return s - s0 - 1;
> 	}
> 
> which compiles to 1 branch, not 2.

In fact it depends. In the original code that approach was part of
the ones I had considered, but it doesn't always in better code due
to the prologue and epilogue being larger. It's only better at -O1,
and -O2, but not -Os, and once you add asm() into it, only -O1
remains better:

  $ nm --size len.o|grep O|rev|sort|rev
  000000000000001a T len_while_O1
  0000000000000022 T len_while_asm_O1
  0000000000000026 T len_for_O1
  000000000000001a T len_while_O2
  000000000000002b T len_while_asm_O2
  0000000000000021 T len_for_O2
  0000000000000013 T len_while_Os
  0000000000000015 T len_while_asm_Os
  000000000000000e T len_for_Os

This observation seems consistent for me on x86_64, i386, arm and arm64.

> But of course they could recognise this pattern too.

Yes definitely, hence the need for asm() there as well to complete the
comparison.

Thanks for the suggestion, I'll send a v2 shortly.
Willy

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12
@ 2022-10-09 15:45 Alexey Dobriyan
  2022-10-09 17:59 ` Willy Tarreau
  0 siblings, 1 reply; 9+ messages in thread
From: Alexey Dobriyan @ 2022-10-09 15:45 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: lkp, linux-kselftest, linux-kernel, Paul E. McKenney

Willy Tarreau wrote:
> +#if defined(__GNUC__) && (__GNUC__ >= 12)
> +__attribute__((optimize("no-tree-loop-distribute-patterns")))
> +#endif
>  static __attribute__((unused))
> -size_t nolibc_strlen(const char *str

I'd suggest to use asm("") in the loop body. It worked in the past
to prevent folding division loop back into division instruction.

Or switch to 

	size_t f(const char *s)
	{
		const char *s0 = s;
		while (*s++)
			;
		return s - s0 - 1;
	}

which compiles to 1 branch, not 2.

But of course they could recognise this pattern too.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2022-10-20 22:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-09 15:19 tools/nolibc: fix missing strlen() definition and infinite loop with gcc-12 Willy Tarreau
2022-10-09 18:29 ` [PATCH v2] " Willy Tarreau
2022-10-20 22:47   ` Paul E. McKenney
2022-10-09 15:45 Alexey Dobriyan
2022-10-09 17:59 ` Willy Tarreau
2022-10-09 18:36   ` Willy Tarreau
2022-10-10 10:03     ` David Laight
2022-10-11  6:20       ` Willy Tarreau
2022-10-11 10:18         ` David Laight

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).