All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M
@ 2020-05-05 13:53 Arnd Bergmann
  2020-05-06  5:12 ` Nathan Chancellor
  2020-05-15  6:20 ` Herbert Xu
  0 siblings, 2 replies; 5+ messages in thread
From: Arnd Bergmann @ 2020-05-05 13:53 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, David Sterba
  Cc: Arnd Bergmann, Horia Geantă,
	Eric Biggers, linux-crypto, linux-kernel, clang-built-linux

When building for ARMv7-M, clang-9 or higher tries to unroll some loops,
which ends up confusing the register allocator to the point of generating
rather bad code and using more than the warning limit for stack frames:

warning: stack frame size of 1200 bytes in function 'blake2b_compress' [-Wframe-larger-than=]

Forcing it to not unroll the final loop avoids this problem.

Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 crypto/blake2b_generic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c
index 1d262374fa4e..0ffd8d92e308 100644
--- a/crypto/blake2b_generic.c
+++ b/crypto/blake2b_generic.c
@@ -129,7 +129,9 @@ static void blake2b_compress(struct blake2b_state *S,
 	ROUND(9);
 	ROUND(10);
 	ROUND(11);
-
+#ifdef CONFIG_CC_IS_CLANG
+#pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
+#endif
 	for (i = 0; i < 8; ++i)
 		S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];
 }
-- 
2.26.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M
  2020-05-05 13:53 [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M Arnd Bergmann
@ 2020-05-06  5:12 ` Nathan Chancellor
  2020-05-08 21:31   ` Arnd Bergmann
  2020-05-15  6:20 ` Herbert Xu
  1 sibling, 1 reply; 5+ messages in thread
From: Nathan Chancellor @ 2020-05-06  5:12 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Herbert Xu, David S. Miller, David Sterba, Horia Geantă,
	Eric Biggers, linux-crypto, linux-kernel, clang-built-linux

On Tue, May 05, 2020 at 03:53:45PM +0200, Arnd Bergmann wrote:
> When building for ARMv7-M, clang-9 or higher tries to unroll some loops,
> which ends up confusing the register allocator to the point of generating
> rather bad code and using more than the warning limit for stack frames:
> 
> warning: stack frame size of 1200 bytes in function 'blake2b_compress' [-Wframe-larger-than=]
> 
> Forcing it to not unroll the final loop avoids this problem.
> 
> Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  crypto/blake2b_generic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/crypto/blake2b_generic.c b/crypto/blake2b_generic.c
> index 1d262374fa4e..0ffd8d92e308 100644
> --- a/crypto/blake2b_generic.c
> +++ b/crypto/blake2b_generic.c
> @@ -129,7 +129,9 @@ static void blake2b_compress(struct blake2b_state *S,
>  	ROUND(9);
>  	ROUND(10);
>  	ROUND(11);
> -
> +#ifdef CONFIG_CC_IS_CLANG

Given your comment in the bug:

"The code is written to assume no loops are unrolled"

Does it make sense to make this unconditional and take compiler
heuristics out of it?

> +#pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
> +#endif
>  	for (i = 0; i < 8; ++i)
>  		S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];
>  }
> -- 
> 2.26.0
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M
  2020-05-06  5:12 ` Nathan Chancellor
@ 2020-05-08 21:31   ` Arnd Bergmann
  2020-05-09  1:43     ` Nathan Chancellor
  0 siblings, 1 reply; 5+ messages in thread
From: Arnd Bergmann @ 2020-05-08 21:31 UTC (permalink / raw)
  To: Nathan Chancellor
  Cc: Herbert Xu, David S. Miller, David Sterba, Horia Geantă,
	Eric Biggers, open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	linux-kernel, clang-built-linux

On Wed, May 6, 2020 at 7:12 AM Nathan Chancellor
<natechancellor@gmail.com> wrote:
> > -
> > +#ifdef CONFIG_CC_IS_CLANG
>
> Given your comment in the bug:
>
> "The code is written to assume no loops are unrolled"
>
> Does it make sense to make this unconditional and take compiler
> heuristics out of it?
>
> > +#pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
> > +#endif
> >       for (i = 0; i < 8; ++i)
> >               S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];

No, that would not work, as gcc does not support this pragma.

        Arnd

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M
  2020-05-08 21:31   ` Arnd Bergmann
@ 2020-05-09  1:43     ` Nathan Chancellor
  0 siblings, 0 replies; 5+ messages in thread
From: Nathan Chancellor @ 2020-05-09  1:43 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Herbert Xu, David S. Miller, David Sterba, Horia Geantă,
	Eric Biggers, open list:HARDWARE RANDOM NUMBER GENERATOR CORE,
	linux-kernel, clang-built-linux

On Fri, May 08, 2020 at 11:31:07PM +0200, Arnd Bergmann wrote:
> On Wed, May 6, 2020 at 7:12 AM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> > > -
> > > +#ifdef CONFIG_CC_IS_CLANG
> >
> > Given your comment in the bug:
> >
> > "The code is written to assume no loops are unrolled"
> >
> > Does it make sense to make this unconditional and take compiler
> > heuristics out of it?
> >
> > > +#pragma nounroll /* https://bugs.llvm.org/show_bug.cgi?id=45803 */
> > > +#endif
> > >       for (i = 0; i < 8; ++i)
> > >               S->h[i] = S->h[i] ^ v[i] ^ v[i + 8];
> 
> No, that would not work, as gcc does not support this pragma.
> 
>         Arnd

Ah fair enough.

Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M
  2020-05-05 13:53 [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M Arnd Bergmann
  2020-05-06  5:12 ` Nathan Chancellor
@ 2020-05-15  6:20 ` Herbert Xu
  1 sibling, 0 replies; 5+ messages in thread
From: Herbert Xu @ 2020-05-15  6:20 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David S. Miller, David Sterba, Horia Geantă,
	Eric Biggers, linux-crypto, linux-kernel, clang-built-linux

On Tue, May 05, 2020 at 03:53:45PM +0200, Arnd Bergmann wrote:
> When building for ARMv7-M, clang-9 or higher tries to unroll some loops,
> which ends up confusing the register allocator to the point of generating
> rather bad code and using more than the warning limit for stack frames:
> 
> warning: stack frame size of 1200 bytes in function 'blake2b_compress' [-Wframe-larger-than=]
> 
> Forcing it to not unroll the final loop avoids this problem.
> 
> Fixes: 91d689337fe8 ("crypto: blake2b - add blake2b generic implementation")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  crypto/blake2b_generic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-05-15  6:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-05 13:53 [PATCH] crypto: blake2b - Fix clang optimization for ARMv7-M Arnd Bergmann
2020-05-06  5:12 ` Nathan Chancellor
2020-05-08 21:31   ` Arnd Bergmann
2020-05-09  1:43     ` Nathan Chancellor
2020-05-15  6:20 ` Herbert Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.