All of lore.kernel.org
 help / color / mirror / Atom feed
* x86 AES crypto alignment
@ 2021-12-07 19:32 Jakub Kicinski
  2021-12-07 23:12 ` Ard Biesheuvel
  2021-12-08  4:40 ` Herbert Xu
  0 siblings, 2 replies; 11+ messages in thread
From: Jakub Kicinski @ 2021-12-07 19:32 UTC (permalink / raw)
  To: linux-crypto; +Cc: Herbert Xu, Sabrina Dubroca

Hi!

The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
achieve in networking. Is there any reason for this? On any moderately 
recent Intel platform aligned and unaligned vmovdq should have the same
performance (reportedly).

I'll hack it up and do some testing, but I thought it's worth asking
first..

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-07 19:32 x86 AES crypto alignment Jakub Kicinski
@ 2021-12-07 23:12 ` Ard Biesheuvel
  2021-12-08  4:40 ` Herbert Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Ard Biesheuvel @ 2021-12-07 23:12 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Linux Crypto Mailing List, Herbert Xu, Sabrina Dubroca

On Tue, 7 Dec 2021 at 20:33, Jakub Kicinski <kuba@kernel.org> wrote:
>
> Hi!
>
> The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
> achieve in networking. Is there any reason for this? On any moderately
> recent Intel platform aligned and unaligned vmovdq should have the same
> performance (reportedly).
>
> I'll hack it up and do some testing, but I thought it's worth asking
> first..

Most likely that whoever contributed the code originally cared more
about squeezing the last drop of performance out of it (on the
microarchitecture of the era) than about general usefulness in real
world scenarios.

So yes, please go ahead and remove this restriction: please use the
builtin randomized tests (CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y) which
should generate calls with misaligned plain/ciphertexts, IVs etc with
sufficient coverage.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-07 19:32 x86 AES crypto alignment Jakub Kicinski
  2021-12-07 23:12 ` Ard Biesheuvel
@ 2021-12-08  4:40 ` Herbert Xu
  2021-12-08  5:29   ` Jakub Kicinski
  1 sibling, 1 reply; 11+ messages in thread
From: Herbert Xu @ 2021-12-08  4:40 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: linux-crypto, Sabrina Dubroca

On Tue, Dec 07, 2021 at 11:32:52AM -0800, Jakub Kicinski wrote:
> Hi!
> 
> The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
> achieve in networking. Is there any reason for this? On any moderately 
> recent Intel platform aligned and unaligned vmovdq should have the same
> performance (reportedly).

There is no such thing as an alignment requirement.  If an algorithm
specifies an alignment and you pass it a request which is unaligned,
the Crypto API will automatically align the data for you.

So what is the actual problem here?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-08  4:40 ` Herbert Xu
@ 2021-12-08  5:29   ` Jakub Kicinski
  2021-12-20 23:03     ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2021-12-08  5:29 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto, Sabrina Dubroca

On Wed, 8 Dec 2021 15:40:37 +1100 Herbert Xu wrote:
> On Tue, Dec 07, 2021 at 11:32:52AM -0800, Jakub Kicinski wrote:
> > Hi!
> > 
> > The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
> > achieve in networking. Is there any reason for this? On any moderately 
> > recent Intel platform aligned and unaligned vmovdq should have the same
> > performance (reportedly).  
> 
> There is no such thing as an alignment requirement.  If an algorithm
> specifies an alignment and you pass it a request which is unaligned,
> the Crypto API will automatically align the data for you.
> 
> So what is the actual problem here?

By align you mean copy right? I'm trying to avoid the copy.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-08  5:29   ` Jakub Kicinski
@ 2021-12-20 23:03     ` Jakub Kicinski
  2021-12-21  0:11       ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2021-12-20 23:03 UTC (permalink / raw)
  To: Herbert Xu, Ard Biesheuvel; +Cc: linux-crypto, Sabrina Dubroca

On Tue, 7 Dec 2021 21:29:07 -0800 Jakub Kicinski wrote:
> On Wed, 8 Dec 2021 15:40:37 +1100 Herbert Xu wrote:
> > On Tue, Dec 07, 2021 at 11:32:52AM -0800, Jakub Kicinski wrote:  
> > > Hi!
> > > 
> > > The x86 AES crypto (gcm(aes)) requires 16B alignment which is hard to
> > > achieve in networking. Is there any reason for this? On any moderately 
> > > recent Intel platform aligned and unaligned vmovdq should have the same
> > > performance (reportedly).    
> > 
> > There is no such thing as an alignment requirement.  If an algorithm
> > specifies an alignment and you pass it a request which is unaligned,
> > the Crypto API will automatically align the data for you.
> > 
> > So what is the actual problem here?  
> 
> By align you mean copy right? I'm trying to avoid the copy.

Hm, I'm benchmarking things now and it appears to be a regression
introduced somewhere around 5.11 / 5.12. I don't see the memcpy 
eating 20% of performance on 5.10. Bisection time.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-20 23:03     ` Jakub Kicinski
@ 2021-12-21  0:11       ` Jakub Kicinski
  2021-12-21  0:52         ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Jakub Kicinski @ 2021-12-21  0:11 UTC (permalink / raw)
  To: Herbert Xu, Ard Biesheuvel; +Cc: linux-crypto, Sabrina Dubroca

On Mon, 20 Dec 2021 15:03:43 -0800 Jakub Kicinski wrote:
> On Tue, 7 Dec 2021 21:29:07 -0800 Jakub Kicinski wrote:
> > On Wed, 8 Dec 2021 15:40:37 +1100 Herbert Xu wrote:  
> > > There is no such thing as an alignment requirement.  If an algorithm
> > > specifies an alignment and you pass it a request which is unaligned,
> > > the Crypto API will automatically align the data for you.
> > > 
> > > So what is the actual problem here?    
> > 
> > By align you mean copy right? I'm trying to avoid the copy.  
> 
> Hm, I'm benchmarking things now and it appears to be a regression
> introduced somewhere around 5.11 / 5.12. I don't see the memcpy 
> eating 20% of performance on 5.10. Bisection time.

83c83e658863 ("crypto: aesni - refactor scatterlist processing")

is what introduced the regression.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-21  0:11       ` Jakub Kicinski
@ 2021-12-21  0:52         ` Jakub Kicinski
  2021-12-21  6:59           ` Ard Biesheuvel
  2021-12-21 10:52           ` Herbert Xu
  0 siblings, 2 replies; 11+ messages in thread
From: Jakub Kicinski @ 2021-12-21  0:52 UTC (permalink / raw)
  To: Herbert Xu, Ard Biesheuvel; +Cc: linux-crypto, Sabrina Dubroca

On Mon, 20 Dec 2021 16:11:25 -0800 Jakub Kicinski wrote:
> On Mon, 20 Dec 2021 15:03:43 -0800 Jakub Kicinski wrote:
> > Hm, I'm benchmarking things now and it appears to be a regression
> > introduced somewhere around 5.11 / 5.12. I don't see the memcpy 
> > eating 20% of performance on 5.10. Bisection time.  
> 
> 83c83e658863 ("crypto: aesni - refactor scatterlist processing")
> 
> is what introduced the regression.

Something like this?

---->8-----------

From: Jakub Kicinski <kuba@kernel.org>
Date: Mon, 20 Dec 2021 16:29:26 -0800
Subject: [PATCH] x86/aesni: don't require alignment

Looks like we take care of the meta-data (key, iv etc.) alignment
anyway and data can safely be unaligned. In fact we were feeding
unaligned data into crypto routines up until commit 83c83e658863 
("crypto: aesni - refactor scatterlist processing") switched to 
use the full skcipher API.

This fixes 21% performance regression in kTLS.

Tested by booting with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
(and running thru various kTLS packets).

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
 arch/x86/crypto/aesni-intel_glue.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index e09f4672dd38..41901ba9d3a2 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1107,7 +1107,7 @@ static struct aead_alg aesni_aeads[] = { {
 		.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.cra_blocksize		= 1,
 		.cra_ctxsize		= sizeof(struct aesni_rfc4106_gcm_ctx),
-		.cra_alignmask		= AESNI_ALIGN - 1,
+		.cra_alignmask		= 0,
 		.cra_module		= THIS_MODULE,
 	},
 }, {
@@ -1124,7 +1124,7 @@ static struct aead_alg aesni_aeads[] = { {
 		.cra_flags		= CRYPTO_ALG_INTERNAL,
 		.cra_blocksize		= 1,
 		.cra_ctxsize		= sizeof(struct generic_gcmaes_ctx),
-		.cra_alignmask		= AESNI_ALIGN - 1,
+		.cra_alignmask		= 0,
 		.cra_module		= THIS_MODULE,
 	},
 } };
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-21  0:52         ` Jakub Kicinski
@ 2021-12-21  6:59           ` Ard Biesheuvel
  2021-12-21  7:24             ` Ard Biesheuvel
  2021-12-21 10:52           ` Herbert Xu
  1 sibling, 1 reply; 11+ messages in thread
From: Ard Biesheuvel @ 2021-12-21  6:59 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Herbert Xu, Linux Crypto Mailing List, Sabrina Dubroca

On Tue, 21 Dec 2021 at 01:52, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 20 Dec 2021 16:11:25 -0800 Jakub Kicinski wrote:
> > On Mon, 20 Dec 2021 15:03:43 -0800 Jakub Kicinski wrote:
> > > Hm, I'm benchmarking things now and it appears to be a regression
> > > introduced somewhere around 5.11 / 5.12. I don't see the memcpy
> > > eating 20% of performance on 5.10. Bisection time.
> >
> > 83c83e658863 ("crypto: aesni - refactor scatterlist processing")
> >
> > is what introduced the regression.
>
> Something like this?
>
> ---->8-----------
>
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Mon, 20 Dec 2021 16:29:26 -0800
> Subject: [PATCH] x86/aesni: don't require alignment
>
> Looks like we take care of the meta-data (key, iv etc.) alignment
> anyway and data can safely be unaligned. In fact we were feeding
> unaligned data into crypto routines up until commit 83c83e658863
> ("crypto: aesni - refactor scatterlist processing") switched to
> use the full skcipher API.
>
> This fixes 21% performance regression in kTLS.
>
> Tested by booting with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
> (and running thru various kTLS packets).
>
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Acked-by: Ard Biesheuvel <ardb@kernel.org>

but it needs a Fixes tag.

Could you check whether this means that gcm_context_data in
gcmaes_crypt_by_sg() does not have to be aligned either? It would be
nice if we could drop that horrible hack as well.


> ---
>  arch/x86/crypto/aesni-intel_glue.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
> index e09f4672dd38..41901ba9d3a2 100644
> --- a/arch/x86/crypto/aesni-intel_glue.c
> +++ b/arch/x86/crypto/aesni-intel_glue.c
> @@ -1107,7 +1107,7 @@ static struct aead_alg aesni_aeads[] = { {
>                 .cra_flags              = CRYPTO_ALG_INTERNAL,
>                 .cra_blocksize          = 1,
>                 .cra_ctxsize            = sizeof(struct aesni_rfc4106_gcm_ctx),
> -               .cra_alignmask          = AESNI_ALIGN - 1,
> +               .cra_alignmask          = 0,
>                 .cra_module             = THIS_MODULE,
>         },
>  }, {
> @@ -1124,7 +1124,7 @@ static struct aead_alg aesni_aeads[] = { {
>                 .cra_flags              = CRYPTO_ALG_INTERNAL,
>                 .cra_blocksize          = 1,
>                 .cra_ctxsize            = sizeof(struct generic_gcmaes_ctx),
> -               .cra_alignmask          = AESNI_ALIGN - 1,
> +               .cra_alignmask          = 0,
>                 .cra_module             = THIS_MODULE,
>         },
>  } };
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-21  6:59           ` Ard Biesheuvel
@ 2021-12-21  7:24             ` Ard Biesheuvel
  2021-12-21 14:57               ` Jakub Kicinski
  0 siblings, 1 reply; 11+ messages in thread
From: Ard Biesheuvel @ 2021-12-21  7:24 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Herbert Xu, Linux Crypto Mailing List, Sabrina Dubroca

On Tue, 21 Dec 2021 at 07:59, Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Tue, 21 Dec 2021 at 01:52, Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Mon, 20 Dec 2021 16:11:25 -0800 Jakub Kicinski wrote:
> > > On Mon, 20 Dec 2021 15:03:43 -0800 Jakub Kicinski wrote:
> > > > Hm, I'm benchmarking things now and it appears to be a regression
> > > > introduced somewhere around 5.11 / 5.12. I don't see the memcpy
> > > > eating 20% of performance on 5.10. Bisection time.
> > >
> > > 83c83e658863 ("crypto: aesni - refactor scatterlist processing")
> > >
> > > is what introduced the regression.
> >
> > Something like this?
> >
> > ---->8-----------
> >
> > From: Jakub Kicinski <kuba@kernel.org>
> > Date: Mon, 20 Dec 2021 16:29:26 -0800
> > Subject: [PATCH] x86/aesni: don't require alignment
> >
> > Looks like we take care of the meta-data (key, iv etc.) alignment
> > anyway and data can safely be unaligned. In fact we were feeding
> > unaligned data into crypto routines up until commit 83c83e658863
> > ("crypto: aesni - refactor scatterlist processing") switched to
> > use the full skcipher API.
> >
> > This fixes 21% performance regression in kTLS.
> >
> > Tested by booting with CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y
> > (and running thru various kTLS packets).
> >
> > Signed-off-by: Jakub Kicinski <kuba@kernel.org>
>
> Acked-by: Ard Biesheuvel <ardb@kernel.org>
>
> but it needs a Fixes tag.
>
> Could you check whether this means that gcm_context_data in
> gcmaes_crypt_by_sg() does not have to be aligned either? It would be
> nice if we could drop that horrible hack as well.
>

I guess you meant by "we take care of the meta-data (key, iv etc.)
alignment anyway" that we have these hacks for gcm_context_data (which
carries the key) and the IV, using oversized buffers on the stack and
open coded realignment.

It would be really nice if we could just get rid of all of that as
well, and just use {v}movdqu to load those items.




>
> > ---
> >  arch/x86/crypto/aesni-intel_glue.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
> > index e09f4672dd38..41901ba9d3a2 100644
> > --- a/arch/x86/crypto/aesni-intel_glue.c
> > +++ b/arch/x86/crypto/aesni-intel_glue.c
> > @@ -1107,7 +1107,7 @@ static struct aead_alg aesni_aeads[] = { {
> >                 .cra_flags              = CRYPTO_ALG_INTERNAL,
> >                 .cra_blocksize          = 1,
> >                 .cra_ctxsize            = sizeof(struct aesni_rfc4106_gcm_ctx),
> > -               .cra_alignmask          = AESNI_ALIGN - 1,
> > +               .cra_alignmask          = 0,
> >                 .cra_module             = THIS_MODULE,
> >         },
> >  }, {
> > @@ -1124,7 +1124,7 @@ static struct aead_alg aesni_aeads[] = { {
> >                 .cra_flags              = CRYPTO_ALG_INTERNAL,
> >                 .cra_blocksize          = 1,
> >                 .cra_ctxsize            = sizeof(struct generic_gcmaes_ctx),
> > -               .cra_alignmask          = AESNI_ALIGN - 1,
> > +               .cra_alignmask          = 0,
> >                 .cra_module             = THIS_MODULE,
> >         },
> >  } };
> > --
> > 2.31.1
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-21  0:52         ` Jakub Kicinski
  2021-12-21  6:59           ` Ard Biesheuvel
@ 2021-12-21 10:52           ` Herbert Xu
  1 sibling, 0 replies; 11+ messages in thread
From: Herbert Xu @ 2021-12-21 10:52 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Ard Biesheuvel, linux-crypto, Sabrina Dubroca

On Mon, Dec 20, 2021 at 04:52:51PM -0800, Jakub Kicinski wrote:
>
> Something like this?
> 
> ---->8-----------
> 
> From: Jakub Kicinski <kuba@kernel.org>
> Date: Mon, 20 Dec 2021 16:29:26 -0800
> Subject: [PATCH] x86/aesni: don't require alignment

You need to send the patch with a new Subject line as otherwise
patchwork will ignore it.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: x86 AES crypto alignment
  2021-12-21  7:24             ` Ard Biesheuvel
@ 2021-12-21 14:57               ` Jakub Kicinski
  0 siblings, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2021-12-21 14:57 UTC (permalink / raw)
  To: Ard Biesheuvel; +Cc: Herbert Xu, Linux Crypto Mailing List, Sabrina Dubroca

On Tue, 21 Dec 2021 08:24:48 +0100 Ard Biesheuvel wrote:
> > Could you check whether this means that gcm_context_data in
> > gcmaes_crypt_by_sg() does not have to be aligned either? It would be
> > nice if we could drop that horrible hack as well.
> 
> I guess you meant by "we take care of the meta-data (key, iv etc.)
> alignment anyway" that we have these hacks for gcm_context_data (which
> carries the key) and the IV, using oversized buffers on the stack and
> open coded realignment.
> 
> It would be really nice if we could just get rid of all of that as
> well, and just use {v}movdqu to load those items.

Yup, exactly. I did something close to s/movdqa/movdqu/ initially,
but doing a competent job removing the alignment assumption would
be more effort. Let's see if I can see the copy if any perf profile...

FWIW there is a comment up top in arch/x86/crypto/aesni-intel_asm.S
which explains the aligned operations were chosen because they have
a shorter encoding. Seems like an intentional choice.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-12-21 14:57 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-07 19:32 x86 AES crypto alignment Jakub Kicinski
2021-12-07 23:12 ` Ard Biesheuvel
2021-12-08  4:40 ` Herbert Xu
2021-12-08  5:29   ` Jakub Kicinski
2021-12-20 23:03     ` Jakub Kicinski
2021-12-21  0:11       ` Jakub Kicinski
2021-12-21  0:52         ` Jakub Kicinski
2021-12-21  6:59           ` Ard Biesheuvel
2021-12-21  7:24             ` Ard Biesheuvel
2021-12-21 14:57               ` Jakub Kicinski
2021-12-21 10:52           ` Herbert Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.