All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dey, Megha" <megha.dey@intel.com>
To: Ard Biesheuvel <ardb@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ravi.v.shankar@intel.com, tim.c.chen@intel.com,
	andi.kleen@intel.com, Dave Hansen <dave.hansen@intel.com>,
	wajdi.k.feghali@intel.com, greg.b.tucker@intel.com,
	robert.a.kasten@intel.com, rajendrakumar.chinnaiyan@intel.com,
	tomasz.kantecki@intel.com, ryan.d.saffores@intel.com,
	ilya.albrekht@intel.com, kyung.min.park@intel.com,
	Tony Luck <tony.luck@intel.com>,
	ira.weiny@intel.com
Subject: Re: [RFC V1 5/7] crypto: aesni - AES CTR x86_64 "by16" AVX512 optimization
Date: Wed, 20 Jan 2021 14:46:45 -0800	[thread overview]
Message-ID: <3d8fe68d-bb30-c61a-7aeb-cca85d5f5df6@intel.com> (raw)
In-Reply-To: <CAMj1kXH3Mkwfeq-Mu9s9p1fB3i3ez2EEnRbr9ZHZWePShnovcw@mail.gmail.com>

Hi Ard,

On 1/16/2021 9:03 AM, Ard Biesheuvel wrote:
> On Fri, 18 Dec 2020 at 22:08, Megha Dey <megha.dey@intel.com> wrote:
>> Introduce the "by16" implementation of the AES CTR mode using AVX512
>> optimizations. "by16" means that 16 independent blocks (each block
>> being 128 bits) can be ciphered simultaneously as opposed to the
>> current 8 blocks.
>>
>> The glue code in AESNI module overrides the existing "by8" CTR mode
>> encryption/decryption routines with the "by16" ones when the following
>> criteria are met:
>> At compile time:
>> 1. CONFIG_CRYPTO_AVX512 is enabled
>> 2. toolchain(assembler) supports VAES instructions
>> At runtime:
>> 1. VAES and AVX512VL features are supported on platform (currently
>>     only Icelake)
>> 2. aesni_intel.use_avx512 module parameter is set at boot time. For this
>>     algorithm, switching from AVX512 optimized version is not possible once
>>     set at boot time because of how the code is structured today.(Can be
>>     changed later if required)
>>
>> The functions aes_ctr_enc_128_avx512_by16(), aes_ctr_enc_192_avx512_by16()
>> and aes_ctr_enc_256_avx512_by16() are adapted from Intel Optimized IPSEC
>> Cryptographic library.
>>
>> On a Icelake desktop, with turbo disabled and all CPUs running at maximum
>> frequency, the "by16" CTR mode optimization shows better performance
>> across data & key sizes as measured by tcrypt.
>>
>> The average performance improvement of the "by16" version over the "by8"
>> version is as follows:
>> For all key sizes(128/192/256 bits),
>>          data sizes < 128 bytes/block, negligible improvement(~3% loss)
>>          data sizes > 128 bytes/block, there is an average improvement of
>> 48% for both encryption and decryption.
>>
>> A typical run of tcrypt with AES CTR mode encryption/decryption of the
>> "by8" and "by16" optimization on a Icelake desktop shows the following
>> results:
>>
>> --------------------------------------------------------------
>> |  key   | bytes | cycles/op (lower is better)| percentage   |
>> | length |  per  |  encryption  |  decryption |  loss/gain   |
>> | (bits) | block |-------------------------------------------|
>> |        |       | by8  | by16  | by8  | by16 |  enc | dec   |
>> |------------------------------------------------------------|
>> |  128   |  16   | 156  | 168   | 164  | 168  | -7.7 |  -2.5 |
>> |  128   |  64   | 180  | 190   | 157  | 146  | -5.6 |   7.1 |
>> |  128   |  256  | 248  | 158   | 251  | 161  | 36.3 |  35.9 |
>> |  128   |  1024 | 633  | 316   | 642  | 319  | 50.1 |  50.4 |
>> |  128   |  1472 | 853  | 411   | 877  | 407  | 51.9 |  53.6 |
>> |  128   |  8192 | 4463 | 1959  | 4447 | 1940 | 56.2 |  56.4 |
>> |  192   |  16   | 136  | 145   | 149  | 166  | -6.7 | -11.5 |
>> |  192   |  64   | 159  | 154   | 157  | 160  |  3.2 |  -2   |
>> |  192   |  256  | 268  | 172   | 274  | 177  | 35.9 |  35.5 |
>> |  192   |  1024 | 710  | 358   | 720  | 355  | 49.6 |  50.7 |
>> |  192   |  1472 | 989  | 468   | 983  | 469  | 52.7 |  52.3 |
>> |  192   |  8192 | 6326 | 3551  | 6301 | 3567 | 43.9 |  43.4 |
>> |  256   |  16   | 153  | 165   | 139  | 156  | -7.9 | -12.3 |
>> |  256   |  64   | 158  | 152   | 174  | 161  |  3.8 |   7.5 |
>> |  256   |  256  | 283  | 176   | 287  | 202  | 37.9 |  29.7 |
>> |  256   |  1024 | 797  | 393   | 807  | 395  | 50.7 |  51.1 |
>> |  256   |  1472 | 1108 | 534   | 1107 | 527  | 51.9 |  52.4 |
>> |  256   |  8192 | 5763 | 2616  | 5773 | 2617 | 54.7 |  54.7 |
>> --------------------------------------------------------------
>>
>> This work was inspired by the AES CTR mode optimization published
>> in Intel Optimized IPSEC Cryptographic library.
>> https://github.com/intel/intel-ipsec-mb/blob/master/lib/avx512/cntr_vaes_avx512.asm
>>
>> Co-developed-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
>> Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
>> Signed-off-by: Megha Dey <megha.dey@intel.com>
>> ---
>>   arch/x86/crypto/Makefile                    |   1 +
>>   arch/x86/crypto/aes_ctrby16_avx512-x86_64.S | 856 ++++++++++++++++++++++++++++
>>   arch/x86/crypto/aesni-intel_glue.c          |  57 +-
>>   arch/x86/crypto/avx512_vaes_common.S        | 422 ++++++++++++++
>>   arch/x86/include/asm/disabled-features.h    |   8 +-
>>   crypto/Kconfig                              |  12 +
>>   6 files changed, 1354 insertions(+), 2 deletions(-)
>>   create mode 100644 arch/x86/crypto/aes_ctrby16_avx512-x86_64.S
>>
> ...
>> diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
>> index ad8a718..f45059e 100644
>> --- a/arch/x86/crypto/aesni-intel_glue.c
>> +++ b/arch/x86/crypto/aesni-intel_glue.c
>> @@ -46,6 +46,10 @@
>>   #define CRYPTO_AES_CTX_SIZE (sizeof(struct crypto_aes_ctx) + AESNI_ALIGN_EXTRA)
>>   #define XTS_AES_CTX_SIZE (sizeof(struct aesni_xts_ctx) + AESNI_ALIGN_EXTRA)
>>
>> +static bool use_avx512;
>> +module_param(use_avx512, bool, 0644);
>> +MODULE_PARM_DESC(use_avx512, "Use AVX512 optimized algorithm, if available");
>> +
>>   /* This data is stored at the end of the crypto_tfm struct.
>>    * It's a type of per "session" data storage location.
>>    * This needs to be 16 byte aligned.
>> @@ -191,6 +195,35 @@ asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
>>                  void *keys, u8 *out, unsigned int num_bytes);
>>   asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv,
>>                  void *keys, u8 *out, unsigned int num_bytes);
>> +
>> +#ifdef CONFIG_CRYPTO_AES_CTR_AVX512
>> +asmlinkage void aes_ctr_enc_128_avx512_by16(void *keys, u8 *out,
>> +                                           const u8 *in,
>> +                                           unsigned int num_bytes,
>> +                                           u8 *iv);
>> +asmlinkage void aes_ctr_enc_192_avx512_by16(void *keys, u8 *out,
>> +                                           const u8 *in,
>> +                                           unsigned int num_bytes,
>> +                                           u8 *iv);
>> +asmlinkage void aes_ctr_enc_256_avx512_by16(void *keys, u8 *out,
>> +                                           const u8 *in,
>> +                                           unsigned int num_bytes,
>> +                                           u8 *iv);
>> +#else
>> +static inline void aes_ctr_enc_128_avx512_by16(void *keys, u8 *out,
>> +                                              const u8 *in,
>> +                                              unsigned int num_bytes,
>> +                                              u8 *iv) {}
>> +static inline void aes_ctr_enc_192_avx512_by16(void *keys, u8 *out,
>> +                                              const u8 *in,
>> +                                              unsigned int num_bytes,
>> +                                              u8 *iv) {}
>> +static inline void aes_ctr_enc_256_avx512_by16(void *keys, u8 *out,
>> +                                              const u8 *in,
>> +                                              unsigned int num_bytes,
>> +                                              u8 *iv) {}
>> +#endif
>> +
> Please drop these alternatives.
ok
>
>>   /*
>>    * asmlinkage void aesni_gcm_init_avx_gen2()
>>    * gcm_data *my_ctx_data, context data
>> @@ -487,6 +520,23 @@ static void aesni_ctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out,
>>                  aes_ctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len);
>>   }
>>
>> +static void aesni_ctr_enc_avx512_tfm(struct crypto_aes_ctx *ctx, u8 *out,
>> +                                    const u8 *in, unsigned int len, u8 *iv)
>> +{
>> +       /*
>> +        * based on key length, override with the by16 version
>> +        * of ctr mode encryption/decryption for improved performance.
>> +        * aes_set_key_common() ensures that key length is one of
>> +        * {128,192,256}
>> +        */
>> +       if (ctx->key_length == AES_KEYSIZE_128)
>> +               aes_ctr_enc_128_avx512_by16((void *)ctx, out, in, len, iv);
>> +       else if (ctx->key_length == AES_KEYSIZE_192)
>> +               aes_ctr_enc_192_avx512_by16((void *)ctx, out, in, len, iv);
>> +       else
>> +               aes_ctr_enc_256_avx512_by16((void *)ctx, out, in, len, iv);
>> +}
>> +
>>   static int ctr_crypt(struct skcipher_request *req)
>>   {
>>          struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
>> @@ -1076,7 +1126,12 @@ static int __init aesni_init(void)
>>                  aesni_gcm_tfm = &aesni_gcm_tfm_sse;
>>          }
>>          aesni_ctr_enc_tfm = aesni_ctr_enc;
>> -       if (boot_cpu_has(X86_FEATURE_AVX)) {
>> +       if (use_avx512 && IS_ENABLED(CONFIG_CRYPTO_AES_CTR_AVX512) &&
>> +           cpu_feature_enabled(X86_FEATURE_VAES)) {
>> +               /* Ctr mode performance optimization using AVX512 */
>> +               aesni_ctr_enc_tfm = aesni_ctr_enc_avx512_tfm;
>> +               pr_info("AES CTR mode by16 optimization enabled\n");
> This will need to be changed to a static_call_update() once my
> outstanding patch is merged.
yeah will do!
>
>> +       } else if (boot_cpu_has(X86_FEATURE_AVX)) {
>>                  /* optimize performance of ctr mode encryption transform */
>>                  aesni_ctr_enc_tfm = aesni_ctr_enc_avx_tfm;
>>                  pr_info("AES CTR mode by8 optimization enabled\n");

  reply	other threads:[~2021-01-20 23:45 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-18 21:10 [RFC V1 0/7] Introduce AVX512 optimized crypto algorithms Megha Dey
2020-12-18 21:10 ` [RFC V1 1/7] x86: Probe assembler capabilities for VAES and VPLCMULQDQ support Megha Dey
2021-01-16 16:54   ` Ard Biesheuvel
2021-01-20 22:38     ` Dey, Megha
2020-12-18 21:10 ` [RFC V1 2/7] crypto: crct10dif - Accelerated CRC T10 DIF with vectorized instruction Megha Dey
2021-01-16 17:00   ` Ard Biesheuvel
2021-01-20 22:46     ` Dey, Megha
2020-12-18 21:11 ` [RFC V1 3/7] crypto: ghash - Optimized GHASH computations Megha Dey
2020-12-19 17:03   ` Ard Biesheuvel
2021-01-16  0:14     ` Dey, Megha
2021-01-16  0:20       ` Dave Hansen
2021-01-16  2:04         ` Eric Biggers
2021-01-16  5:13           ` Dave Hansen
2021-01-16 16:48             ` Ard Biesheuvel
2021-01-16  1:43       ` Eric Biggers
2021-01-16  5:07         ` Dey, Megha
2020-12-18 21:11 ` [RFC V1 4/7] crypto: tcrypt - Add speed test for optimized " Megha Dey
2020-12-18 21:11 ` [RFC V1 5/7] crypto: aesni - AES CTR x86_64 "by16" AVX512 optimization Megha Dey
2021-01-16 17:03   ` Ard Biesheuvel
2021-01-20 22:46     ` Dey, Megha [this message]
2020-12-18 21:11 ` [RFC V1 6/7] crypto: aesni - fix coding style for if/else block Megha Dey
2020-12-18 21:11 ` [RFC V1 7/7] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ Megha Dey
2021-01-16 17:16   ` Ard Biesheuvel
2021-01-20 22:48     ` Dey, Megha
2020-12-21 23:20 ` [RFC V1 0/7] Introduce AVX512 optimized crypto algorithms Eric Biggers
2020-12-28 19:10   ` Dey, Megha
2021-01-16 16:52     ` Ard Biesheuvel
2021-01-16 18:35       ` Dey, Megha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3d8fe68d-bb30-c61a-7aeb-cca85d5f5df6@intel.com \
    --to=megha.dey@intel.com \
    --cc=andi.kleen@intel.com \
    --cc=ardb@kernel.org \
    --cc=dave.hansen@intel.com \
    --cc=davem@davemloft.net \
    --cc=greg.b.tucker@intel.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=ilya.albrekht@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=kyung.min.park@intel.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=rajendrakumar.chinnaiyan@intel.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=robert.a.kasten@intel.com \
    --cc=ryan.d.saffores@intel.com \
    --cc=tim.c.chen@intel.com \
    --cc=tomasz.kantecki@intel.com \
    --cc=tony.luck@intel.com \
    --cc=wajdi.k.feghali@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.