linux-crypto.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@kernel.org>
To: linux-crypto@vger.kernel.org, x86@kernel.org
Cc: linux-kernel@vger.kernel.org, Ard Biesheuvel <ardb@kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	"Chang S . Bae" <chang.seok.bae@intel.com>
Subject: [PATCH v2 5/6] crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation
Date: Fri, 29 Mar 2024 01:03:53 -0700	[thread overview]
Message-ID: <20240329080355.2871-6-ebiggers@kernel.org> (raw)
In-Reply-To: <20240329080355.2871-1-ebiggers@kernel.org>

From: Eric Biggers <ebiggers@google.com>

Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs
with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL
extensions.  This implementation avoids using zmm registers, instead
using ymm registers to operate on two AES blocks at a time.  The
assembly code is instantiated using a macro so that most of the source
code is shared with other implementations.

This is the optimal implementation on CPUs that support VAES and AVX512
but where the zmm registers should not be used due to downclocking
effects, for example Intel's Ice Lake.  It should also be the optimal
implementation on future CPUs that support AVX10/256 but not AVX10/512.

The performance is slightly better than that of xts-aes-vaes-avx2, which
uses the same 256-bit vector length, due to factors such as being able
to use ymm16-ymm31 to cache the AES round keys, and being able to use
the vpternlogd instruction to do XORs more efficiently.  For example, on
Ice Lake, the throughput of decrypting 4096-byte messages with
AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with
xts-aes-vaes-avx2.  While this is a small improvement, it is
straightforward to provide this implementation (xts-aes-vaes-avx10_256)
as long as we are providing xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512
anyway, due to the way the _aes_xts_crypt macro is structured.

Signed-off-by: Eric Biggers <ebiggers@google.com>
---
 arch/x86/crypto/aes-xts-avx-x86_64.S |  9 +++++++++
 arch/x86/crypto/aesni-intel_glue.c   | 16 ++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/arch/x86/crypto/aes-xts-avx-x86_64.S b/arch/x86/crypto/aes-xts-avx-x86_64.S
index 43706213dfca..71be474b22da 100644
--- a/arch/x86/crypto/aes-xts-avx-x86_64.S
+++ b/arch/x86/crypto/aes-xts-avx-x86_64.S
@@ -815,6 +815,15 @@ SYM_TYPED_FUNC_START(aes_xts_encrypt_vaes_avx2)
 	_aes_xts_crypt	1
 SYM_FUNC_END(aes_xts_encrypt_vaes_avx2)
 SYM_TYPED_FUNC_START(aes_xts_decrypt_vaes_avx2)
 	_aes_xts_crypt	0
 SYM_FUNC_END(aes_xts_decrypt_vaes_avx2)
+
+.set	VL, 32
+.set	USE_AVX10, 1
+SYM_TYPED_FUNC_START(aes_xts_encrypt_vaes_avx10_256)
+	_aes_xts_crypt	1
+SYM_FUNC_END(aes_xts_encrypt_vaes_avx10_256)
+SYM_TYPED_FUNC_START(aes_xts_decrypt_vaes_avx10_256)
+	_aes_xts_crypt	0
+SYM_FUNC_END(aes_xts_decrypt_vaes_avx10_256)
 #endif /* CONFIG_AS_VAES && CONFIG_AS_VPCLMULQDQ */
diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c
index 4cc15c7207f3..914cbf5d1f5c 100644
--- a/arch/x86/crypto/aesni-intel_glue.c
+++ b/arch/x86/crypto/aesni-intel_glue.c
@@ -1297,10 +1297,11 @@ static struct skcipher_alg aes_xts_alg_##suffix = {			       \
 static struct simd_skcipher_alg *aes_xts_simdalg_##suffix
 
 DEFINE_XTS_ALG(aesni_avx, "xts-aes-aesni-avx", 500);
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 DEFINE_XTS_ALG(vaes_avx2, "xts-aes-vaes-avx2", 600);
+DEFINE_XTS_ALG(vaes_avx10_256, "xts-aes-vaes-avx10_256", 700);
 #endif
 
 static int __init register_xts_algs(void)
 {
 	int err;
@@ -1320,10 +1321,22 @@ static int __init register_xts_algs(void)
 		return 0;
 	err = simd_register_skciphers_compat(&aes_xts_alg_vaes_avx2, 1,
 					     &aes_xts_simdalg_vaes_avx2);
 	if (err)
 		return err;
+
+	if (!boot_cpu_has(X86_FEATURE_AVX512BW) ||
+	    !boot_cpu_has(X86_FEATURE_AVX512VL) ||
+	    !boot_cpu_has(X86_FEATURE_BMI2) ||
+	    !cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM |
+			       XFEATURE_MASK_AVX512, NULL))
+		return 0;
+
+	err = simd_register_skciphers_compat(&aes_xts_alg_vaes_avx10_256, 1,
+					     &aes_xts_simdalg_vaes_avx10_256);
+	if (err)
+		return err;
 #endif /* CONFIG_AS_VAES && CONFIG_AS_VPCLMULQDQ */
 	return 0;
 }
 
 static void unregister_xts_algs(void)
@@ -1333,10 +1346,13 @@ static void unregister_xts_algs(void)
 					  &aes_xts_simdalg_aesni_avx);
 #if defined(CONFIG_AS_VAES) && defined(CONFIG_AS_VPCLMULQDQ)
 	if (aes_xts_simdalg_vaes_avx2)
 		simd_unregister_skciphers(&aes_xts_alg_vaes_avx2, 1,
 					  &aes_xts_simdalg_vaes_avx2);
+	if (aes_xts_simdalg_vaes_avx10_256)
+		simd_unregister_skciphers(&aes_xts_alg_vaes_avx10_256, 1,
+					  &aes_xts_simdalg_vaes_avx10_256);
 #endif
 }
 #else /* CONFIG_X86_64 */
 static int __init register_xts_algs(void)
 {
-- 
2.44.0


  parent reply	other threads:[~2024-03-29  8:06 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-29  8:03 [PATCH v2 0/6] Faster AES-XTS on modern x86_64 CPUs Eric Biggers
2024-03-29  8:03 ` [PATCH v2 1/6] x86: add kconfig symbols for assembler VAES and VPCLMULQDQ support Eric Biggers
2024-03-29  8:03 ` [PATCH v2 2/6] crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUs Eric Biggers
2024-03-29  8:03 ` [PATCH v2 3/6] crypto: x86/aes-xts - wire up AESNI + AVX implementation Eric Biggers
2024-03-29  8:03 ` [PATCH v2 4/6] crypto: x86/aes-xts - wire up VAES + AVX2 implementation Eric Biggers
2024-03-29  8:03 ` Eric Biggers [this message]
2024-03-29  8:03 ` [PATCH v2 6/6] crypto: x86/aes-xts - wire up VAES + AVX10/512 implementation Eric Biggers
2024-04-04 20:34   ` Dave Hansen
2024-04-04 23:36     ` Eric Biggers
2024-04-04 23:53       ` Dave Hansen
2024-04-05  0:11         ` Eric Biggers
2024-04-05  7:20           ` Herbert Xu
2024-03-29  9:03 ` [PATCH v2 0/6] Faster AES-XTS on modern x86_64 CPUs Ard Biesheuvel
2024-03-29  9:31   ` Eric Biggers
2024-04-03  0:44     ` Eric Biggers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240329080355.2871-6-ebiggers@kernel.org \
    --to=ebiggers@kernel.org \
    --cc=ardb@kernel.org \
    --cc=chang.seok.bae@intel.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).