* Re: [PATCH v5 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR
@ 2022-04-27 9:56 kernel test robot
0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2022-04-27 9:56 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 3144 bytes --]
CC: kbuild-all(a)lists.01.org
BCC: lkp(a)intel.com
In-Reply-To: <20220427003759.1115361-5-nhuck@google.com>
References: <20220427003759.1115361-5-nhuck@google.com>
TO: Nathan Huckleberry <nhuck@google.com>
Hi Nathan,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on herbert-cryptodev-2.6/master]
[also build test WARNING on herbert-crypto-2.6/master linus/master v5.18-rc4 next-20220427]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Nathan-Huckleberry/crypto-HCTR2-support/20220427-084044
base: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
:::::: branch date: 9 hours ago
:::::: commit date: 9 hours ago
config: i386-randconfig-s002-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271747.9NI6Iyxe-lkp(a)intel.com/config)
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.4-dirty
# https://github.com/intel-lab-lkp/linux/commit/00cd244c8a1bd9623a271407bf10b99c01884ef5
git remote add linux-review https://github.com/intel-lab-lkp/linux
git fetch --no-tags linux-review Nathan-Huckleberry/crypto-HCTR2-support/20220427-084044
git checkout 00cd244c8a1bd9623a271407bf10b99c01884ef5
# save the config file
mkdir build_dir && cp config build_dir/.config
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' O=build_dir ARCH=i386 SHELL=/bin/bash arch/x86/crypto/
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
sparse warnings: (new ones prefixed by >>)
>> arch/x86/crypto/aesni-intel_glue.c:1287:1: sparse: sparse: unused label 'unregister_aeads'
vim +/unregister_aeads +1287 arch/x86/crypto/aesni-intel_glue.c
00cd244c8a1bd9 Nathan Huckleberry 2022-04-27 1284
85671860caaca2 Herbert Xu 2016-11-22 1285 return 0;
85671860caaca2 Herbert Xu 2016-11-22 1286
00cd244c8a1bd9 Nathan Huckleberry 2022-04-27 @1287 unregister_aeads:
00cd244c8a1bd9 Nathan Huckleberry 2022-04-27 1288 simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads),
00cd244c8a1bd9 Nathan Huckleberry 2022-04-27 1289 aesni_simd_aeads);
85671860caaca2 Herbert Xu 2016-11-22 1290 unregister_skciphers:
8b56d3488d8755 Eric Biggers 2019-03-10 1291 simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
8b56d3488d8755 Eric Biggers 2019-03-10 1292 aesni_simd_skciphers);
07269559ac0bf7 Eric Biggers 2019-06-02 1293 unregister_cipher:
07269559ac0bf7 Eric Biggers 2019-06-02 1294 crypto_unregister_alg(&aesni_cipher_alg);
af05b3009b6b10 Herbert Xu 2015-05-28 1295 return err;
54b6a1bd5364ac Huang Ying 2009-01-18 1296 }
54b6a1bd5364ac Huang Ying 2009-01-18 1297
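The warning is triggered by the registration error path added in this patch: the only goto that targets 'unregister_aeads' sits inside an #ifdef CONFIG_X86_64 block, so i386 configurations end up with a label that has no users. A minimal stand-alone illustration of the pattern (hypothetical names, not code from the patch; compile with -Wunused-label, or run sparse over it):

/* unused_label_demo.c - sketch of why the label is flagged on i386 */
static int do_register(void)   { return 0; }  /* stand-ins for the real helpers */
static void do_unregister(void) { }

static int demo_init(void)
{
	int err = 0;

#ifdef CONFIG_X86_64
	err = do_register();
	if (err)
		goto unregister;	/* the label's only user */
#endif
	return 0;

unregister:				/* unused when CONFIG_X86_64 is not set */
	do_unregister();
	return err;
}

int main(void)
{
	return demo_init();
}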
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v5 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR
@ 2022-04-27 9:12 kernel test robot
0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2022-04-27 9:12 UTC (permalink / raw)
To: kbuild
[-- Attachment #1: Type: text/plain, Size: 4746 bytes --]
CC: kbuild-all(a)lists.01.org
BCC: lkp(a)intel.com
In-Reply-To: <20220427003759.1115361-5-nhuck@google.com>
References: <20220427003759.1115361-5-nhuck@google.com>
TO: Nathan Huckleberry <nhuck@google.com>
Hi Nathan,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on herbert-cryptodev-2.6/master]
[also build test WARNING on herbert-crypto-2.6/master linus/master v5.18-rc4 next-20220427]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/intel-lab-lkp/linux/commits/Nathan-Huckleberry/crypto-HCTR2-support/20220427-084044
base: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
:::::: branch date: 8 hours ago
:::::: commit date: 8 hours ago
compiler: gcc-11 (Debian 11.2.0-20) 11.2.0
reproduce (cppcheck warning):
# apt-get install cppcheck
git checkout 00cd244c8a1bd9623a271407bf10b99c01884ef5
cppcheck --quiet --enable=style,performance,portability --template=gcc FILE
If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
cppcheck possible warnings: (new ones prefixed by >>, may not be real problems)
>> arch/x86/crypto/aesni-intel_glue.c:1287:1: warning: Label 'unregister_aeads' is not used. There is #if in function body so the label might be used in code that is removed by the preprocessor. [unusedLabelConfiguration]
unregister_aeads:
^
arch/x86/crypto/aesni-intel_glue.c:831:8: warning: Local variable 'aes_ctx' shadows outer function [shadowFunction]
void *aes_ctx = &(ctx->aes_key_expanded);
^
arch/x86/crypto/aesni-intel_glue.c:222:38: note: Shadowed declaration
static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
^
arch/x86/crypto/aesni-intel_glue.c:831:8: note: Shadow variable
void *aes_ctx = &(ctx->aes_key_expanded);
^
arch/x86/crypto/aesni-intel_glue.c:859:8: warning: Local variable 'aes_ctx' shadows outer function [shadowFunction]
void *aes_ctx = &(ctx->aes_key_expanded);
^
arch/x86/crypto/aesni-intel_glue.c:222:38: note: Shadowed declaration
static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
^
arch/x86/crypto/aesni-intel_glue.c:859:8: note: Shadow variable
void *aes_ctx = &(ctx->aes_key_expanded);
^
arch/x86/crypto/aesni-intel_glue.c:1162:8: warning: Local variable 'aes_ctx' shadows outer function [shadowFunction]
void *aes_ctx = &(ctx->aes_key_expanded);
^
arch/x86/crypto/aesni-intel_glue.c:222:38: note: Shadowed declaration
static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
^
arch/x86/crypto/aesni-intel_glue.c:1162:8: note: Shadow variable
void *aes_ctx = &(ctx->aes_key_expanded);
^
arch/x86/crypto/aesni-intel_glue.c:1179:8: warning: Local variable 'aes_ctx' shadows outer function [shadowFunction]
void *aes_ctx = &(ctx->aes_key_expanded);
^
arch/x86/crypto/aesni-intel_glue.c:222:38: note: Shadowed declaration
static inline struct crypto_aes_ctx *aes_ctx(void *raw_ctx)
^
arch/x86/crypto/aesni-intel_glue.c:1179:8: note: Shadow variable
void *aes_ctx = &(ctx->aes_key_expanded);
^
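These shadowing reports are about naming only: the GCM helpers declare a local pointer called 'aes_ctx' while a function of the same name is in scope at line 222. A small stand-alone illustration of the pattern and the usual fix, renaming the local (hypothetical types and names, not the kernel code):

/* shadow_demo.c - cppcheck flags the first local; the rename silences it */
struct demo_ctx { unsigned char aes_key_expanded[16]; };

static inline void *aes_ctx(void *raw_ctx)	/* function in scope */
{
	return raw_ctx;
}

static void *get_key_ctx(struct demo_ctx *ctx)
{
	/* cppcheck: local 'aes_ctx' shadows the outer function 'aes_ctx' */
	void *aes_ctx = &(ctx->aes_key_expanded);

	/* fix: pick a non-colliding name, e.g. */
	void *aes_key_ctx = &(ctx->aes_key_expanded);

	(void)aes_ctx;
	return aes_key_ctx;
}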
vim +/unregister_aeads +1287 arch/x86/crypto/aesni-intel_glue.c
00cd244c8a1bd9 Nathan Huckleberry 2022-04-27 1284
85671860caaca2 Herbert Xu 2016-11-22 1285 return 0;
85671860caaca2 Herbert Xu 2016-11-22 1286
00cd244c8a1bd9 Nathan Huckleberry 2022-04-27 @1287 unregister_aeads:
00cd244c8a1bd9 Nathan Huckleberry 2022-04-27 1288 simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads),
00cd244c8a1bd9 Nathan Huckleberry 2022-04-27 1289 aesni_simd_aeads);
85671860caaca2 Herbert Xu 2016-11-22 1290 unregister_skciphers:
8b56d3488d8755 Eric Biggers 2019-03-10 1291 simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
8b56d3488d8755 Eric Biggers 2019-03-10 1292 aesni_simd_skciphers);
07269559ac0bf7 Eric Biggers 2019-06-02 1293 unregister_cipher:
07269559ac0bf7 Eric Biggers 2019-06-02 1294 crypto_unregister_alg(&aesni_cipher_alg);
af05b3009b6b10 Herbert Xu 2015-05-28 1295 return err;
54b6a1bd5364ac Huang Ying 2009-01-18 1296 }
54b6a1bd5364ac Huang Ying 2009-01-18 1297
--
0-DAY CI Kernel Test Service
https://01.org/lkp
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v5 0/8] crypto: HCTR2 support
@ 2022-04-27  0:37 Nathan Huckleberry
  2022-04-27  0:37 ` Nathan Huckleberry
  0 siblings, 1 reply; 7+ messages in thread
From: Nathan Huckleberry @ 2022-04-27  0:37 UTC (permalink / raw)
To: linux-crypto
Cc: linux-fscrypt.vger.kernel.org, Herbert Xu, David S. Miller,
	linux-arm-kernel, Paul Crowley, Eric Biggers, Sami Tolvanen,
	Ard Biesheuvel, Nathan Huckleberry

HCTR2 is a length-preserving encryption mode that is efficient on
processors with instructions to accelerate AES and carryless
multiplication, e.g. x86 processors with AES-NI and CLMUL, and ARM
processors with the ARMv8 Crypto Extensions.

HCTR2 is specified in https://ia.cr/2021/1441 "Length-preserving
encryption with HCTR2" which shows that if AES is secure and HCTR2 is
instantiated with AES, then HCTR2 is secure.  Reference code and test
vectors are at https://github.com/google/hctr2.

As a length-preserving encryption mode, HCTR2 is suitable for
applications such as storage encryption where ciphertext expansion is
not possible, and thus authenticated encryption cannot be used.
Currently, such applications usually use XTS, or in some cases
Adiantum.  XTS has the disadvantage that it is a narrow-block mode: a
bitflip will only change 16 bytes in the resulting ciphertext or
plaintext.  This reveals more information to an attacker than
necessary.  HCTR2 is a wide-block mode, so it provides a stronger
security property: a bitflip will change the entire message.

HCTR2 is somewhat similar to Adiantum, which is also a wide-block mode.
However, HCTR2 is designed to take advantage of existing crypto
instructions, while Adiantum targets devices without such hardware
support.  Adiantum is also designed with longer messages in mind, while
HCTR2 is designed to be efficient even on short messages.

The first intended use of this mode in the kernel is for the encryption
of filenames, where for efficiency reasons encryption must be fully
deterministic (only one ciphertext for each plaintext) and the existing
CBC solution leaks more information than necessary for filenames with
common prefixes.

HCTR2 uses two passes of an ε-almost-∆-universal hash function called
POLYVAL and one pass of a block cipher mode called XCTR.  POLYVAL is a
polynomial hash designed for efficiency on modern processors and was
originally specified for use in AES-GCM-SIV (RFC 8452).  XCTR mode is a
variant of CTR mode that is more efficient on little-endian machines.

This patchset adds HCTR2 to Linux's crypto API, including generic
implementations of XCTR and POLYVAL, hardware accelerated
implementations of XCTR and POLYVAL for both x86-64 and ARM64, a
templated implementation of HCTR2, and an fscrypt policy for using
HCTR2 for filename encryption.
Changes in v5:
 * Refactor HCTR2 tweak hashing
 * Remove non-AVX x86-64 XCTR implementation
 * Combine arm64 CTR and XCTR modes
 * Comment and alias CTR and XCTR modes
 * Move generic fallback code for simd POLYVAL into polyval-generic.c
 * Various small style fixes

Changes in v4:
 * Small style fixes in generic POLYVAL and XCTR
 * Move HCTR2 hash exporting/importing to helper functions
 * Rewrite montgomery reduction for x86-64 POLYVAL
 * Rewrite partial block handling for x86-64 POLYVAL
 * Optimize x86-64 POLYVAL loop handling
 * Remove ahash wrapper from x86-64 POLYVAL
 * Add simd-unavailable handling to x86-64 POLYVAL
 * Rewrite montgomery reduction for ARM64 POLYVAL
 * Rewrite partial block handling for ARM64 POLYVAL
 * Optimize ARM64 POLYVAL loop handling
 * Remove ahash wrapper from ARM64 POLYVAL
 * Add simd-unavailable handling to ARM64 POLYVAL

Changes in v3:
 * Improve testvec coverage for XCTR, POLYVAL and HCTR2
 * Fix endianness bug in xctr.c
 * Fix alignment issues in polyval-generic.c
 * Optimize hctr2.c by exporting/importing hash states
 * Fix blockcipher name derivation in hctr2.c
 * Move x86-64 XCTR implementation into aes_ctrby8_avx-x86_64.S
 * Reuse ARM64 CTR mode tail handling in ARM64 XCTR
 * Fix x86-64 POLYVAL comments
 * Fix x86-64 POLYVAL key_powers type to match asm
 * Fix ARM64 POLYVAL comments
 * Fix ARM64 POLYVAL key_powers type to match asm
 * Add XTS + HCTR2 policy to fscrypt

Nathan Huckleberry (8):
  crypto: xctr - Add XCTR support
  crypto: polyval - Add POLYVAL support
  crypto: hctr2 - Add HCTR2 support
  crypto: x86/aesni-xctr: Add accelerated implementation of XCTR
  crypto: arm64/aes-xctr: Add accelerated implementation of XCTR
  crypto: x86/polyval: Add PCLMULQDQ accelerated implementation of POLYVAL
  crypto: arm64/polyval: Add PMULL accelerated implementation of POLYVAL
  fscrypt: Add HCTR2 support for filename encryption

 Documentation/filesystems/fscrypt.rst   |   22 +-
 arch/arm64/crypto/Kconfig               |    9 +-
 arch/arm64/crypto/Makefile              |    3 +
 arch/arm64/crypto/aes-glue.c            |   64 +-
 arch/arm64/crypto/aes-modes.S           |  290 +++--
 arch/arm64/crypto/polyval-ce-core.S     |  369 ++++++
 arch/arm64/crypto/polyval-ce-glue.c     |  194 +++
 arch/x86/crypto/Makefile                |    3 +
 arch/x86/crypto/aes_ctrby8_avx-x86_64.S |  232 ++--
 arch/x86/crypto/aesni-intel_glue.c      |  109 ++
 arch/x86/crypto/polyval-clmulni_asm.S   |  330 +++++
 arch/x86/crypto/polyval-clmulni_glue.c  |  200 +++
 crypto/Kconfig                          |   39 +-
 crypto/Makefile                         |    3 +
 crypto/hctr2.c                          |  580 +++++++++
 crypto/polyval-generic.c                |  242 ++++
 crypto/tcrypt.c                         |   10 +
 crypto/testmgr.c                        |   20 +
 crypto/testmgr.h                        | 1536 +++++++++++++++++++++++
 crypto/xctr.c                           |  191 +++
 fs/crypto/fscrypt_private.h             |    2 +-
 fs/crypto/keysetup.c                    |    7 +
 fs/crypto/policy.c                      |   14 +-
 include/crypto/polyval.h                |   26 +
 include/uapi/linux/fscrypt.h            |    3 +-
 25 files changed, 4306 insertions(+), 192 deletions(-)
 create mode 100644 arch/arm64/crypto/polyval-ce-core.S
 create mode 100644 arch/arm64/crypto/polyval-ce-glue.c
 create mode 100644 arch/x86/crypto/polyval-clmulni_asm.S
 create mode 100644 arch/x86/crypto/polyval-clmulni_glue.c
 create mode 100644 crypto/hctr2.c
 create mode 100644 crypto/polyval-generic.c
 create mode 100644 crypto/xctr.c
 create mode 100644 include/crypto/polyval.h

-- 
2.36.0.rc2.479.g8af0fa9b8e-goog

^ permalink raw reply	[flat|nested] 7+ messages in thread
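The cover letter's description of XCTR — CTR mode with a little-endian block counter that is XORed into the IV rather than added — corresponds to the keystream derivation sketched below. This is illustrative only: aes_encrypt_block() is a hypothetical stand-in for a real AES primitive, and the little-endian host assumption is for brevity.

#include <stdint.h>
#include <string.h>

#define XCTR_BLOCK_SIZE 16

/* assumed primitive: one-block AES encryption with an expanded key */
void aes_encrypt_block(const void *key, uint8_t out[XCTR_BLOCK_SIZE],
		       const uint8_t in[XCTR_BLOCK_SIZE]);

/* keystream block i (1-based): E_K(IV ^ le128(i)), per the HCTR2 paper */
static void xctr_keystream_block(const void *key,
				 const uint8_t iv[XCTR_BLOCK_SIZE],
				 uint64_t i,
				 uint8_t out[XCTR_BLOCK_SIZE])
{
	uint8_t blk[XCTR_BLOCK_SIZE] = { 0 };
	int n;

	memcpy(blk, &i, sizeof(i));		/* little-endian host assumed */
	for (n = 0; n < XCTR_BLOCK_SIZE; n++)
		blk[n] ^= iv[n];		/* the "XOR counter" step */
	aes_encrypt_block(key, out, blk);	/* keystream = E_K(ctr ^ IV) */
}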
* [PATCH v5 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR 2022-04-27 0:37 [PATCH v5 0/8] crypto: HCTR2 support Nathan Huckleberry @ 2022-04-27 0:37 ` Nathan Huckleberry 0 siblings, 0 replies; 7+ messages in thread From: Nathan Huckleberry @ 2022-04-27 0:37 UTC (permalink / raw) To: linux-crypto Cc: linux-fscrypt.vger.kernel.org, Herbert Xu, David S. Miller, linux-arm-kernel, Paul Crowley, Eric Biggers, Sami Tolvanen, Ard Biesheuvel, Nathan Huckleberry Add hardware accelerated versions of XCTR for x86-64 CPUs with AESNI support. These implementations are modified versions of the CTR implementations found in aesni-intel_asm.S and aes_ctrby8_avx-x86_64.S. More information on XCTR can be found in the HCTR2 paper: "Length-preserving encryption with HCTR2": https://eprint.iacr.org/2021/1441.pdf Signed-off-by: Nathan Huckleberry <nhuck@google.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> --- arch/x86/crypto/aes_ctrby8_avx-x86_64.S | 232 ++++++++++++++++-------- arch/x86/crypto/aesni-intel_glue.c | 109 +++++++++++ crypto/Kconfig | 2 +- 3 files changed, 262 insertions(+), 81 deletions(-) diff --git a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S index 43852ba6e19c..6de06779b77c 100644 --- a/arch/x86/crypto/aes_ctrby8_avx-x86_64.S +++ b/arch/x86/crypto/aes_ctrby8_avx-x86_64.S @@ -23,6 +23,10 @@ #define VMOVDQ vmovdqu +/* Note: the "x" prefix in these aliases means "this is an xmm register". The + * alias prefixes have no relation to XCTR where the "X" prefix means "XOR + * counter". + */ #define xdata0 %xmm0 #define xdata1 %xmm1 #define xdata2 %xmm2 @@ -31,8 +35,10 @@ #define xdata5 %xmm5 #define xdata6 %xmm6 #define xdata7 %xmm7 -#define xcounter %xmm8 -#define xbyteswap %xmm9 +#define xcounter %xmm8 // CTR mode only +#define xiv %xmm8 // XCTR mode only +#define xbyteswap %xmm9 // CTR mode only +#define xtmp %xmm9 // XCTR mode only #define xkey0 %xmm10 #define xkey4 %xmm11 #define xkey8 %xmm12 @@ -45,7 +51,7 @@ #define p_keys %rdx #define p_out %rcx #define num_bytes %r8 - +#define counter %r9 // XCTR mode only #define tmp %r10 #define DDQ_DATA 0 #define XDATA 1 @@ -102,7 +108,7 @@ ddq_add_8: * do_aes num_in_par load_keys key_len * This increments p_in, but not p_out */ -.macro do_aes b, k, key_len +.macro do_aes b, k, key_len, xctr .set by, \b .set load_keys, \k .set klen, \key_len @@ -111,29 +117,48 @@ ddq_add_8: vmovdqa 0*16(p_keys), xkey0 .endif - vpshufb xbyteswap, xcounter, xdata0 - - .set i, 1 - .rept (by - 1) - club XDATA, i - vpaddq (ddq_add_1 + 16 * (i - 1))(%rip), xcounter, var_xdata - vptest ddq_low_msk(%rip), var_xdata - jnz 1f - vpaddq ddq_high_add_1(%rip), var_xdata, var_xdata - vpaddq ddq_high_add_1(%rip), xcounter, xcounter - 1: - vpshufb xbyteswap, var_xdata, var_xdata - .set i, (i +1) - .endr + .if !\xctr + vpshufb xbyteswap, xcounter, xdata0 + .set i, 1 + .rept (by - 1) + club XDATA, i + vpaddq (ddq_add_1 + 16 * (i - 1))(%rip), xcounter, var_xdata + vptest ddq_low_msk(%rip), var_xdata + jnz 1f + vpaddq ddq_high_add_1(%rip), var_xdata, var_xdata + vpaddq ddq_high_add_1(%rip), xcounter, xcounter + 1: + vpshufb xbyteswap, var_xdata, var_xdata + .set i, (i +1) + .endr + .else + movq counter, xtmp + .set i, 0 + .rept (by) + club XDATA, i + vpaddq (ddq_add_1 + 16 * i)(%rip), xtmp, var_xdata + .set i, (i +1) + .endr + .set i, 0 + .rept (by) + club XDATA, i + vpxor xiv, var_xdata, var_xdata + .set i, (i +1) + .endr + .endif vmovdqa 1*16(p_keys), xkeyA vpxor xkey0, xdata0, xdata0 - vpaddq (ddq_add_1 + 16 * (by - 1))(%rip), 
xcounter, xcounter - vptest ddq_low_msk(%rip), xcounter - jnz 1f - vpaddq ddq_high_add_1(%rip), xcounter, xcounter - 1: + .if !\xctr + vpaddq (ddq_add_1 + 16 * (by - 1))(%rip), xcounter, xcounter + vptest ddq_low_msk(%rip), xcounter + jnz 1f + vpaddq ddq_high_add_1(%rip), xcounter, xcounter + 1: + .else + add $by, counter + .endif .set i, 1 .rept (by - 1) @@ -371,94 +396,100 @@ ddq_add_8: .endr .endm -.macro do_aes_load val, key_len - do_aes \val, 1, \key_len +.macro do_aes_load val, key_len, xctr + do_aes \val, 1, \key_len, \xctr .endm -.macro do_aes_noload val, key_len - do_aes \val, 0, \key_len +.macro do_aes_noload val, key_len, xctr + do_aes \val, 0, \key_len, \xctr .endm /* main body of aes ctr load */ -.macro do_aes_ctrmain key_len +.macro do_aes_ctrmain key_len, xctr cmp $16, num_bytes - jb .Ldo_return2\key_len + jb .Ldo_return2\xctr\key_len - vmovdqa byteswap_const(%rip), xbyteswap - vmovdqu (p_iv), xcounter - vpshufb xbyteswap, xcounter, xcounter + .if !\xctr + vmovdqa byteswap_const(%rip), xbyteswap + vmovdqu (p_iv), xcounter + vpshufb xbyteswap, xcounter, xcounter + .else + andq $(~0xf), num_bytes + shr $4, counter + vmovdqu (p_iv), xiv + .endif mov num_bytes, tmp and $(7*16), tmp - jz .Lmult_of_8_blks\key_len + jz .Lmult_of_8_blks\xctr\key_len /* 1 <= tmp <= 7 */ cmp $(4*16), tmp - jg .Lgt4\key_len - je .Leq4\key_len + jg .Lgt4\xctr\key_len + je .Leq4\xctr\key_len -.Llt4\key_len: +.Llt4\xctr\key_len: cmp $(2*16), tmp - jg .Leq3\key_len - je .Leq2\key_len + jg .Leq3\xctr\key_len + je .Leq2\xctr\key_len -.Leq1\key_len: - do_aes_load 1, \key_len +.Leq1\xctr\key_len: + do_aes_load 1, \key_len, \xctr add $(1*16), p_out and $(~7*16), num_bytes - jz .Ldo_return2\key_len - jmp .Lmain_loop2\key_len + jz .Ldo_return2\xctr\key_len + jmp .Lmain_loop2\xctr\key_len -.Leq2\key_len: - do_aes_load 2, \key_len +.Leq2\xctr\key_len: + do_aes_load 2, \key_len, \xctr add $(2*16), p_out and $(~7*16), num_bytes - jz .Ldo_return2\key_len - jmp .Lmain_loop2\key_len + jz .Ldo_return2\xctr\key_len + jmp .Lmain_loop2\xctr\key_len -.Leq3\key_len: - do_aes_load 3, \key_len +.Leq3\xctr\key_len: + do_aes_load 3, \key_len, \xctr add $(3*16), p_out and $(~7*16), num_bytes - jz .Ldo_return2\key_len - jmp .Lmain_loop2\key_len + jz .Ldo_return2\xctr\key_len + jmp .Lmain_loop2\xctr\key_len -.Leq4\key_len: - do_aes_load 4, \key_len +.Leq4\xctr\key_len: + do_aes_load 4, \key_len, \xctr add $(4*16), p_out and $(~7*16), num_bytes - jz .Ldo_return2\key_len - jmp .Lmain_loop2\key_len + jz .Ldo_return2\xctr\key_len + jmp .Lmain_loop2\xctr\key_len -.Lgt4\key_len: +.Lgt4\xctr\key_len: cmp $(6*16), tmp - jg .Leq7\key_len - je .Leq6\key_len + jg .Leq7\xctr\key_len + je .Leq6\xctr\key_len -.Leq5\key_len: - do_aes_load 5, \key_len +.Leq5\xctr\key_len: + do_aes_load 5, \key_len, \xctr add $(5*16), p_out and $(~7*16), num_bytes - jz .Ldo_return2\key_len - jmp .Lmain_loop2\key_len + jz .Ldo_return2\xctr\key_len + jmp .Lmain_loop2\xctr\key_len -.Leq6\key_len: - do_aes_load 6, \key_len +.Leq6\xctr\key_len: + do_aes_load 6, \key_len, \xctr add $(6*16), p_out and $(~7*16), num_bytes - jz .Ldo_return2\key_len - jmp .Lmain_loop2\key_len + jz .Ldo_return2\xctr\key_len + jmp .Lmain_loop2\xctr\key_len -.Leq7\key_len: - do_aes_load 7, \key_len +.Leq7\xctr\key_len: + do_aes_load 7, \key_len, \xctr add $(7*16), p_out and $(~7*16), num_bytes - jz .Ldo_return2\key_len - jmp .Lmain_loop2\key_len + jz .Ldo_return2\xctr\key_len + jmp .Lmain_loop2\xctr\key_len -.Lmult_of_8_blks\key_len: +.Lmult_of_8_blks\xctr\key_len: .if (\key_len != KEY_128) 
vmovdqa 0*16(p_keys), xkey0 vmovdqa 4*16(p_keys), xkey4 @@ -471,17 +502,19 @@ ddq_add_8: vmovdqa 9*16(p_keys), xkey12 .endif .align 16 -.Lmain_loop2\key_len: +.Lmain_loop2\xctr\key_len: /* num_bytes is a multiple of 8 and >0 */ - do_aes_noload 8, \key_len + do_aes_noload 8, \key_len, \xctr add $(8*16), p_out sub $(8*16), num_bytes - jne .Lmain_loop2\key_len + jne .Lmain_loop2\xctr\key_len -.Ldo_return2\key_len: - /* return updated IV */ - vpshufb xbyteswap, xcounter, xcounter - vmovdqu xcounter, (p_iv) +.Ldo_return2\xctr\key_len: + .if !\xctr + /* return updated IV */ + vpshufb xbyteswap, xcounter, xcounter + vmovdqu xcounter, (p_iv) + .endif RET .endm @@ -494,7 +527,7 @@ ddq_add_8: */ SYM_FUNC_START(aes_ctr_enc_128_avx_by8) /* call the aes main loop */ - do_aes_ctrmain KEY_128 + do_aes_ctrmain KEY_128 0 SYM_FUNC_END(aes_ctr_enc_128_avx_by8) @@ -507,7 +540,7 @@ SYM_FUNC_END(aes_ctr_enc_128_avx_by8) */ SYM_FUNC_START(aes_ctr_enc_192_avx_by8) /* call the aes main loop */ - do_aes_ctrmain KEY_192 + do_aes_ctrmain KEY_192 0 SYM_FUNC_END(aes_ctr_enc_192_avx_by8) @@ -520,6 +553,45 @@ SYM_FUNC_END(aes_ctr_enc_192_avx_by8) */ SYM_FUNC_START(aes_ctr_enc_256_avx_by8) /* call the aes main loop */ - do_aes_ctrmain KEY_256 + do_aes_ctrmain KEY_256 0 SYM_FUNC_END(aes_ctr_enc_256_avx_by8) + +/* + * routine to do AES128 XCTR enc/decrypt "by8" + * XMM registers are clobbered. + * Saving/restoring must be done at a higher level + * aes_xctr_enc_128_avx_by8(const u8 *in, const u8 *iv, const void *keys, + * u8* out, unsigned int num_bytes, unsigned int byte_ctr) + */ +SYM_FUNC_START(aes_xctr_enc_128_avx_by8) + /* call the aes main loop */ + do_aes_ctrmain KEY_128 1 + +SYM_FUNC_END(aes_xctr_enc_128_avx_by8) + +/* + * routine to do AES192 XCTR enc/decrypt "by8" + * XMM registers are clobbered. + * Saving/restoring must be done at a higher level + * aes_xctr_enc_192_avx_by8(const u8 *in, const u8 *iv, const void *keys, + * u8* out, unsigned int num_bytes, unsigned int byte_ctr) + */ +SYM_FUNC_START(aes_xctr_enc_192_avx_by8) + /* call the aes main loop */ + do_aes_ctrmain KEY_192 1 + +SYM_FUNC_END(aes_xctr_enc_192_avx_by8) + +/* + * routine to do AES256 XCTR enc/decrypt "by8" + * XMM registers are clobbered. 
+ * Saving/restoring must be done at a higher level + * aes_xctr_enc_256_avx_by8(const u8 *in, const u8 *iv, const void *keys, + * u8* out, unsigned int num_bytes, unsigned int byte_ctr) + */ +SYM_FUNC_START(aes_xctr_enc_256_avx_by8) + /* call the aes main loop */ + do_aes_ctrmain KEY_256 1 + +SYM_FUNC_END(aes_xctr_enc_256_avx_by8) diff --git a/arch/x86/crypto/aesni-intel_glue.c b/arch/x86/crypto/aesni-intel_glue.c index 41901ba9d3a2..f79ed168a77b 100644 --- a/arch/x86/crypto/aesni-intel_glue.c +++ b/arch/x86/crypto/aesni-intel_glue.c @@ -135,6 +135,20 @@ asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv, void *keys, u8 *out, unsigned int num_bytes); asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv, void *keys, u8 *out, unsigned int num_bytes); + + +asmlinkage void aes_xctr_enc_128_avx_by8(const u8 *in, const u8 *iv, + const void *keys, u8 *out, unsigned int num_bytes, + unsigned int byte_ctr); + +asmlinkage void aes_xctr_enc_192_avx_by8(const u8 *in, const u8 *iv, + const void *keys, u8 *out, unsigned int num_bytes, + unsigned int byte_ctr); + +asmlinkage void aes_xctr_enc_256_avx_by8(const u8 *in, const u8 *iv, + const void *keys, u8 *out, unsigned int num_bytes, + unsigned int byte_ctr); + /* * asmlinkage void aesni_gcm_init_avx_gen2() * gcm_data *my_ctx_data, context data @@ -527,6 +541,59 @@ static int ctr_crypt(struct skcipher_request *req) return err; } +static void aesni_xctr_enc_avx_tfm(struct crypto_aes_ctx *ctx, u8 *out, + const u8 *in, unsigned int len, u8 *iv, + unsigned int byte_ctr) +{ + if (ctx->key_length == AES_KEYSIZE_128) + aes_xctr_enc_128_avx_by8(in, iv, (void *)ctx, out, len, + byte_ctr); + else if (ctx->key_length == AES_KEYSIZE_192) + aes_xctr_enc_192_avx_by8(in, iv, (void *)ctx, out, len, + byte_ctr); + else + aes_xctr_enc_256_avx_by8(in, iv, (void *)ctx, out, len, + byte_ctr); +} + +static int xctr_crypt(struct skcipher_request *req) +{ + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); + struct crypto_aes_ctx *ctx = aes_ctx(crypto_skcipher_ctx(tfm)); + u8 keystream[AES_BLOCK_SIZE]; + struct skcipher_walk walk; + unsigned int nbytes; + unsigned int byte_ctr = 0; + int err; + __le32 block[AES_BLOCK_SIZE / sizeof(__le32)]; + + err = skcipher_walk_virt(&walk, req, false); + + while ((nbytes = walk.nbytes) > 0) { + kernel_fpu_begin(); + if (nbytes & AES_BLOCK_MASK) + aesni_xctr_enc_avx_tfm(ctx, walk.dst.virt.addr, + walk.src.virt.addr, nbytes & AES_BLOCK_MASK, + walk.iv, byte_ctr); + nbytes &= ~AES_BLOCK_MASK; + byte_ctr += walk.nbytes - nbytes; + + if (walk.nbytes == walk.total && nbytes > 0) { + memcpy(block, walk.iv, AES_BLOCK_SIZE); + block[0] ^= cpu_to_le32(1 + byte_ctr / AES_BLOCK_SIZE); + aesni_enc(ctx, keystream, (u8 *)block); + crypto_xor_cpy(walk.dst.virt.addr + walk.nbytes - + nbytes, walk.src.virt.addr + walk.nbytes + - nbytes, keystream, nbytes); + byte_ctr += nbytes; + nbytes = 0; + } + kernel_fpu_end(); + err = skcipher_walk_done(&walk, nbytes); + } + return err; +} + static int rfc4106_set_hash_subkey(u8 *hash_subkey, const u8 *key, unsigned int key_len) { @@ -1050,6 +1117,33 @@ static struct skcipher_alg aesni_skciphers[] = { static struct simd_skcipher_alg *aesni_simd_skciphers[ARRAY_SIZE(aesni_skciphers)]; +#ifdef CONFIG_X86_64 +/* + * XCTR does not have a non-AVX implementation, so it must be enabled + * conditionally. 
+ */ +static struct skcipher_alg aesni_xctr = { + .base = { + .cra_name = "__xctr(aes)", + .cra_driver_name = "__xctr-aes-aesni", + .cra_priority = 400, + .cra_flags = CRYPTO_ALG_INTERNAL, + .cra_blocksize = 1, + .cra_ctxsize = CRYPTO_AES_CTX_SIZE, + .cra_module = THIS_MODULE, + }, + .min_keysize = AES_MIN_KEY_SIZE, + .max_keysize = AES_MAX_KEY_SIZE, + .ivsize = AES_BLOCK_SIZE, + .chunksize = AES_BLOCK_SIZE, + .setkey = aesni_skcipher_setkey, + .encrypt = xctr_crypt, + .decrypt = xctr_crypt, +}; + +static struct simd_skcipher_alg *aesni_simd_xctr; +#endif + #ifdef CONFIG_X86_64 static int generic_gcmaes_set_key(struct crypto_aead *aead, const u8 *key, unsigned int key_len) @@ -1180,8 +1274,19 @@ static int __init aesni_init(void) if (err) goto unregister_skciphers; +#ifdef CONFIG_X86_64 + if (boot_cpu_has(X86_FEATURE_AVX)) + err = simd_register_skciphers_compat(&aesni_xctr, 1, + &aesni_simd_xctr); + if (err) + goto unregister_aeads; +#endif + return 0; +unregister_aeads: + simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads), + aesni_simd_aeads); unregister_skciphers: simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers), aesni_simd_skciphers); @@ -1197,6 +1302,10 @@ static void __exit aesni_exit(void) simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers), aesni_simd_skciphers); crypto_unregister_alg(&aesni_cipher_alg); +#ifdef CONFIG_X86_64 + if (boot_cpu_has(X86_FEATURE_AVX)) + simd_unregister_skciphers(&aesni_xctr, 1, &aesni_simd_xctr); +#endif } late_initcall(aesni_init); diff --git a/crypto/Kconfig b/crypto/Kconfig index 0dedba74db4a..aa06af0e0ebe 100644 --- a/crypto/Kconfig +++ b/crypto/Kconfig @@ -1161,7 +1161,7 @@ config CRYPTO_AES_NI_INTEL In addition to AES cipher algorithm support, the acceleration for some popular block cipher mode is supported too, including ECB, CBC, LRW, XTS. The 64 bit version has additional - acceleration for CTR. + acceleration for CTR and XCTR. config CRYPTO_AES_SPARC64 tristate "AES cipher algorithms (SPARC64)" -- 2.36.0.rc2.479.g8af0fa9b8e-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
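A rough sketch of how a kernel-side caller could exercise the resulting "xctr(aes)" skcipher once the driver above is registered. The key, IV, and buffer values are placeholders, error handling is kept minimal, and in practice XCTR is consumed by templates such as hctr2 rather than called directly.

#include <crypto/skcipher.h>
#include <linux/crypto.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

static int xctr_demo(void)
{
	u8 key[32] = { 1 };			/* placeholder AES-256 key */
	u8 iv[16] = { 2 };			/* placeholder XCTR IV */
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	u8 *buf;
	int err;

	tfm = crypto_alloc_skcipher("xctr(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	buf = kzalloc(64, GFP_KERNEL);		/* data encrypted in place */
	if (!buf) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	err = crypto_skcipher_setkey(tfm, key, sizeof(key));
	if (err)
		goto out_free_buf;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_buf;
	}

	sg_init_one(&sg, buf, 64);
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, &sg, &sg, 64, iv);
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

	skcipher_request_free(req);
out_free_buf:
	kfree(buf);
out_free_tfm:
	crypto_free_skcipher(tfm);
	return err;
}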
* Re: [PATCH v5 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR
  2022-04-27  0:37 ` Nathan Huckleberry
  (?)
@ 2022-04-27  4:26 ` kernel test robot
  0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2022-04-27  4:26 UTC (permalink / raw)
To: Nathan Huckleberry; +Cc: llvm, kbuild-all

Hi Nathan,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on herbert-cryptodev-2.6/master]
[also build test WARNING on herbert-crypto-2.6/master linus/master v5.18-rc4 next-20220426]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Nathan-Huckleberry/crypto-HCTR2-support/20220427-084044
base:   https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git master
config: i386-randconfig-a002-20220425 (https://download.01.org/0day-ci/archive/20220427/202204271241.gjp97ols-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 1cddcfdc3c683b393df1a5c9063252eb60e52818)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/00cd244c8a1bd9623a271407bf10b99c01884ef5
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Nathan-Huckleberry/crypto-HCTR2-support/20220427-084044
        git checkout 00cd244c8a1bd9623a271407bf10b99c01884ef5
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash arch/x86/crypto/

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> arch/x86/crypto/aesni-intel_glue.c:1287:1: warning: unused label 'unregister_aeads' [-Wunused-label]
   unregister_aeads:
   ^~~~~~~~~~~~~~~~~
   1 warning generated.

vim +/unregister_aeads +1287 arch/x86/crypto/aesni-intel_glue.c

  1284	
  1285		return 0;
  1286	
> 1287	unregister_aeads:
  1288		simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads),
  1289				      aesni_simd_aeads);
  1290	unregister_skciphers:
  1291		simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
  1292					  aesni_simd_skciphers);
  1293	unregister_cipher:
  1294		crypto_unregister_alg(&aesni_cipher_alg);
  1295		return err;
  1296	}
  1297	

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 7+ messages in thread
* Re: [PATCH v5 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR
  2022-04-27  0:37 ` Nathan Huckleberry
@ 2022-05-01 21:31 ` Eric Biggers
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Biggers @ 2022-05-01 21:31 UTC (permalink / raw)
To: Nathan Huckleberry
Cc: linux-crypto, linux-fscrypt.vger.kernel.org, Herbert Xu,
	David S. Miller, linux-arm-kernel, Paul Crowley, Sami Tolvanen,
	Ard Biesheuvel

On Wed, Apr 27, 2022 at 12:37:55AM +0000, Nathan Huckleberry wrote:
> Add hardware accelerated versions of XCTR for x86-64 CPUs with AESNI
> support. These implementations are modified versions of the CTR
> implementations found in aesni-intel_asm.S and aes_ctrby8_avx-x86_64.S.

Just one implementation now, using aes_ctrby8_avx-x86_64.S.

> +/* Note: the "x" prefix in these aliases means "this is an xmm register". The
> + * alias prefixes have no relation to XCTR where the "X" prefix means "XOR
> + * counter".
> + */

Block comments look like:

/*
 * text
 */

> +	.if !\xctr
> +		vpshufb	xbyteswap, xcounter, xdata0
> +		.set i, 1
> +		.rept (by - 1)
> +			club XDATA, i
> +			vpaddq	(ddq_add_1 + 16 * (i - 1))(%rip), xcounter, var_xdata
> +			vptest	ddq_low_msk(%rip), var_xdata
> +			jnz 1f
> +			vpaddq	ddq_high_add_1(%rip), var_xdata, var_xdata
> +			vpaddq	ddq_high_add_1(%rip), xcounter, xcounter
> +			1:
> +			vpshufb	xbyteswap, var_xdata, var_xdata
> +			.set i, (i +1)
> +		.endr
> +	.else
> +		movq counter, xtmp
> +		.set i, 0
> +		.rept (by)
> +			club XDATA, i
> +			vpaddq	(ddq_add_1 + 16 * i)(%rip), xtmp, var_xdata
> +			.set i, (i +1)
> +		.endr
> +		.set i, 0
> +		.rept (by)
> +			club XDATA, i
> +			vpxor	xiv, var_xdata, var_xdata
> +			.set i, (i +1)
> +		.endr
> +	.endif

I'm not a fan of 'if !condition ... else ...', as the else clause is
double-negated.  It's more straightforward to do 'if condition ... else ...'.

> +	.if !\xctr
> +		vmovdqa	byteswap_const(%rip), xbyteswap
> +		vmovdqu	(p_iv), xcounter
> +		vpshufb	xbyteswap, xcounter, xcounter
> +	.else
> +		andq	$(~0xf), num_bytes
> +		shr	$4, counter
> +		vmovdqu	(p_iv), xiv
> +	.endif

Isn't the 'andq $(~0xf), num_bytes' instruction unnecessary?  If it is
necessary, I'd expect it to be necessary for CTR too.

Otherwise this file looks good.

Note, the macros in this file all expand to way too much code, especially
due to the separate cases for AES-128, AES-192, and AES-256, and for each
one every partial stride length 1..7.  Of course, this is true for the
existing CTR code too, so I don't think you have to fix this...  But maybe
think about addressing this later.  Changing the handling of partial
strides might be the easiest way to save a lot of code without hurting any
micro-benchmarks too much.  Also maybe some or all of the AES key sizes
could be combined.

> +#ifdef CONFIG_X86_64
> +/*
> + * XCTR does not have a non-AVX implementation, so it must be enabled
> + * conditionally.
> + */
> +static struct skcipher_alg aesni_xctr = {
> +	.base = {
> +		.cra_name		= "__xctr(aes)",
> +		.cra_driver_name	= "__xctr-aes-aesni",
> +		.cra_priority		= 400,
> +		.cra_flags		= CRYPTO_ALG_INTERNAL,
> +		.cra_blocksize		= 1,
> +		.cra_ctxsize		= CRYPTO_AES_CTX_SIZE,
> +		.cra_module		= THIS_MODULE,
> +	},
> +	.min_keysize	= AES_MIN_KEY_SIZE,
> +	.max_keysize	= AES_MAX_KEY_SIZE,
> +	.ivsize		= AES_BLOCK_SIZE,
> +	.chunksize	= AES_BLOCK_SIZE,
> +	.setkey		= aesni_skcipher_setkey,
> +	.encrypt	= xctr_crypt,
> +	.decrypt	= xctr_crypt,
> +};
> +
> +static struct simd_skcipher_alg *aesni_simd_xctr;
> +#endif

Comment the #endif above:

#endif /* CONFIG_X86_64 */

> @@ -1180,8 +1274,19 @@ static int __init aesni_init(void)
>  	if (err)
>  		goto unregister_skciphers;
>  
> +#ifdef CONFIG_X86_64
> +	if (boot_cpu_has(X86_FEATURE_AVX))
> +		err = simd_register_skciphers_compat(&aesni_xctr, 1,
> +						     &aesni_simd_xctr);
> +	if (err)
> +		goto unregister_aeads;
> +#endif
> +
>  	return 0;
>  
> +unregister_aeads:
> +	simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads),
> +			      aesni_simd_aeads);

This will cause a compiler warning in 32-bit builds because the
'unregister_aeads' label won't be used.

- Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread
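One possible way to address the reviewer's last two points together — purely a sketch of the suggestions above, not the author's eventual fix — is to comment the #endif and keep the unwind label under the same CONFIG_X86_64 guard as its only goto:

#ifdef CONFIG_X86_64
	if (boot_cpu_has(X86_FEATURE_AVX)) {
		err = simd_register_skciphers_compat(&aesni_xctr, 1,
						     &aesni_simd_xctr);
		if (err)
			goto unregister_aeads;
	}
#endif /* CONFIG_X86_64 */

	return 0;

#ifdef CONFIG_X86_64
unregister_aeads:
	simd_unregister_aeads(aesni_aeads, ARRAY_SIZE(aesni_aeads),
			      aesni_simd_aeads);
#endif /* CONFIG_X86_64 */
unregister_skciphers:
	simd_unregister_skciphers(aesni_skciphers, ARRAY_SIZE(aesni_skciphers),
				  aesni_simd_skciphers);
unregister_cipher:
	crypto_unregister_alg(&aesni_cipher_alg);
	return err;
}

With the label bracketed like this, 32-bit builds see neither the goto nor the label, so the -Wunused-label and sparse warnings from the robot reports above would not trigger.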
end of thread, other threads:[~2022-05-01 21:32 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-27  9:56 [PATCH v5 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR kernel test robot
  -- strict thread matches above, loose matches on Subject: below --
2022-04-27  9:12 kernel test robot
2022-04-27  0:37 [PATCH v5 0/8] crypto: HCTR2 support Nathan Huckleberry
2022-04-27  0:37 ` [PATCH v5 4/8] crypto: x86/aesni-xctr: Add accelerated implementation of XCTR Nathan Huckleberry
2022-04-27  0:37   ` Nathan Huckleberry
2022-04-27  4:26   ` kernel test robot
2022-05-01 21:31   ` Eric Biggers
2022-05-01 21:31     ` Eric Biggers