From: Eric Biggers <ebiggers@kernel.org>
To: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Gilad Ben-Yossef <gilad@benyossef.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	"Markku-Juhani O . Saarinen" <mjos@iki.fi>,
	Jussi Kivilinna <jussi.kivilinna@iki.fi>,
	x86@kernel.org, linux-crypto@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] crypto: x86/sm4 - add AES-NI/AVX/x86_64 assembler implementation
Date: Thu, 10 Jun 2021 16:27:51 -0700	[thread overview]
Message-ID: <YMKf93/cnPGGtRW3@gmail.com> (raw)
In-Reply-To: <20210610134459.28541-4-tianjia.zhang@linux.alibaba.com>

On Thu, Jun 10, 2021 at 09:44:59PM +0800, Tianjia Zhang wrote:
> This patch adds AES-NI/AVX/x86_64 assembler implementation of SM4
> block cipher. Through two affine transforms, we can use the AES
> S-Box to simulate the SM4 S-Box to achieve the effect of instruction
> acceleration.
> 

Benchmark results, please.

Also, is this passing the self-tests, including the fuzz tests?
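
(For reference, the "two affine transforms" trick works because both the SM4 and
AES S-boxes are affine-equivalent to inversion in GF(2^8), so one SM4 S-box
lookup can be computed as an affine transform, an AES SubBytes, and a second
affine transform.  A minimal C sketch of the 8-bit affine step is below; the
matrices and constants (A1, c1, A2, c2) are placeholders, not the real SM4/AES
conversion constants.  Presumably the in-register version gets SubBytes from
AESENCLAST with an all-zero round key plus a byte shuffle to undo its
ShiftRows.)

#include <stdint.h>

/* y = M*x ^ c over GF(2), with the 8x8 bit matrix M stored as eight
 * column bytes. */
static uint8_t affine8(uint8_t x, const uint8_t M[8], uint8_t c)
{
	uint8_t y = c;

	for (int i = 0; i < 8; i++)
		if (x & (1u << i))
			y ^= M[i];	/* XOR in column i of M */
	return y;
}

/* Conceptually:
 * sm4_sbox(x) == affine8(aes_sbox(affine8(x, A1, c1)), A2, c2)
 */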

> +/*
> + * void sm4_aesni_avx_expand_key(const u8 *key, u32 *rk_enc,
> + *                  u32 *rk_dec, const u32 *fk, const u32 *ck);
> + */
> +SYM_FUNC_START(sm4_aesni_avx_expand_key)
> +	/* input:
> +	 *	%rdi: 128-bit key
> +	 *	%rsi: rkey_enc
> +	 *	%rdx: rkey_dec
> +	 *	%rcx: fk array
> +	 *	%r8: ck array
> +	 */
> +	FRAME_BEGIN

Key expansion isn't performance-critical.  Can the C library version be used, or
does the key need to be expanded in a way specific to this x86 implementation?
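
(To illustrate the alternative: a hypothetical setkey that just calls the C key
schedule from the SM4 library created in patch 1/3.  This assumes
crypto_sm4_expand_key() keeps the signature of the current generic code; the
AES-NI/AVX code would then only be needed on the encrypt/decrypt paths.)

static int sm4_setkey(struct crypto_tfm *tfm, const u8 *key,
		      unsigned int key_len)
{
	struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);

	/* Plain C key expansion; no kernel_fpu_begin/end needed here. */
	return crypto_sm4_expand_key(ctx, key, key_len);
}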

> +/*
> + * void sm4_aesni_avx_crypt4(const u32 *rk, u8 *dst,
> + *                          const u8 *src, int nblocks)
> + */
> +SYM_FUNC_START(sm4_aesni_avx_crypt4)
> +	/* input:
> +	 *	%rdi: round key array, CTX
> +	 *	%rsi: dst (1..4 blocks)
> +	 *	%rdx: src (1..4 blocks)
> +	 *	%rcx: num blocks (1..4)
> +	 */
> +	FRAME_BEGIN
[...]

> +static void sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +{
> +	const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> +	if (crypto_simd_usable()) {
> +		kernel_fpu_begin();
> +		sm4_aesni_avx_crypt4(ctx->rkey_enc, out, in, 1);
> +		kernel_fpu_end();
> +	} else
> +		crypto_sm4_do_crypt(ctx->rkey_enc, out, in);
> +}
> +
> +static void sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +{
> +	const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> +	if (crypto_simd_usable()) {
> +		kernel_fpu_begin();
> +		sm4_aesni_avx_crypt4(ctx->rkey_dec, out, in, 1);
> +		kernel_fpu_end();
> +	} else
> +		crypto_sm4_do_crypt(ctx->rkey_dec, out, in);
> +}

Your assembly code appears to handle encrypting up to 4 blocks at a time.
However, you have only wired it up to the "cipher" API, which processes 1 block
at a time.  Is this intentional?

What are your performance results with real-world chaining modes like XTS, and
do you plan to implement any of these modes directly?
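
(To make the question concrete, here is a rough sketch, not taken from the
patch, of how the 4-block routine could back an "ecb(sm4)" skcipher using the
standard skcipher_walk plumbing.  It reuses sm4_aesni_avx_crypt4() and struct
crypto_sm4_ctx from the quoted code; a real implementation would also need a
decrypt path and would probably want a wider parallel path for modes like
CTR/XTS.)

static int sm4_avx_ecb_encrypt(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct crypto_sm4_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct skcipher_walk walk;
	unsigned int nbytes;
	int err;

	err = skcipher_walk_virt(&walk, req, false);

	while ((nbytes = walk.nbytes) != 0) {
		const u8 *src = walk.src.virt.addr;
		u8 *dst = walk.dst.virt.addr;

		kernel_fpu_begin();
		while (nbytes >= SM4_BLOCK_SIZE) {
			unsigned int n = min(nbytes / SM4_BLOCK_SIZE, 4u);

			/* 1..4 blocks per call, matching the asm routine */
			sm4_aesni_avx_crypt4(ctx->rkey_enc, dst, src, n);
			src += n * SM4_BLOCK_SIZE;
			dst += n * SM4_BLOCK_SIZE;
			nbytes -= n * SM4_BLOCK_SIZE;
		}
		kernel_fpu_end();

		err = skcipher_walk_done(&walk, nbytes);
	}

	return err;
}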

> +
> +static struct crypto_alg sm4_asm_alg = {
> +	.cra_name		= "sm4",
> +	.cra_driver_name	= "sm4-asm",

In arch/x86/crypto/, "-asm" usually means a vanilla x86 assembly implementation
without any AES-NI, SSE, AVX, etc. instructions.  Calling this something like
"sm4-aesni-avx" would make more sense.  (Or is it actually avx2, not avx?)

> +config CRYPTO_SM4_AESNI_AVX_X86_64
> +	tristate "SM4 cipher algorithm (x86_64/AES-NI/AVX)"
> +	depends on X86 && 64BIT
> +	select CRYPTO_SKCIPHER
> +	select CRYPTO_SIMD
> +	select CRYPTO_ALGAPI
> +	select CRYPTO_LIB_SM4

As-is, neither CRYPTO_SKCIPHER nor CRYPTO_SIMD needs to be selected here.

> +	help
> +	  SM4 cipher algorithms (OSCCA GB/T 32907-2016) (x86_64/AES-NI/AVX).
> +
> +	  SM4 (GBT.32907-2016) is a cryptographic standard issued by the
> +	  Organization of State Commercial Administration of China (OSCCA)
> +	  as an authorized cryptographic algorithms for the use within China.
> +
> +	  SMS4 was originally created for use in protecting wireless
> +	  networks, and is mandated in the Chinese National Standard for
> +	  Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure)
> +	  (GB.15629.11-2003).
> +
> +	  The latest SM4 standard (GBT.32907-2016) was proposed by OSCCA and
> +	  standardized through TC 260 of the Standardization Administration
> +	  of the People's Republic of China (SAC).
> +
> +	  The input, output, and key of SMS4 are each 128 bits.
> +
> +	  See also: <https://eprint.iacr.org/2008/329.pdf>
> +
> +	  If unsure, say N.

This is the help text for the x86 implementation specifically.  Please don't
duplicate boilerplate text about the algorithm here; that already exists for the
generic implementation.  The text should describe the x86 implementation itself.

- Eric

Thread overview: 40+ messages

2021-06-10 13:44 [PATCH 0/3] Introduce x86 assembler accelerated implementation for SM4 algorithm Tianjia Zhang
2021-06-10 13:44 ` [PATCH 1/3] crypto: sm4 - create SM4 library based on sm4 generic code Tianjia Zhang
2021-06-10 23:19   ` Eric Biggers
2021-06-13 10:14     ` Tianjia Zhang
2022-03-01 10:34   ` Jason A. Donenfeld
2022-03-01 11:50     ` Tianjia Zhang
2022-03-01 13:22       ` Jason A. Donenfeld
2022-03-01 14:17         ` Jason A. Donenfeld
2022-03-02  0:24     ` Herbert Xu
2022-03-02  0:26       ` Jason A. Donenfeld
2022-03-02 22:23         ` Eric Biggers
2022-03-11 23:03           ` Jason A. Donenfeld
2022-03-14  2:32             ` Tianjia Zhang
2022-03-14  2:40               ` Jason A. Donenfeld
2022-03-14  2:45                 ` Herbert Xu
2022-03-14  2:46                   ` Jason A. Donenfeld
2021-06-10 13:44 ` [PATCH 2/3] crypto: arm64/sm4-ce - Make dependent on sm4 library instead of sm4-generic Tianjia Zhang
2021-06-10 13:44 ` [PATCH 3/3] crypto: x86/sm4 - add AES-NI/AVX/x86_64 assembler implementation Tianjia Zhang
2021-06-10 23:27   ` Eric Biggers [this message]
2021-06-13 10:14     ` Tianjia Zhang
