From: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
To: Eric Biggers <ebiggers@kernel.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Gilad Ben-Yossef <gilad@benyossef.com>,
	Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	"Markku-Juhani O . Saarinen" <mjos@iki.fi>,
	Jussi Kivilinna <jussi.kivilinna@iki.fi>,
	x86@kernel.org, linux-crypto@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] crypto: x86/sm4 - add AES-NI/AVX/x86_64 assembler implementation
Date: Sun, 13 Jun 2021 18:14:52 +0800	[thread overview]
Message-ID: <d8df4d67-1a01-ff83-7739-af49b5b5e574@linux.alibaba.com> (raw)
In-Reply-To: <YMKf93/cnPGGtRW3@gmail.com>

Hi Eric,

On 6/11/21 7:27 AM, Eric Biggers wrote:
> On Thu, Jun 10, 2021 at 09:44:59PM +0800, Tianjia Zhang wrote:
>> This patch adds AES-NI/AVX/x86_64 assembler implementation of SM4
>> block cipher. Through two affine transforms, we can use the AES
>> S-Box to simulate the SM4 S-Box to achieve the effect of instruction
>> acceleration.
>>
> 
> Benchmark results, please.
> 
> Also, is this passing the self-tests, including the fuzz tests?
> 

I will provide this information in the next version.
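To expand a little on the commit message: both the AES and the SM4
S-Box are built around inversion in GF(2^8), only with different field
representations and different affine layers, so there exist affine
maps A1 and A2 over GF(2) such that (roughly — the exact constants are
omitted here)

    S_SM4(x) = A2(S_AES(A1(x)))

The assembler code applies S_AES to 16 bytes at once via AESENCLAST
(with an all-zero round key, plus a byte shuffle to cancel its
ShiftRows step), and evaluates A1/A2 with vector byte-shuffle tricks.
That is where the AES-NI acceleration comes from.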

>> +/*
>> + * void sm4_aesni_avx_expand_key(const u8 *key, u32 *rk_enc,
>> + *                  u32 *rk_dec, const u32 *fk, const u32 *ck);
>> + */
>> +SYM_FUNC_START(sm4_aesni_avx_expand_key)
>> +	/* input:
>> +	 *	%rdi: 128-bit key
>> +	 *	%rsi: rkey_enc
>> +	 *	%rdx: rkey_dec
>> +	 *	%rcx: fk array
>> +	 *	%r8: ck array
>> +	 */
>> +	FRAME_BEGIN
> 
> Key expansion isn't performance-critical.  Can the C library version be used, or
> does the key need to be expanded in a way specific to this x86 implementation?
> 

It can be replaced by the common C library implementation. Since key
expansion is not called frequently, an instruction-set-specific
optimization does not bring much benefit there, so it is certainly
possible to delete this assembler implementation.
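A minimal sketch of what the setkey path could look like if the
assembler expand_key is dropped (assuming the library from patch 1/3
exports crypto_sm4_expand_key(); the exact name may differ):

    #include <linux/crypto.h>
    #include <crypto/sm4.h>

    static int sm4_setkey(struct crypto_tfm *tfm, const u8 *key,
                          unsigned int key_len)
    {
            struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);

            /*
             * Key expansion runs once per tfm and is not
             * performance-critical, so no SIMD section is needed;
             * the plain C library routine is sufficient.
             */
            return crypto_sm4_expand_key(ctx, key, key_len);
    }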

>> +/*
>> + * void sm4_aesni_avx_crypt4(const u32 *rk, u8 *dst,
>> + *                          const u8 *src, int nblocks)
>> + */
>> +SYM_FUNC_START(sm4_aesni_avx_crypt4)
>> +	/* input:
>> +	 *	%rdi: round key array, CTX
>> +	 *	%rsi: dst (1..4 blocks)
>> +	 *	%rdx: src (1..4 blocks)
>> +	 *	%rcx: num blocks (1..4)
>> +	 */
>> +	FRAME_BEGIN
> [...]
> 
>> +static void sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
>> +{
>> +	const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
>> +
>> +	if (crypto_simd_usable()) {
>> +		kernel_fpu_begin();
>> +		sm4_aesni_avx_crypt4(ctx->rkey_enc, out, in, 1);
>> +		kernel_fpu_end();
>> +	} else
>> +		crypto_sm4_do_crypt(ctx->rkey_enc, out, in);
>> +}
>> +
>> +static void sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
>> +{
>> +	const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
>> +
>> +	if (crypto_simd_usable()) {
>> +		kernel_fpu_begin();
>> +		sm4_aesni_avx_crypt4(ctx->rkey_dec, out, in, 1);
>> +		kernel_fpu_end();
>> +	} else
>> +		crypto_sm4_do_crypt(ctx->rkey_dec, out, in);
>> +}
> 
> Your assembly code appears to handle encrypting up to 4 blocks at a time.
> However you have only wired this up to the "cipher" API which does 1 block at a
> time.  Is this intentional?
> 
> What are your performance results with real-world chaining modes like XTS, and
> do you plan to implement any of these modes directly?
> 

This implementation is intentional. As a first step it only wires up
general single-block encryption, so there is no obvious performance
improvement yet. The key to optimization is to make full use of the
parallel encryption of four blocks at a time. That work is still under
development; I will implement modes such as XTS directly in a future
patch, which is where this mode-specific optimization pays off.
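To make the intent concrete, here is a rough sketch (illustrative
only, not code from this series) of how an skcipher mode such as ECB
could batch up to four blocks per call into the assembler routine:

    #include <crypto/internal/skcipher.h>
    #include <crypto/sm4.h>
    #include <asm/fpu/api.h>

    static int ecb_encrypt(struct skcipher_request *req)
    {
            struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
            struct crypto_sm4_ctx *ctx = crypto_skcipher_ctx(tfm);
            struct skcipher_walk walk;
            unsigned int nbytes;
            int err;

            err = skcipher_walk_virt(&walk, req, false);

            while ((nbytes = walk.nbytes) > 0) {
                    const u8 *src = walk.src.virt.addr;
                    u8 *dst = walk.dst.virt.addr;

                    kernel_fpu_begin();
                    while (nbytes >= SM4_BLOCK_SIZE) {
                            /* feed the asm routine 1..4 blocks at once */
                            unsigned int n =
                                    min(nbytes / SM4_BLOCK_SIZE, 4U);

                            sm4_aesni_avx_crypt4(ctx->rkey_enc, dst,
                                                 src, n);
                            src += n * SM4_BLOCK_SIZE;
                            dst += n * SM4_BLOCK_SIZE;
                            nbytes -= n * SM4_BLOCK_SIZE;
                    }
                    kernel_fpu_end();

                    err = skcipher_walk_done(&walk, nbytes);
            }
            return err;
    }

The real patch would of course also need the non-SIMD fallback and the
decrypt side, but this is the shape of the batching.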

>> +
>> +static struct crypto_alg sm4_asm_alg = {
>> +	.cra_name		= "sm4",
>> +	.cra_driver_name	= "sm4-asm",
> 
> In arch/x86/crypto/, "-asm" usually means a vanilla x86 assembly implementation
> without any AES-NI, SSE, AVX, etc. instructions.  Calling this something like
> "sm4-aesni-avx" would make more sense.  (Or is it actually avx2, not avx?)
> 

Will do in the next version of the patch.

>> +config CRYPTO_SM4_AESNI_AVX_X86_64
>> +	tristate "SM4 cipher algorithm (x86_64/AES-NI/AVX)"
>> +	depends on X86 && 64BIT
>> +	select CRYPTO_SKCIPHER
>> +	select CRYPTO_SIMD
>> +	select CRYPTO_ALGAPI
>> +	select CRYPTO_LIB_SM4
> 
> As-is, neither CRYPTO_SKCIPHER nor CRYPTO_SIMD needs to be selected here.
> 

ditto.

>> +	help
>> +	  SM4 cipher algorithms (OSCCA GB/T 32907-2016) (x86_64/AES-NI/AVX).
>> +
>> +	  SM4 (GBT.32907-2016) is a cryptographic standard issued by the
>> +	  Organization of State Commercial Administration of China (OSCCA)
>> +	  as an authorized cryptographic algorithms for the use within China.
>> +
>> +	  SMS4 was originally created for use in protecting wireless
>> +	  networks, and is mandated in the Chinese National Standard for
>> +	  Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure)
>> +	  (GB.15629.11-2003).
>> +
>> +	  The latest SM4 standard (GBT.32907-2016) was proposed by OSCCA and
>> +	  standardized through TC 260 of the Standardization Administration
>> +	  of the People's Republic of China (SAC).
>> +
>> +	  The input, output, and key of SMS4 are each 128 bits.
>> +
>> +	  See also: <https://eprint.iacr.org/2008/329.pdf>
>> +
>> +	  If unsure, say N.
> 
> This is the help text for the x86 implementation specifically.  Please don't
> have boilerplate text about the algorithm here; that already exists for the
> generic implementation.  The text should explain about the x86 implementation.
> 

ditto.
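(For example, something along these lines, as a sketch to be refined:
"SM4 cipher algorithms, AES-NI/AVX accelerated x86_64 implementation.
This implementation computes the SM4 S-Box through the AES S-Box by
means of two affine transforms, using the AES-NI and AVX instruction
set extensions. If unsure, say N.")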

> - Eric
> 

Thanks for your suggestions.

Cheers,
Tianjia


Thread overview: 40+ messages

2021-06-10 13:44 [PATCH 0/3] Introduce x86 assembler accelerated implementation for SM4 algorithm Tianjia Zhang
2021-06-10 13:44 ` [PATCH 1/3] crypto: sm4 - create SM4 library based on sm4 generic code Tianjia Zhang
2021-06-10 23:19   ` Eric Biggers
2021-06-13 10:14     ` Tianjia Zhang
2022-03-01 10:34   ` Jason A. Donenfeld
2022-03-01 11:50     ` Tianjia Zhang
2022-03-01 13:22       ` Jason A. Donenfeld
2022-03-01 14:17         ` Jason A. Donenfeld
2022-03-02  0:24     ` Herbert Xu
2022-03-02  0:26       ` Jason A. Donenfeld
2022-03-02 22:23         ` Eric Biggers
2022-03-11 23:03           ` Jason A. Donenfeld
2022-03-14  2:32             ` Tianjia Zhang
2022-03-14  2:40               ` Jason A. Donenfeld
2022-03-14  2:45                 ` Herbert Xu
2022-03-14  2:46                   ` Jason A. Donenfeld
2021-06-10 13:44 ` [PATCH 2/3] crypto: arm64/sm4-ce - Make dependent on sm4 library instead of sm4-generic Tianjia Zhang
2021-06-10 13:44 ` [PATCH 3/3] crypto: x86/sm4 - add AES-NI/AVX/x86_64 assembler implementation Tianjia Zhang
2021-06-10 23:27   ` Eric Biggers
2021-06-13 10:14     ` Tianjia Zhang [this message]
