Date: Thu, 10 Jun 2021 16:27:51 -0700
From: Eric Biggers
To: Tianjia Zhang
Cc: Herbert Xu, "David S. Miller", Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov, "H. Peter Anvin", Gilad Ben-Yossef, Ard Biesheuvel, "Markku-Juhani O . 
Saarinen", Jussi Kivilinna, x86@kernel.org, linux-crypto@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/3] crypto: x86/sm4 - add AES-NI/AVX/x86_64 assembler implementation
References: <20210610134459.28541-1-tianjia.zhang@linux.alibaba.com> <20210610134459.28541-4-tianjia.zhang@linux.alibaba.com>
In-Reply-To: <20210610134459.28541-4-tianjia.zhang@linux.alibaba.com>

On Thu, Jun 10, 2021 at 09:44:59PM +0800, Tianjia Zhang wrote:
> This patch adds AES-NI/AVX/x86_64 assembler implementation of SM4
> block cipher. Through two affine transforms, we can use the AES
> S-Box to simulate the SM4 S-Box to achieve the effect of instruction
> acceleration.
>

Benchmark results, please.  Also, is this passing the self-tests, including the
fuzz tests?

> +/*
> + * void sm4_aesni_avx_expand_key(const u8 *key, u32 *rk_enc,
> + *                  u32 *rk_dec, const u32 *fk, const u32 *ck);
> + */
> +SYM_FUNC_START(sm4_aesni_avx_expand_key)
> +	/* input:
> +	 *	%rdi: 128-bit key
> +	 *	%rsi: rkey_enc
> +	 *	%rdx: rkey_dec
> +	 *	%rcx: fk array
> +	 *	%r8: ck array
> +	 */
> +	FRAME_BEGIN

Key expansion isn't performance-critical.  Can the C library version be used, or
does the key need to be expanded in a way specific to this x86 implementation?

> +/*
> + * void sm4_aesni_avx_crypt4(const u32 *rk, u8 *dst,
> + *                          const u8 *src, int nblocks)
> + */
> +SYM_FUNC_START(sm4_aesni_avx_crypt4)
> +	/* input:
> +	 *	%rdi: round key array, CTX
> +	 *	%rsi: dst (1..4 blocks)
> +	 *	%rdx: src (1..4 blocks)
> +	 *	%rcx: num blocks (1..4)
> +	 */
> +	FRAME_BEGIN
[...]
> +static void sm4_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +{
> +	const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> +	if (crypto_simd_usable()) {
> +		kernel_fpu_begin();
> +		sm4_aesni_avx_crypt4(ctx->rkey_enc, out, in, 1);
> +		kernel_fpu_end();
> +	} else
> +		crypto_sm4_do_crypt(ctx->rkey_enc, out, in);
> +}
> +
> +static void sm4_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
> +{
> +	const struct crypto_sm4_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> +	if (crypto_simd_usable()) {
> +		kernel_fpu_begin();
> +		sm4_aesni_avx_crypt4(ctx->rkey_dec, out, in, 1);
> +		kernel_fpu_end();
> +	} else
> +		crypto_sm4_do_crypt(ctx->rkey_dec, out, in);
> +}

Your assembly code appears to handle encrypting up to 4 blocks at a time.
However you have only wired this up to the "cipher" API which does 1 block at a
time.  Is this intentional?

What are your performance results with real-world chaining modes like XTS, and
do you plan to implement any of these modes directly?

> +
> +static struct crypto_alg sm4_asm_alg = {
> +	.cra_name		= "sm4",
> +	.cra_driver_name	= "sm4-asm",

In arch/x86/crypto/, "-asm" usually means a vanilla x86 assembly implementation
without any AES-NI, SSE, AVX, etc. instructions.  Calling this something like
"sm4-aesni-avx" would make more sense.  (Or is it actually avx2, not avx?)

> +config CRYPTO_SM4_AESNI_AVX_X86_64
> +	tristate "SM4 cipher algorithm (x86_64/AES-NI/AVX)"
> +	depends on X86 && 64BIT
> +	select CRYPTO_SKCIPHER
> +	select CRYPTO_SIMD
> +	select CRYPTO_ALGAPI
> +	select CRYPTO_LIB_SM4

As-is, neither CRYPTO_SKCIPHER nor CRYPTO_SIMD needs to be selected here.

> +	help
> +	  SM4 cipher algorithms (OSCCA GB/T 32907-2016) (x86_64/AES-NI/AVX).
> +
> +	  SM4 (GBT.32907-2016) is a cryptographic standard issued by the
> +	  Organization of State Commercial Administration of China (OSCCA)
> +	  as an authorized cryptographic algorithms for the use within China.
> +
> +	  SMS4 was originally created for use in protecting wireless
> +	  networks, and is mandated in the Chinese National Standard for
> +	  Wireless LAN WAPI (Wired Authentication and Privacy Infrastructure)
> +	  (GB.15629.11-2003).
> +
> +	  The latest SM4 standard (GBT.32907-2016) was proposed by OSCCA and
> +	  standardized through TC 260 of the Standardization Administration
> +	  of the People's Republic of China (SAC).
> +
> +	  The input, output, and key of SMS4 are each 128 bits.
> +
> +	  See also:
> +
> +	  If unsure, say N.

This is the help text for the x86 implementation specifically.  Please don't
have boilerplate text about the algorithm here; that already exists for the
generic implementation.  The text should explain about the x86 implementation.

- Eric
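
To make the point about the 4-block routine concrete, here is a minimal
userspace sketch of an ECB-style loop that hands up to 4 blocks per call to a
multi-block primitive.  Everything in it is an assumption for illustration:
crypt4_stub() is a hypothetical stand-in for sm4_aesni_avx_crypt4() (it only
XORs with a key byte, it is not SM4), and ecb_crypt_batched(), SM4_BLOCK_SIZE
and SM4_MAX_BATCH are made-up names.  It is not the kernel skcipher walk API
and not the patch author's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SM4_BLOCK_SIZE 16 /* SM4 block size in bytes (128 bits) */
#define SM4_MAX_BATCH  4  /* assumed max blocks per primitive call */

/* Counts calls so the batching behaviour can be observed. */
static unsigned int crypt4_calls;

/*
 * Hypothetical stand-in for sm4_aesni_avx_crypt4(): processes 1..4 blocks.
 * It XORs each byte with the low byte of rk[0]; this is NOT SM4, it only
 * lets the sketch run outside the kernel.
 */
static void crypt4_stub(const uint32_t *rk, uint8_t *dst,
			const uint8_t *src, int nblocks)
{
	crypt4_calls++;
	for (int i = 0; i < nblocks * SM4_BLOCK_SIZE; i++)
		dst[i] = src[i] ^ (uint8_t)rk[0];
}

/*
 * ECB-style loop: feed the primitive as many blocks as it accepts per call.
 * In the kernel, each call (or the whole loop over one walk step) would
 * presumably sit between kernel_fpu_begin() and kernel_fpu_end().
 */
static void ecb_crypt_batched(const uint32_t *rk, uint8_t *dst,
			      const uint8_t *src, size_t nbytes)
{
	size_t nblocks = nbytes / SM4_BLOCK_SIZE;

	assert(nbytes % SM4_BLOCK_SIZE == 0); /* whole blocks only */

	while (nblocks > 0) {
		int n = nblocks < SM4_MAX_BATCH ? (int)nblocks : SM4_MAX_BATCH;

		crypt4_stub(rk, dst, src, n);
		dst += (size_t)n * SM4_BLOCK_SIZE;
		src += (size_t)n * SM4_BLOCK_SIZE;
		nblocks -= (size_t)n;
	}
}
```

With a 10-block buffer this loop makes 3 calls (4 + 4 + 2 blocks) instead of
10, which is the kind of saving a mode-level implementation could get from the
4-way assembly, whereas the single-block "cipher" interface always passes
nblocks = 1.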