From: Tianjia Zhang
To: Jussi Kivilinna, Herbert Xu, "David S. Miller", Vitaly Chikunov, Eric Biggers, Gilad Ben-Yossef, Ard Biesheuvel, Catalin Marinas, Will Deacon, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", linux-crypto@vger.kernel.org, x86@kernel.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/6] crypto: x86/sm3 - add AVX assembly implementation
Date: Tue, 21 Dec 2021 15:39:28 +0800
Message-ID: <404b02be-2e94-1d80-8512-f25a5a93378e@linux.alibaba.com>
In-Reply-To: <9e70bf33-bab5-83a3-1eb0-7cae442c2f64@iki.fi>
References: <20211220082251.1445-1-tianjia.zhang@linux.alibaba.com> <20211220082251.1445-6-tianjia.zhang@linux.alibaba.com> <9e70bf33-bab5-83a3-1eb0-7cae442c2f64@iki.fi>

On 12/21/21 2:03 AM, Jussi Kivilinna wrote:
> On 20.12.2021 10.22, Tianjia Zhang wrote:
>> This patch adds AVX assembly accelerated implementation of SM3 secure
>> hash algorithm. From the benchmark data, compared to pure software
>> implementation sm3-generic, the performance increase is up to 38%.
>>
>> The main algorithm implementation based on SM3 AES/BMI2 accelerated
>> work by libgcrypt at:
>> https://gnupg.org/software/libgcrypt/index.html
>>
>> Benchmark on Intel i5-6200U 2.30GHz, performance data of two
>> implementations, pure software sm3-generic and sm3-avx acceleration.
>> The data comes from the 326 mode and 422 mode of tcrypt. The abscissas
>> are different lengths of per update.
>> The data is tabulated and the
>> unit is Mb/s:
>>
>> update-size |     16      64     256    1024    2048    4096    8192
>> --------------------------------------------------------------------
>> sm3-generic | 105.97  129.60  182.12  189.62  188.06  193.66  194.88
>> sm3-avx     | 119.87  163.05  244.44  260.92  257.60  264.87  265.88
>>
>> Signed-off-by: Tianjia Zhang
>> ---
>>   arch/x86/crypto/Makefile         |   3 +
>>   arch/x86/crypto/sm3-avx-asm_64.S | 521 +++++++++++++++++++++++++++++++
>>   arch/x86/crypto/sm3_avx_glue.c   | 134 ++++++++
>>   crypto/Kconfig                   |  13 +
>>   4 files changed, 671 insertions(+)
>>   create mode 100644 arch/x86/crypto/sm3-avx-asm_64.S
>>   create mode 100644 arch/x86/crypto/sm3_avx_glue.c
>>
>> diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
>> index f307c93fc90a..7cbe860f6201 100644
>> --- a/arch/x86/crypto/Makefile
>> +++ b/arch/x86/crypto/Makefile
>> @@ -88,6 +88,9 @@ nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o
>>   obj-$(CONFIG_CRYPTO_CURVE25519_X86) += curve25519-x86_64.o
>> +obj-$(CONFIG_CRYPTO_SM3_AVX_X86_64) += sm3-avx-x86_64.o
>> +sm3-avx-x86_64-y := sm3-avx-asm_64.o sm3_avx_glue.o
>> +
>>   obj-$(CONFIG_CRYPTO_SM4_AESNI_AVX_X86_64) += sm4-aesni-avx-x86_64.o
>>   sm4-aesni-avx-x86_64-y := sm4-aesni-avx-asm_64.o sm4_aesni_avx_glue.o
>> diff --git a/arch/x86/crypto/sm3-avx-asm_64.S b/arch/x86/crypto/sm3-avx-asm_64.S
>> new file mode 100644
>> index 000000000000..e7a9a37f3609
>> --- /dev/null
>> +++ b/arch/x86/crypto/sm3-avx-asm_64.S
>> @@ -0,0 +1,521 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +/*
>> + * SM3 AVX accelerated transform.
>> + * specified in: https://datatracker.ietf.org/doc/html/draft-sca-cfrg-sm3-02
>> + *
>> + * Copyright (C) 2021 Jussi Kivilinna
>> + * Copyright (C) 2021 Tianjia Zhang
>> + */
>
>> +
>> +#define R(i, a, b, c, d, e, f, g, h, round, widx, wtype)                      \
>> +    /* rol(a, 12) => t0 */                                                \
>> +    roll3mov(12, a, t0); /* rorxl here would reduce perf by 6% on zen3 */ \
>> +    /* rol (t0 + e + t), 7) => t1 */                                      \
>> +    addl3(t0, e, t1);                                                     \
>> +    addl $K##round, t1;                                                   \
>
> It's better to use "leal K##round(t0, e, 1), t1;" here and fix K0-K63
> macros instead as I noted at libgcrypt mailing-list:
>  https://lists.gnupg.org/pipermail/gcrypt-devel/2021-December/005209.html
>
> -Jussi

Thanks for pointing it out, I will fix it in the next patch.

Best regards,
Tianjia
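
For readers following the review comment, here is a minimal C sketch (not code from the patch or from libgcrypt) of why the two-instruction sequence `addl3(t0, e, t1); addl $K##round, t1;` and the single `leal K##round(t0, e, 1), t1;` should produce the same 32-bit value. All function names (`rol32`, `sm3_k`, `as_disp32`, `sum_add_add`, `sum_lea`, `check_equivalence`) are hypothetical. The sketch also assumes, as my reading of "fix K0-K63 macros" rather than something stated in this thread, that the constants need rewriting because an x86 memory-operand displacement is a signed 32-bit value, so constants above 0x7fffffff would have to be spelled as negative numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate left by n bits; n must be in 1..31. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* SM3 round constant K_j = T_j <<< (j mod 32), per the SM3 specification. */
static uint32_t sm3_k(unsigned round)
{
    uint32_t t = round < 16 ? 0x79cc4519u : 0x7a879d8au;
    unsigned n = round % 32;
    return n ? rol32(t, n) : t;
}

/*
 * Assumption: the constant is used as a signed 32-bit displacement.
 * Map the unsigned constant to the signed value with the same bit pattern,
 * avoiding an implementation-defined narrowing conversion.
 */
static int32_t as_disp32(uint32_t k)
{
    return k <= INT32_MAX ? (int32_t)k
                          : (int32_t)((int64_t)k - 4294967296LL);
}

/* Models: addl3(t0, e, t1); addl $K, t1 */
static uint32_t sum_add_add(uint32_t t0, uint32_t e, uint32_t k)
{
    uint32_t t1 = t0 + e;
    t1 += k;
    return t1;
}

/* Models: leal disp(t0, e, 1), t1 (base + index * 1 + disp, kept to 32 bits) */
static uint32_t sum_lea(uint32_t t0, uint32_t e, int32_t disp)
{
    return t0 + e * 1u + (uint32_t)disp;
}

/* Checks both forms against each other for all 64 round constants. */
static void check_equivalence(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu };
    const unsigned count = sizeof(samples) / sizeof(samples[0]);

    for (unsigned round = 0; round < 64; round++) {
        uint32_t k = sm3_k(round);
        for (unsigned i = 0; i < count; i++)
            for (unsigned j = 0; j < count; j++)
                assert(sum_add_add(samples[i], samples[j], k) ==
                       sum_lea(samples[i], samples[j], as_disp32(k)));
    }
}
```

Calling `check_equivalence()` exercises every round constant with a few boundary inputs. Because 32-bit addition wraps modulo 2^32, adding the negative spelling of a constant gives the same result as adding its unsigned value, which is why the single-instruction form can replace the pair.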