All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ard Biesheuvel <ardb@kernel.org>
To: Megha Dey <megha.dey@intel.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	ravi.v.shankar@intel.com, tim.c.chen@intel.com,
	andi.kleen@intel.com, dave.hansen@intel.com,
	wajdi.k.feghali@intel.com, greg.b.tucker@intel.com,
	robert.a.kasten@intel.com, rajendrakumar.chinnaiyan@intel.com,
	tomasz.kantecki@intel.com, ryan.d.saffores@intel.com,
	ilya.albrekht@intel.com, kyung.min.park@intel.com,
	Tony Luck <tony.luck@intel.com>,
	ira.weiny@intel.com
Subject: Re: [RFC V1 3/7] crypto: ghash - Optimized GHASH computations
Date: Sat, 19 Dec 2020 18:03:45 +0100	[thread overview]
Message-ID: <CAMj1kXGhGopfg19at5N_9q89-UA4irSgMULyDXg+dKhnbRrCZQ@mail.gmail.com> (raw)
In-Reply-To: <1608325864-4033-4-git-send-email-megha.dey@intel.com>

On Fri, 18 Dec 2020 at 22:07, Megha Dey <megha.dey@intel.com> wrote:
>
> From: Kyung Min Park <kyung.min.park@intel.com>
>
> Optimize GHASH computations with the 512 bit wide VPCLMULQDQ instructions.
> The new instruction allows to work on 4 x 16 byte blocks at the time.
> For best parallelism and deeper out of order execution, the main loop of
> the code works on 16 x 16 byte blocks at the time and performs reduction
> every 48 x 16 byte blocks. Such approach needs 48 precomputed GHASH subkeys
> and the precompute operation has been optimized as well to leverage 512 bit
> registers, parallel carry less multiply and reduction.
>
> VPCLMULQDQ instruction is used to accelerate the most time-consuming
> part of GHASH, carry-less multiplication. VPCLMULQDQ instruction
> with AVX-512F adds EVEX encoded 512 bit version of PCLMULQDQ instruction.
>
> The glue code in ghash_clmulni_intel module overrides existing PCLMULQDQ
> version with the VPCLMULQDQ version when the following criteria are met:
> At compile time:
> 1. CONFIG_CRYPTO_AVX512 is enabled
> 2. toolchain(assembler) supports VPCLMULQDQ instructions
> At runtime:
> 1. VPCLMULQDQ and AVX512VL features are supported on a platform (currently
>    only Icelake)
> 2. If compiled as built-in module, ghash_clmulni_intel.use_avx512 is set at
>    boot time or /sys/module/ghash_clmulni_intel/parameters/use_avx512 is set
>    to 1 after boot.
>    If compiled as loadable module, use_avx512 module parameter must be set:
>    modprobe ghash_clmulni_intel use_avx512=1
>
> With new implementation, tcrypt ghash speed test shows about 4x to 10x
> speedup improvement for GHASH calculation compared to the original
> implementation with PCLMULQDQ when the bytes per update size is 256 Bytes
> or above. Detailed results for a variety of block sizes and update
> sizes are in the table below. The test was performed on Icelake based
> platform with constant frequency set for CPU.
>
> The average performance improvement of the AVX512 version over the current
> implementation is as follows:
> For bytes per update >= 1KB, we see the average improvement of 882%(~8.8x).
> For bytes per update < 1KB, we see the average improvement of 370%(~3.7x).
>
> A typical run of tcrypt with GHASH calculation with PCLMULQDQ instruction
> and VPCLMULQDQ instruction shows the following results.
>
> ---------------------------------------------------------------------------
> |            |            |         cycles/operation         |            |
> |            |            |       (the lower the better)     |            |
> |    byte    |   bytes    |----------------------------------| percentage |
> |   blocks   | per update |   GHASH test   |   GHASH test    | loss/gain  |
> |            |            | with PCLMULQDQ | with VPCLMULQDQ |            |
> |------------|------------|----------------|-----------------|------------|
> |      16    |     16     |       144      |        233      |   -38.0    |
> |      64    |     16     |       535      |        709      |   -24.5    |
> |      64    |     64     |       210      |        146      |    43.8    |
> |     256    |     16     |      1808      |       1911      |    -5.4    |
> |     256    |     64     |       865      |        581      |    48.9    |
> |     256    |    256     |       682      |        170      |   301.0    |
> |    1024    |     16     |      6746      |       6935      |    -2.7    |
> |    1024    |    256     |      2829      |        714      |   296.0    |
> |    1024    |   1024     |      2543      |        341      |   645.0    |
> |    2048    |     16     |     13219      |      13403      |    -1.3    |
> |    2048    |    256     |      5435      |       1408      |   286.0    |
> |    2048    |   1024     |      5218      |        685      |   661.0    |
> |    2048    |   2048     |      5061      |        565      |   796.0    |
> |    4096    |     16     |     40793      |      27615      |    47.8    |
> |    4096    |    256     |     10662      |       2689      |   297.0    |
> |    4096    |   1024     |     10196      |       1333      |   665.0    |
> |    4096    |   4096     |     10049      |       1011      |   894.0    |
> |    8192    |     16     |     51672      |      54599      |    -5.3    |
> |    8192    |    256     |     21228      |       5284      |   301.0    |
> |    8192    |   1024     |     20306      |       2556      |   694.0    |
> |    8192    |   4096     |     20076      |       2044      |   882.0    |
> |    8192    |   8192     |     20071      |       2017      |   895.0    |
> ---------------------------------------------------------------------------
>
> This work was inspired by the AES GCM mode optimization published
> in Intel Optimized IPSEC Cryptographic library.
> https://github.com/intel/intel-ipsec-mb/lib/avx512/gcm_vaes_avx512.asm
>
> Co-developed-by: Greg Tucker <greg.b.tucker@intel.com>
> Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
> Co-developed-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
> Signed-off-by: Tomasz Kantecki <tomasz.kantecki@intel.com>
> Signed-off-by: Kyung Min Park <kyung.min.park@intel.com>
> Co-developed-by: Megha Dey <megha.dey@intel.com>
> Signed-off-by: Megha Dey <megha.dey@intel.com>

Hello Megha,

What is the purpose of this separate GHASH module? GHASH is only used
in combination with AES-CTR to produce GCM, and this series already
contains a GCM driver.

Do cores exist that implement PCLMULQDQ but not AES-NI?

If not, I think we should be able to drop this patch (and remove the
existing PCLMULQDQ GHASH driver as well)

  reply	other threads:[~2020-12-19 17:04 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-18 21:10 [RFC V1 0/7] Introduce AVX512 optimized crypto algorithms Megha Dey
2020-12-18 21:10 ` [RFC V1 1/7] x86: Probe assembler capabilities for VAES and VPLCMULQDQ support Megha Dey
2021-01-16 16:54   ` Ard Biesheuvel
2021-01-20 22:38     ` Dey, Megha
2020-12-18 21:10 ` [RFC V1 2/7] crypto: crct10dif - Accelerated CRC T10 DIF with vectorized instruction Megha Dey
2021-01-16 17:00   ` Ard Biesheuvel
2021-01-20 22:46     ` Dey, Megha
2020-12-18 21:11 ` [RFC V1 3/7] crypto: ghash - Optimized GHASH computations Megha Dey
2020-12-19 17:03   ` Ard Biesheuvel [this message]
2021-01-16  0:14     ` Dey, Megha
2021-01-16  0:20       ` Dave Hansen
2021-01-16  2:04         ` Eric Biggers
2021-01-16  5:13           ` Dave Hansen
2021-01-16 16:48             ` Ard Biesheuvel
2021-01-16  1:43       ` Eric Biggers
2021-01-16  5:07         ` Dey, Megha
2020-12-18 21:11 ` [RFC V1 4/7] crypto: tcrypt - Add speed test for optimized " Megha Dey
2020-12-18 21:11 ` [RFC V1 5/7] crypto: aesni - AES CTR x86_64 "by16" AVX512 optimization Megha Dey
2021-01-16 17:03   ` Ard Biesheuvel
2021-01-20 22:46     ` Dey, Megha
2020-12-18 21:11 ` [RFC V1 6/7] crypto: aesni - fix coding style for if/else block Megha Dey
2020-12-18 21:11 ` [RFC V1 7/7] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ Megha Dey
2021-01-16 17:16   ` Ard Biesheuvel
2021-01-20 22:48     ` Dey, Megha
2020-12-21 23:20 ` [RFC V1 0/7] Introduce AVX512 optimized crypto algorithms Eric Biggers
2020-12-28 19:10   ` Dey, Megha
2021-01-16 16:52     ` Ard Biesheuvel
2021-01-16 18:35       ` Dey, Megha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMj1kXGhGopfg19at5N_9q89-UA4irSgMULyDXg+dKhnbRrCZQ@mail.gmail.com \
    --to=ardb@kernel.org \
    --cc=andi.kleen@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=davem@davemloft.net \
    --cc=greg.b.tucker@intel.com \
    --cc=herbert@gondor.apana.org.au \
    --cc=ilya.albrekht@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=kyung.min.park@intel.com \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=megha.dey@intel.com \
    --cc=rajendrakumar.chinnaiyan@intel.com \
    --cc=ravi.v.shankar@intel.com \
    --cc=robert.a.kasten@intel.com \
    --cc=ryan.d.saffores@intel.com \
    --cc=tim.c.chen@intel.com \
    --cc=tomasz.kantecki@intel.com \
    --cc=tony.luck@intel.com \
    --cc=wajdi.k.feghali@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.