linux-crypto.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@kernel.org>
To: Megha Dey <megha.dey@intel.com>, Tony Luck <tony.luck@intel.com>,
	Asit K Mallick <asit.k.mallick@intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>
Cc: Linux Crypto Mailing List <linux-crypto@vger.kernel.org>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	"Chen, Tim C" <tim.c.chen@intel.com>,
	"Kleen, Andi" <andi.kleen@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	greg.b.tucker@intel.com, "Kasten,
	Robert A" <robert.a.kasten@intel.com>,
	rajendrakumar.chinnaiyan@intel.com, tomasz.kantecki@intel.com,
	ryan.d.saffores@intel.com, ilya.albrekht@intel.com,
	Kyung Min Park <kyung.min.park@intel.com>,
	Weiny Ira <ira.weiny@intel.com>,
	Eric Biggers <ebiggers@kernel.org>,
	Ard Biesheuvel <ardb@kernel.org>, X86 ML <x86@kernel.org>
Subject: Re: [RFC V2 0/5] Introduce AVX512 optimized crypto algorithms
Date: Sun, 24 Jan 2021 08:23:57 -0800	[thread overview]
Message-ID: <CALCETrU06cuvUF5NDSm8--dy3dOkxYQ88cGWaakOQUE4Vkz88w@mail.gmail.com> (raw)
In-Reply-To: <1611386920-28579-1-git-send-email-megha.dey@intel.com>

On Fri, Jan 22, 2021 at 11:29 PM Megha Dey <megha.dey@intel.com> wrote:
>
> Optimize crypto algorithms using AVX512 instructions - VAES and VPCLMULQDQ
> (first implemented on Intel's Icelake client and Xeon CPUs).
>
> These algorithms take advantage of the AVX512 registers to keep the CPU
> busy and increase memory bandwidth utilization. They provide substantial
> (2-10x) improvements over existing crypto algorithms when update data size
> is greater than 128 bytes and do not have any significant impact when used
> on small amounts of data.
>
> However, these algorithms may also incur a frequency penalty and cause
> collateral damage to other workloads running on the same core (co-scheduled
> threads). These frequency drops are also known as bin drops, where 1 bin
> drop is around 100 MHz. With the SpecCPU and ffmpeg benchmarks, a 0-1 bin
> drop (0-100 MHz) is observed on the Icelake desktop and 0-2 bin drops
> (0-200 MHz) are observed on the Icelake server.
>
> The AVX512 optimizations are disabled by default to avoid impacting other
> workloads. In order to use these optimized algorithms:
> 1. At compile time:
>    a. User must enable the CONFIG_CRYPTO_AVX512 option
>    b. Toolchain (assembler) must support the VPCLMULQDQ and VAES instructions
> 2. At run time:
>    a. User must set the module parameter use_avx512 at boot time
>    b. Platform must support the VPCLMULQDQ and VAES features
>
> N.B. It is unclear whether these coarse-grained controls (a global module
> parameter) would meet all user needs. Perhaps some per-thread control might
> be useful? Looking for guidance here.
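
For concreteness, the gating described in the quoted cover letter (an
opt-in module parameter plus CPU feature checks) could look roughly like
the sketch below. This is illustrative only and not taken from the
patches; the parameter name follows the cover letter, and the helper
name is hypothetical:

#include <linux/module.h>
#include <asm/cpufeature.h>

/* Opt-in knob described in the cover letter (default off). */
static bool use_avx512;
module_param(use_avx512, bool, 0444);
MODULE_PARM_DESC(use_avx512, "Enable VAES/VPCLMULQDQ (AVX512) code paths");

/*
 * Hypothetical helper: only take the wide code paths when the user asked
 * for them and the CPU advertises the required features.
 */
static bool avx512_crypto_usable(void)
{
	return use_avx512 &&
	       boot_cpu_has(X86_FEATURE_AVX512F) &&
	       boot_cpu_has(X86_FEATURE_VAES) &&
	       boot_cpu_has(X86_FEATURE_VPCLMULQDQ);
}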


I've just been looking at some performance issues with in-kernel AVX,
and I have a whole pile of questions that I think should be answered
first:

What is the impact of using an AVX-512 instruction on the logical
thread, its siblings, and other cores on the package?

Does the impact depend on whether it’s a 512-bit insn or a shorter EVEX insn?

What is the impact on subsequent shorter EVEX, VEX, and legacy
SSE (SSE2, SSE3, etc.) insns?

How does VZEROUPPER figure in?  I can find an enormous amount of
misinformation online, but nothing authoritative.

What is the effect of the AVX-512 states (5-7) being “in use”?  As far
as I can tell, the only operations that clear XINUSE[5-7] are XRSTOR
and its variants.  Is this correct?
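
For reference, XINUSE is at least observable from software: on CPUs that
set CPUID.(EAX=0DH,ECX=1):EAX bit 2, XGETBV with ECX=1 returns the
logical AND of XCR0 and the XINUSE bitmap. A minimal user-space sketch,
assuming only that CPUID bit is set:

#include <cpuid.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t xgetbv(uint32_t index)
{
	uint32_t eax, edx;

	__asm__ volatile("xgetbv" : "=a"(eax), "=d"(edx) : "c"(index));
	return ((uint64_t)edx << 32) | eax;
}

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* CPUID.(EAX=0DH,ECX=1):EAX bit 2 = XGETBV with ECX=1 supported. */
	if (!__get_cpuid_count(0xd, 1, &eax, &ebx, &ecx, &edx) ||
	    !(eax & (1u << 2))) {
		puts("XGETBV with ECX=1 not supported");
		return 1;
	}

	/* XINUSE bits 5-7 cover opmask, ZMM_Hi256 and Hi16_ZMM. */
	printf("AVX-512 state components in use: %s\n",
	       (xgetbv(1) & 0xe0) ? "yes" : "no");
	return 0;
}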

On AVX-512 capable CPUs, do we ever get a penalty for executing a
non-VEX insn followed by a large-width EVEX insn without an
intervening VZEROUPPER?  The docs suggest no, since Broadwell and
before don’t support EVEX, but I’d like to know for sure.


My current opinion is that we should not enable AVX-512 in-kernel
except on CPUs that we determine have good AVX-512 support.  Based on
some reading, that seems to mean Ice Lake Client and not anything
before it.  I also think a bunch of the above questions should be
answered before we do any of this.  Right now we have a regression of
unknown impact in regular AVX support in-kernel; we will have
performance issues in-kernel depending on what user code has done
recently, and I'm still trying to figure out what to do about it.
Throwing AVX-512 into the mix without real information is not going to
improve the situation.
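
If we do end up restricting this to known-good parts, it would presumably
take the form of a CPU model allow-list plus the feature checks. A rough
sketch follows, where the model list and the function name are assumptions
rather than anything settled in this thread:

#include <asm/cpu_device_id.h>
#include <asm/intel-family.h>
#include <asm/cpufeature.h>

/*
 * Hypothetical allow-list of parts believed to handle AVX-512 without a
 * severe frequency penalty; which models belong here is exactly the open
 * question above.
 */
static const struct x86_cpu_id avx512_ok_cpus[] = {
	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE,   NULL),
	X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_L, NULL),
	{}
};

static bool avx512_crypto_allowed(void)
{
	return x86_match_cpu(avx512_ok_cpus) &&
	       boot_cpu_has(X86_FEATURE_VAES) &&
	       boot_cpu_has(X86_FEATURE_VPCLMULQDQ);
}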


Thread overview: 16+ messages
2021-01-23  7:28 [RFC V2 0/5] Introduce AVX512 optimized crypto algorithms Megha Dey
2021-01-23  7:28 ` [RFC V2 1/5] crypto: aesni - fix coding style for if/else block Megha Dey
2021-01-23  7:28 ` [RFC V2 2/5] x86: Probe assembler capabilities for VAES and VPLCMULQDQ support Megha Dey
2021-01-23  7:28 ` [RFC V2 3/5] crypto: crct10dif - Accelerated CRC T10 DIF with vectorized instruction Megha Dey
2021-01-23  7:28 ` [RFC V2 4/5] crypto: aesni - AES CTR x86_64 "by16" AVX512 optimization Megha Dey
2021-01-23  7:28 ` [RFC V2 5/5] crypto: aesni - AVX512 version of AESNI-GCM using VPCLMULQDQ Megha Dey
2021-01-24 16:23 ` Andy Lutomirski [this message]
2021-02-24  0:54   ` [RFC V2 0/5] Introduce AVX512 optimized crypto algorithms Dey, Megha
2021-02-24 17:42     ` Andy Lutomirski
2022-01-31 18:43       ` Dey, Megha
2022-01-31 19:18         ` Dave Hansen
2022-02-01 16:42           ` Dey, Megha
2022-02-24 19:31           ` Dey, Megha
2022-03-05 18:37             ` Andy Lutomirski
2021-05-07 16:22   ` Dave Hansen
2021-01-25 17:27 ` Dave Hansen
