From: "Elliott, Robert (Servers)" <elliott@hpe.com>
To: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: "herbert@gondor.apana.org.au" <herbert@gondor.apana.org.au>,
"davem@davemloft.net" <davem@davemloft.net>,
"tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
"ap420073@gmail.com" <ap420073@gmail.com>,
"ardb@kernel.org" <ardb@kernel.org>,
"linux-crypto@vger.kernel.org" <linux-crypto@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v2 09/19] crypto: x86 - use common macro for FPU limit
Date: Tue, 18 Oct 2022 00:06:02 +0000
Message-ID: <MW5PR84MB18426636F27B59BB04872D78AB289@MW5PR84MB1842.NAMPRD84.PROD.OUTLOOK.COM>
In-Reply-To: <CAHmME9p7SQv=iay3QujFU7jGaNXmhYhU9TWPobERBXQ1xNVV5g@mail.gmail.com>
> -----Original Message-----
> From: Jason A. Donenfeld <Jason@zx2c4.com>
> Sent: Thursday, October 13, 2022 8:27 PM
> Subject: Re: [PATCH v2 09/19] crypto: x86 - use common macro for FPU
> limit
>
> On Thu, Oct 13, 2022 at 3:48 PM Elliott, Robert (Servers)
> <elliott@hpe.com> wrote:
> > Perhaps we should declare a time goal like "30 us," measure the actual
> > speed of each algorithm with a tcrypt speed test, and calculate the
> > nominal value assuming some slow x86 CPU core speed?
>
> Sure, pick something reasonable with good margin for a reasonable CPU.
> It doesn't have to be perfect, but just vaguely right for supported
> hardware.
>
> > That could be further adjusted at run-time based on the supposed
> > minimum CPU frequency (e.g., as reported in
> > /sys/devices/system/cpu/cpufreq/policy0/scaling_min_freq).
>
> Oh no, please no. Not another runtime knob. That also will make the
> loop less efficient.
Here are some stats measuring the time, in CPU cycles, between
kernel_fpu_begin() and kernel_fpu_end() for every x86 crypto
module that uses those function calls. This is before any
patches to enforce new limits.
Driver                          boot   tcrypt-sweep  average
======                          ====   ============  =======
aegis128_aesni                  6240 |         8214      433
aesni_intel                    22218 |       150558       68
aria_aesni_avx_x86_64              0 >        95560     1282
camellia_aesni_avx2            52300          52300     4300
camellia_aesni_avx_x86_64      20920          20920     5915
camellia_x86_64                    0              0        0
cast5_avx_x86_64               41854 |       108996     6602
cast6_avx_x86_64               39270 |       119476    10596
chacha_x86_64                   3516 |        58112      349
crc32c_intel                    1458 |         2702      235
crc32_pclmul                    1610 |         3130      210
crct10dif_pclmul                1928 |         2096       82
ghash_clmulni_intel             9154 |        56632      336
libblake2s_x86_64               7514           7514      897
nhpoly1305_avx2                 1360 |         5408      301
poly1305_x86_64                20656 |        21688      409
polyval_clmulni                13972          13972       34
serpent_avx2                   45686 |        74824     4185
serpent_avx_x86_64             47436          47436     7120
serpent_sse2_x86_64            38492          38492     7400
sha1_ssse3                     20950 |        38310      512
sha256_ssse3                   46554 |        57162     1201
sha512_ssse3               157051800      157051800   167728
sm3_avx_x86_64                 82372          82372     2017
sm4_aesni_avx_x86_64           66350          66350     2019
twofish_avx_x86_64            104598 |       163894     6633
twofish_x86_64_3way                0              0        0
Comparing a few of the hash functions with tcrypt test 16
(4 KiB of data with 1 update) shows a 35x difference from the
fastest to slowest:
crc32c          695 cycles/operation
crct10dif      2197
sha1-avx2      8825
sha224-avx2   24816
sha256-avx2   21179
sha384-avx2   14939
sha512-avx2   14584
Test notes
==========
Measurement points:
- after booting, with
  - CONFIG_MODULE_SIG_SHA512=y (use SHA-512 for module signing)
  - CONFIG_CRYPTO_MANAGER_EXTRA_TESTS=y (compares results
    with generic module during init)
  - # CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
    (run self-tests during module load)
- after sweeping through tcrypt test modes 1 to 999
  - except 0, 300, and 400 which run combinations of the others
- measured on a system with Intel Cascade Lake CPUs at 2.2 GHz
This run did not report any RCU stalls.
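For scale, the cycle counts in the table convert to wall-clock time
at the 2.2 GHz clock of the test system. A minimal sketch in C,
assuming a constant core frequency (the helper names are illustrative
and are not kernel functions):

```c
#include <stdint.h>

/*
 * Assumption: the core runs at a constant frequency, expressed in MHz,
 * which is the same as cycles per microsecond (2200 for 2.2 GHz).
 */

/* Convert a cycle count to whole microseconds (truncating). */
static uint64_t cycles_to_us(uint64_t cycles, uint64_t mhz)
{
	return cycles / mhz;
}

/* Convert a time budget in microseconds to a cycle budget. */
static uint64_t us_to_cycles(uint64_t us, uint64_t mhz)
{
	return us * mhz;
}
```

Under those assumptions, the sha512_ssse3 figure of 157051800 cycles
is roughly 71 ms, while a 30 us budget corresponds to 66000 cycles.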
The hash function used for module signing is the main problem,
since it is subjected to huge sizes during module signature
checking. sha1 or sha256 would face the same problem if they had
been selected.
The self-tests are limited to 2 * PAGE_SIZE, so they don't stress
the drivers anywhere near as much as booting does. This run did
include the tcrypt patch to call cond_resched() during speed
tests, so the speed-test-induced problem is out of the way.
aria_aesni_avx_x86_64 0 > 95560 1282
This run didn't have the patch to load aria based on the
device table, so it wasn't loaded until tcrypt asked for it.
camellia_x86_64 0 0 0
twofish_x86_64_3way 0 0 0
Those use the ecb_cbc_helper macros, but pass along -1 to
not use kernel_fpu_begin/end, so the debug instrumentation
is there but unused.
Next steps
==========
I'll try to add a test with long data, and work on scaling the
loops based on relative performance (e.g., if sha512 needs
4 KiB, then crc32c should be fine with 80 KiB).
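The scaling idea can be sketched as follows. This is a rough
illustration in C, not the proposed patch: the function name, the
rounding policy, and the use of the tcrypt test 16 cycle counts above
as the performance measure are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: scale a per-call byte limit so that an algorithm
 * spends about as many cycles per chunk as a reference algorithm does
 * on ref_bytes.
 *
 * ref_cycles and alg_cycles are the cycles each algorithm needs for the
 * same input size (both must be nonzero). The result is rounded down to
 * a multiple of align, but never below align itself.
 */
static size_t scaled_chunk_bytes(size_t ref_bytes, uint64_t ref_cycles,
				 uint64_t alg_cycles, size_t align)
{
	uint64_t bytes = (uint64_t)ref_bytes * ref_cycles / alg_cycles;

	bytes -= bytes % align;
	return bytes < align ? align : (size_t)bytes;
}
```

With sha512-avx2 (14584 cycles per 4 KiB) as the reference and crc32c
at 695 cycles, the sketch yields 81920 bytes (80 KiB) when rounding
down to a 4 KiB multiple. Note that the clamp to align can exceed the
cycle budget for algorithms slower than the reference.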