From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: akpm@linux-foundation.org, "Arnd Bergmann" <arnd@arndb.de>,
"Herbert Xu" <herbert@gondor.apana.org.au>,
"Ralf Baechle" <ralf@linux-mips.org>
Subject: [PATCH 3.16 09/19] crypto: improve gcc optimization flags for serpent and wp512
Date: Sat, 01 Apr 2017 14:17:50 +0100 [thread overview]
Message-ID: <lsq.1491052670.915320569@decadent.org.uk> (raw)
In-Reply-To: <lsq.1491052670.319419763@decadent.org.uk>
3.16.43-rc1 review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann <arnd@arndb.de>
commit 7d6e9105026788c497f0ab32fa16c82f4ab5ff61 upstream.
An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:
crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]
With some testing in different configurations, I'm seeing large
variations in stack frames size up to 1500 bytes for what should have
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.
It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem, but enabling "-fsched-pressure", but
even with that fix they suffer from the issue to a certain degree. Some
testing on arm64 shows that the time needed to hash a given amount of
data is roughly proportional to the stack frame size here, which makes
sense given that the wp512 implementation is doing lots of loads for
table lookups, and the problem with the overly large stack is a result
of doing a lot more loads and stores for spilled registers (as seen from
inspecting the object code).
Disabling -fschedule-insns consistently fixes the problem for wp512,
in my collection of cross-compilers, the results are consistently better
or identical when comparing the stack sizes in this function, though
some architectures (notable x86) have schedule-insns disabled by
default.
The four columns are:
default: -O2
press: -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -no-schedule-insns (disables sched-pressure)
default press nopress nosched
alpha-linux-gcc-4.9.3 1136 848 1136 176
am33_2.0-linux-gcc-4.9.3 2100 2076 2100 2104
arm-linux-gnueabi-gcc-4.9.3 848 848 1048 352
cris-linux-gcc-4.9.3 272 272 272 272
frv-linux-gcc-4.9.3 1128 1000 1128 280
hppa64-linux-gcc-4.9.3 1128 336 1128 184
hppa-linux-gcc-4.9.3 644 308 644 276
i386-linux-gcc-4.9.3 352 352 352 352
m32r-linux-gcc-4.9.3 720 656 720 268
microblaze-linux-gcc-4.9.3 1108 604 1108 256
mips64-linux-gcc-4.9.3 1328 592 1328 208
mips-linux-gcc-4.9.3 1096 624 1096 240
powerpc64-linux-gcc-4.9.3 1088 432 1088 160
powerpc-linux-gcc-4.9.3 1080 584 1080 224
s390-linux-gcc-4.9.3 456 456 624 360
sh3-linux-gcc-4.9.3 292 292 292 292
sparc64-linux-gcc-4.9.3 992 240 992 208
sparc-linux-gcc-4.9.3 680 592 680 312
x86_64-linux-gcc-4.9.3 224 240 272 224
xtensa-linux-gcc-4.9.3 1152 704 1152 304
aarch64-linux-gcc-7.0.0 224 224 1104 208
arm-linux-gnueabi-gcc-7.0.1 824 824 1048 352
mips-linux-gcc-7.0.0 1120 648 1120 272
x86_64-linux-gcc-7.0.1 240 240 304 240
arm-linux-gnueabi-gcc-4.4.7 840 392
arm-linux-gnueabi-gcc-4.5.4 784 728 784 320
arm-linux-gnueabi-gcc-4.6.4 736 728 736 304
arm-linux-gnueabi-gcc-4.7.4 944 784 944 352
arm-linux-gnueabi-gcc-4.8.5 464 464 760 352
arm-linux-gnueabi-gcc-4.9.3 848 848 1048 352
arm-linux-gnueabi-gcc-5.3.1 824 824 1064 336
arm-linux-gnueabi-gcc-6.1.1 808 808 1056 344
arm-linux-gnueabi-gcc-7.0.1 824 824 1048 352
Trying the same test for serpent-generic, the picture is a bit different,
and while -fno-schedule-insns is generally better here than the default,
-fsched-pressure wins overall, so I picked that instead.
default press nopress nosched
alpha-linux-gcc-4.9.3 1392 864 1392 960
am33_2.0-linux-gcc-4.9.3 536 524 536 528
arm-linux-gnueabi-gcc-4.9.3 552 552 776 536
cris-linux-gcc-4.9.3 528 528 528 528
frv-linux-gcc-4.9.3 536 400 536 504
hppa64-linux-gcc-4.9.3 524 208 524 480
hppa-linux-gcc-4.9.3 768 472 768 508
i386-linux-gcc-4.9.3 564 564 564 564
m32r-linux-gcc-4.9.3 712 576 712 532
microblaze-linux-gcc-4.9.3 724 392 724 512
mips64-linux-gcc-4.9.3 720 384 720 496
mips-linux-gcc-4.9.3 728 384 728 496
powerpc64-linux-gcc-4.9.3 704 304 704 480
powerpc-linux-gcc-4.9.3 704 296 704 480
s390-linux-gcc-4.9.3 560 560 592 536
sh3-linux-gcc-4.9.3 540 540 540 540
sparc64-linux-gcc-4.9.3 544 352 544 496
sparc-linux-gcc-4.9.3 544 344 544 496
x86_64-linux-gcc-4.9.3 528 536 576 528
xtensa-linux-gcc-4.9.3 752 544 752 544
aarch64-linux-gcc-7.0.0 432 432 656 480
arm-linux-gnueabi-gcc-7.0.1 616 616 808 536
mips-linux-gcc-7.0.0 720 464 720 488
x86_64-linux-gcc-7.0.1 536 528 600 536
arm-linux-gnueabi-gcc-4.4.7 592 440
arm-linux-gnueabi-gcc-4.5.4 776 448 776 544
arm-linux-gnueabi-gcc-4.6.4 776 448 776 544
arm-linux-gnueabi-gcc-4.7.4 768 448 768 544
arm-linux-gnueabi-gcc-4.8.5 488 488 776 544
arm-linux-gnueabi-gcc-4.9.3 552 552 776 536
arm-linux-gnueabi-gcc-5.3.1 552 552 776 536
arm-linux-gnueabi-gcc-6.1.1 560 560 776 536
arm-linux-gnueabi-gcc-7.0.1 616 616 808 536
I did not do any runtime tests with serpent, so it is possible that stack
frame size does not directly correlate with runtime performance here and
it actually makes things worse, but it's more likely to help here, and
the reduced stack frame size is probably enough reason to apply the patch,
especially given that the crypto code is often used in deep call chains.
Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
crypto/Makefile | 2 ++
1 file changed, 2 insertions(+)
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_CRYPTO_SHA1) += sha1_generi
obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
obj-$(CONFIG_CRYPTO_WP512) += wp512.o
+CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
obj-$(CONFIG_CRYPTO_GF128MUL) += gf128mul.o
obj-$(CONFIG_CRYPTO_ECB) += ecb.o
@@ -67,6 +68,7 @@ obj-$(CONFIG_CRYPTO_BLOWFISH_COMMON) +=
obj-$(CONFIG_CRYPTO_TWOFISH) += twofish_generic.o
obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o
obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o
+CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o
next prev parent reply other threads:[~2017-04-01 13:23 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-01 13:17 [PATCH 3.16 00/19] 3.16.43-rc1 review Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 05/19] MIPS: save/disable MSA in lose_fpu Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 14/19] MIPS: Zero variable read by get_user / __get_user in case of an error Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 04/19] MIPS: preserve scalar FP CSR when switching vector context Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 15/19] HID: hid-input: Add parentheses to quell gcc warning Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 18/19] aio: mark AIO pseudo-fs noexec Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 17/19] vfs: Commit to never having exectuables on proc and sysfs Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 12/19] serial: samsung: Use %pa to print 'resource_size_t' type Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 16/19] netlink: remove mmapped netlink support Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 11/19] mmc: sunxi: avoid invalid pointer calculation Ben Hutchings
2017-04-01 18:45 ` David Lanzendörfer
2017-04-01 19:53 ` Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 08/19] atm: iphase: fix misleading indention Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 19/19] keys: Guard against null match function in keyring_search_aux() Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 10/19] fs/nfs: fix new compiler warning about boolean in switch Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 13/19] MIPS: ralink: Cosmetic change to prom_init() Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 06/19] MIPS: init upper 64b of vector registers when MSA is first used Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 02/19] blk: rq_data_dir() should not return a boolean Ben Hutchings
2017-04-01 13:17 ` Ben Hutchings [this message]
2017-04-01 13:17 ` [PATCH 3.16 03/19] MIPS: save/restore MSACSR register on context switch Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 01/19] fs: namespace: suppress 'may be used uninitialized' warnings Ben Hutchings
2017-04-01 13:17 ` [PATCH 3.16 07/19] MIPS: Fix build with binutils 2.24.51+ Ben Hutchings
2017-04-01 17:43 ` [PATCH 3.16 00/19] 3.16.43-rc1 review Guenter Roeck
2017-04-01 22:40 ` Ben Hutchings
2017-04-02 2:21 ` Guenter Roeck
2017-04-02 2:48 ` Ben Hutchings
2017-04-02 3:04 ` [PATCH 3.16 00/26] 3.16.43-rc2 review Ben Hutchings
2017-04-02 3:04 ` [PATCH 3.16 22/26] MIPS: traps: Fix inline asm ctc1 missing .set hardfloat Ben Hutchings
2017-04-02 3:04 ` [PATCH 3.16 23/26] MIPS: Push .set mips64r* into the functions needing it Ben Hutchings
2017-04-02 3:04 ` [PATCH 3.16 24/26] MIPS: assume at as source/dest of MSA copy/insert instructions Ben Hutchings
2017-04-02 3:04 ` [PATCH 3.16 20/26] MIPS: allow msa.h to be included in assembly files Ben Hutchings
2017-04-02 3:04 ` [PATCH 3.16 26/26] MIPS: wrap cfcmsa & ctcmsa accesses for toolchains with MSA support Ben Hutchings
2017-04-02 3:04 ` [PATCH 3.16 21/26] MIPS: mipsregs.h: Add write_32bit_cp1_register() Ben Hutchings
2017-04-02 3:04 ` [PATCH 3.16 25/26] MIPS: remove MSA macro recursion Ben Hutchings
2017-04-02 3:15 ` [PATCH 3.16 00/26] 3.16.43-rc2 review Ben Hutchings
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=lsq.1491052670.915320569@decadent.org.uk \
--to=ben@decadent.org.uk \
--cc=akpm@linux-foundation.org \
--cc=arnd@arndb.de \
--cc=herbert@gondor.apana.org.au \
--cc=linux-kernel@vger.kernel.org \
--cc=ralf@linux-mips.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.