From: Ben Hutchings <ben@decadent.org.uk>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: akpm@linux-foundation.org, "Arnd Bergmann" <arnd@arndb.de>,
	"Herbert Xu" <herbert@gondor.apana.org.au>,
	"Ralf Baechle" <ralf@linux-mips.org>
Subject: [PATCH 3.16 09/19] crypto: improve gcc optimization flags for serpent and wp512
Date: Sat, 01 Apr 2017 14:17:50 +0100
Message-ID: <lsq.1491052670.915320569@decadent.org.uk>
In-Reply-To: <lsq.1491052670.319419763@decadent.org.uk>

3.16.43-rc1 review patch.  If anyone has any objections, please let me know.

------------------

From: Arnd Bergmann <arnd@arndb.de>

commit 7d6e9105026788c497f0ab32fa16c82f4ab5ff61 upstream.

An ancient gcc bug (first reported in 2003) has apparently resurfaced
on MIPS, where kernelci.org reports an overly large stack frame in the
whirlpool hash algorithm:

crypto/wp512.c:987:1: warning: the frame size of 1112 bytes is larger than 1024 bytes [-Wframe-larger-than=]

With some testing in different configurations, I'm seeing large
variations in stack frame size, up to 1500 bytes for what should be
around 300 bytes at most. I also checked the reference implementation,
which is essentially the same code but also comes with some test and
benchmarking infrastructure.

It seems that recent compiler versions on at least arm, arm64 and powerpc
have a partial fix for this problem, by enabling "-fsched-pressure", but
even with that fix they suffer from the issue to a certain degree. Some
testing on arm64 shows that the time needed to hash a given amount of
data is roughly proportional to the stack frame size here, which makes
sense given that the wp512 implementation is doing lots of loads for
table lookups, and the problem with the overly large stack is a result
of doing a lot more loads and stores for spilled registers (as seen from
inspecting the object code).

Disabling -fschedule-insns consistently fixes the problem for wp512.
In my collection of cross-compilers, the stack size of this function is
better or identical in every case but one (am33, which is 4 bytes
larger), though some architectures (notably x86) have schedule-insns
disabled by default.

The four columns are:
default: -O2
press:	 -O2 -fsched-pressure
nopress: -O2 -fschedule-insns -fno-sched-pressure
nosched: -O2 -fno-schedule-insns (disables sched-pressure)

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1136	848	1136	176
am33_2.0-linux-gcc-4.9.3	2100	2076	2100	2104
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
cris-linux-gcc-4.9.3		272	272	272	272
frv-linux-gcc-4.9.3		1128	1000	1128	280
hppa64-linux-gcc-4.9.3		1128	336	1128	184
hppa-linux-gcc-4.9.3		644	308	644	276
i386-linux-gcc-4.9.3		352	352	352	352
m32r-linux-gcc-4.9.3		720	656	720	268
microblaze-linux-gcc-4.9.3	1108	604	1108	256
mips64-linux-gcc-4.9.3		1328	592	1328	208
mips-linux-gcc-4.9.3		1096	624	1096	240
powerpc64-linux-gcc-4.9.3	1088	432	1088	160
powerpc-linux-gcc-4.9.3		1080	584	1080	224
s390-linux-gcc-4.9.3		456	456	624	360
sh3-linux-gcc-4.9.3		292	292	292	292
sparc64-linux-gcc-4.9.3		992	240	992	208
sparc-linux-gcc-4.9.3		680	592	680	312
x86_64-linux-gcc-4.9.3		224	240	272	224
xtensa-linux-gcc-4.9.3		1152	704	1152	304

aarch64-linux-gcc-7.0.0		224	224	1104	208
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
mips-linux-gcc-7.0.0		1120	648	1120	272
x86_64-linux-gcc-7.0.1		240	240	304	240

arm-linux-gnueabi-gcc-4.4.7	840			392
arm-linux-gnueabi-gcc-4.5.4	784	728	784	320
arm-linux-gnueabi-gcc-4.6.4	736	728	736	304
arm-linux-gnueabi-gcc-4.7.4	944	784	944	352
arm-linux-gnueabi-gcc-4.8.5	464	464	760	352
arm-linux-gnueabi-gcc-4.9.3	848	848	1048	352
arm-linux-gnueabi-gcc-5.3.1	824	824	1064	336
arm-linux-gnueabi-gcc-6.1.1	808	808	1056	344
arm-linux-gnueabi-gcc-7.0.1	824	824	1048	352
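
As a rough way to read the wp512 numbers, the following sketch computes
how many bytes the nosched column saves relative to default for a few
gcc-4.9.3 rows copied from the table above. The names WP512_FRAMES,
nosched_saving and summarize are hypothetical and exist only for this
illustration; they are not part of the patch or of any kernel tooling.

```python
# Stack frame sizes in bytes for wp512, copied from a few gcc-4.9.3 rows
# of the table above. Columns: default, press, nopress, nosched.
# The dictionary and helper names are illustrative assumptions.
WP512_FRAMES = {
    "am33_2.0": (2100, 2076, 2100, 2104),
    "arm": (848, 848, 1048, 352),
    "mips": (1096, 624, 1096, 240),
    "mips64": (1328, 592, 1328, 208),
    "x86_64": (224, 240, 272, 224),
}


def nosched_saving(frames):
    """Bytes saved by nosched relative to default (negative means larger)."""
    default, _press, _nopress, nosched = frames
    return default - nosched


def summarize(table):
    """Map each target name to its nosched saving in bytes."""
    return {target: nosched_saving(frames) for target, frames in table.items()}


if __name__ == "__main__":
    for target, saved in sorted(summarize(WP512_FRAMES).items()):
        print(f"{target:10s} {saved:6d}")
```

A positive value means the frame shrinks with -fno-schedule-insns, zero
means no change, and a negative value means the frame grows.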

Trying the same test for serpent-generic, the picture is a bit different,
and while -fno-schedule-insns is generally better here than the default,
-fsched-pressure wins overall, so I picked that instead.

				default	press	nopress	nosched
alpha-linux-gcc-4.9.3		1392	864	1392	960
am33_2.0-linux-gcc-4.9.3	536	524	536	528
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
cris-linux-gcc-4.9.3		528	528	528	528
frv-linux-gcc-4.9.3		536	400	536	504
hppa64-linux-gcc-4.9.3		524	208	524	480
hppa-linux-gcc-4.9.3		768	472	768	508
i386-linux-gcc-4.9.3		564	564	564	564
m32r-linux-gcc-4.9.3		712	576	712	532
microblaze-linux-gcc-4.9.3	724	392	724	512
mips64-linux-gcc-4.9.3		720	384	720	496
mips-linux-gcc-4.9.3		728	384	728	496
powerpc64-linux-gcc-4.9.3	704	304	704	480
powerpc-linux-gcc-4.9.3		704	296	704	480
s390-linux-gcc-4.9.3		560	560	592	536
sh3-linux-gcc-4.9.3		540	540	540	540
sparc64-linux-gcc-4.9.3		544	352	544	496
sparc-linux-gcc-4.9.3		544	344	544	496
x86_64-linux-gcc-4.9.3		528	536	576	528
xtensa-linux-gcc-4.9.3		752	544	752	544

aarch64-linux-gcc-7.0.0		432	432	656	480
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
mips-linux-gcc-7.0.0		720	464	720	488
x86_64-linux-gcc-7.0.1		536	528	600	536

arm-linux-gnueabi-gcc-4.4.7	592			440
arm-linux-gnueabi-gcc-4.5.4	776	448	776	544
arm-linux-gnueabi-gcc-4.6.4	776	448	776	544
arm-linux-gnueabi-gcc-4.7.4	768	448	768	544
arm-linux-gnueabi-gcc-4.8.5	488	488	776	544
arm-linux-gnueabi-gcc-4.9.3	552	552	776	536
arm-linux-gnueabi-gcc-5.3.1	552	552	776	536
arm-linux-gnueabi-gcc-6.1.1	560	560	776	536
arm-linux-gnueabi-gcc-7.0.1	616	616	808	536
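
To illustrate the comparison between the press and nosched columns for
serpent-generic, this sketch tallies which of the two gives the smaller
frame for a handful of rows copied from the table above. It covers only
a subset of the rows, and the names SERPENT_FRAMES, winner and tally are
hypothetical, chosen for this example only.

```python
from collections import Counter

# Column order used in the table above.
COLUMNS = ("default", "press", "nopress", "nosched")

# Stack frame sizes in bytes for serpent-generic, copied from the table.
# Only a subset of rows is included; the names here are illustrative.
SERPENT_FRAMES = {
    "alpha-4.9.3": (1392, 864, 1392, 960),
    "arm-4.9.3": (552, 552, 776, 536),
    "mips-4.9.3": (728, 384, 728, 496),
    "powerpc64-4.9.3": (704, 304, 704, 480),
    "x86_64-4.9.3": (528, 536, 576, 528),
    "aarch64-7.0.0": (432, 432, 656, 480),
}


def winner(frames):
    """Return which of 'press' or 'nosched' has the smaller frame, or 'tie'."""
    press = frames[COLUMNS.index("press")]
    nosched = frames[COLUMNS.index("nosched")]
    if press < nosched:
        return "press"
    if nosched < press:
        return "nosched"
    return "tie"


def tally(table):
    """Count how often each option wins across the given rows."""
    return Counter(winner(frames) for frames in table.values())


if __name__ == "__main__":
    for option, count in tally(SERPENT_FRAMES).most_common():
        print(option, count)
```

The tally only compares frame sizes; it says nothing about runtime.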

I did not do any runtime tests with serpent, so it is possible that stack
frame size does not directly correlate with runtime performance here and
that the change actually makes things worse, but it is more likely to
help, and the reduced stack frame size is probably enough reason to apply
the patch, especially given that the crypto code is often used in deep
call chains.

Link: https://kernelci.org/build/id/58797d7559b5149efdf6c3a9/logs/
Link: http://www.larc.usp.br/~pbarreto/WhirlpoolPage.html
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11488
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 crypto/Makefile | 2 ++
 1 file changed, 2 insertions(+)

--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -47,6 +47,7 @@ obj-$(CONFIG_CRYPTO_SHA1) += sha1_generi
 obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
+CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns)  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
 obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
 obj-$(CONFIG_CRYPTO_GF128MUL) += gf128mul.o
 obj-$(CONFIG_CRYPTO_ECB) += ecb.o
@@ -67,6 +68,7 @@ obj-$(CONFIG_CRYPTO_BLOWFISH_COMMON) +=
 obj-$(CONFIG_CRYPTO_TWOFISH) += twofish_generic.o
 obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o
 obj-$(CONFIG_CRYPTO_SERPENT) += serpent_generic.o
+CFLAGS_serpent_generic.o := $(call cc-option,-fsched-pressure)  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
 obj-$(CONFIG_CRYPTO_AES) += aes_generic.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia_generic.o
 obj-$(CONFIG_CRYPTO_CAST_COMMON) += cast_common.o

