LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Arnd Bergmann <arnd@kernel.org>, "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Eric Biggers <ebiggers@kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Vineet Gupta <vgupta@synopsys.com>,
	Russell King <linux@armlinux.org.uk>,
	Herbert Xu <herbert@gondor.apana.org.au>,
	"David S. Miller" <davem@davemloft.net>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"open list:HARDWARE RANDOM NUMBER GENERATOR CORE" 
	<linux-crypto@vger.kernel.org>,
	"open list:BROADCOM NVRAM DRIVER" <linux-mips@vger.kernel.org>
Subject: Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
Date: Tue, 18 May 2021 06:12:03 -1000
Message-ID: <CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com> (raw)
In-Reply-To: <CAK8P3a3hbts4k+rrfnE8Z78ezCaME0UVgwqkdLW5NOps2YHUQQ@mail.gmail.com>

On Tue, May 18, 2021 at 5:42 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> To be on the safe side, we could pass -fno-tree-loop-vectorize along
> with -O3 on the affected gcc versions, or use a bigger hammer
> (not use -O3 at all, always set -fno-tree-loop-vectorize, ...).

I personally think -O3 in general is unsafe.

It has historically been horribly buggy. It's gotten better, but this
case clearly shows that "gotten better" really isn't that high of a
bar.

Very few projects use -O3, which is obviously part of why it's buggy.
But the other part of why it's buggy is that vectorization is simply
very complicated, and honestly, judging by the last report the gcc
people don't care about being careful. They literally are ok with
knowingly generating an off-by-one range check, because "it's
undefined behavior".

With that kind of mentality, I'm not personally all that inclined to
say "sure, use -O3". We know it has bugs even for the well-defined
cases.

> -O3 is set for the lz4 and zstd compression helpers and for wireguard.

I'm actually surprised wireguard would use -O3. Yes, performance is
important. But for wireguard, correctness is certainly important too.
Maybe Jason isn't aware of just how bad gcc -O3 has historically been?

And -O3 has often generated _slower_ code, in addition to the bugs.
It's not like it's a situation where "-O3 is obviously better than
-O2". There's a reason -O2 is the default.

And that tends to be even more true in the kernel than in many user
space programs (ie smaller loops, generally much higher I$ miss rates
etc).

Jason? How big of a deal is that -O3 for wireguard wrt the normal -O2?
There are known buggy gcc versions that aren't ancient.

Of the other cases, that xor-neon.c case actually makes sense. For
that file, it literally exists _only_ to get a vectorized version of
the trivial xor_8regs loop. It's one of the (very very few) cases of
vectorization we actually want. And in that case, we might even want
to make things easier - and more explicit - for the compiler by making
the xor_8regs loops use "restrict" pointers.

That neon case actually wants and needs that tree-vectorization to
DTRT. But maybe it doesn't need the actual _loop_ vectorization? The
xor_8regs code is literally using hand-unrolled loops already, exactly
to make it as simple as possible for the compiler (but the lack of
"restrict" pointers means that it's not all that simple after all, and
I assume the compiler generates conditionals for the NEON case?

lz4 is questionable - yes, upstream lh4 seems to use -O3 (good), but
it also very much uses unaligned accesses, which is where the gcc bug
hits. I doubt that it really needs or wants the loop vectorization.

zstd looks very similar to lz4.

End result: at a minimum, I'd suggest using
"-fno-tree-loop-vectorize", although somebody should check that NEON
case.

And I still think that using O3 for anything halfway complicated
should be considered odd and need some strong numbers to enable.

               Linus

  reply index

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-14 10:00 [PATCH v2 00/13] Unify asm/unaligned.h around struct helper Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 01/13] asm-generic: use asm-generic/unaligned.h for most architectures Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 02/13] openrisc: always use unaligned-struct header Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 03/13] sh: remove unaligned access for sh4a Arnd Bergmann
2021-05-14 10:34   ` John Paul Adrian Glaubitz
2021-05-14 12:22     ` Arnd Bergmann
2021-05-15 15:36       ` John Paul Adrian Glaubitz
2021-05-15 20:10         ` Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 04/13] m68k: select CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 05/13] powerpc: use linux/unaligned/le_struct.h on LE power7 Arnd Bergmann
2021-05-14 11:48   ` Segher Boessenkool
2021-05-14 13:02     ` Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 06/13] asm-generic: unaligned: remove byteshift helpers Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 07/13] asm-generic: unaligned always use struct helpers Arnd Bergmann
2021-05-17 21:53   ` Eric Biggers
2021-05-18  7:25     ` Arnd Bergmann
2021-05-18 14:56       ` Linus Torvalds
2021-05-18 15:41         ` Arnd Bergmann
2021-05-18 16:12           ` Linus Torvalds [this message]
2021-05-18 18:09             ` Jason A. Donenfeld
2021-05-18 20:51             ` Arnd Bergmann
2021-05-18 21:31               ` Eric Biggers
2021-05-18 21:14         ` David Laight
2021-05-14 10:00 ` [PATCH v2 08/13] partitions: msdos: fix one-byte get_unaligned() Arnd Bergmann
2021-05-17 10:28   ` Christoph Hellwig
2021-05-17 10:44     ` Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 09/13] apparmor: use get_unaligned() only for multi-byte words Arnd Bergmann
2021-05-14 10:00 ` [PATCH v2 10/13] mwifiex: re-fix for unaligned accesses Arnd Bergmann
2021-05-15  6:22   ` Kalle Valo
2021-05-15  9:01     ` Arnd Bergmann
2021-05-15 18:23       ` Kalle Valo
2021-05-14 10:00 ` [PATCH v2 11/13] netpoll: avoid put_unaligned() on single character Arnd Bergmann
2021-05-14 10:01 ` [PATCH v2 12/13] asm-generic: uaccess: 1-byte access is always aligned Arnd Bergmann
2021-05-15 18:41   ` Randy Dunlap
2021-05-15 20:16     ` Arnd Bergmann
2021-05-14 10:01 ` [PATCH v2 13/13] asm-generic: simplify asm/unaligned.h Arnd Bergmann
2021-05-14 10:35   ` David Laight
2021-05-14 17:32 ` [PATCH v2 00/13] Unify asm/unaligned.h around struct helper Linus Torvalds
2021-05-14 18:51   ` Vineet Gupta
2021-05-14 19:22     ` Linus Torvalds
2021-05-14 19:45       ` Vineet Gupta
2021-05-14 20:19         ` Linus Torvalds
2021-05-14 19:31   ` Arnd Bergmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjuoGyxDhAF8SsrTkN0-YfCx7E6jUN3ikC_tn2AKWTTsA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=Jason@zx2c4.com \
    --cc=arnd@kernel.org \
    --cc=davem@davemloft.net \
    --cc=ebiggers@kernel.org \
    --cc=herbert@gondor.apana.org.au \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-crypto@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=tsbogend@alpha.franken.de \
    --cc=vgupta@synopsys.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git