linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Valentin Schneider <valentin.schneider@arm.com>
Cc: Joel Fernandes <joel@joelfernandes.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alan Stern <stern@rowland.harvard.edu>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	rostedt <rostedt@goodmis.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	paulmck <paulmck@linux.ibm.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Will Deacon <will.deacon@arm.com>,
	David Howells <dhowells@redhat.com>
Subject: Re: [PATCH 1/1] Fix: trace sched switch start/stop racy updates
Date: Fri, 16 Aug 2019 15:57:43 -0700	[thread overview]
Message-ID: <CAHk-=wi_KeD1M-_-_SU_H92vJ-yNkDnAGhAS=RR1yNNGWKW+aA@mail.gmail.com> (raw)
In-Reply-To: <3c0cb8a2-eba2-7bea-8523-b948253a6804@arm.com>

On Fri, Aug 16, 2019 at 3:27 PM Valentin Schneider
<valentin.schneider@arm.com> wrote:
>
> How would you differentiate optimizations you want from those you don't with
> just a flag? There's a reason we use volatile casts instead of declaring
> everything volatile: we actually *want* those optimizations. It just so
> happens that we don't want them *in some places*, and we have tools to tag
> them as such.

We actually disable lots of "valid" (read: the standard allows them,
but they are completely wrong for the kernel) optimizations because
they are wrong.

The whole type-based alias thing is just wrong. The C standards body
was incompetent to allow that garbage. So we disable it.

If you can *prove* that no aliasing exists, go ahead and re-order
accesses. But no guesses based on random types.

Similarly, if some compiler decides that it's ok to make speculative
writes (knowing it will over-write it with the right data later) to
data that is possibly visible to other threads, then such an
"optimization" needs to just be disabled. It might help some
benchmark, and if you read the standard just the right way it might be
allowed - but that doesn't make it valid.

We already had situations like that, where compiler people thought it
would be ok (for example) to turns a narrow write into a wider
read-modify-write because it had already done the wider read for other
reasons.

Again, the original C standard "allows" that in theory, because the
original C standard doesn't take threading into account. In fact, the
alpha architecture made actively bad design decisions based on that
(incorrect) assumption.

It turns out that in that case, even non-kernel people rebelled, and
it's apparently thankfully not allowed in newer versions of the
standard, exactly because threading has become a thing. You can't
magically write back unrelated variables just because they might be
next-door neighbors and share a word.

So no, we do *not* in general just say that we want any random
optimizations. A compiler that turns a single write into something
else is almost certainly something that shouldn't be allowed near the
kernel.

We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
because of some theoretical "compiler is free to do garbage"
arguments. If such garbage happens, we need to fix the compiler, the
same way we already do with

  -fno-strict-aliasing
  -fno-delete-null-pointer-checks
  -fno-strict-overflow

because all those "optimizations" are just fundamentally unsafe and wrong.

I really wish the compiler would never take advantage of "I can prove
this is undefined behavior" kind of things when it comes to the kernel
(or any other projects I am involved with, for that matter). If you
can prove that, then you shouldn't decide to generate random code
without a big warning. But that's what those optimizations that we
disable effectively all do.

I'd love to have a flag that says "all undefined behavior is treated
as implementation-defined". There's a somewhat subtle - but very
important - difference there.

And that's what some hypothetical speculative write optimizations do
too. I do not believe they are valid for the kernel. If the code says

   if (a)
      global_var = 1
   else
      global_var = 0

then the compiler had better not turn that into

     global_var = 0
     if (a)
         global_var = 1

even if there isn't a volatile there. But yes, we've had compiler
writers that say "if you read the specs, that's ok".

No, it's not ok. Because reality trumps any weasel-spec-reading.

And happily, I don't think we've ever really seen a compiler that we
use that actually does the above kind of speculative write thing (but
doing it for your own local variables that can't be seen by other
threads of execution - go wild).

So in general, we very much expect the compiler to do sane code
generation, and not (for example) do store tearing on normal
word-sized things or add writes that weren't there originally etc.

And yes, reads are different from writes. Reads don't have the same
kind of "other threads of execution can see them" effects, so a
compiler turning a single read into multiple reads is much more
realistic and not the same kind of "we need to expect a certain kind
of sanity from the compiler" issue.

              Linus

  reply	other threads:[~2019-08-16 22:58 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-05-18 10:29 WARNING in tracepoint_probe_register_prio (3) syzbot
2019-08-16  0:11 ` syzbot
2019-08-16 14:26   ` [PATCH 1/1] Fix: trace sched switch start/stop racy updates Mathieu Desnoyers
2019-08-16 16:25     ` Steven Rostedt
2019-08-16 16:48       ` Valentin Schneider
2019-08-16 17:04         ` Steven Rostedt
2019-08-16 17:41           ` Mathieu Desnoyers
2019-08-16 19:18             ` Steven Rostedt
2019-08-16 19:19             ` Alan Stern
2019-08-16 20:44               ` Joel Fernandes
2019-08-16 20:49                 ` Thomas Gleixner
2019-08-16 20:57                   ` Joel Fernandes
2019-08-16 22:27                     ` Valentin Schneider
2019-08-16 22:57                       ` Linus Torvalds [this message]
2019-08-17  1:41                         ` Mathieu Desnoyers
2019-08-17  4:52                         ` Paul E. McKenney
2019-08-17  8:28                           ` Linus Torvalds
2019-08-17  8:44                             ` Linus Torvalds
2019-08-17 15:02                               ` Mathieu Desnoyers
2019-08-17 20:03                                 ` Valentin Schneider
2019-08-17 23:00                                   ` Paul E. McKenney
2019-08-19 10:34                                     ` Valentin Schneider
2019-08-17 22:28                             ` Paul E. McKenney
2019-08-20 14:01                           ` Peter Zijlstra
2019-08-20 20:31                             ` Paul E. McKenney
2019-08-20 20:39                               ` Peter Zijlstra
2019-08-20 20:52                                 ` Paul E. McKenney
2019-08-16 21:04                   ` Linus Torvalds
2019-08-17  1:36                     ` Mathieu Desnoyers
2019-08-17  2:13                       ` Steven Rostedt
2019-08-17 14:40                         ` Mathieu Desnoyers
2019-08-17 15:26                           ` Steven Rostedt
2019-08-17 15:55                             ` Mathieu Desnoyers
2019-08-17 16:40                               ` Steven Rostedt
2019-08-17 22:06                                 ` Paul E. McKenney
2019-08-17  8:08                       ` Linus Torvalds
2019-08-20 13:56                         ` Peter Zijlstra
2019-08-20 20:29                           ` Paul E. McKenney
2019-08-21 10:32                             ` Will Deacon
2019-08-21 13:23                               ` Paul E. McKenney
2019-08-21 13:32                                 ` Will Deacon
2019-08-21 13:56                                   ` Paul E. McKenney
2019-08-21 16:22                                     ` Will Deacon
2019-08-21 15:33                                 ` Peter Zijlstra
2019-08-21 15:48                                   ` Mathieu Desnoyers
2019-08-21 16:14                                     ` Paul E. McKenney
2019-08-21 19:03                                     ` Joel Fernandes
2019-09-09  6:21                           ` Herbert Xu
2019-08-16 20:49                 ` Steven Rostedt
2019-08-16 20:59                   ` Joel Fernandes
2019-08-17  1:25                   ` Mathieu Desnoyers
2019-08-18  9:15                   ` stable markup was " Pavel Machek
2019-08-16 17:19       ` Mathieu Desnoyers
2019-08-16 19:15         ` Steven Rostedt
2019-08-17 14:27           ` Mathieu Desnoyers
2019-08-17 15:42             ` Steven Rostedt
2019-08-17 15:53               ` Mathieu Desnoyers
2019-08-17 16:43                 ` Steven Rostedt
2019-08-16 12:32 ` WARNING in tracepoint_probe_register_prio (3) syzbot
2019-08-16 12:41   ` Sebastian Andrzej Siewior

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wi_KeD1M-_-_SU_H92vJ-yNkDnAGhAS=RR1yNNGWKW+aA@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=boqun.feng@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=joel@joelfernandes.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=paulmck@linux.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=stern@rowland.harvard.edu \
    --cc=tglx@linutronix.de \
    --cc=valentin.schneider@arm.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).