linux-arch.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@kernel.org>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>,
	Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Alan Stern <stern@rowland.harvard.edu>,
	Andrea Parri <parri.andrea@gmail.com>,
	Boqun Feng <boqun.feng@gmail.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	David Howells <dhowells@redhat.com>,
	j alglave <j.alglave@ucl.ac.uk>,
	luc maranget <luc.maranget@inria.fr>, akiyks <akiyks@gmail.com>,
	linux-toolchains <linux-toolchains@vger.kernel.org>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [RFC PATCH] LKMM: Add ctrl_dep() macro for control dependency
Date: Fri, 1 Oct 2021 15:53:29 -0700	[thread overview]
Message-ID: <20211001225329.GN880162@paulmck-ThinkPad-P17-Gen-1> (raw)
In-Reply-To: <971151132.46974.1633102138685.JavaMail.zimbra@efficios.com>

On Fri, Oct 01, 2021 at 11:28:58AM -0400, Mathieu Desnoyers wrote:
> ----- On Sep 29, 2021, at 7:57 PM, paulmck paulmck@kernel.org wrote:
> 
> > On Wed, Sep 29, 2021 at 04:47:03PM -0500, Segher Boessenkool wrote:
> >> Hi!
> >> 
> >> On Tue, Sep 28, 2021 at 05:15:07PM -0400, Mathieu Desnoyers wrote:
> >> > C99 describes that accessing volatile objects are side-effects, and that
> >> > "at certain specified points in the execution sequence called sequence
> >> > points, all side effects of previous evaluations shall be complete
> >> > and no side effects of subsequent evaluations shall have taken
> >> > place". [2]
> >> 
> >> But note that the kernel explicitly uses C89 (with GNU extensions).
> >> Side effects are largely equal there though.
> >> 
> >> Also note that there may no place in the generated machine code that
> >> corresponds exactly to some sequence point.  Sequence points are a
> >> concept that applies to the source program and how that executes on the
> >> abstract machine.
> > 
> > Plus the "as if" rule rears its ugly head in many of these situations.
> > 
> >> > +Because ctrl_dep emits distinct asm volatile within each leg of the if
> >> > +statement, the compiler cannot transform the two writes to 'b' into a
> >> > +conditional-move (cmov) instruction, thus ensuring the presence of a
> >> > +conditional branch.  Also because the ctrl_dep emits asm volatile within
> >> > +each leg of the if statement, the compiler cannot move the write to 'c'
> >> > +before the conditional branch.
> >> 
> >> I think your reasoning here misses some things.  So many that I don't
> >> know where to start to list them, every "because" and "thus" here does
> >> not follow, and even the statements of fact are not a given.
> >> 
> >> Why do you want a conditional branch insn at all, anyway?  You really
> >> want something else as far as I can see.
> > 
> > Because at the assembly language level on some architectures, a
> > conditional branch instruction provides weak but very real and very
> > useful memory-ordering properties.  Such a branch orders all loads
> > whose return values feed into the branch condition before any stores
> > that execute after the branch does (regardless of whether or not the
> > branch was taken).  And this is all the ordering that is required for
> > the use cases that Mathieu is worried about.
> > 
> > Yes, you can use explicit memory-barrier or acquire-load instructions,
> > but those incur more overhead on some types of hardware.  The code in
> > question is on a hotpath and is thus performance-critical.
> > 
> > It would be nice to be able to somehow tell the compiler exactly
> > what the ordering constraints are ("this particular load must be
> > ordered before these particular stores") and then let it (1) figure
> > out that a conditional branch will do the trick and (2) generate the
> > code accordingly.  But last I checked, this was not going to happen any
> > time soon.  So for the time being, we have to live within the current
> > capability of the tools that are available to us.
> > 
> > Linus points out that in all the actual control-dependent code in
> > the Linux kernel, the compiler is going to be hard-pressed to fail
> > to emit the required branch.  (Or in the case of ARMv8, the required
> > conditional-move instruction.)
> > 
> > Mathieu, for his part, recently read the relevant portions of
> > memory-barriers.txt (reproduced below) and would like to simplify these
> > coding guidlines, which, speaking as the author of those guidelines,
> > would be an extremely good thing.  His patches are attempting to move
> > us in that direction.
> > 
> > Alternatives include: (1) Using acquire loads or memory barriers
> > and accepting the loss in performance, but giving the compiler much
> > less leeway, (2) Ripping all of the two-legged "if" examples from
> > memory-barriers.txt and restricting control dependencies to else-less
> > "if" statements, again giving the compiler less leeway, and (3) Your
> > ideas here.
> > 
> > Does that help, or am I missing your point?
> 
> Thanks Paul, it does help explaining the motivation for relying on
> control dependencies for some fast-path memory ordering in the kernel.
> 
> And yes, my main goal is to simplify the coding guide lines, but I have
> not found any example of bad generated code in the tree kernel at this
> point. In some cases (e.g. uses of smp_acquire__after_ctrl_dep()) it's
> mainly thanks to luck though.
> 
> There is another alternative we could list here: implement ctrl_dep_true(),
> ctrl_dep_false() and ctrl_dep(), which would respectively ensure a
> control dependency on the then leg, on the else leg, or on both legs of
> a conditional expression evaluation.
> 
> > 
> >> It is essential here that there is a READ_ONCE and the WRITE_ONCE.
> >> Those things might make it work the way you want, but as Linus says this
> >> is all way too subtle.  Can you include the *_ONCE into the primitive
> >> itself somehow?
> > 
> > Actually, if the store is not involved in a data race, the WRITE_ONCE()
> > is not needed.  And in that case, the compiler is much less able to
> > fail to provide the needed ordering.  (No, the current documentation
> > does not reflect this.)  But if there is a data race, then your point
> > is right on the mark -- that WRITE_ONCE() cannot be safely omitted.
> > 
> > But you are absolutely right that the READ_ONCE() or equivalent is not
> > at all optional.  An example of an acceptable equivalent is an atomic
> > read-modify-write operation such as atomic_xchg_relaxed().
> > 
> > The question about whether the READ_ONCE() and WRITE_ONCE() can be
> > incorporated into the macro I leave to Mathieu.  I can certainly see
> > serious benefits from this approach, at least from a compiler viewpoint.
> > I must reserve judgment on usability until I see a proposal.
> 
> [...]
> 
> After having audited thoroughly all obviously documented control dependencies
> in the kernel tree, I'm not sure that including the READ_ONCE() and WRITE_ONCE()
> with the ctrl_dep() macro is a good idea, because in some cases there is
> calculation to be done on the result of the READ_ONCE() (e.g. through
> a static inline) before handing it over to the conditional expression.
> In other cases many stores are being done after the control dependency, e.g.:
> 
> kernel/events/ring_buffer.c:__perf_output_begin()
> 
>         do {
>                 tail = READ_ONCE(rb->user_page->data_tail);
>                 offset = head = local_read(&rb->head);
>                 if (!rb->overwrite) {
>                         if (unlikely(!ring_buffer_has_space(head, tail,
>                                                             perf_data_size(rb),
>                                                             size, backward)))
>                                 goto fail;
>                 }
> 
>                 /*
>                  * The above forms a control dependency barrier separating the
>                  * @tail load above from the data stores below. Since the @tail
>                  * load is required to compute the branch to fail below.
>                  *
>                  * A, matches D; the full memory barrier userspace SHOULD issue
>                  * after reading the data and before storing the new tail
>                  * position.
>                  *
>                  * See perf_output_put_handle().
>                  */
> 
>                 if (!backward)
>                         head += size;
>                 else
>                         head -= size;
>         } while (local_cmpxchg(&rb->head, offset, head) != offset);
> 
>         if (backward) {
>                 offset = head;
>                 head = (u64)(-head);
>         }
> 
>         /*
>          * We rely on the implied barrier() by local_cmpxchg() to ensure
>          * none of the data stores below can be lifted up by the compiler.
>          */
> 
> [...]

Indeed, these things do not necessarily fit a nice simple pattern.

> Note that I suspect that this control dependency documentation could be
> improved to state that the first control dependency is with the following
> local_cmpxchg store, which itself has a control dependency (when it evaluates
> to false) with the following stores to the ring buffer. Those are not volatile
> stores, but the "memory" clobber with the local_cmpxchg should ensure that
> following stores are after the local_cmpxchg in program order.
> 
> One other thing we could do to improve things slightly would be to turn
> smp_acquire__after_ctrl_dep() into something which really is an acquire
> in all cases, which may not currently be true if the compiler finds a
> matching barrier()/smp_rmb() in the other leg after the conditional
> expression.

Tools to post-process binaries have been suggested in other venues,
but there would still need to be a way to tell the tool what the
constraints are.

							Thanx, Paul

  reply	other threads:[~2021-10-01 22:53 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-28 21:15 [RFC PATCH] LKMM: Add ctrl_dep() macro for control dependency Mathieu Desnoyers
2021-09-29 12:06 ` Marco Elver
2021-10-01 15:45   ` Mathieu Desnoyers
2021-10-01 16:20     ` Linus Torvalds
2021-10-01 17:28       ` Mathieu Desnoyers
2021-10-01 18:18         ` Linus Torvalds
2021-09-29 12:28 ` Florian Weimer
2021-09-29 17:41   ` Segher Boessenkool
2021-09-29 19:46     ` Florian Weimer
2021-10-01 16:13     ` Mathieu Desnoyers
2021-10-01 16:26       ` Florian Weimer
2021-10-01 16:35         ` Linus Torvalds
2021-10-10 14:02           ` Florian Weimer
2021-10-14  0:01             ` Paul E. McKenney
2021-10-14  2:14               ` Alan Stern
2021-10-14 16:14                 ` Paul E. McKenney
2021-10-14 15:58               ` Florian Weimer
2021-10-14 16:23                 ` Paul E. McKenney
2021-10-14 18:19                   ` Florian Weimer
2021-10-14 21:09                     ` Paul E. McKenney
2021-10-14 22:36                       ` Linus Torvalds
2021-09-30 13:28   ` Mathieu Desnoyers
2021-09-29 14:47 ` Linus Torvalds
2021-09-29 14:54   ` Linus Torvalds
2021-09-29 19:50     ` Mathieu Desnoyers
2021-09-29 20:13       ` Mathieu Desnoyers
2021-09-29 19:27   ` Mathieu Desnoyers
2021-09-29 22:14     ` Linus Torvalds
2021-09-29 21:47 ` Segher Boessenkool
2021-09-29 23:57   ` Paul E. McKenney
2021-10-01 15:28     ` Mathieu Desnoyers
2021-10-01 22:53       ` Paul E. McKenney [this message]
2021-10-01 19:10     ` Segher Boessenkool
2021-10-01 22:50       ` Paul E. McKenney
2021-10-02 14:29       ` Alan Stern

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211001225329.GN880162@paulmck-ThinkPad-P17-Gen-1 \
    --to=paulmck@kernel.org \
    --cc=akiyks@gmail.com \
    --cc=boqun.feng@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=j.alglave@ucl.ac.uk \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-toolchains@vger.kernel.org \
    --cc=luc.maranget@inria.fr \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=npiggin@gmail.com \
    --cc=parri.andrea@gmail.com \
    --cc=peterz@infradead.org \
    --cc=segher@kernel.crashing.org \
    --cc=stern@rowland.harvard.edu \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).