From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Victor Kaplansky <VICTORK@il.ibm.com>,
Oleg Nesterov <oleg@redhat.com>,
Anton Blanchard <anton@samba.org>,
Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Frederic Weisbecker <fweisbec@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Linux PPC dev <linuxppc-dev@ozlabs.org>,
Michael Ellerman <michael@ellerman.id.au>,
Michael Neuling <mikey@neuling.org>
Subject: Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
Date: Thu, 7 Nov 2013 18:50:01 -0500
Message-ID: <20131107235001.GA30034@Krystal>
In-Reply-To: <20131104112254.GK28601@twins.programming.kicks-ass.net>
* Peter Zijlstra (peterz@infradead.org) wrote:
[...]
Hi Peter,
Looking at this simplified version of perf's ring buffer
synchronization, I get concerned about the following issue:
> /*
>  * One important detail is that the kbuf part and the kbuf_writer() are
>  * strictly per cpu and we can thus rely on program order for those.
>  *
>  * Only the userspace consumer can possibly run on another cpu, and thus we
>  * need to ensure data consistency for those.
>  */
>
> struct buffer {
> 	u64 size;
> 	u64 tail;
> 	u64 head;
> 	void *data;
> };
>
> struct buffer *kbuf, *ubuf;
>
> /*
>  * If there's space in the buffer; store the data @buf; otherwise
>  * discard it.
>  */
> void kbuf_write(int sz, void *buf)
> {
> 	u64 tail, head, offset;
>
> 	do {
> 		tail = ACCESS_ONCE(ubuf->tail);
> 		offset = head = kbuf->head;
> 		if (CIRC_SPACE(head, tail, kbuf->size) < sz) {
> 			/* discard @buf */
> 			return;
> 		}
> 		head += sz;
> 	} while (local_cmpxchg(&kbuf->head, offset, head) != offset);
>
Let's suppose a thread executing kbuf_write() is interrupted by an IRQ
or NMI right after a successful local_cmpxchg() (the space reservation
in the buffer). If the nested execution context also calls kbuf_write(),
it will update ubuf->head (below) to cover the second reserved space.
Only after that will it return to the original thread context, which
continues executing kbuf_write() and thus overwrites ubuf->head with
the second-to-last reserved offset.
All this probably works OK most of the time, when the event flow
guarantees that a following event will fix things up, but there
appears to be a risk of losing events near the end of the trace when
those are in nested execution contexts.
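To make the interleaving concrete, here is a minimal single-threaded
sketch that replays it step by step. It is only a model, not the perf
implementation: struct ring_model and the model_*() helpers are
hypothetical names, and it assumes each context publishes the head it
computed when it reserved space rather than re-reading kbuf->head.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: only the two head cursors matter for this scenario. */
struct ring_model {
	uint64_t kbuf_head;	/* reservation cursor, private to the writer CPU */
	uint64_t ubuf_head;	/* head published to the userspace consumer */
};

/* Models a successful local_cmpxchg(): reserve sz bytes, return the new head. */
static uint64_t model_reserve(struct ring_model *rb, uint64_t sz)
{
	rb->kbuf_head += sz;
	return rb->kbuf_head;
}

/* ASSUMPTION: the writer publishes the head it computed when reserving. */
static void model_publish(struct ring_model *rb, uint64_t local_head)
{
	rb->ubuf_head = local_head;
}

/*
 * Replays the interleaving: the outer context reserves, an IRQ/NMI then
 * reserves and publishes, and finally the outer context publishes.
 * Returns the number of reserved bytes left beyond the published head.
 */
static uint64_t model_nested_write(uint64_t outer_sz, uint64_t nested_sz)
{
	struct ring_model rb = { 0, 0 };
	uint64_t outer_head, nested_head;

	outer_head = model_reserve(&rb, outer_sz);	/* outer: reserve */
	nested_head = model_reserve(&rb, nested_sz);	/* IRQ/NMI: reserve */
	model_publish(&rb, nested_head);		/* IRQ/NMI: publish */
	model_publish(&rb, outer_head);			/* outer: publish stale head */

	/* The published head never runs ahead of the reservation cursor. */
	assert(rb.ubuf_head <= rb.kbuf_head);
	return rb.kbuf_head - rb.ubuf_head;
}
```

Under these assumptions, model_nested_write(16, 8) returns 8: the
nested event's 8 bytes stay beyond the published head until some later
event publishes a newer one.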
Thoughts?
Thanks,
Mathieu
> 	/*
> 	 * Ensure that if we see the userspace tail (ubuf->tail) such
> 	 * that there is space to write @buf without overwriting data
> 	 * userspace hasn't seen yet, we won't in fact store data before
> 	 * that read completes.
> 	 */
>
> 	smp_mb(); /* A, matches with D */
>
> 	memcpy(kbuf->data + offset, buf, sz);
>
> 	/*
> 	 * Ensure that we write all the @buf data before we update the
> 	 * userspace visible ubuf->head pointer.
> 	 */
> 	smp_wmb(); /* B, matches with C */
>
> 	ubuf->head = kbuf->head;
> }
>
> /*
>  * Consume the buffer data and update the tail pointer to indicate to
>  * kernel space there's 'free' space.
>  */
> void ubuf_read(void)
> {
> 	u64 head, tail;
>
> 	tail = ACCESS_ONCE(ubuf->tail);
> 	head = ACCESS_ONCE(ubuf->head);
>
> 	/*
> 	 * Ensure we read the buffer boundaries before the actual buffer
> 	 * data...
> 	 */
> 	smp_rmb(); /* C, matches with B */
>
> 	while (tail != head) {
> 		obj = ubuf->data + tail;
> 		/* process obj */
> 		tail += obj->size;
> 		tail %= ubuf->size;
> 	}
>
> 	/*
> 	 * Ensure all data reads are complete before we issue the
> 	 * ubuf->tail update; once that update hits, kbuf_write() can
> 	 * observe and overwrite data.
> 	 */
> 	smp_mb(); /* D, matches with A */
>
> 	ubuf->tail = tail;
> }
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com