linux-kernel.vger.kernel.org archive mirror
From: Peter Zijlstra <peterz@infradead.org>
To: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Victor Kaplansky <VICTORK@il.ibm.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Anton Blanchard <anton@samba.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux PPC dev <linuxppc-dev@ozlabs.org>,
	Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>,
	Michael Ellerman <michael@ellerman.id.au>,
	Michael Neuling <mikey@neuling.org>
Subject: Re: perf events ring buffer memory barrier on powerpc
Date: Wed, 30 Oct 2013 12:25:26 +0100
Message-ID: <20131030112526.GI16117@laptop.programming.kicks-ass.net>
In-Reply-To: <20131030092725.GL4126@linux.vnet.ibm.com>

On Wed, Oct 30, 2013 at 02:27:25AM -0700, Paul E. McKenney wrote:
> On Mon, Oct 28, 2013 at 10:58:58PM +0200, Victor Kaplansky wrote:
> > Oleg Nesterov <oleg@redhat.com> wrote on 10/28/2013 10:17:35 PM:
> > 
> > >       mb();   // XXXXXXXX: do we really need it? I think yes.
> > 
> > Oh, it is hard to argue with feelings. Also, it is easy to be on
> > conservative side and put the barrier here just in case.
> > But I still insist that the barrier is redundant in your example.
> 
> If you were to back up that insistence with a description of the orderings
> you are relying on, why other orderings are not important, and how the
> important orderings are enforced, I might be tempted to pay attention
> to your opinion.

OK, so let me try. A slightly less convoluted version of the code in
kernel/events/ring_buffer.c, coupled with a userspace consumer, would look
something like the below.

One important detail is that the kbuf part and kbuf_write() are
strictly per cpu and we can thus rely on implicit ordering for those.

Only the userspace consumer can possibly run on another cpu, and thus we
need to ensure data consistency for it.

struct buffer {
	u64 size;
	u64 tail;
	u64 head;
	void *data;
};

struct buffer *kbuf, *ubuf;

/*
 * Determine there's space in the buffer to store data at @offset to
 * @head without overwriting data at @tail.
 */
bool space(u64 tail, u64 offset, u64 head)
{
	offset = (offset - tail) % kbuf->size;
	head   = (head   - tail) % kbuf->size;

	return (s64)(head - offset) >= 0;
}

/*
 * If there's space in the buffer; store the data @buf; otherwise
 * discard it.
 */
void kbuf_write(int sz, void *buf)
{
	u64 tail = ACCESS_ONCE(ubuf->tail); /* last location userspace read */
	u64 offset = kbuf->head; /* we already know where we last wrote */
	u64 head = offset + sz;

	if (!space(tail, offset, head)) {
		/* discard @buf */
		return;
	}

	/*
	 * Ensure that if we see the userspace tail (ubuf->tail) such
	 * that there is space to write @buf without overwriting data
	 * userspace hasn't seen yet, we won't in fact store data before
	 * that read completes.
	 */

	smp_mb(); /* A, matches with D */

	write(kbuf->data + offset, buf, sz);
	kbuf->head = head % kbuf->size;

	/*
	 * Ensure that we write all the @buf data before we update the
	 * userspace visible ubuf->head pointer.
	 */
	smp_wmb(); /* B, matches with C */

	ubuf->head = kbuf->head;
}

/*
 * Consume the buffer data and update the tail pointer to indicate to
 * kernel space there's 'free' space.
 */
void ubuf_read(void)
{
	u64 head, tail;
	struct { u64 size; } *obj; /* record header; layout is illustrative only */

	tail = ACCESS_ONCE(ubuf->tail);
	head = ACCESS_ONCE(ubuf->head);

	/*
	 * Ensure we read the buffer boundaries before the actual buffer
	 * data...
	 */
	smp_rmb(); /* C, matches with B */

	while (tail != head) {
		obj = ubuf->data + tail;
		/* process obj */
		tail += obj->size;
		tail %= ubuf->size;
	}

	/*
	 * Ensure all data reads are complete before we issue the
	 * ubuf->tail update; once that update hits, kbuf_write() can
	 * observe and overwrite data.
	 */
	smp_mb(); /* D, matches with A */

	ubuf->tail = tail;
}
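
To make the space() check above concrete, here is a standalone sketch with
a few worked values. It is not the kernel code: it takes the buffer size as
a parameter (the version above reads kbuf->size), uses the stdint types in
place of u64/s64, and adds an assert that the size is a power of two. That
last point is an assumption worth stating, because the unsigned subtraction
can wrap around 2^64 and the modulo only gives the expected distance when
the size divides 2^64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Is there room to store [offset, head) without running into tail?
 * Assumes size is a power of two and the record is smaller than size.
 */
static bool space(uint64_t size, uint64_t tail, uint64_t offset, uint64_t head)
{
	/* Unsigned wraparound below is only harmless for power-of-two sizes. */
	assert(size != 0 && (size & (size - 1)) == 0);

	/* Rebase both positions so that tail sits at 0. */
	offset = (offset - tail) % size;
	head   = (head   - tail) % size;

	/* If head wrapped past tail, it now compares below offset. */
	return (int64_t)(head - offset) >= 0;
}
```

With size 8: writing 4 bytes at offset 6 with tail at 0 is rejected, because
the new head (10, i.e. 2 after wrapping) would pass the tail. The same write
with tail at 4 is accepted. A write that would make head land exactly on
tail is rejected as well, since head == tail is what the consumer reads as
"empty".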


Now the whole crux of the question is whether we need barrier A at all,
since the stores issued by the @buf writes are dependent on the ubuf->tail
read.

If the read shows no available space, we simply will not issue those
writes -- therefore we could argue we can avoid the memory barrier.
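
For readers who want to experiment with the producer side outside the
kernel, the following is a minimal userspace sketch using C11 atomics. It is
an illustration under stated assumptions, not the perf implementation:
struct ring, RING_SIZE and ring_write() are made-up names, the buffer is a
fixed 16 bytes, and data is copied byte by byte. C11 gives no ordering
guarantee from a control dependency on a relaxed load, so the sketch loads
the tail with acquire semantics (standing in for A) rather than relying on
the "we only store if the read showed space" argument. The release store of
head plays the role of B.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16u /* hypothetical size; must be a power of two */

struct ring {
	_Atomic uint64_t head; /* only the producer stores to this */
	_Atomic uint64_t tail; /* only the consumer stores to this */
	unsigned char data[RING_SIZE];
};

/* Store @sz bytes if they fit; otherwise discard. Returns true if stored. */
static bool ring_write(struct ring *r, const void *buf, size_t sz)
{
	/* Role of A: the data stores below cannot move before this load. */
	uint64_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
	/* We are the only writer of head, so a relaxed load is enough. */
	uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	uint64_t used = (head - tail) % RING_SIZE;

	/* Keep one byte free so that head == tail always means "empty". */
	if (sz >= RING_SIZE - used)
		return false;

	for (size_t i = 0; i < sz; i++)
		r->data[(head + i) % RING_SIZE] = ((const unsigned char *)buf)[i];

	/* Role of B: publish head only after the data stores above. */
	atomic_store_explicit(&r->head, (head + sz) % RING_SIZE,
			      memory_order_release);
	return true;
}
```

Starting from an empty ring, a 4-byte write succeeds and moves head to 4; a
following 12-byte write is discarded because it would make head catch up
with tail, while an 11-byte write fits.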

However, that leaves D unpaired and me confused. We must have D because
otherwise the CPU could reorder that write ahead of the preceding reads
and the kernel could start overwriting data we're still reading, which
seems like a bad deal.

Also, I'm not entirely sure about C; that too seems like a dependency: we
simply cannot read the buffer at @tail before we've read the tail itself,
now can we? Similarly, we cannot compare tail to head without having the
head read completed.
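
The consumer side can be sketched in userspace C11 the same way. Again this
is only an illustration: struct ring, RING_SIZE and ring_read() are
hypothetical names, and the records are plain bytes rather than sized
objects. Note that in this shape the data loads are indexed by tail, not by
head; head only feeds the loop condition, and a control dependency does not
order later loads. The sketch therefore loads head with acquire semantics
(the role of C) instead of relying on a dependency, and stores tail with
release semantics (the role of D) so that all data reads complete before
the producer can observe the new tail.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 16u /* hypothetical size; must be a power of two */

struct ring {
	_Atomic uint64_t head; /* only the producer stores to this */
	_Atomic uint64_t tail; /* only the consumer stores to this */
	unsigned char data[RING_SIZE];
};

/* Copy up to @max pending bytes into @out; returns the number consumed. */
static size_t ring_read(struct ring *r, unsigned char *out, size_t max)
{
	/* We are the only writer of tail, so a relaxed load is enough. */
	uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Role of C: the data loads below cannot move before this load. */
	uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
	size_t n = 0;

	while (tail != head && n < max) {
		out[n++] = r->data[tail];
		tail = (tail + 1) % RING_SIZE;
	}

	/* Role of D: all data loads above complete before tail is published. */
	atomic_store_explicit(&r->tail, tail, memory_order_release);
	return n;
}
```

A release store is weaker than the full smp_mb() used for D above, but it
does order the preceding loads before the store, which is the property the
argument for D depends on.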


Could we replace A and C with an smp_read_barrier_depends()?
