linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	jeremy.linton@arm.com, Boqun Feng <boqun.feng@gmail.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: Perf hotplug lockup in v4.9-rc8
Date: Thu, 22 Dec 2016 09:45:09 +0100	[thread overview]
Message-ID: <20161222084509.GX3174@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20161212124228.GE3124@twins.programming.kicks-ass.net>

On Mon, Dec 12, 2016 at 01:42:28PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 12, 2016 at 11:46:40AM +0000, Will Deacon wrote:
> > > @@ -2331,13 +2330,36 @@ perf_install_in_context(struct perf_event_context *ctx,
> > >  	/*
> > >  	 * Installing events is tricky because we cannot rely on ctx->is_active
> > >  	 * to be set in case this is the nr_events 0 -> 1 transition.
> > > +	 *
> > > +	 * Instead we use task_curr(), which tells us if the task is running.
> > > +	 * However, since we use task_curr() outside of rq::lock, we can race
> > > +	 * against the actual state. This means the result can be wrong.
> > > +	 *
> > > +	 * If we get a false positive, we retry, this is harmless.
> > > +	 *
> > > +	 * If we get a false negative, things are complicated. If we are after
> > > +	 * perf_event_context_sched_in() ctx::lock will serialize us, and the
> > > +	 * value must be correct. If we're before, it doesn't matter since
> > > +	 * perf_event_context_sched_in() will program the counter.
> > > +	 *
> > > +	 * However, this hinges on the remote context switch having observed
> > > +	 * our task->perf_event_ctxp[] store, such that it will in fact take
> > > +	 * ctx::lock in perf_event_context_sched_in().
> > > +	 *
> > > +	 * We do this by task_function_call(), if the IPI fails to hit the task
> > > +	 * we know any future context switch of task must see the
> > > +	 * perf_event_ctpx[] store.
> > >  	 */
> > > +
> > >  	/*
> > > +	 * This smp_mb() orders the task->perf_event_ctxp[] store with the
> > > +	 * task_cpu() load, such that if the IPI then does not find the task
> > > +	 * running, a future context switch of that task must observe the
> > > +	 * store.
> > >  	 */
> > > +	smp_mb();
> > > +again:
> > > +	if (!task_function_call(task, __perf_install_in_context, event))
> > >  		return;
> > 
> > I'm trying to figure out whether or not the barriers implied by the IPI
> > are sufficient here, or whether we really need the explicit smp_mb().
> > Certainly, arch_send_call_function_single_ipi has to order the publishing
> > of the remote work before the signalling of the interrupt, but the comment
> > above refers to "the task_cpu() load" and I can't see that after your
> > diff.
> > 
> > What are you trying to order here?
> 
> I suppose something like this:
> 
> 
> CPU0		CPU1		CPU2
> 
> 		(current == t)
> 
> t->perf_event_ctxp[] = ctx;
> smp_mb();
> cpu = task_cpu(t);
> 
> 		switch(t, n);
> 				migrate(t, 2);
> 				switch(p, t);
> 
> 				ctx = t->perf_event_ctxp[]; // must not be NULL
> 

So I think I can cast the above into a test like:

  W[x] = 1                W[y] = 1                R[z] = 1
  mb                      mb                      mb
  R[y] = 0                W[z] = 1                R[x] = 0

Where x is the perf_event_ctxp[], y is our task's cpu and z is our task
being placed on the rq of cpu2.

See also commit: 8643cda549ca ("sched/core, locking: Document
Program-Order guarantees"), Independent of which cpu initiates the
migration between CPU1 and CPU2 there is ordering between the CPUs.

This would then translate into something like:

  C C-peterz

  {
  }

  P0(int *x, int *y)
  {
	  int r1;

	  WRITE_ONCE(*x, 1);
	  smp_mb();
	  r1 = READ_ONCE(*y);
  }

  P1(int *y, int *z)
  {
	  WRITE_ONCE(*y, 1);
	  smp_mb();
	  WRITE_ONCE(*z, 1);
  }

  P2(int *x, int *z)
  {
	  int r1;
	  int r2;

	  r1 = READ_ONCE(*z);
	  smp_mb();
	  r2 = READ_ONCE(*x);
  }

  exists
  (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)

Which evaluates into:

  Test C-peterz Allowed
  States 7
  0:r1=0; 2:r1=0; 2:r2=0;
  0:r1=0; 2:r1=0; 2:r2=1;
  0:r1=0; 2:r1=1; 2:r2=1;
  0:r1=1; 2:r1=0; 2:r2=0;
  0:r1=1; 2:r1=0; 2:r2=1;
  0:r1=1; 2:r1=1; 2:r2=0;
  0:r1=1; 2:r1=1; 2:r2=1;
  No
  Witnesses
  Positive: 0 Negative: 7
  Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
  Observation C-peterz Never 0 7
  Hash=661589febb9e41b222d8acae1fd64e25

And the strong and weak model agree.


> smp_function_call(cpu, ..);
> 
> 		generic_exec_single()
> 		  func();
> 		    spin_lock(ctx->lock);
> 		    if (task_curr(t)) // false
> 
> 		    add_event_to_ctx();
> 		    spin_unlock(ctx->lock);
> 
> 				perf_event_context_sched_in();
> 				  spin_lock(ctx->lock);
> 				  // sees event
> 
> 
> Where between setting the perf_event_ctxp[] and sending the IPI the task
> moves away and the IPI misses, and while the new CPU is in the middle of
> scheduling in t, it hasn't yet passed through perf_event_sched_in(), but
> when it does, it _must_ observe the ctx value we stored.
> 
> My thinking was that the IPI itself is not sufficient since when it
> misses the task, nothing then guarantees we see the store. However, if
> we order the store and the task_cpu() load, then any context
> switching/migrating involved with changing that value, should ensure we
> see our prior store.
> 
> Of course, even now writing this, I'm still confused.

On IRC you said:

: I think it's similar to the "ISA2" litmus test, only the first reads-from edge is an IPI and the second is an Unlock->Lock

In case the IPI misses, we cannot use the IPI itself for anything I'm
afraid, also per the above we don't need to.

: the case I'm more confused by is if CPU2 takes the ctx->lock before CPU1
: I'm guessing that's prevented by the way migration works?

So same scenario but CPU2 takes the ctx->lock first. In that case it
will not observe our event and do nothing. CPU1 will then acquire
ctx->lock, this then implies ordering against CPU2, which means it
_must_ observe task_curr() && task != current and it too will not do
anything but we'll loop and try the whole thing again.

  reply	other threads:[~2016-12-22  8:45 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-07 13:53 Perf hotplug lockup in v4.9-rc8 Mark Rutland
2016-12-07 14:30 ` Mark Rutland
2016-12-07 16:39   ` Mark Rutland
2016-12-07 17:53 ` Mark Rutland
2016-12-07 18:34   ` Peter Zijlstra
2016-12-07 19:56     ` Mark Rutland
2016-12-09 13:59     ` Peter Zijlstra
2016-12-12 11:46       ` Will Deacon
2016-12-12 12:42         ` Peter Zijlstra
2016-12-22  8:45           ` Peter Zijlstra [this message]
2016-12-22 14:00             ` Peter Zijlstra
2016-12-22 16:33               ` Paul E. McKenney
2017-01-11 14:59       ` Mark Rutland
2017-01-11 16:03         ` Peter Zijlstra
2017-01-11 16:26           ` Mark Rutland
2017-01-11 19:51           ` Peter Zijlstra
2017-01-14 12:28       ` [tip:perf/urgent] perf/core: Fix sys_perf_event_open() vs. hotplug tip-bot for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161222084509.GX3174@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=acme@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=boqun.feng@gmail.com \
    --cc=jeremy.linton@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).