From: Peter Zijlstra <peterz@infradead.org>
To: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	jeremy.linton@arm.com, Boqun Feng <boqun.feng@gmail.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>
Subject: Re: Perf hotplug lockup in v4.9-rc8
Date: Thu, 22 Dec 2016 09:45:09 +0100	[thread overview]
Message-ID: <20161222084509.GX3174@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20161212124228.GE3124@twins.programming.kicks-ass.net>

On Mon, Dec 12, 2016 at 01:42:28PM +0100, Peter Zijlstra wrote:
> On Mon, Dec 12, 2016 at 11:46:40AM +0000, Will Deacon wrote:
> > > @@ -2331,13 +2330,36 @@ perf_install_in_context(struct perf_event_context *ctx,
> > >  	/*
> > >  	 * Installing events is tricky because we cannot rely on ctx->is_active
> > >  	 * to be set in case this is the nr_events 0 -> 1 transition.
> > > +	 *
> > > +	 * Instead we use task_curr(), which tells us if the task is running.
> > > +	 * However, since we use task_curr() outside of rq::lock, we can race
> > > +	 * against the actual state. This means the result can be wrong.
> > > +	 *
> > > +	 * If we get a false positive, we retry, this is harmless.
> > > +	 *
> > > +	 * If we get a false negative, things are complicated. If we are after
> > > +	 * perf_event_context_sched_in() ctx::lock will serialize us, and the
> > > +	 * value must be correct. If we're before, it doesn't matter since
> > > +	 * perf_event_context_sched_in() will program the counter.
> > > +	 *
> > > +	 * However, this hinges on the remote context switch having observed
> > > +	 * our task->perf_event_ctxp[] store, such that it will in fact take
> > > +	 * ctx::lock in perf_event_context_sched_in().
> > > +	 *
> > > +	 * We do this by task_function_call(); if the IPI fails to hit the
> > > +	 * task, we know any future context switch of the task must see the
> > > +	 * perf_event_ctxp[] store.
> > >  	 */
> > > +
> > >  	/*
> > > +	 * This smp_mb() orders the task->perf_event_ctxp[] store with the
> > > +	 * task_cpu() load, such that if the IPI then does not find the task
> > > +	 * running, a future context switch of that task must observe the
> > > +	 * store.
> > >  	 */
> > > +	smp_mb();
> > > +again:
> > > +	if (!task_function_call(task, __perf_install_in_context, event))
> > >  		return;
> > 
> > I'm trying to figure out whether or not the barriers implied by the IPI
> > are sufficient here, or whether we really need the explicit smp_mb().
> > Certainly, arch_send_call_function_single_ipi has to order the publishing
> > of the remote work before the signalling of the interrupt, but the comment
> > above refers to "the task_cpu() load" and I can't see that after your
> > diff.
> > 
> > What are you trying to order here?
> 
> I suppose something like this:
> 
> 
> CPU0		CPU1		CPU2
> 
> 		(current == t)
> 
> t->perf_event_ctxp[] = ctx;
> smp_mb();
> cpu = task_cpu(t);
> 
> 		switch(t, n);
> 				migrate(t, 2);
> 				switch(p, t);
> 
> 				ctx = t->perf_event_ctxp[]; // must not be NULL
> 

So I think I can cast the above into a test like:

  W[x] = 1                W[y] = 1                R[z] = 1
  mb                      mb                      mb
  R[y] = 0                W[z] = 1                R[x] = 0

Where x is the perf_event_ctxp[], y is our task's CPU and z is our task
being placed on the rq of CPU2.

See also commit 8643cda549ca ("sched/core, locking: Document
Program-Order guarantees"): independent of which CPU initiates the
migration between CPU1 and CPU2, there is ordering between the CPUs.

This would then translate into something like:

  C C-peterz

  {
  }

  P0(int *x, int *y)
  {
	  int r1;

	  WRITE_ONCE(*x, 1);
	  smp_mb();
	  r1 = READ_ONCE(*y);
  }

  P1(int *y, int *z)
  {
	  WRITE_ONCE(*y, 1);
	  smp_mb();
	  WRITE_ONCE(*z, 1);
  }

  P2(int *x, int *z)
  {
	  int r1;
	  int r2;

	  r1 = READ_ONCE(*z);
	  smp_mb();
	  r2 = READ_ONCE(*x);
  }

  exists
  (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)

Which evaluates into:

  Test C-peterz Allowed
  States 7
  0:r1=0; 2:r1=0; 2:r2=0;
  0:r1=0; 2:r1=0; 2:r2=1;
  0:r1=0; 2:r1=1; 2:r2=1;
  0:r1=1; 2:r1=0; 2:r2=0;
  0:r1=1; 2:r1=0; 2:r2=1;
  0:r1=1; 2:r1=1; 2:r2=0;
  0:r1=1; 2:r1=1; 2:r2=1;
  No
  Witnesses
  Positive: 0 Negative: 7
  Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
  Observation C-peterz Never 0 7
  Hash=661589febb9e41b222d8acae1fd64e25

And the strong and weak model agree.


> smp_function_call(cpu, ..);
> 
> 		generic_exec_single()
> 		  func();
> 		    spin_lock(ctx->lock);
> 		    if (task_curr(t)) // false
> 
> 		    add_event_to_ctx();
> 		    spin_unlock(ctx->lock);
> 
> 				perf_event_context_sched_in();
> 				  spin_lock(ctx->lock);
> 				  // sees event
> 
> 
> Where between setting the perf_event_ctxp[] and sending the IPI the task
> moves away and the IPI misses, and while the new CPU is in the middle of
> scheduling in t, it hasn't yet passed through perf_event_sched_in(), but
> when it does, it _must_ observe the ctx value we stored.
> 
> My thinking was that the IPI itself is not sufficient since when it
> misses the task, nothing then guarantees we see the store. However, if
> we order the store and the task_cpu() load, then any context
> switching/migrating involved with changing that value, should ensure we
> see our prior store.
> 
> Of course, even now writing this, I'm still confused.

On IRC you said:

: I think it's similar to the "ISA2" litmus test, only the first reads-from edge is an IPI and the second is an Unlock->Lock

In case the IPI misses, we cannot use the IPI itself for anything, I'm
afraid; but per the above, we don't need to.

: the case I'm more confused by is if CPU2 takes the ctx->lock before CPU1
: I'm guessing that's prevented by the way migration works?

So, same scenario, but CPU2 takes ctx->lock first. In that case it will
not observe our event and will do nothing. CPU1 will then acquire
ctx->lock; this implies ordering against CPU2, which means it _must_
observe task_curr() && task != current. It too will do nothing, but
we'll loop and try the whole thing again.

Thread overview: 17+ messages
2016-12-07 13:53 Perf hotplug lockup in v4.9-rc8 Mark Rutland
2016-12-07 14:30 ` Mark Rutland
2016-12-07 16:39   ` Mark Rutland
2016-12-07 17:53 ` Mark Rutland
2016-12-07 18:34   ` Peter Zijlstra
2016-12-07 19:56     ` Mark Rutland
2016-12-09 13:59     ` Peter Zijlstra
2016-12-12 11:46       ` Will Deacon
2016-12-12 12:42         ` Peter Zijlstra
2016-12-22  8:45           ` Peter Zijlstra [this message]
2016-12-22 14:00             ` Peter Zijlstra
2016-12-22 16:33               ` Paul E. McKenney
2017-01-11 14:59       ` Mark Rutland
2017-01-11 16:03         ` Peter Zijlstra
2017-01-11 16:26           ` Mark Rutland
2017-01-11 19:51           ` Peter Zijlstra
2017-01-14 12:28       ` [tip:perf/urgent] perf/core: Fix sys_perf_event_open() vs. hotplug tip-bot for Peter Zijlstra
