From: Will Deacon <will.deacon@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	jeremy.linton@arm.com
Subject: Re: Perf hotplug lockup in v4.9-rc8
Date: Mon, 12 Dec 2016 11:46:40 +0000
Message-ID: <20161212114640.GD21248@arm.com>
In-Reply-To: <20161209135900.GU3174@twins.programming.kicks-ass.net>

On Fri, Dec 09, 2016 at 02:59:00PM +0100, Peter Zijlstra wrote:
> On Wed, Dec 07, 2016 at 07:34:55PM +0100, Peter Zijlstra wrote:
> 
> > @@ -2352,6 +2357,28 @@ perf_install_in_context(struct perf_event_context *ctx,
> >  		return;
> >  	}
> >  	raw_spin_unlock_irq(&ctx->lock);
> > +
> > +	raw_spin_lock_irq(&task->pi_lock);
> > +	if (!(task->state == TASK_RUNNING || task->state == TASK_WAKING)) {
> > +		/*
> > +		 * XXX horrific hack...
> > +		 */
> > +		raw_spin_lock(&ctx->lock);
> > +		if (task != ctx->task) {
> > +			raw_spin_unlock(&ctx->lock);
> > +			raw_spin_unlock_irq(&task->pi_lock);
> > +			goto again;
> > +		}
> > +
> > +		add_event_to_ctx(event, ctx);
> > +		raw_spin_unlock(&ctx->lock);
> > +		raw_spin_unlock_irq(&task->pi_lock);
> > +		return;
> > +	}
> > +	raw_spin_unlock_irq(&task->pi_lock);
> > +
> > +	cond_resched();
> > +
> >  	/*
> >  	 * Since !ctx->is_active doesn't mean anything, we must IPI
> >  	 * unconditionally.
> 
> So while I went back and forth trying to make that less ugly, I figured
> there was another problem.
> 
> Imagine the cpu_function_call() hitting the 'right' cpu, but not finding
> the task current. It will then continue to install the event in the
> context. However, that doesn't stop another CPU from pulling the task in
> question from our rq and scheduling it elsewhere.
> 
> This all led me to the patch below. It has a rather large comment, and
> while it represents my current thinking on the matter, I'm not at all
> sure it's entirely correct. I got my brain in a fair twist while
> writing it.
> 
> Please try to think about it carefully.
> 
> ---
>  kernel/events/core.c | 70 +++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 48 insertions(+), 22 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 6ee1febdf6ff..7d9ae461c535 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2252,7 +2252,7 @@ static int  __perf_install_in_context(void *info)
>  	struct perf_event_context *ctx = event->ctx;
>  	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
>  	struct perf_event_context *task_ctx = cpuctx->task_ctx;
> -	bool activate = true;
> +	bool reprogram = true;
>  	int ret = 0;
>  
>  	raw_spin_lock(&cpuctx->ctx.lock);
> @@ -2260,27 +2260,26 @@ static int  __perf_install_in_context(void *info)
>  		raw_spin_lock(&ctx->lock);
>  		task_ctx = ctx;
>  
> -		/* If we're on the wrong CPU, try again */
> -		if (task_cpu(ctx->task) != smp_processor_id()) {
> -			ret = -ESRCH;
> -			goto unlock;
> -		}
> +		reprogram = (ctx->task == current);
>  
>  		/*
> -		 * If we're on the right CPU, see if the task we target is
> -		 * current, if not we don't have to activate the ctx, a future
> -		 * context switch will do that for us.
> +		 * If the task is running, it must be running on this CPU,
> +		 * otherwise we cannot reprogram things.
> +		 *
> +		 * If it's not running, we don't care, ctx->lock will
> +		 * serialize against it becoming runnable.
>  		 */
> -		if (ctx->task != current)
> -			activate = false;
> -		else
> -			WARN_ON_ONCE(cpuctx->task_ctx && cpuctx->task_ctx != ctx);
> +		if (task_curr(ctx->task) && !reprogram) {
> +			ret = -ESRCH;
> +			goto unlock;
> +		}
>  
> +		WARN_ON_ONCE(reprogram && cpuctx->task_ctx && cpuctx->task_ctx != ctx);
>  	} else if (task_ctx) {
>  		raw_spin_lock(&task_ctx->lock);
>  	}
>  
> -	if (activate) {
> +	if (reprogram) {
>  		ctx_sched_out(ctx, cpuctx, EVENT_TIME);
>  		add_event_to_ctx(event, ctx);
>  		ctx_resched(cpuctx, task_ctx);
> @@ -2331,13 +2330,36 @@ perf_install_in_context(struct perf_event_context *ctx,
>  	/*
>  	 * Installing events is tricky because we cannot rely on ctx->is_active
>  	 * to be set in case this is the nr_events 0 -> 1 transition.
> +	 *
> +	 * Instead we use task_curr(), which tells us if the task is running.
> +	 * However, since we use task_curr() outside of rq::lock, we can race
> +	 * against the actual state. This means the result can be wrong.
> +	 *
> +	 * If we get a false positive, we retry, this is harmless.
> +	 *
> +	 * If we get a false negative, things are complicated. If we are after
> +	 * perf_event_context_sched_in() ctx::lock will serialize us, and the
> +	 * value must be correct. If we're before, it doesn't matter since
> +	 * perf_event_context_sched_in() will program the counter.
> +	 *
> +	 * However, this hinges on the remote context switch having observed
> +	 * our task->perf_event_ctxp[] store, such that it will in fact take
> +	 * ctx::lock in perf_event_context_sched_in().
> +	 *
> +	 * We do this by task_function_call(): if the IPI fails to hit the task,
> +	 * we know any future context switch of task must see the
> +	 * perf_event_ctxp[] store.
>  	 */
> -again:
> +
>  	/*
> -	 * Cannot use task_function_call() because we need to run on the task's
> -	 * CPU regardless of whether its current or not.
> +	 * This smp_mb() orders the task->perf_event_ctxp[] store with the
> +	 * task_cpu() load, such that if the IPI then does not find the task
> +	 * running, a future context switch of that task must observe the
> +	 * store.
>  	 */
> -	if (!cpu_function_call(task_cpu(task), __perf_install_in_context, event))
> +	smp_mb();
> +again:
> +	if (!task_function_call(task, __perf_install_in_context, event))
>  		return;
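The smp_mb() comment above appears to describe a store-buffering (SB) shape. The installer stores task->perf_event_ctxp[] and then loads the task's CPU/running state; a context switch records the task as running and then loads task->perf_event_ctxp[]. If both sides have full ordering between their store and their load, at least one of them must see the other's store. The C11 user-space sketch below models only that pattern. It assumes seq_cst fences stand in for smp_mb() and for whatever full ordering the context-switch path provides; ctxp_published, task_running and count_both_missed() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/*
 * Hypothetical stand-ins, not kernel fields:
 *   ctxp_published - models the task->perf_event_ctxp[] store
 *   task_running   - models the context switch marking the task current
 */
static atomic_int ctxp_published;
static atomic_int task_running;
static int installer_saw_running;	/* read back only after pthread_join() */
static int switch_saw_ctxp;

static void *installer(void *arg)
{
	(void)arg;
	atomic_store_explicit(&ctxp_published, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* plays the role of smp_mb() */
	installer_saw_running =
		atomic_load_explicit(&task_running, memory_order_relaxed);
	return NULL;
}

static void *context_switch(void *arg)
{
	(void)arg;
	atomic_store_explicit(&task_running, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* assumed full ordering on this side */
	switch_saw_ctxp =
		atomic_load_explicit(&ctxp_published, memory_order_relaxed);
	return NULL;
}

/*
 * Runs both sides concurrently 'iterations' times and returns how often
 * neither side saw the other's store, or -1 if a thread cannot be created.
 * With both fences present the C11 model forbids that outcome.
 */
int count_both_missed(int iterations)
{
	int missed = 0;

	for (int i = 0; i < iterations; i++) {
		pthread_t a, b;

		atomic_store(&ctxp_published, 0);
		atomic_store(&task_running, 0);
		if (pthread_create(&a, NULL, installer, NULL) != 0)
			return -1;
		if (pthread_create(&b, NULL, context_switch, NULL) != 0) {
			pthread_join(a, NULL);
			return -1;
		}
		pthread_join(a, NULL);
		pthread_join(b, NULL);
		if (!installer_saw_running && !switch_saw_ctxp)
			missed++;
	}
	return missed;
}
```

Calling count_both_missed(2000) should return 0. Removing either fence makes the both-missed outcome permitted by the C11 memory model, and store-buffering reordering is observable on real hardware, x86 included; that appears to be the outcome the explicit barrier is meant to exclude.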

I'm trying to figure out whether or not the barriers implied by the IPI
are sufficient here, or whether we really need the explicit smp_mb().
Certainly, arch_send_call_function_single_ipi has to order the publishing
of the remote work before the signalling of the interrupt, but the comment
above refers to "the task_cpu() load" and I can't see that after your
diff.

What are you trying to order here?

Will

>  
>  	raw_spin_lock_irq(&ctx->lock);
> @@ -2351,12 +2373,16 @@ perf_install_in_context(struct perf_event_context *ctx,
>  		raw_spin_unlock_irq(&ctx->lock);
>  		return;
>  	}
> -	raw_spin_unlock_irq(&ctx->lock);
>  	/*
> -	 * Since !ctx->is_active doesn't mean anything, we must IPI
> -	 * unconditionally.
> +	 * If the task is not running, ctx->lock will avoid it becoming so,
> +	 * thus we can safely install the event.
>  	 */
> -	goto again;
> +	if (task_curr(task)) {
> +		raw_spin_unlock_irq(&ctx->lock);
> +		goto again;
> +	}
> +	add_event_to_ctx(event, ctx);
> +	raw_spin_unlock_irq(&ctx->lock);
>  }
>  
>  /*
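The retry logic in the new perf_install_in_context() can be traced with a minimal single-threaded C model: attempt the cross-call; if it does not find the task running, take ctx->lock and either retry (task_curr() now reports the task running) or add the event directly (the task is not running, and ctx->lock holds off perf_event_context_sched_in()). Everything here is a hypothetical stand-in: task_curr() is replaced by a scripted sequence of answers, a pthread mutex plays ctx->lock, and a counter plays add_event_to_ctx(). It traces the paths only and cannot reproduce the races discussed in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model types; not the kernel's structures. */
struct model_task {
	const bool *curr_script;	/* successive answers from task_curr() */
	int curr_idx;
};

struct model_ctx {
	pthread_mutex_t lock;		/* stands in for ctx->lock */
	int nr_events;			/* stands in for add_event_to_ctx() */
};

void model_ctx_init(struct model_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	ctx->nr_events = 0;
}

static bool model_task_curr(struct model_task *t)
{
	return t->curr_script[t->curr_idx++];
}

/*
 * Stand-in for task_function_call(task, __perf_install_in_context, event):
 * returns 0 if the handler ran with the task current and installed the
 * event, -ESRCH if the task was not running there.
 */
static int model_task_function_call(struct model_task *t, struct model_ctx *ctx)
{
	if (!model_task_curr(t))
		return -ESRCH;
	pthread_mutex_lock(&ctx->lock);
	ctx->nr_events++;		/* "reprogram" path: add and resched */
	pthread_mutex_unlock(&ctx->lock);
	return 0;
}

/* Returns the number of retries taken before the event was installed. */
int model_install(struct model_task *t, struct model_ctx *ctx)
{
	int retries = 0;

again:
	if (!model_task_function_call(t, ctx))
		return retries;

	pthread_mutex_lock(&ctx->lock);
	if (model_task_curr(t)) {
		/* Task started running after the cross-call: try again. */
		pthread_mutex_unlock(&ctx->lock);
		retries++;
		goto again;
	}
	/* Not running: holding ctx->lock keeps the sched-in path out. */
	ctx->nr_events++;
	pthread_mutex_unlock(&ctx->lock);
	return retries;
}
```

For example, a script of { false, true, false, false } takes one retry: the first cross-call misses, task_curr() then reports the task running, the second cross-call misses again, and the event is finally added under the lock with the task not running.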


Thread overview: 17+ messages
2016-12-07 13:53 Perf hotplug lockup in v4.9-rc8 Mark Rutland
2016-12-07 14:30 ` Mark Rutland
2016-12-07 16:39   ` Mark Rutland
2016-12-07 17:53 ` Mark Rutland
2016-12-07 18:34   ` Peter Zijlstra
2016-12-07 19:56     ` Mark Rutland
2016-12-09 13:59     ` Peter Zijlstra
2016-12-12 11:46       ` Will Deacon [this message]
2016-12-12 12:42         ` Peter Zijlstra
2016-12-22  8:45           ` Peter Zijlstra
2016-12-22 14:00             ` Peter Zijlstra
2016-12-22 16:33               ` Paul E. McKenney
2017-01-11 14:59       ` Mark Rutland
2017-01-11 16:03         ` Peter Zijlstra
2017-01-11 16:26           ` Mark Rutland
2017-01-11 19:51           ` Peter Zijlstra
2017-01-14 12:28       ` [tip:perf/urgent] perf/core: Fix sys_perf_event_open() vs. hotplug tip-bot for Peter Zijlstra
