From: Peter Zijlstra <peterz@infradead.org>
To: Thomas-Mich Richter <tmricht@linux.ibm.com>
Cc: Kees Cook <keescook@chromium.org>,
	acme@redhat.com,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	Hendrik Brueckner <brueckner@linux.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	mark.rutland@arm.com, jolsa@redhat.com
Subject: Re: WARN_ON_ONCE() hit at kernel/events/core.c:330
Date: Mon, 8 Apr 2019 11:50:31 +0200
Message-ID: <20190408095031.GG14281@hirez.programming.kicks-ass.net>
In-Reply-To: <20190408082229.GI4038@hirez.programming.kicks-ass.net>

On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:

> > very good news, your fix ran over the weekend without any hit!!!
> > 
> > Thanks very much for your help. Do you submit this patch to the kernel mailing list?
> 
> Most excellent, let me go write a Changelog.

Hi Thomas, find below.

Sadly, while writing the Changelog I ended up with a 'completely'
different patch again, could I bother you to test this one too?

---
Subject: perf: Fix perf_event_disable_inatomic()
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu, 4 Apr 2019 15:03:00 +0200

Thomas-Mich Richter reported he triggered a WARN from event_function_local()
on his s390. The problem boils down to:

  CPU-A				CPU-B

  perf_event_overflow()
    perf_event_disable_inatomic()
      @pending_disable = 1
      irq_work_queue();

  sched-out
    event_sched_out()
      @pending_disable = 0

				sched-in
				perf_event_overflow()
				  perf_event_disable_inatomic()
				    @pending_disable = 1;
				    irq_work_queue(); // FAILS

  irq_work_run()
    perf_pending_event()
      if (@pending_disable)
        perf_event_disable_local(); // WHOOPS

The problem is generic, but s390 is particularly sensitive
because it doesn't implement arch_irq_work_raise(), nor does it call
irq_work_run() from its PMU interrupt handler (nor would that be
sufficient in this case, because s390 also generates
perf_event_overflow() from pmu::stop). Add to that the fact that s390
is a virtual architecture and (virtual) CPU-A can stall long enough
for the above race to happen, even if it would self-IPI.

Adding an irq_work_sync() to event_sched_in() would work for all hardware
PMUs that properly use irq_work_run() but fails for software PMUs.

Instead encode the CPU number in @pending_disable, such that we can
tell which CPU requested the disable. This then allows us to detect
the above scenario and even redirect the IPI to make up for the failed
queue.
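
To illustrate the encoding, here is a minimal user-space sketch of the
decision the pending handler has to make. It is only a model: the
model_* names, NO_PENDING_DISABLE and the enum are hypothetical and not
kernel API, and it ignores READ_ONCE()/WRITE_ONCE() and the actual
irq_work queueing. It assumes -1 means "no disable pending" and any
value >= 0 is the CPU that requested the disable.

```c
#include <assert.h>

/* Hypothetical names for illustration only; none of this is kernel API. */
#define NO_PENDING_DISABLE (-1)

enum pending_action {
	PENDING_NONE,          /* nothing was requested */
	PENDING_DISABLE_LOCAL, /* requested by this CPU: disable here */
	PENDING_REDIRECT,      /* requested by another CPU: re-queue there */
};

struct model_event {
	int pending_disable;   /* -1, or the CPU that requested the disable */
};

/* Models perf_event_disable_inatomic(): record which CPU asked. */
static void model_request_disable(struct model_event *ev, int this_cpu)
{
	assert(this_cpu >= 0);
	ev->pending_disable = this_cpu;
}

/*
 * Models the check done from the pending handler running on this_cpu.
 * On PENDING_REDIRECT, *target_cpu is set to the CPU that should run
 * the work instead and the request is left in place for that CPU.
 */
static enum pending_action model_pending_disable(struct model_event *ev,
						 int this_cpu, int *target_cpu)
{
	int cpu = ev->pending_disable;

	if (cpu < 0)
		return PENDING_NONE;

	if (cpu == this_cpu) {
		ev->pending_disable = NO_PENDING_DISABLE;
		return PENDING_DISABLE_LOCAL;
	}

	*target_cpu = cpu;
	return PENDING_REDIRECT;
}
```

In the race above, CPU-A's stale handler sees a request tagged with
CPU-B and gets PENDING_REDIRECT rather than disabling locally; the
handler on CPU-B then gets PENDING_DISABLE_LOCAL and clears the request.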

Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Hendrik Brueckner <brueckner@linux.ibm.com>
Cc: acme@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Reported-by: Thomas-Mich Richter <tmricht@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   52 ++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 43 insertions(+), 9 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event
 	event->pmu->del(event, 0);
 	event->oncpu = -1;
 
-	if (event->pending_disable) {
-		event->pending_disable = 0;
+	if (READ_ONCE(event->pending_disable) >= 0) {
+		WRITE_ONCE(event->pending_disable, -1);
 		state = PERF_EVENT_STATE_OFF;
 	}
 	perf_event_set_state(event, state);
@@ -2198,7 +2198,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
 
 void perf_event_disable_inatomic(struct perf_event *event)
 {
-	event->pending_disable = 1;
+	WRITE_ONCE(event->pending_disable, smp_processor_id());
+	/* can fail, see perf_pending_event_disable() */
 	irq_work_queue(&event->pending);
 }
 
@@ -5810,10 +5811,45 @@ void perf_event_wakeup(struct perf_event
 	}
 }
 
+static void perf_pending_event_disable(struct perf_event *event)
+{
+	int cpu = READ_ONCE(event->pending_disable);
+
+	if (cpu < 0)
+		return;
+
+	if (cpu == smp_processor_id()) {
+		WRITE_ONCE(event->pending_disable, -1);
+		perf_event_disable_local(event);
+		return;
+	}
+
+	/*
+	 *  CPU-A			CPU-B
+	 *
+	 *  perf_event_disable_inatomic()
+	 *    @pending_disable = CPU-A;
+	 *    irq_work_queue();
+	 *
+	 *  sched-out
+	 *    @pending_disable = -1;
+	 *
+	 *				sched-in
+	 *				perf_event_disable_inatomic()
+	 *				  @pending_disable = CPU-B;
+	 *				  irq_work_queue(); // FAILS
+	 *
+	 *  irq_work_run()
+	 *    perf_pending_event()
+	 *
+	 * But the event runs on CPU-B and wants disabling there.
+	 */
+	irq_work_queue_on(&event->pending, cpu);
+}
+
 static void perf_pending_event(struct irq_work *entry)
 {
-	struct perf_event *event = container_of(entry,
-			struct perf_event, pending);
+	struct perf_event *event = container_of(entry, struct perf_event, pending);
 	int rctx;
 
 	rctx = perf_swevent_get_recursion_context();
@@ -5822,10 +5858,7 @@ static void perf_pending_event(struct ir
 	 * and we won't recurse 'further'.
 	 */
 
-	if (event->pending_disable) {
-		event->pending_disable = 0;
-		perf_event_disable_local(event);
-	}
+	perf_pending_event_disable(event);
 
 	if (event->pending_wakeup) {
 		event->pending_wakeup = 0;
@@ -10236,6 +10269,7 @@ perf_event_alloc(struct perf_event_attr
 
 
 	init_waitqueue_head(&event->waitq);
+	event->pending_disable = -1;
 	init_irq_work(&event->pending, perf_pending_event);
 
 	mutex_init(&event->mmap_mutex);
