All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Vince Weaver <vince@deater.net>
Cc: mingo@kernel.org, alexander.shishkin@linux.intel.com,
	eranian@google.com, linux-kernel@vger.kernel.org,
	dvyukov@google.com, andi@firstfloor.org, jolsa@redhat.com,
	panand@redhat.com, sasha.levin@oracle.com, oleg@redhat.com,
	Borislav Petkov <bp@alien8.de>
Subject: Re: [PATCH 00/12] perf: more fixes
Date: Wed, 16 Mar 2016 23:59:33 +0100	[thread overview]
Message-ID: <20160316225933.GS6375@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20160315153830.GA6356@twins.programming.kicks-ass.net>

On Tue, Mar 15, 2016 at 04:38:30PM +0100, Peter Zijlstra wrote:

> Running perf_fuzzer on that AMD box is still producing lots of fail, I
> seen long strings of dazed and confused msgs, indicating we have a
> 'spurious' NMI problem somewhere.
> 
> And occasionally it locks up..
> 
> So we're not there yet.

So the below appears to alleviate some of this; but the hangs are
quicker now, so maybe I just made it worse.

---
Subject: perf, ibs: Fix race with IBS_STARTING state
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Mar 16 23:55:21 CET 2016

While tracing the IBS bits I saw the NMI hitting between clearing
IBS_STARTING and the actual MSR writes to disable the counter.

Since IBS_STARTING was cleared, the handler assumed these were spurious
NMIs and because STOPPING wasn't set yet either, insta-triggered an
"Unknown NMI".

Cure this by clearing IBS_STARTING after disabling the hardware.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/events/amd/ibs.c |   32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

--- a/arch/x86/events/amd/ibs.c
+++ b/arch/x86/events/amd/ibs.c
@@ -376,7 +376,13 @@ static void perf_ibs_start(struct perf_e
 	hwc->state = 0;
 
 	perf_ibs_set_period(perf_ibs, hwc, &period);
+	/*
+	 * Set STARTED before enabling the hardware, such that
+	 * a subsequent NMI must observe it. Then clear STOPPING
+	 * such that we don't consume NMIs by accident.
+	 */
 	set_bit(IBS_STARTED, pcpu->state);
+	clear_bit(IBS_STOPPING, pcpu->state);
 	perf_ibs_enable_event(perf_ibs, hwc, period >> 4);
 
 	perf_event_update_userpage(event);
@@ -390,7 +396,7 @@ static void perf_ibs_stop(struct perf_ev
 	u64 config;
 	int stopping;
 
-	stopping = test_and_clear_bit(IBS_STARTED, pcpu->state);
+	stopping = test_bit(IBS_STARTED, pcpu->state);
 
 	if (!stopping && (hwc->state & PERF_HES_UPTODATE))
 		return;
@@ -398,8 +404,24 @@ static void perf_ibs_stop(struct perf_ev
 	rdmsrl(hwc->config_base, config);
 
 	if (stopping) {
+		/*
+		 * Set STOPPING before disabling the hardware, such that it
+		 * must be visible to NMIs the moment we clear the EN bit,
+		 * at which point we can generate an !VALID sample which
+		 * we need to consume.
+		 */
 		set_bit(IBS_STOPPING, pcpu->state);
 		perf_ibs_disable_event(perf_ibs, hwc, config);
+		/*
+		 * Clear STARTED after disabling the hardware; if it were
+		 * cleared before an NMI hitting after the clear but before
+		 * clearing the EN bit might think it a spurious NMI and not
+		 * handle it.
+		 *
+		 * Clearing it after, however, creates the problem of the NMI
+		 * handler seeing STARTED but not having a valid sample.
+		 */
+		clear_bit(IBS_STARTED, pcpu->state);
 		WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
 		hwc->state |= PERF_HES_STOPPED;
 	}
@@ -527,20 +549,24 @@ static int perf_ibs_handle_irq(struct pe
 	u64 *buf, *config, period;
 
 	if (!test_bit(IBS_STARTED, pcpu->state)) {
+fail:
 		/*
 		 * Catch spurious interrupts after stopping IBS: After
 		 * disabling IBS there could be still incoming NMIs
 		 * with samples that even have the valid bit cleared.
 		 * Mark all this NMIs as handled.
 		 */
-		return test_and_clear_bit(IBS_STOPPING, pcpu->state) ? 1 : 0;
+		if (test_and_clear_bit(IBS_STOPPING, pcpu->state))
+			return 1;
+
+		return 0;
 	}
 
 	msr = hwc->config_base;
 	buf = ibs_data.regs;
 	rdmsrl(msr, *buf);
 	if (!(*buf++ & perf_ibs->valid_mask))
-		return 0;
+		goto fail;
 
 	config = &ibs_data.regs[0];
 	perf_ibs_event_update(perf_ibs, event, config);

  reply	other threads:[~2016-03-16 22:59 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-24 17:45 [PATCH 00/12] perf: more fixes Peter Zijlstra
2016-02-24 17:45 ` [PATCH 01/12] perf: Close install vs exit race Peter Zijlstra
2016-02-25  8:03   ` [tip:perf/urgent] perf: Close install vs. " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 02/12] perf: Do not double free Peter Zijlstra
2016-02-25  8:04   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 03/12] perf: Allow perf_release() with !event->ctx Peter Zijlstra
2016-02-25  8:04   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 04/12] perf: Only update context time when active Peter Zijlstra
2016-02-25  8:04   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 05/12] perf: Fix cloning Peter Zijlstra
2016-02-25  8:05   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 06/12] perf: Fix race between event install and jump_labels Peter Zijlstra
2016-02-25  8:05   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 07/12] perf: Cure event->pending_disable race Peter Zijlstra
2016-02-25  8:06   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 08/12] perf: Introduce EVENT_TIME Peter Zijlstra
2016-02-25  8:06   ` [tip:perf/urgent] perf: Fix ctx time tracking by introducing EVENT_TIME tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 09/12] perf: Fix scaling vs enable_on_exec Peter Zijlstra
2016-02-25  8:07   ` [tip:perf/urgent] perf: Fix scaling vs. perf_event_enable_on_exec() tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 10/12] perf: Fix scaling vs perf_event_enable Peter Zijlstra
2016-02-25  8:07   ` [tip:perf/urgent] perf: Fix scaling vs. perf_event_enable() tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 11/12] perf: Fix scaling vs perf_install_in_context Peter Zijlstra
2016-02-25  8:07   ` [tip:perf/urgent] perf: Fix scaling vs. perf_install_in_context() tip-bot for Peter Zijlstra
2016-02-24 17:45 ` [PATCH 12/12] perf: Robustify task_function_call() Peter Zijlstra
2016-02-25  8:08   ` [tip:perf/urgent] " tip-bot for Peter Zijlstra
2016-03-10 14:39 ` [PATCH 00/12] perf: more fixes Peter Zijlstra
2016-03-10 14:44   ` Vince Weaver
2016-03-11 14:23     ` Peter Zijlstra
2016-03-11 15:41       ` Borislav Petkov
2016-03-21  9:49       ` [tip:perf/urgent] perf/x86/ibs: Fix IBS throttle tip-bot for Peter Zijlstra
2016-03-15 15:38     ` [PATCH 00/12] perf: more fixes Peter Zijlstra
2016-03-16 22:59       ` Peter Zijlstra [this message]
2016-03-16 23:10         ` Peter Zijlstra
2016-03-17 11:54         ` Borislav Petkov
2016-03-11 10:12   ` Alexander Shishkin
2016-03-21  9:48   ` [tip:perf/urgent] perf/core: Fix the unthrottle logic tip-bot for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160316225933.GS6375@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=andi@firstfloor.org \
    --cc=bp@alien8.de \
    --cc=dvyukov@google.com \
    --cc=eranian@google.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=panand@redhat.com \
    --cc=sasha.levin@oracle.com \
    --cc=vince@deater.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.