linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: peterz@infradead.org
Cc: linux-kernel@vger.kernel.org, Andi Kleen <ak@linux.intel.com>
Subject: [PATCH 1/4] x86, perf: Use a new PMU ack sequence on Skylake
Date: Thu, 15 Oct 2015 16:37:57 -0700	[thread overview]
Message-ID: <1444952280-24184-2-git-send-email-andi@firstfloor.org> (raw)
In-Reply-To: <1444952280-24184-1-git-send-email-andi@firstfloor.org>

From: Andi Kleen <ak@linux.intel.com>

The SKL PMU code had a problem with LBR freezing. When a counter
overflows already in the PMI handler, the LBR would be frozen
early and not be unfrozen until the next PMI. This means we would
get stale LBR information.

Depending on the workload this could happen for a few percent
of the PMIs for cycles in adaptive frequency mode, because the frequency
algorithm regularly goes down to very low periods.

This patch implements a new PMU ack sequence that avoids this problem.
The new sequence is:

- (counters are disabled with GLOBAL_CTRL)
There should be no further increments of the counters by later instructions; and
thus no additional PMI (and thus no additional freezing).

- ack the APIC

Clear the APIC PMI LVT entry so that any later interrupt is delivered and is
not lost due to the PMI LVT entry being masked. A lost PMI interrupt could lead to
LBRs staying frozen without entering the PMI handler

- Ack the PMU counters. This unfreezes the LBRs on Skylake (but not
on earlier CPUs which rely on DEBUGCTL writes for this)

- Reenable counters

The WRMSR will start the counters counting again (and will be ordered after the
APIC LVT PMI entry write since WRMSR is architecturally serializing). Since the
APIC PMI LVT is unmasked, any PMI which is caused by these perfmon counters
will trigger an NMI (but the delivery may be delayed until after the next
IRET)

One side effect is that the old retry loop is not possible anymore,
as the counters stay unacked for the majority of the PMI handler,
but that is not a big loss, as "profiling" the PMI was always
a bit dubious. For the old ack sequence it is still supported.

In principle the sequence should work on other CPUs too, but
since I only tested on Skylake it is only enabled there.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 arch/x86/kernel/cpu/perf_event.h       |  1 +
 arch/x86/kernel/cpu/perf_event_intel.c | 35 ++++++++++++++++++++++++++--------
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index 499f533..fcf01c7 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h
@@ -551,6 +551,7 @@ struct x86_pmu {
 	struct x86_pmu_quirk *quirks;
 	int		perfctr_second_write;
 	bool		late_ack;
+	bool		status_ack_after_apic;
 	unsigned	(*limit_period)(struct perf_event *event, unsigned l);
 
 	/*
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index f63360b..69a545e 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1789,6 +1789,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	struct cpu_hw_events *cpuc;
 	int bit, loops;
 	u64 status;
+	u64 orig_status;
 	int handled;
 
 	cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -1803,13 +1804,16 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 	handled = intel_pmu_drain_bts_buffer();
 	handled += intel_bts_interrupt();
 	status = intel_pmu_get_status();
+	orig_status = status;
 	if (!status)
 		goto done;
 
 	loops = 0;
 again:
 	intel_pmu_lbr_read();
-	intel_pmu_ack_status(status);
+	if (!x86_pmu.status_ack_after_apic)
+		__intel_pmu_enable_all(0, true);
+
 	if (++loops > 100) {
 		static bool warned = false;
 		if (!warned) {
@@ -1877,15 +1881,20 @@ again:
 			x86_pmu_stop(event, 0);
 	}
 
-	/*
-	 * Repeat if there is more work to be done:
-	 */
-	status = intel_pmu_get_status();
-	if (status)
-		goto again;
+
+	if (!x86_pmu.status_ack_after_apic) {
+		/*
+		 * Repeat if there is more work to be done:
+		 */
+		status = intel_pmu_get_status();
+		if (status)
+			goto again;
+	}
 
 done:
-	__intel_pmu_enable_all(0, true);
+	if (!x86_pmu.status_ack_after_apic)
+		__intel_pmu_enable_all(0, true);
+
 	/*
 	 * Only unmask the NMI after the overflow counters
 	 * have been reset. This avoids spurious NMIs on
@@ -1893,6 +1902,15 @@ done:
 	 */
 	if (x86_pmu.late_ack)
 		apic_write(APIC_LVTPC, APIC_DM_NMI);
+
+	/*
+	 * Ack the PMU late. This avoids bogus freezing
+	 * on Skylake CPUs.
+	 */
+	if (x86_pmu.status_ack_after_apic) {
+		intel_pmu_ack_status(orig_status);
+		__intel_pmu_enable_all(0, true);
+	}
 	return handled;
 }
 
@@ -3514,6 +3532,7 @@ __init int intel_pmu_init(void)
 	case 78: /* 14nm Skylake Mobile */
 	case 94: /* 14nm Skylake Desktop */
 		x86_pmu.late_ack = true;
+		x86_pmu.status_ack_after_apic = true;
 		memcpy(hw_cache_event_ids, skl_hw_cache_event_ids, sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
 		intel_pmu_lbr_init_skl();
-- 
2.4.3


  reply	other threads:[~2015-10-15 23:38 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-15 23:37 perf: Some improvements for Skylake perf Andi Kleen
2015-10-15 23:37 ` Andi Kleen [this message]
2015-10-16 11:51   ` [PATCH 1/4] x86, perf: Use a new PMU ack sequence on Skylake Peter Zijlstra
2015-10-16 13:35     ` Andi Kleen
2015-10-16 15:00       ` Peter Zijlstra
2015-10-16 16:14         ` Mike Galbraith
2015-10-19  7:08         ` Ingo Molnar
2015-10-15 23:37 ` [PATCH 2/4] x86, perf: Factor out BTS enable/disable functions Andi Kleen
2015-10-15 23:37 ` [PATCH 3/4] perf, x86: Use counter freezing with Arch Perfmon v4 Andi Kleen
2015-10-15 23:38 ` [PATCH 4/4] x86, perf: Use INST_RETIRED.PREC_DIST for cycles:pp on Skylake Andi Kleen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1444952280-24184-2-git-send-email-andi@firstfloor.org \
    --to=andi@firstfloor.org \
    --cc=ak@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).