linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler
@ 2018-08-06 17:23 kan.liang
  2018-08-06 17:23 ` [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 " kan.liang
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: kan.liang @ 2018-08-06 17:23 UTC (permalink / raw)
  To: peterz, tglx, mingo, acme, linux-kernel
  Cc: eranian, ak, alexander.shishkin, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The Arch Perfmon v4 PMI handler is substantially different from
the older PMI handler. Instead of adding more and more ifs, cleanly
fork the new handler into a new function, with the main common
code factored out into a common function.

Fix a complaint from checkpatch.pl by removing the redundant "= false"
initializer from "static bool warned".

No functional change.

Based-on-code-from: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c | 109 ++++++++++++++++++++++++-------------------
 1 file changed, 60 insertions(+), 49 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 035c374..9b320a5 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2200,59 +2200,15 @@ static void intel_pmu_reset(void)
 	local_irq_restore(flags);
 }
 
-/*
- * This handler is triggered by the local APIC, so the APIC IRQ handling
- * rules apply:
- */
-static int intel_pmu_handle_irq(struct pt_regs *regs)
+static int handle_pmi_common(struct pt_regs *regs, u64 status)
 {
 	struct perf_sample_data data;
-	struct cpu_hw_events *cpuc;
-	int bit, loops;
-	u64 status;
-	int handled;
-	int pmu_enabled;
-
-	cpuc = this_cpu_ptr(&cpu_hw_events);
-
-	/*
-	 * Save the PMU state.
-	 * It needs to be restored when leaving the handler.
-	 */
-	pmu_enabled = cpuc->enabled;
-	/*
-	 * No known reason to not always do late ACK,
-	 * but just in case do it opt-in.
-	 */
-	if (!x86_pmu.late_ack)
-		apic_write(APIC_LVTPC, APIC_DM_NMI);
-	intel_bts_disable_local();
-	cpuc->enabled = 0;
-	__intel_pmu_disable_all();
-	handled = intel_pmu_drain_bts_buffer();
-	handled += intel_bts_interrupt();
-	status = intel_pmu_get_status();
-	if (!status)
-		goto done;
-
-	loops = 0;
-again:
-	intel_pmu_lbr_read();
-	intel_pmu_ack_status(status);
-	if (++loops > 100) {
-		static bool warned = false;
-		if (!warned) {
-			WARN(1, "perfevents: irq loop stuck!\n");
-			perf_event_print_debug();
-			warned = true;
-		}
-		intel_pmu_reset();
-		goto done;
-	}
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int bit;
+	int handled = 0;
 
 	inc_irq_stat(apic_perf_irqs);
 
-
 	/*
 	 * Ignore a range of extra bits in status that do not indicate
 	 * overflow by themselves.
@@ -2261,7 +2217,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 		    GLOBAL_STATUS_ASIF |
 		    GLOBAL_STATUS_LBRS_FROZEN);
 	if (!status)
-		goto done;
+		return 0;
 	/*
 	 * In case multiple PEBS events are sampled at the same time,
 	 * it is possible to have GLOBAL_STATUS bit 62 set indicating
@@ -2331,6 +2287,61 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
 			x86_pmu_stop(event, 0);
 	}
 
+	return handled;
+}
+
+/*
+ * This handler is triggered by the local APIC, so the APIC IRQ handling
+ * rules apply:
+ */
+static int intel_pmu_handle_irq(struct pt_regs *regs)
+{
+	struct cpu_hw_events *cpuc;
+	int loops;
+	u64 status;
+	int handled;
+	int pmu_enabled;
+
+	cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	/*
+	 * Save the PMU state.
+	 * It needs to be restored when leaving the handler.
+	 */
+	pmu_enabled = cpuc->enabled;
+	/*
+	 * No known reason to not always do late ACK,
+	 * but just in case do it opt-in.
+	 */
+	if (!x86_pmu.late_ack)
+		apic_write(APIC_LVTPC, APIC_DM_NMI);
+	intel_bts_disable_local();
+	cpuc->enabled = 0;
+	__intel_pmu_disable_all();
+	handled = intel_pmu_drain_bts_buffer();
+	handled += intel_bts_interrupt();
+	status = intel_pmu_get_status();
+	if (!status)
+		goto done;
+
+	loops = 0;
+again:
+	intel_pmu_lbr_read();
+	intel_pmu_ack_status(status);
+	if (++loops > 100) {
+		static bool warned;
+
+		if (!warned) {
+			WARN(1, "perfevents: irq loop stuck!\n");
+			perf_event_print_debug();
+			warned = true;
+		}
+		intel_pmu_reset();
+		goto done;
+	}
+
+	handled += handle_pmi_common(regs, status);
+
 	/*
 	 * Repeat if there is more work to be done:
 	 */
-- 
2.7.4


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 PMI handler
  2018-08-06 17:23 [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler kan.liang
@ 2018-08-06 17:23 ` kan.liang
  2018-08-06 18:35   ` Peter Zijlstra
  2018-08-06 17:23 ` [PATCH 3/3] perf/x86/intel: Add quirk for Goldmont Plus kan.liang
  2018-08-06 18:20 ` [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler Peter Zijlstra
  2 siblings, 1 reply; 12+ messages in thread
From: kan.liang @ 2018-08-06 17:23 UTC (permalink / raw)
  To: peterz, tglx, mingo, acme, linux-kernel
  Cc: eranian, ak, alexander.shishkin, Kan Liang

From: Andi Kleen <ak@linux.intel.com>

Implement counter freezing for Arch Perfmon v4 (Skylake and
newer). This speeds up the PMI handler by avoiding unnecessary
MSR writes, and makes it more accurate.

The Arch Perfmon v4 PMI handler is substantially different from
the older PMI handler.

Differences from the old handler:
- It relies on counter freezing, which eliminates several MSR
writes from the PMI handler and lowers the overhead significantly.

It makes the PMI handler more accurate, as all counters get
frozen atomically as soon as any counter overflows. So there is
much less counting of the PMI handler itself.

With the freezing we don't need to disable or enable counters or
PEBS. Only BTS, which does not support auto-freezing, still needs to
be explicitly managed.

- The PMU acking is done at the end, not the beginning.
This makes it possible to avoid manual enabling/disabling
of the PMU, instead we just rely on the freezing/acking.

- The APIC is acked before reenabling the PMU, which avoids
problems with LBRs occasionally not getting unfrozen on Skylake.

- Looping is only needed to work around a corner case in which several
PMIs occur very close to each other. In the common case the counters
stay frozen during the PMI handler, so there is no need to re-check.

This patch:
- Adds code to enable v4 counter freezing.
- Forks the <=v3 and >=v4 PMI handlers into separate functions.
- Adds a kernel parameter to disable counter freezing. It took some
  time to debug counter freezing, so in case there are new problems
  we added an option to turn it off. We would not expect this to be
  used until there are new bugs.
- Covers big core only. The patch for small core will be posted
  separately later.

Performance:

When profiling a kernel build on Kabylake with different perf options,
measuring the length of all NMI handlers using the nmi handler
tracepoint:

V3 is without counter freezing.
V4 is with counter freezing.
The value is the average cost of the PMI handler.
(lower is better)

perf options                  V3(ns)  V4(ns)  delta
-c 100000                     1088    894     -18%
-g -c 100000                  1862    1646    -12%
--call-graph lbr -c 100000    3649    3367    -8%
--call-graph dwarf -c 100000  2248    1982    -12%

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c     | 122 +++++++++++++++++++++++++++++++++++++++
 arch/x86/events/perf_event.h     |   6 ++
 arch/x86/include/asm/msr-index.h |   1 +
 3 files changed, 129 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 9b320a5..03778ce 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -1995,6 +1995,18 @@ static void intel_pmu_nhm_enable_all(int added)
 	intel_pmu_enable_all(added);
 }
 
+static void enable_counter_freeze(void)
+{
+	update_debugctlmsr(get_debugctlmsr() |
+			DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI);
+}
+
+static void disable_counter_freeze(void)
+{
+	update_debugctlmsr(get_debugctlmsr() &
+			~DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI);
+}
+
 static inline u64 intel_pmu_get_status(void)
 {
 	u64 status;
@@ -2044,6 +2056,14 @@ static void intel_pmu_disable_event(struct perf_event *event)
 	if (unlikely(event->attr.precise_ip))
 		intel_pmu_pebs_disable(event);
 
+	/*
+	 * We could disable freezing here, but doesn't hurt if it's on.
+	 * perf remembers the state, and someone else will likely
+	 * reinitialize.
+	 *
+	 * This avoids an extra MSR write in many situations.
+	 */
+
 	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
 		intel_pmu_disable_fixed(hwc);
 		return;
@@ -2119,6 +2139,11 @@ static void intel_pmu_enable_event(struct perf_event *event)
 	if (event->attr.exclude_guest)
 		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
 
+	if (x86_pmu.counter_freezing && !cpuc->frozen_enabled) {
+		enable_counter_freeze();
+		cpuc->frozen_enabled = 1;
+	}
+
 	if (unlikely(event_is_checkpointed(event)))
 		cpuc->intel_cp_status |= (1ull << hwc->idx);
 
@@ -2290,6 +2315,87 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
 	return handled;
 }
 
+static bool disable_counter_freezing;
+module_param(disable_counter_freezing, bool, 0444);
+MODULE_PARM_DESC(disable_counter_freezing, "Disable counter freezing feature. "
+		"The PMI handler will fall back to generic handler. "
+		"Default is false (enable counter freezing feature).");
+
+/*
+ * Simplified handler for Arch Perfmon v4:
+ * - We rely on counter freezing/unfreezing to enable/disable the PMU.
+ * This is done automatically on PMU ack.
+ * - Ack the PMU only after the APIC.
+ */
+
+static int intel_pmu_handle_irq_v4(struct pt_regs *regs)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	int handled = 0;
+	bool bts = false;
+	u64 status;
+	int pmu_enabled = cpuc->enabled;
+	int loops = 0;
+
+	/* PMU has been disabled because of counter freezing */
+	cpuc->enabled = 0;
+	if (test_bit(INTEL_PMC_IDX_FIXED_BTS, cpuc->active_mask)) {
+		bts = true;
+		intel_bts_disable_local();
+		handled = intel_pmu_drain_bts_buffer();
+		handled += intel_bts_interrupt();
+	}
+	status = intel_pmu_get_status();
+	if (!status)
+		goto done;
+again:
+	intel_pmu_lbr_read();
+	if (++loops > 100) {
+		static bool warned;
+
+		if (!warned) {
+			WARN(1, "perfevents: irq loop stuck!\n");
+			perf_event_print_debug();
+			warned = true;
+		}
+		intel_pmu_reset();
+		goto done;
+	}
+
+
+	handled += handle_pmi_common(regs, status);
+done:
+	/* Ack the PMI in the APIC */
+	apic_write(APIC_LVTPC, APIC_DM_NMI);
+
+	/*
+	 * Ack the PMU late, after the APIC.  This avoids bogus
+	 * freezing on Skylake CPUs.  The acking unfreezes the PMU.
+	 */
+	if (status) {
+		intel_pmu_ack_status(status);
+	} else {
+		/*
+		 * The CPU may issue two PMIs very close to each other.
+		 * When the PMI handler services the first one, the
+		 * GLOBAL_STATUS is already updated to reflect both.
+		 * When it IRETs, the second PMI is immediately
+		 * handled and sees a clear status. In the meantime
+		 * there may be a third PMI, because the freezing bit
+		 * was cleared by the ack in the first PMI handler.
+		 * Double check if there is more work to be done.
+		 */
+		status = intel_pmu_get_status();
+		if (status)
+			goto again;
+	}
+
+	if (bts)
+		intel_bts_enable_local();
+	cpuc->enabled = pmu_enabled;
+	return handled;
+}
+
 /*
  * This handler is triggered by the local APIC, so the APIC IRQ handling
  * rules apply:
@@ -3432,6 +3538,11 @@ static void intel_pmu_cpu_dying(int cpu)
 	free_excl_cntrs(cpu);
 
 	fini_debug_store_on_cpu(cpu);
+
+	if (cpuc->frozen_enabled) {
+		cpuc->frozen_enabled = 0;
+		disable_counter_freeze();
+	}
 }
 
 static void intel_pmu_sched_task(struct perf_event_context *ctx,
@@ -4325,6 +4436,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.extra_regs = intel_skl_extra_regs;
 		x86_pmu.pebs_aliases = intel_pebs_aliases_skl;
 		x86_pmu.pebs_prec_dist = true;
+		x86_pmu.counter_freezing = disable_counter_freezing ?
+					   false : true;
 		/* all extra regs are per-cpu when HT is on */
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_NO_HT_SHARING;
@@ -4442,6 +4555,15 @@ __init int intel_pmu_init(void)
 		pr_cont("full-width counters, ");
 	}
 
+	/*
+	 * For arch perfmon 4 use counter freezing to avoid
+	 * several MSR accesses in the PMI.
+	 */
+	if (x86_pmu.counter_freezing) {
+		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
+		pr_cont("counter freezing, ");
+	}
+
 	kfree(to_free);
 	return 0;
 }
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 1562863..f9d153e 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -243,6 +243,11 @@ struct cpu_hw_events {
 	int excl_thread_id; /* 0 or 1 */
 
 	/*
+	 * Counter freezing state.
+	 */
+	int				frozen_enabled;
+
+	/*
 	 * AMD specific bits
 	 */
 	struct amd_nb			*amd_nb;
@@ -561,6 +566,7 @@ struct x86_pmu {
 	struct x86_pmu_quirk *quirks;
 	int		perfctr_second_write;
 	bool		late_ack;
+	bool		counter_freezing;
 	u64		(*limit_period)(struct perf_event *event, u64 l);
 
 	/*
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 68b2c31..4ae4a59 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -157,6 +157,7 @@
 #define DEBUGCTLMSR_BTS_OFF_OS		(1UL <<  9)
 #define DEBUGCTLMSR_BTS_OFF_USR		(1UL << 10)
 #define DEBUGCTLMSR_FREEZE_LBRS_ON_PMI	(1UL << 11)
+#define DEBUGCTLMSR_FREEZE_PERFMON_ON_PMI	(1UL << 12)
 #define DEBUGCTLMSR_FREEZE_IN_SMM_BIT	14
 #define DEBUGCTLMSR_FREEZE_IN_SMM	(1UL << DEBUGCTLMSR_FREEZE_IN_SMM_BIT)
 
-- 
2.7.4



* [PATCH 3/3] perf/x86/intel: Add quirk for Goldmont Plus
  2018-08-06 17:23 [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler kan.liang
  2018-08-06 17:23 ` [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 " kan.liang
@ 2018-08-06 17:23 ` kan.liang
  2018-08-06 18:39   ` Peter Zijlstra
  2018-08-06 18:20 ` [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler Peter Zijlstra
  2 siblings, 1 reply; 12+ messages in thread
From: kan.liang @ 2018-08-06 17:23 UTC (permalink / raw)
  To: peterz, tglx, mingo, acme, linux-kernel
  Cc: eranian, ak, alexander.shishkin, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

A microcode (ucode) patch is needed for Goldmont Plus when the counter
freezing feature is enabled. Otherwise there are issues, e.g. a PMI
flood with some events.

Add a quirk to check the microcode version, and only enable the counter
freezing feature on machines whose microcode carries the fix.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/core.c | 55 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 03778ce..10ab9a1 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3847,6 +3847,58 @@ static __init void intel_nehalem_quirk(void)
 	}
 }
 
+static bool intel_glk_counter_freezing_broken(int cpu)
+{
+	u32 rev = UINT_MAX; /* default to broken for unknown stepping */
+
+	switch (cpu_data(cpu).x86_stepping) {
+	case 1:
+		rev = 0x28;
+		break;
+	case 8:
+		rev = 0x6;
+		break;
+	}
+
+	return (cpu_data(cpu).microcode < rev);
+}
+
+static void intel_glk_check_microcode(void)
+{
+	bool counter_freezing_broken = false;
+	int cpu;
+
+	if (disable_counter_freezing)
+		return;
+
+	for_each_online_cpu(cpu) {
+		counter_freezing_broken = intel_glk_counter_freezing_broken(cpu);
+		if (counter_freezing_broken)
+			break;
+	}
+
+	if (counter_freezing_broken == !x86_pmu.counter_freezing)
+		return;
+
+	if (x86_pmu.counter_freezing) {
+		pr_info("PMU counter freezing disabled due to CPU errata, please upgrade microcode\n");
+		x86_pmu.counter_freezing = false;
+		x86_pmu.handle_irq = intel_pmu_handle_irq;
+	} else {
+		pr_info("PMU counter freezing enabled due to microcode update\n");
+		x86_pmu.counter_freezing = true;
+		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
+	}
+}
+
+static __init void intel_counter_freezing_quirk(void)
+{
+	x86_pmu.check_microcode = intel_glk_check_microcode;
+	cpus_read_lock();
+	intel_glk_check_microcode();
+	cpus_read_unlock();
+}
+
 /*
  * enable software workaround for errata:
  * SNB: BJ122
@@ -4191,6 +4243,7 @@ __init int intel_pmu_init(void)
 		break;
 
 	case INTEL_FAM6_ATOM_GEMINI_LAKE:
+		x86_add_quirk(intel_counter_freezing_quirk);
 		memcpy(hw_cache_event_ids, glp_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, glp_hw_cache_extra_regs,
@@ -4207,6 +4260,8 @@ __init int intel_pmu_init(void)
 		x86_pmu.pebs_aliases = NULL;
 		x86_pmu.pebs_prec_dist = true;
 		x86_pmu.lbr_pt_coexist = true;
+		x86_pmu.counter_freezing = disable_counter_freezing ?
+					   false : true;
 		x86_pmu.flags |= PMU_FL_HAS_RSP_1;
 		x86_pmu.flags |= PMU_FL_PEBS_ALL;
 		x86_pmu.get_event_constraints = glp_get_event_constraints;
-- 
2.7.4



* Re: [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler
  2018-08-06 17:23 [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler kan.liang
  2018-08-06 17:23 ` [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 " kan.liang
  2018-08-06 17:23 ` [PATCH 3/3] perf/x86/intel: Add quirk for Goldmont Plus kan.liang
@ 2018-08-06 18:20 ` Peter Zijlstra
  2018-08-07 15:29   ` Liang, Kan
  2 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2018-08-06 18:20 UTC (permalink / raw)
  To: kan.liang
  Cc: tglx, mingo, acme, linux-kernel, eranian, ak, alexander.shishkin

On Mon, Aug 06, 2018 at 10:23:41AM -0700, kan.liang@linux.intel.com wrote:
> +	if (++loops > 100) {
> +		static bool warned;
> +
> +		if (!warned) {
> +			WARN(1, "perfevents: irq loop stuck!\n");
> +			perf_event_print_debug();
> +			warned = true;
> +		}

Bah, that really reads like we want WARN_ONCE(), except for that
perf_event_print_debug() thing :/


* Re: [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 PMI handler
  2018-08-06 17:23 ` [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 " kan.liang
@ 2018-08-06 18:35   ` Peter Zijlstra
  2018-08-06 21:33     ` Andi Kleen
  2018-08-07 15:29     ` Liang, Kan
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2018-08-06 18:35 UTC (permalink / raw)
  To: kan.liang
  Cc: tglx, mingo, acme, linux-kernel, eranian, ak, alexander.shishkin

On Mon, Aug 06, 2018 at 10:23:42AM -0700, kan.liang@linux.intel.com wrote:
> @@ -2044,6 +2056,14 @@ static void intel_pmu_disable_event(struct perf_event *event)
>  	if (unlikely(event->attr.precise_ip))
>  		intel_pmu_pebs_disable(event);
>  
> +	/*
> +	 * We could disable freezing here, but doesn't hurt if it's on.
> +	 * perf remembers the state, and someone else will likely
> +	 * reinitialize.
> +	 *
> +	 * This avoids an extra MSR write in many situations.
> +	 */
> +
>  	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
>  		intel_pmu_disable_fixed(hwc);
>  		return;
> @@ -2119,6 +2139,11 @@ static void intel_pmu_enable_event(struct perf_event *event)
>  	if (event->attr.exclude_guest)
>  		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
>  
> +	if (x86_pmu.counter_freezing && !cpuc->frozen_enabled) {
> +		enable_counter_freeze();
> +		cpuc->frozen_enabled = 1;
> +	}
> +
>  	if (unlikely(event_is_checkpointed(event)))
>  		cpuc->intel_cp_status |= (1ull << hwc->idx);
>  

Why here? That doesn't really make sense; should this not be in
intel_pmu_cpu_starting() or something?

> +static bool disable_counter_freezing;
> +module_param(disable_counter_freezing, bool, 0444);
> +MODULE_PARM_DESC(disable_counter_freezing, "Disable counter freezing feature."
> +		"The PMI handler will fall back to generic handler."
> +		"Default is false (enable counter freezing feature).");

Why?

> +	/*
> +	 * Ack the PMU late after the APIC.  This avoids bogus

That doesn't make sense. PMU and APIC do not have order.

> +	 * freezing on Skylake CPUs.  The acking unfreezes the PMU
> +	 */
> +	if (status) {
> +		intel_pmu_ack_status(status);
> +	} else {
> +		/*
> +		 * CPU may issues two PMIs very close to each other.
> +		 * When the PMI handler services the first one, the
> +		 * GLOBAL_STATUS is already updated to reflect both.
> +		 * When it IRETs, the second PMI is immediately
> +		 * handled and it sees clear status. At the meantime,
> +		 * there may be a third PMI, because the freezing bit
> +		 * isn't set since the ack in first PMI handlers.
> +		 * Double check if there is more work to be done.
> +		 */

Urgh... fun fun fun.

> +		status = intel_pmu_get_status();
> +		if (status)
> +			goto again;
> +	}
> +
> +	if (bts)
> +		intel_bts_enable_local();
> +	cpuc->enabled = pmu_enabled;
> +	return handled;
> +}

> @@ -3432,6 +3538,11 @@ static void intel_pmu_cpu_dying(int cpu)
>  	free_excl_cntrs(cpu);
>  
>  	fini_debug_store_on_cpu(cpu);
> +
> +	if (cpuc->frozen_enabled) {
> +		cpuc->frozen_enabled = 0;
> +		disable_counter_freeze();
> +	}
>  }

See, you have the dying thing, so why not the matching starting thing.

> @@ -4442,6 +4555,15 @@ __init int intel_pmu_init(void)
>  		pr_cont("full-width counters, ");
>  	}
>  
> +	/*
> +	 * For arch perfmon 4 use counter freezing to avoid
> +	 * several MSR accesses in the PMI.
> +	 */
> +	if (x86_pmu.counter_freezing) {
> +		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
> +		pr_cont("counter freezing, ");
> +	}

Lets not print the counter freezing, we already print v4, right?


> @@ -561,6 +566,7 @@ struct x86_pmu {
>  	struct x86_pmu_quirk *quirks;
>  	int		perfctr_second_write;
>  	bool		late_ack;
> +	bool		counter_freezing;

Please make the both of them int or something.

>  	u64		(*limit_period)(struct perf_event *event, u64 l);
>  
>  	/*


* Re: [PATCH 3/3] perf/x86/intel: Add quirk for Goldmont Plus
  2018-08-06 17:23 ` [PATCH 3/3] perf/x86/intel: Add quirk for Goldmont Plus kan.liang
@ 2018-08-06 18:39   ` Peter Zijlstra
  2018-08-07 15:30     ` Liang, Kan
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2018-08-06 18:39 UTC (permalink / raw)
  To: kan.liang
  Cc: tglx, mingo, acme, linux-kernel, eranian, ak, alexander.shishkin

On Mon, Aug 06, 2018 at 10:23:43AM -0700, kan.liang@linux.intel.com wrote:

> +static bool intel_glk_counter_freezing_broken(int cpu)

>  	case INTEL_FAM6_ATOM_GEMINI_LAKE:
> +		x86_add_quirk(intel_counter_freezing_quirk);

We really should fix that ATOM naming instead of making it worse :/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 PMI handler
  2018-08-06 18:35   ` Peter Zijlstra
@ 2018-08-06 21:33     ` Andi Kleen
  2018-08-06 21:50       ` Peter Zijlstra
  2018-08-07 15:29     ` Liang, Kan
  1 sibling, 1 reply; 12+ messages in thread
From: Andi Kleen @ 2018-08-06 21:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: kan.liang, tglx, mingo, acme, linux-kernel, eranian, alexander.shishkin

On Mon, Aug 06, 2018 at 08:35:15PM +0200, Peter Zijlstra wrote:
> > +static bool disable_counter_freezing;
> > +module_param(disable_counter_freezing, bool, 0444);
> > +MODULE_PARM_DESC(disable_counter_freezing, "Disable counter freezing feature."
> > +		"The PMI handler will fall back to generic handler."
> > +		"Default is false (enable counter freezing feature).");
> 
> Why?

See the description. Counter freezing took some time to stabilize,
so it seemed better to have a knob we can ask users to try in case
there are more problems.
> 
> > +	/*
> > +	 * Ack the PMU late after the APIC.  This avoids bogus
> 
> > +	 * freezing on Skylake CPUs.  The acking unfreezes the PMU
> > +	 */
> That doesn't make sense. PMU and APIC do not have order.> 

It makes a difference for the hardware.

> > +	/*
> > +	 * For arch perfmon 4 use counter freezing to avoid
> > +	 * several MSR accesses in the PMI.
> > +	 */
> > +	if (x86_pmu.counter_freezing) {
> > +		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
> > +		pr_cont("counter freezing, ");
> > +	}
> 
> Lets not print the counter freezing, we already print v4, right?

I find it useful to see that the kernel has the support, otherwise
you would need to look at the version number, but it gets difficult
with backports. This is another paranoia bit, in case there
are problems.

> > @@ -561,6 +566,7 @@ struct x86_pmu {
> >  	struct x86_pmu_quirk *quirks;
> >  	int		perfctr_second_write;
> >  	bool		late_ack;
> > +	bool		counter_freezing;
> 
> Please make the both of them int or something.

That would make them bigger for no reason?

-Andi



* Re: [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 PMI handler
  2018-08-06 21:33     ` Andi Kleen
@ 2018-08-06 21:50       ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2018-08-06 21:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kan.liang, tglx, mingo, acme, linux-kernel, eranian, alexander.shishkin

On Mon, Aug 06, 2018 at 02:33:23PM -0700, Andi Kleen wrote:
> On Mon, Aug 06, 2018 at 08:35:15PM +0200, Peter Zijlstra wrote:
> > > +static bool disable_counter_freezing;
> > > +module_param(disable_counter_freezing, bool, 0444);
> > > +MODULE_PARM_DESC(disable_counter_freezing, "Disable counter freezing feature."
> > > +		"The PMI handler will fall back to generic handler."
> > > +		"Default is false (enable counter freezing feature).");
> > 
> > Why?
> 
> See the description. Counter freezing took some time to stabilize,
> so it seemed better to have a knob to ask users to try in case
> there are more problems.

But it is not a module.. did you want early_param() or __setup()?

> > > +	/*
> > > +	 * Ack the PMU late after the APIC.  This avoids bogus
> > 
> > > +	 * freezing on Skylake CPUs.  The acking unfreezes the PMU
> > > +	 */
> > That doesn't make sense. PMU and APIC do not have order.> 
> 
> It makes a difference for the hardware.

I still have no clue what it wants to say.

> > > +	/*
> > > +	 * For arch perfmon 4 use counter freezing to avoid
> > > +	 * several MSR accesses in the PMI.
> > > +	 */
> > > +	if (x86_pmu.counter_freezing) {
> > > +		x86_pmu.handle_irq = intel_pmu_handle_irq_v4;
> > > +		pr_cont("counter freezing, ");
> > > +	}
> > 
> > Lets not print the counter freezing, we already print v4, right?
> 
> I find it useful to see that the kernel has the support, otherwise
> you would need to look at the version number, but it gets difficult
> with backports. This is another paranoia bit, in case there
> are problems.

That line will get very long if we keep adding every dinky bit to it.

> > > @@ -561,6 +566,7 @@ struct x86_pmu {
> > >  	struct x86_pmu_quirk *quirks;
> > >  	int		perfctr_second_write;
> > >  	bool		late_ack;
> > > +	bool		counter_freezing;
> > 
> > Please make the both of them int or something.
> 
> That would make them bigger for no reason?

Then use u8 or something, I just don't much like _Bool in composite
types.


* Re: [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler
  2018-08-06 18:20 ` [PATCH 1/3] perf/x86/intel: Factor out common code of PMI handler Peter Zijlstra
@ 2018-08-07 15:29   ` Liang, Kan
  0 siblings, 0 replies; 12+ messages in thread
From: Liang, Kan @ 2018-08-07 15:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, acme, linux-kernel, eranian, ak, alexander.shishkin



On 8/6/2018 2:20 PM, Peter Zijlstra wrote:
> On Mon, Aug 06, 2018 at 10:23:41AM -0700, kan.liang@linux.intel.com wrote:
>> +	if (++loops > 100) {
>> +		static bool warned;
>> +
>> +		if (!warned) {
>> +			WARN(1, "perfevents: irq loop stuck!\n");
>> +			perf_event_print_debug();
>> +			warned = true;
>> +		}
> 
> Bah, that really reads like we want WARN_ONCE(), except for that
> perf_event_print_debug() thing :/
> 

Yes. I went through the log. To make the WARN text pair with 
perf_event_print_debug(), we have open-coded WARN_ONCE()'s
one-time-only logic here since commit ae0def05ed85 ("perf/x86: Only 
print PMU state when also WARN()'ing").

I think I will still keep the logic here and just fix the complaint from 
checkpatch.pl.

Thanks,
Kan


* Re: [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 PMI handler
  2018-08-06 18:35   ` Peter Zijlstra
  2018-08-06 21:33     ` Andi Kleen
@ 2018-08-07 15:29     ` Liang, Kan
  2018-08-07 17:31       ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2018-08-07 15:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, acme, linux-kernel, eranian, ak, alexander.shishkin



On 8/6/2018 2:35 PM, Peter Zijlstra wrote:
> On Mon, Aug 06, 2018 at 10:23:42AM -0700, kan.liang@linux.intel.com wrote:
>> @@ -2044,6 +2056,14 @@ static void intel_pmu_disable_event(struct perf_event *event)
>>   	if (unlikely(event->attr.precise_ip))
>>   		intel_pmu_pebs_disable(event);
>>   
>> +	/*
>> +	 * We could disable freezing here, but doesn't hurt if it's on.
>> +	 * perf remembers the state, and someone else will likely
>> +	 * reinitialize.
>> +	 *
>> +	 * This avoids an extra MSR write in many situations.
>> +	 */
>> +
>>   	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
>>   		intel_pmu_disable_fixed(hwc);
>>   		return;
>> @@ -2119,6 +2139,11 @@ static void intel_pmu_enable_event(struct perf_event *event)
>>   	if (event->attr.exclude_guest)
>>   		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
>>   
>> +	if (x86_pmu.counter_freezing && !cpuc->frozen_enabled) {
>> +		enable_counter_freeze();
>> +		cpuc->frozen_enabled = 1;
>> +	}
>> +
>>   	if (unlikely(event_is_checkpointed(event)))
>>   		cpuc->intel_cp_status |= (1ull << hwc->idx);
>>   
> 
> Why here? That doesn't really make sense; should this not be in
> intel_pmu_cpu_starting() or something?


For Goldmont Plus, the counter freezing feature can be re-enabled at 
run-time by loading newer microcode, so we need to check
x86_pmu.counter_freezing every time.


> 
>> +static bool disable_counter_freezing;
>> +module_param(disable_counter_freezing, bool, 0444);
>> +MODULE_PARM_DESC(disable_counter_freezing, "Disable counter freezing feature."
>> +		"The PMI handler will fall back to generic handler."
>> +		"Default is false (enable counter freezing feature).");
> 
> Why?
> 
>> +	/*
>> +	 * Ack the PMU late after the APIC.  This avoids bogus
> 
> That doesn't make sense. PMU and APIC do not have order.
>

The moment the FROZEN bit is cleared, the counters start counting
immediately. Here we try to move the ack as close as possible to the
IRET.


Thanks,
Kan


* Re: [PATCH 3/3] perf/x86/intel: Add quirk for Goldmont Plus
  2018-08-06 18:39   ` Peter Zijlstra
@ 2018-08-07 15:30     ` Liang, Kan
  0 siblings, 0 replies; 12+ messages in thread
From: Liang, Kan @ 2018-08-07 15:30 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: tglx, mingo, acme, linux-kernel, eranian, ak, alexander.shishkin



On 8/6/2018 2:39 PM, Peter Zijlstra wrote:
> On Mon, Aug 06, 2018 at 10:23:43AM -0700, kan.liang@linux.intel.com wrote:
> 
>> +static bool intel_glk_counter_freezing_broken(int cpu)
> 
>>   	case INTEL_FAM6_ATOM_GEMINI_LAKE:
>> +		x86_add_quirk(intel_counter_freezing_quirk);
> 
> We really should fix that ATOM naming instead of making it worse :/
> 

OK. I will send a separate patch to fix it.

Thanks,
Kan


* Re: [PATCH 2/3] x86, perf: Add a separate Arch Perfmon v4 PMI handler
  2018-08-07 15:29     ` Liang, Kan
@ 2018-08-07 17:31       ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2018-08-07 17:31 UTC (permalink / raw)
  To: Liang, Kan
  Cc: tglx, mingo, acme, linux-kernel, eranian, ak, alexander.shishkin

On Tue, Aug 07, 2018 at 11:29:54AM -0400, Liang, Kan wrote:
> On 8/6/2018 2:35 PM, Peter Zijlstra wrote:
> > On Mon, Aug 06, 2018 at 10:23:42AM -0700, kan.liang@linux.intel.com wrote:
> > > @@ -2044,6 +2056,14 @@ static void intel_pmu_disable_event(struct perf_event *event)
> > >   	if (unlikely(event->attr.precise_ip))
> > >   		intel_pmu_pebs_disable(event);
> > > +	/*
> > > +	 * We could disable freezing here, but doesn't hurt if it's on.
> > > +	 * perf remembers the state, and someone else will likely
> > > +	 * reinitialize.
> > > +	 *
> > > +	 * This avoids an extra MSR write in many situations.
> > > +	 */
> > > +
> > >   	if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) {
> > >   		intel_pmu_disable_fixed(hwc);
> > >   		return;
> > > @@ -2119,6 +2139,11 @@ static void intel_pmu_enable_event(struct perf_event *event)
> > >   	if (event->attr.exclude_guest)
> > >   		cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx);
> > > +	if (x86_pmu.counter_freezing && !cpuc->frozen_enabled) {
> > > +		enable_counter_freeze();
> > > +		cpuc->frozen_enabled = 1;
> > > +	}
> > > +
> > >   	if (unlikely(event_is_checkpointed(event)))
> > >   		cpuc->intel_cp_status |= (1ull << hwc->idx);
> > 
> > Why here? That doesn't really make sense; should this not be in
> > intel_pmu_cpu_starting() or something?
> 
> 
> For Goldmont Plus, the counter freezing feature can be re-enabled at
> run-time by loading a newer microcode.
> We need to check the x86_pmu.counter_freezing every time.

Blergh, just don't go there. If we start with the wrong ucode, leave it
disabled.

We do that for most ucode stuff.

