* [PATCH V5 1/4] x86/cpufeature: Add facility to check for min microcode revisions
@ 2019-01-07 22:34 kan.liang
  2019-01-07 22:34 ` [PATCH V5 2/4] perf/x86/kvm: Avoid unnecessary work in guest filtering kan.liang
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: kan.liang @ 2019-01-07 22:34 UTC (permalink / raw)
  To: x86, linux-kernel, tglx, bp, peterz, mingo; +Cc: ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

For bug workarounds or checks it is useful to check for specific
microcode revisions.

Add a new generic function to match the CPU with its stepping, and
another function to check whether the matched CPU meets a minimum
microcode revision.
A new table format is introduced so that quirks can supply the
required information.

This does not change the existing x86_cpu_id because it's an ABI
shared with modules, and also has quite different requirements,
as in no wildcards, but everything has to be matched exactly.
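
For illustration, a quirk could use the new facility along these lines
(hypothetical table, revision value and function names, shown only as a
usage sketch):

static const struct x86_cpu_check example_ucodes[] = {
	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_X, 4, 0x02000065),
	{}
};

static void example_quirk(void)
{
	/* Keep a workaround unless new enough microcode is loaded. */
	if (!x86_cpu_has_min_microcode_rev(example_ucodes))
		pr_info("required microcode not present, keeping workaround\n");
}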

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Based-on-code-from: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

Changes since V4:
- Rename to x86_cpu_check and INTEL_CHECK_UCODE
- Split x86_min_microcode() into two functions. One is a generic
  CPU match function with stepping. The other is to check the min
  microcode revisions.

The previous discussion can be found here.
https://lkml.org/lkml/2018/10/10/740

 arch/x86/include/asm/cpu_device_id.h | 28 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/match.c          | 31 +++++++++++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/arch/x86/include/asm/cpu_device_id.h b/arch/x86/include/asm/cpu_device_id.h
index baeba05..43f19af 100644
--- a/arch/x86/include/asm/cpu_device_id.h
+++ b/arch/x86/include/asm/cpu_device_id.h
@@ -11,4 +11,32 @@
 
 extern const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match);
 
+/*
+ * Match specific microcode revisions.
+ *
+ * vendor/family/model/stepping must be all set.
+ *
+ * only checks against the boot cpu.  When mixed-stepping configs are
+ * valid for a CPU model, add a quirk for every valid stepping and
+ * do the fine-tuning in the quirk handler.
+ */
+
+struct x86_cpu_check {
+	u8	vendor;
+	u8	family;
+	u8	model;
+	u8	stepping;
+	u32	microcode_rev;
+};
+
+#define INTEL_CHECK_UCODE(mod, step, rev) {			\
+	.vendor = X86_VENDOR_INTEL,				\
+	.family = 6,						\
+	.model = mod,						\
+	.stepping = step,					\
+	.microcode_rev = rev,					\
+}
+
+extern bool x86_cpu_has_min_microcode_rev(const struct x86_cpu_check *table);
+
 #endif
diff --git a/arch/x86/kernel/cpu/match.c b/arch/x86/kernel/cpu/match.c
index 3fed388..408bed37 100644
--- a/arch/x86/kernel/cpu/match.c
+++ b/arch/x86/kernel/cpu/match.c
@@ -48,3 +48,34 @@ const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match)
 	return NULL;
 }
 EXPORT_SYMBOL(x86_match_cpu);
+
+static const struct x86_cpu_check *
+x86_match_cpu_with_stepping(const struct x86_cpu_check *match)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+	const struct x86_cpu_check *m;
+
+	for (m = match; m->family | m->model; m++) {
+		if (c->x86_vendor != m->vendor)
+			continue;
+		if (c->x86 != m->family)
+			continue;
+		if (c->x86_model != m->model)
+			continue;
+		if (c->x86_stepping != m->stepping)
+			continue;
+		return m;
+	}
+	return NULL;
+}
+
+bool x86_cpu_has_min_microcode_rev(const struct x86_cpu_check *table)
+{
+	const struct x86_cpu_check *res = x86_match_cpu_with_stepping(table);
+
+	if (!res || res->microcode_rev > boot_cpu_data.microcode)
+		return false;
+
+	return true;
+}
+EXPORT_SYMBOL(x86_cpu_has_min_microcode_rev);
-- 
2.7.4



* [PATCH V5 2/4] perf/x86/kvm: Avoid unnecessary work in guest filtering
  2019-01-07 22:34 [PATCH V5 1/4] x86/cpufeature: Add facility to check for min microcode revisions kan.liang
@ 2019-01-07 22:34 ` kan.liang
  2019-01-16 13:14   ` Borislav Petkov
  2019-01-07 22:34 ` [PATCH V5 3/4] perf/x86/intel: Clean up counter freezing quirk kan.liang
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 7+ messages in thread
From: kan.liang @ 2019-01-07 22:34 UTC (permalink / raw)
  To: x86, linux-kernel, tglx, bp, peterz, mingo; +Cc: ak, eranian

From: Andi Kleen <ak@linux.intel.com>

KVM added a workaround for PEBS events leaking into guests with
commit 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.")
This uses the VT entry/exit list to add an extra disable of the
PEBS_ENABLE MSR.

Intel also added a fix for this issue to microcode updates on
Haswell/Broadwell/Skylake.

It turns out using the MSR entry/exit list makes VM exits
significantly slower. The list is only needed for disabling
PEBS, because the GLOBAL_CTRL change gets optimized by
KVM into changing the VMCS.

Check for the microcode updates that have the microcode
fix for leaking PEBS, and disable the extra entry/exit list
entry for PEBS_ENABLE. In addition we always clear the
GLOBAL_CTRL for the PEBS counter while running in the guest,
which is enough to make them never fire at the wrong
side of the host/guest transition.

We see significantly reduced overhead for VM exits with the
filtering active with the patch from 8% to 4%.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---

Changes since V4:
- Use new name x86_cpu_has_min_microcode_rev() and INTEL_CHECK_UCODE

 arch/x86/events/intel/core.c | 80 +++++++++++++++++++++++++++++++++++++++-----
 arch/x86/events/perf_event.h |  3 +-
 2 files changed, 73 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index ecc3e34..587d83e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -18,6 +18,7 @@
 #include <asm/hardirq.h>
 #include <asm/intel-family.h>
 #include <asm/apic.h>
+#include <asm/cpu_device_id.h>
 
 #include "../perf_event.h"
 
@@ -3206,16 +3207,27 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
 	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
 	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
 	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
-	/*
-	 * If PMU counter has PEBS enabled it is not enough to disable counter
-	 * on a guest entry since PEBS memory write can overshoot guest entry
-	 * and corrupt guest memory. Disabling PEBS solves the problem.
-	 */
-	arr[1].msr = MSR_IA32_PEBS_ENABLE;
-	arr[1].host = cpuc->pebs_enabled;
-	arr[1].guest = 0;
+	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
+		arr[0].guest &= ~cpuc->pebs_enabled;
+	else
+		arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
+	*nr = 1;
+
+	if (!x86_pmu.pebs_isolated) {
+		/*
+		 * If PMU counter has PEBS enabled it is not enough to
+		 * disable counter on a guest entry since PEBS memory
+		 * write can overshoot guest entry and corrupt guest
+		 * memory. Disabling PEBS solves the problem.
+		 *
+		 * Don't do this if the CPU already enforces it.
+		 */
+		arr[1].msr = MSR_IA32_PEBS_ENABLE;
+		arr[1].host = cpuc->pebs_enabled;
+		arr[1].guest = 0;
+		*nr = 2;
+	}
 
-	*nr = 2;
 	return arr;
 }
 
@@ -3733,6 +3745,45 @@ static __init void intel_clovertown_quirk(void)
 	x86_pmu.pebs_constraints = NULL;
 }
 
+static const struct x86_cpu_check isolation_ucodes[] = {
+	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_CORE,	 3, 0x0000001f),
+	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_ULT,	 1, 0x0000001e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_GT3E,	 1, 0x00000015),
+	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_X,		 2, 0x00000037),
+	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_X,		 4, 0x0000000a),
+	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_CORE,	 4, 0x00000023),
+	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_GT3E,	 1, 0x00000014),
+	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_XEON_D,	 2, 0x00000010),
+	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_XEON_D,	 3, 0x07000009),
+	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_XEON_D,	 4, 0x0f000009),
+	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_XEON_D,	 5, 0x0e000002),
+	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_X,	 2, 0x0b000014),
+	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_X,		 3, 0x00000021),
+	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_X,		 4, 0x00000000),
+	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_MOBILE,	 3, 0x0000007c),
+	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_DESKTOP,	 3, 0x0000007c),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,	 9, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_MOBILE,	 9, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_MOBILE,     10, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_MOBILE,     11, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_MOBILE,     12, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,    10, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,    11, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,    12, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,    13, 0x0000004e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_CANNONLAKE_MOBILE,    3, 0x00000000),
+	{}
+};
+
+static void intel_check_isolation(void)
+{
+	if (!x86_cpu_has_min_microcode_rev(isolation_ucodes)) {
+		x86_pmu.pebs_isolated = 0;
+		return;
+	}
+	x86_pmu.pebs_isolated = 1;
+}
+
 static int intel_snb_pebs_broken(int cpu)
 {
 	u32 rev = UINT_MAX; /* default to broken for unknown models */
@@ -3757,6 +3808,8 @@ static void intel_snb_check_microcode(void)
 	int pebs_broken = 0;
 	int cpu;
 
+	intel_check_isolation();
+
 	for_each_online_cpu(cpu) {
 		if ((pebs_broken = intel_snb_pebs_broken(cpu)))
 			break;
@@ -3838,6 +3891,12 @@ static __init void intel_sandybridge_quirk(void)
 	cpus_read_unlock();
 }
 
+static __init void intel_isolation_quirk(void)
+{
+	x86_pmu.check_microcode = intel_check_isolation;
+	intel_check_isolation();
+}
+
 static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
 	{ PERF_COUNT_HW_CPU_CYCLES, "cpu cycles" },
 	{ PERF_COUNT_HW_INSTRUCTIONS, "instructions" },
@@ -4424,6 +4483,7 @@ __init int intel_pmu_init(void)
 	case INTEL_FAM6_HASWELL_X:
 	case INTEL_FAM6_HASWELL_ULT:
 	case INTEL_FAM6_HASWELL_GT3E:
+		x86_add_quirk(intel_isolation_quirk);
 		x86_add_quirk(intel_ht_bug);
 		x86_pmu.late_ack = true;
 		memcpy(hw_cache_event_ids, hsw_hw_cache_event_ids, sizeof(hw_cache_event_ids));
@@ -4456,6 +4516,7 @@ __init int intel_pmu_init(void)
 	case INTEL_FAM6_BROADWELL_XEON_D:
 	case INTEL_FAM6_BROADWELL_GT3E:
 	case INTEL_FAM6_BROADWELL_X:
+		x86_add_quirk(intel_isolation_quirk);
 		x86_pmu.late_ack = true;
 		memcpy(hw_cache_event_ids, hsw_hw_cache_event_ids, sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, hsw_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
@@ -4518,6 +4579,7 @@ __init int intel_pmu_init(void)
 	case INTEL_FAM6_SKYLAKE_X:
 	case INTEL_FAM6_KABYLAKE_MOBILE:
 	case INTEL_FAM6_KABYLAKE_DESKTOP:
+		x86_add_quirk(intel_isolation_quirk);
 		x86_pmu.late_ack = true;
 		memcpy(hw_cache_event_ids, skl_hw_cache_event_ids, sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, skl_hw_cache_extra_regs, sizeof(hw_cache_extra_regs));
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 78d7b70..c2d5a51 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -607,7 +607,8 @@ struct x86_pmu {
 			pebs_active	:1,
 			pebs_broken	:1,
 			pebs_prec_dist	:1,
-			pebs_no_tlb	:1;
+			pebs_no_tlb	:1,
+			pebs_isolated   :1;
 	int		pebs_record_size;
 	int		pebs_buffer_size;
 	void		(*drain_pebs)(struct pt_regs *regs);
-- 
2.7.4



* [PATCH V5 3/4] perf/x86/intel: Clean up counter freezing quirk
  2019-01-07 22:34 [PATCH V5 1/4] x86/cpufeature: Add facility to check for min microcode revisions kan.liang
  2019-01-07 22:34 ` [PATCH V5 2/4] perf/x86/kvm: Avoid unnecessary work in guest filtering kan.liang
@ 2019-01-07 22:34 ` kan.liang
  2019-01-07 22:34 ` [PATCH V5 4/4] perf/x86/intel: Add counter freezing quirk for Goldmont kan.liang
  2019-01-16 12:20 ` [PATCH V5 1/4] x86/cpufeature: Add facility to check for min microcode revisions Borislav Petkov
  3 siblings, 0 replies; 7+ messages in thread
From: kan.liang @ 2019-01-07 22:34 UTC (permalink / raw)
  To: x86, linux-kernel, tglx, bp, peterz, mingo; +Cc: ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Clean up the counter freezing quirk to use the new facility for
checking minimum microcode revisions.

Rename the counter-freezing quirk functions, because other platforms,
e.g. Goldmont, also need to call the quirk.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

- New patch, merged from https://lkml.org/lkml/2018/10/3/25

 arch/x86/events/intel/core.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 587d83e..e94731e 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3938,23 +3938,18 @@ static __init void intel_nehalem_quirk(void)
 	}
 }
 
-static bool intel_glp_counter_freezing_broken(int cpu)
-{
-	u32 rev = UINT_MAX; /* default to broken for unknown stepping */
-
-	switch (cpu_data(cpu).x86_stepping) {
-	case 1:
-		rev = 0x28;
-		break;
-	case 8:
-		rev = 0x6;
-		break;
-	}
+static const struct x86_cpu_check counter_freezing_ucodes[] = {
+	INTEL_CHECK_UCODE(INTEL_FAM6_ATOM_GOLDMONT_PLUS,	1, 0x00000028),
+	INTEL_CHECK_UCODE(INTEL_FAM6_ATOM_GOLDMONT_PLUS,	8, 0x00000006),
+	{}
+};
 
-	return (cpu_data(cpu).microcode < rev);
+static bool intel_counter_freezing_broken(int cpu)
+{
+	return !x86_cpu_has_min_microcode_rev(counter_freezing_ucodes);
 }
 
-static __init void intel_glp_counter_freezing_quirk(void)
+static __init void intel_counter_freezing_quirk(void)
 {
 	/* Check if it's already disabled */
 	if (disable_counter_freezing)
@@ -3964,7 +3959,7 @@ static __init void intel_glp_counter_freezing_quirk(void)
 	 * If the system starts with the wrong ucode, leave the
 	 * counter-freezing feature permanently disabled.
 	 */
-	if (intel_glp_counter_freezing_broken(raw_smp_processor_id())) {
+	if (intel_counter_freezing_broken(raw_smp_processor_id())) {
 		pr_info("PMU counter freezing disabled due to CPU errata,"
 			"please upgrade microcode\n");
 		x86_pmu.counter_freezing = false;
@@ -4341,7 +4336,7 @@ __init int intel_pmu_init(void)
 		break;
 
 	case INTEL_FAM6_ATOM_GOLDMONT_PLUS:
-		x86_add_quirk(intel_glp_counter_freezing_quirk);
+		x86_add_quirk(intel_counter_freezing_quirk);
 		memcpy(hw_cache_event_ids, glp_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, glp_hw_cache_extra_regs,
-- 
2.7.4



* [PATCH V5 4/4] perf/x86/intel: Add counter freezing quirk for Goldmont
  2019-01-07 22:34 [PATCH V5 1/4] x86/cpufeature: Add facility to check for min microcode revisions kan.liang
  2019-01-07 22:34 ` [PATCH V5 2/4] perf/x86/kvm: Avoid unnecessary work in guest filtering kan.liang
  2019-01-07 22:34 ` [PATCH V5 3/4] perf/x86/intel: Clean up counter freezing quirk kan.liang
@ 2019-01-07 22:34 ` kan.liang
  2019-01-16 12:20 ` [PATCH V5 1/4] x86/cpufeature: Add facility to check for min microcode revisions Borislav Petkov
  3 siblings, 0 replies; 7+ messages in thread
From: kan.liang @ 2019-01-07 22:34 UTC (permalink / raw)
  To: x86, linux-kernel, tglx, bp, peterz, mingo; +Cc: ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

A microcode update is also required on Goldmont when the counter
freezing feature is enabled. Otherwise there can be issues, e.g. lost
PMIs.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---

- New patch, merged from https://lkml.org/lkml/2018/10/3/25

 arch/x86/events/intel/core.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index e94731e..99be2d5 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3939,6 +3939,10 @@ static __init void intel_nehalem_quirk(void)
 }
 
 static const struct x86_cpu_check counter_freezing_ucodes[] = {
+	INTEL_CHECK_UCODE(INTEL_FAM6_ATOM_GOLDMONT,		2, 0x0000000e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_ATOM_GOLDMONT,		9, 0x0000002e),
+	INTEL_CHECK_UCODE(INTEL_FAM6_ATOM_GOLDMONT,		10, 0x00000008),
+	INTEL_CHECK_UCODE(INTEL_FAM6_ATOM_GOLDMONT_X,		1, 0x00000028),
 	INTEL_CHECK_UCODE(INTEL_FAM6_ATOM_GOLDMONT_PLUS,	1, 0x00000028),
 	INTEL_CHECK_UCODE(INTEL_FAM6_ATOM_GOLDMONT_PLUS,	8, 0x00000006),
 	{}
@@ -4310,6 +4314,7 @@ __init int intel_pmu_init(void)
 
 	case INTEL_FAM6_ATOM_GOLDMONT:
 	case INTEL_FAM6_ATOM_GOLDMONT_X:
+		x86_add_quirk(intel_counter_freezing_quirk);
 		memcpy(hw_cache_event_ids, glm_hw_cache_event_ids,
 		       sizeof(hw_cache_event_ids));
 		memcpy(hw_cache_extra_regs, glm_hw_cache_extra_regs,
-- 
2.7.4



* Re: [PATCH V5 1/4] x86/cpufeature: Add facility to check for min microcode revisions
  2019-01-07 22:34 [PATCH V5 1/4] x86/cpufeature: Add facility to check for min microcode revisions kan.liang
                   ` (2 preceding siblings ...)
  2019-01-07 22:34 ` [PATCH V5 4/4] perf/x86/intel: Add counter freezing quirk for Goldmont kan.liang
@ 2019-01-16 12:20 ` Borislav Petkov
  3 siblings, 0 replies; 7+ messages in thread
From: Borislav Petkov @ 2019-01-16 12:20 UTC (permalink / raw)
  To: kan.liang; +Cc: x86, linux-kernel, tglx, peterz, mingo, ak, eranian

On Mon, Jan 07, 2019 at 02:34:22PM -0800, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> For bug workarounds or checks it is useful to check for specific
> microcode revisions.
> 
> Add a new generic function to match the CPU with its stepping, and
> another function to check whether the matched CPU meets a minimum
> microcode revision.
> A new table format is introduced so that quirks can supply the
> required information.
> 
> This does not change the existing x86_cpu_id because it's an ABI
> shared with modules, and also has quite different requirements,
> as in no wildcards, but everything has to be matched exactly.
> 
> Suggested-by: Thomas Gleixner <tglx@linutronix.de>
> Based-on-code-from: Andi Kleen <ak@linux.intel.com>

The proper tag is: Originally-by

> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> 
> Changes since V4:
> - Rename to x86_cpu_check and INTEL_CHECK_UCODE
> - Split x86_min_microcode() into two functions. One is a generic
>   CPU match function with stepping. The other is to check the min
>   microcode revisions.
> 
> The previous discussion can be found here.
> https://lkml.org/lkml/2018/10/10/740
> 
>  arch/x86/include/asm/cpu_device_id.h | 28 ++++++++++++++++++++++++++++
>  arch/x86/kernel/cpu/match.c          | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 59 insertions(+)
> 
> diff --git a/arch/x86/include/asm/cpu_device_id.h b/arch/x86/include/asm/cpu_device_id.h
> index baeba05..43f19af 100644
> --- a/arch/x86/include/asm/cpu_device_id.h
> +++ b/arch/x86/include/asm/cpu_device_id.h
> @@ -11,4 +11,32 @@
>  
>  extern const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match);
>  
> +/*
> + * Match specific microcode revisions.
> + *
> + * vendor/family/model/stepping must be all set.
> + *
> + * only checks against the boot cpu.  When mixed-stepping configs are

s/cpu/CPU/g

> + * valid for a CPU model, add a quirk for every valid stepping and
> + * do the fine-tuning in the quirk handler.
> + */
> +
> +struct x86_cpu_check {

That's not a "check". If anything, it is a descriptor of sorts. I.e.,
x86_cpu_desc or so.

> +	u8	vendor;
> +	u8	family;
> +	u8	model;
> +	u8	stepping;
> +	u32	microcode_rev;

Name those the same way as the corresponding members in struct
cpuinfo_x86 are named, so that there's no confusion about what is what.

> +};
> +
> +#define INTEL_CHECK_UCODE(mod, step, rev) {			\

INTEL_CPU_DESC

or so.

> +	.vendor = X86_VENDOR_INTEL,				\
> +	.family = 6,						\
> +	.model = mod,						\
> +	.stepping = step,					\
> +	.microcode_rev = rev,					\
> +}
> +
> +extern bool x86_cpu_has_min_microcode_rev(const struct x86_cpu_check *table);
> +
>  #endif
> diff --git a/arch/x86/kernel/cpu/match.c b/arch/x86/kernel/cpu/match.c
> index 3fed388..408bed37 100644
> --- a/arch/x86/kernel/cpu/match.c
> +++ b/arch/x86/kernel/cpu/match.c
> @@ -48,3 +48,34 @@ const struct x86_cpu_id *x86_match_cpu(const struct x86_cpu_id *match)
>  	return NULL;
>  }
>  EXPORT_SYMBOL(x86_match_cpu);
> +
> +static const struct x86_cpu_check *
> +x86_match_cpu_with_stepping(const struct x86_cpu_check *match)
> +{
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +	const struct x86_cpu_check *m;
> +
> +	for (m = match; m->family | m->model; m++) {
> +		if (c->x86_vendor != m->vendor)
> +			continue;
> +		if (c->x86 != m->family)
> +			continue;
> +		if (c->x86_model != m->model)
> +			continue;
> +		if (c->x86_stepping != m->stepping)
> +			continue;
> +		return m;
> +	}
> +	return NULL;
> +}
> +
> +bool x86_cpu_has_min_microcode_rev(const struct x86_cpu_check *table)
> +{
> +	const struct x86_cpu_check *res = x86_match_cpu_with_stepping(table);
> +
> +	if (!res || res->microcode_rev > boot_cpu_data.microcode)
> +		return false;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL(x86_cpu_has_min_microcode_rev);

EXPORT_SYMBOL_GPL.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH V5 2/4] perf/x86/kvm: Avoid unnecessary work in guest filtering
  2019-01-07 22:34 ` [PATCH V5 2/4] perf/x86/kvm: Avoid unnecessary work in guest filtering kan.liang
@ 2019-01-16 13:14   ` Borislav Petkov
  2019-01-16 18:58     ` Liang, Kan
  0 siblings, 1 reply; 7+ messages in thread
From: Borislav Petkov @ 2019-01-16 13:14 UTC (permalink / raw)
  To: kan.liang; +Cc: x86, linux-kernel, tglx, peterz, mingo, ak, eranian

On Mon, Jan 07, 2019 at 02:34:23PM -0800, kan.liang@linux.intel.com wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> KVM added a workaround for PEBS events leaking into guests with
> commit 26a4f3c08de4 ("perf/x86: disable PEBS on a guest entry.")
> This uses the VT entry/exit list to add an extra disable of the
> PEBS_ENABLE MSR.
> 
> Intel also added a fix for this issue to microcode updates on
> Haswell/Broadwell/Skylake.
> 
> It turns out using the MSR entry/exit list makes VM exits
> significantly slower. The list is only needed for disabling
> PEBS, because the GLOBAL_CTRL change gets optimized by
> KVM into changing the VMCS.
> 
> Check for the microcode updates that have the microcode
> fix for leaking PEBS, and disable the extra entry/exit list
> entry for PEBS_ENABLE. In addition we always clear the
> GLOBAL_CTRL for the PEBS counter while running in the guest,
> which is enough to make them never fire at the wrong
> side of the host/guest transition.
> 
> We see significantly reduced overhead for VM exits with the

No "We" in commit messages pls.

> filtering active with the patch from 8% to 4%.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Your SOB is missing.

> ---
> 
> Changes since V4:
> - Use new name x86_cpu_has_min_microcode_rev() and INTEL_CHECK_UCODE
> 
>  arch/x86/events/intel/core.c | 80 +++++++++++++++++++++++++++++++++++++++-----
>  arch/x86/events/perf_event.h |  3 +-
>  2 files changed, 73 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index ecc3e34..587d83e 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -18,6 +18,7 @@
>  #include <asm/hardirq.h>
>  #include <asm/intel-family.h>
>  #include <asm/apic.h>
> +#include <asm/cpu_device_id.h>
>  
>  #include "../perf_event.h"
>  
> @@ -3206,16 +3207,27 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr)
>  	arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
>  	arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
>  	arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> -	/*
> -	 * If PMU counter has PEBS enabled it is not enough to disable counter
> -	 * on a guest entry since PEBS memory write can overshoot guest entry
> -	 * and corrupt guest memory. Disabling PEBS solves the problem.
> -	 */
> -	arr[1].msr = MSR_IA32_PEBS_ENABLE;
> -	arr[1].host = cpuc->pebs_enabled;
> -	arr[1].guest = 0;
> +	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
> +		arr[0].guest &= ~cpuc->pebs_enabled;
> +	else
> +		arr[0].guest &= ~(cpuc->pebs_enabled & PEBS_COUNTER_MASK);
> +	*nr = 1;
> +
> +	if (!x86_pmu.pebs_isolated) {
> +		/*
> +		 * If PMU counter has PEBS enabled it is not enough to
> +		 * disable counter on a guest entry since PEBS memory
> +		 * write can overshoot guest entry and corrupt guest
> +		 * memory. Disabling PEBS solves the problem.
> +		 *
> +		 * Don't do this if the CPU already enforces it.
> +		 */
> +		arr[1].msr = MSR_IA32_PEBS_ENABLE;
> +		arr[1].host = cpuc->pebs_enabled;
> +		arr[1].guest = 0;
> +		*nr = 2;
> +	}
>  
> -	*nr = 2;
>  	return arr;
>  }
>  
> @@ -3733,6 +3745,45 @@ static __init void intel_clovertown_quirk(void)
>  	x86_pmu.pebs_constraints = NULL;
>  }
>  
> +static const struct x86_cpu_check isolation_ucodes[] = {
> +	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_CORE,	 3, 0x0000001f),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_ULT,	 1, 0x0000001e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_GT3E,	 1, 0x00000015),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_X,		 2, 0x00000037),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_HASWELL_X,		 4, 0x0000000a),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_CORE,	 4, 0x00000023),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_GT3E,	 1, 0x00000014),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_XEON_D,	 2, 0x00000010),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_XEON_D,	 3, 0x07000009),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_XEON_D,	 4, 0x0f000009),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_XEON_D,	 5, 0x0e000002),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_BROADWELL_X,	 2, 0x0b000014),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_X,		 3, 0x00000021),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_X,		 4, 0x00000000),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_MOBILE,	 3, 0x0000007c),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_SKYLAKE_DESKTOP,	 3, 0x0000007c),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,	 9, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_MOBILE,	 9, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_MOBILE,     10, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_MOBILE,     11, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_MOBILE,     12, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,    10, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,    11, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,    12, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_KABYLAKE_DESKTOP,    13, 0x0000004e),
> +	INTEL_CHECK_UCODE(INTEL_FAM6_CANNONLAKE_MOBILE,    3, 0x00000000),

Align vertically.

> +	{}
> +};
> +
> +static void intel_check_isolation(void)
> +{
> +	if (!x86_cpu_has_min_microcode_rev(isolation_ucodes)) {
> +		x86_pmu.pebs_isolated = 0;
> +		return;
> +	}
> +	x86_pmu.pebs_isolated = 1;
> +}

Simply:

static void intel_check_isolation(void)
{
        x86_pmu.pebs_isolated = x86_cpu_has_min_microcode_rev(isolation_ucodes);
}

> +
>  static int intel_snb_pebs_broken(int cpu)
>  {
>  	u32 rev = UINT_MAX; /* default to broken for unknown models */
> @@ -3757,6 +3808,8 @@ static void intel_snb_check_microcode(void)
>  	int pebs_broken = 0;
>  	int cpu;
>  
> +	intel_check_isolation();

That looks strange:

on the one hand, this function gets assigned to x86_pmu.check_microcode
but then, on the other, it gets called in intel_snb_check_microcode()
too, where latter gets assigned to that ->check_microcode pointer too.

This needs a cleanup to introduce a single microcode callback function
in this file and that function picks apart what to do based on the
models, etc.
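
IOW, something like this (a hypothetical, untested sketch; it assumes
each helper is reworked to filter on its own models/steppings):

static void intel_check_microcode(void)
{
	/* single ->check_microcode callback dispatching to the helpers */
	intel_check_isolation();
	intel_snb_check_microcode();
}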

Also, intel_snb_pebs_broken() needs to be converted to this new stepping
checking scheme too, says peterz.

> +
>  	for_each_online_cpu(cpu) {
>  		if ((pebs_broken = intel_snb_pebs_broken(cpu)))
>  			break;
> @@ -3838,6 +3891,12 @@ static __init void intel_sandybridge_quirk(void)
>  	cpus_read_unlock();
>  }
>  
> +static __init void intel_isolation_quirk(void)
> +{
> +	x86_pmu.check_microcode = intel_check_isolation;
> +	intel_check_isolation();
> +}
> +
>  static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
>  	{ PERF_COUNT_HW_CPU_CYCLES, "cpu cycles" },
>  	{ PERF_COUNT_HW_INSTRUCTIONS, "instructions" },
> @@ -4424,6 +4483,7 @@ __init int intel_pmu_init(void)
>  	case INTEL_FAM6_HASWELL_X:
>  	case INTEL_FAM6_HASWELL_ULT:
>  	case INTEL_FAM6_HASWELL_GT3E:
> +		x86_add_quirk(intel_isolation_quirk);

And reportedly, the quirks are one-off things - not what this one
needs to do. So you need to run this unconditionally at the end of
intel_pmu_init() and get rid of all that quirks indirection.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


* Re: [PATCH V5 2/4] perf/x86/kvm: Avoid unnecessary work in guest filtering
  2019-01-16 13:14   ` Borislav Petkov
@ 2019-01-16 18:58     ` Liang, Kan
  0 siblings, 0 replies; 7+ messages in thread
From: Liang, Kan @ 2019-01-16 18:58 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: x86, linux-kernel, tglx, peterz, mingo, ak, eranian



On 1/16/2019 8:14 AM, Borislav Petkov wrote:
>> +static __init void intel_isolation_quirk(void)
>> +{
>> +	x86_pmu.check_microcode = intel_check_isolation;
>> +	intel_check_isolation();
>> +}
>> +
>>   static const struct { int id; char *name; } intel_arch_events_map[] __initconst = {
>>   	{ PERF_COUNT_HW_CPU_CYCLES, "cpu cycles" },
>>   	{ PERF_COUNT_HW_INSTRUCTIONS, "instructions" },
>> @@ -4424,6 +4483,7 @@ __init int intel_pmu_init(void)
>>   	case INTEL_FAM6_HASWELL_X:
>>   	case INTEL_FAM6_HASWELL_ULT:
>>   	case INTEL_FAM6_HASWELL_GT3E:
>> +		x86_add_quirk(intel_isolation_quirk);
> And reportedly, the quirks are one-off things - not what this one
> needs to do. So you need to run this unconditionally at the end of
> intel_pmu_init() and get rid of all that quirks indirection.

Hi Borislav,

Thanks for the review. I will change the code according to all your 
comments.

In addition, I will make one more change that affects your last comment.

The current intel_check_isolation() sets x86_pmu.pebs_isolated=0 for
platforms that are not in the table. That is OK for now, but it will
become a problem for future platforms.
I've confirmed with Andi that the microcode fix is already included in
future platforms, so x86_pmu.pebs_isolated would always have to be set
to 1 for them. That means adding the CPU model to the table for every
new platform, which will be hard to maintain.

I plan to rename x86_pmu.pebs_isolated to x86_pmu.no_pebs_isolation,
with a default value of 0, so future platforms need no special
handling.
The unconditional check will be moved to the beginning of
intel_pmu_init(). Old platforms that lack the microcode fix will be
handled explicitly by setting x86_pmu.no_pebs_isolation = 1 in each of
their "case :" branches, roughly as sketched below.
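
For the table-driven part, roughly (an untested sketch of the idea;
the function name is illustrative, not the final patch):

static void intel_check_pebs_isolation(void)
{
	/*
	 * Sketch only: platforms listed in isolation_ucodes with new
	 * enough microcode get the "no isolation" flag cleared again;
	 * everything else keeps whatever its "case :" branch set.
	 */
	if (x86_cpu_has_min_microcode_rev(isolation_ucodes))
		x86_pmu.no_pebs_isolation = 0;
}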

Thanks,
Kan
