[MODERATED] [PATCH v7 0/5] NX 0

historical-speck.lore.kernel.org archive mirror
 help / color / mirror / Atom feed

* [MODERATED] [PATCH v7 0/5] NX 0
@ 2019-10-24 16:34 Paolo Bonzini
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-24 16:34 UTC (permalink / raw)
  To: speck

From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH v7 0/5] NX 0


v6->v7: tell nested hypervisors to disable the mitigation

Junaid Shahid (2):
  kvm: Add helper function for creating VM worker threads
  kvm: x86: mmu: Recovery of shattered NX large pages

Paolo Bonzini (1):
  kvm: mmu: ITLB_MULTIHIT mitigation

Pawan Gupta (2):
  x86: Add ITLB_MULTIHIT bug infrastructure
  x86/cpu: Add Tremont to the cpu vulnerability whitelist

 Documentation/ABI/testing/sysfs-devices-system-cpu |   1 +
 Documentation/admin-guide/kernel-parameters.txt    |  17 ++
 arch/x86/include/asm/cpufeatures.h                 |   1 +
 arch/x86/include/asm/kvm_host.h                    |   6 +
 arch/x86/include/asm/msr-index.h                   |   7 +
 arch/x86/kernel/cpu/bugs.c                         |  24 ++
 arch/x86/kernel/cpu/common.c                       |  73 +++---
 arch/x86/kvm/mmu.c                                 | 264 ++++++++++++++++++++-
 arch/x86/kvm/mmu.h                                 |   4 +
 arch/x86/kvm/paging_tmpl.h                         |  29 ++-
 arch/x86/kvm/x86.c                                 |  20 ++
 drivers/base/cpu.c                                 |   8 +
 include/linux/cpu.h                                |   2 +
 include/linux/kvm_host.h                           |   6 +
 virt/kvm/kvm_main.c                                | 114 ++++++++-
 15 files changed, 530 insertions(+), 46 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] [PATCH v7 1/5] NX 1
  2019-10-24 16:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
@ 2019-10-24 16:34 ` Paolo Bonzini
  2019-10-30 18:38   ` [MODERATED] " Josh Poimboeuf
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 2/5] NX 2 Paolo Bonzini
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-24 16:34 UTC (permalink / raw)
  To: speck

From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Subject: [PATCH v7 1/5] x86: Add ITLB_MULTIHIT bug infrastructure


Some processors may incur a machine check error possibly
resulting in an unrecoverable cpu hang when an instruction fetch
encounters a TLB multi-hit in the instruction TLB. This can occur
when the page size is changed along with either the physical
address or cache type [1].

This issue affects both bare-metal x86 page tables and EPT.

This can be mitigated by either eliminating the use of large
pages or by using careful TLB invalidations when changing the
page size in the page tables.

Just like Spectre, Meltdown, L1TF and MDS, a new bit has been
allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will
be set on CPUs which are mitigated against this issue.

[1] For example please refer to erratum SKL002 in "6th Generation
Intel Processor Family Specification Update"
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
https://www.google.com/search?q=site:intel.com+SKL002

There are a lot of other affected processors outside of Skylake and
that the erratum(referred above) does not fully disclose the issue
and the impact, both on Skylake and across all the affected CPUs.

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  1 +
 arch/x86/include/asm/cpufeatures.h                 |  1 +
 arch/x86/include/asm/msr-index.h                   |  7 +++
 arch/x86/kernel/cpu/bugs.c                         | 13 ++++
 arch/x86/kernel/cpu/common.c                       | 71 ++++++++++++----------
 drivers/base/cpu.c                                 |  8 +++
 include/linux/cpu.h                                |  2 +
 7 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 06d0931119cc..55bf5e1538ad 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -486,6 +486,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
 		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
 		/sys/devices/system/cpu/vulnerabilities/l1tf
 		/sys/devices/system/cpu/vulnerabilities/mds
+		/sys/devices/system/cpu/vulnerabilities/itlb_multihit
 Date:		January 2018
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Information about CPU vulnerabilities
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 0652d3eed9bd..66aaad7611b2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -399,5 +399,6 @@
 #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
 #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the  MSDBS variant of BUG_MDS */
 #define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
+#define X86_BUG_ITLB_MULTIHIT		X86_BUG(22) /* CPU may incur MCE during certain page attribute changes */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 20ce682a2540..c678899b21db 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -93,6 +93,13 @@
 						  * Microarchitectural Data
 						  * Sampling (MDS) vulnerabilities.
 						  */
+#define ARCH_CAP_PSCHANGE_MC_NO		BIT(6)	 /*
+						  * The processor is not susceptible to a
+						  * machine check error due to modifying the
+						  * code page size along with either the
+						  * physical address or cache type
+						  * without TLB invalidation.
+						  */
 
 #define MSR_IA32_FLUSH_CMD		0x0000010b
 #define L1D_FLUSH			BIT(0)	/*
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 91c2561b905f..ecd0126648ea 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1311,6 +1311,11 @@ static ssize_t l1tf_show_state(char *buf)
 }
 #endif
 
+static ssize_t itlb_multihit_show_state(char *buf)
+{
+	return sprintf(buf, "Processor vulnerable\n");
+}
+
 static ssize_t mds_show_state(char *buf)
 {
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
@@ -1398,6 +1403,9 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 	case X86_BUG_MDS:
 		return mds_show_state(buf);
 
+	case X86_BUG_ITLB_MULTIHIT:
+		return itlb_multihit_show_state(buf);
+
 	default:
 		break;
 	}
@@ -1434,4 +1442,9 @@ ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *bu
 {
 	return cpu_show_common(dev, attr, buf, X86_BUG_MDS);
 }
+
+ssize_t cpu_show_itlb_multihit(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return cpu_show_common(dev, attr, buf, X86_BUG_ITLB_MULTIHIT);
+}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9ae7d1bcd4f4..fc00b2349a9f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1016,13 +1016,14 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 #endif
 }
 
-#define NO_SPECULATION	BIT(0)
-#define NO_MELTDOWN	BIT(1)
-#define NO_SSB		BIT(2)
-#define NO_L1TF		BIT(3)
-#define NO_MDS		BIT(4)
-#define MSBDS_ONLY	BIT(5)
-#define NO_SWAPGS	BIT(6)
+#define NO_SPECULATION		BIT(0)
+#define NO_MELTDOWN		BIT(1)
+#define NO_SSB			BIT(2)
+#define NO_L1TF			BIT(3)
+#define NO_MDS			BIT(4)
+#define MSBDS_ONLY		BIT(5)
+#define NO_SWAPGS		BIT(6)
+#define NO_ITLB_MULTIHIT	BIT(7)
 
 #define VULNWL(_vendor, _family, _model, _whitelist)	\
 	{ X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist }
@@ -1043,27 +1044,27 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 	VULNWL(NSC,	5, X86_MODEL_ANY,	NO_SPECULATION),
 
 	/* Intel Family 6 */
-	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION),
-	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION),
-
-	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+
+	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	VULNWL_INTEL(CORE_YONAH,		NO_SSB),
 
-	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
-	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	/*
 	 * Technically, swapgs isn't serializing on AMD (despite it previously
@@ -1074,14 +1075,14 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 	 */
 
 	/* AMD Family 0xf - 0x12 */
-	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
+	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
-	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
+	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 	{}
 };
 
@@ -1096,15 +1097,19 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_cap = 0;
 
+	if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
+
+	/* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not mitigated */
+	if (!cpu_matches(NO_ITLB_MULTIHIT) && !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
+		setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);
+
 	if (cpu_matches(NO_SPECULATION))
 		return;
 
 	setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
 	setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
 
-	if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
-
 	if (!cpu_matches(NO_SSB) && !(ia32_cap & ARCH_CAP_SSB_NO) &&
 	   !cpu_has(c, X86_FEATURE_AMD_SSB_NO))
 		setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index cc37511de866..f1a6e020ed8d 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -554,12 +554,19 @@ ssize_t __weak cpu_show_mds(struct device *dev,
 	return sprintf(buf, "Not affected\n");
 }
 
+ssize_t __weak cpu_show_itlb_multihit(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
 static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
 static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
 static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
+static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
@@ -568,6 +575,7 @@ ssize_t __weak cpu_show_mds(struct device *dev,
 	&dev_attr_spec_store_bypass.attr,
 	&dev_attr_l1tf.attr,
 	&dev_attr_mds.attr,
+	&dev_attr_itlb_multihit.attr,
 	NULL
 };
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index d0633ebdaa9c..038866a28f2c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -59,6 +59,8 @@ extern ssize_t cpu_show_l1tf(struct device *dev,
 			     struct device_attribute *attr, char *buf);
 extern ssize_t cpu_show_mds(struct device *dev,
 			    struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_itlb_multihit(struct device *dev,
+				      struct device_attribute *attr, char *buf);
 
 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] [PATCH v7 2/5] NX 2
  2019-10-24 16:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
@ 2019-10-24 16:34 ` Paolo Bonzini
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 3/5] NX 3 Paolo Bonzini
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-24 16:34 UTC (permalink / raw)
  To: speck

From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Subject: [PATCH v7 2/5] x86/cpu: Add Tremont to the cpu vulnerability

 whitelist

This patch adds new cpu family ATOM_TREMONT_D to the cpu vunerability
whitelist. ATOM_TREMONT_D is not affected by X86_BUG_ITLB_MULTIHIT. There
may be more bugs not affecting ATOM_TREMONT_D which are not known at
this point and could be added later.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/cpu/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fc00b2349a9f..c652ca9dc046 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1074,6 +1074,8 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 	 * good enough for our purposes.
 	 */
 
+	VULNWL_INTEL(ATOM_TREMONT_D,		NO_ITLB_MULTIHIT),
+
 	/* AMD Family 0xf - 0x12 */
 	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] [PATCH v7 3/5] NX 3
  2019-10-24 16:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 2/5] NX 2 Paolo Bonzini
@ 2019-10-24 16:34 ` Paolo Bonzini
  2019-10-25  8:37   ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 4/5] NX 4 Paolo Bonzini
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-24 16:34 UTC (permalink / raw)
  To: speck

From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH v7 3/5] kvm: mmu: ITLB_MULTIHIT mitigation


With some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check.  Unfortunately if EPT
page tables use huge pages, it possible for a malicious guest to cause
this situation.

This patch adds a knob to mark huge pages as non-executable. When the
nx_huge_pages parameter is enabled (and we are using EPT), all huge pages
are marked as NX. If the guest attempts to execute in one of those pages,
the page is broken down into 4K pages, which are then marked executable.

This is not an issue for shadow paging (except nested EPT), because then
the host is in control of TLB flushes and the problematic situation cannot
happen.  With nested EPT, again the nested guest can cause problems so we
treat shadow and direct EPT the same.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  11 ++
 arch/x86/include/asm/kvm_host.h                 |   2 +
 arch/x86/kernel/cpu/bugs.c                      |  13 ++-
 arch/x86/kvm/mmu.c                              | 135 ++++++++++++++++++++++--
 arch/x86/kvm/paging_tmpl.h                      |  29 +++--
 arch/x86/kvm/x86.c                              |   9 ++
 6 files changed, 186 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f8881e..d6449135c62d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2055,6 +2055,17 @@
 			KVM MMU at runtime.
 			Default is 0 (off)
 
+	kvm.nx_huge_pages=
+			[KVM] Controls the sw workaround for bug
+			X86_BUG_ITLB_MULTIHIT.
+			force	: Always deploy workaround.
+			off	: Default. Never deploy workaround.
+			auto	: Deploy workaround based on presence of
+				  X86_BUG_ITLB_MULTIHIT.
+
+			If the sw workaround is enabled for the host, guests
+			need not enable it for nested guests.
+
 	kvm-amd.nested=	[KVM,AMD] Allow nested virtualization in KVM/SVM.
 			Default is 1 (enabled)
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6f6b8886a8eb..6aba95fdb00b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -313,6 +313,7 @@ struct kvm_mmu_page {
 	bool unsync;
 	u8 mmu_valid_gen;
 	bool mmio_cached;
+	bool lpage_disallowed; /* Can't be replaced by an equiv large page */
 
 	/*
 	 * The following two entries are used to key the shadow page in the
@@ -945,6 +946,7 @@ struct kvm_vm_stat {
 	ulong mmu_unsync;
 	ulong remote_tlb_flush;
 	ulong lpages;
+	ulong nx_lpage_splits;
 	ulong max_mmu_page_hash_collisions;
 };
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ecd0126648ea..fe0378e03dbb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1149,6 +1149,9 @@ void x86_spec_ctrl_setup_ap(void)
 		x86_amd_ssb_disable();
 }
 
+bool itlb_multihit_kvm_mitigation;
+EXPORT_SYMBOL_GPL(itlb_multihit_kvm_mitigation);
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"L1TF: " fmt
 
@@ -1304,17 +1307,25 @@ static ssize_t l1tf_show_state(char *buf)
 		       l1tf_vmx_states[l1tf_vmx_mitigation],
 		       sched_smt_active() ? "vulnerable" : "disabled");
 }
+
+static ssize_t itlb_multihit_show_state(char *buf)
+{
+	if (itlb_multihit_kvm_mitigation)
+		return sprintf(buf, "KVM: Mitigation: Split huge pages\n");
+	else
+		return sprintf(buf, "KVM: Vulnerable\n");
+}
 #else
 static ssize_t l1tf_show_state(char *buf)
 {
 	return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
 }
-#endif
 
 static ssize_t itlb_multihit_show_state(char *buf)
 {
 	return sprintf(buf, "Processor vulnerable\n");
 }
+#endif
 
 static ssize_t mds_show_state(char *buf)
 {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 24c23c66b226..837beefdf0a5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -47,6 +47,20 @@
 #include <asm/kvm_page_track.h>
 #include "trace.h"
 
+extern bool itlb_multihit_kvm_mitigation;
+
+static int __read_mostly nx_huge_pages = -1;
+
+static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
+
+static struct kernel_param_ops nx_huge_pages_ops = {
+	.set = set_nx_huge_pages,
+	.get = param_get_bool,
+};
+
+module_param_cb(nx_huge_pages, &nx_huge_pages_ops, &nx_huge_pages, 0644);
+__MODULE_PARM_TYPE(nx_huge_pages, "bool");
+
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
  * where the hardware walks 2 page tables:
@@ -352,6 +366,11 @@ static inline bool spte_ad_need_write_protect(u64 spte)
 	return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
 }
 
+static bool is_nx_huge_page_enabled(void)
+{
+	return READ_ONCE(nx_huge_pages);
+}
+
 static inline u64 spte_shadow_accessed_mask(u64 spte)
 {
 	MMU_WARN_ON(is_mmio_spte(spte));
@@ -1190,6 +1209,15 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 }
 
+static void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	if (sp->lpage_disallowed)
+		return;
+
+	++kvm->stat.nx_lpage_splits;
+	sp->lpage_disallowed = true;
+}
+
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
@@ -1207,6 +1235,12 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
 
+static void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	--kvm->stat.nx_lpage_splits;
+	sp->lpage_disallowed = false;
+}
+
 static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
 					  struct kvm_memory_slot *slot)
 {
@@ -2792,6 +2826,9 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 			kvm_reload_remote_mmus(kvm);
 	}
 
+	if (sp->lpage_disallowed)
+		unaccount_huge_nx_page(kvm, sp);
+
 	sp->role.invalid = 1;
 	return list_unstable;
 }
@@ -3013,6 +3050,11 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!speculative)
 		spte |= spte_shadow_accessed_mask(spte);
 
+	if (level > PT_PAGE_TABLE_LEVEL && (pte_access & ACC_EXEC_MASK) &&
+	    is_nx_huge_page_enabled()) {
+		pte_access &= ~ACC_EXEC_MASK;
+	}
+
 	if (pte_access & ACC_EXEC_MASK)
 		spte |= shadow_x_mask;
 	else
@@ -3233,9 +3275,32 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
+static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it,
+				       gfn_t gfn, kvm_pfn_t *pfnp, int *levelp)
+{
+	int level = *levelp;
+	u64 spte = *it.sptep;
+
+	if (it.level == level && level > PT_PAGE_TABLE_LEVEL &&
+	    is_nx_huge_page_enabled() &&
+	    is_shadow_present_pte(spte) &&
+	    !is_large_pte(spte)) {
+		/*
+		 * A small SPTE exists for this pfn, but FNAME(fetch)
+		 * and __direct_map would like to create a large PTE
+		 * instead: just force them to go down another level,
+		 * patching back for them into pfn the next 9 bits of
+		 * the address.
+		 */
+		u64 page_mask = KVM_PAGES_PER_HPAGE(level) - KVM_PAGES_PER_HPAGE(level - 1);
+		*pfnp |= gfn & page_mask;
+		(*levelp)--;
+	}
+}
+
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 			int map_writable, int level, kvm_pfn_t pfn,
-			bool prefault)
+			bool prefault, bool lpage_disallowed)
 {
 	struct kvm_shadow_walk_iterator it;
 	struct kvm_mmu_page *sp;
@@ -3248,6 +3313,12 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 
 	trace_kvm_mmu_spte_requested(gpa, level, pfn);
 	for_each_shadow_entry(vcpu, gpa, it) {
+		/*
+		 * We cannot overwrite existing page tables with an NX
+		 * large page, as the leaf could be executable.
+		 */
+		disallowed_hugepage_adjust(it, gfn, &pfn, &level);
+
 		base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 		if (it.level == level)
 			break;
@@ -3258,6 +3329,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 					      it.level - 1, true, ACC_ALL);
 
 			link_shadow_page(vcpu, it.sptep, sp);
+			if (lpage_disallowed)
+				account_huge_nx_page(vcpu->kvm, sp);
 		}
 	}
 
@@ -3550,11 +3623,14 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 {
 	int r;
 	int level;
-	bool force_pt_level = false;
+	bool force_pt_level;
 	kvm_pfn_t pfn;
 	unsigned long mmu_seq;
 	bool map_writable, write = error_code & PFERR_WRITE_MASK;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
 
+	force_pt_level = lpage_disallowed;
 	level = mapping_level(vcpu, gfn, &force_pt_level);
 	if (likely(!force_pt_level)) {
 		/*
@@ -3588,7 +3664,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 		goto out_unlock;
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, gfn, &pfn, &level);
-	r = __direct_map(vcpu, v, write, map_writable, level, pfn, prefault);
+	r = __direct_map(vcpu, v, write, map_writable, level, pfn,
+			 prefault, false);
 out_unlock:
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(pfn);
@@ -4174,6 +4251,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	unsigned long mmu_seq;
 	int write = error_code & PFERR_WRITE_MASK;
 	bool map_writable;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
 
 	MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa));
 
@@ -4184,8 +4263,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	if (r)
 		return r;
 
-	force_pt_level = !check_hugepage_cache_consistency(vcpu, gfn,
-							   PT_DIRECTORY_LEVEL);
+	force_pt_level =
+		lpage_disallowed ||
+		!check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL);
 	level = mapping_level(vcpu, gfn, &force_pt_level);
 	if (likely(!force_pt_level)) {
 		if (level > PT_DIRECTORY_LEVEL &&
@@ -4214,7 +4294,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 		goto out_unlock;
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, gfn, &pfn, &level);
-	r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault);
+	r = __direct_map(vcpu, gpa, write, map_writable, level, pfn,
+			 prefault, lpage_disallowed);
 out_unlock:
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(pfn);
@@ -6155,10 +6236,52 @@ static void kvm_set_mmio_spte_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
 }
 
+static void __set_nx_huge_pages(bool val)
+{
+	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
+}
+
+static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
+{
+	bool old_val = nx_huge_pages;
+	bool new_val;
+
+	/* In "auto" mode deploy workaround only if CPU has the bug. */
+	if (sysfs_streq(val, "off"))
+		new_val = 0;
+	else if (sysfs_streq(val, "force"))
+		new_val = 1;
+	else if (sysfs_streq(val, "auto"))
+		new_val = boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
+	else if (strtobool(val, &new_val) < 0)
+		return -EINVAL;
+
+	__set_nx_huge_pages(new_val);
+
+	if (new_val != old_val) {
+		struct kvm *kvm;
+		int idx;
+
+		mutex_lock(&kvm_lock);
+
+		list_for_each_entry(kvm, &vm_list, vm_list) {
+			idx = srcu_read_lock(&kvm->srcu);
+			kvm_mmu_zap_all_fast(kvm);
+			srcu_read_unlock(&kvm->srcu, idx);
+		}
+		mutex_unlock(&kvm_lock);
+	}
+
+	return 0;
+}
+
 int kvm_mmu_module_init(void)
 {
 	int ret = -ENOMEM;
 
+	if (nx_huge_pages == -1)
+		__set_nx_huge_pages(boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT));
+
 	/*
 	 * MMU roles use union aliasing which is, generally speaking, an
 	 * undefined behavior. However, we supposedly know how compilers behave
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7d5cdb3af594..97b21e7fd013 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -614,13 +614,14 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			 struct guest_walker *gw,
 			 int write_fault, int hlevel,
-			 kvm_pfn_t pfn, bool map_writable, bool prefault)
+			 kvm_pfn_t pfn, bool map_writable, bool prefault,
+			 bool lpage_disallowed)
 {
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned direct_access, access = gw->pt_access;
 	int top_level, ret;
-	gfn_t base_gfn;
+	gfn_t gfn, base_gfn;
 
 	direct_access = gw->pte_access;
 
@@ -665,13 +666,25 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			link_shadow_page(vcpu, it.sptep, sp);
 	}
 
-	base_gfn = gw->gfn;
+	/*
+	 * FNAME(page_fault) might have clobbered the bottom bits of
+	 * gw->gfn, restore them from the virtual address.
+	 */
+	gfn = gw->gfn | ((addr & PT_LVL_OFFSET_MASK(gw->level)) >> PAGE_SHIFT);
+	base_gfn = gfn;
 
 	trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
 
 	for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
 		clear_sp_write_flooding_count(it.sptep);
-		base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+
+		/*
+		 * We cannot overwrite existing page tables with an NX
+		 * large page, as the leaf could be executable.
+		 */
+		disallowed_hugepage_adjust(it, gfn, &pfn, &hlevel);
+
+		base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 		if (it.level == hlevel)
 			break;
 
@@ -683,6 +696,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			sp = kvm_mmu_get_page(vcpu, base_gfn, addr,
 					      it.level - 1, true, direct_access);
 			link_shadow_page(vcpu, it.sptep, sp);
+			if (lpage_disallowed)
+				account_huge_nx_page(vcpu->kvm, sp);
 		}
 	}
 
@@ -759,9 +774,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	int r;
 	kvm_pfn_t pfn;
 	int level = PT_PAGE_TABLE_LEVEL;
-	bool force_pt_level = false;
 	unsigned long mmu_seq;
 	bool map_writable, is_self_change_mapping;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
+	bool force_pt_level = lpage_disallowed;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 
@@ -851,7 +868,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	if (!force_pt_level)
 		transparent_hugepage_adjust(vcpu, walker.gfn, &pfn, &level);
 	r = FNAME(fetch)(vcpu, addr, &walker, write_fault,
-			 level, pfn, map_writable, prefault);
+			 level, pfn, map_writable, prefault, lpage_disallowed);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
 
 out_unlock:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19a0dc96beca..f81f547d138c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -215,6 +215,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "mmu_unsync", VM_STAT(mmu_unsync) },
 	{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
 	{ "largepages", VM_STAT(lpages, .mode = 0444) },
+	{ "nx_largepages_splitted", VM_STAT(nx_lpage_splits, .mode = 0444) },
 	{ "max_mmu_page_hash_collisions",
 		VM_STAT(max_mmu_page_hash_collisions) },
 	{ NULL }
@@ -1286,6 +1287,14 @@ static u64 kvm_get_arch_capabilities(void)
 		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
 
 	/*
+	 * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
+	 * the nested hypervisor runs with NX huge pages.  If it is not,
+	 * L1 is anyway vulnerable to ITLB_MULTIHIT explots from other
+	 * L1 guests, so it need not worry about its own (L2) guests.
+	 */
+	data |= ARCH_CAP_PSCHANGE_MC_NO;
+
+	/*
 	 * If we're doing cache flushes (either "always" or "cond")
 	 * we will do one whenever the guest does a vmlaunch/vmresume.
 	 * If an outer hypervisor is doing the cache flush for us
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] [PATCH v7 4/5] NX 4
  2019-10-24 16:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
                   ` (2 preceding siblings ...)
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 3/5] NX 3 Paolo Bonzini
@ 2019-10-24 16:34 ` Paolo Bonzini
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 5/5] NX 5 Paolo Bonzini
  2019-10-24 18:27 ` [MODERATED] Re: [PATCH v7 0/5] NX 0 Ben Hutchings
  5 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-24 16:34 UTC (permalink / raw)
  To: speck

From: Junaid Shahid <junaids@google.com>
Subject: [PATCH v7 4/5] kvm: Add helper function for creating VM worker

 threads

This adds a function to create a kernel thread associated with a given
VM. In particular, it ensures that the worker thread inherits the
priority and cgroups of the calling thread.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h |  6 ++++
 virt/kvm/kvm_main.c      | 84 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a817e446c9aa..d0f3939f0f6c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1382,4 +1382,10 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
 
+typedef int (*kvm_vm_thread_fn_t)(struct kvm *kvm, uintptr_t data);
+
+int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
+				uintptr_t data, const char *name,
+				struct task_struct **thread_ptr);
+
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b8534c6b8cf6..35c4d9aac184 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -50,6 +50,7 @@
 #include <linux/bsearch.h>
 #include <linux/io.h>
 #include <linux/lockdep.h>
+#include <linux/kthread.h>
 
 #include <asm/processor.h>
 #include <asm/ioctl.h>
@@ -4379,3 +4380,86 @@ void kvm_exit(void)
 	kvm_vfio_ops_exit();
 }
 EXPORT_SYMBOL_GPL(kvm_exit);
+
+struct kvm_vm_worker_thread_context {
+	struct kvm *kvm;
+	struct task_struct *parent;
+	struct completion init_done;
+	kvm_vm_thread_fn_t thread_fn;
+	uintptr_t data;
+	int err;
+};
+
+static int kvm_vm_worker_thread(void *context)
+{
+	/*
+	 * The init_context is allocated on the stack of the parent thread, so
+	 * we have to locally copy anything that is needed beyond initialization
+	 */
+	struct kvm_vm_worker_thread_context *init_context = context;
+	struct kvm *kvm = init_context->kvm;
+	kvm_vm_thread_fn_t thread_fn = init_context->thread_fn;
+	uintptr_t data = init_context->data;
+	int err;
+
+	err = kthread_park(current);
+	/* kthread_park(current) is never supposed to return an error */
+	WARN_ON(err != 0);
+	if (err)
+		goto init_complete;
+
+	err = cgroup_attach_task_all(init_context->parent, current);
+	if (err) {
+		kvm_err("%s: cgroup_attach_task_all failed with err %d\n",
+			__func__, err);
+		goto init_complete;
+	}
+
+	set_user_nice(current, task_nice(init_context->parent));
+
+init_complete:
+	init_context->err = err;
+	complete(&init_context->init_done);
+	init_context = NULL;
+
+	if (err)
+		return err;
+
+	/* Wait to be woken up by the spawner before proceeding. */
+	kthread_parkme();
+
+	if (!kthread_should_stop())
+		err = thread_fn(kvm, data);
+
+	return err;
+}
+
+int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
+				uintptr_t data, const char *name,
+				struct task_struct **thread_ptr)
+{
+	struct kvm_vm_worker_thread_context init_context = {};
+	struct task_struct *thread;
+
+	*thread_ptr = NULL;
+	init_context.kvm = kvm;
+	init_context.parent = current;
+	init_context.thread_fn = thread_fn;
+	init_context.data = data;
+	init_completion(&init_context.init_done);
+
+	thread = kthread_run(kvm_vm_worker_thread, &init_context,
+			     "%s-%d", name, task_pid_nr(current));
+	if (IS_ERR(thread))
+		return PTR_ERR(thread);
+
+	/* kthread_run is never supposed to return NULL */
+	WARN_ON(thread == NULL);
+
+	wait_for_completion(&init_context.init_done);
+
+	if (!init_context.err)
+		*thread_ptr = thread;
+
+	return init_context.err;
+}
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] [PATCH v7 5/5] NX 5
  2019-10-24 16:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
                   ` (3 preceding siblings ...)
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 4/5] NX 4 Paolo Bonzini
@ 2019-10-24 16:34 ` Paolo Bonzini
  2019-10-24 18:27 ` [MODERATED] Re: [PATCH v7 0/5] NX 0 Ben Hutchings
  5 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-24 16:34 UTC (permalink / raw)
  To: speck

From: Junaid Shahid <junaids@google.com>
Subject: [PATCH v7 5/5] kvm: x86: mmu: Recovery of shattered NX large pages


The page table pages corresponding to broken down large pages are
zapped in FIFO order, so that the large page can potentially
be recovered, if it is no longer being used for execution.  This removes
the performance penalty for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   6 ++
 arch/x86/include/asm/kvm_host.h                 |   4 +
 arch/x86/kvm/mmu.c                              | 129 ++++++++++++++++++++++++
 arch/x86/kvm/mmu.h                              |   4 +
 arch/x86/kvm/x86.c                              |  11 ++
 virt/kvm/kvm_main.c                             |  30 +++++-
 6 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d6449135c62d..c667844c1c42 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2066,6 +2066,12 @@
 			If the sw workaround is enabled for the host, guests
 			need not enable it for nested guests.
 
+	kvm.nx_huge_pages_recovery_ratio=
+			[KVM] Controls how many 4KiB pages are periodically zapped
+			back to huge pages.  0 disables the recovery, otherwise if
+			the value is N KVM will zap 1/Nth of the 4KiB pages every
+			minute.  The default is 60.
+
 	kvm-amd.nested=	[KVM,AMD] Allow nested virtualization in KVM/SVM.
 			Default is 1 (enabled)
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6aba95fdb00b..8f721bfdbb14 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -310,6 +310,8 @@ struct kvm_rmap_head {
 struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
+	struct list_head lpage_disallowed_link;
+
 	bool unsync;
 	u8 mmu_valid_gen;
 	bool mmio_cached;
@@ -859,6 +861,7 @@ struct kvm_arch {
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
+	struct list_head lpage_disallowed_mmu_pages;
 	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 
@@ -933,6 +936,7 @@ struct kvm_arch {
 	bool exception_payload_enabled;
 
 	struct kvm_pmu_event_filter *pmu_event_filter;
+	struct task_struct *nx_lpage_recovery_thread;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 837beefdf0a5..e6a5748a12d5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -37,6 +37,7 @@
 #include <linux/uaccess.h>
 #include <linux/hash.h>
 #include <linux/kern_levels.h>
+#include <linux/kthread.h>
 
 #include <asm/page.h>
 #include <asm/pat.h>
@@ -50,16 +51,26 @@
 extern bool itlb_multihit_kvm_mitigation;
 
 static int __read_mostly nx_huge_pages = -1;
+static uint __read_mostly nx_huge_pages_recovery_ratio = 60;
 
 static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
+static int set_nx_huge_pages_recovery_ratio(const char *val, const struct kernel_param *kp);
 
 static struct kernel_param_ops nx_huge_pages_ops = {
 	.set = set_nx_huge_pages,
 	.get = param_get_bool,
 };
 
+static struct kernel_param_ops nx_huge_pages_recovery_ratio_ops = {
+	.set = set_nx_huge_pages_recovery_ratio,
+	.get = param_get_uint,
+};
+
 module_param_cb(nx_huge_pages, &nx_huge_pages_ops, &nx_huge_pages, 0644);
 __MODULE_PARM_TYPE(nx_huge_pages, "bool");
+module_param_cb(nx_huge_pages_recovery_ratio, &nx_huge_pages_recovery_ratio_ops,
+		&nx_huge_pages_recovery_ratio, 0644);
+__MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
 
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -1215,6 +1226,8 @@ static void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return;
 
 	++kvm->stat.nx_lpage_splits;
+	list_add_tail(&sp->lpage_disallowed_link,
+		      &kvm->arch.lpage_disallowed_mmu_pages);
 	sp->lpage_disallowed = true;
 }
 
@@ -1239,6 +1252,7 @@ static void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	--kvm->stat.nx_lpage_splits;
 	sp->lpage_disallowed = false;
+	list_del(&sp->lpage_disallowed_link);
 }
 
 static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
@@ -6268,6 +6282,8 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 			idx = srcu_read_lock(&kvm->srcu);
 			kvm_mmu_zap_all_fast(kvm);
 			srcu_read_unlock(&kvm->srcu, idx);
+
+			wake_up_process(kvm->arch.nx_lpage_recovery_thread);
 		}
 		mutex_unlock(&kvm_lock);
 	}
@@ -6361,3 +6377,116 @@ void kvm_mmu_module_exit(void)
 	unregister_shrinker(&mmu_shrinker);
 	mmu_audit_disable();
 }
+
+static int set_nx_huge_pages_recovery_ratio(const char *val, const struct kernel_param *kp)
+{
+	unsigned int old_val;
+	int err;
+
+	old_val = nx_huge_pages_recovery_ratio;
+	err = param_set_uint(val, kp);
+	if (err)
+		return err;
+
+	if (READ_ONCE(nx_huge_pages) &&
+	    !old_val && nx_huge_pages_recovery_ratio) {
+		struct kvm *kvm;
+
+		mutex_lock(&kvm_lock);
+
+		list_for_each_entry(kvm, &vm_list, vm_list)
+			wake_up_process(kvm->arch.nx_lpage_recovery_thread);
+
+		mutex_unlock(&kvm_lock);
+	}
+
+	return err;
+}
+
+static void kvm_recover_nx_lpages(struct kvm *kvm)
+{
+	int rcu_idx;
+	struct kvm_mmu_page *sp;
+	unsigned int ratio;
+	LIST_HEAD(invalid_list);
+	ulong to_zap;
+
+	rcu_idx = srcu_read_lock(&kvm->srcu);
+	spin_lock(&kvm->mmu_lock);
+
+	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
+	to_zap = ratio ? DIV_ROUND_UP(kvm->stat.nx_lpage_splits, ratio) : 0;
+	while (to_zap && !list_empty(&kvm->arch.lpage_disallowed_mmu_pages)) {
+		/*
+		 * We use a separate list instead of just using active_mmu_pages
+		 * because the number of lpage_disallowed pages is expected to
+		 * be relatively small compared to the total.
+		 */
+		sp = list_first_entry(&kvm->arch.lpage_disallowed_mmu_pages,
+				      struct kvm_mmu_page,
+				      lpage_disallowed_link);
+		WARN_ON_ONCE(!sp->lpage_disallowed);
+		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+		WARN_ON_ONCE(sp->lpage_disallowed);
+
+		if (!--to_zap || need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+			kvm_mmu_commit_zap_page(kvm, &invalid_list);
+			if (to_zap)
+				cond_resched_lock(&kvm->mmu_lock);
+		}
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, rcu_idx);
+}
+
+static long get_nx_lpage_recovery_timeout(u64 start_time)
+{
+	return READ_ONCE(nx_huge_pages) && READ_ONCE(nx_huge_pages_recovery_ratio)
+		? start_time + 60 * HZ - get_jiffies_64()
+		: MAX_SCHEDULE_TIMEOUT;
+}
+
+static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t data)
+{
+	u64 start_time;
+	long remaining_time;
+
+	while (true) {
+		start_time = get_jiffies_64();
+		remaining_time = get_nx_lpage_recovery_timeout(start_time);
+
+		set_current_state(TASK_INTERRUPTIBLE);
+		while (!kthread_should_stop() && remaining_time > 0) {
+			schedule_timeout(remaining_time);
+			remaining_time = get_nx_lpage_recovery_timeout(start_time);
+			set_current_state(TASK_INTERRUPTIBLE);
+		}
+
+		set_current_state(TASK_RUNNING);
+
+		if (kthread_should_stop())
+			return 0;
+
+		kvm_recover_nx_lpages(kvm);
+	}
+}
+
+int kvm_mmu_post_init_vm(struct kvm *kvm)
+{
+	int err;
+
+	err = kvm_vm_create_worker_thread(kvm, kvm_nx_lpage_recovery_worker, 0,
+					  "kvm-nx-lpage-recovery",
+					  &kvm->arch.nx_lpage_recovery_thread);
+	if (!err)
+		kthread_unpark(kvm->arch.nx_lpage_recovery_thread);
+
+	return err;
+}
+
+void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
+{
+	if (kvm->arch.nx_lpage_recovery_thread)
+		kthread_stop(kvm->arch.nx_lpage_recovery_thread);
+}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 11f8ec89433b..d55674f44a18 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -210,4 +210,8 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn);
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
+
+int kvm_mmu_post_init_vm(struct kvm *kvm);
+void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f81f547d138c..723f2f99a92d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9459,6 +9459,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
+	INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
 	atomic_set(&kvm->arch.noncoherent_dma_count, 0);
 
@@ -9487,6 +9488,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return kvm_x86_ops->vm_init(kvm);
 }
 
+int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+	return kvm_mmu_post_init_vm(kvm);
+}
+
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
 	vcpu_load(vcpu);
@@ -9588,6 +9594,11 @@ int x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
 }
 EXPORT_SYMBOL_GPL(x86_set_memory_region);
 
+void kvm_arch_pre_destroy_vm(struct kvm *kvm)
+{
+	kvm_mmu_pre_destroy_vm(kvm);
+}
+
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	if (current->mm == kvm->mm) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 35c4d9aac184..a7ff1e8b2f72 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -626,6 +626,23 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 	return 0;
 }
 
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int __weak kvm_arch_post_init_vm(struct kvm *kvm)
+{
+	return 0;
+}
+
+/*
+ * Called just after removing the VM from the vm_list, but before doing any
+ * other destruction.
+ */
+void __weak kvm_arch_pre_destroy_vm(struct kvm *kvm)
+{
+}
+
 static struct kvm *kvm_create_vm(unsigned long type)
 {
 	int r, i;
@@ -676,11 +693,15 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		rcu_assign_pointer(kvm->buses[i],
 			kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL_ACCOUNT));
 		if (!kvm->buses[i])
-			goto out_err;
+			goto out_err_no_mmu_notifier;
 	}
 
 	r = kvm_init_mmu_notifier(kvm);
 	if (r)
+		goto out_err_no_mmu_notifier;
+
+	r = kvm_arch_post_init_vm(kvm);
+	if (r)
 		goto out_err;
 
 	mutex_lock(&kvm_lock);
@@ -692,6 +713,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	return kvm;
 
 out_err:
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
+	if (kvm->mmu_notifier.ops)
+		mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
+#endif
+out_err_no_mmu_notifier:
 	cleanup_srcu_struct(&kvm->irq_srcu);
 out_err_no_irq_srcu:
 	cleanup_srcu_struct(&kvm->srcu);
@@ -734,6 +760,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	mutex_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
 	mutex_unlock(&kvm_lock);
+	kvm_arch_pre_destroy_vm(kvm);
+
 	kvm_free_irq_routing(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++) {
 		struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] Re: [PATCH v7 0/5] NX 0
  2019-10-24 16:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
                   ` (4 preceding siblings ...)
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 5/5] NX 5 Paolo Bonzini
@ 2019-10-24 18:27 ` Ben Hutchings
  2019-10-24 18:56   ` Paolo Bonzini
  5 siblings, 1 reply; 18+ messages in thread
From: Ben Hutchings @ 2019-10-24 18:27 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 1761 bytes --]

On Thu, 2019-10-24 at 18:34 +0200, speck for Paolo Bonzini wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> Subject: [PATCH v7 0/5] NX 0
> 
> 
> v6->v7: tell nested hypervisors to disable the mitigation

Does that eliminate the need to patch qemu?

Ben.

> Junaid Shahid (2):
>   kvm: Add helper function for creating VM worker threads
>   kvm: x86: mmu: Recovery of shattered NX large pages
> 
> Paolo Bonzini (1):
>   kvm: mmu: ITLB_MULTIHIT mitigation
> 
> Pawan Gupta (2):
>   x86: Add ITLB_MULTIHIT bug infrastructure
>   x86/cpu: Add Tremont to the cpu vulnerability whitelist
> 
>  Documentation/ABI/testing/sysfs-devices-system-cpu |   1 +
>  Documentation/admin-guide/kernel-parameters.txt    |  17 ++
>  arch/x86/include/asm/cpufeatures.h                 |   1 +
>  arch/x86/include/asm/kvm_host.h                    |   6 +
>  arch/x86/include/asm/msr-index.h                   |   7 +
>  arch/x86/kernel/cpu/bugs.c                         |  24 ++
>  arch/x86/kernel/cpu/common.c                       |  73 +++---
>  arch/x86/kvm/mmu.c                                 | 264 ++++++++++++++++++++-
>  arch/x86/kvm/mmu.h                                 |   4 +
>  arch/x86/kvm/paging_tmpl.h                         |  29 ++-
>  arch/x86/kvm/x86.c                                 |  20 ++
>  drivers/base/cpu.c                                 |   8 +
>  include/linux/cpu.h                                |   2 +
>  include/linux/kvm_host.h                           |   6 +
>  virt/kvm/kvm_main.c                                | 114 ++++++++-
>  15 files changed, 530 insertions(+), 46 deletions(-)
> 
-- 
Ben Hutchings
If God had intended Man to program,
we'd have been born with serial I/O ports.



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] Re: [PATCH v7 0/5] NX 0
  2019-10-24 18:27 ` [MODERATED] Re: [PATCH v7 0/5] NX 0 Ben Hutchings
@ 2019-10-24 18:56   ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-24 18:56 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 1925 bytes --]

On 24/10/19 20:27, speck for Ben Hutchings wrote:
> On Thu, 2019-10-24 at 18:34 +0200, speck for Paolo Bonzini wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Subject: [PATCH v7 0/5] NX 0
>>
>>
>> v6->v7: tell nested hypervisors to disable the mitigation
> 
> Does that eliminate the need to patch qemu?

No, it *introduces* the need to patch QEMU (because QEMU now can choose
to enable that bit in MSR_IA32_ARCH_CAPABILITIES; before, it need not
know about the bit).

Paolo

> Ben.
> 
>> Junaid Shahid (2):
>>   kvm: Add helper function for creating VM worker threads
>>   kvm: x86: mmu: Recovery of shattered NX large pages
>>
>> Paolo Bonzini (1):
>>   kvm: mmu: ITLB_MULTIHIT mitigation
>>
>> Pawan Gupta (2):
>>   x86: Add ITLB_MULTIHIT bug infrastructure
>>   x86/cpu: Add Tremont to the cpu vulnerability whitelist
>>
>>  Documentation/ABI/testing/sysfs-devices-system-cpu |   1 +
>>  Documentation/admin-guide/kernel-parameters.txt    |  17 ++
>>  arch/x86/include/asm/cpufeatures.h                 |   1 +
>>  arch/x86/include/asm/kvm_host.h                    |   6 +
>>  arch/x86/include/asm/msr-index.h                   |   7 +
>>  arch/x86/kernel/cpu/bugs.c                         |  24 ++
>>  arch/x86/kernel/cpu/common.c                       |  73 +++---
>>  arch/x86/kvm/mmu.c                                 | 264 ++++++++++++++++++++-
>>  arch/x86/kvm/mmu.h                                 |   4 +
>>  arch/x86/kvm/paging_tmpl.h                         |  29 ++-
>>  arch/x86/kvm/x86.c                                 |  20 ++
>>  drivers/base/cpu.c                                 |   8 +
>>  include/linux/cpu.h                                |   2 +
>>  include/linux/kvm_host.h                           |   6 +
>>  virt/kvm/kvm_main.c                                | 114 ++++++++-
>>  15 files changed, 530 insertions(+), 46 deletions(-)
>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** [PATCH v7 3/5] NX 3
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 3/5] NX 3 Paolo Bonzini
@ 2019-10-25  8:37   ` Joerg Roedel
  2019-10-25  8:48     ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2019-10-25  8:37 UTC (permalink / raw)
  To: speck

Hi Paolo,

On Thu, Oct 24, 2019 at 06:34:28PM +0200, speck for Paolo Bonzini wrote:
>  Documentation/admin-guide/kernel-parameters.txt |  11 ++
>  arch/x86/include/asm/kvm_host.h                 |   2 +
>  arch/x86/kernel/cpu/bugs.c                      |  13 ++-
>  arch/x86/kvm/mmu.c                              | 135 ++++++++++++++++++++++--
>  arch/x86/kvm/paging_tmpl.h                      |  29 +++--
>  arch/x86/kvm/x86.c                              |   9 ++
>  6 files changed, 186 insertions(+), 13 deletions(-)

I did some testing on 5.4-rc4 with these patches and they break the
ept=off case on some machines, especially those with the load_ia32_efer
capability:

	kvm [10312]: vcpu0, guest rIP: 0xffffffff810643f8 disabled perfctr wrmsr: 0xc2 data 0xffff
	walk_shadow_page_get_mmio_spte: detect reserved bits on spte, addr 0xbffbffd8, dump hierarchy:
	------ spte 0x80000003ab000ce7 level 2.

The reason is that during early guest boot KVM direct-maps the guest
memory with a PAE page-table and pages there get the NX bit set. But the
guest has EFER.NX=0, which causes the NX-bit to be reserved.

I fixed it with this diff:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e7970a2e8eae..6e9380a0ca41 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -997,7 +997,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	 * On CPUs that support "load IA32_EFER", always switch EFER
 	 * atomically, since it's faster than switching it manually.
 	 */
-	if (cpu_has_load_ia32_efer() ||
+	if ((cpu_has_load_ia32_efer() && (guest_efer & EFER_NX)) ||
 	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
 		if (!(guest_efer & EFER_LMA))
 			guest_efer &= ~EFER_LME;


Regards,

	Joerg

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** [PATCH v7 3/5] NX 3
  2019-10-25  8:37   ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
@ 2019-10-25  8:48     ` Paolo Bonzini
  2019-10-25  9:03       ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-25  8:48 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 1661 bytes --]

On 25/10/19 10:37, speck for Joerg Roedel wrote:
> I fixed it with this diff:
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e7970a2e8eae..6e9380a0ca41 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -997,7 +997,7 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  	 * On CPUs that support "load IA32_EFER", always switch EFER
>  	 * atomically, since it's faster than switching it manually.
>  	 */
> -	if (cpu_has_load_ia32_efer() ||
> +	if ((cpu_has_load_ia32_efer() && (guest_efer & EFER_NX)) ||
>  	    (enable_ept && ((vmx->vcpu.arch.efer ^ host_efer) & EFER_NX))) {
>  		if (!(guest_efer & EFER_LMA))
>  			guest_efer &= ~EFER_LME;
> 

What about this instead (completely untested):

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e7970a2e8eae..1f923dee99e5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -992,6 +992,9 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 		ignore_bits &= ~(u64)EFER_SCE;
 #endif
 
+	guest_efer &= ~ignore_bits;
+	guest_efer |= host_efer & ignore_bits;
+
 	/*
 	 * On EPT, we can't emulate NX, so we must switch EFER atomically.
 	 * On CPUs that support "load IA32_EFER", always switch EFER
@@ -1010,9 +1013,6 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	} else {
 		clear_atomic_switch_msr(vmx, MSR_EFER);
 
-		guest_efer &= ~ignore_bits;
-		guest_efer |= host_efer & ignore_bits;
-
 		vmx->guest_msrs[efer_offset].data = guest_efer;
 		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
 



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: [PATCH v7 3/5] NX 3
  2019-10-25  8:48     ` Paolo Bonzini
@ 2019-10-25  9:03       ` Joerg Roedel
  2019-10-25  9:08         ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2019-10-25  9:03 UTC (permalink / raw)
  To: speck

On Fri, Oct 25, 2019 at 10:48:04AM +0200, speck for Paolo Bonzini wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e7970a2e8eae..1f923dee99e5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -992,6 +992,9 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  		ignore_bits &= ~(u64)EFER_SCE;
>  #endif
>  
> +	guest_efer &= ~ignore_bits;
> +	guest_efer |= host_efer & ignore_bits;
> +
>  	/*
>  	 * On EPT, we can't emulate NX, so we must switch EFER atomically.
>  	 * On CPUs that support "load IA32_EFER", always switch EFER
> @@ -1010,9 +1013,6 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  	} else {
>  		clear_atomic_switch_msr(vmx, MSR_EFER);
>  
> -		guest_efer &= ~ignore_bits;
> -		guest_efer |= host_efer & ignore_bits;
> -
>  		vmx->guest_msrs[efer_offset].data = guest_efer;
>  		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
>  
> 
> 

This works on my machine with ept=0 but breaks with ept=1:

	KVM: entry failed, hardware error 0x80000021

Regards,

	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: [PATCH v7 3/5] NX 3
  2019-10-25  9:03       ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
@ 2019-10-25  9:08         ` Paolo Bonzini
  2019-10-25  9:45           ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-25  9:08 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 2222 bytes --]

On 25/10/19 11:03, speck for Joerg Roedel wrote:
> On Fri, Oct 25, 2019 at 10:48:04AM +0200, speck for Paolo Bonzini wrote:
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index e7970a2e8eae..1f923dee99e5 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -992,6 +992,9 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>>  		ignore_bits &= ~(u64)EFER_SCE;
>>  #endif
>>  
>> +	guest_efer &= ~ignore_bits;
>> +	guest_efer |= host_efer & ignore_bits;
>> +
>>  	/*
>>  	 * On EPT, we can't emulate NX, so we must switch EFER atomically.
>>  	 * On CPUs that support "load IA32_EFER", always switch EFER
>> @@ -1010,9 +1013,6 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>>  	} else {
>>  		clear_atomic_switch_msr(vmx, MSR_EFER);
>>  
>> -		guest_efer &= ~ignore_bits;
>> -		guest_efer |= host_efer & ignore_bits;
>> -
>>  		vmx->guest_msrs[efer_offset].data = guest_efer;
>>  		vmx->guest_msrs[efer_offset].mask = ~ignore_bits;
>>  
>>
>>
> 
> This works on my machine with ept=0 but breaks with ept=1:
> 
> 	KVM: entry failed, hardware error 0x80000021

Even simpler:

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e7970a2e8eae..8979d5e7b6f5 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -969,17 +969,9 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
 	u64 guest_efer = vmx->vcpu.arch.efer;
 	u64 ignore_bits = 0;
 
-	if (!enable_ept) {
-		/*
-		 * NX is needed to handle CR0.WP=1, CR4.SMEP=1.  Testing
-		 * host CPUID is more efficient than testing guest CPUID
-		 * or CR4.  Host SMEP is anyway a requirement for guest SMEP.
-		 */
-		if (boot_cpu_has(X86_FEATURE_SMEP))
-			guest_efer |= EFER_NX;
-		else if (!(guest_efer & EFER_NX))
-			ignore_bits |= EFER_NX;
-	}
+	/* Shadow paging assumes the NX bit to be available.  */
+	if (!enable_ept)
+		guest_efer |= EFER_NX;
 
 	/*
 	 * LMA and LME handled by hardware; SCE meaningless outside long mode.



This also shows why I couldn't reproduce it, my machine has SMEP and thus
always runs the guest with EFER.NXE=1.

Paolo


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: Re: [PATCH v7 3/5] NX 3
  2019-10-25  9:08         ` Paolo Bonzini
@ 2019-10-25  9:45           ` Joerg Roedel
  2019-10-25 10:39             ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Joerg Roedel @ 2019-10-25  9:45 UTC (permalink / raw)
  To: speck

On Fri, Oct 25, 2019 at 11:08:25AM +0200, speck for Paolo Bonzini wrote:
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index e7970a2e8eae..8979d5e7b6f5 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -969,17 +969,9 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>  	u64 guest_efer = vmx->vcpu.arch.efer;
>  	u64 ignore_bits = 0;
>  
> -	if (!enable_ept) {
> -		/*
> -		 * NX is needed to handle CR0.WP=1, CR4.SMEP=1.  Testing
> -		 * host CPUID is more efficient than testing guest CPUID
> -		 * or CR4.  Host SMEP is anyway a requirement for guest SMEP.
> -		 */
> -		if (boot_cpu_has(X86_FEATURE_SMEP))
> -			guest_efer |= EFER_NX;
> -		else if (!(guest_efer & EFER_NX))
> -			ignore_bits |= EFER_NX;
> -	}
> +	/* Shadow paging assumes the NX bit to be available.  */
> +	if (!enable_ept)
> +		guest_efer |= EFER_NX;
>  
>  	/*
>  	 * LMA and LME handled by hardware; SCE meaningless outside long mode.

Works with ept on and off, thanks.


	Joerg

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: Re: [PATCH v7 3/5] NX 3
  2019-10-25  9:45           ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
@ 2019-10-25 10:39             ` Paolo Bonzini
  2019-10-29 20:33               ` mark gross
  0 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-25 10:39 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 1858 bytes --]

On 25/10/19 11:45, speck for Joerg Roedel wrote:
> On Fri, Oct 25, 2019 at 11:08:25AM +0200, speck for Paolo Bonzini wrote:
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index e7970a2e8eae..8979d5e7b6f5 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -969,17 +969,9 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>>  	u64 guest_efer = vmx->vcpu.arch.efer;
>>  	u64 ignore_bits = 0;
>>  
>> -	if (!enable_ept) {
>> -		/*
>> -		 * NX is needed to handle CR0.WP=1, CR4.SMEP=1.  Testing
>> -		 * host CPUID is more efficient than testing guest CPUID
>> -		 * or CR4.  Host SMEP is anyway a requirement for guest SMEP.
>> -		 */
>> -		if (boot_cpu_has(X86_FEATURE_SMEP))
>> -			guest_efer |= EFER_NX;
>> -		else if (!(guest_efer & EFER_NX))
>> -			ignore_bits |= EFER_NX;
>> -	}
>> +	/* Shadow paging assumes the NX bit to be available.  */
>> +	if (!enable_ept)
>> +		guest_efer |= EFER_NX;
>>  
>>  	/*
>>  	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
> 
> Works with ept on and off, thanks.

Thanks, I'll include also the AMD version in a new patch and send it out as v8:

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4153ca8cddb7..29feb3ecc91c 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -739,8 +739,12 @@ static int get_npt_level(struct kvm_vcpu *vcpu)
 static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
 	vcpu->arch.efer = efer;
-	if (!npt_enabled && !(efer & EFER_LMA))
-		efer &= ~EFER_LME;
+	if (!npt_enabled) {
+		/* Shadow paging assumes the NX bit to be available.  */
+		efer |= EFER_NXE;
+		if (!(efer & EFER_LMA))
+			efer &= ~EFER_LME;
+	}
 
 	to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
 	mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);

Paolo


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: Re: [PATCH v7 3/5] NX 3
  2019-10-25 10:39             ` Paolo Bonzini
@ 2019-10-29 20:33               ` mark gross
  2019-10-31 11:02                 ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: mark gross @ 2019-10-29 20:33 UTC (permalink / raw)
  To: speck

On Fri, Oct 25, 2019 at 12:39:58PM +0200, speck for Paolo Bonzini wrote:
> On 25/10/19 11:45, speck for Joerg Roedel wrote:
> > On Fri, Oct 25, 2019 at 11:08:25AM +0200, speck for Paolo Bonzini wrote:
> >> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> >> index e7970a2e8eae..8979d5e7b6f5 100644
> >> --- a/arch/x86/kvm/vmx/vmx.c
> >> +++ b/arch/x86/kvm/vmx/vmx.c
> >> @@ -969,17 +969,9 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
> >>  	u64 guest_efer = vmx->vcpu.arch.efer;
> >>  	u64 ignore_bits = 0;
> >>  
> >> -	if (!enable_ept) {
> >> -		/*
> >> -		 * NX is needed to handle CR0.WP=1, CR4.SMEP=1.  Testing
> >> -		 * host CPUID is more efficient than testing guest CPUID
> >> -		 * or CR4.  Host SMEP is anyway a requirement for guest SMEP.
> >> -		 */
> >> -		if (boot_cpu_has(X86_FEATURE_SMEP))
> >> -			guest_efer |= EFER_NX;
> >> -		else if (!(guest_efer & EFER_NX))
> >> -			ignore_bits |= EFER_NX;
> >> -	}
> >> +	/* Shadow paging assumes the NX bit to be available.  */
> >> +	if (!enable_ept)
> >> +		guest_efer |= EFER_NX;
> >>  
> >>  	/*
> >>  	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
> > 
> > Works with ept on and off, thanks.
> 
> Thanks, I'll include also the AMD version in a new patch and send it out as v8:
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 4153ca8cddb7..29feb3ecc91c 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -739,8 +739,12 @@ static int get_npt_level(struct kvm_vcpu *vcpu)
>  static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>  {
>  	vcpu->arch.efer = efer;
> -	if (!npt_enabled && !(efer & EFER_LMA))
> -		efer &= ~EFER_LME;
> +	if (!npt_enabled) {
> +		/* Shadow paging assumes the NX bit to be available.  */
> +		efer |= EFER_NXE;
                         ^EFER_NX ?
--mark

> +		if (!(efer & EFER_LMA))
> +			efer &= ~EFER_LME;
> +	}
>  
>  	to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
>  	mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] Re: [PATCH v7 1/5] NX 1
  2019-10-24 16:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
@ 2019-10-30 18:38   ` Josh Poimboeuf
  2019-10-31 11:05     ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: Josh Poimboeuf @ 2019-10-30 18:38 UTC (permalink / raw)
  To: speck

On Thu, Oct 24, 2019 at 06:34:26PM +0200, speck for Paolo Bonzini wrote:
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -399,5 +399,6 @@
>  #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
>  #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the  MSDBS variant of BUG_MDS */
>  #define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
> +#define X86_BUG_ITLB_MULTIHIT		X86_BUG(22) /* CPU may incur MCE during certain page attribute changes */

This conflicts with TAA - X86_BUG_TAA is also 22.

-- 
Josh

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: Re: [PATCH v7 3/5] NX 3
  2019-10-29 20:33               ` mark gross
@ 2019-10-31 11:02                 ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-31 11:02 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 2296 bytes --]

On 29/10/19 21:33, speck for mark gross wrote:
> On Fri, Oct 25, 2019 at 12:39:58PM +0200, speck for Paolo Bonzini wrote:
>> On 25/10/19 11:45, speck for Joerg Roedel wrote:
>>> On Fri, Oct 25, 2019 at 11:08:25AM +0200, speck for Paolo Bonzini wrote:
>>>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>>>> index e7970a2e8eae..8979d5e7b6f5 100644
>>>> --- a/arch/x86/kvm/vmx/vmx.c
>>>> +++ b/arch/x86/kvm/vmx/vmx.c
>>>> @@ -969,17 +969,9 @@ static bool update_transition_efer(struct vcpu_vmx *vmx, int efer_offset)
>>>>  	u64 guest_efer = vmx->vcpu.arch.efer;
>>>>  	u64 ignore_bits = 0;
>>>>  
>>>> -	if (!enable_ept) {
>>>> -		/*
>>>> -		 * NX is needed to handle CR0.WP=1, CR4.SMEP=1.  Testing
>>>> -		 * host CPUID is more efficient than testing guest CPUID
>>>> -		 * or CR4.  Host SMEP is anyway a requirement for guest SMEP.
>>>> -		 */
>>>> -		if (boot_cpu_has(X86_FEATURE_SMEP))
>>>> -			guest_efer |= EFER_NX;
>>>> -		else if (!(guest_efer & EFER_NX))
>>>> -			ignore_bits |= EFER_NX;
>>>> -	}
>>>> +	/* Shadow paging assumes the NX bit to be available.  */
>>>> +	if (!enable_ept)
>>>> +		guest_efer |= EFER_NX;
>>>>  
>>>>  	/*
>>>>  	 * LMA and LME handled by hardware; SCE meaningless outside long mode.
>>>
>>> Works with ept on and off, thanks.
>>
>> Thanks, I'll include also the AMD version in a new patch and send it out as v8:
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 4153ca8cddb7..29feb3ecc91c 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -739,8 +739,12 @@ static int get_npt_level(struct kvm_vcpu *vcpu)
>>  static void svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>>  {
>>  	vcpu->arch.efer = efer;
>> -	if (!npt_enabled && !(efer & EFER_LMA))
>> -		efer &= ~EFER_LME;
>> +	if (!npt_enabled) {
>> +		/* Shadow paging assumes the NX bit to be available.  */
>> +		efer |= EFER_NXE;
>                          ^EFER_NX ?

Yes, see message id 20191027152323.24326-1-pbonzini@redhat.com on
kvm@vger.kernel.org; I am about to send it to Linus.

Paolo

> 
>> +		if (!(efer & EFER_LMA))
>> +			efer &= ~EFER_LME;
>> +	}
>>  
>>  	to_svm(vcpu)->vmcb->save.efer = efer | EFER_SVME;
>>  	mark_dirty(to_svm(vcpu)->vmcb, VMCB_CR);
>>
>> Paolo
>>
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] Re: [PATCH v7 1/5] NX 1
  2019-10-30 18:38   ` [MODERATED] " Josh Poimboeuf
@ 2019-10-31 11:05     ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-31 11:05 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 918 bytes --]

On 30/10/19 19:38, speck for Josh Poimboeuf wrote:
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -399,5 +399,6 @@
>>  #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
>>  #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the  MSDBS variant of BUG_MDS */
>>  #define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
>> +#define X86_BUG_ITLB_MULTIHIT		X86_BUG(22) /* CPU may incur MCE during certain page attribute changes */
> This conflicts with TAA - X86_BUG_TAA is also 22.

Pawan said he'd change TAA to 23, but in the end he didn't.  Since
Joerg's conflict resolution[1] kept TAA as 22 and bumped ITLB_MULTIHIT
to 23, let's keep it that way.

[1] See message id 20191028160203.GB838@suse.de, subject "Re:
***UNCHECKED*** Re: [PATCH v7 0/5] NX 0".

Paolo


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-10-31 11:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-24 16:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
2019-10-24 16:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
2019-10-30 18:38   ` [MODERATED] " Josh Poimboeuf
2019-10-31 11:05     ` Paolo Bonzini
2019-10-24 16:34 ` [MODERATED] [PATCH v7 2/5] NX 2 Paolo Bonzini
2019-10-24 16:34 ` [MODERATED] [PATCH v7 3/5] NX 3 Paolo Bonzini
2019-10-25  8:37   ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
2019-10-25  8:48     ` Paolo Bonzini
2019-10-25  9:03       ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
2019-10-25  9:08         ` Paolo Bonzini
2019-10-25  9:45           ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
2019-10-25 10:39             ` Paolo Bonzini
2019-10-29 20:33               ` mark gross
2019-10-31 11:02                 ` Paolo Bonzini
2019-10-24 16:34 ` [MODERATED] [PATCH v7 4/5] NX 4 Paolo Bonzini
2019-10-24 16:34 ` [MODERATED] [PATCH v7 5/5] NX 5 Paolo Bonzini
2019-10-24 18:27 ` [MODERATED] Re: [PATCH v7 0/5] NX 0 Ben Hutchings
2019-10-24 18:56   ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).