historical-speck.lore.kernel.org archive mirror
* [MODERATED] [PATCH v7 0/5] NX 0
@ 2019-10-24  7:34 Paolo Bonzini
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-24  7:34 UTC (permalink / raw)
  To: speck

Junaid Shahid (2):
  kvm: Add helper function for creating VM worker threads
  kvm: x86: mmu: Recovery of shattered NX large pages

Paolo Bonzini (1):
  kvm: mmu: ITLB_MULTIHIT mitigation

Pawan Gupta (2):
  x86: Add ITLB_MULTIHIT bug infrastructure
  x86/cpu: Add Tremont to the cpu vulnerability whitelist

 Documentation/ABI/testing/sysfs-devices-system-cpu |   1 +
 Documentation/admin-guide/kernel-parameters.txt    |  17 ++
 arch/x86/include/asm/cpufeatures.h                 |   1 +
 arch/x86/include/asm/kvm_host.h                    |   6 +
 arch/x86/include/asm/msr-index.h                   |   7 +
 arch/x86/kernel/cpu/bugs.c                         |  24 ++
 arch/x86/kernel/cpu/common.c                       |  73 +++---
 arch/x86/kvm/mmu.c                                 | 264 ++++++++++++++++++++-
 arch/x86/kvm/mmu.h                                 |   4 +
 arch/x86/kvm/paging_tmpl.h                         |  29 ++-
 arch/x86/kvm/x86.c                                 |  20 ++
 drivers/base/cpu.c                                 |   8 +
 include/linux/cpu.h                                |   2 +
 include/linux/kvm_host.h                           |   6 +
 virt/kvm/kvm_main.c                                | 114 ++++++++-
 15 files changed, 530 insertions(+), 46 deletions(-)

-- 
1.8.3.1


* [MODERATED] [PATCH v7 1/5] NX 1
  2019-10-24  7:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
@ 2019-10-24  7:34 ` Paolo Bonzini
  2019-10-24  8:57   ` [MODERATED] " Paolo Bonzini
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 2/5] NX 2 Paolo Bonzini
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-24  7:34 UTC (permalink / raw)
  To: speck


Some processors may incur a machine check error, possibly resulting
in an unrecoverable CPU hang, when an instruction fetch hits multiple
entries in the instruction TLB. This can occur when the page size is
changed along with either the physical address or cache type [1].

This issue affects both bare-metal x86 page tables and EPT.

This can be mitigated by either eliminating the use of large
pages or by using careful TLB invalidations when changing the
page size in the page tables.
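
As an illustration, the "careful invalidation" approach means never
letting the old and the new translation for the same virtual address
coexist in the iTLB.  A minimal sketch of promoting a 4 KiB mapping to
a 2 MiB one follows; mm, addr, pmdp and new_huge_pmd are assumed
locals, not part of this patch:

    /* 1. Unmap: drop the PMD that points to the old 4 KiB PTE table. */
    pmd_clear(pmdp);

    /* 2. Evict any stale TLB entries before changing the page size. */
    flush_tlb_mm_range(mm, addr, addr + PMD_SIZE, PAGE_SHIFT, true);

    /* 3. Only now install the 2 MiB mapping. */
    set_pmd(pmdp, new_huge_pmd);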

As with Spectre, Meltdown, L1TF and MDS, a new bit has been
allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO); it will
be set on CPUs which are mitigated against this issue.

[1] For example, please refer to erratum SKL002 in the "6th Generation
Intel Processor Family Specification Update":
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
https://www.google.com/search?q=site:intel.com+SKL002

There are many other affected processors outside of Skylake, and the
erratum referenced above does not fully disclose the issue and its
impact, either on Skylake or across all the affected CPUs.
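
Accordingly, the detection logic added to cpu_set_bug_bits() in this
patch condenses to a two-part test (excerpted from the
arch/x86/kernel/cpu/common.c hunk below):

    u64 ia32_cap = 0;

    if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
        rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);

    /* Affected unless whitelisted or the hardware claims immunity. */
    if (!cpu_matches(NO_ITLB_MULTIHIT) && !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
        setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);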

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  1 +
 arch/x86/include/asm/cpufeatures.h                 |  1 +
 arch/x86/include/asm/msr-index.h                   |  7 +++
 arch/x86/kernel/cpu/bugs.c                         | 13 ++++
 arch/x86/kernel/cpu/common.c                       | 71 ++++++++++++----------
 drivers/base/cpu.c                                 |  8 +++
 include/linux/cpu.h                                |  2 +
 7 files changed, 70 insertions(+), 33 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 06d0931119cc..55bf5e1538ad 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -486,6 +486,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
 		/sys/devices/system/cpu/vulnerabilities/spec_store_bypass
 		/sys/devices/system/cpu/vulnerabilities/l1tf
 		/sys/devices/system/cpu/vulnerabilities/mds
+		/sys/devices/system/cpu/vulnerabilities/itlb_multihit
 Date:		January 2018
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Information about CPU vulnerabilities
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 0652d3eed9bd..66aaad7611b2 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -399,5 +399,6 @@
 #define X86_BUG_MDS			X86_BUG(19) /* CPU is affected by Microarchitectural data sampling */
 #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the  MSDBS variant of BUG_MDS */
 #define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
+#define X86_BUG_ITLB_MULTIHIT		X86_BUG(22) /* CPU may incur MCE during certain page attribute changes */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 20ce682a2540..c678899b21db 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -93,6 +93,13 @@
 						  * Microarchitectural Data
 						  * Sampling (MDS) vulnerabilities.
 						  */
+#define ARCH_CAP_PSCHANGE_MC_NO		BIT(6)	 /*
+						  * The processor is not susceptible to a
+						  * machine check error due to modifying the
+						  * code page size along with either the
+						  * physical address or cache type
+						  * without TLB invalidation.
+						  */
 
 #define MSR_IA32_FLUSH_CMD		0x0000010b
 #define L1D_FLUSH			BIT(0)	/*
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 91c2561b905f..ecd0126648ea 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1311,6 +1311,11 @@ static ssize_t l1tf_show_state(char *buf)
 }
 #endif
 
+static ssize_t itlb_multihit_show_state(char *buf)
+{
+	return sprintf(buf, "Processor vulnerable\n");
+}
+
 static ssize_t mds_show_state(char *buf)
 {
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
@@ -1398,6 +1403,9 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 	case X86_BUG_MDS:
 		return mds_show_state(buf);
 
+	case X86_BUG_ITLB_MULTIHIT:
+		return itlb_multihit_show_state(buf);
+
 	default:
 		break;
 	}
@@ -1434,4 +1442,9 @@ ssize_t cpu_show_mds(struct device *dev, struct device_attribute *attr, char *bu
 {
 	return cpu_show_common(dev, attr, buf, X86_BUG_MDS);
 }
+
+ssize_t cpu_show_itlb_multihit(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return cpu_show_common(dev, attr, buf, X86_BUG_ITLB_MULTIHIT);
+}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9ae7d1bcd4f4..fc00b2349a9f 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1016,13 +1016,14 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 #endif
 }
 
-#define NO_SPECULATION	BIT(0)
-#define NO_MELTDOWN	BIT(1)
-#define NO_SSB		BIT(2)
-#define NO_L1TF		BIT(3)
-#define NO_MDS		BIT(4)
-#define MSBDS_ONLY	BIT(5)
-#define NO_SWAPGS	BIT(6)
+#define NO_SPECULATION		BIT(0)
+#define NO_MELTDOWN		BIT(1)
+#define NO_SSB			BIT(2)
+#define NO_L1TF			BIT(3)
+#define NO_MDS			BIT(4)
+#define MSBDS_ONLY		BIT(5)
+#define NO_SWAPGS		BIT(6)
+#define NO_ITLB_MULTIHIT	BIT(7)
 
 #define VULNWL(_vendor, _family, _model, _whitelist)	\
 	{ X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist }
@@ -1043,27 +1044,27 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 	VULNWL(NSC,	5, X86_MODEL_ANY,	NO_SPECULATION),
 
 	/* Intel Family 6 */
-	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION),
-	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION),
-
-	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+
+	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	VULNWL_INTEL(CORE_YONAH,		NO_SSB),
 
-	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
-	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	/*
 	 * Technically, swapgs isn't serializing on AMD (despite it previously
@@ -1074,14 +1075,14 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 	 */
 
 	/* AMD Family 0xf - 0x12 */
-	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
+	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
-	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
+	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 	{}
 };
 
@@ -1096,15 +1097,19 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_cap = 0;
 
+	if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
+
+	/* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not mitigated */
+	if (!cpu_matches(NO_ITLB_MULTIHIT) && !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
+		setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);
+
 	if (cpu_matches(NO_SPECULATION))
 		return;
 
 	setup_force_cpu_bug(X86_BUG_SPECTRE_V1);
 	setup_force_cpu_bug(X86_BUG_SPECTRE_V2);
 
-	if (cpu_has(c, X86_FEATURE_ARCH_CAPABILITIES))
-		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, ia32_cap);
-
 	if (!cpu_matches(NO_SSB) && !(ia32_cap & ARCH_CAP_SSB_NO) &&
 	   !cpu_has(c, X86_FEATURE_AMD_SSB_NO))
 		setup_force_cpu_bug(X86_BUG_SPEC_STORE_BYPASS);
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index cc37511de866..f1a6e020ed8d 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -554,12 +554,19 @@ ssize_t __weak cpu_show_mds(struct device *dev,
 	return sprintf(buf, "Not affected\n");
 }
 
+ssize_t __weak cpu_show_itlb_multihit(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
 static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
 static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
 static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
+static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
@@ -568,6 +575,7 @@ ssize_t __weak cpu_show_mds(struct device *dev,
 	&dev_attr_spec_store_bypass.attr,
 	&dev_attr_l1tf.attr,
 	&dev_attr_mds.attr,
+	&dev_attr_itlb_multihit.attr,
 	NULL
 };
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index d0633ebdaa9c..038866a28f2c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -59,6 +59,8 @@ extern ssize_t cpu_show_l1tf(struct device *dev,
 			     struct device_attribute *attr, char *buf);
 extern ssize_t cpu_show_mds(struct device *dev,
 			    struct device_attribute *attr, char *buf);
+extern ssize_t cpu_show_itlb_multihit(struct device *dev,
+				      struct device_attribute *attr, char *buf);
 
 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,
-- 
1.8.3.1


* [MODERATED] [PATCH v7 2/5] NX 2
  2019-10-24  7:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
@ 2019-10-24  7:34 ` Paolo Bonzini
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 3/5] NX 3 Paolo Bonzini
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-24  7:34 UTC (permalink / raw)
  To: speck

x86/cpu: Add Tremont to the cpu vulnerability whitelist

This patch adds the new CPU model ATOM_TREMONT_D to the CPU
vulnerability whitelist. ATOM_TREMONT_D is not affected by
X86_BUG_ITLB_MULTIHIT. There may be other bugs that do not affect
ATOM_TREMONT_D which are not known at this point; they can be added
to the whitelist later.
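
For reference, the one-line entry added below expands, assuming the
usual family-6 VULNWL_INTEL() wrapper around the VULNWL() macro from
the previous patch, to roughly:

    { X86_VENDOR_INTEL, 6, INTEL_FAM6_ATOM_TREMONT_D,
      X86_FEATURE_ANY, NO_ITLB_MULTIHIT },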

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/cpu/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index fc00b2349a9f..c652ca9dc046 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1074,6 +1074,8 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 	 * good enough for our purposes.
 	 */
 
+	VULNWL_INTEL(ATOM_TREMONT_D,		NO_ITLB_MULTIHIT),
+
 	/* AMD Family 0xf - 0x12 */
 	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
-- 
1.8.3.1


* [MODERATED] [PATCH v7 3/5] NX 3
  2019-10-24  7:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 2/5] NX 2 Paolo Bonzini
@ 2019-10-24  7:34 ` Paolo Bonzini
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 4/5] NX 4 Paolo Bonzini
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-24  7:34 UTC (permalink / raw)
  To: speck


With some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check.  Unfortunately, if EPT
page tables use huge pages, it is possible for a malicious guest to
cause this situation.

This patch adds a knob to mark huge pages as non-executable. When the
nx_huge_pages parameter is enabled (and we are using EPT), all huge pages
are marked as NX. If the guest attempts to execute in one of those pages,
the page is broken down into 4K pages, which are then marked executable.
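
In code terms, the fault paths below reduce the page size exactly when
the fault is an instruction fetch and the workaround is active; this
is a condensed restatement of the fault-path hunks in this patch, not
new logic:

    bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
                            is_nx_huge_page_enabled();

    /* Instruction-fetch faults must be mapped with 4 KiB SPTEs. */
    if (lpage_disallowed)
        force_pt_level = true;

The knob can be set at boot with kvm.nx_huge_pages=force, or flipped
at runtime via /sys/module/kvm/parameters/nx_huge_pages (the parameter
is registered below with mode 0644).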

This is not an issue for shadow paging (except for nested EPT), because
in that case the host is in control of TLB flushes and the problematic
situation cannot happen.  With nested EPT, however, the nested guest can
again cause problems, so shadow and direct EPT are treated the same.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  11 ++
 arch/x86/include/asm/kvm_host.h                 |   2 +
 arch/x86/kernel/cpu/bugs.c                      |  13 ++-
 arch/x86/kvm/mmu.c                              | 135 ++++++++++++++++++++++--
 arch/x86/kvm/paging_tmpl.h                      |  29 +++--
 arch/x86/kvm/x86.c                              |   9 ++
 6 files changed, 186 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a84a83f8881e..d6449135c62d 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2055,6 +2055,17 @@
 			KVM MMU at runtime.
 			Default is 0 (off)
 
+	kvm.nx_huge_pages=
+			[KVM] Controls the sw workaround for bug
+			X86_BUG_ITLB_MULTIHIT.
+			force	: Always deploy workaround.
+			off	: Default. Never deploy workaround.
+			auto	: Deploy workaround based on presence of
+				  X86_BUG_ITLB_MULTIHIT.
+
+			If the sw workaround is enabled for the host, guests
+			need not enable it for nested guests.
+
 	kvm-amd.nested=	[KVM,AMD] Allow nested virtualization in KVM/SVM.
 			Default is 1 (enabled)
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6f6b8886a8eb..6aba95fdb00b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -313,6 +313,7 @@ struct kvm_mmu_page {
 	bool unsync;
 	u8 mmu_valid_gen;
 	bool mmio_cached;
+	bool lpage_disallowed; /* Can't be replaced by an equiv large page */
 
 	/*
 	 * The following two entries are used to key the shadow page in the
@@ -945,6 +946,7 @@ struct kvm_vm_stat {
 	ulong mmu_unsync;
 	ulong remote_tlb_flush;
 	ulong lpages;
+	ulong nx_lpage_splits;
 	ulong max_mmu_page_hash_collisions;
 };
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index ecd0126648ea..fe0378e03dbb 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1149,6 +1149,9 @@ void x86_spec_ctrl_setup_ap(void)
 		x86_amd_ssb_disable();
 }
 
+bool itlb_multihit_kvm_mitigation;
+EXPORT_SYMBOL_GPL(itlb_multihit_kvm_mitigation);
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"L1TF: " fmt
 
@@ -1304,17 +1307,25 @@ static ssize_t l1tf_show_state(char *buf)
 		       l1tf_vmx_states[l1tf_vmx_mitigation],
 		       sched_smt_active() ? "vulnerable" : "disabled");
 }
+
+static ssize_t itlb_multihit_show_state(char *buf)
+{
+	if (itlb_multihit_kvm_mitigation)
+		return sprintf(buf, "KVM: Mitigation: Split huge pages\n");
+	else
+		return sprintf(buf, "KVM: Vulnerable\n");
+}
 #else
 static ssize_t l1tf_show_state(char *buf)
 {
 	return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
 }
-#endif
 
 static ssize_t itlb_multihit_show_state(char *buf)
 {
 	return sprintf(buf, "Processor vulnerable\n");
 }
+#endif
 
 static ssize_t mds_show_state(char *buf)
 {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 24c23c66b226..837beefdf0a5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -47,6 +47,20 @@
 #include <asm/kvm_page_track.h>
 #include "trace.h"
 
+extern bool itlb_multihit_kvm_mitigation;
+
+static int __read_mostly nx_huge_pages = -1;
+
+static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
+
+static struct kernel_param_ops nx_huge_pages_ops = {
+	.set = set_nx_huge_pages,
+	.get = param_get_bool,
+};
+
+module_param_cb(nx_huge_pages, &nx_huge_pages_ops, &nx_huge_pages, 0644);
+__MODULE_PARM_TYPE(nx_huge_pages, "bool");
+
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
  * where the hardware walks 2 page tables:
@@ -352,6 +366,11 @@ static inline bool spte_ad_need_write_protect(u64 spte)
 	return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
 }
 
+static bool is_nx_huge_page_enabled(void)
+{
+	return READ_ONCE(nx_huge_pages);
+}
+
 static inline u64 spte_shadow_accessed_mask(u64 spte)
 {
 	MMU_WARN_ON(is_mmio_spte(spte));
@@ -1190,6 +1209,15 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 }
 
+static void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	if (sp->lpage_disallowed)
+		return;
+
+	++kvm->stat.nx_lpage_splits;
+	sp->lpage_disallowed = true;
+}
+
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
@@ -1207,6 +1235,12 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
 
+static void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	--kvm->stat.nx_lpage_splits;
+	sp->lpage_disallowed = false;
+}
+
 static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
 					  struct kvm_memory_slot *slot)
 {
@@ -2792,6 +2826,9 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 			kvm_reload_remote_mmus(kvm);
 	}
 
+	if (sp->lpage_disallowed)
+		unaccount_huge_nx_page(kvm, sp);
+
 	sp->role.invalid = 1;
 	return list_unstable;
 }
@@ -3013,6 +3050,11 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!speculative)
 		spte |= spte_shadow_accessed_mask(spte);
 
+	if (level > PT_PAGE_TABLE_LEVEL && (pte_access & ACC_EXEC_MASK) &&
+	    is_nx_huge_page_enabled()) {
+		pte_access &= ~ACC_EXEC_MASK;
+	}
+
 	if (pte_access & ACC_EXEC_MASK)
 		spte |= shadow_x_mask;
 	else
@@ -3233,9 +3275,32 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
+static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it,
+				       gfn_t gfn, kvm_pfn_t *pfnp, int *levelp)
+{
+	int level = *levelp;
+	u64 spte = *it.sptep;
+
+	if (it.level == level && level > PT_PAGE_TABLE_LEVEL &&
+	    is_nx_huge_page_enabled() &&
+	    is_shadow_present_pte(spte) &&
+	    !is_large_pte(spte)) {
+		/*
+		 * A small SPTE exists for this pfn, but FNAME(fetch)
+		 * and __direct_map would like to create a large PTE
+		 * instead: just force them to go down another level,
+		 * patching back for them into pfn the next 9 bits of
+		 * the address.
+		 */
+		u64 page_mask = KVM_PAGES_PER_HPAGE(level) - KVM_PAGES_PER_HPAGE(level - 1);
+		*pfnp |= gfn & page_mask;
+		(*levelp)--;
+	}
+}
+
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 			int map_writable, int level, kvm_pfn_t pfn,
-			bool prefault)
+			bool prefault, bool lpage_disallowed)
 {
 	struct kvm_shadow_walk_iterator it;
 	struct kvm_mmu_page *sp;
@@ -3248,6 +3313,12 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 
 	trace_kvm_mmu_spte_requested(gpa, level, pfn);
 	for_each_shadow_entry(vcpu, gpa, it) {
+		/*
+		 * We cannot overwrite existing page tables with an NX
+		 * large page, as the leaf could be executable.
+		 */
+		disallowed_hugepage_adjust(it, gfn, &pfn, &level);
+
 		base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 		if (it.level == level)
 			break;
@@ -3258,6 +3329,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 					      it.level - 1, true, ACC_ALL);
 
 			link_shadow_page(vcpu, it.sptep, sp);
+			if (lpage_disallowed)
+				account_huge_nx_page(vcpu->kvm, sp);
 		}
 	}
 
@@ -3550,11 +3623,14 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 {
 	int r;
 	int level;
-	bool force_pt_level = false;
+	bool force_pt_level;
 	kvm_pfn_t pfn;
 	unsigned long mmu_seq;
 	bool map_writable, write = error_code & PFERR_WRITE_MASK;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
 
+	force_pt_level = lpage_disallowed;
 	level = mapping_level(vcpu, gfn, &force_pt_level);
 	if (likely(!force_pt_level)) {
 		/*
@@ -3588,7 +3664,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 		goto out_unlock;
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, gfn, &pfn, &level);
-	r = __direct_map(vcpu, v, write, map_writable, level, pfn, prefault);
+	r = __direct_map(vcpu, v, write, map_writable, level, pfn,
+			 prefault, false);
 out_unlock:
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(pfn);
@@ -4174,6 +4251,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	unsigned long mmu_seq;
 	int write = error_code & PFERR_WRITE_MASK;
 	bool map_writable;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
 
 	MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa));
 
@@ -4184,8 +4263,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	if (r)
 		return r;
 
-	force_pt_level = !check_hugepage_cache_consistency(vcpu, gfn,
-							   PT_DIRECTORY_LEVEL);
+	force_pt_level =
+		lpage_disallowed ||
+		!check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL);
 	level = mapping_level(vcpu, gfn, &force_pt_level);
 	if (likely(!force_pt_level)) {
 		if (level > PT_DIRECTORY_LEVEL &&
@@ -4214,7 +4294,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 		goto out_unlock;
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, gfn, &pfn, &level);
-	r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault);
+	r = __direct_map(vcpu, gpa, write, map_writable, level, pfn,
+			 prefault, lpage_disallowed);
 out_unlock:
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(pfn);
@@ -6155,10 +6236,52 @@ static void kvm_set_mmio_spte_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
 }
 
+static void __set_nx_huge_pages(bool val)
+{
+	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
+}
+
+static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
+{
+	bool old_val = nx_huge_pages;
+	bool new_val;
+
+	/* In "auto" mode deploy workaround only if CPU has the bug. */
+	if (sysfs_streq(val, "off"))
+		new_val = 0;
+	else if (sysfs_streq(val, "force"))
+		new_val = 1;
+	else if (sysfs_streq(val, "auto"))
+		new_val = boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
+	else if (strtobool(val, &new_val) < 0)
+		return -EINVAL;
+
+	__set_nx_huge_pages(new_val);
+
+	if (new_val != old_val) {
+		struct kvm *kvm;
+		int idx;
+
+		mutex_lock(&kvm_lock);
+
+		list_for_each_entry(kvm, &vm_list, vm_list) {
+			idx = srcu_read_lock(&kvm->srcu);
+			kvm_mmu_zap_all_fast(kvm);
+			srcu_read_unlock(&kvm->srcu, idx);
+		}
+		mutex_unlock(&kvm_lock);
+	}
+
+	return 0;
+}
+
 int kvm_mmu_module_init(void)
 {
 	int ret = -ENOMEM;
 
+	if (nx_huge_pages == -1)
+		__set_nx_huge_pages(boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT));
+
 	/*
 	 * MMU roles use union aliasing which is, generally speaking, an
 	 * undefined behavior. However, we supposedly know how compilers behave
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7d5cdb3af594..97b21e7fd013 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -614,13 +614,14 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			 struct guest_walker *gw,
 			 int write_fault, int hlevel,
-			 kvm_pfn_t pfn, bool map_writable, bool prefault)
+			 kvm_pfn_t pfn, bool map_writable, bool prefault,
+			 bool lpage_disallowed)
 {
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned direct_access, access = gw->pt_access;
 	int top_level, ret;
-	gfn_t base_gfn;
+	gfn_t gfn, base_gfn;
 
 	direct_access = gw->pte_access;
 
@@ -665,13 +666,25 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			link_shadow_page(vcpu, it.sptep, sp);
 	}
 
-	base_gfn = gw->gfn;
+	/*
+	 * FNAME(page_fault) might have clobbered the bottom bits of
+	 * gw->gfn, restore them from the virtual address.
+	 */
+	gfn = gw->gfn | ((addr & PT_LVL_OFFSET_MASK(gw->level)) >> PAGE_SHIFT);
+	base_gfn = gfn;
 
 	trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
 
 	for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
 		clear_sp_write_flooding_count(it.sptep);
-		base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+
+		/*
+		 * We cannot overwrite existing page tables with an NX
+		 * large page, as the leaf could be executable.
+		 */
+		disallowed_hugepage_adjust(it, gfn, &pfn, &hlevel);
+
+		base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 		if (it.level == hlevel)
 			break;
 
@@ -683,6 +696,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			sp = kvm_mmu_get_page(vcpu, base_gfn, addr,
 					      it.level - 1, true, direct_access);
 			link_shadow_page(vcpu, it.sptep, sp);
+			if (lpage_disallowed)
+				account_huge_nx_page(vcpu->kvm, sp);
 		}
 	}
 
@@ -759,9 +774,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	int r;
 	kvm_pfn_t pfn;
 	int level = PT_PAGE_TABLE_LEVEL;
-	bool force_pt_level = false;
 	unsigned long mmu_seq;
 	bool map_writable, is_self_change_mapping;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
+	bool force_pt_level = lpage_disallowed;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 
@@ -851,7 +868,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	if (!force_pt_level)
 		transparent_hugepage_adjust(vcpu, walker.gfn, &pfn, &level);
 	r = FNAME(fetch)(vcpu, addr, &walker, write_fault,
-			 level, pfn, map_writable, prefault);
+			 level, pfn, map_writable, prefault, lpage_disallowed);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
 
 out_unlock:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19a0dc96beca..f81f547d138c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -215,6 +215,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "mmu_unsync", VM_STAT(mmu_unsync) },
 	{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
 	{ "largepages", VM_STAT(lpages, .mode = 0444) },
+	{ "nx_largepages_splitted", VM_STAT(nx_lpage_splits, .mode = 0444) },
 	{ "max_mmu_page_hash_collisions",
 		VM_STAT(max_mmu_page_hash_collisions) },
 	{ NULL }
@@ -1286,6 +1287,14 @@ static u64 kvm_get_arch_capabilities(void)
 		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
 
 	/*
+	 * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
+	 * the nested hypervisor runs with NX huge pages.  If it is not,
+	 * L1 is anyway vulnerable to ITLB_MULTIHIT exploits from other
+	 * L1 guests, so it need not worry about its own (L2) guests.
+	 */
+	data |= ARCH_CAP_PSCHANGE_MC_NO;
+
+	/*
 	 * If we're doing cache flushes (either "always" or "cond")
 	 * we will do one whenever the guest does a vmlaunch/vmresume.
 	 * If an outer hypervisor is doing the cache flush for us
-- 
1.8.3.1


* [MODERATED] [PATCH v7 4/5] NX 4
  2019-10-24  7:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
                   ` (2 preceding siblings ...)
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 3/5] NX 3 Paolo Bonzini
@ 2019-10-24  7:34 ` Paolo Bonzini
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 5/5] NX 5 Paolo Bonzini
  2019-10-28 12:08 ` [PATCH v7 0/5] NX 0 Thomas Gleixner
  5 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-24  7:34 UTC (permalink / raw)
  To: speck

kvm: Add helper function for creating VM worker threads

This adds a function to create a kernel thread associated with a given
VM. In particular, it ensures that the worker thread inherits the
priority and cgroups of the calling thread.
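
A minimal sketch of a caller (patch 5/5 uses exactly this pattern for
its recovery thread; my_worker_fn and my_post_init are illustrative
placeholders matching the kvm_vm_thread_fn_t typedef added below):

    static int my_worker_fn(struct kvm *kvm, uintptr_t data)
    {
        /* Runs once unparked; the return value is the thread's exit code. */
        return 0;
    }

    static int my_post_init(struct kvm *kvm)
    {
        struct task_struct *thread;
        int err;

        err = kvm_vm_create_worker_thread(kvm, my_worker_fn, 0,
                                          "my-worker", &thread);
        if (!err)
            kthread_unpark(thread);  /* the thread is created parked */
        return err;
    }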

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h |  6 ++++
 virt/kvm/kvm_main.c      | 84 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a817e446c9aa..d0f3939f0f6c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1382,4 +1382,10 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
 
+typedef int (*kvm_vm_thread_fn_t)(struct kvm *kvm, uintptr_t data);
+
+int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
+				uintptr_t data, const char *name,
+				struct task_struct **thread_ptr);
+
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index b8534c6b8cf6..35c4d9aac184 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -50,6 +50,7 @@
 #include <linux/bsearch.h>
 #include <linux/io.h>
 #include <linux/lockdep.h>
+#include <linux/kthread.h>
 
 #include <asm/processor.h>
 #include <asm/ioctl.h>
@@ -4379,3 +4380,86 @@ void kvm_exit(void)
 	kvm_vfio_ops_exit();
 }
 EXPORT_SYMBOL_GPL(kvm_exit);
+
+struct kvm_vm_worker_thread_context {
+	struct kvm *kvm;
+	struct task_struct *parent;
+	struct completion init_done;
+	kvm_vm_thread_fn_t thread_fn;
+	uintptr_t data;
+	int err;
+};
+
+static int kvm_vm_worker_thread(void *context)
+{
+	/*
+	 * The init_context is allocated on the stack of the parent thread, so
+	 * we have to locally copy anything that is needed beyond initialization
+	 */
+	struct kvm_vm_worker_thread_context *init_context = context;
+	struct kvm *kvm = init_context->kvm;
+	kvm_vm_thread_fn_t thread_fn = init_context->thread_fn;
+	uintptr_t data = init_context->data;
+	int err;
+
+	err = kthread_park(current);
+	/* kthread_park(current) is never supposed to return an error */
+	WARN_ON(err != 0);
+	if (err)
+		goto init_complete;
+
+	err = cgroup_attach_task_all(init_context->parent, current);
+	if (err) {
+		kvm_err("%s: cgroup_attach_task_all failed with err %d\n",
+			__func__, err);
+		goto init_complete;
+	}
+
+	set_user_nice(current, task_nice(init_context->parent));
+
+init_complete:
+	init_context->err = err;
+	complete(&init_context->init_done);
+	init_context = NULL;
+
+	if (err)
+		return err;
+
+	/* Wait to be woken up by the spawner before proceeding. */
+	kthread_parkme();
+
+	if (!kthread_should_stop())
+		err = thread_fn(kvm, data);
+
+	return err;
+}
+
+int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
+				uintptr_t data, const char *name,
+				struct task_struct **thread_ptr)
+{
+	struct kvm_vm_worker_thread_context init_context = {};
+	struct task_struct *thread;
+
+	*thread_ptr = NULL;
+	init_context.kvm = kvm;
+	init_context.parent = current;
+	init_context.thread_fn = thread_fn;
+	init_context.data = data;
+	init_completion(&init_context.init_done);
+
+	thread = kthread_run(kvm_vm_worker_thread, &init_context,
+			     "%s-%d", name, task_pid_nr(current));
+	if (IS_ERR(thread))
+		return PTR_ERR(thread);
+
+	/* kthread_run is never supposed to return NULL */
+	WARN_ON(thread == NULL);
+
+	wait_for_completion(&init_context.init_done);
+
+	if (!init_context.err)
+		*thread_ptr = thread;
+
+	return init_context.err;
+}
-- 
1.8.3.1


* [MODERATED] [PATCH v7 5/5] NX 5
  2019-10-24  7:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
                   ` (3 preceding siblings ...)
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 4/5] NX 4 Paolo Bonzini
@ 2019-10-24  7:34 ` Paolo Bonzini
  2019-10-28 12:08 ` [PATCH v7 0/5] NX 0 Thomas Gleixner
  5 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-24  7:34 UTC (permalink / raw)
  To: speck


The page table pages corresponding to broken-down large pages are
zapped in FIFO order, so that a large page can potentially be
recovered if it is no longer being used for execution.  This removes
the performance penalty of walking the deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.
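
The "about one hour" figure follows from the recovery loop added
below: the worker wakes once per minute and zaps
1/nx_huge_pages_recovery_ratio of the split pages, so with the default
ratio of 60 a page survives roughly 60 one-minute periods.  The
per-wakeup quota is computed in kvm_recover_nx_lpages() as:

    unsigned int ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
    ulong to_zap = ratio ? DIV_ROUND_UP(kvm->stat.nx_lpage_splits, ratio) : 0;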

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Documentation/admin-guide/kernel-parameters.txt |   6 ++
 arch/x86/include/asm/kvm_host.h                 |   4 +
 arch/x86/kvm/mmu.c                              | 129 ++++++++++++++++++++++++
 arch/x86/kvm/mmu.h                              |   4 +
 arch/x86/kvm/x86.c                              |  11 ++
 virt/kvm/kvm_main.c                             |  30 +++++-
 6 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index d6449135c62d..c667844c1c42 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2066,6 +2066,12 @@
 			If the sw workaround is enabled for the host, guests
 			need not enable it for nested guests.
 
+	kvm.nx_huge_pages_recovery_ratio=
+			[KVM] Controls how many 4KiB pages are periodically zapped
+			back to huge pages.  0 disables the recovery, otherwise if
+			the value is N KVM will zap 1/Nth of the 4KiB pages every
+			minute.  The default is 60.
+
 	kvm-amd.nested=	[KVM,AMD] Allow nested virtualization in KVM/SVM.
 			Default is 1 (enabled)
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6aba95fdb00b..8f721bfdbb14 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -310,6 +310,8 @@ struct kvm_rmap_head {
 struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
+	struct list_head lpage_disallowed_link;
+
 	bool unsync;
 	u8 mmu_valid_gen;
 	bool mmio_cached;
@@ -859,6 +861,7 @@ struct kvm_arch {
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
+	struct list_head lpage_disallowed_mmu_pages;
 	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 
@@ -933,6 +936,7 @@ struct kvm_arch {
 	bool exception_payload_enabled;
 
 	struct kvm_pmu_event_filter *pmu_event_filter;
+	struct task_struct *nx_lpage_recovery_thread;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 837beefdf0a5..e6a5748a12d5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -37,6 +37,7 @@
 #include <linux/uaccess.h>
 #include <linux/hash.h>
 #include <linux/kern_levels.h>
+#include <linux/kthread.h>
 
 #include <asm/page.h>
 #include <asm/pat.h>
@@ -50,16 +51,26 @@
 extern bool itlb_multihit_kvm_mitigation;
 
 static int __read_mostly nx_huge_pages = -1;
+static uint __read_mostly nx_huge_pages_recovery_ratio = 60;
 
 static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
+static int set_nx_huge_pages_recovery_ratio(const char *val, const struct kernel_param *kp);
 
 static struct kernel_param_ops nx_huge_pages_ops = {
 	.set = set_nx_huge_pages,
 	.get = param_get_bool,
 };
 
+static struct kernel_param_ops nx_huge_pages_recovery_ratio_ops = {
+	.set = set_nx_huge_pages_recovery_ratio,
+	.get = param_get_uint,
+};
+
 module_param_cb(nx_huge_pages, &nx_huge_pages_ops, &nx_huge_pages, 0644);
 __MODULE_PARM_TYPE(nx_huge_pages, "bool");
+module_param_cb(nx_huge_pages_recovery_ratio, &nx_huge_pages_recovery_ratio_ops,
+		&nx_huge_pages_recovery_ratio, 0644);
+__MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
 
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -1215,6 +1226,8 @@ static void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return;
 
 	++kvm->stat.nx_lpage_splits;
+	list_add_tail(&sp->lpage_disallowed_link,
+		      &kvm->arch.lpage_disallowed_mmu_pages);
 	sp->lpage_disallowed = true;
 }
 
@@ -1239,6 +1252,7 @@ static void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	--kvm->stat.nx_lpage_splits;
 	sp->lpage_disallowed = false;
+	list_del(&sp->lpage_disallowed_link);
 }
 
 static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
@@ -6268,6 +6282,8 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 			idx = srcu_read_lock(&kvm->srcu);
 			kvm_mmu_zap_all_fast(kvm);
 			srcu_read_unlock(&kvm->srcu, idx);
+
+			wake_up_process(kvm->arch.nx_lpage_recovery_thread);
 		}
 		mutex_unlock(&kvm_lock);
 	}
@@ -6361,3 +6377,116 @@ void kvm_mmu_module_exit(void)
 	unregister_shrinker(&mmu_shrinker);
 	mmu_audit_disable();
 }
+
+static int set_nx_huge_pages_recovery_ratio(const char *val, const struct kernel_param *kp)
+{
+	unsigned int old_val;
+	int err;
+
+	old_val = nx_huge_pages_recovery_ratio;
+	err = param_set_uint(val, kp);
+	if (err)
+		return err;
+
+	if (READ_ONCE(nx_huge_pages) &&
+	    !old_val && nx_huge_pages_recovery_ratio) {
+		struct kvm *kvm;
+
+		mutex_lock(&kvm_lock);
+
+		list_for_each_entry(kvm, &vm_list, vm_list)
+			wake_up_process(kvm->arch.nx_lpage_recovery_thread);
+
+		mutex_unlock(&kvm_lock);
+	}
+
+	return err;
+}
+
+static void kvm_recover_nx_lpages(struct kvm *kvm)
+{
+	int rcu_idx;
+	struct kvm_mmu_page *sp;
+	unsigned int ratio;
+	LIST_HEAD(invalid_list);
+	ulong to_zap;
+
+	rcu_idx = srcu_read_lock(&kvm->srcu);
+	spin_lock(&kvm->mmu_lock);
+
+	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
+	to_zap = ratio ? DIV_ROUND_UP(kvm->stat.nx_lpage_splits, ratio) : 0;
+	while (to_zap && !list_empty(&kvm->arch.lpage_disallowed_mmu_pages)) {
+		/*
+		 * We use a separate list instead of just using active_mmu_pages
+		 * because the number of lpage_disallowed pages is expected to
+		 * be relatively small compared to the total.
+		 */
+		sp = list_first_entry(&kvm->arch.lpage_disallowed_mmu_pages,
+				      struct kvm_mmu_page,
+				      lpage_disallowed_link);
+		WARN_ON_ONCE(!sp->lpage_disallowed);
+		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+		WARN_ON_ONCE(sp->lpage_disallowed);
+
+		if (!--to_zap || need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+			kvm_mmu_commit_zap_page(kvm, &invalid_list);
+			if (to_zap)
+				cond_resched_lock(&kvm->mmu_lock);
+		}
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, rcu_idx);
+}
+
+static long get_nx_lpage_recovery_timeout(u64 start_time)
+{
+	return READ_ONCE(nx_huge_pages) && READ_ONCE(nx_huge_pages_recovery_ratio)
+		? start_time + 60 * HZ - get_jiffies_64()
+		: MAX_SCHEDULE_TIMEOUT;
+}
+
+static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t data)
+{
+	u64 start_time;
+	long remaining_time;
+
+	while (true) {
+		start_time = get_jiffies_64();
+		remaining_time = get_nx_lpage_recovery_timeout(start_time);
+
+		set_current_state(TASK_INTERRUPTIBLE);
+		while (!kthread_should_stop() && remaining_time > 0) {
+			schedule_timeout(remaining_time);
+			remaining_time = get_nx_lpage_recovery_timeout(start_time);
+			set_current_state(TASK_INTERRUPTIBLE);
+		}
+
+		set_current_state(TASK_RUNNING);
+
+		if (kthread_should_stop())
+			return 0;
+
+		kvm_recover_nx_lpages(kvm);
+	}
+}
+
+int kvm_mmu_post_init_vm(struct kvm *kvm)
+{
+	int err;
+
+	err = kvm_vm_create_worker_thread(kvm, kvm_nx_lpage_recovery_worker, 0,
+					  "kvm-nx-lpage-recovery",
+					  &kvm->arch.nx_lpage_recovery_thread);
+	if (!err)
+		kthread_unpark(kvm->arch.nx_lpage_recovery_thread);
+
+	return err;
+}
+
+void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
+{
+	if (kvm->arch.nx_lpage_recovery_thread)
+		kthread_stop(kvm->arch.nx_lpage_recovery_thread);
+}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 11f8ec89433b..d55674f44a18 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -210,4 +210,8 @@ static inline u8 permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn);
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
+
+int kvm_mmu_post_init_vm(struct kvm *kvm);
+void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f81f547d138c..723f2f99a92d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9459,6 +9459,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
+	INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
 	atomic_set(&kvm->arch.noncoherent_dma_count, 0);
 
@@ -9487,6 +9488,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return kvm_x86_ops->vm_init(kvm);
 }
 
+int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+	return kvm_mmu_post_init_vm(kvm);
+}
+
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
 	vcpu_load(vcpu);
@@ -9588,6 +9594,11 @@ int x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
 }
 EXPORT_SYMBOL_GPL(x86_set_memory_region);
 
+void kvm_arch_pre_destroy_vm(struct kvm *kvm)
+{
+	kvm_mmu_pre_destroy_vm(kvm);
+}
+
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	if (current->mm == kvm->mm) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 35c4d9aac184..a7ff1e8b2f72 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -626,6 +626,23 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 	return 0;
 }
 
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int __weak kvm_arch_post_init_vm(struct kvm *kvm)
+{
+	return 0;
+}
+
+/*
+ * Called just after removing the VM from the vm_list, but before doing any
+ * other destruction.
+ */
+void __weak kvm_arch_pre_destroy_vm(struct kvm *kvm)
+{
+}
+
 static struct kvm *kvm_create_vm(unsigned long type)
 {
 	int r, i;
@@ -676,11 +693,15 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		rcu_assign_pointer(kvm->buses[i],
 			kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL_ACCOUNT));
 		if (!kvm->buses[i])
-			goto out_err;
+			goto out_err_no_mmu_notifier;
 	}
 
 	r = kvm_init_mmu_notifier(kvm);
 	if (r)
+		goto out_err_no_mmu_notifier;
+
+	r = kvm_arch_post_init_vm(kvm);
+	if (r)
 		goto out_err;
 
 	mutex_lock(&kvm_lock);
@@ -692,6 +713,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	return kvm;
 
 out_err:
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
+	if (kvm->mmu_notifier.ops)
+		mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
+#endif
+out_err_no_mmu_notifier:
 	cleanup_srcu_struct(&kvm->irq_srcu);
 out_err_no_irq_srcu:
 	cleanup_srcu_struct(&kvm->srcu);
@@ -734,6 +760,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	mutex_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
 	mutex_unlock(&kvm_lock);
+	kvm_arch_pre_destroy_vm(kvm);
+
 	kvm_free_irq_routing(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++) {
 		struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
-- 
1.8.3.1


* [MODERATED] Re: [PATCH v7 1/5] NX 1
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
@ 2019-10-24  8:57   ` Paolo Bonzini
  2019-10-24 14:21     ` Tyler Hicks
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-24  8:57 UTC (permalink / raw)
  To: speck


Uhm, this time I'm sure that I included the From/Subject lines, both in 
this series and the QEMU patch.

$ gpg --decrypt 0001-target-i386-add-PSCHANGE_MC_NO-bit-for-MSR_ARCH_CAPA.patch 
gpg: encrypted with 4096-bit RSA key, ID 0x5669D7C4E1794FF3, created 2018-04-11
      "speck@linutronix.de <speck@linutronix.de>"
gpg: encrypted with 2048-bit RSA key, ID 0x873AB00F2B8E02C3, created 2014-10-18
      "Paolo Bonzini <bonzini@gnu.org>"
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH] target/i386: add PSCHANGE_MC_NO bit for MSR_ARCH_CAPABILITIES

This is required to disable ITLB multihit mitigations in nested
hypervisors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

etc...

Paolo


On 24/10/19 09:34, speck for Paolo Bonzini wrote:
> [ full text of PATCH v7 1/5 quoted verbatim; trimmed, see above ]
> +{
> +	return sprintf(buf, "Not affected\n");
> +}
> +
>  static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
>  static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
>  static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
>  static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
>  static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
>  static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
> +static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
>  
>  static struct attribute *cpu_root_vulnerabilities_attrs[] = {
>  	&dev_attr_meltdown.attr,
> @@ -568,6 +575,7 @@ ssize_t __weak cpu_show_mds(struct device *dev,
>  	&dev_attr_spec_store_bypass.attr,
>  	&dev_attr_l1tf.attr,
>  	&dev_attr_mds.attr,
> +	&dev_attr_itlb_multihit.attr,
>  	NULL
>  };
>  
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index d0633ebdaa9c..038866a28f2c 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -59,6 +59,8 @@ extern ssize_t cpu_show_l1tf(struct device *dev,
>  			     struct device_attribute *attr, char *buf);
>  extern ssize_t cpu_show_mds(struct device *dev,
>  			    struct device_attribute *attr, char *buf);
> +extern ssize_t cpu_show_itlb_multihit(struct device *dev,
> +				      struct device_attribute *attr, char *buf);
>  
>  extern __printf(4, 5)
>  struct device *cpu_device_create(struct device *parent, void *drvdata,
> 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: [PATCH v7 1/5] NX 1
  2019-10-24  8:57   ` [MODERATED] " Paolo Bonzini
@ 2019-10-24 14:21     ` Tyler Hicks
  2019-10-24 14:25       ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Tyler Hicks @ 2019-10-24 14:21 UTC (permalink / raw)
  To: speck

On 2019-10-24 10:57:57, speck for Paolo Bonzini wrote:
> Uhm, this time I'm sure that I included the From/Subject lines, both in 
> this series and the QEMU patch.

In another thread, Pawan and Thomas identified an additional newline
that needed to be emitted by speckify-gitmail. While digging up the
details to pass on to you, I noticed that Thomas just uploaded
speck-utils-v2 that contains the change:

 https://tglx.de/~tglx/.utils/

Sorry for stealing your thunder on the v2 announcement, Thomas. :)

Tyler

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: [PATCH v7 1/5] NX 1
  2019-10-24 14:21     ` Tyler Hicks
@ 2019-10-24 14:25       ` Paolo Bonzini
  2019-10-24 15:01         ` Tyler Hicks
  0 siblings, 1 reply; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-24 14:25 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 811 bytes --]

On 24/10/19 16:21, speck for Tyler Hicks wrote:
> On 2019-10-24 10:57:57, speck for Paolo Bonzini wrote:
>> Uhm, this time I'm sure that I included the From/Subject lines, both in 
>> this series and the QEMU patch.
> 
> In another thread, Pawan and Thomas identified an additional newline
> that needed to be emitted by speckify-gitmail. While digging up the
> details to pass on to you, I noticed that Thomas just uploaded
> speck-utils-v2 that contains the change:
> 
>  https://tglx.de/~tglx/.utils/
> 
> Sorry for stealing your thunder on the v2 announcement, Thomas. :)

I see (I had written my own script though before speck-utils was a
thing, or at least before I learnt about it, so I'll have to dig out
that thread).

I think I'll just send a git bundle later today.

Paolo



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: [PATCH v7 1/5] NX 1
  2019-10-24 14:25       ` Paolo Bonzini
@ 2019-10-24 15:01         ` Tyler Hicks
  0 siblings, 0 replies; 21+ messages in thread
From: Tyler Hicks @ 2019-10-24 15:01 UTC (permalink / raw)
  To: speck

On 2019-10-24 16:25:00, speck for Paolo Bonzini wrote:
> On 24/10/19 16:21, speck for Tyler Hicks wrote:
> > On 2019-10-24 10:57:57, speck for Paolo Bonzini wrote:
> >> Uhm, this time I'm sure that I included the From/Subject lines, both in 
> >> this series and the QEMU patch.
> > 
> > In another thread, Pawan and Thomas identified an additional newline
> > that needed to be emitted by speckify-gitmail. While digging up the
> > details to pass on to you, I noticed that Thomas just uploaded
> > speck-utils-v2 that contains the change:
> > 
> >  https://tglx.de/~tglx/.utils/
> > 
> > Sorry for stealing your thunder on the v2 announcement, Thomas. :)
> 
> I see (I had written my own script though before speck-utils was a
> thing, or at least before I learnt about it, so I'll have to dig out
> that thread).

Message ID <20191021203528.GD23497@guptapadev.amr> is what you'll need
to search for.

Tyler

> 
> I think I'll just send a git bundle later today.
> 
> Paolo
> 
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v7 0/5] NX 0
  2019-10-24  7:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
                   ` (4 preceding siblings ...)
  2019-10-24  7:34 ` [MODERATED] [PATCH v7 5/5] NX 5 Paolo Bonzini
@ 2019-10-28 12:08 ` Thomas Gleixner
  2019-10-28 16:02   ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
  5 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-28 12:08 UTC (permalink / raw)
  To: speck

On Thu, 24 Oct 2019, speck for Paolo Bonzini wrote:

> Junaid Shahid (2):
>   kvm: Add helper function for creating VM worker threads
>   kvm: x86: mmu: Recovery of shattered NX large pages
> 
> Paolo Bonzini (1):
>   kvm: mmu: ITLB_MULTIHIT mitigation
> 
> Pawan Gupta (2):
>   x86: Add ITLB_MULTIHIT bug infrastructure
>   x86/cpu: Add Tremont to the cpu vulnerability whitelist

I assume you'll send another version of these. The series massively conflicts
with the merged TAA pile, so can you please rebase on top of that?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: [PATCH v7 0/5] NX 0
  2019-10-28 12:08 ` [PATCH v7 0/5] NX 0 Thomas Gleixner
@ 2019-10-28 16:02   ` Joerg Roedel
  2019-10-29  8:51     ` Thomas Gleixner
  2019-10-31 21:00     ` [MODERATED] Re: ***UNCHECKED*** " Tyler Hicks
  0 siblings, 2 replies; 21+ messages in thread
From: Joerg Roedel @ 2019-10-28 16:02 UTC (permalink / raw)
  To: speck

Hi Thomas,

On Mon, Oct 28, 2019 at 01:08:33PM +0100, speck for Thomas Gleixner wrote:
> I assume you'll send another version of these. The series massively conflicts
> with the merged TAA pile, so can you please rebase on top of that?

The conflicts are only in the first patch; I resolved them and here is
the result:

From 386176962dcbb8827804627dcc762f373a5c0269 Mon Sep 17 00:00:00 2001
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date: Mon, 28 Oct 2019 09:55:28 +0100
Subject: [PATCH 1/5] x86: Add ITLB_MULTIHIT bug infrastructure

Some processors may incur a machine check error possibly
resulting in an unrecoverable cpu hang when an instruction fetch
hits multiple entries in the instruction TLB. This can occur
when the page size is changed along with either the physical
address or cache type [1].

This issue affects both bare-metal x86 page tables and EPT.

This can be mitigated by either eliminating the use of large
pages or by using careful TLB invalidations when changing the
page size in the page tables.

As with Spectre, Meltdown, L1TF and MDS, a new bit has been
allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will
be set on CPUs which are mitigated against this issue.

[1] For example please refer to erratum SKL002 in "6th Generation
Intel Processor Family Specification Update"
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
https://www.google.com/search?q=site:intel.com+SKL002

There are a lot of other affected processors outside of Skylake, and
the erratum (referred to above) does not fully disclose the issue
and the impact, both on Skylake and across all the affected CPUs.

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
---
 Documentation/ABI/testing/sysfs-devices-system-cpu |  1 +
 arch/x86/include/asm/cpufeatures.h                 |  1 +
 arch/x86/include/asm/msr-index.h                   |  7 +++
 arch/x86/kernel/cpu/bugs.c                         | 13 +++++
 arch/x86/kernel/cpu/common.c                       | 65 ++++++++++++----------
 drivers/base/cpu.c                                 |  2 +
 include/linux/cpu.h                                |  2 +
 7 files changed, 61 insertions(+), 30 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 0e77569bd5e0..fc20cde63d1e 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -487,6 +487,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
 		/sys/devices/system/cpu/vulnerabilities/l1tf
 		/sys/devices/system/cpu/vulnerabilities/mds
 		/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
+		/sys/devices/system/cpu/vulnerabilities/itlb_multihit
 Date:		January 2018
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Information about CPU vulnerabilities
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 989e03544f18..c4fbe379cc0b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -400,5 +400,6 @@
 #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the  MSDBS variant of BUG_MDS */
 #define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
 #define X86_BUG_TAA			X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */
+#define X86_BUG_ITLB_MULTIHIT		X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b3a8bb2af0b6..6a3124664289 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -93,6 +93,13 @@
 						  * Microarchitectural Data
 						  * Sampling (MDS) vulnerabilities.
 						  */
+#define ARCH_CAP_PSCHANGE_MC_NO		BIT(6)	 /*
+						  * The processor is not susceptible to a
+						  * machine check error due to modifying the
+						  * code page size along with either the
+						  * physical address or cache type
+						  * without TLB invalidation.
+						  */
 #define ARCH_CAP_TSX_CTRL_MSR		BIT(7)	/* MSR for TSX control is available. */
 #define ARCH_CAP_TAA_NO			BIT(8)	/*
 						 * Not susceptible to
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 43c647e19439..5364beda8c61 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1419,6 +1419,11 @@ static ssize_t l1tf_show_state(char *buf)
 }
 #endif
 
+static ssize_t itlb_multihit_show_state(char *buf)
+{
+	return sprintf(buf, "Processor vulnerable\n");
+}
+
 static ssize_t mds_show_state(char *buf)
 {
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
@@ -1524,6 +1529,9 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 	case X86_BUG_TAA:
 		return tsx_async_abort_show_state(buf);
 
+	case X86_BUG_ITLB_MULTIHIT:
+		return itlb_multihit_show_state(buf);
+
 	default:
 		break;
 	}
@@ -1565,4 +1573,9 @@ ssize_t cpu_show_tsx_async_abort(struct device *dev, struct device_attribute *at
 {
 	return cpu_show_common(dev, attr, buf, X86_BUG_TAA);
 }
+
+ssize_t cpu_show_itlb_multihit(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return cpu_show_common(dev, attr, buf, X86_BUG_ITLB_MULTIHIT);
+}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f8b8afc8f5b5..d29b71ca3ca7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1016,13 +1016,14 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 #endif
 }
 
-#define NO_SPECULATION	BIT(0)
-#define NO_MELTDOWN	BIT(1)
-#define NO_SSB		BIT(2)
-#define NO_L1TF		BIT(3)
-#define NO_MDS		BIT(4)
-#define MSBDS_ONLY	BIT(5)
-#define NO_SWAPGS	BIT(6)
+#define NO_SPECULATION		BIT(0)
+#define NO_MELTDOWN		BIT(1)
+#define NO_SSB			BIT(2)
+#define NO_L1TF			BIT(3)
+#define NO_MDS			BIT(4)
+#define MSBDS_ONLY		BIT(5)
+#define NO_SWAPGS		BIT(6)
+#define NO_ITLB_MULTIHIT	BIT(7)
 
 #define VULNWL(_vendor, _family, _model, _whitelist)	\
 	{ X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist }
@@ -1043,27 +1044,27 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
 	VULNWL(NSC,	5, X86_MODEL_ANY,	NO_SPECULATION),
 
 	/* Intel Family 6 */
-	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION),
-	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION),
-
-	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+
+	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	VULNWL_INTEL(CORE_YONAH,		NO_SSB),
 
-	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
-	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	/*
 	 * Technically, swapgs isn't serializing on AMD (despite it previously
@@ -1074,14 +1075,14 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
 	 */
 
 	/* AMD Family 0xf - 0x12 */
-	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
+	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
-	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
+	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 	{}
 };
 
@@ -1106,6 +1107,10 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_cap = x86_read_arch_cap_msr();
 
+	/* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not mitigated */
+	if (!cpu_matches(NO_ITLB_MULTIHIT) && !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
+		setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);
+
 	if (cpu_matches(NO_SPECULATION))
 		return;
 
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 0fccd8c0312e..86f531fea9d0 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -568,6 +568,7 @@ static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
 static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
 static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
 static DEVICE_ATTR(tsx_async_abort, 0444, cpu_show_tsx_async_abort, NULL);
+static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
@@ -577,6 +578,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_l1tf.attr,
 	&dev_attr_mds.attr,
 	&dev_attr_tsx_async_abort.attr,
+	&dev_attr_itlb_multihit.attr,
 	NULL
 };
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index f35369f79771..2a093434e975 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -62,6 +62,8 @@ extern ssize_t cpu_show_mds(struct device *dev,
 extern ssize_t cpu_show_tsx_async_abort(struct device *dev,
 					struct device_attribute *attr,
 					char *buf);
+extern ssize_t cpu_show_itlb_multihit(struct device *dev,
+				      struct device_attribute *attr, char *buf);
 
 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,
-- 
2.16.4
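
For quick testing of the resulting kernel, the new sysfs file is the
easiest thing to poke at. A minimal sketch of such a check (illustrative
only; the path comes from the patch above, the rest is plain C
boilerplate):

/* itlb_check.c - read the vulnerability file added by this patch.
 * Illustrative sketch; on kernels without the patch the file is
 * simply absent. */
#include <stdio.h>

int main(void)
{
	const char *path =
		"/sys/devices/system/cpu/vulnerabilities/itlb_multihit";
	char line[128];
	FILE *f = fopen(path, "r");

	if (!f) {
		perror(path);	/* expected on unpatched kernels */
		return 1;
	}
	if (fgets(line, sizeof(line), f))
		fputs(line, stdout);	/* "Not affected" or "Processor vulnerable" */
	fclose(f);
	return 0;
}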

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: ***UNCHECKED*** Re: [PATCH v7 0/5] NX 0
  2019-10-28 16:02   ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
@ 2019-10-29  8:51     ` Thomas Gleixner
  2019-10-29  9:39       ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
  2019-10-31 21:00     ` [MODERATED] Re: ***UNCHECKED*** " Tyler Hicks
  1 sibling, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-29  8:51 UTC (permalink / raw)
  To: speck

On Mon, 28 Oct 2019, speck for Joerg Roedel wrote:

> Hi Thomas,
> 
> On Mon, Oct 28, 2019 at 01:08:33PM +0100, speck for Thomas Gleixner wrote:
> > I assume you'll send another version of these. The series massively conflicts
> > with the merged TAA pile, so can you please rebase on top of that?
> 
> The conflicts are only in the first patch; I resolved them and here is
> the result:

Yes, but what's the final resolution of the 3/5 patch?

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: Re: [PATCH v7 0/5] NX 0
  2019-10-29  8:51     ` Thomas Gleixner
@ 2019-10-29  9:39       ` Joerg Roedel
  2019-10-29 11:00         ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Joerg Roedel @ 2019-10-29  9:39 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 449 bytes --]

Hey Thomas,

On Tue, Oct 29, 2019 at 09:51:56AM +0100, speck for Thomas Gleixner wrote:
> Yes, but what's the final resolution of the 3/5 patch?

Hmm, I didn't get a conflict when applying patch 3/5, was that on
master?

Here is the git-bundle of what I ended up with when applying the IFU
patches on-top of the TAA patches (from the taa-master bundle). The
patches have my Signed-off-by, feel free to remove that before applying.

Regards,

	Joerg

[-- Attachment #2: taa_plus_ifu --]
[-- Type: application/octet-stream, Size: 40285 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: ***UNCHECKED*** Re: Re: [PATCH v7 0/5] NX 0
  2019-10-29  9:39       ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
@ 2019-10-29 11:00         ` Thomas Gleixner
  2019-10-29 12:31           ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-29 11:00 UTC (permalink / raw)
  To: speck

On Tue, 29 Oct 2019, speck for Joerg Roedel wrote:

> Hey Thomas,
> 
> On Tue, Oct 29, 2019 at 09:51:56AM +0100, speck for Thomas Gleixner wrote:
> > Yes, but what's the final resolution of the 3/5 patch?
> 
> Hmm, I didn't get a conflict when applying patch 3/5, was that on
> master?

Not a conflict, but 3/5 needs an extra delta patch on top AFAICT. Paolo said he
would send a v8. I'll have a look later today.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: Re: Re: [PATCH v7 0/5] NX 0
  2019-10-29 11:00         ` Thomas Gleixner
@ 2019-10-29 12:31           ` Joerg Roedel
  2019-10-29 15:35             ` Thomas Gleixner
  0 siblings, 1 reply; 21+ messages in thread
From: Joerg Roedel @ 2019-10-29 12:31 UTC (permalink / raw)
  To: speck

On Tue, Oct 29, 2019 at 12:00:10PM +0100, speck for Thomas Gleixner wrote:
> Not a conflict, but 3/5 needs an extra delta patch on top AFAICT. Paolo said he
> would send a v8. I'll have a look later today.

Ah okay, Paolo wanted to include the fix for the !npt/!ept cases (where
efer.nx was clear for the guest at early boot so that the nx-pages
caused rsvd-bit faults).

But he already sent the fix separately to the public, here:

	https://lore.kernel.org/lkml/20191027152323.24326-1-pbonzini@redhat.com/

I guess he plans to send it upstream before the CRD date, Paolo?
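
(For reference, the failure mode is simple to see on paper: with shadow
paging and EFER.NX clear, bit 63 of a PTE is reserved, so the NX bit
that the mitigation sets on the shattered huge pages is itself a
reserved bit. A self-contained sketch of that interaction, with made-up
names, not the actual KVM code:)

/* rsvd_sketch.c - why NX-marked pages fault as reserved-bit
 * violations when EFER.NX is 0.  Hypothetical names; not KVM code. */
#include <stdio.h>
#include <stdint.h>

#define PTE_NX	(1ULL << 63)

/* With EFER.NX=0, bit 63 of a PTE is reserved, so a walk must fault. */
static int pte_rsvd_fault(uint64_t pte, int efer_nx)
{
	uint64_t rsvd = efer_nx ? 0 : PTE_NX;

	return (pte & rsvd) != 0;
}

int main(void)
{
	uint64_t pte = PTE_NX | 0x1000;	/* NX set by the mitigation */

	printf("EFER.NX=1 -> rsvd fault: %d\n", pte_rsvd_fault(pte, 1));
	printf("EFER.NX=0 -> rsvd fault: %d\n", pte_rsvd_fault(pte, 0));
	return 0;
}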


Regards,

	Joerg

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: ***UNCHECKED*** Re: Re: Re: [PATCH v7 0/5] NX 0
  2019-10-29 12:31           ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
@ 2019-10-29 15:35             ` Thomas Gleixner
  2019-10-29 15:42               ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
  0 siblings, 1 reply; 21+ messages in thread
From: Thomas Gleixner @ 2019-10-29 15:35 UTC (permalink / raw)
  To: speck

On Tue, 29 Oct 2019, speck for Joerg Roedel wrote:
> On Tue, Oct 29, 2019 at 12:00:10PM +0100, speck for Thomas Gleixner wrote:
> > Not a conflict, but 3/5 needs an extra delta patch on top AFAICT. Paolo said he
> > would send a v8. I'll have a look later today.
> 
> Ah okay, Paolo wanted to include the fix for the !npt/!ept cases (where
> efer.nx was clear for the guest at early boot so that the nx-pages
> caused rsvd-bit faults).
> 
> But he already sent the fix separately to the public, here:
> 
> 	https://lore.kernel.org/lkml/20191027152323.24326-1-pbonzini@redhat.com/
> 
> I guess he plans to send it upstream before the CRD date, Paolo?

So the fix is independent of the NX pile, right? But OTOH, for testing the
NX stuff with all the various knobs and options, the patch is needed, I
assume.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: Re: Re: Re: [PATCH v7 0/5] NX 0
  2019-10-29 15:35             ` Thomas Gleixner
@ 2019-10-29 15:42               ` Joerg Roedel
  2019-10-29 16:50                 ` Paolo Bonzini
  0 siblings, 1 reply; 21+ messages in thread
From: Joerg Roedel @ 2019-10-29 15:42 UTC (permalink / raw)
  To: speck

On Tue, Oct 29, 2019 at 04:35:56PM +0100, speck for Thomas Gleixner wrote:
> So the fix is independent of the NX pile, right? But OTOH, for testing the
> NX stuff with all the various knobs and options, the patch is needed, I
> assume.

The fix is independent in the sense that its mere existence doesn't reveal
anything under the embargo, but the issue that is fixed by it only
happens with the NX pile applied. So yes, it is needed to test the NX
patches with EPT disabled (or NPT disabled on AMD).


	Joerg

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: Re: Re: Re: [PATCH v7 0/5] NX 0
  2019-10-29 15:42               ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
@ 2019-10-29 16:50                 ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-29 16:50 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 914 bytes --]

On 29/10/19 16:42, speck for Joerg Roedel wrote:
> On Tue, Oct 29, 2019 at 04:35:56PM +0100, speck for Thomas Gleixner wrote:
>> So the fix is independent of the NX pile, right? But OTOH, for testing the
>> NX stuff with all the various knobs and options the patch is needed I
>> assume.
> The fix is independent in the sense that its mere existence doesn't reveal
> anything under the embargo, but the issue that is fixed by it only
> happens with the NX pile applied.

On Intel it requires the NX pile to show up; on AMD it is a preexisting
bug that shows up with kvm-unit-tests; this is why I sent it out early.

Sorry about going incommunicado; I was travelling without my decryption
key.  I'll send it to Linus sometime this week so that it also has time
to propagate to the stable series.

Paolo

> So yes, it is needed to test the NX
> patches with EPT disabled (or NPT disabled on AMD).



^ permalink raw reply	[flat|nested] 21+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: [PATCH v7 0/5] NX 0
  2019-10-28 16:02   ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
  2019-10-29  8:51     ` Thomas Gleixner
@ 2019-10-31 21:00     ` Tyler Hicks
  2019-10-31 22:13       ` Paolo Bonzini
  1 sibling, 1 reply; 21+ messages in thread
From: Tyler Hicks @ 2019-10-31 21:00 UTC (permalink / raw)
  To: speck

On 2019-10-28 17:02:03, speck for Joerg Roedel wrote:
> Hi Thomas,
> 
> On Mon, Oct 28, 2019 at 01:08:33PM +0100, speck for Thomas Gleixner wrote:
> > I assume you'll send another version of these. The series massively conflicts
> > with the merged TAA pile, so can you please rebase on top of that?
> 
> The conflicts are only in the first patch; I resolved them and here is
> the result:
> 
> From 386176962dcbb8827804627dcc762f373a5c0269 Mon Sep 17 00:00:00 2001
> From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Date: Mon, 28 Oct 2019 09:55:28 +0100
> Subject: [PATCH 1/5] x86: Add ITLB_MULTIHIT bug infrastructure
> 
> Some processors may incur a machine check error possibly
> resulting in an unrecoverable cpu hang when an instruction fetch
> hits multiple entries in the instruction TLB. This can occur
> when the page size is changed along with either the physical
> address or cache type [1].
> 
> This issue affects both bare-metal x86 page tables and EPT.
> 
> This can be mitigated by either eliminating the use of large
> pages or by using careful TLB invalidations when changing the
> page size in the page tables.
> 
> As with Spectre, Meltdown, L1TF and MDS, a new bit has been
> allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will
> be set on CPUs which are mitigated against this issue.
> 
> [1] For example please refer to erratum SKL002 in "6th Generation
> Intel Processor Family Specification Update"
> https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
> https://www.google.com/search?q=site:intel.com+SKL002
> 
> There are a lot of other affected processors outside of Skylake, and
> the erratum (referred to above) does not fully disclose the issue
> and the impact, both on Skylake and across all the affected CPUs.
> 
> Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
> Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
>  Documentation/ABI/testing/sysfs-devices-system-cpu |  1 +
>  arch/x86/include/asm/cpufeatures.h                 |  1 +
>  arch/x86/include/asm/msr-index.h                   |  7 +++
>  arch/x86/kernel/cpu/bugs.c                         | 13 +++++
>  arch/x86/kernel/cpu/common.c                       | 65 ++++++++++++----------
>  drivers/base/cpu.c                                 |  2 +
>  include/linux/cpu.h                                |  2 +
>  7 files changed, 61 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index 0e77569bd5e0..fc20cde63d1e 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -487,6 +487,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
>  		/sys/devices/system/cpu/vulnerabilities/l1tf
>  		/sys/devices/system/cpu/vulnerabilities/mds
>  		/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
> +		/sys/devices/system/cpu/vulnerabilities/itlb_multihit
>  Date:		January 2018
>  Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
>  Description:	Information about CPU vulnerabilities
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 989e03544f18..c4fbe379cc0b 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -400,5 +400,6 @@
>  #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the  MSDBS variant of BUG_MDS */
>  #define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
>  #define X86_BUG_TAA			X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */
> +#define X86_BUG_ITLB_MULTIHIT		X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
>  
>  #endif /* _ASM_X86_CPUFEATURES_H */
> diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
> index b3a8bb2af0b6..6a3124664289 100644
> --- a/arch/x86/include/asm/msr-index.h
> +++ b/arch/x86/include/asm/msr-index.h
> @@ -93,6 +93,13 @@
>  						  * Microarchitectural Data
>  						  * Sampling (MDS) vulnerabilities.
>  						  */
> +#define ARCH_CAP_PSCHANGE_MC_NO		BIT(6)	 /*
> +						  * The processor is not susceptible to a
> +						  * machine check error due to modifying the
> +						  * code page size along with either the
> +						  * physical address or cache type
> +						  * without TLB invalidation.
> +						  */
>  #define ARCH_CAP_TSX_CTRL_MSR		BIT(7)	/* MSR for TSX control is available. */
>  #define ARCH_CAP_TAA_NO			BIT(8)	/*
>  						 * Not susceptible to
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 43c647e19439..5364beda8c61 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1419,6 +1419,11 @@ static ssize_t l1tf_show_state(char *buf)
>  }
>  #endif
>  
> +static ssize_t itlb_multihit_show_state(char *buf)
> +{
> +	return sprintf(buf, "Processor vulnerable\n");
> +}
> +
>  static ssize_t mds_show_state(char *buf)
>  {
>  	if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
> @@ -1524,6 +1529,9 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
>  	case X86_BUG_TAA:
>  		return tsx_async_abort_show_state(buf);
>  
> +	case X86_BUG_ITLB_MULTIHIT:
> +		return itlb_multihit_show_state(buf);
> +
>  	default:
>  		break;
>  	}
> @@ -1565,4 +1573,9 @@ ssize_t cpu_show_tsx_async_abort(struct device *dev, struct device_attribute *at
>  {
>  	return cpu_show_common(dev, attr, buf, X86_BUG_TAA);
>  }
> +
> +ssize_t cpu_show_itlb_multihit(struct device *dev, struct device_attribute *attr, char *buf)
> +{
> +	return cpu_show_common(dev, attr, buf, X86_BUG_ITLB_MULTIHIT);
> +}
>  #endif
> diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
> index f8b8afc8f5b5..d29b71ca3ca7 100644
> --- a/arch/x86/kernel/cpu/common.c
> +++ b/arch/x86/kernel/cpu/common.c
> @@ -1016,13 +1016,14 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
>  #endif
>  }
>  
> -#define NO_SPECULATION	BIT(0)
> -#define NO_MELTDOWN	BIT(1)
> -#define NO_SSB		BIT(2)
> -#define NO_L1TF		BIT(3)
> -#define NO_MDS		BIT(4)
> -#define MSBDS_ONLY	BIT(5)
> -#define NO_SWAPGS	BIT(6)
> +#define NO_SPECULATION		BIT(0)
> +#define NO_MELTDOWN		BIT(1)
> +#define NO_SSB			BIT(2)
> +#define NO_L1TF			BIT(3)
> +#define NO_MDS			BIT(4)
> +#define MSBDS_ONLY		BIT(5)
> +#define NO_SWAPGS		BIT(6)
> +#define NO_ITLB_MULTIHIT	BIT(7)
>  
>  #define VULNWL(_vendor, _family, _model, _whitelist)	\
>  	{ X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist }
> @@ -1043,27 +1044,27 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
>  	VULNWL(NSC,	5, X86_MODEL_ANY,	NO_SPECULATION),
>  
>  	/* Intel Family 6 */
> -	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION),
> -	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION),
> -	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION),
> -	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION),
> -	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION),
> -
> -	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
> -	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
> -	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
> -	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
> -	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
> -	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
> +	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
> +
> +	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
>  
>  	VULNWL_INTEL(CORE_YONAH,		NO_SSB),
>  
> -	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
> -	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS),
> +	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
>  
> -	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS),
> -	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS),
> -	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS),
> +	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
>  
>  	/*
>  	 * Technically, swapgs isn't serializing on AMD (despite it previously
> @@ -1074,14 +1075,14 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
>  	 */
>  
>  	/* AMD Family 0xf - 0x12 */
> -	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
> -	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
> -	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
> -	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
> +	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
>  
>  	/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
> -	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
> -	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
> +	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
> +	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
>  	{}
>  };
>  
> @@ -1106,6 +1107,10 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
>  {
>  	u64 ia32_cap = x86_read_arch_cap_msr();
>  
> +	/* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not mitigated */
> +	if (!cpu_matches(NO_ITLB_MULTIHIT) && !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
> +		setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);
> +
>  	if (cpu_matches(NO_SPECULATION))
>  		return;
>  
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 0fccd8c0312e..86f531fea9d0 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c

I hate to make this thread more of a mess than it already is, but I
noticed that you dropped a hunk that shouldn't have been dropped. The
following needs to be applied on top of this patch:

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 86f531fea9d0..7fbc022a235b 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -561,6 +561,12 @@ ssize_t __weak cpu_show_tsx_async_abort(struct device *dev,
 	return sprintf(buf, "Not affected\n");
 }
 
+ssize_t __weak cpu_show_itlb_multihit(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
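
(Aside: the __weak default is what keeps architectures without an
itlb_multihit implementation linking, since the generic DEVICE_ATTR
references the symbol. A standalone illustration of the pattern, with
a made-up symbol name, not kernel code:)

/* weak_demo.c - the __weak fallback pattern (GCC/Clang).  Without a
 * strong definition of show_vuln() elsewhere, the weak one below is
 * linked in and used. */
#include <stdio.h>

__attribute__((weak)) const char *show_vuln(void)
{
	return "Not affected\n";	/* generic fallback */
}

int main(void)
{
	fputs(show_vuln(), stdout);
	return 0;
}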

> @@ -568,6 +568,7 @@ static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
>  static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
>  static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
>  static DEVICE_ATTR(tsx_async_abort, 0444, cpu_show_tsx_async_abort, NULL);
> +static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
>  
>  static struct attribute *cpu_root_vulnerabilities_attrs[] = {
>  	&dev_attr_meltdown.attr,
> @@ -577,6 +578,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
>  	&dev_attr_l1tf.attr,
>  	&dev_attr_mds.attr,
>  	&dev_attr_tsx_async_abort.attr,
> +	&dev_attr_itlb_multihit.attr,
>  	NULL
>  };
>  
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index f35369f79771..2a093434e975 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -62,6 +62,8 @@ extern ssize_t cpu_show_mds(struct device *dev,
>  extern ssize_t cpu_show_tsx_async_abort(struct device *dev,
>  					struct device_attribute *attr,
>  					char *buf);
> +extern ssize_t cpu_show_itlb_multihit(struct device *dev,
> +				      struct device_attribute *attr, char *buf);
>  
>  extern __printf(4, 5)
>  struct device *cpu_device_create(struct device *parent, void *drvdata,
> -- 
> 2.16.4
> 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [MODERATED] Re: ***UNCHECKED*** Re: [PATCH v7 0/5] NX 0
  2019-10-31 21:00     ` [MODERATED] Re: ***UNCHECKED*** " Tyler Hicks
@ 2019-10-31 22:13       ` Paolo Bonzini
  0 siblings, 0 replies; 21+ messages in thread
From: Paolo Bonzini @ 2019-10-31 22:13 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 2520 bytes --]

On 31/10/19 22:00, speck for Tyler Hicks wrote:

>> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
>> index 0fccd8c0312e..86f531fea9d0 100644
>> --- a/drivers/base/cpu.c
>> +++ b/drivers/base/cpu.c
> 
> I hate to make this thread more of a mess than it already is, but I
> noticed that you dropped a hunk that shouldn't have been dropped. The
> following needs to be applied on top of this patch:

Thank you, Tyler.  Let me just pick Thomas's bundles and send the final
version.

Paolo

> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 86f531fea9d0..7fbc022a235b 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -561,6 +561,12 @@ ssize_t __weak cpu_show_tsx_async_abort(struct device *dev,
>  	return sprintf(buf, "Not affected\n");
>  }
>  
> +ssize_t __weak cpu_show_itlb_multihit(struct device *dev,
> +				      struct device_attribute *attr, char *buf)
> +{
> +	return sprintf(buf, "Not affected\n");
> +}
> +
>  static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
>  static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
>  static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
> 
>> @@ -568,6 +568,7 @@ static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
>>  static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
>>  static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
>>  static DEVICE_ATTR(tsx_async_abort, 0444, cpu_show_tsx_async_abort, NULL);
>> +static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
>>  
>>  static struct attribute *cpu_root_vulnerabilities_attrs[] = {
>>  	&dev_attr_meltdown.attr,
>> @@ -577,6 +578,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
>>  	&dev_attr_l1tf.attr,
>>  	&dev_attr_mds.attr,
>>  	&dev_attr_tsx_async_abort.attr,
>> +	&dev_attr_itlb_multihit.attr,
>>  	NULL
>>  };
>>  
>> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
>> index f35369f79771..2a093434e975 100644
>> --- a/include/linux/cpu.h
>> +++ b/include/linux/cpu.h
>> @@ -62,6 +62,8 @@ extern ssize_t cpu_show_mds(struct device *dev,
>>  extern ssize_t cpu_show_tsx_async_abort(struct device *dev,
>>  					struct device_attribute *attr,
>>  					char *buf);
>> +extern ssize_t cpu_show_itlb_multihit(struct device *dev,
>> +				      struct device_attribute *attr, char *buf);
>>  
>>  extern __printf(4, 5)
>>  struct device *cpu_device_create(struct device *parent, void *drvdata,
>> -- 
>> 2.16.4
>>



^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-10-31 22:13 UTC | newest]

Thread overview: 21+ messages
2019-10-24  7:34 [MODERATED] [PATCH v7 0/5] NX 0 Paolo Bonzini
2019-10-24  7:34 ` [MODERATED] [PATCH v7 1/5] NX 1 Paolo Bonzini
2019-10-24  8:57   ` [MODERATED] " Paolo Bonzini
2019-10-24 14:21     ` Tyler Hicks
2019-10-24 14:25       ` Paolo Bonzini
2019-10-24 15:01         ` Tyler Hicks
2019-10-24  7:34 ` [MODERATED] [PATCH v7 2/5] NX 2 Paolo Bonzini
2019-10-24  7:34 ` [MODERATED] [PATCH v7 3/5] NX 3 Paolo Bonzini
2019-10-24  7:34 ` [MODERATED] [PATCH v7 4/5] NX 4 Paolo Bonzini
2019-10-24  7:34 ` [MODERATED] [PATCH v7 5/5] NX 5 Paolo Bonzini
2019-10-28 12:08 ` [PATCH v7 0/5] NX 0 Thomas Gleixner
2019-10-28 16:02   ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
2019-10-29  8:51     ` Thomas Gleixner
2019-10-29  9:39       ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
2019-10-29 11:00         ` Thomas Gleixner
2019-10-29 12:31           ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
2019-10-29 15:35             ` Thomas Gleixner
2019-10-29 15:42               ` [MODERATED] Re: ***UNCHECKED*** " Joerg Roedel
2019-10-29 16:50                 ` Paolo Bonzini
2019-10-31 21:00     ` [MODERATED] Re: ***UNCHECKED*** " Tyler Hicks
2019-10-31 22:13       ` Paolo Bonzini
