historical-speck.lore.kernel.org archive mirror
* [MODERATED] [PATCH v8 0/5] NX 0
@ 2019-10-31 23:33 Paolo Bonzini
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 1/5] NX 1 Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-31 23:33 UTC (permalink / raw)
  To: speck

From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH v8 0/5] NX


v7->v8: rebase on top of the annoying affair

Junaid Shahid (2):
  kvm: Add helper function for creating VM worker threads
  kvm: x86: mmu: Recovery of shattered NX large pages

Paolo Bonzini (1):
  kvm: mmu: ITLB_MULTIHIT mitigation

Pawan Gupta (2):
  x86: Add ITLB_MULTIHIT bug infrastructure
  x86/cpu: Add Tremont to the cpu vulnerability whitelist

 .../ABI/testing/sysfs-devices-system-cpu      |   1 +
 .../admin-guide/kernel-parameters.txt         |  17 ++
 arch/x86/include/asm/cpufeatures.h            |   1 +
 arch/x86/include/asm/kvm_host.h               |   6 +
 arch/x86/include/asm/msr-index.h              |   7 +
 arch/x86/kernel/cpu/bugs.c                    |  24 ++
 arch/x86/kernel/cpu/common.c                  |  67 +++--
 arch/x86/kvm/mmu.c                            | 264 +++++++++++++++++-
 arch/x86/kvm/mmu.h                            |   4 +
 arch/x86/kvm/paging_tmpl.h                    |  29 +-
 arch/x86/kvm/x86.c                            |  20 ++
 drivers/base/cpu.c                            |   8 +
 include/linux/cpu.h                           |   2 +
 include/linux/kvm_host.h                      |   6 +
 virt/kvm/kvm_main.c                           | 114 +++++++-
 15 files changed, 527 insertions(+), 43 deletions(-)

-- 
2.21.0


* [MODERATED] [PATCH v8 1/5] NX 1
  2019-10-31 23:33 [MODERATED] [PATCH v8 0/5] NX 0 Paolo Bonzini
@ 2019-10-31 23:33 ` Paolo Bonzini
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 2/5] NX 2 Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-31 23:33 UTC (permalink / raw)
  To: speck

From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Subject: [PATCH v8 1/5] x86: Add ITLB_MULTIHIT bug infrastructure


Some processors may incur a machine check error, possibly
resulting in an unrecoverable CPU hang, when an instruction fetch
encounters a TLB multi-hit in the instruction TLB. This can occur
when the page size is changed along with either the physical
address or cache type [1].

This issue affects both bare-metal x86 page tables and EPT.

This can be mitigated by either eliminating the use of large
pages or by using careful TLB invalidations when changing the
page size in the page tables.

Just like Spectre, Meltdown, L1TF and MDS, a new bit has been
allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will
be set on CPUs which are mitigated against this issue.
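
As a rough illustration of the enumeration rule described above, the
following userspace sketch models the decision made in
cpu_set_bug_bits(). The helper name has_itlb_multihit_bug() and its
"whitelisted" argument are hypothetical stand-ins for the kernel's
cpu_matches(NO_ITLB_MULTIHIT) check; only the bit position is taken from
this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 6 of IA32_ARCH_CAPABILITIES, as defined by this patch. */
#define ARCH_CAP_PSCHANGE_MC_NO (1ULL << 6)

/*
 * Hypothetical model of the check added to cpu_set_bug_bits():
 * the bug is flagged unless the CPU is on the whitelist or the
 * hardware reports PSCHANGE_MC_NO.
 */
static bool has_itlb_multihit_bug(bool whitelisted, uint64_t ia32_cap)
{
	return !whitelisted && !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO);
}
```

Under these assumptions, a CPU that is neither whitelisted nor reports
the capability bit is treated as affected; any other capability bit has
no influence on the result.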

[1] For example please refer to erratum SKL002 in "6th Generation
Intel Processor Family Specification Update"
https://www.intel.com/content/www/us/en/products/docs/processors/core/desktop-6th-gen-core-family-spec-update.html
https://www.google.com/search?q=site:intel.com+SKL002

Many processors outside of Skylake are also affected, and the
erratum referred to above does not fully disclose the issue or its
impact, either on Skylake or across the other affected CPUs.

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../ABI/testing/sysfs-devices-system-cpu      |  1 +
 arch/x86/include/asm/cpufeatures.h            |  1 +
 arch/x86/include/asm/msr-index.h              |  7 ++
 arch/x86/kernel/cpu/bugs.c                    | 13 ++++
 arch/x86/kernel/cpu/common.c                  | 65 ++++++++++---------
 drivers/base/cpu.c                            |  8 +++
 include/linux/cpu.h                           |  2 +
 7 files changed, 67 insertions(+), 30 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
index 0e77569bd5e0..fc20cde63d1e 100644
--- a/Documentation/ABI/testing/sysfs-devices-system-cpu
+++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
@@ -487,6 +487,7 @@ What:		/sys/devices/system/cpu/vulnerabilities
 		/sys/devices/system/cpu/vulnerabilities/l1tf
 		/sys/devices/system/cpu/vulnerabilities/mds
 		/sys/devices/system/cpu/vulnerabilities/tsx_async_abort
+		/sys/devices/system/cpu/vulnerabilities/itlb_multihit
 Date:		January 2018
 Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
 Description:	Information about CPU vulnerabilities
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 989e03544f18..c4fbe379cc0b 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -400,5 +400,6 @@
 #define X86_BUG_MSBDS_ONLY		X86_BUG(20) /* CPU is only affected by the  MSDBS variant of BUG_MDS */
 #define X86_BUG_SWAPGS			X86_BUG(21) /* CPU is affected by speculation through SWAPGS */
 #define X86_BUG_TAA			X86_BUG(22) /* CPU is affected by TSX Async Abort(TAA) */
+#define X86_BUG_ITLB_MULTIHIT		X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */
 
 #endif /* _ASM_X86_CPUFEATURES_H */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index b3a8bb2af0b6..6a3124664289 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -93,6 +93,13 @@
 						  * Microarchitectural Data
 						  * Sampling (MDS) vulnerabilities.
 						  */
+#define ARCH_CAP_PSCHANGE_MC_NO		BIT(6)	 /*
+						  * The processor is not susceptible to a
+						  * machine check error due to modifying the
+						  * code page size along with either the
+						  * physical address or cache type
+						  * without TLB invalidation.
+						  */
 #define ARCH_CAP_TSX_CTRL_MSR		BIT(7)	/* MSR for TSX control is available. */
 #define ARCH_CAP_TAA_NO			BIT(8)	/*
 						 * Not susceptible to
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 43c647e19439..5364beda8c61 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1419,6 +1419,11 @@ static ssize_t l1tf_show_state(char *buf)
 }
 #endif
 
+static ssize_t itlb_multihit_show_state(char *buf)
+{
+	return sprintf(buf, "Processor vulnerable\n");
+}
+
 static ssize_t mds_show_state(char *buf)
 {
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
@@ -1524,6 +1529,9 @@ static ssize_t cpu_show_common(struct device *dev, struct device_attribute *attr
 	case X86_BUG_TAA:
 		return tsx_async_abort_show_state(buf);
 
+	case X86_BUG_ITLB_MULTIHIT:
+		return itlb_multihit_show_state(buf);
+
 	default:
 		break;
 	}
@@ -1565,4 +1573,9 @@ ssize_t cpu_show_tsx_async_abort(struct device *dev, struct device_attribute *at
 {
 	return cpu_show_common(dev, attr, buf, X86_BUG_TAA);
 }
+
+ssize_t cpu_show_itlb_multihit(struct device *dev, struct device_attribute *attr, char *buf)
+{
+	return cpu_show_common(dev, attr, buf, X86_BUG_ITLB_MULTIHIT);
+}
 #endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index f8b8afc8f5b5..d29b71ca3ca7 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1016,13 +1016,14 @@ static void identify_cpu_without_cpuid(struct cpuinfo_x86 *c)
 #endif
 }
 
-#define NO_SPECULATION	BIT(0)
-#define NO_MELTDOWN	BIT(1)
-#define NO_SSB		BIT(2)
-#define NO_L1TF		BIT(3)
-#define NO_MDS		BIT(4)
-#define MSBDS_ONLY	BIT(5)
-#define NO_SWAPGS	BIT(6)
+#define NO_SPECULATION		BIT(0)
+#define NO_MELTDOWN		BIT(1)
+#define NO_SSB			BIT(2)
+#define NO_L1TF			BIT(3)
+#define NO_MDS			BIT(4)
+#define MSBDS_ONLY		BIT(5)
+#define NO_SWAPGS		BIT(6)
+#define NO_ITLB_MULTIHIT	BIT(7)
 
 #define VULNWL(_vendor, _family, _model, _whitelist)	\
 	{ X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist }
@@ -1043,27 +1044,27 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
 	VULNWL(NSC,	5, X86_MODEL_ANY,	NO_SPECULATION),
 
 	/* Intel Family 6 */
-	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION),
-	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION),
-	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION),
-
-	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_SALTWELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SALTWELL_TABLET,	NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SALTWELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_BONNELL,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_BONNELL_MID,		NO_SPECULATION | NO_ITLB_MULTIHIT),
+
+	VULNWL_INTEL(ATOM_SILVERMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SILVERMONT_D,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_SILVERMONT_MID,	NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_AIRMONT,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(XEON_PHI_KNL,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(XEON_PHI_KNM,		NO_SSB | NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	VULNWL_INTEL(CORE_YONAH,		NO_SSB),
 
-	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_AIRMONT_MID,		NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_AIRMONT_NP,		NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
-	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS),
-	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS),
+	VULNWL_INTEL(ATOM_GOLDMONT,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_D,		NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_INTEL(ATOM_GOLDMONT_PLUS,	NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	/*
 	 * Technically, swapgs isn't serializing on AMD (despite it previously
@@ -1074,14 +1075,14 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
 	 */
 
 	/* AMD Family 0xf - 0x12 */
-	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS),
+	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x11,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_AMD(0x12,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 
 	/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */
-	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
-	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS),
+	VULNWL_AMD(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
+	VULNWL_HYGON(X86_FAMILY_ANY,	NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 	{}
 };
 
@@ -1106,6 +1107,10 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
 {
 	u64 ia32_cap = x86_read_arch_cap_msr();
 
+	/* Set ITLB_MULTIHIT bug if cpu is not in the whitelist and not mitigated */
+	if (!cpu_matches(NO_ITLB_MULTIHIT) && !(ia32_cap & ARCH_CAP_PSCHANGE_MC_NO))
+		setup_force_cpu_bug(X86_BUG_ITLB_MULTIHIT);
+
 	if (cpu_matches(NO_SPECULATION))
 		return;
 
diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 0fccd8c0312e..6265871a4af2 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -561,6 +561,12 @@ ssize_t __weak cpu_show_tsx_async_abort(struct device *dev,
 	return sprintf(buf, "Not affected\n");
 }
 
+ssize_t __weak cpu_show_itlb_multihit(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "Not affected\n");
+}
+
 static DEVICE_ATTR(meltdown, 0444, cpu_show_meltdown, NULL);
 static DEVICE_ATTR(spectre_v1, 0444, cpu_show_spectre_v1, NULL);
 static DEVICE_ATTR(spectre_v2, 0444, cpu_show_spectre_v2, NULL);
@@ -568,6 +574,7 @@ static DEVICE_ATTR(spec_store_bypass, 0444, cpu_show_spec_store_bypass, NULL);
 static DEVICE_ATTR(l1tf, 0444, cpu_show_l1tf, NULL);
 static DEVICE_ATTR(mds, 0444, cpu_show_mds, NULL);
 static DEVICE_ATTR(tsx_async_abort, 0444, cpu_show_tsx_async_abort, NULL);
+static DEVICE_ATTR(itlb_multihit, 0444, cpu_show_itlb_multihit, NULL);
 
 static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_meltdown.attr,
@@ -577,6 +584,7 @@ static struct attribute *cpu_root_vulnerabilities_attrs[] = {
 	&dev_attr_l1tf.attr,
 	&dev_attr_mds.attr,
 	&dev_attr_tsx_async_abort.attr,
+	&dev_attr_itlb_multihit.attr,
 	NULL
 };
 
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index f35369f79771..2a093434e975 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -62,6 +62,8 @@ extern ssize_t cpu_show_mds(struct device *dev,
 extern ssize_t cpu_show_tsx_async_abort(struct device *dev,
 					struct device_attribute *attr,
 					char *buf);
+extern ssize_t cpu_show_itlb_multihit(struct device *dev,
+				      struct device_attribute *attr, char *buf);
 
 extern __printf(4, 5)
 struct device *cpu_device_create(struct device *parent, void *drvdata,
-- 
2.21.0


* [MODERATED] [PATCH v8 2/5] NX 2
  2019-10-31 23:33 [MODERATED] [PATCH v8 0/5] NX 0 Paolo Bonzini
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 1/5] NX 1 Paolo Bonzini
@ 2019-10-31 23:33 ` Paolo Bonzini
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 3/5] NX 3 Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-31 23:33 UTC (permalink / raw)
  To: speck

From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Subject: [PATCH v8 2/5] x86/cpu: Add Tremont to the cpu vulnerability whitelist


This patch adds the new CPU model ATOM_TREMONT_D to the CPU vulnerability
whitelist. ATOM_TREMONT_D is not affected by X86_BUG_ITLB_MULTIHIT. There
may be other bugs that do not affect ATOM_TREMONT_D but are not known at
this point; those can be added later.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kernel/cpu/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d29b71ca3ca7..fffe21945374 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1074,6 +1074,8 @@ static const __initconst struct x86_cpu_id cpu_vuln_whitelist[] = {
 	 * good enough for our purposes.
 	 */
 
+	VULNWL_INTEL(ATOM_TREMONT_D,		NO_ITLB_MULTIHIT),
+
 	/* AMD Family 0xf - 0x12 */
 	VULNWL_AMD(0x0f,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
 	VULNWL_AMD(0x10,	NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT),
-- 
2.21.0


* [MODERATED] [PATCH v8 3/5] NX 3
  2019-10-31 23:33 [MODERATED] [PATCH v8 0/5] NX 0 Paolo Bonzini
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 1/5] NX 1 Paolo Bonzini
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 2/5] NX 2 Paolo Bonzini
@ 2019-10-31 23:33 ` Paolo Bonzini
  2019-11-01  0:24   ` [MODERATED] " Pawan Gupta
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 4/5] NX 4 Paolo Bonzini
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 5/5] NX 5 Paolo Bonzini
  4 siblings, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-31 23:33 UTC (permalink / raw)
  To: speck

From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH v8 3/5] kvm: mmu: ITLB_MULTIHIT mitigation


On some Intel processors, putting the same virtual address in the TLB
as both a 4 KiB and a 2 MiB page can confuse the instruction fetch unit
and cause the processor to issue a machine check.  Unfortunately, if EPT
page tables use huge pages, it is possible for a malicious guest to cause
this situation.

This patch adds a knob to mark huge pages as non-executable. When the
nx_huge_pages parameter is enabled (and we are using EPT), all huge pages
are marked as NX. If the guest attempts to execute in one of those pages,
the page is broken down into 4K pages, which are then marked executable.
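
The behaviour described above can be pictured with a small toy model.
This is only a sketch under simplifying assumptions: struct toy_region,
toy_map_huge() and toy_exec() are hypothetical names, the model covers a
single 2 MiB region, and it makes only the faulting 4 KiB page
executable on a split. It does not reflect KVM's real SPTE handling.

```c
#include <stdbool.h>
#include <stddef.h>

#define SMALL_PER_HUGE 512	/* number of 4 KiB pages in one 2 MiB page */

/* Toy model of one 2 MiB region; not KVM's real data structures. */
struct toy_region {
	bool split;				/* mapped with 4 KiB pages? */
	bool huge_exec;				/* exec permission of the huge mapping */
	bool small_exec[SMALL_PER_HUGE];	/* exec permission per 4 KiB page */
};

/* Map the region as a huge page; with the knob on, the mapping is NX. */
static void toy_map_huge(struct toy_region *r, bool nx_huge_pages)
{
	r->split = false;
	r->huge_exec = !nx_huge_pages;
	for (size_t i = 0; i < SMALL_PER_HUGE; i++)
		r->small_exec[i] = false;
}

/*
 * Model an instruction fetch from 4 KiB page index idx (idx must be
 * below SMALL_PER_HUGE). Returns true if the fetch faulted and the
 * mapping had to be fixed up.
 */
static bool toy_exec(struct toy_region *r, size_t idx)
{
	if (!r->split && r->huge_exec)
		return false;	/* executable huge page, no fault */
	if (r->split && r->small_exec[idx])
		return false;	/* already an executable 4 KiB page */

	/* Fault: break the huge page down and make this page executable. */
	r->split = true;
	r->small_exec[idx] = true;
	return true;
}
```

In this model the first fetch from an NX huge page faults and splits the
region, while a second fetch from the same 4 KiB page proceeds without a
fault.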

This is not an issue for shadow paging (except nested EPT), because then
the host is in control of TLB flushes and the problematic situation cannot
happen.  With nested EPT, the nested guest can again cause the problem, so
shadow and direct EPT are treated the same.
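
The nx_huge_pages knob introduced by this patch takes the values "off",
"force" and "auto". A minimal userspace sketch of how such a tri-state
string might be resolved to a boolean follows. The function
toy_parse_nx_huge_pages() and its cpu_has_bug argument are hypothetical;
the kernel code uses sysfs_streq() and boot_cpu_has_bug(), and also
accepts plain boolean strings, which this sketch omits.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical userspace model of the parameter parsing.
 * Returns 0 on success and stores the resolved setting in *out,
 * or returns -1 if the string is not recognised.
 */
static int toy_parse_nx_huge_pages(const char *val, bool cpu_has_bug, bool *out)
{
	if (strcmp(val, "off") == 0)
		*out = false;
	else if (strcmp(val, "force") == 0)
		*out = true;
	else if (strcmp(val, "auto") == 0)
		*out = cpu_has_bug;	/* follow the CPU's bug status */
	else
		return -1;
	return 0;
}
```

With this shape, "auto" is the only value whose result depends on the
host CPU; "off" and "force" override the bug status in either direction.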

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../admin-guide/kernel-parameters.txt         |  11 ++
 arch/x86/include/asm/kvm_host.h               |   2 +
 arch/x86/kernel/cpu/bugs.c                    |  13 +-
 arch/x86/kvm/mmu.c                            | 135 +++++++++++++++++-
 arch/x86/kvm/paging_tmpl.h                    |  29 +++-
 arch/x86/kvm/x86.c                            |   9 ++
 6 files changed, 186 insertions(+), 13 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fa8f03ddff24..7cddfc98ffa7 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2055,6 +2055,17 @@
 			KVM MMU at runtime.
 			Default is 0 (off)
 
+	kvm.nx_huge_pages=
+			[KVM] Controls the sw workaround for bug
+			X86_BUG_ITLB_MULTIHIT.
+			force	: Always deploy workaround.
+			off	: Never deploy workaround.
+			auto	: Default. Deploy workaround based on presence of
+				  X86_BUG_ITLB_MULTIHIT.
+
+			If the sw workaround is enabled for the host, guests
+			need not enable it for nested guests.
+
 	kvm-amd.nested=	[KVM,AMD] Allow nested virtualization in KVM/SVM.
 			Default is 1 (enabled)
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 24d6598dea29..a37b03483b66 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -315,6 +315,7 @@ struct kvm_mmu_page {
 	bool unsync;
 	u8 mmu_valid_gen;
 	bool mmio_cached;
+	bool lpage_disallowed; /* Can't be replaced by an equiv large page */
 
 	/*
 	 * The following two entries are used to key the shadow page in the
@@ -946,6 +947,7 @@ struct kvm_vm_stat {
 	ulong mmu_unsync;
 	ulong remote_tlb_flush;
 	ulong lpages;
+	ulong nx_lpage_splits;
 	ulong max_mmu_page_hash_collisions;
 };
 
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 5364beda8c61..850005590167 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -1257,6 +1257,9 @@ void x86_spec_ctrl_setup_ap(void)
 		x86_amd_ssb_disable();
 }
 
+bool itlb_multihit_kvm_mitigation;
+EXPORT_SYMBOL_GPL(itlb_multihit_kvm_mitigation);
+
 #undef pr_fmt
 #define pr_fmt(fmt)	"L1TF: " fmt
 
@@ -1412,17 +1415,25 @@ static ssize_t l1tf_show_state(char *buf)
 		       l1tf_vmx_states[l1tf_vmx_mitigation],
 		       sched_smt_active() ? "vulnerable" : "disabled");
 }
+
+static ssize_t itlb_multihit_show_state(char *buf)
+{
+	if (itlb_multihit_kvm_mitigation)
+		return sprintf(buf, "KVM: Mitigation: Split huge pages\n");
+	else
+		return sprintf(buf, "KVM: Vulnerable\n");
+}
 #else
 static ssize_t l1tf_show_state(char *buf)
 {
 	return sprintf(buf, "%s\n", L1TF_DEFAULT_MSG);
 }
-#endif
 
 static ssize_t itlb_multihit_show_state(char *buf)
 {
 	return sprintf(buf, "Processor vulnerable\n");
 }
+#endif
 
 static ssize_t mds_show_state(char *buf)
 {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 24c23c66b226..837beefdf0a5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -47,6 +47,20 @@
 #include <asm/kvm_page_track.h>
 #include "trace.h"
 
+extern bool itlb_multihit_kvm_mitigation;
+
+static int __read_mostly nx_huge_pages = -1;
+
+static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
+
+static struct kernel_param_ops nx_huge_pages_ops = {
+	.set = set_nx_huge_pages,
+	.get = param_get_bool,
+};
+
+module_param_cb(nx_huge_pages, &nx_huge_pages_ops, &nx_huge_pages, 0644);
+__MODULE_PARM_TYPE(nx_huge_pages, "bool");
+
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
  * where the hardware walks 2 page tables:
@@ -352,6 +366,11 @@ static inline bool spte_ad_need_write_protect(u64 spte)
 	return (spte & SPTE_SPECIAL_MASK) != SPTE_AD_ENABLED_MASK;
 }
 
+static bool is_nx_huge_page_enabled(void)
+{
+	return READ_ONCE(nx_huge_pages);
+}
+
 static inline u64 spte_shadow_accessed_mask(u64 spte)
 {
 	MMU_WARN_ON(is_mmio_spte(spte));
@@ -1190,6 +1209,15 @@ static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_disallow_lpage(slot, gfn);
 }
 
+static void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	if (sp->lpage_disallowed)
+		return;
+
+	++kvm->stat.nx_lpage_splits;
+	sp->lpage_disallowed = true;
+}
+
 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	struct kvm_memslots *slots;
@@ -1207,6 +1235,12 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm_mmu_gfn_allow_lpage(slot, gfn);
 }
 
+static void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
+{
+	--kvm->stat.nx_lpage_splits;
+	sp->lpage_disallowed = false;
+}
+
 static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
 					  struct kvm_memory_slot *slot)
 {
@@ -2792,6 +2826,9 @@ static bool __kvm_mmu_prepare_zap_page(struct kvm *kvm,
 			kvm_reload_remote_mmus(kvm);
 	}
 
+	if (sp->lpage_disallowed)
+		unaccount_huge_nx_page(kvm, sp);
+
 	sp->role.invalid = 1;
 	return list_unstable;
 }
@@ -3013,6 +3050,11 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (!speculative)
 		spte |= spte_shadow_accessed_mask(spte);
 
+	if (level > PT_PAGE_TABLE_LEVEL && (pte_access & ACC_EXEC_MASK) &&
+	    is_nx_huge_page_enabled()) {
+		pte_access &= ~ACC_EXEC_MASK;
+	}
+
 	if (pte_access & ACC_EXEC_MASK)
 		spte |= shadow_x_mask;
 	else
@@ -3233,9 +3275,32 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
+static void disallowed_hugepage_adjust(struct kvm_shadow_walk_iterator it,
+				       gfn_t gfn, kvm_pfn_t *pfnp, int *levelp)
+{
+	int level = *levelp;
+	u64 spte = *it.sptep;
+
+	if (it.level == level && level > PT_PAGE_TABLE_LEVEL &&
+	    is_nx_huge_page_enabled() &&
+	    is_shadow_present_pte(spte) &&
+	    !is_large_pte(spte)) {
+		/*
+		 * A small SPTE exists for this pfn, but FNAME(fetch)
+		 * and __direct_map would like to create a large PTE
+		 * instead: just force them to go down another level,
+		 * patching back for them into pfn the next 9 bits of
+		 * the address.
+		 */
+		u64 page_mask = KVM_PAGES_PER_HPAGE(level) - KVM_PAGES_PER_HPAGE(level - 1);
+		*pfnp |= gfn & page_mask;
+		(*levelp)--;
+	}
+}
+
 static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 			int map_writable, int level, kvm_pfn_t pfn,
-			bool prefault)
+			bool prefault, bool lpage_disallowed)
 {
 	struct kvm_shadow_walk_iterator it;
 	struct kvm_mmu_page *sp;
@@ -3248,6 +3313,12 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 
 	trace_kvm_mmu_spte_requested(gpa, level, pfn);
 	for_each_shadow_entry(vcpu, gpa, it) {
+		/*
+		 * We cannot overwrite existing page tables with an NX
+		 * large page, as the leaf could be executable.
+		 */
+		disallowed_hugepage_adjust(it, gfn, &pfn, &level);
+
 		base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 		if (it.level == level)
 			break;
@@ -3258,6 +3329,8 @@ static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, int write,
 					      it.level - 1, true, ACC_ALL);
 
 			link_shadow_page(vcpu, it.sptep, sp);
+			if (lpage_disallowed)
+				account_huge_nx_page(vcpu->kvm, sp);
 		}
 	}
 
@@ -3550,11 +3623,14 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 {
 	int r;
 	int level;
-	bool force_pt_level = false;
+	bool force_pt_level;
 	kvm_pfn_t pfn;
 	unsigned long mmu_seq;
 	bool map_writable, write = error_code & PFERR_WRITE_MASK;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
 
+	force_pt_level = lpage_disallowed;
 	level = mapping_level(vcpu, gfn, &force_pt_level);
 	if (likely(!force_pt_level)) {
 		/*
@@ -3588,7 +3664,8 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 		goto out_unlock;
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, gfn, &pfn, &level);
-	r = __direct_map(vcpu, v, write, map_writable, level, pfn, prefault);
+	r = __direct_map(vcpu, v, write, map_writable, level, pfn,
+			 prefault, false);
 out_unlock:
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(pfn);
@@ -4174,6 +4251,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	unsigned long mmu_seq;
 	int write = error_code & PFERR_WRITE_MASK;
 	bool map_writable;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
 
 	MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa));
 
@@ -4184,8 +4263,9 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	if (r)
 		return r;
 
-	force_pt_level = !check_hugepage_cache_consistency(vcpu, gfn,
-							   PT_DIRECTORY_LEVEL);
+	force_pt_level =
+		lpage_disallowed ||
+		!check_hugepage_cache_consistency(vcpu, gfn, PT_DIRECTORY_LEVEL);
 	level = mapping_level(vcpu, gfn, &force_pt_level);
 	if (likely(!force_pt_level)) {
 		if (level > PT_DIRECTORY_LEVEL &&
@@ -4214,7 +4294,8 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 		goto out_unlock;
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, gfn, &pfn, &level);
-	r = __direct_map(vcpu, gpa, write, map_writable, level, pfn, prefault);
+	r = __direct_map(vcpu, gpa, write, map_writable, level, pfn,
+			 prefault, lpage_disallowed);
 out_unlock:
 	spin_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(pfn);
@@ -6155,10 +6236,52 @@ static void kvm_set_mmio_spte_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
 }
 
+static void __set_nx_huge_pages(bool val)
+{
+	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
+}
+
+static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
+{
+	bool old_val = nx_huge_pages;
+	bool new_val;
+
+	/* In "auto" mode deploy workaround only if CPU has the bug. */
+	if (sysfs_streq(val, "off"))
+		new_val = 0;
+	else if (sysfs_streq(val, "force"))
+		new_val = 1;
+	else if (sysfs_streq(val, "auto"))
+		new_val = boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
+	else if (strtobool(val, &new_val) < 0)
+		return -EINVAL;
+
+	__set_nx_huge_pages(new_val);
+
+	if (new_val != old_val) {
+		struct kvm *kvm;
+		int idx;
+
+		mutex_lock(&kvm_lock);
+
+		list_for_each_entry(kvm, &vm_list, vm_list) {
+			idx = srcu_read_lock(&kvm->srcu);
+			kvm_mmu_zap_all_fast(kvm);
+			srcu_read_unlock(&kvm->srcu, idx);
+		}
+		mutex_unlock(&kvm_lock);
+	}
+
+	return 0;
+}
+
 int kvm_mmu_module_init(void)
 {
 	int ret = -ENOMEM;
 
+	if (nx_huge_pages == -1)
+		__set_nx_huge_pages(boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT));
+
 	/*
 	 * MMU roles use union aliasing which is, generally speaking, an
 	 * undefined behavior. However, we supposedly know how compilers behave
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 7d5cdb3af594..97b21e7fd013 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -614,13 +614,14 @@ static void FNAME(pte_prefetch)(struct kvm_vcpu *vcpu, struct guest_walker *gw,
 static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			 struct guest_walker *gw,
 			 int write_fault, int hlevel,
-			 kvm_pfn_t pfn, bool map_writable, bool prefault)
+			 kvm_pfn_t pfn, bool map_writable, bool prefault,
+			 bool lpage_disallowed)
 {
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned direct_access, access = gw->pt_access;
 	int top_level, ret;
-	gfn_t base_gfn;
+	gfn_t gfn, base_gfn;
 
 	direct_access = gw->pte_access;
 
@@ -665,13 +666,25 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			link_shadow_page(vcpu, it.sptep, sp);
 	}
 
-	base_gfn = gw->gfn;
+	/*
+	 * FNAME(page_fault) might have clobbered the bottom bits of
+	 * gw->gfn, restore them from the virtual address.
+	 */
+	gfn = gw->gfn | ((addr & PT_LVL_OFFSET_MASK(gw->level)) >> PAGE_SHIFT);
+	base_gfn = gfn;
 
 	trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
 
 	for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
 		clear_sp_write_flooding_count(it.sptep);
-		base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+
+		/*
+		 * We cannot overwrite existing page tables with an NX
+		 * large page, as the leaf could be executable.
+		 */
+		disallowed_hugepage_adjust(it, gfn, &pfn, &hlevel);
+
+		base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 		if (it.level == hlevel)
 			break;
 
@@ -683,6 +696,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			sp = kvm_mmu_get_page(vcpu, base_gfn, addr,
 					      it.level - 1, true, direct_access);
 			link_shadow_page(vcpu, it.sptep, sp);
+			if (lpage_disallowed)
+				account_huge_nx_page(vcpu->kvm, sp);
 		}
 	}
 
@@ -759,9 +774,11 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	int r;
 	kvm_pfn_t pfn;
 	int level = PT_PAGE_TABLE_LEVEL;
-	bool force_pt_level = false;
 	unsigned long mmu_seq;
 	bool map_writable, is_self_change_mapping;
+	bool lpage_disallowed = (error_code & PFERR_FETCH_MASK) &&
+				is_nx_huge_page_enabled();
+	bool force_pt_level = lpage_disallowed;
 
 	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 
@@ -851,7 +868,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	if (!force_pt_level)
 		transparent_hugepage_adjust(vcpu, walker.gfn, &pfn, &level);
 	r = FNAME(fetch)(vcpu, addr, &walker, write_fault,
-			 level, pfn, map_writable, prefault);
+			 level, pfn, map_writable, prefault, lpage_disallowed);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
 
 out_unlock:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 32d70ca2a7fd..b087d178a774 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -213,6 +213,7 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ "mmu_unsync", VM_STAT(mmu_unsync) },
 	{ "remote_tlb_flush", VM_STAT(remote_tlb_flush) },
 	{ "largepages", VM_STAT(lpages, .mode = 0444) },
+	{ "nx_largepages_splitted", VM_STAT(nx_lpage_splits, .mode = 0444) },
 	{ "max_mmu_page_hash_collisions",
 		VM_STAT(max_mmu_page_hash_collisions) },
 	{ NULL }
@@ -1279,6 +1280,14 @@ static u64 kvm_get_arch_capabilities(void)
 	if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES))
 		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, data);
 
+	/*
+	 * If nx_huge_pages is enabled, KVM's shadow paging will ensure that
+	 * the nested hypervisor runs with NX huge pages.  If it is not,
+	 * L1 is anyway vulnerable to ITLB_MULTIHIT exploits from other
+	 * L1 guests, so it need not worry about its own (L2) guests.
+	 */
+	data |= ARCH_CAP_PSCHANGE_MC_NO;
+
 	/*
 	 * If we're doing cache flushes (either "always" or "cond")
 	 * we will do one whenever the guest does a vmlaunch/vmresume.
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] [PATCH v8 4/5] NX 4
  2019-10-31 23:33 [MODERATED] [PATCH v8 0/5] NX 0 Paolo Bonzini
                   ` (2 preceding siblings ...)
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 3/5] NX 3 Paolo Bonzini
@ 2019-10-31 23:33 ` Paolo Bonzini
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 5/5] NX 5 Paolo Bonzini
  4 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-31 23:33 UTC (permalink / raw)
  To: speck

From: Junaid Shahid <junaids@google.com>
Subject: [PATCH v8 4/5] kvm: Add helper function for creating VM worker threads


This adds a function to create a kernel thread associated with a given
VM. In particular, it ensures that the worker thread inherits the
priority and cgroups of the calling thread.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 include/linux/kvm_host.h |  6 +++
 virt/kvm/kvm_main.c      | 84 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 719fc3e15ea4..52ed5f66e8f9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1382,4 +1382,10 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 }
 #endif /* CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE */
 
+typedef int (*kvm_vm_thread_fn_t)(struct kvm *kvm, uintptr_t data);
+
+int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
+				uintptr_t data, const char *name,
+				struct task_struct **thread_ptr);
+
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 67ef3f2e19e8..513b49be83e0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -50,6 +50,7 @@
 #include <linux/bsearch.h>
 #include <linux/io.h>
 #include <linux/lockdep.h>
+#include <linux/kthread.h>
 
 #include <asm/processor.h>
 #include <asm/ioctl.h>
@@ -4367,3 +4368,86 @@ void kvm_exit(void)
 	kvm_vfio_ops_exit();
 }
 EXPORT_SYMBOL_GPL(kvm_exit);
+
+struct kvm_vm_worker_thread_context {
+	struct kvm *kvm;
+	struct task_struct *parent;
+	struct completion init_done;
+	kvm_vm_thread_fn_t thread_fn;
+	uintptr_t data;
+	int err;
+};
+
+static int kvm_vm_worker_thread(void *context)
+{
+	/*
+	 * The init_context is allocated on the stack of the parent thread, so
+	 * we have to locally copy anything that is needed beyond initialization
+	 */
+	struct kvm_vm_worker_thread_context *init_context = context;
+	struct kvm *kvm = init_context->kvm;
+	kvm_vm_thread_fn_t thread_fn = init_context->thread_fn;
+	uintptr_t data = init_context->data;
+	int err;
+
+	err = kthread_park(current);
+	/* kthread_park(current) is never supposed to return an error */
+	WARN_ON(err != 0);
+	if (err)
+		goto init_complete;
+
+	err = cgroup_attach_task_all(init_context->parent, current);
+	if (err) {
+		kvm_err("%s: cgroup_attach_task_all failed with err %d\n",
+			__func__, err);
+		goto init_complete;
+	}
+
+	set_user_nice(current, task_nice(init_context->parent));
+
+init_complete:
+	init_context->err = err;
+	complete(&init_context->init_done);
+	init_context = NULL;
+
+	if (err)
+		return err;
+
+	/* Wait to be woken up by the spawner before proceeding. */
+	kthread_parkme();
+
+	if (!kthread_should_stop())
+		err = thread_fn(kvm, data);
+
+	return err;
+}
+
+int kvm_vm_create_worker_thread(struct kvm *kvm, kvm_vm_thread_fn_t thread_fn,
+				uintptr_t data, const char *name,
+				struct task_struct **thread_ptr)
+{
+	struct kvm_vm_worker_thread_context init_context = {};
+	struct task_struct *thread;
+
+	*thread_ptr = NULL;
+	init_context.kvm = kvm;
+	init_context.parent = current;
+	init_context.thread_fn = thread_fn;
+	init_context.data = data;
+	init_completion(&init_context.init_done);
+
+	thread = kthread_run(kvm_vm_worker_thread, &init_context,
+			     "%s-%d", name, task_pid_nr(current));
+	if (IS_ERR(thread))
+		return PTR_ERR(thread);
+
+	/* kthread_run is never supposed to return NULL */
+	WARN_ON(thread == NULL);
+
+	wait_for_completion(&init_context.init_done);
+
+	if (!init_context.err)
+		*thread_ptr = thread;
+
+	return init_context.err;
+}
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] [PATCH v8 5/5] NX 5
  2019-10-31 23:33 [MODERATED] [PATCH v8 0/5] NX 0 Paolo Bonzini
                   ` (3 preceding siblings ...)
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 4/5] NX 4 Paolo Bonzini
@ 2019-10-31 23:33 ` Paolo Bonzini
  4 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-10-31 23:33 UTC (permalink / raw)
  To: speck

From: Junaid Shahid <junaids@google.com>
Subject: [PATCH v8 5/5] kvm: x86: mmu: Recovery of shattered NX large pages


The page table pages corresponding to broken down large pages are
zapped in FIFO order, so that the large page can potentially
be recovered, if it is no longer being used for execution.  This removes
the performance penalty for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest
reaches a steady state.
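
The one-hour figure follows from the defaults: each recovery pass zaps
1/60th of the currently split pages, and a pass runs once a minute. Below
is a minimal userspace sketch of that arithmetic. The helper names
(div_round_up, pages_to_zap, passes_until_zapped) are invented for
illustration, and the last one assumes a steady state in which the FIFO is
refilled as fast as it is drained.

```c
#include <assert.h>

/* Same rounding as the kernel's DIV_ROUND_UP() macro. */
static unsigned long div_round_up(unsigned long n, unsigned long d)
{
	return (n + d - 1) / d;
}

/* Pages zapped in one recovery pass; a ratio of 0 disables recovery. */
static unsigned long pages_to_zap(unsigned long splits, unsigned int ratio)
{
	return ratio ? div_round_up(splits, ratio) : 0;
}

/*
 * Number of passes until a page queued at the tail of a FIFO holding
 * `splits` entries reaches the head, assuming the list length stays
 * constant (illustrative steady-state model, not kernel code).
 */
static unsigned long passes_until_zapped(unsigned long splits,
					 unsigned int ratio)
{
	assert(ratio != 0 && splits != 0);
	return div_round_up(splits, pages_to_zap(splits, ratio));
}
```

With 6000 split pages and the default ratio of 60, pages_to_zap() gives 100
pages per pass and passes_until_zapped() gives 60 passes, i.e. about one
hour at one pass per minute.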

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 .../admin-guide/kernel-parameters.txt         |   6 +
 arch/x86/include/asm/kvm_host.h               |   4 +
 arch/x86/kvm/mmu.c                            | 129 ++++++++++++++++++
 arch/x86/kvm/mmu.h                            |   4 +
 arch/x86/kvm/x86.c                            |  11 ++
 virt/kvm/kvm_main.c                           |  30 +++-
 6 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7cddfc98ffa7..e8e0a140a632 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2066,6 +2066,12 @@
 			If the sw workaround is enabled for the host, guests
 			need not enable it for nested guests.
 
+	kvm.nx_huge_pages_recovery_ratio=
+			[KVM] Controls how many 4KiB pages are periodically zapped
+			back to huge pages.  0 disables the recovery, otherwise if
+			the value is N KVM will zap 1/Nth of the 4KiB pages every
+			minute.  The default is 60.
+
 	kvm-amd.nested=	[KVM,AMD] Allow nested virtualization in KVM/SVM.
 			Default is 1 (enabled)
 
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a37b03483b66..4fc61483919a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -312,6 +312,8 @@ struct kvm_rmap_head {
 struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
+	struct list_head lpage_disallowed_link;
+
 	bool unsync;
 	u8 mmu_valid_gen;
 	bool mmio_cached;
@@ -860,6 +862,7 @@ struct kvm_arch {
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
+	struct list_head lpage_disallowed_mmu_pages;
 	struct kvm_page_track_notifier_node mmu_sp_tracker;
 	struct kvm_page_track_notifier_head track_notifier_head;
 
@@ -934,6 +937,7 @@ struct kvm_arch {
 	bool exception_payload_enabled;
 
 	struct kvm_pmu_event_filter *pmu_event_filter;
+	struct task_struct *nx_lpage_recovery_thread;
 };
 
 struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 837beefdf0a5..e6a5748a12d5 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -37,6 +37,7 @@
 #include <linux/uaccess.h>
 #include <linux/hash.h>
 #include <linux/kern_levels.h>
+#include <linux/kthread.h>
 
 #include <asm/page.h>
 #include <asm/pat.h>
@@ -50,16 +51,26 @@
 extern bool itlb_multihit_kvm_mitigation;
 
 static int __read_mostly nx_huge_pages = -1;
+static uint __read_mostly nx_huge_pages_recovery_ratio = 60;
 
 static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
+static int set_nx_huge_pages_recovery_ratio(const char *val, const struct kernel_param *kp);
 
 static struct kernel_param_ops nx_huge_pages_ops = {
 	.set = set_nx_huge_pages,
 	.get = param_get_bool,
 };
 
+static struct kernel_param_ops nx_huge_pages_recovery_ratio_ops = {
+	.set = set_nx_huge_pages_recovery_ratio,
+	.get = param_get_uint,
+};
+
 module_param_cb(nx_huge_pages, &nx_huge_pages_ops, &nx_huge_pages, 0644);
 __MODULE_PARM_TYPE(nx_huge_pages, "bool");
+module_param_cb(nx_huge_pages_recovery_ratio, &nx_huge_pages_recovery_ratio_ops,
+		&nx_huge_pages_recovery_ratio, 0644);
+__MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
 
 /*
  * When setting this variable to true it enables Two-Dimensional-Paging
@@ -1215,6 +1226,8 @@ static void account_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return;
 
 	++kvm->stat.nx_lpage_splits;
+	list_add_tail(&sp->lpage_disallowed_link,
+		      &kvm->arch.lpage_disallowed_mmu_pages);
 	sp->lpage_disallowed = true;
 }
 
@@ -1239,6 +1252,7 @@ static void unaccount_huge_nx_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	--kvm->stat.nx_lpage_splits;
 	sp->lpage_disallowed = false;
+	list_del(&sp->lpage_disallowed_link);
 }
 
 static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level,
@@ -6268,6 +6282,8 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 			idx = srcu_read_lock(&kvm->srcu);
 			kvm_mmu_zap_all_fast(kvm);
 			srcu_read_unlock(&kvm->srcu, idx);
+
+			wake_up_process(kvm->arch.nx_lpage_recovery_thread);
 		}
 		mutex_unlock(&kvm_lock);
 	}
@@ -6361,3 +6377,116 @@ void kvm_mmu_module_exit(void)
 	unregister_shrinker(&mmu_shrinker);
 	mmu_audit_disable();
 }
+
+static int set_nx_huge_pages_recovery_ratio(const char *val, const struct kernel_param *kp)
+{
+	unsigned int old_val;
+	int err;
+
+	old_val = nx_huge_pages_recovery_ratio;
+	err = param_set_uint(val, kp);
+	if (err)
+		return err;
+
+	if (READ_ONCE(nx_huge_pages) &&
+	    !old_val && nx_huge_pages_recovery_ratio) {
+		struct kvm *kvm;
+
+		mutex_lock(&kvm_lock);
+
+		list_for_each_entry(kvm, &vm_list, vm_list)
+			wake_up_process(kvm->arch.nx_lpage_recovery_thread);
+
+		mutex_unlock(&kvm_lock);
+	}
+
+	return err;
+}
+
+static void kvm_recover_nx_lpages(struct kvm *kvm)
+{
+	int rcu_idx;
+	struct kvm_mmu_page *sp;
+	unsigned int ratio;
+	LIST_HEAD(invalid_list);
+	ulong to_zap;
+
+	rcu_idx = srcu_read_lock(&kvm->srcu);
+	spin_lock(&kvm->mmu_lock);
+
+	ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
+	to_zap = ratio ? DIV_ROUND_UP(kvm->stat.nx_lpage_splits, ratio) : 0;
+	while (to_zap && !list_empty(&kvm->arch.lpage_disallowed_mmu_pages)) {
+		/*
+		 * We use a separate list instead of just using active_mmu_pages
+		 * because the number of lpage_disallowed pages is expected to
+		 * be relatively small compared to the total.
+		 */
+		sp = list_first_entry(&kvm->arch.lpage_disallowed_mmu_pages,
+				      struct kvm_mmu_page,
+				      lpage_disallowed_link);
+		WARN_ON_ONCE(!sp->lpage_disallowed);
+		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
+		WARN_ON_ONCE(sp->lpage_disallowed);
+
+		if (!--to_zap || need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+			kvm_mmu_commit_zap_page(kvm, &invalid_list);
+			if (to_zap)
+				cond_resched_lock(&kvm->mmu_lock);
+		}
+	}
+
+	spin_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, rcu_idx);
+}
+
+static long get_nx_lpage_recovery_timeout(u64 start_time)
+{
+	return READ_ONCE(nx_huge_pages) && READ_ONCE(nx_huge_pages_recovery_ratio)
+		? start_time + 60 * HZ - get_jiffies_64()
+		: MAX_SCHEDULE_TIMEOUT;
+}
+
+static int kvm_nx_lpage_recovery_worker(struct kvm *kvm, uintptr_t data)
+{
+	u64 start_time;
+	long remaining_time;
+
+	while (true) {
+		start_time = get_jiffies_64();
+		remaining_time = get_nx_lpage_recovery_timeout(start_time);
+
+		set_current_state(TASK_INTERRUPTIBLE);
+		while (!kthread_should_stop() && remaining_time > 0) {
+			schedule_timeout(remaining_time);
+			remaining_time = get_nx_lpage_recovery_timeout(start_time);
+			set_current_state(TASK_INTERRUPTIBLE);
+		}
+
+		set_current_state(TASK_RUNNING);
+
+		if (kthread_should_stop())
+			return 0;
+
+		kvm_recover_nx_lpages(kvm);
+	}
+}
+
+int kvm_mmu_post_init_vm(struct kvm *kvm)
+{
+	int err;
+
+	err = kvm_vm_create_worker_thread(kvm, kvm_nx_lpage_recovery_worker, 0,
+					  "kvm-nx-lpage-recovery",
+					  &kvm->arch.nx_lpage_recovery_thread);
+	if (!err)
+		kthread_unpark(kvm->arch.nx_lpage_recovery_thread);
+
+	return err;
+}
+
+void kvm_mmu_pre_destroy_vm(struct kvm *kvm)
+{
+	if (kvm->arch.nx_lpage_recovery_thread)
+		kthread_stop(kvm->arch.nx_lpage_recovery_thread);
+}
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 11f8ec89433b..d55674f44a18 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -210,4 +210,8 @@ void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
 				    struct kvm_memory_slot *slot, u64 gfn);
 int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
+
+int kvm_mmu_post_init_vm(struct kvm *kvm);
+void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
+
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b087d178a774..a30e9962a6ef 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9456,6 +9456,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
+	INIT_LIST_HEAD(&kvm->arch.lpage_disallowed_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.assigned_dev_head);
 	atomic_set(&kvm->arch.noncoherent_dma_count, 0);
 
@@ -9484,6 +9485,11 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	return kvm_x86_ops->vm_init(kvm);
 }
 
+int kvm_arch_post_init_vm(struct kvm *kvm)
+{
+	return kvm_mmu_post_init_vm(kvm);
+}
+
 static void kvm_unload_vcpu_mmu(struct kvm_vcpu *vcpu)
 {
 	vcpu_load(vcpu);
@@ -9585,6 +9591,11 @@ int x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
 }
 EXPORT_SYMBOL_GPL(x86_set_memory_region);
 
+void kvm_arch_pre_destroy_vm(struct kvm *kvm)
+{
+	kvm_mmu_pre_destroy_vm(kvm);
+}
+
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	if (current->mm == kvm->mm) {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 513b49be83e0..f6d4385aad65 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -626,6 +626,23 @@ static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
 	return 0;
 }
 
+/*
+ * Called after the VM is otherwise initialized, but just before adding it to
+ * the vm_list.
+ */
+int __weak kvm_arch_post_init_vm(struct kvm *kvm)
+{
+	return 0;
+}
+
+/*
+ * Called just after removing the VM from the vm_list, but before doing any
+ * other destruction.
+ */
+void __weak kvm_arch_pre_destroy_vm(struct kvm *kvm)
+{
+}
+
 static struct kvm *kvm_create_vm(unsigned long type)
 {
 	int r, i;
@@ -676,10 +693,14 @@ static struct kvm *kvm_create_vm(unsigned long type)
 		rcu_assign_pointer(kvm->buses[i],
 			kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL_ACCOUNT));
 		if (!kvm->buses[i])
-			goto out_err;
+			goto out_err_no_mmu_notifier;
 	}
 
 	r = kvm_init_mmu_notifier(kvm);
+	if (r)
+		goto out_err_no_mmu_notifier;
+
+	r = kvm_arch_post_init_vm(kvm);
 	if (r)
 		goto out_err;
 
@@ -692,6 +713,11 @@ static struct kvm *kvm_create_vm(unsigned long type)
 	return kvm;
 
 out_err:
+#if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
+	if (kvm->mmu_notifier.ops)
+		mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
+#endif
+out_err_no_mmu_notifier:
 	cleanup_srcu_struct(&kvm->irq_srcu);
 out_err_no_irq_srcu:
 	cleanup_srcu_struct(&kvm->srcu);
@@ -734,6 +760,8 @@ static void kvm_destroy_vm(struct kvm *kvm)
 	mutex_lock(&kvm_lock);
 	list_del(&kvm->vm_list);
 	mutex_unlock(&kvm_lock);
+	kvm_arch_pre_destroy_vm(kvm);
+
 	kvm_free_irq_routing(kvm);
 	for (i = 0; i < KVM_NR_BUSES; i++) {
 		struct kvm_io_bus *bus = kvm_get_bus(kvm, i);
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] Re: [PATCH v8 3/5] NX 3
  2019-10-31 23:33 ` [MODERATED] [PATCH v8 3/5] NX 3 Paolo Bonzini
@ 2019-11-01  0:24   ` Pawan Gupta
  2019-11-01  7:07     ` Paolo Bonzini
  2019-11-01 14:58     ` Tyler Hicks
  0 siblings, 2 replies; 18+ messages in thread
From: Pawan Gupta @ 2019-11-01  0:24 UTC (permalink / raw)
  To: speck

On Fri, Nov 01, 2019 at 12:33:45AM +0100, speck for Paolo Bonzini wrote:
> From: Paolo Bonzini <pbonzini@redhat.com>
> Subject: [PATCH v8 3/5] kvm: mmu: ITLB_MULTIHIT mitigation
>  
> +	kvm.nx_huge_pages=
> +			[KVM] Controls the sw workaround for bug
> +			X86_BUG_ITLB_MULTIHIT.
> +			force	: Always deploy workaround.
> +			off	: Default. Never deploy workaround.

off is not the default in the code, so the default should be "auto" here.

> +			auto	: Deploy workaround based on presence of
> +				  X86_BUG_ITLB_MULTIHIT.

Also, mitigations=off does not disable this mitigation. The patch below
does that when mitigations=off and kvm.nx_huge_pages=auto.
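
For reference, the intended decision can be modelled in userspace roughly
as follows. This is only a sketch: enum nx_mode, nx_workaround_enabled()
and its boolean arguments are made up for illustration. They stand in for
the kvm.nx_huge_pages setting, boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) and
cpu_mitigations_off() respectively.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kvm.nx_huge_pages= setting. */
enum nx_mode { NX_OFF, NX_FORCE, NX_AUTO };

/*
 * "force" and "off" are honoured unconditionally; only "auto" looks at
 * the CPU bug flag and at the global mitigations= switch.
 */
static bool nx_workaround_enabled(enum nx_mode mode, bool cpu_has_bug,
				  bool mitigations_off)
{
	switch (mode) {
	case NX_FORCE:
		return true;
	case NX_OFF:
		return false;
	case NX_AUTO:
	default:
		return cpu_has_bug && !mitigations_off;
	}
}
```

In this model mitigations=off only changes the outcome for "auto" on an
affected CPU; "force" keeps the workaround enabled regardless.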

---
From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date: Wed, 30 Oct 2019 21:28:24 -0700
Subject: [PATCH] kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

Problem: The global mitigation knob mitigations=off does not turn off
X86_BUG_ITLB_MULTIHIT mitigation.

Fix: Turn off the mitigation when ITLB_MULTIHIT mitigation mode is
"auto" and mitigations are turned off globally via cmdline
mitigations=off.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  6 ++++++
 arch/x86/kvm/mmu.c                              | 10 ++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c667844c1c42..422da241a4cb 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2653,6 +2653,12 @@
 					       ssbd=force-off [ARM64]
 					       l1tf=off [X86]
 					       mds=off [X86]
+					       kvm.nx_huge_pages=off [X86].
+
+				Exceptions:
+					       This does not have any effect on
+					       kvm.nx_huge_pages when
+					       kvm.nx_huge_pages=force.
 
 			auto (default)
 				Mitigate all CPU vulnerabilities, but leave SMT
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e6a5748a12d5..529589a42afb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -6250,6 +6250,12 @@ static void kvm_set_mmio_spte_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
 }
 
+static bool get_nx_auto_mode(void)
+{
+	/* Return true when CPU has the bug, and mitigations are ON */
+	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !cpu_mitigations_off();
+}
+
 static void __set_nx_huge_pages(bool val)
 {
 	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
@@ -6266,7 +6272,7 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 	else if (sysfs_streq(val, "force"))
 		new_val = 1;
 	else if (sysfs_streq(val, "auto"))
-		new_val = boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
+		new_val = get_nx_auto_mode();
 	else if (strtobool(val, &new_val) < 0)
 		return -EINVAL;
 
@@ -6296,7 +6302,7 @@ int kvm_mmu_module_init(void)
 	int ret = -ENOMEM;
 
 	if (nx_huge_pages == -1)
-		__set_nx_huge_pages(boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT));
+		__set_nx_huge_pages(get_nx_auto_mode());
 
 	/*
 	 * MMU roles use union aliasing which is, generally speaking, an
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] Re: [PATCH v8 3/5] NX 3
  2019-11-01  0:24   ` [MODERATED] " Pawan Gupta
@ 2019-11-01  7:07     ` Paolo Bonzini
  2019-11-01 18:38       ` mark gross
  2019-11-01 14:58     ` Tyler Hicks
  1 sibling, 1 reply; 18+ messages in thread
From: Paolo Bonzini @ 2019-11-01  7:07 UTC (permalink / raw)
  To: speck

[-- Attachment #1: Type: text/plain, Size: 3565 bytes --]

On 01/11/19 01:24, speck for Pawan Gupta wrote:
> On Fri, Nov 01, 2019 at 12:33:45AM +0100, speck for Paolo Bonzini wrote:
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Subject: [PATCH v8 3/5] kvm: mmu: ITLB_MULTIHIT mitigation
>>  
>> +	kvm.nx_huge_pages=
>> +			[KVM] Controls the sw workaround for bug
>> +			X86_BUG_ITLB_MULTIHIT.
>> +			force	: Always deploy workaround.
>> +			off	: Default. Never deploy workaround.
> 
> off is not the default in the code, so the default should be "auto" here.
> 
>> +			auto	: Deploy workaround based on presence of
>> +				  X86_BUG_ITLB_MULTIHIT.
> 
> Also mitigations=off is not disabling this mitigation. Below patch does
> that when mitigations=off and kvm.nx_huge_pages=auto.
> 
> ---
> From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Date: Wed, 30 Oct 2019 21:28:24 -0700
> Subject: [PATCH] kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT
> 
> Problem: The global mitigation knob mitigations=off does not turn off
> X86_BUG_ITLB_MULTIHIT mitigation.
> 
> Fix: Turn off the mitigation when ITLB_MULTIHIT mitigation mode is
> "auto" and mitigations are turned off globally via cmdline
> mitigations=off.
> 
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

Thanks, I'll post v9 soon.  Are you going to post backports as bundles
on top of Thomas's?

Paolo

> ---
>  Documentation/admin-guide/kernel-parameters.txt |  6 ++++++
>  arch/x86/kvm/mmu.c                              | 10 ++++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c667844c1c42..422da241a4cb 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2653,6 +2653,12 @@
>  					       ssbd=force-off [ARM64]
>  					       l1tf=off [X86]
>  					       mds=off [X86]
> +					       kvm.nx_huge_pages=off [X86].
> +
> +				Exceptions:
> +					       This does not have any effect on
> +					       kvm.nx_huge_pages when
> +					       kvm.nx_huge_pages=force.
>  
>  			auto (default)
>  				Mitigate all CPU vulnerabilities, but leave SMT
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e6a5748a12d5..529589a42afb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -6250,6 +6250,12 @@ static void kvm_set_mmio_spte_mask(void)
>  	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
>  }
>  
> +static bool get_nx_auto_mode(void)
> +{
> +	/* Return true when CPU has the bug, and mitigations are ON */
> +	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !cpu_mitigations_off();
> +}
> +
>  static void __set_nx_huge_pages(bool val)
>  {
>  	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
> @@ -6266,7 +6272,7 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
>  	else if (sysfs_streq(val, "force"))
>  		new_val = 1;
>  	else if (sysfs_streq(val, "auto"))
> -		new_val = boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
> +		new_val = get_nx_auto_mode();
>  	else if (strtobool(val, &new_val) < 0)
>  		return -EINVAL;
>  
> @@ -6296,7 +6302,7 @@ int kvm_mmu_module_init(void)
>  	int ret = -ENOMEM;
>  
>  	if (nx_huge_pages == -1)
> -		__set_nx_huge_pages(boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT));
> +		__set_nx_huge_pages(get_nx_auto_mode());
>  
>  	/*
>  	 * MMU roles use union aliasing which is, generally speaking, an
> 



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] Re: [PATCH v8 3/5] NX 3
  2019-11-01  0:24   ` [MODERATED] " Pawan Gupta
  2019-11-01  7:07     ` Paolo Bonzini
@ 2019-11-01 14:58     ` Tyler Hicks
  2019-11-01 15:43       ` [MODERATED] [PATCH] NX build fixup Tyler Hicks
  1 sibling, 1 reply; 18+ messages in thread
From: Tyler Hicks @ 2019-11-01 14:58 UTC (permalink / raw)
  To: speck

On 2019-10-31 17:24:21, speck for Pawan Gupta wrote:
> On Fri, Nov 01, 2019 at 12:33:45AM +0100, speck for Paolo Bonzini wrote:
> > From: Paolo Bonzini <pbonzini@redhat.com>
> > Subject: [PATCH v8 3/5] kvm: mmu: ITLB_MULTIHIT mitigation
> >  
> > +	kvm.nx_huge_pages=
> > +			[KVM] Controls the sw workaround for bug
> > +			X86_BUG_ITLB_MULTIHIT.
> > +			force	: Always deploy workaround.
> > +			off	: Default. Never deploy workaround.
> 
> off is not the default in the code, so the default should be "auto" here.
> 
> > +			auto	: Deploy workaround based on presence of
> > +				  X86_BUG_ITLB_MULTIHIT.
> 
> Also mitigations=off is not disabling this mitigation. Below patch does
> that when mitigations=off and kvm.nx_huge_pages=auto.
> 
> ---
> From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Date: Wed, 30 Oct 2019 21:28:24 -0700
> Subject: [PATCH] kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT
> 
> Problem: The global mitigation knob mitigations=off does not turn off
> X86_BUG_ITLB_MULTIHIT mitigation.
> 
> Fix: Turn off the mitigation when ITLB_MULTIHIT mitigation mode is
> "auto" and mitigations are turned off globally via cmdline
> mitigations=off.
> 
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
>  Documentation/admin-guide/kernel-parameters.txt |  6 ++++++
>  arch/x86/kvm/mmu.c                              | 10 ++++++++--
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index c667844c1c42..422da241a4cb 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2653,6 +2653,12 @@
>  					       ssbd=force-off [ARM64]
>  					       l1tf=off [X86]
>  					       mds=off [X86]

This patch is not written against speck/master (which has the TAA
patches) and there's a very minor merge conflict here.

> +					       kvm.nx_huge_pages=off [X86].
> +
> +				Exceptions:
> +					       This does not have any effect on
> +					       kvm.nx_huge_pages when
> +					       kvm.nx_huge_pages=force.
>  
>  			auto (default)
>  				Mitigate all CPU vulnerabilities, but leave SMT
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index e6a5748a12d5..529589a42afb 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -6250,6 +6250,12 @@ static void kvm_set_mmio_spte_mask(void)
>  	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
>  }
>  
> +static bool get_nx_auto_mode(void)
> +{
> +	/* Return true when CPU has the bug, and mitigations are ON */
> +	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !cpu_mitigations_off();

The call to cpu_mitigations_off() causes a build failure when kvm is
built as a module (CONFIG_KVM=m):

ERROR: "cpu_mitigations" [arch/x86/kvm/kvm.ko] undefined!
make[2]: *** [__modpost] Error 1
make[1]: *** [modules] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [sub-make] Error 2

The problem is due to cpu_mitigations_off() and
cpu_mitigations_auto_nosmt() being inlined in include/linux/cpu.h. Those
functions look to only be used in initialization/setup code so I think
you could fix this easily enough by uninlining and exporting them.

Tyler

> +}
> +
>  static void __set_nx_huge_pages(bool val)
>  {
>  	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
> @@ -6266,7 +6272,7 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
>  	else if (sysfs_streq(val, "force"))
>  		new_val = 1;
>  	else if (sysfs_streq(val, "auto"))
> -		new_val = boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
> +		new_val = get_nx_auto_mode();
>  	else if (strtobool(val, &new_val) < 0)
>  		return -EINVAL;
>  
> @@ -6296,7 +6302,7 @@ int kvm_mmu_module_init(void)
>  	int ret = -ENOMEM;
>  
>  	if (nx_huge_pages == -1)
> -		__set_nx_huge_pages(boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT));
> +		__set_nx_huge_pages(get_nx_auto_mode());
>  
>  	/*
>  	 * MMU roles use union aliasing which is, generally speaking, an
> -- 
> 2.20.1

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [MODERATED] [PATCH] NX build fixup
  2019-11-01 14:58     ` Tyler Hicks
@ 2019-11-01 15:43       ` Tyler Hicks
  2019-11-01 16:31         ` [MODERATED] " Josh Poimboeuf
  0 siblings, 1 reply; 18+ messages in thread
From: Tyler Hicks @ 2019-11-01 15:43 UTC (permalink / raw)
  To: speck

From: Tyler Hicks <tyhicks@canonical.com>
Subject: [PATCH] cpu/speculation: Uninline and export CPU mitigations helpers

A kernel module may need to check the value of the "mitigations=" kernel
command line parameter as part of its setup when the module needs
to perform software mitigations for a CPU flaw. Uninline and export the
helper functions surrounding the cpu_mitigations enum to allow for their
usage from a module.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
---

Only compile tested, with both CONFIG_KVM=y and CONFIG_KVM=m. Paolo, if
this looks good to you and Pawan, please include it in NX v9.

 include/linux/cpu.h | 13 ++-----------
 kernel/cpu.c        | 14 ++++++++++++++
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 2a093434e975..f1965255526a 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -230,16 +230,7 @@ enum cpu_mitigations {
 
 extern enum cpu_mitigations cpu_mitigations;
 
-/* mitigations=off */
-static inline bool cpu_mitigations_off(void)
-{
-	return cpu_mitigations == CPU_MITIGATIONS_OFF;
-}
-
-/* mitigations=auto,nosmt */
-static inline bool cpu_mitigations_auto_nosmt(void)
-{
-	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
-}
+extern bool cpu_mitigations_off(void);
+extern bool cpu_mitigations_auto_nosmt(void);
 
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index fc28e17940e0..bb9f3e636363 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2390,3 +2390,17 @@ static int __init mitigations_parse_cmdline(char *arg)
 	return 0;
 }
 early_param("mitigations", mitigations_parse_cmdline);
+
+/* mitigations=off */
+bool cpu_mitigations_off(void)
+{
+	return cpu_mitigations == CPU_MITIGATIONS_OFF;
+}
+EXPORT_SYMBOL_GPL(cpu_mitigations_off);
+
+/* mitigations=auto,nosmt */
+bool cpu_mitigations_auto_nosmt(void)
+{
+	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
+}
+EXPORT_SYMBOL_GPL(cpu_mitigations_auto_nosmt);
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [MODERATED] Re: [PATCH] NX build fixup
  2019-11-01 15:43       ` [MODERATED] [PATCH] NX build fixup Tyler Hicks
@ 2019-11-01 16:31         ` Josh Poimboeuf
  2019-11-01 20:39           ` [MODERATED] [PATCH v2] " Tyler Hicks
  0 siblings, 1 reply; 18+ messages in thread
From: Josh Poimboeuf @ 2019-11-01 16:31 UTC (permalink / raw)
  To: speck

On Fri, Nov 01, 2019 at 03:43:31PM +0000, speck for Tyler Hicks wrote:
> From: Tyler Hicks <tyhicks@canonical.com>
> Subject: [PATCH] cpu/speculation: Uninline and export CPU mitigations helpers
> 
> A kernel module may need to check the value of the "mitigations=" kernel
> command line parameter as part of its setup when the module needs
> to perform software mitigations for a CPU flaw. Uninline and export the
> helper functions surrounding the cpu_mitigations enum to allow for their
> usage from a module.
> 
> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
> ---
> 
> Only compile tested, with both CONFIG_KVM=y and CONFIG_KVM=m. Paolo, if
> this looks good to you and Pawan, please include it in NX v9.
> 
>  include/linux/cpu.h | 13 ++-----------
>  kernel/cpu.c        | 14 ++++++++++++++
>  2 files changed, 16 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 2a093434e975..f1965255526a 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -230,16 +230,7 @@ enum cpu_mitigations {
>  
>  extern enum cpu_mitigations cpu_mitigations;
>  
> -/* mitigations=off */
> -static inline bool cpu_mitigations_off(void)
> -{
> -	return cpu_mitigations == CPU_MITIGATIONS_OFF;
> -}
> -
> -/* mitigations=auto,nosmt */
> -static inline bool cpu_mitigations_auto_nosmt(void)
> -{
> -	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
> -}
> +extern bool cpu_mitigations_off(void);
> +extern bool cpu_mitigations_auto_nosmt(void);

You could probably also remove the "extern enum cpu_mitigations
cpu_mitigations" and just make it a static variable in kernel/cpu.c.

Same with the enum itself, it could be made private.

Or, as a completely different approach, cpu_mitigations could be
exported.  Then it's just a one-liner patch.  Either way works for me...

-- 
Josh

* [MODERATED] Re: [PATCH v8 3/5] NX 3
  2019-11-01  7:07     ` Paolo Bonzini
@ 2019-11-01 18:38       ` mark gross
  2019-11-01 18:51         ` Tyler Hicks
  0 siblings, 1 reply; 18+ messages in thread
From: mark gross @ 2019-11-01 18:38 UTC (permalink / raw)
  To: speck

On Fri, Nov 01, 2019 at 08:07:27AM +0100, speck for Paolo Bonzini wrote:
> On 01/11/19 01:24, speck for Pawan Gupta wrote:
> > On Fri, Nov 01, 2019 at 12:33:45AM +0100, speck for Paolo Bonzini wrote:
> >> From: Paolo Bonzini <pbonzini@redhat.com>
> >> Subject: [PATCH v8 3/5] kvm: mmu: ITLB_MULTIHIT mitigation
> >>  
> >> +	kvm.nx_huge_pages=
> >> +			[KVM] Controls the sw workaround for bug
> >> +			X86_BUG_ITLB_MULTIHIT.
> >> +			force	: Always deploy workaround.
> >> +			off	: Default. Never deploy workaround.
> > 
> > off is not the default in the code, so the default should be "auto" here.
> > 
> >> +			auto	: Deploy workaround based on presence of
> >> +				  X86_BUG_ITLB_MULTIHIT.
> > 
> > Also mitigations=off is not disabling this mitigation. Below patch does
> > that when mitigations=off and kvm.nx_huge_pages=auto.
> > 
> > ---
> > From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > Date: Wed, 30 Oct 2019 21:28:24 -0700
> > Subject: [PATCH] kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT
> > 
> > Problem: The global mitigation knob mitigations=off does not turn off
> > X86_BUG_ITLB_MULTIHIT mitigation.
> > 
> > Fix: Turn off the mitigation when ITLB_MULTIHIT mitigation mode is
> > "auto" and mitigations are turned off globally via cmdline
> > mitigations=off.
> > 
> > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> 
> Thanks, I'll post v9 soon.  Are you going to post backports as bundles
> on top of Thomas's?
> 
> Paolo

We (I) have attempted the backports of this and quickly hit issues with the page
zapping changes due to code flux upstream.  My troubles started with
5.2 and 4.19.  I have reached out to an internal resource who is more
knowledgeable about the KVM mm files to assist with that aspect of the backport.
The engineer was at a conference this week and we hope to make more progress
next week once he is back.

--mark

* [MODERATED] Re: [PATCH v8 3/5] NX 3
  2019-11-01 18:38       ` mark gross
@ 2019-11-01 18:51         ` Tyler Hicks
  2019-11-01 20:36           ` mark gross
  0 siblings, 1 reply; 18+ messages in thread
From: Tyler Hicks @ 2019-11-01 18:51 UTC (permalink / raw)
  To: speck

On 2019-11-01 11:38:15, speck for mark gross wrote:
> On Fri, Nov 01, 2019 at 08:07:27AM +0100, speck for Paolo Bonzini wrote:
> > On 01/11/19 01:24, speck for Pawan Gupta wrote:
> > > On Fri, Nov 01, 2019 at 12:33:45AM +0100, speck for Paolo Bonzini wrote:
> > >> From: Paolo Bonzini <pbonzini@redhat.com>
> > >> Subject: [PATCH v8 3/5] kvm: mmu: ITLB_MULTIHIT mitigation
> > >>  
> > >> +	kvm.nx_huge_pages=
> > >> +			[KVM] Controls the sw workaround for bug
> > >> +			X86_BUG_ITLB_MULTIHIT.
> > >> +			force	: Always deploy workaround.
> > >> +			off	: Default. Never deploy workaround.
> > > 
> > > off is not the default in the code, so the default should be "auto" here.
> > > 
> > >> +			auto	: Deploy workaround based on presence of
> > >> +				  X86_BUG_ITLB_MULTIHIT.
> > > 
> > > Also mitigations=off is not disabling this mitigation. Below patch does
> > > that when mitigations=off and kvm.nx_huge_pages=auto.
> > > 
> > > ---
> > > From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > Date: Wed, 30 Oct 2019 21:28:24 -0700
> > > Subject: [PATCH] kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT
> > > 
> > > Problem: The global mitigation knob mitigations=off does not turn off
> > > X86_BUG_ITLB_MULTIHIT mitigation.
> > > 
> > > Fix: Turn off the mitigation when ITLB_MULTIHIT mitigation mode is
> > > "auto" and mitigations are turned off globally via cmdline
> > > mitigations=off.
> > > 
> > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > 
> > Thanks, I'll post v9 soon.  Are you going to post backports as bundles
> > on top of Thomas's?
> > 
> > Paolo
> We (I) have attempted the backports of this and quickly hit issues with the page
> zapping changes due to code flux upstream.  My troubles started with
> 5.2 and 4.19.

Is this due to the lack of kvm_mmu_zap_all_fast()? I believe that Paolo
and others said in an older thread that
kvm_mmu_invalidate_zap_all_pages() could be used instead.

That's what I've been planning to do in Ubuntu but I haven't had a
chance to test it yet.

Tyler

> I have reached out to an internal resource who is more knowledgeable
> about the KVM mm files to assist with that aspect of the backport.  The
> engineer was at a conference this week and we hope to make more
> progress next week once he is back.
> 
> --mark
> 

* [MODERATED] Re: [PATCH v8 3/5] NX 3
  2019-11-01 18:51         ` Tyler Hicks
@ 2019-11-01 20:36           ` mark gross
  2019-11-02  7:36             ` Paolo Bonzini
  0 siblings, 1 reply; 18+ messages in thread
From: mark gross @ 2019-11-01 20:36 UTC (permalink / raw)
  To: speck

On Fri, Nov 01, 2019 at 01:51:27PM -0500, speck for Tyler Hicks wrote:
> On 2019-11-01 11:38:15, speck for mark gross wrote:
> > On Fri, Nov 01, 2019 at 08:07:27AM +0100, speck for Paolo Bonzini wrote:
> > > On 01/11/19 01:24, speck for Pawan Gupta wrote:
> > > > On Fri, Nov 01, 2019 at 12:33:45AM +0100, speck for Paolo Bonzini wrote:
> > > >> From: Paolo Bonzini <pbonzini@redhat.com>
> > > >> Subject: [PATCH v8 3/5] kvm: mmu: ITLB_MULTIHIT mitigation
> > > >>  
> > > >> +	kvm.nx_huge_pages=
> > > >> +			[KVM] Controls the sw workaround for bug
> > > >> +			X86_BUG_ITLB_MULTIHIT.
> > > >> +			force	: Always deploy workaround.
> > > >> +			off	: Default. Never deploy workaround.
> > > > 
> > > > off is not the default in the code, so the default should be "auto" here.
> > > > 
> > > >> +			auto	: Deploy workaround based on presence of
> > > >> +				  X86_BUG_ITLB_MULTIHIT.
> > > > 
> > > > Also mitigations=off is not disabling this mitigation. Below patch does
> > > > that when mitigations=off and kvm.nx_huge_pages=auto.
> > > > 
> > > > ---
> > > > From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > > Date: Wed, 30 Oct 2019 21:28:24 -0700
> > > > Subject: [PATCH] kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT
> > > > 
> > > > Problem: The global mitigation knob mitigations=off does not turn off
> > > > X86_BUG_ITLB_MULTIHIT mitigation.
> > > > 
> > > > Fix: Turn off the mitigation when ITLB_MULTIHIT mitigation mode is
> > > > "auto" and mitigations are turned off globally via cmdline
> > > > mitigations=off.
> > > > 
> > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > > 
> > > Thanks, I'll post v9 soon.  Are you going to post backports as bundles
> > > on top of Thomas's?
> > > 
> > > Paolo
> > We (I) have attempted the backports of this and quickly hit issues with the page
> > zapping changes due to code flux upstream.  My troubles started with
> > 5.2 and 4.19.
> 
> Is this due to the lack of kvm_mmu_zap_all_fast()? I believe that Paolo
> and others said in an older thread that
> kvm_mmu_invalidate_zap_all_pages() could be used instead.
> 
> That's what I've been planning to do in Ubuntu but I haven't had a
> chance to test it yet.
> 
> Tyler

I missed that comment.  I'll retry my backporting using your recommendation
above.

--mark

> 
> > I have reached out to an internal resource who is more knowledgeable
> > about the KVM mm files to assist with that aspect of the backport.  The
> > engineer was at a conference this week and we hope to make more
> > progress next week once he is back.
> > 
> > --mark
> > 

* [MODERATED] [PATCH v2] NX build fixup
  2019-11-01 16:31         ` [MODERATED] " Josh Poimboeuf
@ 2019-11-01 20:39           ` Tyler Hicks
  2019-11-01 21:14             ` [MODERATED] " Josh Poimboeuf
  2019-11-01 21:38             ` [MODERATED] Re: [PATCH v2] NX mitigations=off fix Pawan Gupta
  0 siblings, 2 replies; 18+ messages in thread
From: Tyler Hicks @ 2019-11-01 20:39 UTC (permalink / raw)
  To: speck

From: Tyler Hicks <tyhicks@canonical.com>
Subject: [PATCH] cpu/speculation: Uninline and export CPU mitigations helpers

A kernel module may need to check the value of the "mitigations=" kernel
command line parameter as part of its setup when the module needs
to perform software mitigations for a CPU flaw. Uninline and export the
helper functions surrounding the cpu_mitigations enum to allow for their
usage from a module. Lastly, privatize the enum and cpu_mitigations
variable since the value of cpu_mitigations can be checked with the
exported helper functions.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
---
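
For illustration only, the layout described above can be modelled in user
space. This is a minimal sketch, not kernel code: model_parse_mitigations()
is a hypothetical stand-in for the early_param handler, the accepted strings
("off", "auto", "auto,nosmt") are assumed from the comments in this patch,
and leaving the state unchanged on an unknown value is a choice of the model.
The point it shows is that callers only ever see the two accessors; the enum
and the variable stay file-local.

```c
#include <stdbool.h>
#include <string.h>

/* Private to this translation unit, as the enum is in kernel/cpu.c here. */
enum cpu_mitigations {
	CPU_MITIGATIONS_OFF,
	CPU_MITIGATIONS_AUTO,
	CPU_MITIGATIONS_AUTO_NOSMT,
};

/* File-local state; the kernel version is additionally __ro_after_init. */
static enum cpu_mitigations cpu_mitigations = CPU_MITIGATIONS_AUTO;

/*
 * Hypothetical stand-in for the "mitigations=" parser.
 * Returns 0 for a recognized value, -1 otherwise (state is left unchanged).
 */
static int model_parse_mitigations(const char *arg)
{
	if (!strcmp(arg, "off"))
		cpu_mitigations = CPU_MITIGATIONS_OFF;
	else if (!strcmp(arg, "auto"))
		cpu_mitigations = CPU_MITIGATIONS_AUTO;
	else if (!strcmp(arg, "auto,nosmt"))
		cpu_mitigations = CPU_MITIGATIONS_AUTO_NOSMT;
	else
		return -1;
	return 0;
}

/* mitigations=off */
bool cpu_mitigations_off(void)
{
	return cpu_mitigations == CPU_MITIGATIONS_OFF;
}

/* mitigations=auto,nosmt */
bool cpu_mitigations_auto_nosmt(void)
{
	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
}
```

A module-side caller in this model needs only the two bool prototypes, which
is why the enum can leave the header.
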
 include/linux/cpu.h | 25 ++-----------------------
 kernel/cpu.c        | 27 ++++++++++++++++++++++++++-
 2 files changed, 28 insertions(+), 24 deletions(-)

diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 2a093434e975..bc6c879bd110 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -218,28 +218,7 @@ static inline int cpuhp_smt_enable(void) { return 0; }
 static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
 #endif
 
-/*
- * These are used for a global "mitigations=" cmdline option for toggling
- * optional CPU mitigations.
- */
-enum cpu_mitigations {
-	CPU_MITIGATIONS_OFF,
-	CPU_MITIGATIONS_AUTO,
-	CPU_MITIGATIONS_AUTO_NOSMT,
-};
-
-extern enum cpu_mitigations cpu_mitigations;
-
-/* mitigations=off */
-static inline bool cpu_mitigations_off(void)
-{
-	return cpu_mitigations == CPU_MITIGATIONS_OFF;
-}
-
-/* mitigations=auto,nosmt */
-static inline bool cpu_mitigations_auto_nosmt(void)
-{
-	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
-}
+extern bool cpu_mitigations_off(void);
+extern bool cpu_mitigations_auto_nosmt(void);
 
 #endif /* _LINUX_CPU_H_ */
diff --git a/kernel/cpu.c b/kernel/cpu.c
index fc28e17940e0..e2cad3ee2ead 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2373,7 +2373,18 @@ void __init boot_cpu_hotplug_init(void)
 	this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);
 }
 
-enum cpu_mitigations cpu_mitigations __ro_after_init = CPU_MITIGATIONS_AUTO;
+/*
+ * These are used for a global "mitigations=" cmdline option for toggling
+ * optional CPU mitigations.
+ */
+enum cpu_mitigations {
+	CPU_MITIGATIONS_OFF,
+	CPU_MITIGATIONS_AUTO,
+	CPU_MITIGATIONS_AUTO_NOSMT,
+};
+
+static enum cpu_mitigations cpu_mitigations __ro_after_init =
+	CPU_MITIGATIONS_AUTO;
 
 static int __init mitigations_parse_cmdline(char *arg)
 {
@@ -2390,3 +2401,17 @@ static int __init mitigations_parse_cmdline(char *arg)
 	return 0;
 }
 early_param("mitigations", mitigations_parse_cmdline);
+
+/* mitigations=off */
+bool cpu_mitigations_off(void)
+{
+	return cpu_mitigations == CPU_MITIGATIONS_OFF;
+}
+EXPORT_SYMBOL_GPL(cpu_mitigations_off);
+
+/* mitigations=auto,nosmt */
+bool cpu_mitigations_auto_nosmt(void)
+{
+	return cpu_mitigations == CPU_MITIGATIONS_AUTO_NOSMT;
+}
+EXPORT_SYMBOL_GPL(cpu_mitigations_auto_nosmt);
-- 
2.17.1

* [MODERATED] Re: [PATCH v2] NX build fixup
  2019-11-01 20:39           ` [MODERATED] [PATCH v2] " Tyler Hicks
@ 2019-11-01 21:14             ` Josh Poimboeuf
  2019-11-01 21:38             ` [MODERATED] Re: [PATCH v2] NX mitigations=off fix Pawan Gupta
  1 sibling, 0 replies; 18+ messages in thread
From: Josh Poimboeuf @ 2019-11-01 21:14 UTC (permalink / raw)
  To: speck

On Fri, Nov 01, 2019 at 08:39:09PM +0000, speck for Tyler Hicks wrote:
> From: Tyler Hicks <tyhicks@canonical.com>
> Subject: [PATCH] cpu/speculation: Uninline and export CPU mitigations helpers
> 
> A kernel module may need to check the value of the "mitigations=" kernel
> command line parameter as part of its setup when the module needs
> to perform software mitigations for a CPU flaw. Uninline and export the
> helper functions surrounding the cpu_mitigations enum to allow for their
> usage from a module. Lastly, privatize the enum and cpu_mitigations
> variable since the value of cpu_mitigations can be checked with the
> exported helper functions.
> 
> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>

Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>

-- 
Josh

* [MODERATED] Re: [PATCH v2] NX mitigations=off fix
  2019-11-01 20:39           ` [MODERATED] [PATCH v2] " Tyler Hicks
  2019-11-01 21:14             ` [MODERATED] " Josh Poimboeuf
@ 2019-11-01 21:38             ` Pawan Gupta
  1 sibling, 0 replies; 18+ messages in thread
From: Pawan Gupta @ 2019-11-01 21:38 UTC (permalink / raw)
  To: speck


From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Date: Wed, 30 Oct 2019 21:28:24 -0700
Subject: [PATCH] kvm: x86: mmu: Apply global mitigations knob to ITLB_MULTIHIT

Problem: The global mitigation knob mitigations=off does not turn off
X86_BUG_ITLB_MULTIHIT mitigation.

Fix: Turn off the mitigation when ITLB_MULTIHIT mitigation mode is
"auto" and mitigations are turned off globally via cmdline
mitigations=off.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
---
Rebased on taa-master bundle. I do not have access to Thomas's repo.
This needs to go on top of Tyler's fix.
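
To make the intended behavior concrete, here is a minimal user-space sketch
(not kernel code) of how the three kvm.nx_huge_pages settings resolve once
mitigations=off is taken into account. struct nx_env and the model_* names
are hypothetical; the two booleans stand in for
boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) and cpu_mitigations_off(). The
strtobool() fallback of the real parameter handler is left out of the model.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical inputs replacing the kernel-side queries. */
struct nx_env {
	bool cpu_has_itlb_multihit;	/* boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) */
	bool mitigations_off;		/* cpu_mitigations_off() */
};

/* Mirrors get_nx_auto_mode(): CPU has the bug and mitigations are on. */
static bool model_nx_auto_mode(const struct nx_env *env)
{
	return env->cpu_has_itlb_multihit && !env->mitigations_off;
}

/*
 * Resolve a kvm.nx_huge_pages string to 1 (workaround on), 0 (off),
 * or -1 for a value this model does not recognize.
 */
static int model_nx_huge_pages(const char *val, const struct nx_env *env)
{
	if (!strcmp(val, "off"))
		return 0;
	if (!strcmp(val, "force"))
		return 1;	/* not affected by mitigations=off */
	if (!strcmp(val, "auto"))
		return model_nx_auto_mode(env);
	return -1;
}
```

In this model only "auto" consults the global knob, which matches the
exception documented for kvm.nx_huge_pages=force.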

 Documentation/admin-guide/kernel-parameters.txt |  6 ++++++
 arch/x86/kvm/mmu.c                              | 10 ++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index e8e0a140a632..555236b92289 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2654,6 +2654,12 @@
 					       l1tf=off [X86]
 					       mds=off [X86]
 					       tsx_async_abort=off [X86]
+					       kvm.nx_huge_pages=off [X86]
+
+				Exceptions:
+					       This does not have any effect on
+					       kvm.nx_huge_pages when
+					       kvm.nx_huge_pages=force.
 
 			auto (default)
 				Mitigate all CPU vulnerabilities, but leave SMT
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e6a5748a12d5..529589a42afb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -6250,6 +6250,12 @@ static void kvm_set_mmio_spte_mask(void)
 	kvm_mmu_set_mmio_spte_mask(mask, mask, ACC_WRITE_MASK | ACC_USER_MASK);
 }
 
+static bool get_nx_auto_mode(void)
+{
+	/* Return true when CPU has the bug, and mitigations are ON */
+	return boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT) && !cpu_mitigations_off();
+}
+
 static void __set_nx_huge_pages(bool val)
 {
 	nx_huge_pages = itlb_multihit_kvm_mitigation = val;
@@ -6266,7 +6272,7 @@ static int set_nx_huge_pages(const char *val, const struct kernel_param *kp)
 	else if (sysfs_streq(val, "force"))
 		new_val = 1;
 	else if (sysfs_streq(val, "auto"))
-		new_val = boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT);
+		new_val = get_nx_auto_mode();
 	else if (strtobool(val, &new_val) < 0)
 		return -EINVAL;
 
@@ -6296,7 +6302,7 @@ int kvm_mmu_module_init(void)
 	int ret = -ENOMEM;
 
 	if (nx_huge_pages == -1)
-		__set_nx_huge_pages(boot_cpu_has_bug(X86_BUG_ITLB_MULTIHIT));
+		__set_nx_huge_pages(get_nx_auto_mode());
 
 	/*
 	 * MMU roles use union aliasing which is, generally speaking, an
-- 
2.20.1

* [MODERATED] Re: [PATCH v8 3/5] NX 3
  2019-11-01 20:36           ` mark gross
@ 2019-11-02  7:36             ` Paolo Bonzini
  0 siblings, 0 replies; 18+ messages in thread
From: Paolo Bonzini @ 2019-11-02  7:36 UTC (permalink / raw)
  To: speck

On 01/11/19 21:36, speck for mark gross wrote:
>>> We (I) have attempted the backports of this and quickly hit issues with the page
>>> zapping changes due to code flux upstream.  My troubles started with
>>> 5.2 and 4.19.
>>
>> Is this due to the lack of kvm_mmu_zap_all_fast()? I believe that Paolo
>> and others said in an older thread that
>> kvm_mmu_invalidate_zap_all_pages() could be used instead.
>>
>> That's what I've been planning to do in Ubuntu but I haven't had a
>> chance to test it yet.
>
> I missed that comment.  I'll retry my backporting using your recommendation
> above.

Correct, backports were discussed a couple weeks ago and nothing has
changed since then.

Paolo


Thread overview: 18+ messages
2019-10-31 23:33 [MODERATED] [PATCH v8 0/5] NX 0 Paolo Bonzini
2019-10-31 23:33 ` [MODERATED] [PATCH v8 1/5] NX 1 Paolo Bonzini
2019-10-31 23:33 ` [MODERATED] [PATCH v8 2/5] NX 2 Paolo Bonzini
2019-10-31 23:33 ` [MODERATED] [PATCH v8 3/5] NX 3 Paolo Bonzini
2019-11-01  0:24   ` [MODERATED] " Pawan Gupta
2019-11-01  7:07     ` Paolo Bonzini
2019-11-01 18:38       ` mark gross
2019-11-01 18:51         ` Tyler Hicks
2019-11-01 20:36           ` mark gross
2019-11-02  7:36             ` Paolo Bonzini
2019-11-01 14:58     ` Tyler Hicks
2019-11-01 15:43       ` [MODERATED] [PATCH] NX build fixup Tyler Hicks
2019-11-01 16:31         ` [MODERATED] " Josh Poimboeuf
2019-11-01 20:39           ` [MODERATED] [PATCH v2] " Tyler Hicks
2019-11-01 21:14             ` [MODERATED] " Josh Poimboeuf
2019-11-01 21:38             ` [MODERATED] Re: [PATCH v2] NX mitigations=off fix Pawan Gupta
2019-10-31 23:33 ` [MODERATED] [PATCH v8 4/5] NX 4 Paolo Bonzini
2019-10-31 23:33 ` [MODERATED] [PATCH v8 5/5] NX 5 Paolo Bonzini
