linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] TSX force abort
@ 2021-06-09 20:57 Pawan Gupta
  2021-06-09 20:58 ` [PATCH 1/4] x86/msr: Define new bits in TSX_FORCE_ABORT MSR Pawan Gupta
                   ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Pawan Gupta @ 2021-06-09 20:57 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Pawan Gupta, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users

Introduction
============
On some Intel processors [1] a microcode update will always abort
Transactional Synchronization Extensions (TSX) transactions by default. These
CPUs were previously affected by the TSX memory ordering issue [2]. A
workaround was earlier added to perf related to memory ordering which is no
longer required(because TSX is defeatured on these systems). This series adds
support for new bits added to TSX_FORCE_ABORT MSR and CPUID to enumerate new
abort behavior and to bypass the workaround.

Roadmap to this series
======================

0001:	Define new CPUID and MSR bits that are added by the microcode update.
	(The new CPUID.RTM_ALWAYS_ABORT is not shown in /proc/cpuinfo)

0002:	When new microcode is enumerated bypass perf counter workaround for [1].
	Perf workaround is no longer required after the microcode update.

0003:	Clear CPUID.RTM and CPUID.HLE when TSX is defeatured, so that software
	does not enumerate and try to use TSX.

0004:	Add tsx=fake cmdline option to not hide CPUID.RTM and CPUID.HLE. This
	may be desirable when resuming saved guest image that require RTM and HLE
	feature bits to be present.

Thanks,
Pawan

[1] Intel® TSX Memory and Performance Monitoring Update for Intel® Processors
    https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

[2] Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory
    http://cdrdv2.intel.com/v1/dl/getContent/604224

Pawan Gupta (4):
  x86/msr: Define new bits in TSX_FORCE_ABORT MSR
  perf/x86/intel: Do not deploy workaround when TSX is deprecated
  x86/tsx: Clear CPUID bits when TSX always force aborts
  x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE

 Documentation/admin-guide/kernel-parameters.txt |  3 +-
 arch/x86/events/intel/core.c                    | 22 +++++++--
 arch/x86/include/asm/cpufeatures.h              |  1 +-
 arch/x86/include/asm/msr-index.h                |  4 ++-
 arch/x86/kernel/cpu/bugs.c                      |  5 +-
 arch/x86/kernel/cpu/cpu.h                       |  3 +-
 arch/x86/kernel/cpu/intel.c                     |  4 +-
 arch/x86/kernel/cpu/tsx.c                       | 44 ++++++++++++++++--
 8 files changed, 75 insertions(+), 11 deletions(-)

base-commit: 614124bea77e452aa6df7a8714e8bc820b489922
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/4] x86/msr: Define new bits in TSX_FORCE_ABORT MSR
  2021-06-09 20:57 [PATCH 0/4] TSX force abort Pawan Gupta
@ 2021-06-09 20:58 ` Pawan Gupta
  2021-06-11  8:39   ` Borislav Petkov
  2021-06-09 21:12 ` [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated Pawan Gupta
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Pawan Gupta @ 2021-06-09 20:58 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Pawan Gupta, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users

Intel client processors that support IA32_TSX_FORCE_ABORT MSR related to
perf counter interaction [1] received a microcode update that deprecates
Transactional Synchronization Extension (TSX) feature. MSR
IA32_TSX_FORCE_ABORT bit FORCE_ABORT_RTM now defaults to 1, writes to
this bit are ignored. A new bit TSX_CPUID_CLEAR clears the TSX related
CPUID bits.

Below is the summary of changes to IA32_TSX_FORCE_ABORT MSR:

  Bit 0: FORCE_ABORT_RTM (legacy bit, new default=1) Status bit that
  indicates if RTM transactions are always aborted. This bit is
  essentially !SDV_ENABLE_RTM(Bit 2). Writes to this bit are ignored.

  Bit 1: TSX_CPUID_CLEAR (new bit, default=0) When set, CPUID.HLE = 0
  and CPUID.RTM = 0.

  Bit 2: SDV_ENABLE_RTM (new bit, default=0) When clear, XBEGIN will
  always abort with EAX code 0. When set, XBEGIN will not be forced to
  abort (but will always abort in SGX enclaves). This bit is intended to
  be SDV-only. If this bit is set transactional atomicity correctness is
  not certain.

Performance monitoring counter 3 is usable in all cases, regardless of
the value of above bits.

A new CPUID bit CPUID.RTM_ALWAYS_ABORT (CPUID 7.EDX[11]) is added to
indicate the status of always abort behavior.

Define these new CPUID and MSR bits.

[1] Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory
    http://cdrdv2.intel.com/v1/dl/getContent/604224

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
---
 arch/x86/include/asm/cpufeatures.h | 1 +
 arch/x86/include/asm/msr-index.h   | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index ac37830ae941..21c1855b5c14 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -378,6 +378,7 @@
 #define X86_FEATURE_AVX512_VP2INTERSECT (18*32+ 8) /* AVX-512 Intersect for D/Q */
 #define X86_FEATURE_SRBDS_CTRL		(18*32+ 9) /* "" SRBDS mitigation MSR available */
 #define X86_FEATURE_MD_CLEAR		(18*32+10) /* VERW clears CPU buffers */
+#define X86_FEATURE_RTM_ALWAYS_ABORT	(18*32+11) /* "" RTM transaction always aborts */
 #define X86_FEATURE_TSX_FORCE_ABORT	(18*32+13) /* "" TSX_FORCE_ABORT */
 #define X86_FEATURE_SERIALIZE		(18*32+14) /* SERIALIZE instruction */
 #define X86_FEATURE_HYBRID_CPU		(18*32+15) /* "" This part has CPUs of more than one type */
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 211ba3375ee9..a7c413432b33 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -772,6 +772,10 @@
 
 #define MSR_TFA_RTM_FORCE_ABORT_BIT	0
 #define MSR_TFA_RTM_FORCE_ABORT		BIT_ULL(MSR_TFA_RTM_FORCE_ABORT_BIT)
+#define MSR_TFA_TSX_CPUID_CLEAR_BIT	1
+#define MSR_TFA_TSX_CPUID_CLEAR		BIT_ULL(MSR_TFA_TSX_CPUID_CLEAR_BIT)
+#define MSR_TFA_SDV_ENABLE_RTM_BIT	2
+#define MSR_TFA_SDV_ENABLE_RTM		BIT_ULL(MSR_TFA_SDV_ENABLE_RTM_BIT)
 
 /* P4/Xeon+ specific */
 #define MSR_IA32_MCG_EAX		0x00000180
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated
  2021-06-09 20:57 [PATCH 0/4] TSX force abort Pawan Gupta
  2021-06-09 20:58 ` [PATCH 1/4] x86/msr: Define new bits in TSX_FORCE_ABORT MSR Pawan Gupta
@ 2021-06-09 21:12 ` Pawan Gupta
  2021-06-11  7:50   ` Borislav Petkov
  2021-06-09 21:13 ` [PATCH 3/4] x86/tsx: Clear CPUID bits when TSX always force aborts Pawan Gupta
  2021-06-09 21:14 ` [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE Pawan Gupta
  3 siblings, 1 reply; 28+ messages in thread
From: Pawan Gupta @ 2021-06-09 21:12 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Pawan Gupta, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users

Earlier workaround added by commit 400816f60c54 ("perf/x86/intel:
Implement support for TSX Force Abort") for perf counter interactions
[1] are not required on some client systems which received a microcode
update that deprecates TSX.

Bypass the perf workaround when such microcode is enumerated.

[1] Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory
    http://cdrdv2.intel.com/v1/dl/getContent/604224

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
---
 arch/x86/events/intel/core.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index e28892270c58..b5953e1e59a2 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -6016,10 +6016,24 @@ __init int intel_pmu_init(void)
 		intel_pmu_pebs_data_source_skl(pmem);
 
 		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
-			x86_pmu.flags |= PMU_FL_TFA;
-			x86_pmu.get_event_constraints = tfa_get_event_constraints;
-			x86_pmu.enable_all = intel_tfa_pmu_enable_all;
-			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
+			u64 msr;
+
+			rdmsrl(MSR_TSX_FORCE_ABORT, msr);
+			/* Systems that enumerate CPUID.RTM_ALWAYS_ABORT or
+			 * support MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM] bit have
+			 * TSX deprecated by default. TSX force abort hooks are
+			 * not required on these systems.
+			 *
+			 * Only deploy the workaround when older microcode is
+			 * detected.
+			 */
+			if (!boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
+			    !(msr & MSR_TFA_SDV_ENABLE_RTM)) {
+				x86_pmu.flags |= PMU_FL_TFA;
+				x86_pmu.get_event_constraints = tfa_get_event_constraints;
+				x86_pmu.enable_all = intel_tfa_pmu_enable_all;
+				x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
+			}
 		}
 
 		pr_cont("Skylake events, ");
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 3/4] x86/tsx: Clear CPUID bits when TSX always force aborts
  2021-06-09 20:57 [PATCH 0/4] TSX force abort Pawan Gupta
  2021-06-09 20:58 ` [PATCH 1/4] x86/msr: Define new bits in TSX_FORCE_ABORT MSR Pawan Gupta
  2021-06-09 21:12 ` [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated Pawan Gupta
@ 2021-06-09 21:13 ` Pawan Gupta
  2021-06-11 10:03   ` Borislav Petkov
  2021-06-09 21:14 ` [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE Pawan Gupta
  3 siblings, 1 reply; 28+ messages in thread
From: Pawan Gupta @ 2021-06-09 21:13 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Pawan Gupta, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users

As a result of TSX deprecation some processors always aborts TSX
transactions by default.

When TSX feature cannot be used it is better to hide it. Clear CPUID.RTM
and CPUID.HLE bits when TSX transactions always aborts.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
---
 arch/x86/kernel/cpu/cpu.h   |  2 ++-
 arch/x86/kernel/cpu/intel.c |  4 +++-
 arch/x86/kernel/cpu/tsx.c   | 41 ++++++++++++++++++++++++++++++++++----
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 67944128876d..95521302630d 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -48,6 +48,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[],
 enum tsx_ctrl_states {
 	TSX_CTRL_ENABLE,
 	TSX_CTRL_DISABLE,
+	TSX_CTRL_RTM_ALWAYS_ABORT,
 	TSX_CTRL_NOT_SUPPORTED,
 };
 
@@ -56,6 +57,7 @@ extern __ro_after_init enum tsx_ctrl_states tsx_ctrl_state;
 extern void __init tsx_init(void);
 extern void tsx_enable(void);
 extern void tsx_disable(void);
+extern void tsx_clear_cpuid(void);
 #else
 static inline void tsx_init(void) { }
 #endif /* CONFIG_CPU_SUP_INTEL */
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 8adffc17fa8b..861e919eba9a 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -717,8 +717,10 @@ static void init_intel(struct cpuinfo_x86 *c)
 
 	if (tsx_ctrl_state == TSX_CTRL_ENABLE)
 		tsx_enable();
-	if (tsx_ctrl_state == TSX_CTRL_DISABLE)
+	else if (tsx_ctrl_state == TSX_CTRL_DISABLE)
 		tsx_disable();
+	else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT)
+		tsx_clear_cpuid();
 
 	split_lock_init();
 	bus_lock_init();
diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c
index e2ad30e474f8..5ed99811504c 100644
--- a/arch/x86/kernel/cpu/tsx.c
+++ b/arch/x86/kernel/cpu/tsx.c
@@ -2,7 +2,7 @@
 /*
  * Intel Transactional Synchronization Extensions (TSX) control.
  *
- * Copyright (C) 2019 Intel Corporation
+ * Copyright (C) 2019-2021 Intel Corporation
  *
  * Author:
  *	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
@@ -84,14 +84,27 @@ static enum tsx_ctrl_states x86_get_tsx_auto_mode(void)
 	return TSX_CTRL_ENABLE;
 }
 
+void tsx_clear_cpuid(void)
+{
+	u64 msr;
+
+	/*
+	 * MSR_TFA_TSX_CPUID_CLEAR bit is only present when both CPUID bits
+	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are enumerated.
+	 */
+	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
+	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+		rdmsrl(MSR_TSX_FORCE_ABORT, msr);
+		msr |= MSR_TFA_TSX_CPUID_CLEAR;
+		wrmsrl(MSR_TSX_FORCE_ABORT, msr);
+	}
+}
+
 void __init tsx_init(void)
 {
 	char arg[5] = {};
 	int ret;
 
-	if (!tsx_ctrl_is_supported())
-		return;
-
 	ret = cmdline_find_option(boot_command_line, "tsx", arg, sizeof(arg));
 	if (ret >= 0) {
 		if (!strcmp(arg, "on")) {
@@ -114,6 +127,26 @@ void __init tsx_init(void)
 			tsx_ctrl_state = TSX_CTRL_ENABLE;
 	}
 
+	/*
+	 * Hardware will always abort a TSX transaction if both CPUID bits
+	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are enumerated.  In this case it
+	 * is better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
+	 * here.
+	 */
+	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
+	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
+		tsx_clear_cpuid();
+		setup_clear_cpu_cap(X86_FEATURE_RTM);
+		setup_clear_cpu_cap(X86_FEATURE_HLE);
+		return;
+	}
+
+	if (!tsx_ctrl_is_supported()) {
+		tsx_ctrl_state = TSX_CTRL_NOT_SUPPORTED;
+		return;
+	}
+
 	if (tsx_ctrl_state == TSX_CTRL_DISABLE) {
 		tsx_disable();
 
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-06-09 20:57 [PATCH 0/4] TSX force abort Pawan Gupta
                   ` (2 preceding siblings ...)
  2021-06-09 21:13 ` [PATCH 3/4] x86/tsx: Clear CPUID bits when TSX always force aborts Pawan Gupta
@ 2021-06-09 21:14 ` Pawan Gupta
  2021-06-11 10:06   ` Borislav Petkov
  2021-07-06 19:52   ` Eduardo Habkost
  3 siblings, 2 replies; 28+ messages in thread
From: Pawan Gupta @ 2021-06-09 21:14 UTC (permalink / raw)
  To: Thomas Gleixner, Borislav Petkov
  Cc: Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Pawan Gupta, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users

On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
CPUID.HLE may not be desirable in some corner cases. Like a saved guest
would refuse to resume if it was saved before the microcode update
that deprecated TSX.

Add a cmdline option "tsx=fake" to not clear CPUID bits even when the
hardware always aborts TSX transactions.

Suggested-by: Tony Luck <tony.luck@intel.com>
Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
---
 Documentation/admin-guide/kernel-parameters.txt | 3 +++
 arch/x86/kernel/cpu/bugs.c                      | 5 +++--
 arch/x86/kernel/cpu/cpu.h                       | 1 +
 arch/x86/kernel/cpu/tsx.c                       | 7 +++++--
 4 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..ced9e5596163 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -5693,6 +5693,9 @@
 			auto	- Disable TSX if X86_BUG_TAA is present,
 				  otherwise enable TSX on the system.
 
+			fake	- Do not clear the CPUID bits RTM and HLE even
+				  when hardware always aborts TSX transactions.
+
 			Not specifying this option is equivalent to tsx=off.
 
 			See Documentation/admin-guide/hw-vuln/tsx_async_abort.rst
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d41b70fe4918..46fcc392a339 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -316,8 +316,9 @@ static void __init taa_select_mitigation(void)
 		return;
 	}
 
-	/* TSX previously disabled by tsx=off */
-	if (!boot_cpu_has(X86_FEATURE_RTM)) {
+	/* TSX previously disabled by tsx=off or by microcode */
+	if (!boot_cpu_has(X86_FEATURE_RTM) ||
+	     boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
 		taa_mitigation = TAA_MITIGATION_TSX_DISABLED;
 		goto out;
 	}
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 95521302630d..84a479866c4b 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -49,6 +49,7 @@ enum tsx_ctrl_states {
 	TSX_CTRL_ENABLE,
 	TSX_CTRL_DISABLE,
 	TSX_CTRL_RTM_ALWAYS_ABORT,
+	TSX_CTRL_FAKE,
 	TSX_CTRL_NOT_SUPPORTED,
 };
 
diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c
index 5ed99811504c..2f8e50584297 100644
--- a/arch/x86/kernel/cpu/tsx.c
+++ b/arch/x86/kernel/cpu/tsx.c
@@ -113,6 +113,8 @@ void __init tsx_init(void)
 			tsx_ctrl_state = TSX_CTRL_DISABLE;
 		} else if (!strcmp(arg, "auto")) {
 			tsx_ctrl_state = x86_get_tsx_auto_mode();
+		} else if (!strcmp(arg, "fake")) {
+			tsx_ctrl_state = TSX_CTRL_FAKE;
 		} else {
 			tsx_ctrl_state = TSX_CTRL_DISABLE;
 			pr_err("invalid option, defaulting to off\n");
@@ -131,9 +133,10 @@ void __init tsx_init(void)
 	 * Hardware will always abort a TSX transaction if both CPUID bits
 	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are enumerated.  In this case it
 	 * is better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
-	 * here.
+	 * here, except when user requested not to clear via cmdline tsx=fake.
 	 */
-	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
+	if (tsx_ctrl_state != TSX_CTRL_FAKE &&
+	    boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
 	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
 		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
 		tsx_clear_cpuid();
-- 
git-series 0.9.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated
  2021-06-09 21:12 ` [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated Pawan Gupta
@ 2021-06-11  7:50   ` Borislav Petkov
  2021-06-11 21:34     ` Pawan Gupta
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2021-06-11  7:50 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Nick Desaulniers, Joe Perches, linux-doc,
	linux-kernel, linux-perf-users

On Wed, Jun 09, 2021 at 02:12:38PM -0700, Pawan Gupta wrote:
> Earlier workaround added by commit 400816f60c54 ("perf/x86/intel:
> Implement support for TSX Force Abort") for perf counter interactions
> [1] are not required on some client systems which received a microcode
> update that deprecates TSX.
> 
> Bypass the perf workaround when such microcode is enumerated.
> 
> [1] Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory
>     http://cdrdv2.intel.com/v1/dl/getContent/604224
> 
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
> ---
>  arch/x86/events/intel/core.c | 22 ++++++++++++++++++----
>  1 file changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index e28892270c58..b5953e1e59a2 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -6016,10 +6016,24 @@ __init int intel_pmu_init(void)
>  		intel_pmu_pebs_data_source_skl(pmem);
>  
>  		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
> -			x86_pmu.flags |= PMU_FL_TFA;
> -			x86_pmu.get_event_constraints = tfa_get_event_constraints;
> -			x86_pmu.enable_all = intel_tfa_pmu_enable_all;
> -			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
> +			u64 msr;
> +
> +			rdmsrl(MSR_TSX_FORCE_ABORT, msr);
> +			/* Systems that enumerate CPUID.RTM_ALWAYS_ABORT or
> +			 * support MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM] bit have
> +			 * TSX deprecated by default. TSX force abort hooks are
> +			 * not required on these systems.

So if they're not required, why aren't you simply disabling the force
abort "workaround" by clearing the feature flag?

	if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
		if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT))
			setup_clear_cpu_cap(X86_FEATURE_TSX_FORCE_ABORT);
	}

so that it doesn't get enabled in the first place?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] x86/msr: Define new bits in TSX_FORCE_ABORT MSR
  2021-06-09 20:58 ` [PATCH 1/4] x86/msr: Define new bits in TSX_FORCE_ABORT MSR Pawan Gupta
@ 2021-06-11  8:39   ` Borislav Petkov
  2021-06-11 21:31     ` Pawan Gupta
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2021-06-11  8:39 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Nick Desaulniers, Joe Perches, linux-doc,
	linux-kernel, linux-perf-users

On Wed, Jun 09, 2021 at 01:58:02PM -0700, Pawan Gupta wrote:
> Intel client processors that support IA32_TSX_FORCE_ABORT MSR related to
> perf counter interaction [1] received a microcode update that deprecates
> Transactional Synchronization Extension (TSX) feature. MSR
> IA32_TSX_FORCE_ABORT bit FORCE_ABORT_RTM now defaults to 1, writes to
> this bit are ignored. A new bit TSX_CPUID_CLEAR clears the TSX related
> CPUID bits.
> 
> Below is the summary of changes to IA32_TSX_FORCE_ABORT MSR:
> 
>   Bit 0: FORCE_ABORT_RTM (legacy bit, new default=1) Status bit that
>   indicates if RTM transactions are always aborted. This bit is
>   essentially !SDV_ENABLE_RTM(Bit 2). Writes to this bit are ignored.
> 
>   Bit 1: TSX_CPUID_CLEAR (new bit, default=0) When set, CPUID.HLE = 0
>   and CPUID.RTM = 0.
> 
>   Bit 2: SDV_ENABLE_RTM (new bit, default=0) When clear, XBEGIN will
>   always abort with EAX code 0. When set, XBEGIN will not be forced to
>   abort (but will always abort in SGX enclaves). This bit is intended to
>   be SDV-only. If this bit is set transactional atomicity correctness is

SDV?

>   not certain.
> 
> Performance monitoring counter 3 is usable in all cases, regardless of
> the value of above bits.
> 
> A new CPUID bit CPUID.RTM_ALWAYS_ABORT (CPUID 7.EDX[11]) is added to
> indicate the status of always abort behavior.
> 
> Define these new CPUID and MSR bits.
> 
> [1] Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory
>     http://cdrdv2.intel.com/v1/dl/getContent/604224

That link does not look stable enough to put in commit messages.
Besides, you've said it all in the commit message already.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/4] x86/tsx: Clear CPUID bits when TSX always force aborts
  2021-06-09 21:13 ` [PATCH 3/4] x86/tsx: Clear CPUID bits when TSX always force aborts Pawan Gupta
@ 2021-06-11 10:03   ` Borislav Petkov
  2021-06-11 21:36     ` Pawan Gupta
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2021-06-11 10:03 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Nick Desaulniers, Joe Perches, linux-doc,
	linux-kernel, linux-perf-users

On Wed, Jun 09, 2021 at 02:13:38PM -0700, Pawan Gupta wrote:
> @@ -114,6 +127,26 @@ void __init tsx_init(void)
>  			tsx_ctrl_state = TSX_CTRL_ENABLE;
>  	}
>  
> +	/*
> +	 * Hardware will always abort a TSX transaction if both CPUID bits
> +	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are enumerated.  In this case it
> +	 * is better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
> +	 * here.
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
> +	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
> +		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
> +		tsx_clear_cpuid();
> +		setup_clear_cpu_cap(X86_FEATURE_RTM);
> +		setup_clear_cpu_cap(X86_FEATURE_HLE);
> +		return;
> +	}

Why isn't this happening as the first thing on entry in that tsx_init()
function? IOW, there's no point to noodle through cmdline options etc if
the hardware will always abort transactions.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-06-09 21:14 ` [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE Pawan Gupta
@ 2021-06-11 10:06   ` Borislav Petkov
  2021-06-11 21:37     ` Pawan Gupta
  2021-07-06 19:52   ` Eduardo Habkost
  1 sibling, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2021-06-11 10:06 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Nick Desaulniers, Joe Perches, linux-doc,
	linux-kernel, linux-perf-users

On Wed, Jun 09, 2021 at 02:14:39PM -0700, Pawan Gupta wrote:
> On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
> CPUID.HLE may not be desirable in some corner cases. Like a saved guest
> would refuse to resume if it was saved before the microcode update
> that deprecated TSX.

That corner case needs a lot more justification. Otherwise this is just
silly.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/4] x86/msr: Define new bits in TSX_FORCE_ABORT MSR
  2021-06-11  8:39   ` Borislav Petkov
@ 2021-06-11 21:31     ` Pawan Gupta
  0 siblings, 0 replies; 28+ messages in thread
From: Pawan Gupta @ 2021-06-11 21:31 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users

On 11.06.2021 10:39, Borislav Petkov wrote:
>On Wed, Jun 09, 2021 at 01:58:02PM -0700, Pawan Gupta wrote:
>> Intel client processors that support IA32_TSX_FORCE_ABORT MSR related to
>> perf counter interaction [1] received a microcode update that deprecates
>> Transactional Synchronization Extension (TSX) feature. MSR
>> IA32_TSX_FORCE_ABORT bit FORCE_ABORT_RTM now defaults to 1, writes to
>> this bit are ignored. A new bit TSX_CPUID_CLEAR clears the TSX related
>> CPUID bits.
>>
>> Below is the summary of changes to IA32_TSX_FORCE_ABORT MSR:
>>
>>   Bit 0: FORCE_ABORT_RTM (legacy bit, new default=1) Status bit that
>>   indicates if RTM transactions are always aborted. This bit is
>>   essentially !SDV_ENABLE_RTM(Bit 2). Writes to this bit are ignored.
>>
>>   Bit 1: TSX_CPUID_CLEAR (new bit, default=0) When set, CPUID.HLE = 0
>>   and CPUID.RTM = 0.
>>
>>   Bit 2: SDV_ENABLE_RTM (new bit, default=0) When clear, XBEGIN will
>>   always abort with EAX code 0. When set, XBEGIN will not be forced to
>>   abort (but will always abort in SGX enclaves). This bit is intended to
>>   be SDV-only. If this bit is set transactional atomicity correctness is
>
>SDV?

Sorry should have expanded this. It stands for Software Development
Vehicle (SDV), i.e. developer systems.

>>   not certain.
>>
>> Performance monitoring counter 3 is usable in all cases, regardless of
>> the value of above bits.
>>
>> A new CPUID bit CPUID.RTM_ALWAYS_ABORT (CPUID 7.EDX[11]) is added to
>> indicate the status of always abort behavior.
>>
>> Define these new CPUID and MSR bits.
>>
>> [1] Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory
>>     http://cdrdv2.intel.com/v1/dl/getContent/604224
>
>That link does not look stable enough to put in commit messages.
>Besides, you've said it all in the commit message already.

The important part is the Document ID 604224. I can remove the link
completely, or add the document ID to it, whatever you suggest:

	[1] Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory
	    http://cdrdv2.intel.com/v1/dl/getContent/604224 (Document ID 604224).
							     ^^^^^^^^^^^^^^^^^^

Thanks,
Pawan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated
  2021-06-11  7:50   ` Borislav Petkov
@ 2021-06-11 21:34     ` Pawan Gupta
  2021-06-11 22:01       ` Borislav Petkov
  0 siblings, 1 reply; 28+ messages in thread
From: Pawan Gupta @ 2021-06-11 21:34 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users

On 11.06.2021 09:50, Borislav Petkov wrote:
>On Wed, Jun 09, 2021 at 02:12:38PM -0700, Pawan Gupta wrote:
>> Earlier workaround added by commit 400816f60c54 ("perf/x86/intel:
>> Implement support for TSX Force Abort") for perf counter interactions
>> [1] are not required on some client systems which received a microcode
>> update that deprecates TSX.
>>
>> Bypass the perf workaround when such microcode is enumerated.
>>
>> [1] Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory
>>     http://cdrdv2.intel.com/v1/dl/getContent/604224
>>
>> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> Reviewed-by: Tony Luck <tony.luck@intel.com>
>> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
>> ---
>>  arch/x86/events/intel/core.c | 22 ++++++++++++++++++----
>>  1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index e28892270c58..b5953e1e59a2 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -6016,10 +6016,24 @@ __init int intel_pmu_init(void)
>>  		intel_pmu_pebs_data_source_skl(pmem);
>>
>>  		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
>> -			x86_pmu.flags |= PMU_FL_TFA;
>> -			x86_pmu.get_event_constraints = tfa_get_event_constraints;
>> -			x86_pmu.enable_all = intel_tfa_pmu_enable_all;
>> -			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
>> +			u64 msr;
>> +
>> +			rdmsrl(MSR_TSX_FORCE_ABORT, msr);
>> +			/* Systems that enumerate CPUID.RTM_ALWAYS_ABORT or
>> +			 * support MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM] bit have
>> +			 * TSX deprecated by default. TSX force abort hooks are
>> +			 * not required on these systems.
>
>So if they're not required, why aren't you simply disabling the force
>abort "workaround" by clearing the feature flag?

Feature flag also enumerates MSR_TSX_FORCE_ABORT, which is still present
after the microcode update. Patch 3/4 in this series clears the TSX
CPUID bits using MSR_TSX_FORCE_ABORT. So we do need the feature flag 
X86_FEATURE_TSX_FORCE_ABORT.

>
>	if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
>		if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT))
>			setup_clear_cpu_cap(X86_FEATURE_TSX_FORCE_ABORT);
>	}
>
>so that it doesn't get enabled in the first place?

Feature flag is needed as explained above.

Thanks,
Pawan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/4] x86/tsx: Clear CPUID bits when TSX always force aborts
  2021-06-11 10:03   ` Borislav Petkov
@ 2021-06-11 21:36     ` Pawan Gupta
  0 siblings, 0 replies; 28+ messages in thread
From: Pawan Gupta @ 2021-06-11 21:36 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users

On 11.06.2021 12:03, Borislav Petkov wrote:
>On Wed, Jun 09, 2021 at 02:13:38PM -0700, Pawan Gupta wrote:
>> @@ -114,6 +127,26 @@ void __init tsx_init(void)
>>  			tsx_ctrl_state = TSX_CTRL_ENABLE;
>>  	}
>>
>> +	/*
>> +	 * Hardware will always abort a TSX transaction if both CPUID bits
>> +	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are enumerated.  In this case it
>> +	 * is better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
>> +	 * here.
>> +	 */
>> +	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
>> +	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
>> +		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
>> +		tsx_clear_cpuid();
>> +		setup_clear_cpu_cap(X86_FEATURE_RTM);
>> +		setup_clear_cpu_cap(X86_FEATURE_HLE);
>> +		return;
>> +	}
>
>Why isn't this happening as the first thing on entry in that tsx_init()
>function? IOW, there's no point to noodle through cmdline options etc if
>the hardware will always abort transactions.

Because the next patch is adding tsx=fake cmdline, for that this code
needs to be executed after cmdline parsing. I wanted to avoid the churn
to move this code between patches.

But, I see your comment for 4/4 i.e. tsx=fake may not be needed. It was
added after an internal test failure.

tsx=fake is really for a corner case and if that patch goes away it
would make sense to move the code in question here before cmdline
parsing.

Thanks,
Pawan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-06-11 10:06   ` Borislav Petkov
@ 2021-06-11 21:37     ` Pawan Gupta
  0 siblings, 0 replies; 28+ messages in thread
From: Pawan Gupta @ 2021-06-11 21:37 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users

On 11.06.2021 12:06, Borislav Petkov wrote:
>On Wed, Jun 09, 2021 at 02:14:39PM -0700, Pawan Gupta wrote:
>> On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
>> CPUID.HLE may not be desirable in some corner cases. Like a saved guest
>> would refuse to resume if it was saved before the microcode update
>> that deprecated TSX.
>
>That corner case needs a lot more justification. Otherwise this is just
>silly.

Agree, chances of hitting the save/resume guest condition described
above is low. I am okay with dropping this patch.

Thanks,
Pawan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated
  2021-06-11 21:34     ` Pawan Gupta
@ 2021-06-11 22:01       ` Borislav Petkov
  2021-06-11 23:21         ` Pawan Gupta
  0 siblings, 1 reply; 28+ messages in thread
From: Borislav Petkov @ 2021-06-11 22:01 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users

On Fri, Jun 11, 2021 at 02:34:43PM -0700, Pawan Gupta wrote:
> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > index e28892270c58..b5953e1e59a2 100644
> > > --- a/arch/x86/events/intel/core.c
> > > +++ b/arch/x86/events/intel/core.c
> > > @@ -6016,10 +6016,24 @@ __init int intel_pmu_init(void)
> > >  		intel_pmu_pebs_data_source_skl(pmem);
> > > 
> > >  		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
> > > -			x86_pmu.flags |= PMU_FL_TFA;
> > > -			x86_pmu.get_event_constraints = tfa_get_event_constraints;
> > > -			x86_pmu.enable_all = intel_tfa_pmu_enable_all;
> > > -			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
> > > +			u64 msr;
> > > +
> > > +			rdmsrl(MSR_TSX_FORCE_ABORT, msr);
> > > +			/* Systems that enumerate CPUID.RTM_ALWAYS_ABORT or
> > > +			 * support MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM] bit have
> > > +			 * TSX deprecated by default. TSX force abort hooks are
> > > +			 * not required on these systems.
> > 
> > So if they're not required, why aren't you simply disabling the force
> > abort "workaround" by clearing the feature flag?
> 
> Feature flag also enumerates MSR_TSX_FORCE_ABORT, which is still present
> after the microcode update. Patch 3/4 in this series clears the TSX
> CPUID bits using MSR_TSX_FORCE_ABORT. So we do need the feature flag
> X86_FEATURE_TSX_FORCE_ABORT.

So it seems to me then, the if test above should be changed to:

	if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT) && 
	   !boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
	   ...

and no need for the MSR read.

Please don't tell me there are configurations
where CPUID.RTM_ALWAYS_ABORT is clear but the
MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM] is there?!

This

"A new CPUID bit CPUID.RTM_ALWAYS_ABORT (CPUID 7.EDX[11]) is added to
indicate the status of always abort behavior."

tells me that the CPUID bit is always set by the microcode so we should
be ok.

If not, you should read that MSR early and do

	setup_force_cpu_cap(X86_FEATURE_RTM_ALWAYS_ABORT)

so that this "always abort" flag is always set when TSX transactions are
always aborted.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated
  2021-06-11 22:01       ` Borislav Petkov
@ 2021-06-11 23:21         ` Pawan Gupta
  0 siblings, 0 replies; 28+ messages in thread
From: Pawan Gupta @ 2021-06-11 23:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Paolo Bonzini, Sean Christopherson,
	Kyung Min Park, Fenghua Yu, Ricardo Neri, Tom Lendacky,
	Juergen Gross, Krish Sadhukhan, Kan Liang, Joerg Roedel,
	Victor Ding, Srinivas Pandruvada, Brijesh Singh, Dave Hansen,
	Mike Rapoport, Anthony Steinhauser, Anand K Mistry, Andi Kleen,
	Miguel Ojeda, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users

On 12.06.2021 00:01, Borislav Petkov wrote:
>On Fri, Jun 11, 2021 at 02:34:43PM -0700, Pawan Gupta wrote:
>> > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> > > index e28892270c58..b5953e1e59a2 100644
>> > > --- a/arch/x86/events/intel/core.c
>> > > +++ b/arch/x86/events/intel/core.c
>> > > @@ -6016,10 +6016,24 @@ __init int intel_pmu_init(void)
>> > >  		intel_pmu_pebs_data_source_skl(pmem);
>> > >
>> > >  		if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
>> > > -			x86_pmu.flags |= PMU_FL_TFA;
>> > > -			x86_pmu.get_event_constraints = tfa_get_event_constraints;
>> > > -			x86_pmu.enable_all = intel_tfa_pmu_enable_all;
>> > > -			x86_pmu.commit_scheduling = intel_tfa_commit_scheduling;
>> > > +			u64 msr;
>> > > +
>> > > +			rdmsrl(MSR_TSX_FORCE_ABORT, msr);
>> > > +			/* Systems that enumerate CPUID.RTM_ALWAYS_ABORT or
>> > > +			 * support MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM] bit have
>> > > +			 * TSX deprecated by default. TSX force abort hooks are
>> > > +			 * not required on these systems.
>> >
>> > So if they're not required, why aren't you simply disabling the force
>> > abort "workaround" by clearing the feature flag?
>>
>> Feature flag also enumerates MSR_TSX_FORCE_ABORT, which is still present
>> after the microcode update. Patch 3/4 in this series clears the TSX
>> CPUID bits using MSR_TSX_FORCE_ABORT. So we do need the feature flag
>> X86_FEATURE_TSX_FORCE_ABORT.
>
>So it seems to me then, the if test above should be changed to:
>
>	if (boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT) &&
>	   !boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
>	   ...
>
>and no need for the MSR read.
>
>Please don't tell me there are configurations
>where CPUID.RTM_ALWAYS_ABORT is clear but the
>MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM] is there?!

MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM]=1 actually clears
CPUID.RTM_ALWAYS_ABORT, because in this case RTM is enabled and doesn't
always aborts. As the code above is only executed at bootup and BIOS is
not expected to set it. So at bootup SDV_ENABLE_RTM will be clear (if we
ignore kexec) then yes, MSR read can avoided.

>
>This
>
>"A new CPUID bit CPUID.RTM_ALWAYS_ABORT (CPUID 7.EDX[11]) is added to
>indicate the status of always abort behavior."
>
>tells me that the CPUID bit is always set by the microcode so we should
>be ok.

Yes, but MSR_TSX_FORCE_ABORT[SDV_ENABLE_RTM]=1 clears it. As
SDV_ENABLE_RTM is expected to be set only by developers (e.g. using
msr-tools), it should be fine in most other user cases. So we can avoid
reading the MSR.

>
>If not, you should read that MSR early and do

Isn't the MSR read early enough already:

	early_initcall()
		init_hw_perf_events()
			intel_pmu_init()
				MSR read

Anyways, we can avoid reading the MSR completely and rely on
CPUID.RTM_ALWAYS_ABORT. I will change it in the next version.

Thanks,
Pawan

>
>	setup_force_cpu_cap(X86_FEATURE_RTM_ALWAYS_ABORT)
>
>so that this "always abort" flag is always set when TSX transactions are
>always aborted.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-06-09 21:14 ` [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE Pawan Gupta
  2021-06-11 10:06   ` Borislav Petkov
@ 2021-07-06 19:52   ` Eduardo Habkost
  2021-07-06 21:05     ` Paolo Bonzini
  2021-07-06 21:16     ` Pawan Gupta
  1 sibling, 2 replies; 28+ messages in thread
From: Eduardo Habkost @ 2021-07-06 19:52 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Paolo Bonzini, Sean Christopherson, Kyung Min Park, Fenghua Yu,
	Ricardo Neri, Tom Lendacky, Juergen Gross, Krish Sadhukhan,
	Kan Liang, Joerg Roedel, Victor Ding, Srinivas Pandruvada,
	Brijesh Singh, Dave Hansen, Mike Rapoport, Anthony Steinhauser,
	Anand K Mistry, Andi Kleen, Miguel Ojeda, Nick Desaulniers,
	Joe Perches, linux-doc, linux-kernel, linux-perf-users, kvm

On Wed, Jun 09, 2021 at 02:14:39PM -0700, Pawan Gupta wrote:
> On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
> CPUID.HLE may not be desirable in some corner cases. Like a saved guest
> would refuse to resume if it was saved before the microcode update
> that deprecated TSX.

Why is a global option necessary to allow those guests to be
resumed?  Why can't KVM_GET_SUPPORTED_CPUID always return the HLE
and RTM bits as supported when the host CPU has them?


> 
> Add a cmdline option "tsx=fake" to not clear CPUID bits even when the
> hardware always aborts TSX transactions.
> 
> Suggested-by: Tony Luck <tony.luck@intel.com>
> Suggested-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
[...]

-- 
Eduardo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-06 19:52   ` Eduardo Habkost
@ 2021-07-06 21:05     ` Paolo Bonzini
  2021-07-06 21:33       ` Eduardo Habkost
  2021-07-06 21:16     ` Pawan Gupta
  1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2021-07-06 21:05 UTC (permalink / raw)
  To: Eduardo Habkost, Pawan Gupta
  Cc: Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Sean Christopherson, Kyung Min Park, Fenghua Yu, Ricardo Neri,
	Tom Lendacky, Juergen Gross, Krish Sadhukhan, Kan Liang,
	Joerg Roedel, Victor Ding, Srinivas Pandruvada, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users, kvm

On 06/07/21 21:52, Eduardo Habkost wrote:
> On Wed, Jun 09, 2021 at 02:14:39PM -0700, Pawan Gupta wrote:
>> On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
>> CPUID.HLE may not be desirable in some corner cases. Like a saved guest
>> would refuse to resume if it was saved before the microcode update
>> that deprecated TSX.
> Why is a global option necessary to allow those guests to be
> resumed?  Why can't KVM_GET_SUPPORTED_CPUID always return the HLE
> and RTM bits as supported when the host CPU has them?

It's a bit tricky, because HLE and RTM won't really behave well.  An old 
guest that sees RTM=1 might end up retrying and aborting transactions 
too much.  So I'm not sure that a QEMU "-cpu host" guest should have HLE 
and RTM enabled.

So it makes sense to handle it in userspace, with one of the two 
following possibilities:

- userspace sees TSX_FORCE_ABORT and if so it somehow "discourages" 
setting HLE/RTM, even though they are shown as supported

- userspace sees TSX_FORCE_ABORT and if so it knows HLE/RTM can be set, 
even though they are discouraged in general

In any case, KVM's "supported CPUID" is based on the host features but 
independent.  KVM can decide to show or hide the hardware HLE and RTM 
bits independent of the host tsx= setting; it may make sense to hide the 
bits via a module parameter, but in any case this patch is not needed.

Paolo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-06 19:52   ` Eduardo Habkost
  2021-07-06 21:05     ` Paolo Bonzini
@ 2021-07-06 21:16     ` Pawan Gupta
  2021-07-06 21:19       ` Eduardo Habkost
  1 sibling, 1 reply; 28+ messages in thread
From: Pawan Gupta @ 2021-07-06 21:16 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Paolo Bonzini, Sean Christopherson, Kyung Min Park, Fenghua Yu,
	Ricardo Neri, Tom Lendacky, Juergen Gross, Krish Sadhukhan,
	Kan Liang, Joerg Roedel, Victor Ding, Srinivas Pandruvada,
	Brijesh Singh, Dave Hansen, Mike Rapoport, Anthony Steinhauser,
	Anand K Mistry, Andi Kleen, Miguel Ojeda, Nick Desaulniers,
	Joe Perches, linux-doc, linux-kernel, linux-perf-users, kvm

On 06.07.2021 15:52, Eduardo Habkost wrote:
>On Wed, Jun 09, 2021 at 02:14:39PM -0700, Pawan Gupta wrote:
>> On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
>> CPUID.HLE may not be desirable in some corner cases. Like a saved guest
>> would refuse to resume if it was saved before the microcode update
>> that deprecated TSX.
>
>Why is a global option necessary to allow those guests to be
>resumed?  Why can't KVM_GET_SUPPORTED_CPUID always return the HLE
>and RTM bits as supported when the host CPU has them?

Yes, the global option is unnecessary and this patch was dropped in v2.

Thanks,
Pawan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-06 21:16     ` Pawan Gupta
@ 2021-07-06 21:19       ` Eduardo Habkost
  2021-07-06 22:51         ` Pawan Gupta
  0 siblings, 1 reply; 28+ messages in thread
From: Eduardo Habkost @ 2021-07-06 21:19 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Paolo Bonzini, Sean Christopherson, Kyung Min Park, Fenghua Yu,
	Ricardo Neri, Tom Lendacky, Juergen Gross, Krish Sadhukhan,
	Kan Liang, Joerg Roedel, Victor Ding, Srinivas Pandruvada,
	Brijesh Singh, Dave Hansen, Mike Rapoport, Anthony Steinhauser,
	Anand K Mistry, Andi Kleen, Miguel Ojeda, Nick Desaulniers,
	Joe Perches, linux-doc, linux-kernel, linux-perf-users, kvm

On Tue, Jul 6, 2021 at 5:15 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On 06.07.2021 15:52, Eduardo Habkost wrote:
> >On Wed, Jun 09, 2021 at 02:14:39PM -0700, Pawan Gupta wrote:
> >> On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
> >> CPUID.HLE may not be desirable in some corner cases. Like a saved guest
> >> would refuse to resume if it was saved before the microcode update
> >> that deprecated TSX.
> >
> >Why is a global option necessary to allow those guests to be
> >resumed?  Why can't KVM_GET_SUPPORTED_CPUID always return the HLE
> >and RTM bits as supported when the host CPU has them?
>
> Yes, the global option is unnecessary and this patch was dropped in v2.

Was the behaviour this patch originally tried to fix changed in v2 as
well? Is it going to be possible to resume a HLE=1,RTM=1 VM on a
TSX_FORCE_ABORT=1 host with no extra kernel command line options
needed?

-- 
Eduardo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-06 21:05     ` Paolo Bonzini
@ 2021-07-06 21:33       ` Eduardo Habkost
  2021-07-06 21:58         ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Eduardo Habkost @ 2021-07-06 21:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Pawan Gupta, Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Sean Christopherson, Kyung Min Park, Fenghua Yu, Ricardo Neri,
	Tom Lendacky, Juergen Gross, Krish Sadhukhan, Kan Liang,
	Joerg Roedel, Victor Ding, Srinivas Pandruvada, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users, kvm

On Tue, Jul 6, 2021 at 5:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 06/07/21 21:52, Eduardo Habkost wrote:
> > On Wed, Jun 09, 2021 at 02:14:39PM -0700, Pawan Gupta wrote:
> >> On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
> >> CPUID.HLE may not be desirable in some corner cases. Like a saved guest
> >> would refuse to resume if it was saved before the microcode update
> >> that deprecated TSX.
> > Why is a global option necessary to allow those guests to be
> > resumed?  Why can't KVM_GET_SUPPORTED_CPUID always return the HLE
> > and RTM bits as supported when the host CPU has them?
>
> It's a bit tricky, because HLE and RTM won't really behave well.  An old
> guest that sees RTM=1 might end up retrying and aborting transactions
> too much.  So I'm not sure that a QEMU "-cpu host" guest should have HLE
> and RTM enabled.

Is the purpose of GET_SUPPORTED_CPUID to return what is supported by
KVM, or to return what "-cpu host" should enable by default? They are
conflicting requirements in this case.

>
> So it makes sense to handle it in userspace, with one of the two
> following possibilities:
>
> - userspace sees TSX_FORCE_ABORT and if so it somehow "discourages"
> setting HLE/RTM, even though they are shown as supported
>
> - userspace sees TSX_FORCE_ABORT and if so it knows HLE/RTM can be set,
> even though they are discouraged in general

In either case, we can make new userspace behave well. I'm worried
about existing userspace:

Returning HLE=1,RTM=1 in GET_SUPPORTED_CPUID makes existing userspace
take bad decisions until it's updated.

Returning HLE=0,RTM=0 in GET_SUPPORTED_CPUID prevents existing
userspace from resuming existing VMs (despite being technically
possible).

The first option has an easy workaround that doesn't require a
software update (disabling HLE/RTM in the VM configuration). The
second option doesn't have a workaround. I'm inclined towards the
first option.


>
> In any case, KVM's "supported CPUID" is based on the host features but
> independent.  KVM can decide to show or hide the hardware HLE and RTM
> bits independent of the host tsx= setting; it may make sense to hide the
> bits via a module parameter, but in any case this patch is not needed.
>
> Paolo
>

-- 
Eduardo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-06 21:33       ` Eduardo Habkost
@ 2021-07-06 21:58         ` Paolo Bonzini
  2021-07-07 15:08           ` Eduardo Habkost
  0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2021-07-06 21:58 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Pawan Gupta, Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Sean Christopherson, Kyung Min Park, Fenghua Yu, Ricardo Neri,
	Tom Lendacky, Juergen Gross, Krish Sadhukhan, Kan Liang,
	Joerg Roedel, Victor Ding, Srinivas Pandruvada, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users, kvm

On 06/07/21 23:33, Eduardo Habkost wrote:
> On Tue, Jul 6, 2021 at 5:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>> It's a bit tricky, because HLE and RTM won't really behave well.  An old
>> guest that sees RTM=1 might end up retrying and aborting transactions
>> too much.  So I'm not sure that a QEMU "-cpu host" guest should have HLE
>> and RTM enabled.
> 
> Is the purpose of GET_SUPPORTED_CPUID to return what is supported by
> KVM, or to return what "-cpu host" should enable by default? They are
> conflicting requirements in this case.

In theory there is GET_EMULATED_CPUID for the former, so it should be 
the latter.  In practice neither QEMU nor Libvirt use it; maybe now we 
have a good reason to add it, but note that userspace could also check 
host RTM_ALWAYS_ABORT.

> Returning HLE=1,RTM=1 in GET_SUPPORTED_CPUID makes existing userspace
> take bad decisions until it's updated.
> 
> Returning HLE=0,RTM=0 in GET_SUPPORTED_CPUID prevents existing
> userspace from resuming existing VMs (despite being technically
> possible).
> 
> The first option has an easy workaround that doesn't require a
> software update (disabling HLE/RTM in the VM configuration). The
> second option doesn't have a workaround. I'm inclined towards the
> first option.

The default has already been tsx=off for a while though, so checking 
either GET_EMULATED_CPUID or host RTM_ALWAYS_ABORT in userspace might 
also be feasible for those that are still on tsx=on.

Paolo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-06 21:19       ` Eduardo Habkost
@ 2021-07-06 22:51         ` Pawan Gupta
  0 siblings, 0 replies; 28+ messages in thread
From: Pawan Gupta @ 2021-07-06 22:51 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Paolo Bonzini, Sean Christopherson, Kyung Min Park, Fenghua Yu,
	Ricardo Neri, Tom Lendacky, Juergen Gross, Krish Sadhukhan,
	Kan Liang, Joerg Roedel, Victor Ding, Srinivas Pandruvada,
	Brijesh Singh, Dave Hansen, Mike Rapoport, Anthony Steinhauser,
	Anand K Mistry, Andi Kleen, Miguel Ojeda, Nick Desaulniers,
	Joe Perches, linux-doc, linux-kernel, linux-perf-users, kvm

On 06.07.2021 17:19, Eduardo Habkost wrote:
>On Tue, Jul 6, 2021 at 5:15 PM Pawan Gupta
><pawan.kumar.gupta@linux.intel.com> wrote:
>>
>> On 06.07.2021 15:52, Eduardo Habkost wrote:
>> >On Wed, Jun 09, 2021 at 02:14:39PM -0700, Pawan Gupta wrote:
>> >> On CPUs that deprecated TSX, clearing the enumeration bits CPUID.RTM and
>> >> CPUID.HLE may not be desirable in some corner cases. Like a saved guest
>> >> would refuse to resume if it was saved before the microcode update
>> >> that deprecated TSX.
>> >
>> >Why is a global option necessary to allow those guests to be
>> >resumed?  Why can't KVM_GET_SUPPORTED_CPUID always return the HLE
>> >and RTM bits as supported when the host CPU has them?
>>
>> Yes, the global option is unnecessary and this patch was dropped in v2.
>
>Was the behaviour this patch originally tried to fix changed in v2 as
>well? Is it going to be possible to resume a HLE=1,RTM=1 VM on a
>TSX_FORCE_ABORT=1 host with no extra kernel command line options
>needed?

The problem it tried to solve is still present, but the global switch
was thought to be unnecessary. I see that Paolo has some suggestions to
fix this in the userspace.

Thanks,
Pawan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-06 21:58         ` Paolo Bonzini
@ 2021-07-07 15:08           ` Eduardo Habkost
  2021-07-07 16:42             ` Jim Mattson
  0 siblings, 1 reply; 28+ messages in thread
From: Eduardo Habkost @ 2021-07-07 15:08 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Pawan Gupta, Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Sean Christopherson, Kyung Min Park, Fenghua Yu, Ricardo Neri,
	Tom Lendacky, Juergen Gross, Krish Sadhukhan, Kan Liang,
	Joerg Roedel, Victor Ding, Srinivas Pandruvada, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users, kvm, Jiri Denemark,
	libvir-list, Michal Privoznik

CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware
that the definition of "supported CPU features" will probably become a
bit more complex in the future.

On Tue, Jul 6, 2021 at 5:58 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 06/07/21 23:33, Eduardo Habkost wrote:
> > On Tue, Jul 6, 2021 at 5:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >> It's a bit tricky, because HLE and RTM won't really behave well.  An old
> >> guest that sees RTM=1 might end up retrying and aborting transactions
> >> too much.  So I'm not sure that a QEMU "-cpu host" guest should have HLE
> >> and RTM enabled.
> >
> > Is the purpose of GET_SUPPORTED_CPUID to return what is supported by
> > KVM, or to return what "-cpu host" should enable by default? They are
> > conflicting requirements in this case.
>
> In theory there is GET_EMULATED_CPUID for the former, so it should be
> the latter.  In practice neither QEMU nor Libvirt use it; maybe now we
> have a good reason to add it, but note that userspace could also check
> host RTM_ALWAYS_ABORT.
>
> > Returning HLE=1,RTM=1 in GET_SUPPORTED_CPUID makes existing userspace
> > take bad decisions until it's updated.
> >
> > Returning HLE=0,RTM=0 in GET_SUPPORTED_CPUID prevents existing
> > userspace from resuming existing VMs (despite being technically
> > possible).
> >
> > The first option has an easy workaround that doesn't require a
> > software update (disabling HLE/RTM in the VM configuration). The
> > second option doesn't have a workaround. I'm inclined towards the
> > first option.
>
> The default has already been tsx=off for a while though, so checking
> either GET_EMULATED_CPUID or host RTM_ALWAYS_ABORT in userspace might
> also be feasible for those that are still on tsx=on.

This sounds like a perfect use case for GET_EMULATED_CPUID. My only
concern is breaking existing userspace.

But if this was already broken for a few kernel releases due to
tsx=off being the default, maybe GET_EMULATED_CPUID will be a
reasonable approach.

--
Eduardo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-07 15:08           ` Eduardo Habkost
@ 2021-07-07 16:42             ` Jim Mattson
  2021-07-07 17:08               ` Eduardo Habkost
  0 siblings, 1 reply; 28+ messages in thread
From: Jim Mattson @ 2021-07-07 16:42 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, Pawan Gupta, Thomas Gleixner, Borislav Petkov,
	Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Sean Christopherson, Kyung Min Park,
	Fenghua Yu, Ricardo Neri, Tom Lendacky, Juergen Gross,
	Krish Sadhukhan, Kan Liang, Joerg Roedel, Victor Ding,
	Srinivas Pandruvada, Brijesh Singh, Dave Hansen, Mike Rapoport,
	Anthony Steinhauser, Anand K Mistry, Andi Kleen, Miguel Ojeda,
	Nick Desaulniers, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users, kvm, Jiri Denemark, libvir-list,
	Michal Privoznik

On Wed, Jul 7, 2021 at 8:09 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
>
> CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware
> that the definition of "supported CPU features" will probably become a
> bit more complex in the future.

Has there ever been a clear definition? Family, model, and stepping,
for instance: are these the only values supported? That would make
cross-platform migration impossible. What about the vendor string? Is
that the only value supported? That would make cross-vendor migration
impossible. For the maximum input value for basic CPUID information
(CPUID.0H:EAX), is that the only value supported, or is it the maximum
value supported? On the various individual feature bits, does a '1'
imply that '0' is also supported, or is '1' the only value supported?
What about the feature bits with reversed polarity (e.g.
CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])?

This API has never made sense to me. I have no idea how to interpret
what it is telling me.

> On Tue, Jul 6, 2021 at 5:58 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > On 06/07/21 23:33, Eduardo Habkost wrote:
> > > On Tue, Jul 6, 2021 at 5:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > >> It's a bit tricky, because HLE and RTM won't really behave well.  An old
> > >> guest that sees RTM=1 might end up retrying and aborting transactions
> > >> too much.  So I'm not sure that a QEMU "-cpu host" guest should have HLE
> > >> and RTM enabled.
> > >
> > > Is the purpose of GET_SUPPORTED_CPUID to return what is supported by
> > > KVM, or to return what "-cpu host" should enable by default? They are
> > > conflicting requirements in this case.
> >
> > In theory there is GET_EMULATED_CPUID for the former, so it should be
> > the latter.  In practice neither QEMU nor Libvirt use it; maybe now we
> > have a good reason to add it, but note that userspace could also check
> > host RTM_ALWAYS_ABORT.
> >
> > > Returning HLE=1,RTM=1 in GET_SUPPORTED_CPUID makes existing userspace
> > > take bad decisions until it's updated.
> > >
> > > Returning HLE=0,RTM=0 in GET_SUPPORTED_CPUID prevents existing
> > > userspace from resuming existing VMs (despite being technically
> > > possible).
> > >
> > > The first option has an easy workaround that doesn't require a
> > > software update (disabling HLE/RTM in the VM configuration). The
> > > second option doesn't have a workaround. I'm inclined towards the
> > > first option.
> >
> > The default has already been tsx=off for a while though, so checking
> > either GET_EMULATED_CPUID or host RTM_ALWAYS_ABORT in userspace might
> > also be feasible for those that are still on tsx=on.
>
> This sounds like a perfect use case for GET_EMULATED_CPUID. My only
> concern is breaking existing userspace.
>
> But if this was already broken for a few kernel releases due to
> tsx=off being the default, maybe GET_EMULATED_CPUID will be a
> reasonable approach.
>
> --
> Eduardo
>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-07 16:42             ` Jim Mattson
@ 2021-07-07 17:08               ` Eduardo Habkost
  2021-07-07 17:15                 ` Jim Mattson
  0 siblings, 1 reply; 28+ messages in thread
From: Eduardo Habkost @ 2021-07-07 17:08 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Pawan Gupta, Thomas Gleixner, Borislav Petkov,
	Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Sean Christopherson, Kyung Min Park,
	Fenghua Yu, Ricardo Neri, Tom Lendacky, Juergen Gross,
	Krish Sadhukhan, Kan Liang, Joerg Roedel, Victor Ding,
	Srinivas Pandruvada, Brijesh Singh, Dave Hansen, Mike Rapoport,
	Anthony Steinhauser, Anand K Mistry, Andi Kleen, Miguel Ojeda,
	Nick Desaulniers, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users, kvm, Jiri Denemark, libvir-list,
	Michal Privoznik

On Wed, Jul 7, 2021 at 12:42 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Wed, Jul 7, 2021 at 8:09 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
> >
> > CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware
> > that the definition of "supported CPU features" will probably become a
> > bit more complex in the future.
>
> Has there ever been a clear definition? Family, model, and stepping,
> for instance: are these the only values supported? That would make
> cross-platform migration impossible. What about the vendor string? Is
> that the only value supported? That would make cross-vendor migration
> impossible. For the maximum input value for basic CPUID information
> (CPUID.0H:EAX), is that the only value supported, or is it the maximum
> value supported? On the various individual feature bits, does a '1'
> imply that '0' is also supported, or is '1' the only value supported?
> What about the feature bits with reversed polarity (e.g.
> CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])?
>
> This API has never made sense to me. I have no idea how to interpret
> what it is telling me.

Is this about GET_SUPPORTED_CPUID, QEMU's query-cpu-model-expansion &
related commands, or the libvirt CPU APIs?

-- 
Eduardo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-07 17:08               ` Eduardo Habkost
@ 2021-07-07 17:15                 ` Jim Mattson
  2021-07-07 18:23                   ` Eduardo Habkost
  0 siblings, 1 reply; 28+ messages in thread
From: Jim Mattson @ 2021-07-07 17:15 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Paolo Bonzini, Pawan Gupta, Thomas Gleixner, Borislav Petkov,
	Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Sean Christopherson, Kyung Min Park,
	Fenghua Yu, Ricardo Neri, Tom Lendacky, Juergen Gross,
	Krish Sadhukhan, Kan Liang, Joerg Roedel, Victor Ding,
	Srinivas Pandruvada, Brijesh Singh, Dave Hansen, Mike Rapoport,
	Anthony Steinhauser, Anand K Mistry, Andi Kleen, Miguel Ojeda,
	Nick Desaulniers, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users, kvm, Jiri Denemark, libvir-list,
	Michal Privoznik

On Wed, Jul 7, 2021 at 10:08 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
>
> On Wed, Jul 7, 2021 at 12:42 PM Jim Mattson <jmattson@google.com> wrote:
> >
> > On Wed, Jul 7, 2021 at 8:09 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
> > >
> > > CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware
> > > that the definition of "supported CPU features" will probably become a
> > > bit more complex in the future.
> >
> > Has there ever been a clear definition? Family, model, and stepping,
> > for instance: are these the only values supported? That would make
> > cross-platform migration impossible. What about the vendor string? Is
> > that the only value supported? That would make cross-vendor migration
> > impossible. For the maximum input value for basic CPUID information
> > (CPUID.0H:EAX), is that the only value supported, or is it the maximum
> > value supported? On the various individual feature bits, does a '1'
> > imply that '0' is also supported, or is '1' the only value supported?
> > What about the feature bits with reversed polarity (e.g.
> > CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])?
> >
> > This API has never made sense to me. I have no idea how to interpret
> > what it is telling me.
>
> Is this about GET_SUPPORTED_CPUID, QEMU's query-cpu-model-expansion &
> related commands, or the libvirt CPU APIs?

This is my ongoing rant about KVM_GET_SUPPORTED_CPUID.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-07 17:15                 ` Jim Mattson
@ 2021-07-07 18:23                   ` Eduardo Habkost
  2021-07-08  9:15                     ` Paolo Bonzini
  0 siblings, 1 reply; 28+ messages in thread
From: Eduardo Habkost @ 2021-07-07 18:23 UTC (permalink / raw)
  To: Jim Mattson
  Cc: Paolo Bonzini, Pawan Gupta, Thomas Gleixner, Borislav Petkov,
	Jonathan Corbet, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, x86, H. Peter Anvin, Paul E. McKenney,
	Randy Dunlap, Andrew Morton, Maciej W. Rozycki, Viresh Kumar,
	Vlastimil Babka, Tony Luck, Sean Christopherson, Kyung Min Park,
	Fenghua Yu, Ricardo Neri, Tom Lendacky, Juergen Gross,
	Krish Sadhukhan, Kan Liang, Joerg Roedel, Victor Ding,
	Srinivas Pandruvada, Brijesh Singh, Dave Hansen, Mike Rapoport,
	Anthony Steinhauser, Anand K Mistry, Andi Kleen, Miguel Ojeda,
	Nick Desaulniers, Joe Perches, linux-doc, linux-kernel,
	linux-perf-users, kvm, Jiri Denemark, libvir-list,
	Michal Privoznik

On Wed, Jul 7, 2021 at 1:18 PM Jim Mattson <jmattson@google.com> wrote:
>
> On Wed, Jul 7, 2021 at 10:08 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
> >
> > On Wed, Jul 7, 2021 at 12:42 PM Jim Mattson <jmattson@google.com> wrote:
> > >
> > > On Wed, Jul 7, 2021 at 8:09 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
> > > >
> > > > CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware
> > > > that the definition of "supported CPU features" will probably become a
> > > > bit more complex in the future.
> > >
> > > Has there ever been a clear definition? Family, model, and stepping,
> > > for instance: are these the only values supported? That would make
> > > cross-platform migration impossible. What about the vendor string? Is
> > > that the only value supported? That would make cross-vendor migration
> > > impossible. For the maximum input value for basic CPUID information
> > > (CPUID.0H:EAX), is that the only value supported, or is it the maximum
> > > value supported? On the various individual feature bits, does a '1'
> > > imply that '0' is also supported, or is '1' the only value supported?
> > > What about the feature bits with reversed polarity (e.g.
> > > CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])?
> > >
> > > This API has never made sense to me. I have no idea how to interpret
> > > what it is telling me.
> >
> > Is this about GET_SUPPORTED_CPUID, QEMU's query-cpu-model-expansion &
> > related commands, or the libvirt CPU APIs?
>
> This is my ongoing rant about KVM_GET_SUPPORTED_CPUID.
>

I agree the definition is not clear. I have tried to enumerate below
what QEMU assumes about the return value of KVM_GET_SUPPORTED_CPUID.
These are a collection of workarounds and feature-specific rules that
are encoded in the kvm_arch_get_supported_cpuid()
x86_cpu_filter_features(), and cpu_x86_cpuid() functions in QEMU.

1. Passing through the returned values (unchanged) from
KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID is assumed to be always safe,
as long as the ability to save/resume VCPU state is not required.
(This is the behavior implemented by "-cpu host,migratable=off")
2. The safety of setting a bit to a different value requires specific
knowledge about the CPUID bit.
2.1. For a specific set of registers (see below), QEMU assumes it's
safe to set the bit to 0 when KVM_GET_SUPPORTED_CPUID returns 1.
2.2. For a few specific leaves (see below), there are more complex rules.
2.4. For all other leaves, QEMU doesn't use the return value of
KVM_GET_SUPPORTED_CPUID at all (AFAICS).


The CPUID leaves mentioned in 2.1 are:

CPUID[1].EDX
CPUID[1].ECX
CPUID[6].EAX
CPUID[EAX=7,ECX=0].EBX
- This unfortunately includes de-feature bits like FDP_EXCPTN_ONLY and
ZERO_FCS_FDS
CPUID[EAX=7,ECX=0].ECX
CPUID[EAX=7,ECX=0].EDX
CPUID[EAX=7,ECX=1].EAX
CPUID[EAX=0Dh,ECX=0].EAX
CPUID[EAX=0Dh,ECX=0].EDX
CPUID[EAX=0Dh,ECX=1].EAX
- Note that CPUID[0Dh] has additional logic to ensure XSAVE component
info on CPUID is consistent
CPUID[40000001h].EAX
CPUID[40000001h].EDX
CPUID[80000001h].EDX
CPUID[80000001h].ECX
CPUID[80000007h].EDX
CPUID[80000008h].EBX
CPUID[8000000Ah].EDX
CPUID[C0000001h].EDX


Some of the CPUID leaves mentioned in 2.2 are:

CPUID[1].ECX.HYPERVISOR[bit 31]
- Can be enabled unconditionally
CPUID[1].ECX.TSC_DEADLINE_TIMER[bit 24]
- Can be set to 1 if using the in-kernel irqchip and
KVM_CAP_TSC_DEADLINE_TIMER is enabled
CPUID[1].ECX.X2APIC[bit 21]
- Can be set to 1 if using the in-kernel irqchip
CPUID[1].ECX.MONITOR[bit 3]
- Can be set to 1 if KVM_X86_DISABLE_EXITS_MWAIT is enabled
CPUID[6].EAX.ARAT[bit 2]
- Can be enabled unconditionally
CPUID[EAX=7,ECX=0].EDX.ARCH_CAPABILITIES
- Workaround for KVM bug in Linux v4.17-v4.20
CPUID[EAX=14h,ECX=0], CPUID{EAX=14h,ECX=1]
- Most bits must match the host, unless
CPUID[EAX=7,ECX=0].EBX.INTEL_PT[bit 25] is 0
CPUID[80000001h].EDX
- AMD-specific feature flag aliases can be set based on CPUID[1].EDX
CPUID[40000001h].EAX
- KVM_FEATURE_PV_UNHALT requires in-kernel irqchip
- KVM_FEATURE_MSI_EXT_DEST_ID requires split irqchip
CPUID[40000001].EDX.KVM_HINTS_REALTIME
- Can be enabled unconditionally

--
Eduardo


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE
  2021-07-07 18:23                   ` Eduardo Habkost
@ 2021-07-08  9:15                     ` Paolo Bonzini
  0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2021-07-08  9:15 UTC (permalink / raw)
  To: Eduardo Habkost, Jim Mattson
  Cc: Pawan Gupta, Thomas Gleixner, Borislav Petkov, Jonathan Corbet,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, x86,
	H. Peter Anvin, Paul E. McKenney, Randy Dunlap, Andrew Morton,
	Maciej W. Rozycki, Viresh Kumar, Vlastimil Babka, Tony Luck,
	Sean Christopherson, Kyung Min Park, Fenghua Yu, Ricardo Neri,
	Tom Lendacky, Juergen Gross, Krish Sadhukhan, Kan Liang,
	Joerg Roedel, Victor Ding, Srinivas Pandruvada, Brijesh Singh,
	Dave Hansen, Mike Rapoport, Anthony Steinhauser, Anand K Mistry,
	Andi Kleen, Miguel Ojeda, Nick Desaulniers, Joe Perches,
	linux-doc, linux-kernel, linux-perf-users, kvm, Jiri Denemark,
	libvir-list, Michal Privoznik

On 07/07/21 20:23, Eduardo Habkost wrote:
> On Wed, Jul 7, 2021 at 1:18 PM Jim Mattson <jmattson@google.com> wrote:
>>
>> On Wed, Jul 7, 2021 at 10:08 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
>>>
>>> On Wed, Jul 7, 2021 at 12:42 PM Jim Mattson <jmattson@google.com> wrote:
>>>>
>>>> On Wed, Jul 7, 2021 at 8:09 AM Eduardo Habkost <ehabkost@redhat.com> wrote:
>>>>>
>>>>> CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware
>>>>> that the definition of "supported CPU features" will probably become a
>>>>> bit more complex in the future.
>>>>
>>>> Has there ever been a clear definition? Family, model, and stepping,
>>>> for instance: are these the only values supported? That would make
>>>> cross-platform migration impossible. What about the vendor string? Is
>>>> that the only value supported? That would make cross-vendor migration
>>>> impossible. For the maximum input value for basic CPUID information
>>>> (CPUID.0H:EAX), is that the only value supported, or is it the maximum
>>>> value supported? On the various individual feature bits, does a '1'
>>>> imply that '0' is also supported, or is '1' the only value supported?
>>>> What about the feature bits with reversed polarity (e.g.
>>>> CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])?
>>>>
>>>> This API has never made sense to me. I have no idea how to interpret
>>>> what it is telling me.
>>>
>>> Is this about GET_SUPPORTED_CPUID, QEMU's query-cpu-model-expansion &
>>> related commands, or the libvirt CPU APIs?
>>
>> This is my ongoing rant about KVM_GET_SUPPORTED_CPUID.
>>
> 
> I agree the definition is not clear. I have tried to enumerate below
> what QEMU assumes about the return value of KVM_GET_SUPPORTED_CPUID.
> These are a collection of workarounds and feature-specific rules that
> are encoded in the kvm_arch_get_supported_cpuid()
> x86_cpu_filter_features(), and cpu_x86_cpuid() functions in QEMU.
> 
> 1. Passing through the returned values (unchanged) from
> KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID is assumed to be always safe,
> as long as the ability to save/resume VCPU state is not required.
> (This is the behavior implemented by "-cpu host,migratable=off")

Right, this is basically the definition of KVM_GET_SUPPORTED_CPUID.

> 2. The safety of setting a bit to a different value requires specific
> knowledge about the CPUID bit.
> 2.1. For a specific set of registers (see below), QEMU assumes it's
> safe to set the bit to 0 when KVM_GET_SUPPORTED_CPUID returns 1.
> 2.2. For a few specific leaves (see below), there are more complex rules.
> 2.4. For all other leaves, QEMU doesn't use the return value of
> KVM_GET_SUPPORTED_CPUID at all (AFAICS).
> 
> 
> The CPUID leaves mentioned in 2.1 are:
> 
> CPUID[1].EDX
> CPUID[1].ECX
> CPUID[6].EAX
> CPUID[EAX=7,ECX=0].EBX
> - This unfortunately includes de-feature bits like FDP_EXCPTN_ONLY and
> ZERO_FCS_FDS
> CPUID[EAX=7,ECX=0].ECX
> CPUID[EAX=7,ECX=0].EDX
> CPUID[EAX=7,ECX=1].EAX
> CPUID[EAX=0Dh,ECX=0].EAX
> CPUID[EAX=0Dh,ECX=0].EDX
> CPUID[EAX=0Dh,ECX=1].EAX
> - Note that CPUID[0Dh] has additional logic to ensure XSAVE component
> info on CPUID is consistent
> CPUID[40000001h].EAX
> CPUID[40000001h].EDX
> CPUID[80000001h].EDX
> CPUID[80000001h].ECX
> CPUID[80000007h].EDX
> CPUID[80000008h].EBX
> CPUID[8000000Ah].EDX
> CPUID[C0000001h].EDX

Plus all unknown leaves.

> 
> Some of the CPUID leaves mentioned in 2.2 are:
> 
> CPUID[1].ECX.HYPERVISOR[bit 31]
> - Can be enabled unconditionally
> CPUID[1].ECX.TSC_DEADLINE_TIMER[bit 24]
> - Can be set to 1 if using the in-kernel irqchip and
> KVM_CAP_TSC_DEADLINE_TIMER is enabled
> CPUID[1].ECX.X2APIC[bit 21]
> - Can be set to 1 if using the in-kernel irqchip
> CPUID[1].ECX.MONITOR[bit 3]
> - Can be set to 1 if KVM_X86_DISABLE_EXITS_MWAIT is enabled

Can always be set to 1, but only makes sense to do so if 
KVM_X86_DISABLE_EXITS_MWAIT is enabled.

> CPUID[6].EAX.ARAT[bit 2]
> - Can be enabled unconditionally
> CPUID[EAX=7,ECX=0].EDX.ARCH_CAPABILITIES
> - Workaround for KVM bug in Linux v4.17-v4.20
> CPUID[EAX=14h,ECX=0], CPUID{EAX=14h,ECX=1]
> - Most bits must match the host, unless
> CPUID[EAX=7,ECX=0].EBX.INTEL_PT[bit 25] is 0
> CPUID[80000001h].EDX
> - AMD-specific feature flag aliases can be set based on CPUID[1].EDX
> CPUID[40000001h].EAX
> - KVM_FEATURE_PV_UNHALT requires in-kernel irqchip
> - KVM_FEATURE_MSI_EXT_DEST_ID requires split irqchip
> CPUID[40000001].EDX.KVM_HINTS_REALTIME
> - Can be enabled unconditionally

This should apply to all of CPUID[4000_0001h].EDX in the future

Thanks Eduardo, this is a great start for kernel-side documentation! 
I'll wrap it in a patch.

Paolo


^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2021-07-08  9:20 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-09 20:57 [PATCH 0/4] TSX force abort Pawan Gupta
2021-06-09 20:58 ` [PATCH 1/4] x86/msr: Define new bits in TSX_FORCE_ABORT MSR Pawan Gupta
2021-06-11  8:39   ` Borislav Petkov
2021-06-11 21:31     ` Pawan Gupta
2021-06-09 21:12 ` [PATCH 2/4] perf/x86/intel: Do not deploy workaround when TSX is deprecated Pawan Gupta
2021-06-11  7:50   ` Borislav Petkov
2021-06-11 21:34     ` Pawan Gupta
2021-06-11 22:01       ` Borislav Petkov
2021-06-11 23:21         ` Pawan Gupta
2021-06-09 21:13 ` [PATCH 3/4] x86/tsx: Clear CPUID bits when TSX always force aborts Pawan Gupta
2021-06-11 10:03   ` Borislav Petkov
2021-06-11 21:36     ` Pawan Gupta
2021-06-09 21:14 ` [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE Pawan Gupta
2021-06-11 10:06   ` Borislav Petkov
2021-06-11 21:37     ` Pawan Gupta
2021-07-06 19:52   ` Eduardo Habkost
2021-07-06 21:05     ` Paolo Bonzini
2021-07-06 21:33       ` Eduardo Habkost
2021-07-06 21:58         ` Paolo Bonzini
2021-07-07 15:08           ` Eduardo Habkost
2021-07-07 16:42             ` Jim Mattson
2021-07-07 17:08               ` Eduardo Habkost
2021-07-07 17:15                 ` Jim Mattson
2021-07-07 18:23                   ` Eduardo Habkost
2021-07-08  9:15                     ` Paolo Bonzini
2021-07-06 21:16     ` Pawan Gupta
2021-07-06 21:19       ` Eduardo Habkost
2021-07-06 22:51         ` Pawan Gupta

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).