* [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
@ 2022-04-11 18:01 Jon Kohler
  2022-04-11 19:26 ` Dave Hansen
  2022-04-11 20:07 ` [PATCH v2] " Jon Kohler
  0 siblings, 2 replies; 14+ messages in thread
From: Jon Kohler @ 2022-04-11 18:01 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Tony Luck, Jon Kohler, Andi Kleen, Pawan Gupta,
	linux-kernel
  Cc: Borislav Petkov, Neelima Krishnan, kvm @ vger . kernel . org

Move automatic disablement for TSX microcode deprecation from tsx_init() to
x86_get_tsx_auto_mode(), such that systems with tsx=on will continue to
see the TSX CPU features (HLE, RTM) even on updated microcode.

KVM live migration could possibly be broken in 5.14+ by commit 293649307ef9
("x86/tsx: Clear CPUID bits when TSX always force aborts"). Consider the
following scenario:

1. KVM hosts clustered in a live migration capable setup.
2. KVM guests have TSX CPU features HLE and/or RTM presented.
3. One of the three maintenance events occurs:
3a. An existing host running kernel >= 5.14 in the pool is updated with the
    new microcode.
3b. A new host running kernel >= 5.14 is commissioned that already has the
    microcode update preloaded.
3c. All hosts are running kernel < 5.14 with microcode update already
    loaded and one existing host gets updated to kernel >= 5.14.
4. After maintenance event, the impacted host will not have HLE and RTM
   exposed, and live migrations with guests with TSX features might not
   migrate.
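
The failure in step 4 follows from the usual live migration rule that the
destination host must offer every CPU feature the guest was started with.
A minimal sketch of that rule, assuming a simplified bitmask model: the
helper name can_migrate() is hypothetical and is not QEMU or KVM code; the
bit positions are those of CPUID.(EAX=7,ECX=0):EBX, where HLE is bit 4 and
RTM is bit 11.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EBX feature bits for TSX. */
#define FEAT_HLE (UINT32_C(1) << 4)
#define FEAT_RTM (UINT32_C(1) << 11)

/*
 * Hypothetical helper: a guest can move to a destination host only if
 * the host still exposes every feature bit the guest was booted with.
 */
static bool can_migrate(uint32_t guest_features, uint32_t host_features)
{
	return (guest_features & ~host_features) == 0;
}
```

In this model, a guest started with FEAT_HLE | FEAT_RTM cannot move to a
host whose kernel has cleared both bits, which is the situation described
in step 4.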

Users using tsx=on or CONFIG_X86_INTEL_TSX_MODE_ON should always see
HLE and RTM on capable Intel SKUs, even if microcode has been clubbed to
prevent functionality.

Users using tsx=auto or CONFIG_X86_INTEL_TSX_MODE_AUTO get to roll the
dice with whatever the kernel believes the appropriate default is, which
includes the feature disappearing after a kernel and/or microcode update.
These users should consider masking HLE and RTM at a higher control plane
level, e.g. qemu or libvirt, such that guests on TSX enabled systems do not
see HLE/RTM and therefore do not enable TAA mitigation.

Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts")

Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Neelima Krishnan <neelima.krishnan@intel.com>
Cc: kvm@vger.kernel.org <kvm@vger.kernel.org>
---
 arch/x86/kernel/cpu/tsx.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c
index 9c7a5f049292..a24e5e471e3f 100644
--- a/arch/x86/kernel/cpu/tsx.c
+++ b/arch/x86/kernel/cpu/tsx.c
@@ -78,6 +78,20 @@ static bool __init tsx_ctrl_is_supported(void)
 
 static enum tsx_ctrl_states x86_get_tsx_auto_mode(void)
 {
+	/*
+	 * Hardware will always abort a TSX transaction if both CPUID bits
+	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
+	 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
+	 * here.
+	 */
+	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
+	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
+		tsx_clear_cpuid();
+		setup_clear_cpu_cap(X86_FEATURE_RTM);
+		setup_clear_cpu_cap(X86_FEATURE_HLE);
+		return TSX_CTRL_RTM_ALWAYS_ABORT;
+	}
+
 	if (boot_cpu_has_bug(X86_BUG_TAA))
 		return TSX_CTRL_DISABLE;
 
@@ -105,21 +119,6 @@ void __init tsx_init(void)
 	char arg[5] = {};
 	int ret;
 
-	/*
-	 * Hardware will always abort a TSX transaction if both CPUID bits
-	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
-	 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
-	 * here.
-	 */
-	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
-	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
-		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
-		tsx_clear_cpuid();
-		setup_clear_cpu_cap(X86_FEATURE_RTM);
-		setup_clear_cpu_cap(X86_FEATURE_HLE);
-		return;
-	}
-
 	if (!tsx_ctrl_is_supported()) {
 		tsx_ctrl_state = TSX_CTRL_NOT_SUPPORTED;
 		return;
-- 
2.30.1 (Apple Git-130)



* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-11 18:01 [PATCH] x86/tsx: fix KVM guest live migration for tsx=on Jon Kohler
@ 2022-04-11 19:26 ` Dave Hansen
  2022-04-11 19:35   ` Jon Kohler
  2022-04-11 20:07 ` [PATCH v2] " Jon Kohler
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2022-04-11 19:26 UTC
  To: Jon Kohler, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Tony Luck, Andi Kleen,
	Pawan Gupta, linux-kernel
  Cc: Borislav Petkov, Neelima Krishnan, kvm @ vger . kernel . org

On 4/11/22 11:01, Jon Kohler wrote:
>  static enum tsx_ctrl_states x86_get_tsx_auto_mode(void)
>  {
> +	/*
> +	 * Hardware will always abort a TSX transaction if both CPUID bits
> +	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
> +	 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
> +	 * here.
> +	 */
> +	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
> +	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
> +		tsx_clear_cpuid();
> +		setup_clear_cpu_cap(X86_FEATURE_RTM);
> +		setup_clear_cpu_cap(X86_FEATURE_HLE);
> +		return TSX_CTRL_RTM_ALWAYS_ABORT;
> +	}

I don't really like hiding the setup_clear_cpu_cap() like this.  Right
now, all of the setup_clear_cpu_cap()'s are in a single function and
they are pretty easy to figure out.

This seems like logic that deserves to be appended down to the last if()
block of code in tsx_init() instead of squirreled away in a "get mode"
function.  Does this work?

        if (tsx_ctrl_state == TSX_CTRL_DISABLE) {
		...
        } else if (tsx_ctrl_state == TSX_CTRL_ENABLE) {
		...	
        } else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT) {
		tsx_clear_cpuid();

		setup_clear_cpu_cap(X86_FEATURE_RTM);
		setup_clear_cpu_cap(X86_FEATURE_HLE);
	}
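
The split suggested above (decide the state in one place, touch the CPU
capability bits only in the final if/else chain) can be modeled in user
space. This is a sketch under stated assumptions, not the kernel code:
struct cpu_model, resolve_state() and rtm_hle_enumerated() are hypothetical
names, the tsx_ctrl_is_supported() check is ignored, and the final
"return enable" in auto_mode() is assumed because the diff context does not
show the rest of x86_get_tsx_auto_mode().

```c
#include <stdbool.h>
#include <string.h>

/* Simplified subset of the kernel's enum tsx_ctrl_states. */
enum tsx_state {
	TSX_STATE_ENABLE,
	TSX_STATE_DISABLE,
	TSX_STATE_RTM_ALWAYS_ABORT,
};

/* Hypothetical stand-in for the boot_cpu_has()/boot_cpu_has_bug() checks. */
struct cpu_model {
	bool rtm_always_abort;
	bool tsx_force_abort;
	bool bug_taa;
};

/* Same order of checks as the proposed x86_get_tsx_auto_mode(). */
static enum tsx_state auto_mode(const struct cpu_model *cpu)
{
	if (cpu->rtm_always_abort && cpu->tsx_force_abort)
		return TSX_STATE_RTM_ALWAYS_ABORT;
	if (cpu->bug_taa)
		return TSX_STATE_DISABLE;
	return TSX_STATE_ENABLE;	/* assumed default for unaffected CPUs */
}

/*
 * "on" and "off" are explicit choices. Treating every other value as
 * "auto" is a simplification made for this sketch only.
 */
static enum tsx_state resolve_state(const char *arg, const struct cpu_model *cpu)
{
	if (strcmp(arg, "on") == 0)
		return TSX_STATE_ENABLE;
	if (strcmp(arg, "off") == 0)
		return TSX_STATE_DISABLE;
	return auto_mode(cpu);
}

/* Only the ENABLE branch of the final chain leaves RTM and HLE enumerated. */
static bool rtm_hle_enumerated(enum tsx_state state)
{
	return state == TSX_STATE_ENABLE;
}
```

With both abort bits set, resolve_state("on", ...) still yields a state in
which RTM and HLE stay enumerated, while resolve_state("auto", ...) yields
TSX_STATE_RTM_ALWAYS_ABORT and the bits are cleared.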



* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-11 19:26 ` Dave Hansen
@ 2022-04-11 19:35   ` Jon Kohler
  2022-04-11 23:45     ` Dave Hansen
  0 siblings, 1 reply; 14+ messages in thread
From: Jon Kohler @ 2022-04-11 19:35 UTC
  To: Dave Hansen
  Cc: Jon Kohler, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Tony Luck, Andi Kleen,
	Pawan Gupta, linux-kernel, Borislav Petkov, Neelima Krishnan,
	kvm @ vger . kernel . org



> On Apr 11, 2022, at 3:26 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 4/11/22 11:01, Jon Kohler wrote:
>> static enum tsx_ctrl_states x86_get_tsx_auto_mode(void)
>> {
>> +	/*
>> +	 * Hardware will always abort a TSX transaction if both CPUID bits
>> +	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
>> +	 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
>> +	 * here.
>> +	 */
>> +	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
>> +	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
>> +		tsx_clear_cpuid();
>> +		setup_clear_cpu_cap(X86_FEATURE_RTM);
>> +		setup_clear_cpu_cap(X86_FEATURE_HLE);
>> +		return TSX_CTRL_RTM_ALWAYS_ABORT;
>> +	}
> 
> I don't really like hiding the setup_clear_cpu_cap() like this.  Right
> now, all of the setup_clear_cpu_cap()'s are in a single function and
> they are pretty easy to figure out.
> 
> This seems like logic that deserves to be appended down to the last if()
> block of code in tsx_init() instead of squirreled away in a "get mode"
> function.  Does this work?

Thanks for the review, Dave. I was trying to keep the change simple
with just a cut-n-paste of existing code from one place to the other,
but I see what you’re saying. Yeah, I can rework the logic as you
suggested; I’ll send out a v2 patch.

Also, while I’ve got you, I’d also like to send out a patch to simply
force abort all transactions even when tsx=on, and just be done with
TSX. The functionality I’m patching here was introduced roughly a year
ago, and combined with the microcode going out, it seems like TSX’s
days are numbered.

That could greatly simplify the kernel’s handling of TAA on systems
that have ARCH_CAP_TSX_CTRL_MSR.

Thoughts?

>        if (tsx_ctrl_state == TSX_CTRL_DISABLE) {
> 		...
>        } else if (tsx_ctrl_state == TSX_CTRL_ENABLE) {
> 		...	
>        } else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT) {
> 		tsx_clear_cpuid();
> 
> 		setup_clear_cpu_cap(X86_FEATURE_RTM);
> 		setup_clear_cpu_cap(X86_FEATURE_HLE);
> 	}
> 



* [PATCH v2] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-11 18:01 [PATCH] x86/tsx: fix KVM guest live migration for tsx=on Jon Kohler
  2022-04-11 19:26 ` Dave Hansen
@ 2022-04-11 20:07 ` Jon Kohler
  2022-04-12 19:55   ` Pawan Gupta
  2022-04-12 20:54   ` Pawan Gupta
  1 sibling, 2 replies; 14+ messages in thread
From: Jon Kohler @ 2022-04-11 20:07 UTC
  To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andi Kleen, Pawan Gupta, Jon Kohler, Tony Luck,
	linux-kernel
  Cc: dave.hansen, Borislav Petkov, Neelima Krishnan,
	kvm @ vger . kernel . org

Move automatic disablement for TSX microcode deprecation from tsx_init() to
x86_get_tsx_auto_mode(), such that systems with tsx=on will continue to
see the TSX CPU features (HLE, RTM) even on updated microcode.

KVM live migration could possibly be broken in 5.14+ by commit 293649307ef9
("x86/tsx: Clear CPUID bits when TSX always force aborts"). Consider the
following scenario:

1. KVM hosts clustered in a live migration capable setup.
2. KVM guests have TSX CPU features HLE and/or RTM presented.
3. One of the three maintenance events occurs:
3a. An existing host running kernel >= 5.14 in the pool is updated with the
    new microcode.
3b. A new host running kernel >= 5.14 is commissioned that already has the
    microcode update preloaded.
3c. All hosts are running kernel < 5.14 with microcode update already
    loaded and one existing host gets updated to kernel >= 5.14.
4. After maintenance event, the impacted host will not have HLE and RTM
   exposed, and live migrations with guests with TSX features might not
   migrate.

Users using tsx=on or CONFIG_X86_INTEL_TSX_MODE_ON should always see
HLE and RTM on capable Intel SKUs, even if microcode has been clubbed to
prevent functionality.

Users using tsx=auto or CONFIG_X86_INTEL_TSX_MODE_AUTO get to roll the
dice with whatever the kernel believes the appropriate default is, which
includes the feature disappearing after a kernel and/or microcode update.
These users should consider masking HLE and RTM at a higher control plane
level, e.g. qemu or libvirt, such that guests on TSX enabled systems do not
see HLE/RTM and therefore do not enable TAA mitigation.

Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts")

Signed-off-by: Jon Kohler <jon@nutanix.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Neelima Krishnan <neelima.krishnan@intel.com>
Cc: kvm@vger.kernel.org <kvm@vger.kernel.org>
---
v1 -> v2:
 - Addressed comments on approach from Dave.

 arch/x86/kernel/cpu/tsx.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c
index 9c7a5f049292..4b701fa64869 100644
--- a/arch/x86/kernel/cpu/tsx.c
+++ b/arch/x86/kernel/cpu/tsx.c
@@ -78,6 +78,10 @@ static bool __init tsx_ctrl_is_supported(void)

 static enum tsx_ctrl_states x86_get_tsx_auto_mode(void)
 {
+	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
+	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT))
+		return TSX_CTRL_RTM_ALWAYS_ABORT;
+
 	if (boot_cpu_has_bug(X86_BUG_TAA))
 		return TSX_CTRL_DISABLE;

@@ -105,21 +109,6 @@ void __init tsx_init(void)
 	char arg[5] = {};
 	int ret;

-	/*
-	 * Hardware will always abort a TSX transaction if both CPUID bits
-	 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
-	 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
-	 * here.
-	 */
-	if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
-	    boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
-		tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
-		tsx_clear_cpuid();
-		setup_clear_cpu_cap(X86_FEATURE_RTM);
-		setup_clear_cpu_cap(X86_FEATURE_HLE);
-		return;
-	}
-
 	if (!tsx_ctrl_is_supported()) {
 		tsx_ctrl_state = TSX_CTRL_NOT_SUPPORTED;
 		return;
@@ -173,5 +162,16 @@ void __init tsx_init(void)
 		 */
 		setup_force_cpu_cap(X86_FEATURE_RTM);
 		setup_force_cpu_cap(X86_FEATURE_HLE);
+	} else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT) {
+
+		/*
+		 * Hardware will always abort a TSX transaction if both CPUID bits
+		 * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
+		 * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
+		 * here.
+		 */
+		tsx_clear_cpuid();
+		setup_clear_cpu_cap(X86_FEATURE_RTM);
+		setup_clear_cpu_cap(X86_FEATURE_HLE);
 	}
 }
--
2.30.1 (Apple Git-130)



* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-11 19:35   ` Jon Kohler
@ 2022-04-11 23:45     ` Dave Hansen
  2022-04-12 13:36       ` Jon Kohler
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2022-04-11 23:45 UTC
  To: Jon Kohler
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Tony Luck, Andi Kleen, Pawan Gupta, linux-kernel,
	Borislav Petkov, Neelima Krishnan, kvm @ vger . kernel . org

On 4/11/22 12:35, Jon Kohler wrote:
> Also, while I’ve got you, I’d also like to send out a patch to simply
> force abort all transactions even when tsx=on, and just be done with
> TSX. The functionality I’m patching here was introduced roughly a year
> ago, and combined with the microcode going out, it seems like TSX’s
> days are numbered.

Could you elaborate a little more here?  Why would we ever want to force
abort transactions that don't need to be aborted for some reason?


* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-11 23:45     ` Dave Hansen
@ 2022-04-12 13:36       ` Jon Kohler
  2022-04-12 15:54         ` Dave Hansen
  2022-04-12 20:40         ` Pawan Gupta
  0 siblings, 2 replies; 14+ messages in thread
From: Jon Kohler @ 2022-04-12 13:36 UTC
  To: Dave Hansen
  Cc: Jon Kohler, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Tony Luck, Andi Kleen,
	Pawan Gupta, linux-kernel, Borislav Petkov, Neelima Krishnan,
	kvm @ vger . kernel . org



> On Apr 11, 2022, at 7:45 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 4/11/22 12:35, Jon Kohler wrote:
>> Also, while I’ve got you, I’d also like to send out a patch to simply
>> force abort all transactions even when tsx=on, and just be done with
>> TSX. The functionality I’m patching here was introduced roughly a year
>> ago, and combined with the microcode going out, it seems like TSX’s
>> days are numbered.
> 
> Could you elaborate a little more here?  Why would we ever want to force
> abort transactions that don't need to be aborted for some reason?

Sure, I'm talking specifically about users running tsx=on (or
CONFIG_X86_INTEL_TSX_MODE_ON) on X86_BUG_TAA CPU SKUs. In this situation,
TSX features are enabled, as are TAA mitigations. Using our own use case
as an example, we only do this because of legacy live migration reasons.

This is fine on Skylake (because we're signed up for MDS mitigation anyhow)
and fine on Ice Lake because TAA_NO=1; however this is wicked painful on
Cascade Lake, because MDS_NO=1 and TAA_NO=0, so we're still signed up for
TAA mitigation by default. On CLX, this hits us on host syscalls as well as
vmexits with the mds clear on every one :(
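
The three cases above can be written down as a small decision function.
This is a sketch of the reasoning in this mail only, assuming a simplified
model: struct arch_caps, enum clear_reason and buffer_clear_reason() are
hypothetical names, and the kernel's real mitigation selection considers
more inputs than these two capability bits.

```c
#include <stdbool.h>

/* The two IA32_ARCH_CAPABILITIES bits discussed above. */
struct arch_caps {
	bool mds_no;	/* CPU is not affected by MDS */
	bool taa_no;	/* CPU is not affected by TAA */
};

enum clear_reason {
	CLEAR_NONE,		/* no buffer clearing needed for MDS or TAA */
	CLEAR_FOR_MDS,		/* already clearing for MDS; TAA adds nothing */
	CLEAR_FOR_TAA_ONLY,	/* clearing is paid for TAA alone */
};

/* Hypothetical helper: why would CPU buffers be cleared on this part? */
static enum clear_reason buffer_clear_reason(struct arch_caps caps,
					     bool tsx_enabled)
{
	if (!caps.mds_no)
		return CLEAR_FOR_MDS;
	if (tsx_enabled && !caps.taa_no)
		return CLEAR_FOR_TAA_ONLY;
	return CLEAR_NONE;
}
```

With TSX enabled, the Skylake-style input (MDS_NO=0) gives CLEAR_FOR_MDS,
the Ice Lake-style input (TAA_NO=1) gives CLEAR_NONE, and the Cascade
Lake-style input (MDS_NO=1, TAA_NO=0) gives CLEAR_FOR_TAA_ONLY, which is
the extra cost being described.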

So tsx=on is this oddball for us, because if we switch to auto, we'll break
live migration for some of our customers (but TAA overhead is gone), but
if we leave tsx=on, we keep the feature enabled (but no one likely uses it)
and still have to pay the TAA tax even if a customer doesn't use it.

So my theory here is to extend the logical effort of the microcode driven
automatic disablement as well as the tsx=auto automatic disablement and
have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
the CPU features enumerated to maintain live migration.

This would still leave TSX totally good on Ice Lake / non-buggy systems.

If it would help, I'm working up an RFC patch, and we could discuss there?

In the meantime, I did send out a v2 patch for this series addressing your
comments.

Thanks again,
Jon


* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-12 13:36       ` Jon Kohler
@ 2022-04-12 15:54         ` Dave Hansen
  2022-04-12 16:08           ` Jon Kohler
  2022-04-12 20:40         ` Pawan Gupta
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Hansen @ 2022-04-12 15:54 UTC
  To: Jon Kohler
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Tony Luck, Andi Kleen, Pawan Gupta, linux-kernel,
	Borislav Petkov, Neelima Krishnan, kvm @ vger . kernel . org

On 4/12/22 06:36, Jon Kohler wrote:
> So my theory here is to extend the logical effort of the microcode driven
> automatic disablement as well as the tsx=auto automatic disablement and
> have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
> the CPU features enumerated to maintain live migration.
> 
> This would still leave TSX totally good on Ice Lake / non-buggy systems.
> 
> If it would help, I'm working up an RFC patch, and we could discuss there?

Sure.  But, it sounds like you really want a new tsx=something rather
than to muck with tsx=on behavior.  Surely someone else will come along
and complain that we broke their TSX setup if we change its behavior.

Maybe you should just pay the one-time cost and move your whole fleet
over to tsx=off if you truly believe nobody is using it.



* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-12 15:54         ` Dave Hansen
@ 2022-04-12 16:08           ` Jon Kohler
  2022-04-12 18:04             ` Pawan Gupta
  0 siblings, 1 reply; 14+ messages in thread
From: Jon Kohler @ 2022-04-12 16:08 UTC
  To: Dave Hansen
  Cc: Jon Kohler, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Tony Luck, Andi Kleen,
	Pawan Gupta, linux-kernel, Borislav Petkov, Neelima Krishnan,
	kvm @ vger . kernel . org



> On Apr 12, 2022, at 11:54 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 4/12/22 06:36, Jon Kohler wrote:
>> So my theory here is to extend the logical effort of the microcode driven
>> automatic disablement as well as the tsx=auto automatic disablement and
>> have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
>> the CPU features enumerated to maintain live migration.
>> 
>> This would still leave TSX totally good on Ice Lake / non-buggy systems.
>> 
>> If it would help, I'm working up an RFC patch, and we could discuss there?
> 
> Sure.  But, it sounds like you really want a new tsx=something rather
> than to muck with tsx=on behavior.  Surely someone else will come along
> and complain that we broke their TSX setup if we change its behavior.

Good point, there will always be a squeaky wheel. I’ll work that into the RFC,
I’ll do something like tsx=compat and see how it shapes up. 

To be fair though, this commit I’m patching with this series would break
setups as they apply 5.14+ and the microcode update, but you have a 
good point for certain.

> 
> Maybe you should just pay the one-time cost and move your whole fleet
> over to tsx=off if you truly believe nobody is using it.
> 

Trust me, I’d love to do that; however:
We’ve thousands of hosts across thousands of unique customers,
which aren't managed as a centralized service (customers manage them directly),
so doing that would require each individual customer to organize a full power
cycle for all of their VMs prior to an upgrade to tsx=off hosts.

That said, we are marching in that direction, we're shipping a control plane
update that will mask HLE and RTM after power cycles, but that requires
customers to apply that control plane update, then power cycle everything. It
just means that although we've begun the feature deprecation now, it will take
years to fully bleed off unless we have customers micromanage full power cycles.



* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-12 16:08           ` Jon Kohler
@ 2022-04-12 18:04             ` Pawan Gupta
  2022-04-12 18:12               ` Jon Kohler
  0 siblings, 1 reply; 14+ messages in thread
From: Pawan Gupta @ 2022-04-12 18:04 UTC
  To: Jon Kohler
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Tony Luck, Andi Kleen,
	linux-kernel, Borislav Petkov, Neelima Krishnan,
	kvm @ vger . kernel . org

On Tue, Apr 12, 2022 at 04:08:32PM +0000, Jon Kohler wrote:
>
>
>> On Apr 12, 2022, at 11:54 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> On 4/12/22 06:36, Jon Kohler wrote:
>>> So my theory here is to extend the logical effort of the microcode driven
>>> automatic disablement as well as the tsx=auto automatic disablement and
>>> have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
>>> the CPU features enumerated to maintain live migration.
>>>
>>> This would still leave TSX totally good on Ice Lake / non-buggy systems.
>>>
>>> If it would help, I'm working up an RFC patch, and we could discuss there?
>>
>> Sure.  But, it sounds like you really want a new tsx=something rather
>> than to muck with tsx=on behavior.  Surely someone else will come along
>> and complain that we broke their TSX setup if we change its behavior.
>
>Good point, there will always be a squeaky wheel. I’ll work that into the RFC,
>I’ll do something like tsx=compat and see how it shapes up.

FYI, the original series had tsx=fake, which would have taken care of
this breakage.

   https://lore.kernel.org/lkml/de6b97a567e273adff1f5268998692bad548aa10.1623272033.git-series.pawan.kumar.gupta@linux.intel.com/

For lack of real-world use cases at the time, this patch was dropped.

Thanks,
Pawan


* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-12 18:04             ` Pawan Gupta
@ 2022-04-12 18:12               ` Jon Kohler
  0 siblings, 0 replies; 14+ messages in thread
From: Jon Kohler @ 2022-04-12 18:12 UTC
  To: Pawan Gupta
  Cc: Jon Kohler, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Tony Luck,
	Andi Kleen, linux-kernel, Borislav Petkov, Neelima Krishnan,
	kvm @ vger . kernel . org



> On Apr 12, 2022, at 2:04 PM, Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:
> 
> On Tue, Apr 12, 2022 at 04:08:32PM +0000, Jon Kohler wrote:
>> 
>> 
>>> On Apr 12, 2022, at 11:54 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>> 
>>> On 4/12/22 06:36, Jon Kohler wrote:
>>>> So my theory here is to extend the logical effort of the microcode driven
>>>> automatic disablement as well as the tsx=auto automatic disablement and
>>>> have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
>>>> the CPU features enumerated to maintain live migration.
>>>> 
>>>> This would still leave TSX totally good on Ice Lake / non-buggy systems.
>>>> 
>>>> If it would help, I'm working up an RFC patch, and we could discuss there?
>>> 
>>> Sure.  But, it sounds like you really want a new tsx=something rather
>>> than to muck with tsx=on behavior.  Surely someone else will come along
>>> and complain that we broke their TSX setup if we change its behavior.
>> 
>> Good point, there will always be a squeaky wheel. I’ll work that into the RFC,
>> I’ll do something like tsx=compat and see how it shapes up.
> 
> FYI, the original series had tsx=fake, which would have taken care of
> this breakage.

Fake sounds way better than compat, which is what I had :) 

My RFC code looks similar to your patch, I’ll combine the
approaches and send it out shortly, almost done

> 
>  https://lore.kernel.org/lkml/de6b97a567e273adff1f5268998692bad548aa10.1623272033.git-series.pawan.kumar.gupta@linux.intel.com/
> For lack of real-world use cases at the time, this patch was dropped.
> 
> Thanks,
> Pawan



* Re: [PATCH v2] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-11 20:07 ` [PATCH v2] " Jon Kohler
@ 2022-04-12 19:55   ` Pawan Gupta
  2022-04-12 20:54   ` Pawan Gupta
  1 sibling, 0 replies; 14+ messages in thread
From: Pawan Gupta @ 2022-04-12 19:55 UTC
  To: Jon Kohler
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andi Kleen, Tony Luck, linux-kernel, dave.hansen,
	Borislav Petkov, Neelima Krishnan, kvm @ vger . kernel . org

On Mon, Apr 11, 2022 at 04:07:01PM -0400, Jon Kohler wrote:
>Move automatic disablement for TSX microcode deprecation from tsx_init() to
>x86_get_tsx_auto_mode(), such that systems with tsx=on will continue to
>see the TSX CPU features (HLE, RTM) even on updated microcode.

This patch needs to be based on recent changes in TSX handling (due to
Feb 2022 microcode update). These patches were recently merged in tip
tree:

   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86/urgent

Specifically these patches:

   x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits [1]
   x86/tsx: Disable TSX development mode at boot [2]

Thanks,
Pawan

[1] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=258f3b8c3210b03386e4ad92b4bd8652b5c1beb3
[2] https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=400331f8ffa3bec5c561417e5eec6848464e9160


* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-12 13:36       ` Jon Kohler
  2022-04-12 15:54         ` Dave Hansen
@ 2022-04-12 20:40         ` Pawan Gupta
  2022-04-13 12:43           ` Jon Kohler
  1 sibling, 1 reply; 14+ messages in thread
From: Pawan Gupta @ 2022-04-12 20:40 UTC
  To: Jon Kohler
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	Dave Hansen, x86, H. Peter Anvin, Tony Luck, Andi Kleen,
	linux-kernel, Borislav Petkov, Neelima Krishnan,
	kvm @ vger . kernel . org

On Tue, Apr 12, 2022 at 01:36:20PM +0000, Jon Kohler wrote:
>
>
>> On Apr 11, 2022, at 7:45 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> On 4/11/22 12:35, Jon Kohler wrote:
>>> Also, while I’ve got you, I’d also like to send out a patch to simply
>>> force abort all transactions even when tsx=on, and just be done with
>>> TSX. The functionality I’m patching here was introduced roughly a year
>>> ago, and combined with the microcode going out, it seems like TSX’s
>>> days are numbered.
>>
>> Could you elaborate a little more here?  Why would we ever want to force
>> abort transactions that don't need to be aborted for some reason?
>
>Sure, I'm talking specifically about users running tsx=on (or
>CONFIG_X86_INTEL_TSX_MODE_ON) on X86_BUG_TAA CPU SKUs. In this situation,
>TSX features are enabled, as are TAA mitigations. Using our own use case
>as an example, we only do this because of legacy live migration reasons.
>
>This is fine on Skylake (because we're signed up for MDS mitigation anyhow)
>and fine on Ice Lake because TAA_NO=1; however this is wicked painful on
>Cascade Lake, because MDS_NO=1 and TAA_NO=0, so we're still signed up for
>TAA mitigation by default. On CLX, this hits us on host syscalls as well as
>vmexits with the mds clear on every one :(
>
>So tsx=on is this oddball for us, because if we switch to auto, we'll break
>live migration for some of our customers (but TAA overhead is gone), but
>if we leave tsx=on, we keep the feature enabled (but no one likely uses it)
>and still have to pay the TAA tax even if a customer doesn't use it.
>
>So my theory here is to extend the logical effort of the microcode driven
>automatic disablement as well as the tsx=auto automatic disablement and
>have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
>the CPU features enumerated to maintain live migration.

This won't help on CLX as server parts did not get the microcode driven
automatic disablement. On CLX CPUID.RTM_ALWAYS_ABORT will not be set.

What could work on CLX is TSX_CTRL_RTM_DISABLE=1 and
TSX_CTRL_CPUID_CLEAR=0. This can be done for tsx=auto or with a new mode
tsx=fake|compat. IMO, adding a new mode would be better, otherwise
tsx=auto behavior will differ depending on the kernel version.
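
The combination described above can be sketched as a pure function over the
IA32_TSX_CTRL value. The two bit definitions match the kernel's
TSX_CTRL_RTM_DISABLE (bit 0) and TSX_CTRL_CPUID_CLEAR (bit 1); everything
else is an assumption made for illustration: enum tsx_mode,
TSX_MODE_FAKE and tsx_ctrl_value() are hypothetical names, and no MSR is
actually read or written here.

```c
#include <stdint.h>

/* Bits of IA32_TSX_CTRL (MSR 0x122). */
#define TSX_CTRL_RTM_DISABLE	(UINT64_C(1) << 0)
#define TSX_CTRL_CPUID_CLEAR	(UINT64_C(1) << 1)

/* Hypothetical modes; TSX_MODE_FAKE stands for the proposed tsx=fake|compat. */
enum tsx_mode {
	TSX_MODE_ON,
	TSX_MODE_OFF,
	TSX_MODE_FAKE,
};

/* Compute the new MSR value from the old one, preserving unrelated bits. */
static uint64_t tsx_ctrl_value(uint64_t old, enum tsx_mode mode)
{
	uint64_t val = old & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR);

	switch (mode) {
	case TSX_MODE_ON:
		/* Transactions run and CPUID enumerates RTM/HLE. */
		break;
	case TSX_MODE_OFF:
		/* Transactions abort and CPUID hides RTM/HLE. */
		val |= TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;
		break;
	case TSX_MODE_FAKE:
		/* Transactions abort but CPUID still enumerates RTM/HLE. */
		val |= TSX_CTRL_RTM_DISABLE;
		break;
	}
	return val;
}
```

The third case is the one relevant to live migration: guests keep seeing
the feature bits while every transaction takes the abort path.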

This assumes that software using TSX follows the guidance below [*]:

   When Intel TSX is disabled at runtime using TSX_CTRL, but the CPUID
   enumeration of Intel TSX is not cleared, existing software using RTM may
   see aborts for every transaction. The abort will always return a 0
   status code in EAX after XBEGIN. When the software does a number of
   transaction retries, it should never retry for a 0 status value, but go
   to the nontransactional fall back path immediately.
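
A retry loop that follows this guidance could look like the sketch below.
To keep it testable without TSX hardware, the transaction-begin step is
passed in as a callback instead of calling the XBEGIN instruction directly;
xbegin_fn, tx_begin_with_retries() and the two stubs are hypothetical names.
The status constants mirror the values of _XBEGIN_STARTED and _XABORT_RETRY
from <immintrin.h>.

```c
#include <stdbool.h>

#define XBEGIN_STARTED	(~0u)		/* transaction started */
#define XABORT_RETRY	(1u << 1)	/* hardware hints a retry may succeed */

/* Hypothetical callback standing in for XBEGIN; returns its status. */
typedef unsigned (*xbegin_fn)(void *ctx);

/*
 * Try to start a transaction at most max_attempts times. Returns true if
 * a transaction started, false if the caller must take the
 * nontransactional fallback path.
 */
static bool tx_begin_with_retries(xbegin_fn begin, void *ctx,
				  int max_attempts, int *attempts_made)
{
	*attempts_made = 0;
	while (*attempts_made < max_attempts) {
		unsigned status = begin(ctx);

		(*attempts_made)++;
		if (status == XBEGIN_STARTED)
			return true;
		if (status == 0)
			break;	/* zero status: never retry, fall back now */
		if (!(status & XABORT_RETRY))
			break;	/* no retry hint: fall back as well */
	}
	return false;
}

/* Stub: behaves like RTM disabled via TSX_CTRL, every begin aborts with 0. */
static unsigned stub_always_zero(void *ctx)
{
	(void)ctx;
	return 0;
}

/* Stub: first attempt aborts with the retry hint, second one starts. */
static unsigned stub_retry_then_start(void *ctx)
{
	int *calls = ctx;

	return (*calls)++ == 0 ? XABORT_RETRY : XBEGIN_STARTED;
}
```

With stub_always_zero the loop gives up after a single attempt, which is
the behavior the guidance asks for when TSX is disabled but still
enumerated.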

Thanks,
Pawan

[*] TAA document: section -> Implications on Intel TSX software
     https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-tsx-asynchronous-abort.html


* Re: [PATCH v2] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-11 20:07 ` [PATCH v2] " Jon Kohler
  2022-04-12 19:55   ` Pawan Gupta
@ 2022-04-12 20:54   ` Pawan Gupta
  1 sibling, 0 replies; 14+ messages in thread
From: Pawan Gupta @ 2022-04-12 20:54 UTC
  To: Jon Kohler
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Andi Kleen, Tony Luck, linux-kernel, dave.hansen,
	Borislav Petkov, Neelima Krishnan, kvm @ vger . kernel . org

On Mon, Apr 11, 2022 at 04:07:01PM -0400, Jon Kohler wrote:
>Move automatic disablement for TSX microcode deprecation from tsx_init() to
>x86_get_tsx_auto_mode(), such that systems with tsx=on will continue to
>see the TSX CPU features (HLE, RTM) even on updated microcode.
>
>KVM live migration could possibly be broken in 5.14+ by commit 293649307ef9
>("x86/tsx: Clear CPUID bits when TSX always force aborts"). Consider the
>following scenario:
>
>1. KVM hosts clustered in a live migration capable setup.
>2. KVM guests have TSX CPU features HLE and/or RTM presented.
>3. One of the three maintenance events occurs:
>3a. An existing host running kernel >= 5.14 in the pool is updated with the
>    new microcode.
>3b. A new host running kernel >= 5.14 is commissioned that already has the
>    microcode update preloaded.
>3c. All hosts are running kernel < 5.14 with microcode update already
>    loaded and one existing host gets updated to kernel >= 5.14.
>4. After maintenance event, the impacted host will not have HLE and RTM
>   exposed, and live migrations with guests with TSX features might not
>   migrate.

Which part was this reproduced on? AFAIK server parts (except for some
Intel Xeon E3s) did not get such a microcode update.

Thanks,
Pawan

* Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
  2022-04-12 20:40         ` Pawan Gupta
@ 2022-04-13 12:43           ` Jon Kohler
  0 siblings, 0 replies; 14+ messages in thread
From: Jon Kohler @ 2022-04-13 12:43 UTC (permalink / raw)
  To: Pawan Gupta
  Cc: Jon Kohler, Dave Hansen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Tony Luck,
	Andi Kleen, linux-kernel, Borislav Petkov, Neelima Krishnan,
	kvm @ vger . kernel . org



> On Apr 12, 2022, at 4:40 PM, Pawan Gupta <pawan.kumar.gupta@linux.intel.com> wrote:
> 
> On Tue, Apr 12, 2022 at 01:36:20PM +0000, Jon Kohler wrote:
>> 
>> 
>>> On Apr 11, 2022, at 7:45 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>>> 
>>> On 4/11/22 12:35, Jon Kohler wrote:
>>>> Also, while I’ve got you, I’d also like to send out a patch to simply
>>>> force abort all transactions even when tsx=on, and just be done with
>>>> TSX. Now that we’ve had the patch that introduced this functionality
>>>> I’m patching for roughly a year, combined with the microcode going
>>>> out, it seems like TSX’s numbered days have come to an end.
>>> 
>>> Could you elaborate a little more here?  Why would we ever want to force
>>> abort transactions that don't need to be aborted for some reason?
>> 
>> Sure, I'm talking specifically about users of tsx=on (or
>> CONFIG_X86_INTEL_TSX_MODE_ON) on X86_BUG_TAA CPU SKUs. In this situation,
>> TSX features are enabled, as are TAA mitigations. Using our own use case
>> as an example, we only do this because of legacy live migration reasons.
>> 
>> This is fine on Skylake (because we're signed up for MDS mitigation anyhow)
>> and fine on Ice Lake because TAA_NO=1; however this is wicked painful on
>> Cascade Lake, because MDS_NO=1 and TAA_NO=0, so we're still signed up for
>> TAA mitigation by default. On CLX, this hits us on host syscalls as well as
>> vmexits with the mds clear on every one :(
>> 
>> So tsx=on is this oddball for us, because if we switch to auto, we'll break
>> live migration for some of our customers (but TAA overhead is gone), but
>> if we leave tsx=on, we keep the feature enabled (but no one likely uses it)
>> and still have to pay the TAA tax even if a customer doesn't use it.
>> 
>> So my theory here is to extend the logical effort of the microcode driven
>> automatic disablement as well as the tsx=auto automatic disablement and
>> have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
>> the CPU features enumerated to maintain live migration.
> 
> This won't help on CLX as server parts did not get the microcode driven
> automatic disablement. On CLX CPUID.RTM_ALWAYS_ABORT will not be set.
> 
> What could work on CLX is TSX_CTRL_RTM_DISABLE=1 and
> TSX_CTRL_CPUID_CLEAR=0. This can be done for tsx=auto or with a new mode
> tsx=fake|compat. IMO, adding a new mode would be better, otherwise
> tsx=auto behavior will differ depending on the kernel version.

Thanks for the guidance, Pawan, I appreciate it. This is exactly the
approach my other patch is taking. I need to do a bit more review and
testing, and I'll get the RFC out.
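
For illustration, the TSX_CTRL combination suggested above
(TSX_CTRL_RTM_DISABLE=1, TSX_CTRL_CPUID_CLEAR=0) amounts to the bit
manipulation sketched below. This is not the RFC patch itself; the helper
name tsx_ctrl_compat_value() is hypothetical. The MSR index and bit
positions follow the definitions in the kernel's
arch/x86/include/asm/msr-index.h.

```c
#include <stdint.h>

/* As defined in arch/x86/include/asm/msr-index.h. */
#define MSR_IA32_TSX_CTRL    0x00000122
#define TSX_CTRL_RTM_DISABLE (1ULL << 0) /* RTM transactions always abort */
#define TSX_CTRL_CPUID_CLEAR (1ULL << 1) /* hide HLE/RTM in CPUID */

/*
 * Hypothetical helper: given the current MSR_IA32_TSX_CTRL value, return
 * the value for a "disabled but still enumerated" mode. RTM is forced to
 * abort, while CPUID keeps advertising HLE/RTM so that guests with those
 * features can still be migrated to this host.
 */
static uint64_t tsx_ctrl_compat_value(uint64_t cur)
{
    cur |= TSX_CTRL_RTM_DISABLE;
    cur &= ~TSX_CTRL_CPUID_CLEAR;
    return cur;
}
```

In a kernel context the result would be written back with wrmsrl() on each
CPU; that part is omitted here because it cannot run in userspace.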

> 
> Provided that software using TSX follows the guidance below [*]:
> 
>  When Intel TSX is disabled at runtime using TSX_CTRL, but the CPUID
>  enumeration of Intel TSX is not cleared, existing software using RTM may
>  see aborts for every transaction. The abort will always return a 0
>  status code in EAX after XBEGIN. When the software does a number of
>  transaction retries, it should never retry for a 0 status value, but go
>  to the nontransactional fall back path immediately.
> 
> Thanks,
> Pawan
> 
> [*] TAA document: section -> Implications on Intel TSX software
>    https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-tsx-asynchronous-abort.html


end of thread, other threads:[~2022-04-13 12:44 UTC | newest]

Thread overview: 14+ messages
2022-04-11 18:01 [PATCH] x86/tsx: fix KVM guest live migration for tsx=on Jon Kohler
2022-04-11 19:26 ` Dave Hansen
2022-04-11 19:35   ` Jon Kohler
2022-04-11 23:45     ` Dave Hansen
2022-04-12 13:36       ` Jon Kohler
2022-04-12 15:54         ` Dave Hansen
2022-04-12 16:08           ` Jon Kohler
2022-04-12 18:04             ` Pawan Gupta
2022-04-12 18:12               ` Jon Kohler
2022-04-12 20:40         ` Pawan Gupta
2022-04-13 12:43           ` Jon Kohler
2022-04-11 20:07 ` [PATCH v2] " Jon Kohler
2022-04-12 19:55   ` Pawan Gupta
2022-04-12 20:54   ` Pawan Gupta
