From: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
To: Jon Kohler <jon@nutanix.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"x86@kernel.org" <x86@kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>, Tony Luck <tony.luck@intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Borislav Petkov <bp@suse.de>,
	Neelima Krishnan <neelima.krishnan@intel.com>,
	"kvm @ vger . kernel . org" <kvm@vger.kernel.org>
Subject: Re: [PATCH] x86/tsx: fix KVM guest live migration for tsx=on
Date: Tue, 12 Apr 2022 13:40:25 -0700
Message-ID: <20220412204025.evmoxjr5beqindro@guptapa-desk>
In-Reply-To: <1767A554-CC0A-412D-B70C-12DF0AF4C690@nutanix.com>

On Tue, Apr 12, 2022 at 01:36:20PM +0000, Jon Kohler wrote:
>
>
>> On Apr 11, 2022, at 7:45 PM, Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> On 4/11/22 12:35, Jon Kohler wrote:
>>> Also, while I’ve got you, I’d also like to send out a patch to simply
>>> force abort all transactions even when tsx=on, and just be done with
>>> TSX. Now that we’ve had the patch that introduced this functionality
>>> I’m patching for roughly a year, combined with the microcode going
>>> out, it seems like TSX’s numbered days have come to an end.
>>
>> Could you elaborate a little more here?  Why would we ever want to force
>> abort transactions that don't need to be aborted for some reason?
>
>Sure, I'm talking specifically about users of tsx=on (or
>CONFIG_X86_INTEL_TSX_MODE_ON) on X86_BUG_TAA CPU SKUs. In this situation,
>TSX features are enabled, as are TAA mitigations. Using our own use case
>as an example, we only do this because of legacy live migration reasons.
>
>This is fine on Skylake (because we're signed up for MDS mitigation anyhow)
>and fine on Ice Lake because TAA_NO=1; however this is wicked painful on
>Cascade Lake, because MDS_NO=1 and TAA_NO=0, so we're still signed up for
>TAA mitigation by default. On CLX, this hits us on host syscalls as well as
>vmexits with the mds clear on every one :(
>
>So tsx=on is this oddball for us, because if we switch to auto, we'll break
>live migration for some of our customers (but TAA overhead is gone), but
>if we leave tsx=on, we keep the feature enabled (but no one likely uses it)
>and still have to pay the TAA tax even if a customer doesn't use it.
>
>So my theory here is to extend the logical effort of the microcode driven
>automatic disablement as well as the tsx=auto automatic disablement and
>have tsx=on force abort all transactions on X86_BUG_TAA SKUs, but leave
>the CPU features enumerated to maintain live migration.

This won't help on CLX, because server parts did not get the
microcode-driven automatic disablement. On CLX, CPUID.RTM_ALWAYS_ABORT
will not be set.

What could work on CLX is TSX_CTRL_RTM_DISABLE=1 and
TSX_CTRL_CPUID_CLEAR=0. This can be done for tsx=auto or with a new mode
tsx=fake|compat. IMO, adding a new mode would be better; otherwise
tsx=auto behavior will differ depending on the kernel version.
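
As a rough illustration of the bit combination above, here is a minimal
user-space sketch that only computes the IA32_TSX_CTRL value for each
mode; it does not touch any MSR. The MSR index and bit positions follow
the definitions in the kernel's msr-index.h. The enum, the
TSX_MODE_COMPAT name and tsx_ctrl_for_mode() are hypothetical, made up
for this example, and are not existing kernel code.

```c
#include <stdint.h>

/* IA32_TSX_CTRL layout, matching the kernel's msr-index.h definitions. */
#define MSR_IA32_TSX_CTRL     0x00000122u
#define TSX_CTRL_RTM_DISABLE  (1ull << 0) /* every RTM transaction aborts */
#define TSX_CTRL_CPUID_CLEAR  (1ull << 1) /* hide RTM/HLE from CPUID */

/* Hypothetical mode names; TSX_MODE_COMPAT stands in for tsx=fake|compat. */
enum tsx_mode { TSX_MODE_ON, TSX_MODE_OFF, TSX_MODE_COMPAT };

/*
 * Return the new IA32_TSX_CTRL value for a mode, preserving all bits of
 * the current value other than the two control bits.
 */
uint64_t tsx_ctrl_for_mode(uint64_t cur, enum tsx_mode mode)
{
	uint64_t val = cur & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR);

	switch (mode) {
	case TSX_MODE_ON:
		/* Both bits clear: RTM usable and enumerated. */
		break;
	case TSX_MODE_OFF:
		/* RTM disabled and hidden from CPUID. */
		val |= TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR;
		break;
	case TSX_MODE_COMPAT:
		/* RTM force-aborted, CPUID bits left enumerated. */
		val |= TSX_CTRL_RTM_DISABLE;
		break;
	}
	return val;
}
```

In this sketch the compat mode is the only one that sets RTM_DISABLE
while leaving CPUID_CLEAR at 0, which is the property that keeps the
guest-visible feature bits stable for live migration.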

The TSX_CTRL approach works provided that software using TSX follows
the guidance below [*]:

   When Intel TSX is disabled at runtime using TSX_CTRL, but the CPUID
   enumeration of Intel TSX is not cleared, existing software using RTM may
   see aborts for every transaction. The abort will always return a 0
   status code in EAX after XBEGIN. When the software does a number of
   transaction retries, it should never retry for a 0 status value, but go
   to the nontransactional fall back path immediately.
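
To make the retry rule concrete, here is a small sketch of one possible
retry policy. It deliberately does not execute XBEGIN: the begin
function is passed in as a pointer so the policy can be exercised on
any machine. XBEGIN_STARTED and XABORT_RETRY use the same values as
_XBEGIN_STARTED and _XABORT_RETRY in immintrin.h. The helper names
(tx_try_enter, always_abort_begin, transient_abort_begin) are
hypothetical, and retrying only when the retry bit is set is an
assumption of this sketch, not part of the quoted guidance.

```c
#include <stdbool.h>

#define XBEGIN_STARTED (~0u)     /* same value as _XBEGIN_STARTED */
#define XABORT_RETRY   (1u << 1) /* same value as _XABORT_RETRY */

/* Stand-in for _xbegin(): returns XBEGIN_STARTED or an abort status. */
typedef unsigned (*tx_begin_fn)(void);

int begin_calls; /* number of begin attempts, for the fakes below */

/* Models RTM disabled via TSX_CTRL but still enumerated: status is 0. */
unsigned always_abort_begin(void)
{
	begin_calls++;
	return 0;
}

/* Models a transient abort that reports "retry may succeed" each time. */
unsigned transient_abort_begin(void)
{
	begin_calls++;
	return XABORT_RETRY;
}

/*
 * Return true if a transaction started, false if the caller should take
 * the nontransactional fallback path. A 0 status is never retried.
 */
bool tx_try_enter(tx_begin_fn begin, int max_retries)
{
	for (int attempt = 0; attempt <= max_retries; attempt++) {
		unsigned status = begin();

		if (status == XBEGIN_STARTED)
			return true;
		if (status == 0)
			return false; /* per the guidance: fall back at once */
		if (!(status & XABORT_RETRY))
			return false; /* sketch's choice: no retry hint, give up */
	}
	return false; /* retries exhausted */
}
```

With always_abort_begin, tx_try_enter() makes exactly one attempt and
then falls back, so software written this way pays only a single
aborted XBEGIN per critical section when RTM is force-aborted.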

Thanks,
Pawan

[*] TAA document: section -> Implications on Intel TSX software
     https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/intel-tsx-asynchronous-abort.html
