From: Sean Christopherson <seanjc@google.com>
To: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Josh Poimboeuf <jpoimboe@kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Paolo Bonzini <pbonzini@redhat.com>,
	tony.luck@intel.com, ak@linux.intel.com,
	tim.c.chen@linux.intel.com, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kvm@vger.kernel.org,
	Alyssa Milburn <alyssa.milburn@linux.intel.com>,
	Daniel Sneddon <daniel.sneddon@linux.intel.com>,
	antonio.gomez.iglesias@linux.intel.com
Subject: Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation
Date: Thu, 26 Oct 2023 12:30:55 -0700
Message-ID: <ZTq-b0uVyf6KLNV0@google.com>
In-Reply-To: <20231025-delay-verw-v3-6-52663677ee35@linux.intel.com>

On Wed, Oct 25, 2023, Pawan Gupta wrote:
> During VMentry, VERW is executed to mitigate MDS. After VERW, any memory
> access, such as a register push onto the stack, may put host data into
> MDS-affected CPU buffers. A guest can then use MDS to sample host data.
> 
> Although the likelihood of secrets surviving in registers at the current
> VERW callsite is low, it can't be ruled out. Harden the MDS mitigation
> by moving the VERW mitigation late in the VMentry path.
> 
> Note that VERW for the MMIO Stale Data mitigation is unchanged because of
> the complexity of per-guest conditional VERW, which is not easy to handle
> that late in asm with no GPRs available. If the CPU is also affected by
> MDS, VERW is unconditionally executed late in asm regardless of the guest
> having MMIO access.
> 
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> ---
>  arch/x86/kvm/vmx/vmenter.S |  3 +++
>  arch/x86/kvm/vmx/vmx.c     | 10 +++++++---
>  2 files changed, 10 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index b3b13ec04bac..139960deb736 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -161,6 +161,9 @@ SYM_FUNC_START(__vmx_vcpu_run)
>  	/* Load guest RAX.  This kills the @regs pointer! */
>  	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
>  
> +	/* Clobbers EFLAGS.ZF */
> +	CLEAR_CPU_BUFFERS
> +
>  	/* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
>  	jnc .Lvmlaunch
>  
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 24e8694b83fc..2d149589cf5b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7226,13 +7226,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>  
>  	guest_state_enter_irqoff();
>  
> -	/* L1D Flush includes CPU buffer clear to mitigate MDS */
> +	/*
> +	 * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
> +	 * mitigation for MDS is done late in VMentry and is still
> +	 * executed inspite of L1D Flush. This is because an extra VERW

in spite

> +	 * should not matter much after the big hammer L1D Flush.
> +	 */
>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
>  		vmx_l1d_flush(vcpu);

There's an existing bug here.  vmx_l1d_flush() is not guaranteed to do a flush in
"conditional mode", and is not guaranteed to do a ucode-based flush (though I can't
tell if it's possible for the VERW magic to exist without X86_FEATURE_FLUSH_L1D).

If we care, something like the diff at the bottom is probably needed.
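
For reference, the current flow of vmx_l1d_flush() looks roughly like this
(abridged sketch, not the verbatim source; the relevant context lines are
also visible in the diff at the bottom):

	if (static_branch_likely(&vmx_l1d_flush_cond)) {
		bool flush_l1d = vcpu->arch.l1tf_flush_l1d;

		vcpu->arch.l1tf_flush_l1d = false;
		flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
		kvm_clear_cpu_l1tf_flush_l1d();

		/* No L1D flush, and thus no CPU buffer clear, here. */
		if (!flush_l1d)
			return;
	}

	vcpu->stat.l1d_flush++;

	/* Only the ucode-based flush also clears CPU buffers ... */
	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
		return;
	}

	/* ... the manual cache-filling loop that follows does not. */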

> -	else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> -		mds_clear_cpu_buffers();
>  	else if (static_branch_unlikely(&mmio_stale_data_clear) &&
>  		 kvm_arch_has_assigned_device(vcpu->kvm))
> +		/* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */

Please don't put comments inside an if/else-if without curly braces (and I don't
want to add curly braces).  Though I think that's a moot point if we first fix
the conditional L1D flush issue.  E.g. when the dust settles we can end up with:

	/*
	 * Note, a ucode-based L1D flush also flushes CPU buffers, i.e. the
	 * manual VERW in __vmx_vcpu_run() to mitigate MDS *may* be redundant.
	 * But an L1D Flush is not guaranteed for "conditional mode", and the
	 * cost of an extra VERW after a full L1D flush is negligible.
	 */
	if (static_branch_unlikely(&vmx_l1d_should_flush))
		cpu_buffers_flushed = vmx_l1d_flush(vcpu);

	/*
	 * The MMIO stale data vulnerability is a subset of the general MDS
	 * vulnerability, i.e. this is mutually exclusive with the VERW that's
	 * done just before VM-Enter.  The vulnerability requires the attacker,
	 * i.e. the guest, to do MMIO, so this "clear" can be done earlier.
	 */
	if (static_branch_unlikely(&mmio_stale_data_clear) &&
	    !cpu_buffers_flushed && kvm_arch_has_assigned_device(vcpu->kvm))
		mds_clear_cpu_buffers();

>  		mds_clear_cpu_buffers();
>  
>  	vmx_disable_fb_clear(vmx);

LOL, nice.  IIUC, setting FB_CLEAR_DIS is mutually exclusive with doing a late
VERW, as KVM will never set FB_CLEAR_DIS if the CPU is susceptible to X86_BUG_MDS.
But the checks aren't identical, which makes this _look_ sketchy.
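
For context, vmx_disable_fb_clear() is roughly the below (abridged sketch of
the current code), so whether the late VERW keeps working hinges entirely on
how vmx->disable_fb_clear is computed:

	static void vmx_disable_fb_clear(struct vcpu_vmx *vmx)
	{
		u64 msr;

		if (!vmx->disable_fb_clear)
			return;

		/*
		 * FB_CLEAR_DIS tells the CPU that VERW should NOT clear the
		 * fill buffers, i.e. would neuter the late VERW if it were
		 * ever set on a CPU where the VERW mitigation is in use.
		 */
		msr = __rdmsr(MSR_IA32_MCU_OPT_CTRL);
		native_wrmsrl(MSR_IA32_MCU_OPT_CTRL, msr | FB_CLEAR_DIS);
	}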

Can you do something like this to ensure we don't accidentally neuter the late VERW?

static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
{
	vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
				!boot_cpu_has_bug(X86_BUG_MDS) &&
				!boot_cpu_has_bug(X86_BUG_TAA);

	if (vmx->disable_fb_clear &&
	    WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)))
		vmx->disable_fb_clear = false;

	...
}

--
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6e502ba93141..cf6e06bb8310 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6606,8 +6606,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
  * is not exactly LRU. This could be sized at runtime via topology
  * information but as all relevant affected CPUs have 32KiB L1D cache size
  * there is no point in doing so.
+ *
+ * Returns %true if CPU buffers were cleared, i.e. if a microcode-based L1D
+ * flush was executed (which also clears CPU buffers).
  */
-static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
        int size = PAGE_SIZE << L1D_CACHE_ORDER;
 
@@ -6634,14 +6637,14 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
                kvm_clear_cpu_l1tf_flush_l1d();
 
                if (!flush_l1d)
-                       return;
+                       return false;
        }
 
        vcpu->stat.l1d_flush++;
 
        if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
                native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
-               return;
+               return true;
        }
 
        asm volatile(
@@ -6665,6 +6668,8 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
                :: [flush_pages] "r" (vmx_l1d_flush_pages),
                    [size] "r" (size)
                : "eax", "ebx", "ecx", "edx");
+
+       return false;
 }
 
 static void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7222,16 +7227,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
                                        unsigned int flags)
 {
        struct vcpu_vmx *vmx = to_vmx(vcpu);
+       bool cpu_buffers_flushed = false;
 
        guest_state_enter_irqoff();
 
-       /* L1D Flush includes CPU buffer clear to mitigate MDS */
        if (static_branch_unlikely(&vmx_l1d_should_flush))
-               vmx_l1d_flush(vcpu);
-       else if (static_branch_unlikely(&mds_user_clear))
-               mds_clear_cpu_buffers();
-       else if (static_branch_unlikely(&mmio_stale_data_clear) &&
-                kvm_arch_has_assigned_device(vcpu->kvm))
+               cpu_buffers_flushed = vmx_l1d_flush(vcpu);
+
+       if ((static_branch_unlikely(&mds_user_clear) ||
+            (static_branch_unlikely(&mmio_stale_data_clear) &&
+             kvm_arch_has_assigned_device(vcpu->kvm))) &&
+           !cpu_buffers_flushed)
                mds_clear_cpu_buffers();
 
        vmx_disable_fb_clear(vmx);

