From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 13 Jul 2018 13:28:17 -0400
From: Konrad Rzeszutek Wilk
Subject: [MODERATED] Re: [patch V10 00/10] Control knobs and Documentation 0
Message-ID: <20180713172817.GA22875@char.US.ORACLE.com>
References: <20180712141902.576562442@linutronix.de>
 <6e2b04bb-4786-ae48-1fe8-e1bbdbcd8b92@redhat.com>
In-Reply-To: <6e2b04bb-4786-ae48-1fe8-e1bbdbcd8b92@redhat.com>
To: speck@linutronix.de

On Fri, Jul 13, 2018 at 06:22:47PM +0200, speck for Paolo Bonzini wrote:
> On 12/07/2018 16:19, speck for Thomas Gleixner wrote:
> > The following series provides the following changes:
> >
> >  - Fix EPT=off handling so that it avoids flushing
> >
> >  - Expose proper VMX mitigation information in sysfs
> >
> >  - Drop the MSR list mechanism for flush 'always' to prepare for runtime
> >    control. The default flush mechanism is conditional anyway, and the MSR
> >    list is set up at guest init time, which is nasty to switch at run time,
> >    especially because the static key is a global control which can be
> >    flipped by an update.
> >
> >  - Make the flush always/conditional decision static key based
> >
> >  - Serialize the kvm parameter setter function
> >
> >  - Enable runtime control for the kvm parameter
> >
> >  - Add the l1tf command line option. It is not runtime controllable, as it
> >    does not make sense to have three knobs at runtime. For the command
> >    line, the combo knob setting the default is convenient
> >
> >  - Documentation update
> >
> > This takes the review comments into account as much as they are still
> > applicable.
> >
> > Thanks to Jiri for testing the lot and for debugging and fixing my
> > brainfarts!
> >
> > Git bundle follows in separate mail.
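(Aside, for anyone joining at V10: the always/conditional static key split
mentioned above boils down to roughly the sketch below. This is illustrative
only, not the actual patch; the key name and the vcpu->arch.l1tf_flush_l1d
bookkeeping are assumptions here:

    /* Sketch, kernel context (<linux/jump_label.h>, <asm/msr-index.h>):
     * one static key selects "always" vs. "conditional" at vmenter time,
     * with no MSR list reload needed when the mode changes. */
    static DEFINE_STATIC_KEY_FALSE(vmx_l1d_flush_always);

    static void vmx_l1d_flush(struct kvm_vcpu *vcpu)
    {
            bool flush = true;

            if (!static_branch_unlikely(&vmx_l1d_flush_always)) {
                    /* Conditional mode: flush only if this vCPU was
                     * flagged as having touched sensitive kernel data. */
                    flush = vcpu->arch.l1tf_flush_l1d;
                    vcpu->arch.l1tf_flush_l1d = false;
            }

            /* Software fallback for !X86_FEATURE_FLUSH_L1D elided. */
            if (flush)
                    wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
    }

Runtime switching of the module parameter is then just a matter of calling
static_branch_enable()/static_branch_disable() on that key.)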
> Another case on top of this series...
>
> ---------------------- 8< --------------------
> From a0f605fed99cf1623f8716b22c11113653c258a3 Mon Sep 17 00:00:00 2001
> From: Paolo Bonzini
> Date: Fri, 13 Jul 2018 18:15:29 +0200
> Subject: [PATCH] kvm: vmx: disable L1D flush when running as a nested
>  hypervisor
>
> VMENTER operations from the nested hypervisor into the nested guest
> will always be processed by the bare metal hypervisor. Therefore,
> when running as a nested hypervisor, doing L1D cache flushes on vmentry
> will result in twice the work and twice the slowdown, for no benefit.
>
> Special-case this situation and report it in sysfs.
>
> (The three levels involved are usually called L0/L1/L2 in KVM slang. I'm
> avoiding that naming because of the confusion with cache levels.)
>
> Signed-off-by: Paolo Bonzini
> ---
>  Documentation/admin-guide/l1tf.rst | 23 ++++++++++++++++++++++-
>  arch/x86/include/asm/vmx.h         |  1 +
>  arch/x86/kernel/cpu/bugs.c         |  3 ++-
>  arch/x86/kvm/vmx.c                 |  5 +++++
>  4 files changed, 30 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/l1tf.rst b/Documentation/admin-guide/l1tf.rst
> index 5adf7d7c2b4e..a962afbce156 100644
> --- a/Documentation/admin-guide/l1tf.rst
> +++ b/Documentation/admin-guide/l1tf.rst
> @@ -528,6 +528,27 @@ available:
>    EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
>    parameter.
>
> +3.4. Nested virtual machines
> +""""""""""""""""""""""""""""
> +
> +When nested virtualization is in use, three operating systems are involved:
> +the bare metal hypervisor, the nested hypervisor, and the nested virtual
> +machine. VMENTER operations from the nested hypervisor into the nested
> +guest will always be processed by the bare metal hypervisor. Therefore,
> +when running as a nested hypervisor, KVM will not perform any L1D cache
> +flush, assuming instead that the "outermost" hypervisor takes care of
> +flushing the L1D cache on VMENTER to nested guests.
> +
> +When running as a bare metal hypervisor, instead, KVM will:
> +
> + - flush the L1D cache on every switch from nested hypervisor to
> +   nested virtual machine, so that the nested hypervisor's secrets
> +   are not exposed to the nested virtual machine;
> +
> + - flush the L1D cache on every switch from nested virtual machine to
> +   nested hypervisor; this is a complex operation, and flushing the
> +   L1D cache avoids exposing the bare metal hypervisor's secrets
> +   to the nested virtual machine.
>
>  .. _default_mitigations:
>
> @@ -540,7 +561,7 @@ Default mitigations
>     unconditionally and cannot be controlled.
>
>   - L1D conditional flushing on VMENTER when EPT is enabled for
> -  a guest.
> +  a guest, and the guest is not a nested virtual machine.
>
>  The kernel does not by default enforce the disabling of SMT, which leaves
>  SMT systems vulnerable when running untrusted guests with EPT enabled.
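The "when running as a nested hypervisor" detection above is just
X86_FEATURE_HYPERVISOR, i.e. the CPUID hypervisor-present bit (leaf 1,
ECX bit 31). For reference, a minimal user-space sketch to check the same
bit from inside a guest, using GCC's <cpuid.h>; illustrative only, not part
of the patch:

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned int eax, ebx, ecx, edx;

            /* Leaf 1, ECX bit 31: set when running under a hypervisor. */
            if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
                    return 1;
            printf("hypervisor bit: %s\n",
                   (ecx & (1u << 31)) ? "set" : "clear");
            return 0;
    }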
> diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
> index 94a8547d915b..7c0438751fa5 100644
> --- a/arch/x86/include/asm/vmx.h
> +++ b/arch/x86/include/asm/vmx.h
> @@ -579,6 +579,7 @@ enum vmx_l1d_flush_state {
>  	VMENTER_L1D_FLUSH_COND,
>  	VMENTER_L1D_FLUSH_ALWAYS,
>  	VMENTER_L1D_FLUSH_EPT_DISABLED,
> +	VMENTER_L1D_FLUSH_NESTED_VM,
>  };
>
>  extern enum vmx_l1d_flush_state l1tf_vmx_mitigation;
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index d63cb1501784..87828f2f64a5 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -745,7 +745,8 @@ static const char *l1tf_vmx_states[] = {
>  	[VMENTER_L1D_FLUSH_NEVER]	= "vulnerable",
>  	[VMENTER_L1D_FLUSH_COND]	= "conditional cache flushes",
>  	[VMENTER_L1D_FLUSH_ALWAYS]	= "cache flushes",
> -	[VMENTER_L1D_FLUSH_EPT_DISABLED]= "EPT disabled"
> +	[VMENTER_L1D_FLUSH_EPT_DISABLED]= "EPT disabled",
> +	[VMENTER_L1D_FLUSH_NESTED_VM]	= "nested virtual machine",
>  };
>
>  static ssize_t l1tf_show_state(char *buf)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c5c0118b126d..a7e41ac4256f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -212,6 +212,11 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
>  {
>  	struct page *page;
>
> +	if (static_cpu_has(X86_FEATURE_HYPERVISOR)) {
> +		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NESTED_VM;
> +		return 0;
> +	}

Perhaps only skip the flush when the outer hypervisor explicitly says it is
safe to do so, instead of keying on X86_FEATURE_HYPERVISOR alone:

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 0e75170..f03ec33 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -70,6 +70,7 @@
 #define MSR_IA32_ARCH_CAPABILITIES	0x0000010a
 #define ARCH_CAP_RDCL_NO		(1 << 0)   /* Not susceptible to Meltdown */
 #define ARCH_CAP_IBRS_ALL		(1 << 1)   /* Enhanced IBRS support */
+#define ARCH_CAP_SKIP_L1DFL_VMENTRY	(1 << 3)   /* Skip L1DF on VMENTRY */
 #define ARCH_CAP_SSB_NO			(1 << 4)   /*
 					    * Not susceptible to Speculative Store Bypass
 					    * attack, so no Speculative Store Bypass
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c5c0118..5209252 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -216,6 +216,15 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
 		return 0;
 	}
+	if (static_cpu_has(X86_FEATURE_HYPERVISOR) &&
+	    static_cpu_has(X86_FEATURE_FLUSH_L1D) &&
+	    boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES)) {
+		u64 msr;
+
+		rdmsrl(MSR_IA32_ARCH_CAPABILITIES, msr);
+		if (msr & ARCH_CAP_SKIP_L1DFL_VMENTRY)
+			l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NESTED_VM;
+	}
 
 	/* If set to auto use the default l1tf mitigation method */
 	if (l1tf == VMENTER_L1D_FLUSH_AUTO) {
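To check from inside a guest whether the outer hypervisor actually
advertises that bit: with the msr module loaded, rdmsr 0x10a from msr-tools
and testing bit 3 does it, or a small sketch like the one below. This is
illustrative only and assumes root plus the /dev/cpu/0/msr interface:

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            uint64_t val;
            /* The msr driver exposes MSRs by file offset:
             * 0x10a is MSR_IA32_ARCH_CAPABILITIES. */
            int fd = open("/dev/cpu/0/msr", O_RDONLY);

            if (fd < 0 || pread(fd, &val, sizeof(val), 0x10a) != sizeof(val)) {
                    perror("rdmsr");
                    return 1;
            }
            close(fd);
            /* Bit 3 = ARCH_CAP_SKIP_L1DFL_VMENTRY per the diff above. */
            printf("ARCH_CAPABILITIES=%#llx skip-L1D-flush-on-vmentry: %s\n",
                   (unsigned long long)val,
                   (val & (1ULL << 3)) ? "yes" : "no");
            return 0;
    }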