From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail.linutronix.de (146.0.238.70:993)
	by crypto-ml.lab.linutronix.de with IMAP4-SSL
	for ; 13 Jul 2018 16:51:19 -0000
Received: from mx3-rdu2.redhat.com ([66.187.233.73] helo=mx1.redhat.com)
	by Galois.linutronix.de with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256)
	(Exim 4.80)
	(envelope-from )
	id 1fe0qO-0002Ws-3w
	for speck@linutronix.de; Fri, 13 Jul 2018 18:22:56 +0200
Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5])
	(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
	(No client certificate requested)
	by mx1.redhat.com (Postfix) with ESMTPS id ACD8181663EE
	for ; Fri, 13 Jul 2018 16:22:49 +0000 (UTC)
Received: from [10.36.117.240] (ovpn-117-240.ams2.redhat.com [10.36.117.240])
	by smtp.corp.redhat.com (Postfix) with ESMTPS id 1BCEA1C67B
	for ; Fri, 13 Jul 2018 16:22:48 +0000 (UTC)
Subject: [MODERATED] Re: [patch V10 00/10] Control knobs and Documentation 0
References: <20180712141902.576562442@linutronix.de>
From: Paolo Bonzini
Message-ID: <6e2b04bb-4786-ae48-1fe8-e1bbdbcd8b92@redhat.com>
Date: Fri, 13 Jul 2018 18:22:47 +0200
MIME-Version: 1.0
In-Reply-To: <20180712141902.576562442@linutronix.de>
Content-Type: multipart/mixed; boundary="q3j0rEUt3JC7PIBKngpI67nqGDf6kTEbX";
	protected-headers="v1"
To: speck@linutronix.de
List-ID:

This is an OpenPGP/MIME encrypted message (RFC 4880 and 3156)
--q3j0rEUt3JC7PIBKngpI67nqGDf6kTEbX
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable

On 12/07/2018 16:19, speck for Thomas Gleixner wrote:
> The following series provides the following changes:
>
>  - Fix EPT=off handling so it avoids flushing
>
>  - Expose proper VMX mitigation information in sysfs
>
>  - Drops the MSR list mechanism for flush 'always' to prepare for runtime
>    control.
>    The default flush mechanism is conditional anyway and the MSR
>    list is set up at guest init time, which is nasty to run time switch
>    especially because the static key is a global control which can be
>    flipped by an update.
>
>  - Make the flush always/conditional static key based.
>
>  - Serialize the kvm parameter setter function
>
>  - Enable runtime control for the kvm parameter
>
>  - Add the l1tf command line option. It's not run time controllable as it
>    does not make sense to have 3 knobs at runtime. For the command line
>    the combo knob setting the default is convenient
>
>  - Documentation update
>
> This takes the review comments into account as much as still applicable.
>
> Thanks to Jiri for testing the lot and debugging and fixing my brainfarts!
>
> Git bundle follows in separate mail.

Another case on top of this series...

---------------------- 8< --------------------
From a0f605fed99cf1623f8716b22c11113653c258a3 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini
Date: Fri, 13 Jul 2018 18:15:29 +0200
Subject: [PATCH] kvm: vmx: disable L1D flush when running as a nested hypervisor

VMENTER operations from the nested hypervisor into the nested guest
will always be processed by the bare metal hypervisor.  Therefore,
when running as a nested hypervisor, doing L1D cache flushes on vmentry
will result in twice the work and twice the slowdown, for no benefit.

Special case this situation and report it in sysfs.

(The three levels involved are usually called L0/L1/L2 in KVM slang.
I'm avoiding that naming because of the confusion with cache levels).

Signed-off-by: Paolo Bonzini
---
 Documentation/admin-guide/l1tf.rst | 23 ++++++++++++++++++++++-
 arch/x86/include/asm/vmx.h         |  1 +
 arch/x86/kernel/cpu/bugs.c         |  3 ++-
 arch/x86/kvm/vmx.c                 |  5 +++++
 4 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/l1tf.rst b/Documentation/admin-guide/l1tf.rst
index 5adf7d7c2b4e..a962afbce156 100644
--- a/Documentation/admin-guide/l1tf.rst
+++ b/Documentation/admin-guide/l1tf.rst
@@ -528,6 +528,27 @@ available:
   EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
   parameter.
 
+3.4. Nested virtual machines
+""""""""""""""""""""""""""""
+
+When nested virtualization is in use, three operating systems are involved:
+the bare metal hypervisor, the nested hypervisor, and the nested virtual
+machine.  VMENTER operations from the nested hypervisor into the nested
+guest will always be processed by the bare metal hypervisor.  Therefore,
+when running as a nested hypervisor, KVM will not perform any L1D cache
+flush, assuming instead that the "outermost" hypervisor takes care of
+flushing the L1D cache on VMENTER to nested guests.
+
+When running as a bare metal hypervisor, instead, KVM will:
+
+ - flush the L1D cache on every switch from nested hypervisor to
+   nested virtual machine, so that the nested hypervisor's secrets
+   are not exposed to the nested virtual machine;
+
+ - flush the L1D cache on every switch from nested virtual machine to
+   nested hypervisor; this is a complex operation, and flushing the L1D
+   cache avoids that the bare metal hypervisor's secrets be exposed
+   to the nested virtual machine.
 
 .. _default_mitigations:
 
@@ -540,7 +561,7 @@ Default mitigations
   unconditionally and cannot be controlled.
 
   - L1D conditional flushing on VMENTER when EPT is enabled for
-    a guest.
+    a guest, and the guest is not a nested virtual machine.
 
   The kernel does not by default enforce the disabling of SMT, which leaves
   SMT systems vulnerable when running untrusted guests with EPT enabled.
diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 94a8547d915b..7c0438751fa5 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -579,6 +579,7 @@ enum vmx_l1d_flush_state {
 	VMENTER_L1D_FLUSH_COND,
 	VMENTER_L1D_FLUSH_ALWAYS,
 	VMENTER_L1D_FLUSH_EPT_DISABLED,
+	VMENTER_L1D_FLUSH_NESTED_VM,
 };
 
 extern enum vmx_l1d_flush_state l1tf_vmx_mitigation;
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index d63cb1501784..87828f2f64a5 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -745,7 +745,8 @@ static const char *l1tf_vmx_states[] = {
 	[VMENTER_L1D_FLUSH_NEVER]	= "vulnerable",
 	[VMENTER_L1D_FLUSH_COND]	= "conditional cache flushes",
 	[VMENTER_L1D_FLUSH_ALWAYS]	= "cache flushes",
-	[VMENTER_L1D_FLUSH_EPT_DISABLED]= "EPT disabled"
+	[VMENTER_L1D_FLUSH_EPT_DISABLED]= "EPT disabled",
+	[VMENTER_L1D_FLUSH_NESTED_VM]	= "nested virtual machine",
 };
 
 static ssize_t l1tf_show_state(char *buf)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c5c0118b126d..a7e41ac4256f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -212,6 +212,11 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
 {
 	struct page *page;
 
+	if (static_cpu_has(X86_FEATURE_HYPERVISOR)) {
+		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_NESTED_VM;
+		return 0;
+	}
+
 	if (!enable_ept) {
 		l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_EPT_DISABLED;
 		return 0;
-- 
2.17.1


--q3j0rEUt3JC7PIBKngpI67nqGDf6kTEbX--
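
For anyone who wants to poke at the ordering of the checks outside the kernel, here is a rough userspace sketch of what the vmx.c hunk decides and what the bugs.c table would then report. It is not the kernel code. sketch_pick_state() and sketch_state_name() are made-up names, the two booleans stand in for static_cpu_has(X86_FEATURE_HYPERVISOR) and enable_ept, and the fall-through to the requested mode is an assumption, since the rest of vmx_setup_l1d_flush() is not visible in the hunk. The enum lists only the values visible in the hunks above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only the values visible in the patch context; the real enum may differ. */
enum vmx_l1d_flush_state {
	VMENTER_L1D_FLUSH_NEVER,
	VMENTER_L1D_FLUSH_COND,
	VMENTER_L1D_FLUSH_ALWAYS,
	VMENTER_L1D_FLUSH_EPT_DISABLED,
	VMENTER_L1D_FLUSH_NESTED_VM,
};

/* Same strings as the l1tf_vmx_states[] table after the patch. */
static const char *const l1tf_vmx_states[] = {
	[VMENTER_L1D_FLUSH_NEVER]        = "vulnerable",
	[VMENTER_L1D_FLUSH_COND]         = "conditional cache flushes",
	[VMENTER_L1D_FLUSH_ALWAYS]       = "cache flushes",
	[VMENTER_L1D_FLUSH_EPT_DISABLED] = "EPT disabled",
	[VMENTER_L1D_FLUSH_NESTED_VM]    = "nested virtual machine",
};

/*
 * Hypothetical helper: the nested-hypervisor check comes first, so it
 * wins over both the EPT check and whatever mode was requested.
 * Returning 'requested' at the end is an assumption for illustration.
 */
enum vmx_l1d_flush_state
sketch_pick_state(bool under_hypervisor, bool ept_enabled,
		  enum vmx_l1d_flush_state requested)
{
	if (under_hypervisor)
		return VMENTER_L1D_FLUSH_NESTED_VM;
	if (!ept_enabled)
		return VMENTER_L1D_FLUSH_EPT_DISABLED;
	return requested;
}

/* Hypothetical helper: bounds-checked lookup of the sysfs string. */
const char *sketch_state_name(enum vmx_l1d_flush_state state)
{
	size_t n = sizeof(l1tf_vmx_states) / sizeof(l1tf_vmx_states[0]);

	if ((size_t)state >= n || !l1tf_vmx_states[state])
		return "unknown";
	return l1tf_vmx_states[state];
}
```

With these stand-ins, a nested hypervisor with EPT off still ends up in the "nested virtual machine" state rather than "EPT disabled", which follows from the order in which the hunk places the two checks.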