Date: Mon, 30 Jul 2018 23:36:14 +0200 (CEST)
From: Thomas Gleixner
Subject: Re: [PATCH v2 4/4] L1TF KVM ARCH_CAPABILITIES #4
In-Reply-To: <20180725143100.16309-5-pbonzini@redhat.com>
References: <20180725143100.16309-1-pbonzini@redhat.com> <20180725143100.16309-5-pbonzini@redhat.com>
To: speck@linutronix.de

On Wed, 25 Jul 2018, speck for Paolo Bonzini wrote:
>
> +3.4. Nested virtual machines
> +""""""""""""""""""""""""""""
> +
> +When nested virtualization is in use, three operating systems are involved:
> +the bare metal hypervisor, the nested hypervisor, and the nested virtual
> +machine. VMENTER operations from the nested hypervisor into the nested
> +guest will always be processed by the bare metal hypervisor. Therefore:
> +
> +When running as a bare metal hypervisor, instead, KVM will:
> +
> + - flush the L1D cache on every switch from nested hypervisor to
> +   nested virtual machine, so that the nested hypervisor's secrets
> +   are not exposed to the nested virtual machine;
> +
> + - flush the L1D cache on every switch from nested virtual machine to
> +   nested hypervisor; this is a complex operation, and flushing the L1D
> +   cache avoids that the bare metal hypervisor's secrets be exposed
> +   to the nested virtual machine;
> +
> + - instruct the nested hypervisor to not perform any L1D cache flush.

I still think that we need some explanation about SMT in guests, i.e. that
the SMT information in guests is inaccurate and does not tell anything
about the host side SMT control state.

But that's independent of this nested optimization as it applies to all
guest levels.

> +u64 kvm_get_arch_capabilities(void)
> +{
> +	u64 data;
> +
> +	rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, &data);
> +	if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
> +		data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;

That really wants a comment explaining the magic here.

Thanks,

	tglx
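
For reference, a sketch of the kind of comment being asked for above. The
identifiers are the ones from the quoted hunk, the comment wording is only
illustrative and up to the patch author, and the trailing return is the
obvious completion of the snipped function, not necessarily what the patch
has verbatim:

u64 kvm_get_arch_capabilities(void)
{
	u64 data;

	rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, &data);

	/*
	 * If this host flushes the L1D cache on VMENTER
	 * (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER), a nested
	 * hypervisor running on top of it does not need to flush again:
	 * its VMENTER into the nested guest is always processed by this
	 * bare metal hypervisor, which flushes L1D at that point anyway.
	 * Advertise ARCH_CAP_SKIP_VMENTRY_L1DFLUSH so the nested
	 * hypervisor can skip the redundant flush.
	 */
	if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
		data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;

	return data;
}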
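
On the consuming side, a hedged sketch of how a nested hypervisor can act on
that advertised bit during its own L1TF setup. Only MSR_IA32_ARCH_CAPABILITIES,
ARCH_CAP_SKIP_VMENTRY_L1DFLUSH and rdmsrl_safe() come from the quoted patch;
the helper name skip_vmentry_l1d_flush() is hypothetical:

/*
 * Hypothetical helper, not the actual KVM code: decide whether this
 * (nested) hypervisor can skip its own L1D flush on VMENTER.
 */
static bool skip_vmentry_l1d_flush(void)
{
	u64 caps = 0;

	/*
	 * If the underlying bare metal hypervisor advertises that it
	 * already flushes the L1D cache on every real VMENTER, a flush
	 * done here would be redundant.
	 */
	rdmsrl_safe(MSR_IA32_ARCH_CAPABILITIES, &caps);
	return caps & ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
}

The caller would then leave the vmentry L1D flush disabled, which is what the
quoted documentation means by "instruct the nested hypervisor to not perform
any L1D cache flush".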