From mboxrd@z Thu Jan  1 00:00:00 1970
From: Bandan Das <bsd@redhat.com>
Subject: Re: [PATCH 3/3] KVM: nVMX: check for null vmcs12 when L1 does invept
Date: Thu, 27 Mar 2014 18:14:22 -0400
Message-ID: <jpgmwgbthkx.fsf@nelium.bos.redhat.com>
References: <1395286089-5406-1-git-send-email-bsd@redhat.com>
	<1395286089-5406-4-git-send-email-bsd@redhat.com>
	<532AB624.4020400@siemens.com> <jpgob10si3g.fsf@nelium.bos.redhat.com>
	<532D764E.20607@web.de> <jpgtxaqi3pc.fsf@nelium.bos.redhat.com>
	<532F3322.7060404@web.de> <jpgsiq4k8vv.fsf@nelium.bos.redhat.com>
	<5333E968.8030708@web.de>
Mime-Version: 1.0
Content-Type: text/plain
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Gleb Natapov <gleb@kernel.org>
To: Jan Kiszka <jan.kiszka@web.de>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx1.redhat.com ([209.132.183.28]:6211 "EHLO mx1.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1756166AbaC0WO1 (ORCPT <rfc822;kvm@vger.kernel.org>);
	Thu, 27 Mar 2014 18:14:27 -0400
In-Reply-To: <5333E968.8030708@web.de> (Jan Kiszka's message of "Thu, 27 Mar
	2014 10:03:36 +0100")
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

Jan Kiszka <jan.kiszka@web.de> writes:

> On 2014-03-26 21:22, Bandan Das wrote:
>> Jan Kiszka <jan.kiszka@web.de> writes:
>> 
>>> On 2014-03-22 17:43, Bandan Das wrote:
>>>> Jan Kiszka <jan.kiszka@web.de> writes:
>>>>
>>>>> On 2014-03-20 21:58, Bandan Das wrote:
>>>>>> Jan Kiszka <jan.kiszka@siemens.com> writes:
>>>>>>
>>>>>>> On 2014-03-20 04:28, Bandan Das wrote:
>>>>>>>> Some L1 hypervisors such as Xen seem to be calling invept after
>>>>>>>> vmclear or before vmptrld on L2. In this case, proceed with
>>>>>>>> falling through and syncing roots as a case where
>>>>>>>> context wide invalidation can't be supported
>>>>>>>
>>>>>>> Can we also base this behaviour on a statement in the SDM? But on first
>>>>>>> glance, I do not find anything like this over there.
>>>>>>
>>>>>> The SDM has nothing of this sort explicitly mentioned but 28.3.3.1 
>>>>>> "Operations that invalidate Cached Mappings" does mention that
>>>>>> the instruction may invalidate mappings associated with other
>>>>>> EP4TAs (even in single context).
>>>>>
>>>>> Yes, "may". So we are implementing undefined behavior in order to please
>>>>> a broken hypervisor that relies on it? Then please state this in the
>>>>> patch and probably also inform Xen about their issue.
>>>>
>>>> Why undefined behavior ? We don't do anything specific for 
>>>> the single context invalidation case ianyway .e If the eptp matches what 
>>>> vmcs12 has, single context invalidation does fall though to the global 
>>>> invalidation case already. All this change does is add the "L1 calls 
>>>> invept after vmclear and  before vmptrld" to the list of cases to fall 
>>>> though to global invalidation since nvmx doesn't have any knowledge of 
>>>> the current eptp for this case.
>>>
>>> OK, I think I misunderstood what the guest expects and how we currently
>>> achieve this: we do not track the mapping between guest and host eptp,
>>> thus cannot properly emulate its behaviour. We therefore need to flush
>>> everything.
>>>
>>>>
>>>> Or do you think we should rethink this approach ?
>>>
>>> Well, I wonder if we should expose single-context invept support at all.
>>>
>>> I'm also wondering if we are returning proper flags on
>>>
>>>     if ((operand.eptp & eptp_mask) !=
>>>                     (nested_ept_get_cr3(vcpu) & eptp_mask))
>>>             break;
>> 
>> That does sound plausible but then we would have to get rid of this 
>> little optimization in the code you have quoted above. We would have
>> to flush and sync roots for all invept calls. So, my preference is to 
>> keep it.
>
> Do you have any numbers on how many per-context invept calls we are able
> to ignore? At least for KVM this is should be 0 as we only flush on
> changes to the current EPT structures.

Just instrumented a Debian L2 bootup under L1 Xen and there were 0 
per-context invept calls (with a valid vmcs12). I will spin out a 
version for testing where we don't advertise single context invept and 
if everything works, put a comment as to why we are removing support.

Thanks,
Bandan

> Jan
>
>> 
>>> Neither nested_vmx_succeed nor nested_vmx_fail* is called if this
>>> condition evaluates to true.
>> 
>> It appears we should always call nested_vmx_succeed(). If you are ok,
>> I will send a v2.
>> 
>> Thanks,
>> Bandan
>> 
>>> Jan