linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Cathy Avery <cavery@redhat.com>
To: Maxim Levitsky <mlevitsk@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: vkuznets@redhat.com, wei.huang2@amd.com
Subject: Re: [PATCH] KVM: SVM: Use a separate vmcb for the nested L2 guest
Date: Thu, 8 Oct 2020 08:46:58 -0400	[thread overview]
Message-ID: <85ec3bfd-e46e-a6fa-b530-4dc87f0c7169@redhat.com> (raw)
In-Reply-To: <51d447be4a6f430ed5cc60242457394aceb004e9.camel@redhat.com>

On 10/8/20 6:54 AM, Maxim Levitsky wrote:
> On Thu, 2020-10-08 at 13:39 +0300, Maxim Levitsky wrote:
>> On Thu, 2020-10-08 at 13:23 +0300, Maxim Levitsky wrote:
>>> On Thu, 2020-10-08 at 07:52 +0200, Paolo Bonzini wrote:
>>>> On 08/10/20 00:14, Maxim Levitsky wrote:
>>>>>> +	if (svm->vmcb01->control.asid == 0)
>>>>>> +		svm->vmcb01->control.asid = svm->nested.vmcb02->control.asid;
>>>>> I think that the above should be done always. The asid field is currently host
>>>>> controlled only (that is L2 value is ignored, selective ASID tlb flush is not
>>>>> advertized to the guest and lnvlpga is emulated as invlpg).
>>>> Yes, in fact I suggested that ASID should be in svm->asid and moved to
>>>> svm->vmcb->asid in svm_vcpu_run.  Then there's no need to special case
>>>> it in nested code.
>>> This makes lot of sense!
>>>> This should be a patch coming before this one.
>>>>
>>>>> 1. Something wrong with memory types - like guest is using UC memory for everything.
>>>>>      I can't completely rule that out yet
>>>> You can print g_pat and see if it is all zeroes.
>>> I don't even need to print it. I know that it is never set anywhere, unless guest writes it,
>>> but now that I look at it, we set it to a default value and there is no code to set it to
>>> default value for vmcb02. This is it. now my fedora guest boots just fine!
>>>
>>> I played a lot with g_pat, and yet this didn't occur to me . I was that close :-(
>>> I knew that it has to be something with memory types, but it never occured to me
>>> that guest just doesn't write IA32_PAT and uses our value which we set in init_vmcb
>>>
>>>
>>>> In general I think it's better to be explicit with vmcb01 vs. vmcb02,
>>>> like Cathy did, but I can see it's a matter of personal preference to
>>>> some extent.
>>> I also think so in general, but in the code that is outside 'is_guest_mode'
>>> IMHO it is better to refer to vmcb01 as vmcb, although now that I think of
>>> it, its not wrong to do so either.
>>>
>>>
>>> My windows hyper-v guest doesn't boot though and I know why.
>>>
>>> As we know the vmcb save area has extra state which vmrun/vmexit don't touch.
>>> Now suppose a nested hypervisor wants to enter a nested guest.
>>>
>>> It will do vmload, which will load the extra state from the nested vmcb (vmcb12
>>> or as I woudl say the vmcb that nested hypervisor thinks that it is using),
>>> to the CPU. This can cause some vmexits I think, but this doesn't matter much.
>>>
>>> Now the nested hypervisor does vmrun. The extra state of L2 guest is in CPU registers,
>>> and it is untouched. We do vmsave on vmcb01 to preserve that state, but later
>>> when we do vmload on vmcb02 prior to vmenter on it, which loads stale state from it.
>>> The same issue happens the other way around on nested vmexit.
>>>
>>> I fixed this by doing nested_svm_vmloadsave, but that should be probably be
>>> optimized with dirty bits. Now though I guess the goal it to make
>>> it work first.
>>>
>>> With this fixed HyperV boots fine, and even passes the 'works' test of booting
>>> the windows 10 with hyperv enabled nested itself and starting the vm inside,
>>> which makes that VM L3 (in addition to windows itself that runs as L3 in relation to hyper-v)
>>>
>>> https://i.imgur.com/sRYqtVV.png
>>>
>>> In summary this is the diff of fixes (just pasted to email, probably mangled):
>>>
>>>
>>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>>> index 0a06e62010d8c..7293ba23b3cbc 100644
>>> --- a/arch/x86/kvm/svm/nested.c
>>> +++ b/arch/x86/kvm/svm/nested.c
>>> @@ -436,6 +436,9 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
>>>          WARN_ON(svm->vmcb == svm->nested.vmcb02);
>>>   
>>>          svm->nested.vmcb02->control = svm->vmcb01->control;
>>> +
>>> +       nested_svm_vmloadsave(svm->vmcb01, svm->nested.vmcb02);
>>> +
>>>          svm->vmcb = svm->nested.vmcb02;
>>>          svm->vmcb_pa = svm->nested.vmcb02_pa;
>>>          load_nested_vmcb_control(svm, &nested_vmcb->control);
>>> @@ -622,6 +625,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
>>>          if (svm->vmcb01->control.asid == 0)
>>>                  svm->vmcb01->control.asid = svm->nested.vmcb02->control.asid;
>>>   
>>> +       nested_svm_vmloadsave(svm->nested.vmcb02, svm->vmcb01);
>>>          svm->vmcb = svm->vmcb01;
>>>          svm->vmcb_pa = svm->nested.vmcb01_pa;
>>>   
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index b66239b26885d..ee9f87fe611f2 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -1097,6 +1097,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>>>                  clr_cr_intercept(svm, INTERCEPT_CR3_READ);
>>>                  clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
>>>                  save->g_pat = svm->vcpu.arch.pat;
>>> +               svm->nested.vmcb02->save.g_pat = svm->vcpu.arch.pat;

I had noticed that g_pat was not set in vmcb02 and set it to vmcb01's 
value which was not helpful but I did not try the current vcpu value.

I am getting a #UD which I suspected had something to do with cr3 but 
I'll know more after I add your suggestions.

emu-system-x86-1647  [033] ....  3167.589402: kvm_nested_vmexit_inject: 
reason: UD excp ext_inf1: 0x0000000000000000 ext_inf2: 
0x0000000000000000 ext_int: 0x00000000 ext_int_err: 0x00000000


>>>                  save->cr3 = 0;
>>>                  save->cr4 = 0;
>>>          }
>>>
>>>
>>>
>>> Best regards,
>>> 	Maxim Levitsky
>>>
>>>> Paolo
>>>>
>> And another thing I spotted before I forget.
>>
>> If we setup a tlb flush in ctl.tlb_ctl of vmcb01, just prior to nested vmentry
>> then this field will be copied to vmcb02 but on next vmexit we clear it in current
>> (that is vmcb02) and that change will not propogate to vmcb01.
ctl.tlb_ctl is dependent on the value of save.cr4 which was not being 
set in vmcb02.
>>
>> I am not sure if this is a theorerical issue or not. We probably should apply the same treatment to
>> it as what Paulo suggested to do with asid -
>> set it just prior to vmentry if tlb flush is needed, and clear it afterwards as we do.
> And yet another thing to note is that we curently ignore L2's g_pat. However it _seems_ that we practically
> ignore L1 PAT as well in regard to shadowing NPT mmu. I am not 100% sure about this.
> I'll dig that area eventually.
>
> Best regards,
> 	Maxim Levitsky
>
>> Best regards,
>> 	Maxim Levitsky
>
Thanks,

Cathy


  reply	other threads:[~2020-10-08 12:47 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-17 19:23 [PATCH] KVM: SVM: Use a separate vmcb for the nested L2 guest Cathy Avery
2020-09-18 15:16 ` Babu Moger
2020-09-18 15:27   ` Cathy Avery
2020-09-18 16:46     ` Babu Moger
2020-09-18 21:11 ` Wei Huang
2020-09-21 14:07   ` Cathy Avery
2020-09-22 15:02     ` Paolo Bonzini
2020-09-22 15:04 ` Paolo Bonzini
2020-09-24 15:17 ` Maxim Levitsky
2020-10-07 22:14 ` Maxim Levitsky
2020-10-08  5:52   ` Paolo Bonzini
2020-10-08 10:23     ` Maxim Levitsky
2020-10-08 10:39       ` Maxim Levitsky
2020-10-08 10:54         ` Maxim Levitsky
2020-10-08 12:46           ` Cathy Avery [this message]
2020-10-08 13:11             ` Maxim Levitsky
2020-10-08 13:52               ` Cathy Avery
2020-10-08 14:19                 ` Maxim Levitsky
2020-10-09 12:31       ` Cathy Avery

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=85ec3bfd-e46e-a6fa-b530-4dc87f0c7169@redhat.com \
    --to=cavery@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mlevitsk@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=vkuznets@redhat.com \
    --cc=wei.huang2@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).