* [PATCH v2 0/2] KVM: SVM: Create separate vmcbs for L1 and L2
From: Cathy Avery
Date: 2020-10-11 18:48 UTC
To: linux-kernel, kvm, pbonzini
Cc: vkuznets, wei.huang2, mlevitsk

svm->vmcb will now point either to a separate L1 vmcb (not nested) or to
the L2 vmcb (nested).

Changes v1 -> v2:
- Removed unnecessary update check of L1 save.cr3 during nested_svm_vmexit.
- Moved vmcb01_pa to svm.
- Removed the get_host_vmcb() function.
- Updated vmsave/vmload corresponding vmcb state during L2 enter and exit,
  which fixed the L2 load issue.
- Moved the asid workaround to a new patch which adds asid to svm.
- Initialized the previously uninitialized L2 vmcb save.g_pat and save.cr4.

Tested:
- kvm-unit-tests
- KVM selftests
- Loaded a Fedora nested guest on a Fedora host

Cathy Avery (2):
  KVM: SVM: Move asid to vcpu_svm
  KVM: SVM: Use a separate vmcb for the nested L2 guest

 arch/x86/kvm/svm/nested.c | 117 +++++++++++++++++---------------------
 arch/x86/kvm/svm/svm.c    |  46 ++++++++-------
 arch/x86/kvm/svm/svm.h    |  50 +++++-----------
 3 files changed, 93 insertions(+), 120 deletions(-)

-- 
2.20.1
* [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Cathy Avery
Date: 2020-10-11 18:48 UTC
To: linux-kernel, kvm, pbonzini
Cc: vkuznets, wei.huang2, mlevitsk

Move asid to svm->asid to allow for vmcb assignment
during svm_vcpu_run without regard to which level
guest is running.

Signed-off-by: Cathy Avery <cavery@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 4 +++-
 arch/x86/kvm/svm/svm.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d4e18bda19c7..619980a5d540 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1101,6 +1101,7 @@ static void init_vmcb(struct vcpu_svm *svm)
         save->cr4 = 0;
     }
     svm->asid_generation = 0;
+    svm->asid = 0;
 
     svm->nested.vmcb = 0;
     svm->vcpu.arch.hflags = 0;
@@ -1663,7 +1664,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
     }
 
     svm->asid_generation = sd->asid_generation;
-    svm->vmcb->control.asid = sd->next_asid++;
+    svm->asid = sd->next_asid++;
 
     vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
@@ -3446,6 +3447,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
     sync_lapic_to_cr8(vcpu);
 
+    svm->vmcb->control.asid = svm->asid;
     svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
     /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a798e1731709..862f0d2405e8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -104,6 +104,7 @@ struct vcpu_svm {
     struct vmcb *vmcb;
     unsigned long vmcb_pa;
     struct svm_cpu_data *svm_data;
+    u32 asid;
     uint64_t asid_generation;
     uint64_t sysenter_esp;
     uint64_t sysenter_eip;

-- 
2.20.1
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Sean Christopherson
Date: 2020-10-13 1:29 UTC
To: Cathy Avery
Cc: linux-kernel, kvm, pbonzini, vkuznets, wei.huang2, mlevitsk

On Sun, Oct 11, 2020 at 02:48:17PM -0400, Cathy Avery wrote:
> Move asid to svm->asid to allow for vmcb assignment

This is misleading. The asid isn't being moved, it's being copied/tracked.
The "to allow" wording also confused me; I thought this was just a prep
patch and the actual assignment was in a follow-up patch.

> during svm_vcpu_run without regard to which level
> guest is running.
>
> Signed-off-by: Cathy Avery <cavery@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 4 +++-
>  arch/x86/kvm/svm/svm.h | 1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d4e18bda19c7..619980a5d540 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1101,6 +1101,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>          save->cr4 = 0;
>      }
>      svm->asid_generation = 0;
> +    svm->asid = 0;
>
>      svm->nested.vmcb = 0;
>      svm->vcpu.arch.hflags = 0;
> @@ -1663,7 +1664,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
>      }
>
>      svm->asid_generation = sd->asid_generation;
> -    svm->vmcb->control.asid = sd->next_asid++;
> +    svm->asid = sd->next_asid++;
>
>      vmcb_mark_dirty(svm->vmcb, VMCB_ASID);

I know very little (ok, nothing) about SVM VMCB caching rules, but I
strongly suspect this is broken. The existing code explicitly marks
VMCB_ASID dirty, but there is no equivalent code for the case where there
are multiple VMCBs, e.g. if new_asid() is called while vmcb01 is active,
then vmcb02 will pick up the new ASID but will not mark it dirty.

>  }
> @@ -3446,6 +3447,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>
>      sync_lapic_to_cr8(vcpu);
>
> +    svm->vmcb->control.asid = svm->asid;

Related to the above, handling this in vcpu_run() feels wrong. There really
shouldn't be a need to track the ASID. vmcb01 will always exist if vmcb02
exits, e.g. the ASID can be copied and marked dirty when loading vmcb02.
For new_asid(), it can unconditionally update vmcb01 and conditionally
update vmcb02.

>      svm->vmcb->save.cr2 = vcpu->arch.cr2;
>
>      /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a798e1731709..862f0d2405e8 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -104,6 +104,7 @@ struct vcpu_svm {
>      struct vmcb *vmcb;
>      unsigned long vmcb_pa;
>      struct svm_cpu_data *svm_data;
> +    u32 asid;
>      uint64_t asid_generation;
>      uint64_t sysenter_esp;
>      uint64_t sysenter_eip;
> --
> 2.20.1
>
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Paolo Bonzini
Date: 2020-11-13 17:03 UTC
To: Sean Christopherson, Cathy Avery
Cc: linux-kernel, kvm, vkuznets, wei.huang2, mlevitsk

On 13/10/20 03:29, Sean Christopherson wrote:
>> @@ -3446,6 +3447,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>
>>      sync_lapic_to_cr8(vcpu);
>>
>> +    svm->vmcb->control.asid = svm->asid;
>
> Related to the above, handling this in vcpu_run() feels wrong. There really
> shouldn't be a need to track the ASID. vmcb01 will always exist if vmcb02
> exits, e.g. the ASID can be copied and marked dirty when loading vmcb02.
> For new_asid(), it can unconditionally update vmcb01 and conditionally update
> vmcb02.

Yeah, it is a bit ugly and it is only needed on processors without
flush-by-ASID. On those processors the ASID is used by KVM as a sort of
TLB flush generation count. A TLB flush should affect both VMCB01 and
VMCB02, and that's the reason to have a global ASID in struct vcpu_svm.

On processors with flush-by-ASID, currently we bump the ASID on every
physical CPU change. However even that is not needed; in principle
svm->asid should never change and we could also have different ASID01 and
ASID02, similar to VMX.

So: long term, svm->asid should be used only on processors without
flush-by-ASID. kvm-amd however is not quite ready for that.

Paolo
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Paolo Bonzini
Date: 2020-11-13 16:57 UTC
To: Cathy Avery, linux-kernel, kvm
Cc: vkuznets, wei.huang2, mlevitsk

On 11/10/20 20:48, Cathy Avery wrote:
> Move asid to svm->asid to allow for vmcb assignment
> during svm_vcpu_run without regard to which level
> guest is running.

Slightly more verbose commit message:

    KVM does not have separate ASIDs for L1 and L2; either the nested
    hypervisor and nested guests share a single ASID, or on older
    processors the ASID is used only to implement TLB flushing.

    Either way, ASIDs are handled at the VM level. In preparation for
    having different VMCBs passed to VMLOAD/VMRUN/VMSAVE for L1 and L2,
    store the current ASID to struct vcpu_svm and only move it to the
    VMCB in svm_vcpu_run. This way, TLB flushes can be applied no matter
    which VMCB will be active during the next svm_vcpu_run.

> Signed-off-by: Cathy Avery <cavery@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 4 +++-
>  arch/x86/kvm/svm/svm.h | 1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d4e18bda19c7..619980a5d540 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1101,6 +1101,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>          save->cr4 = 0;
>      }
>      svm->asid_generation = 0;
> +    svm->asid = 0;
>
>      svm->nested.vmcb = 0;
>      svm->vcpu.arch.hflags = 0;
> @@ -1663,7 +1664,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
>      }
>
>      svm->asid_generation = sd->asid_generation;
> -    svm->vmcb->control.asid = sd->next_asid++;
> +    svm->asid = sd->next_asid++;
>
>      vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>  }

This vmcb_mark_dirty must be delayed to svm_vcpu_run as well, because the
active VMCB could change:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 48965bfa3d1e..3b53a7ead04b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1756,12 +1756,11 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
         ++sd->asid_generation;
         sd->next_asid = sd->min_asid;
         svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
+        vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
     }
 
     svm->asid_generation = sd->asid_generation;
     svm->asid = sd->next_asid++;
-
-    vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
 static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
@@ -3571,7 +3570,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
     sync_lapic_to_cr8(vcpu);
 
-    svm->vmcb->control.asid = svm->asid;
+    if (unlikely(svm->asid != svm->vmcb->control.asid)) {
+        svm->vmcb->control.asid = svm->asid;
+        vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
+    }
     svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
     /*

Queued with this change.

Paolo
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Ashish Kalra <ashish.kalra@amd.com>
Date: 2020-11-29 9:41 UTC
To: pbonzini
Cc: cavery, kvm, linux-kernel, mlevitsk, vkuznets, wei.huang2,
    thomas.lendacky, brijesh.singh, jon.grimm

This patch breaks SEV guests.

The patch stores the current ASID in struct vcpu_svm and only moves it to
the VMCB in svm_vcpu_run(), but by doing so, the ASID allocated for SEV
guests and set up in vmcb->control.asid by pre_sev_run() gets overwritten
by the ASID stored in struct vcpu_svm. Hence VMRUN fails, as the SEV guest
is bound/activated on a different ASID than the one written into
vmcb->control.asid at VMRUN.

For example, asid#1 was activated for the SEV guest and then
vmcb->control.asid is overwritten with asid#0 (svm->asid) as part of this
patch in svm_vcpu_run(), and hence VMRUN fails.
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Paolo Bonzini
Date: 2020-11-30 14:41 UTC
To: Ashish Kalra
Cc: cavery, kvm, linux-kernel, mlevitsk, vkuznets, wei.huang2,
    thomas.lendacky, brijesh.singh, jon.grimm

On 29/11/20 10:41, Ashish Kalra wrote:
> This patch breaks SEV guests.
>
> The patch stores the current ASID in struct vcpu_svm and only moves it to
> the VMCB in svm_vcpu_run(), but by doing so, the ASID allocated for SEV
> guests and set up in vmcb->control.asid by pre_sev_run() gets overwritten
> by the ASID stored in struct vcpu_svm. Hence VMRUN fails, as the SEV
> guest is bound/activated on a different ASID than the one written into
> vmcb->control.asid at VMRUN.
>
> For example, asid#1 was activated for the SEV guest and then
> vmcb->control.asid is overwritten with asid#0 (svm->asid) as part of this
> patch in svm_vcpu_run(), and hence VMRUN fails.

Thanks Ashish, I've sent a patch to fix it.

Would it be possible to add a minimal SEV test to
tools/testing/selftests/kvm? It doesn't have to do full attestation etc.;
if you can just write an "out" instruction using SEV_DBG_ENCRYPT and check
that you can run it, that's enough.

Paolo
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Ashish Kalra
Date: 2020-11-30 21:06 UTC
To: Paolo Bonzini
Cc: cavery, kvm, linux-kernel, mlevitsk, vkuznets, wei.huang2,
    thomas.lendacky, brijesh.singh, jon.grimm

Hello Paolo,

I believe one of my teammates is currently working on adding a KVM
selftest for SEV and SEV-ES.

Thanks,
Ashish

On Mon, Nov 30, 2020 at 03:41:41PM +0100, Paolo Bonzini wrote:
> Thanks Ashish, I've sent a patch to fix it.
>
> Would it be possible to add a minimal SEV test to
> tools/testing/selftests/kvm? It doesn't have to do full attestation etc.,
> if you can just write an "out" instruction using SEV_DBG_ENCRYPT and check
> that you can run it that's enough.
>
> Paolo
* [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest
From: Cathy Avery
Date: 2020-10-11 18:48 UTC
To: linux-kernel, kvm, pbonzini
Cc: vkuznets, wei.huang2, mlevitsk

svm->vmcb will now point either to a separate L1 vmcb (not nested) or to
the L2 vmcb (nested).

Issues:

1) There is some wholesale copying of vmcb.save and vmcb.control areas
   which will need to be refined.

Tested:
- kvm-unit-tests
- KVM selftests
- Loaded a Fedora nested guest on a Fedora host

Signed-off-by: Cathy Avery <cavery@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 117 +++++++++++++++++---------------------
 arch/x86/kvm/svm/svm.c    |  42 +++++++------
 arch/x86/kvm/svm/svm.h    |  49 +++++-----------
 3 files changed, 89 insertions(+), 119 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e90bc436f584..bcce92f0a90c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -75,12 +75,12 @@ static unsigned long nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)
 static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 {
     struct vcpu_svm *svm = to_svm(vcpu);
-    struct vmcb *hsave = svm->nested.hsave;
 
     WARN_ON(mmu_is_nested(vcpu));
 
     vcpu->arch.mmu = &vcpu->arch.guest_mmu;
-    kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, hsave->save.cr4, hsave->save.efer,
+    kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01->save.cr4,
+                            svm->vmcb01->save.efer,
                             svm->nested.ctl.nested_cr3);
     vcpu->arch.mmu->get_guest_pgd = nested_svm_get_tdp_cr3;
     vcpu->arch.mmu->get_pdptr = nested_svm_get_tdp_pdptr;
@@ -105,7 +105,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
         return;
 
     c = &svm->vmcb->control;
-    h = &svm->nested.hsave->control;
+    h = &svm->vmcb01->control;
     g = &svm->nested.ctl;
 
     svm->nested.host_intercept_exceptions = h->intercept_exceptions;
@@ -403,7 +403,7 @@ static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
 
     svm->vmcb->control.int_ctl =
         (svm->nested.ctl.int_ctl & ~mask) |
-        (svm->nested.hsave->control.int_ctl & mask);
+        (svm->vmcb01->control.int_ctl & mask);
 
     svm->vmcb->control.virt_ext = svm->nested.ctl.virt_ext;
     svm->vmcb->control.int_vector = svm->nested.ctl.int_vector;
@@ -432,6 +432,16 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
     int ret;
 
     svm->nested.vmcb = vmcb_gpa;
+
+    WARN_ON(svm->vmcb == svm->nested.vmcb02);
+
+    svm->nested.vmcb02->control = svm->vmcb01->control;
+    svm->nested.vmcb02->save.cr4 = svm->vmcb01->save.cr4;
+
+    nested_svm_vmloadsave(svm->vmcb01, svm->nested.vmcb02);
+
+    svm->vmcb = svm->nested.vmcb02;
+    svm->vmcb_pa = svm->nested.vmcb02_pa;
     load_nested_vmcb_control(svm, &nested_vmcb->control);
     nested_prepare_vmcb_save(svm, nested_vmcb);
     nested_prepare_vmcb_control(svm);
@@ -450,8 +460,6 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 {
     int ret;
     struct vmcb *nested_vmcb;
-    struct vmcb *hsave = svm->nested.hsave;
-    struct vmcb *vmcb = svm->vmcb;
     struct kvm_host_map map;
     u64 vmcb_gpa;
@@ -496,29 +504,14 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
     kvm_clear_exception_queue(&svm->vcpu);
     kvm_clear_interrupt_queue(&svm->vcpu);
 
-    /*
-     * Save the old vmcb, so we don't need to pick what we save, but can
-     * restore everything when a VMEXIT occurs
-     */
-    hsave->save.es = vmcb->save.es;
-    hsave->save.cs = vmcb->save.cs;
-    hsave->save.ss = vmcb->save.ss;
-    hsave->save.ds = vmcb->save.ds;
-    hsave->save.gdtr = vmcb->save.gdtr;
-    hsave->save.idtr = vmcb->save.idtr;
-    hsave->save.efer = svm->vcpu.arch.efer;
-    hsave->save.cr0 = kvm_read_cr0(&svm->vcpu);
-    hsave->save.cr4 = svm->vcpu.arch.cr4;
-    hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
-    hsave->save.rip = kvm_rip_read(&svm->vcpu);
-    hsave->save.rsp = vmcb->save.rsp;
-    hsave->save.rax = vmcb->save.rax;
-    if (npt_enabled)
-        hsave->save.cr3 = vmcb->save.cr3;
-    else
-        hsave->save.cr3 = kvm_read_cr3(&svm->vcpu);
-
-    copy_vmcb_control_area(&hsave->control, &vmcb->control);
+    svm->vmcb01->save.efer = svm->vcpu.arch.efer;
+    svm->vmcb01->save.cr0 = kvm_read_cr0(&svm->vcpu);
+    svm->vmcb01->save.cr4 = svm->vcpu.arch.cr4;
+    svm->vmcb01->save.rflags = kvm_get_rflags(&svm->vcpu);
+    svm->vmcb01->save.rip = kvm_rip_read(&svm->vcpu);
+
+    if (!npt_enabled)
+        svm->vmcb01->save.cr3 = kvm_read_cr3(&svm->vcpu);
 
     svm->nested.nested_run_pending = 1;
@@ -564,7 +557,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 {
     int rc;
     struct vmcb *nested_vmcb;
-    struct vmcb *hsave = svm->nested.hsave;
     struct vmcb *vmcb = svm->vmcb;
     struct kvm_host_map map;
@@ -628,8 +620,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
     nested_vmcb->control.pause_filter_thresh =
         svm->vmcb->control.pause_filter_thresh;
 
-    /* Restore the original control entries */
-    copy_vmcb_control_area(&vmcb->control, &hsave->control);
+    nested_svm_vmloadsave(svm->nested.vmcb02, svm->vmcb01);
+
+    svm->vmcb = svm->vmcb01;
+    svm->vmcb_pa = svm->vmcb01_pa;
 
     /* On vmexit the GIF is set to false */
     svm_set_gif(svm, false);
@@ -640,19 +634,13 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
     svm->nested.ctl.nested_cr3 = 0;
 
     /* Restore selected save entries */
-    svm->vmcb->save.es = hsave->save.es;
-    svm->vmcb->save.cs = hsave->save.cs;
-    svm->vmcb->save.ss = hsave->save.ss;
-    svm->vmcb->save.ds = hsave->save.ds;
-    svm->vmcb->save.gdtr = hsave->save.gdtr;
-    svm->vmcb->save.idtr = hsave->save.idtr;
-    kvm_set_rflags(&svm->vcpu, hsave->save.rflags);
-    svm_set_efer(&svm->vcpu, hsave->save.efer);
-    svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE);
-    svm_set_cr4(&svm->vcpu, hsave->save.cr4);
-    kvm_rax_write(&svm->vcpu, hsave->save.rax);
-    kvm_rsp_write(&svm->vcpu, hsave->save.rsp);
-    kvm_rip_write(&svm->vcpu, hsave->save.rip);
+    kvm_set_rflags(&svm->vcpu, svm->vmcb->save.rflags);
+    svm_set_efer(&svm->vcpu, svm->vmcb->save.efer);
+    svm_set_cr0(&svm->vcpu, svm->vmcb->save.cr0 | X86_CR0_PE);
+    svm_set_cr4(&svm->vcpu, svm->vmcb->save.cr4);
+    kvm_rax_write(&svm->vcpu, svm->vmcb->save.rax);
+    kvm_rsp_write(&svm->vcpu, svm->vmcb->save.rsp);
+    kvm_rip_write(&svm->vcpu, svm->vmcb->save.rip);
     svm->vmcb->save.dr7 = 0;
     svm->vmcb->save.cpl = 0;
     svm->vmcb->control.exit_int_info = 0;
@@ -670,13 +658,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
     nested_svm_uninit_mmu_context(&svm->vcpu);
 
-    rc = nested_svm_load_cr3(&svm->vcpu, hsave->save.cr3, false);
+    rc = nested_svm_load_cr3(&svm->vcpu, svm->vmcb->save.cr3, false);
     if (rc)
         return 1;
 
-    if (npt_enabled)
-        svm->vmcb->save.cr3 = hsave->save.cr3;
-
     /*
      * Drop what we picked up for L2 via svm_complete_interrupts() so it
      * doesn't end up in L1.
@@ -694,12 +679,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 void svm_leave_nested(struct vcpu_svm *svm)
 {
     if (is_guest_mode(&svm->vcpu)) {
-        struct vmcb *hsave = svm->nested.hsave;
-        struct vmcb *vmcb = svm->vmcb;
-
         svm->nested.nested_run_pending = 0;
         leave_guest_mode(&svm->vcpu);
-        copy_vmcb_control_area(&vmcb->control, &hsave->control);
+        svm->vmcb = svm->vmcb01;
+        svm->vmcb_pa = svm->vmcb01_pa;
         nested_svm_uninit_mmu_context(&svm->vcpu);
     }
 }
@@ -982,7 +965,7 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
     case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
         u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
 
-        if (get_host_vmcb(svm)->control.intercept_exceptions & excp_bits)
+        if (svm->vmcb01->control.intercept_exceptions & excp_bits)
             return NESTED_EXIT_HOST;
         else if (exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR &&
                  svm->vcpu.arch.apf.host_apf_flags)
@@ -1046,10 +1029,9 @@ static int svm_get_nested_state(struct kvm_vcpu *vcpu,
     if (copy_to_user(&user_vmcb->control, &svm->nested.ctl,
                      sizeof(user_vmcb->control)))
         return -EFAULT;
-    if (copy_to_user(&user_vmcb->save, &svm->nested.hsave->save,
+    if (copy_to_user(&user_vmcb->save, &svm->vmcb01->save,
                      sizeof(user_vmcb->save)))
         return -EFAULT;
-
 out:
     return kvm_state.size;
 }
@@ -1059,7 +1041,6 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
                                 struct kvm_nested_state *kvm_state)
 {
     struct vcpu_svm *svm = to_svm(vcpu);
-    struct vmcb *hsave = svm->nested.hsave;
     struct vmcb __user *user_vmcb = (struct vmcb __user *)
         &user_kvm_nested_state->data.svm[0];
     struct vmcb_control_area ctl;
@@ -1121,16 +1102,24 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
     if (!(save.cr0 & X86_CR0_PG))
         return -EINVAL;
 
+    svm->nested.vmcb02->control = svm->vmcb01->control;
+    svm->nested.vmcb02->save = svm->vmcb01->save;
+    svm->vmcb01->save = save;
+
+    WARN_ON(svm->vmcb == svm->nested.vmcb02);
+
+    svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa;
+
+    svm->vmcb = svm->nested.vmcb02;
+    svm->vmcb_pa = svm->nested.vmcb02_pa;
+
     /*
-     * All checks done, we can enter guest mode. L1 control fields
-     * come from the nested save state. Guest state is already
-     * in the registers, the save area of the nested state instead
-     * contains saved L1 state.
+     * All checks done, we can enter guest mode. L2 control fields will
+     * be the result of a combination of L1 and userspace indicated
+     * L12.control. The save area of L1 vmcb now contains the userspace
+     * indicated L1.save.
      */
-    copy_vmcb_control_area(&hsave->control, &svm->vmcb->control);
-    hsave->save = save;
 
-    svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa;
     load_nested_vmcb_control(svm, &ctl);
     nested_prepare_vmcb_control(svm);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 619980a5d540..d7e46d9f666d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -971,8 +971,8 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
     if (is_guest_mode(vcpu)) {
         /* Write L1's TSC offset. */
         g_tsc_offset = svm->vmcb->control.tsc_offset -
-                       svm->nested.hsave->control.tsc_offset;
-        svm->nested.hsave->control.tsc_offset = offset;
+                       svm->vmcb01->control.tsc_offset;
+        svm->vmcb01->control.tsc_offset = offset;
     }
 
     trace_kvm_write_tsc_offset(vcpu->vcpu_id,
@@ -1097,6 +1097,7 @@ static void init_vmcb(struct vcpu_svm *svm)
         clr_cr_intercept(svm, INTERCEPT_CR3_READ);
         clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
         save->g_pat = svm->vcpu.arch.pat;
+        svm->nested.vmcb02->save.g_pat = svm->vcpu.arch.pat;
         save->cr3 = 0;
         save->cr4 = 0;
     }
@@ -1172,9 +1173,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 {
     struct vcpu_svm *svm;
-    struct page *page;
+    struct page *vmcb01_page;
+    struct page *vmcb02_page;
     struct page *msrpm_pages;
-    struct page *hsave_page;
     struct page *nested_msrpm_pages;
     int err;
 
@@ -1182,8 +1183,8 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     svm = to_svm(vcpu);
 
     err = -ENOMEM;
-    page = alloc_page(GFP_KERNEL_ACCOUNT);
-    if (!page)
+    vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT);
+    if (!vmcb01_page)
         goto out;
 
     msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
@@ -1194,8 +1195,8 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     if (!nested_msrpm_pages)
         goto free_page2;
 
-    hsave_page = alloc_page(GFP_KERNEL_ACCOUNT);
-    if (!hsave_page)
+    vmcb02_page = alloc_page(GFP_KERNEL_ACCOUNT);
+    if (!vmcb02_page)
         goto free_page3;
 
     err = avic_init_vcpu(svm);
@@ -1208,8 +1209,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     if (irqchip_in_kernel(vcpu->kvm) && kvm_apicv_activated(vcpu->kvm))
         svm->avic_is_running = true;
 
-    svm->nested.hsave = page_address(hsave_page);
-    clear_page(svm->nested.hsave);
+    svm->nested.vmcb02 = page_address(vmcb02_page);
+    clear_page(svm->nested.vmcb02);
+    svm->nested.vmcb02_pa = __sme_set(page_to_pfn(vmcb02_page) << PAGE_SHIFT);
 
     svm->msrpm = page_address(msrpm_pages);
     svm_vcpu_init_msrpm(svm->msrpm);
@@ -1217,9 +1219,11 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     svm->nested.msrpm = page_address(nested_msrpm_pages);
     svm_vcpu_init_msrpm(svm->nested.msrpm);
 
-    svm->vmcb = page_address(page);
+    svm->vmcb = svm->vmcb01 = page_address(vmcb01_page);
     clear_page(svm->vmcb);
-    svm->vmcb_pa = __sme_set(page_to_pfn(page) << PAGE_SHIFT);
+    svm->vmcb_pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
+    svm->vmcb01_pa = svm->vmcb_pa;
+
     svm->asid_generation = 0;
     init_vmcb(svm);
 
@@ -1229,13 +1233,13 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     return 0;
 
 free_page4:
-    __free_page(hsave_page);
+    __free_page(vmcb02_page);
 free_page3:
     __free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
     __free_pages(msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page1:
-    __free_page(page);
+    __free_page(vmcb01_page);
 out:
     return err;
 }
@@ -1257,11 +1261,11 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
      * svm_vcpu_load(). So, ensure that no logical CPU has this
      * vmcb page recorded as its current vmcb.
      */
-    svm_clear_current_vmcb(svm->vmcb);
-    __free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
+    svm_clear_current_vmcb(svm->vmcb);
+    __free_page(pfn_to_page(__sme_clr(svm->vmcb01_pa) >> PAGE_SHIFT));
+    __free_page(pfn_to_page(__sme_clr(svm->nested.vmcb02_pa) >> PAGE_SHIFT));
     __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
-    __free_page(virt_to_page(svm->nested.hsave));
     __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 }
@@ -1394,7 +1398,7 @@ static void svm_clear_vintr(struct vcpu_svm *svm)
     /* Drop int_ctl fields related to VINTR injection. */
     svm->vmcb->control.int_ctl &= mask;
     if (is_guest_mode(&svm->vcpu)) {
-        svm->nested.hsave->control.int_ctl &= mask;
+        svm->vmcb01->control.int_ctl &= mask;
 
         WARN_ON((svm->vmcb->control.int_ctl & V_TPR_MASK) !=
                 (svm->nested.ctl.int_ctl & V_TPR_MASK));
@@ -3134,7 +3138,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
     if (is_guest_mode(vcpu)) {
         /* As long as interrupts are being delivered... */
         if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
-            ? !(svm->nested.hsave->save.rflags & X86_EFLAGS_IF)
+            ? !(svm->vmcb01->save.rflags & X86_EFLAGS_IF)
             : !(kvm_get_rflags(vcpu) & X86_EFLAGS_IF))
             return true;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 862f0d2405e8..1a4af5bd10b3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -82,7 +82,8 @@ struct kvm_svm {
 struct kvm_vcpu;
 
 struct svm_nested_state {
-    struct vmcb *hsave;
+    struct vmcb *vmcb02;
+    unsigned long vmcb02_pa;
     u64 hsave_msr;
     u64 vm_cr_msr;
     u64 vmcb;
@@ -103,6 +104,8 @@ struct vcpu_svm {
     struct kvm_vcpu vcpu;
     struct vmcb *vmcb;
     unsigned long vmcb_pa;
+    struct vmcb *vmcb01;
+    unsigned long vmcb01_pa;
     struct svm_cpu_data *svm_data;
     u32 asid;
     uint64_t asid_generation;
@@ -207,44 +210,28 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
     return container_of(vcpu, struct vcpu_svm, vcpu);
 }
 
-static inline struct vmcb *get_host_vmcb(struct vcpu_svm *svm)
-{
-    if (is_guest_mode(&svm->vcpu))
-        return svm->nested.hsave;
-    else
-        return svm->vmcb;
-}
-
 static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_cr |= (1U << bit);
+    svm->vmcb01->control.intercept_cr |= (1U << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline void clr_cr_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_cr &= ~(1U << bit);
+    svm->vmcb01->control.intercept_cr &= ~(1U << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline bool is_cr_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    return vmcb->control.intercept_cr & (1U << bit);
+    return svm->vmcb01->control.intercept_cr & (1U << bit);
 }
 
 static inline void set_dr_intercepts(struct vcpu_svm *svm)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_dr = (1 << INTERCEPT_DR0_READ)
+    svm->vmcb01->control.intercept_dr = (1 << INTERCEPT_DR0_READ)
         | (1 << INTERCEPT_DR1_READ)
         | (1 << INTERCEPT_DR2_READ)
         | (1 << INTERCEPT_DR3_READ)
@@ -266,45 +253,35 @@ static inline void set_dr_intercepts(struct vcpu_svm *svm)
 
 static inline void clr_dr_intercepts(struct vcpu_svm *svm)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_dr = 0;
+    svm->vmcb01->control.intercept_dr = 0;
 
     recalc_intercepts(svm);
 }
 
 static inline void set_exception_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_exceptions |= (1U << bit);
+    svm->vmcb01->control.intercept_exceptions |= (1U << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline void clr_exception_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_exceptions &= ~(1U << bit);
+    svm->vmcb01->control.intercept_exceptions &= ~(1U << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept |= (1ULL << bit);
+    svm->vmcb01->control.intercept |= (1ULL << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline void svm_clr_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept &= ~(1ULL << bit);
+    svm->vmcb01->control.intercept &= ~(1ULL << bit);
 
     recalc_intercepts(svm);
 }

-- 
2.20.1
* Re: [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest 2020-10-11 18:48 ` [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest Cathy Avery @ 2020-10-13 1:33 ` Sean Christopherson 2020-11-13 17:36 ` Paolo Bonzini 2020-11-13 17:58 ` Paolo Bonzini 1 sibling, 1 reply; 14+ messages in thread From: Sean Christopherson @ 2020-10-13 1:33 UTC (permalink / raw) To: Cathy Avery; +Cc: linux-kernel, kvm, pbonzini, vkuznets, wei.huang2, mlevitsk On Sun, Oct 11, 2020 at 02:48:18PM -0400, Cathy Avery wrote: > @@ -628,8 +620,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm) > nested_vmcb->control.pause_filter_thresh = > svm->vmcb->control.pause_filter_thresh; > > - /* Restore the original control entries */ > - copy_vmcb_control_area(&vmcb->control, &hsave->control); > + nested_svm_vmloadsave(svm->nested.vmcb02, svm->vmcb01); > + > + svm->vmcb = svm->vmcb01; > + svm->vmcb_pa = svm->vmcb01_pa; I very highly recommend adding a helper to switch VMCB. Odds are very good there will be more than just these two lines of boilerplate code for changing the active VMCB. > > /* On vmexit the GIF is set to false */ > svm_set_gif(svm, false); ... > @@ -1121,16 +1102,24 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu, > if (!(save.cr0 & X86_CR0_PG)) > return -EINVAL; > > + svm->nested.vmcb02->control = svm->vmcb01->control; > + svm->nested.vmcb02->save = svm->vmcb01->save; > + svm->vmcb01->save = save; > + > + WARN_ON(svm->vmcb == svm->nested.vmcb02); I'm pretty sure this is user triggerable. AFAIK, nothing prevents calling svm_set_nested_state() while L2 is active, e.g. VMX explicitly (and forcefully) kicks the vCPU out of L2 in vmx_set_nested_state(). > + > + svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa; > + > + svm->vmcb = svm->nested.vmcb02; > + svm->vmcb_pa = svm->nested.vmcb02_pa; > + > /* > - * All checks done, we can enter guest mode. L1 control fields > - * come from the nested save state. 
Guest state is already > - * in the registers, the save area of the nested state instead > - * contains saved L1 state. > + * All checks done, we can enter guest mode. L2 control fields will > + * be the result of a combination of L1 and userspace indicated > + * L12.control. The save area of L1 vmcb now contains the userspace > + * indicated L1.save. > */ > - copy_vmcb_control_area(&hsave->control, &svm->vmcb->control); > - hsave->save = save; > > - svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa; > load_nested_vmcb_control(svm, &ctl); > nested_prepare_vmcb_control(svm); >
* Re: [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest 2020-10-13 1:33 ` Sean Christopherson @ 2020-11-13 17:36 ` Paolo Bonzini 0 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2020-11-13 17:36 UTC (permalink / raw) To: Sean Christopherson, Cathy Avery Cc: linux-kernel, kvm, vkuznets, wei.huang2, mlevitsk On 13/10/20 03:33, Sean Christopherson wrote: >> + svm->vmcb = svm->vmcb01; >> + svm->vmcb_pa = svm->vmcb01_pa; > I very highly recommend adding a helper to switch VMCB. Odds are very good > there will be more than just these two lines of boilerplate code for changing > the active VMCB. Yes, probably we can make svm->vmcb01 and svm->vmcb02 something like VMX's struct loaded_vmcs: struct kvm_vmcb { void *vmcb; unsigned long pa; } I don't expect a lot more to happen due to SVM having no need for caching, so for now I think it's okay. I have other comments for which I'll reply to the patch itself. Paolo
* Re: [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest 2020-10-11 18:48 ` [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest Cathy Avery 2020-10-13 1:33 ` Sean Christopherson @ 2020-11-13 17:58 ` Paolo Bonzini 2020-11-13 20:38 ` Paolo Bonzini 1 sibling, 1 reply; 14+ messages in thread From: Paolo Bonzini @ 2020-11-13 17:58 UTC (permalink / raw) To: Cathy Avery, linux-kernel, kvm; +Cc: vkuznets, wei.huang2, mlevitsk On 11/10/20 20:48, Cathy Avery wrote: > @@ -432,6 +432,16 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa, > int ret; > > svm->nested.vmcb = vmcb_gpa; > + > + WARN_ON(svm->vmcb == svm->nested.vmcb02); > + > + svm->nested.vmcb02->control = svm->vmcb01->control; This assignment of the control area should be in nested_prepare_vmcb_control, which is already filling in most of vmcb02->control. Right now, we save a copy_vmcb_control_area call in nested_svm_vmexit so it evens out. Later, it should be possible to remove most of the assignments from nested_prepare_vmcb_control. > + svm->nested.vmcb02->save.cr4 = svm->vmcb01->save.cr4; I cannot understand this statement. > + nested_svm_vmloadsave(svm->vmcb01, svm->nested.vmcb02); This is because the vmsave just after the vmexit has moved the vmloadsave registers from vmcb12 to vmcb01, but the next vmload will use vmcb02.
> + svm->vmcb = svm->nested.vmcb02; > + svm->vmcb_pa = svm->nested.vmcb02_pa; > load_nested_vmcb_control(svm, &nested_vmcb->control); > nested_prepare_vmcb_save(svm, nested_vmcb); > nested_prepare_vmcb_control(svm); > @@ -628,8 +620,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm) > nested_vmcb->control.pause_filter_thresh = > svm->vmcb->control.pause_filter_thresh; > > - /* Restore the original control entries */ > - copy_vmcb_control_area(&vmcb->control, &hsave->control); > + nested_svm_vmloadsave(svm->nested.vmcb02, svm->vmcb01); Same here: the next vmentry's vmload will move the vmloadsave registers from vmcb01 to vmcb12, but for now they are in vmcb02. It's 16+16 memory-to-memory u64 copies. They mostly even out with the 14+14 copies that we don't have to do anymore on registers handled by VMRUN (es/cs/ss/ds/gdt/idt/rsp/rax---they don't have to be stashed away in hsave anymore). Also, we are able to reuse nested_svm_vmloadsave, which makes it overall a fair tradeoff... but it would have been worth a comment or two. :) > + svm->nested.vmcb02->control = svm->vmcb01->control; > + svm->nested.vmcb02->save = svm->vmcb01->save; > + svm->vmcb01->save = save; I would have moved these after the comment, matching the existing copy_vmcb_control_area and save-area assignment. Also, the first save-area assignment should be (because the WARN_ON below must be removed) svm->nested.vmcb02->save = svm->vmcb->save; or if (svm->vmcb == svm->vmcb01) svm->nested.vmcb02->save = svm->vmcb01->save; I have applied the patch and fixed the conflicts, so when I have some time I will play a bit more with it and/or pass the updated version back to you. In the meanwhile, perhaps you could write a new selftests testcase that tries to do KVM_SET_NESTED_STATE while in L2. It can be a simplified version of state-test that invokes KVM_GET_NESTED_STATE + KVM_SET_NESTED_STATE when it gets back to L0. Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest 2020-11-13 17:58 ` Paolo Bonzini @ 2020-11-13 20:38 ` Paolo Bonzini 0 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2020-11-13 20:38 UTC (permalink / raw) To: Cathy Avery, linux-kernel, kvm; +Cc: vkuznets, wei.huang2, mlevitsk On 13/11/20 18:58, Paolo Bonzini wrote: > >> + svm->nested.vmcb02->save.cr4 = svm->vmcb01->save.cr4; > > I cannot understand this statement. I wonder if it has something to do with unsigned long old_cr4 = to_svm(vcpu)->vmcb->save.cr4; whereas vmx.c has unsigned long old_cr4 = vcpu->arch.cr4; without this assignment, the old_cr4 would be taken from the last value stored in the vmcb02, instead of the current value for the vCPU. In general uses of svm->vmcb01 (in svm.c especially) need to be audited carefully. Paolo
* Re: [PATCH v2 0/2] KVM: SVM: Create separate vmcbs for L1 and L2 2020-10-11 18:48 [PATCH v2 0/2] KVM: SVM: Create separate vmcbs for L1 and L2 Cathy Avery 2020-10-11 18:48 ` [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm Cathy Avery 2020-10-11 18:48 ` [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest Cathy Avery @ 2020-11-17 11:03 ` Paolo Bonzini 2 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2020-11-17 11:03 UTC (permalink / raw) To: Cathy Avery, linux-kernel, kvm Cc: vkuznets, wei.huang2, mlevitsk, Sean Christopherson Hi Cathy, I found an issue with the second patch: the svm->asid_generation and svm->vcpu.cpu fields must become per-VMCB. Once again we are rediscovering what VMX already does (for different reasons) with its struct loaded_vmcs. The good thing is that it can be worked around easily: for the ASID generation, it can simply be cleared after changing svm->vmcb. For the CPU field, it's not an issue yet because the VMCB is marked all-dirty after each switch. With these workarounds, everything works nicely. However, removing the calls to vmcb_mark_all_dirty is the main optimization that is enabled by the VMCB01/VMCB02 split, so it should be fixed too. And also, the code duplication every time svm->vmcb is assigned starts to be ugly, proving Sean to be right. :) My suggestion is that you do something like this: 1) create a struct for all per-VMCB data: struct kvm_vmcb_info { void *ptr; u64 pa; } and use it in structs vcpu_svm and svm_nested_state: struct vcpu_svm { ... struct kvm_vmcb_info vmcb01; struct kvm_vmcb_info *current_vmcb; void *vmcb; u64 vmcb_pa; ... } struct svm_nested_state { struct kvm_vmcb_info vmcb02; ...
} The struct can be passed to a vmcb_switch function like this one: void vmcb_switch(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb) { svm->current_vmcb = target_vmcb; svm->vmcb = target_vmcb->ptr; svm->vmcb_pa = target_vmcb->pa; /* * Workaround: we don't yet track the ASID generation * that was active the last time target_vmcb was run. */ svm->asid_generation = 0; /* * Workaround: we don't yet track the physical CPU that * target_vmcb has run on. */ vmcb_mark_all_dirty(svm->vmcb); } You can use this function every time the current code is assigning to svm->vmcb. Once the struct and function is in place, we can proceed to removing the last two (inefficient) lines of vmcb_switch by augmenting struct kvm_vmcb_info. 2) First, add an "int cpu" field. Move the vcpu->cpu check from svm_vcpu_load to pre_svm_run, using svm->current_vmcb->cpu instead of vcpu->cpu, and you will be able to remove the vmcb_mark_all_dirty call from vmcb_switch. 3) Then do the same with asid_generation. All uses of svm->asid_generation become uses of svm->current_vmcb->asid_generation, and you can remove the clearing of svm->asid_generation. These can be three separate patches on top of the changes you have sent (or rather the rebased version, see below). Writing good commit messages for them will be a good exercise too. :) I have pushed the current nested SVM queue to kvm.git on a "nested-svm" branch. You can discard the top commits and work on top of commit a33b86f151a0 from that branch ("KVM: SVM: Use a separate vmcb for the nested L2 guest", 2020-11-17). Once this is done, we can start reaping the advantages of the VMCB01/VMCB02 split. Some patches for that are already in the nested-svm branch, but there's more fun to be had. First of all, Maxim's ill-fated attempt at using VMCB12 clean bits will now work. 
Second, we can try doing VMLOAD/VMSAVE always from VMCB01 (while VMRUN switches between VMCB01 and VMCB02) and therefore remove the nested_svm_vmloadsave calls from nested_vmrun and nested_vmexit. But, one thing at a time. Thanks, Paolo