* [PATCH v2 0/2] KVM: SVM: Create separate vmcbs for L1 and L2
From: Cathy Avery
Date: 2020-10-11 18:48 UTC
To: linux-kernel, kvm, pbonzini
Cc: vkuznets, wei.huang2, mlevitsk

svm->vmcb will now point either to a separate L1 vmcb (not nested) or to
the L2 vmcb (nested).

Changes v1 -> v2:
- Removed unnecessary update check of L1 save.cr3 during nested_svm_vmexit.
- Moved vmcb01_pa to svm.
- Removed the get_host_vmcb() function.
- Updated vmsave/vmload corresponding vmcb state during L2 enter and exit,
  which fixed the L2 load issue.
- Moved the asid workaround to a new patch which adds asid to svm.
- Initialized the previously uninitialized L2 vmcb save.g_pat and save.cr4.

Tested:
- kvm-unit-tests
- KVM selftests
- Loaded a Fedora nested guest on a Fedora host

Cathy Avery (2):
  KVM: SVM: Move asid to vcpu_svm
  KVM: SVM: Use a separate vmcb for the nested L2 guest

 arch/x86/kvm/svm/nested.c | 117 +++++++++++++++++---------------------
 arch/x86/kvm/svm/svm.c    |  46 ++++++++-------
 arch/x86/kvm/svm/svm.h    |  50 +++++-----------
 3 files changed, 93 insertions(+), 120 deletions(-)

-- 
2.20.1
* [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Cathy Avery
Date: 2020-10-11 18:48 UTC
To: linux-kernel, kvm, pbonzini
Cc: vkuznets, wei.huang2, mlevitsk

Move asid to svm->asid to allow for vmcb assignment
during svm_vcpu_run without regard to which level
guest is running.

Signed-off-by: Cathy Avery <cavery@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 4 +++-
 arch/x86/kvm/svm/svm.h | 1 +
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d4e18bda19c7..619980a5d540 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1101,6 +1101,7 @@ static void init_vmcb(struct vcpu_svm *svm)
         save->cr4 = 0;
     }
     svm->asid_generation = 0;
+    svm->asid = 0;
 
     svm->nested.vmcb = 0;
     svm->vcpu.arch.hflags = 0;
@@ -1663,7 +1664,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
     }
 
     svm->asid_generation = sd->asid_generation;
-    svm->vmcb->control.asid = sd->next_asid++;
+    svm->asid = sd->next_asid++;
 
     vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
@@ -3446,6 +3447,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
     sync_lapic_to_cr8(vcpu);
 
+    svm->vmcb->control.asid = svm->asid;
     svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
     /*
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index a798e1731709..862f0d2405e8 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -104,6 +104,7 @@ struct vcpu_svm {
     struct vmcb *vmcb;
     unsigned long vmcb_pa;
     struct svm_cpu_data *svm_data;
+    u32 asid;
     uint64_t asid_generation;
     uint64_t sysenter_esp;
     uint64_t sysenter_eip;

-- 
2.20.1
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Sean Christopherson
Date: 2020-10-13 1:29 UTC
To: Cathy Avery
Cc: linux-kernel, kvm, pbonzini, vkuznets, wei.huang2, mlevitsk

On Sun, Oct 11, 2020 at 02:48:17PM -0400, Cathy Avery wrote:
> Move asid to svm->asid to allow for vmcb assignment

This is misleading. The asid isn't being moved, it's being copied/tracked.
The "to allow" wording also confused me; I thought this was just a prep
patch and the actual assignment was in a follow-up patch.

> during svm_vcpu_run without regard to which level
> guest is running.
>
> Signed-off-by: Cathy Avery <cavery@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 4 +++-
>  arch/x86/kvm/svm/svm.h | 1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d4e18bda19c7..619980a5d540 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1101,6 +1101,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>          save->cr4 = 0;
>      }
>      svm->asid_generation = 0;
> +    svm->asid = 0;
>
>      svm->nested.vmcb = 0;
>      svm->vcpu.arch.hflags = 0;
> @@ -1663,7 +1664,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
>      }
>
>      svm->asid_generation = sd->asid_generation;
> -    svm->vmcb->control.asid = sd->next_asid++;
> +    svm->asid = sd->next_asid++;
>
>      vmcb_mark_dirty(svm->vmcb, VMCB_ASID);

I know very little (ok, nothing) about SVM VMCB caching rules, but I
strongly suspect this is broken. The existing code explicitly marks
VMCB_ASID dirty, but there is no equivalent code for the case where there
are multiple VMCBs, e.g. if new_asid() is called while vmcb01 is active,
then vmcb02 will pick up the new ASID but will not mark it dirty.

>  }
> @@ -3446,6 +3447,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>
>      sync_lapic_to_cr8(vcpu);
>
> +    svm->vmcb->control.asid = svm->asid;

Related to the above, handling this in vcpu_run() feels wrong. There really
shouldn't be a need to track the ASID. vmcb01 will always exist if vmcb02
exits, e.g. the ASID can be copied and marked dirty when loading vmcb02.
For new_asid(), it can unconditionally update vmcb01 and conditionally
update vmcb02.

>      svm->vmcb->save.cr2 = vcpu->arch.cr2;
>
>      /*
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index a798e1731709..862f0d2405e8 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -104,6 +104,7 @@ struct vcpu_svm {
>      struct vmcb *vmcb;
>      unsigned long vmcb_pa;
>      struct svm_cpu_data *svm_data;
> +    u32 asid;
>      uint64_t asid_generation;
>      uint64_t sysenter_esp;
>      uint64_t sysenter_eip;
> --
> 2.20.1
>
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Paolo Bonzini
Date: 2020-11-13 17:03 UTC
To: Sean Christopherson, Cathy Avery
Cc: linux-kernel, kvm, vkuznets, wei.huang2, mlevitsk

On 13/10/20 03:29, Sean Christopherson wrote:
>> @@ -3446,6 +3447,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
>>
>>      sync_lapic_to_cr8(vcpu);
>>
>> +    svm->vmcb->control.asid = svm->asid;
>
> Related to the above, handling this in vcpu_run() feels wrong. There really
> shouldn't be a need to track the ASID. vmcb01 will always exist if vmcb02
> exits, e.g. the ASID can be copied and marked dirty when loading vmcb02.
> For new_asid(), it can unconditionally update vmcb01 and conditionally update
> vmcb02.

Yeah, it is a bit ugly and it is only needed on processors without
flush-by-ASID. On those processors the ASID is used by KVM as a sort of
TLB flush generation count. A TLB flush should affect both VMCB01 and
VMCB02, and that's the reason to have a global ASID in struct vcpu_svm.

On processors with flush-by-ASID, currently we bump the ASID on every
physical CPU change. However even that is not needed; in principle
svm->asid should never change and we could also have different ASID01 and
ASID02, similar to VMX.

So: long term, svm->asid should be used only on processors without
flush-by-ASID. kvm-amd however is not quite ready for that.

Paolo
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Paolo Bonzini
Date: 2020-11-13 16:57 UTC
To: Cathy Avery, linux-kernel, kvm
Cc: vkuznets, wei.huang2, mlevitsk

On 11/10/20 20:48, Cathy Avery wrote:
> Move asid to svm->asid to allow for vmcb assignment
> during svm_vcpu_run without regard to which level
> guest is running.

Slightly more verbose commit message:

    KVM does not have separate ASIDs for L1 and L2; either the nested
    hypervisor and nested guests share a single ASID, or on older
    processors the ASID is used only to implement TLB flushing.

    Either way, ASIDs are handled at the VM level. In preparation for
    having different VMCBs passed to VMLOAD/VMRUN/VMSAVE for L1 and L2,
    store the current ASID to struct vcpu_svm and only move it to the
    VMCB in svm_vcpu_run. This way, TLB flushes can be applied no matter
    which VMCB will be active during the next svm_vcpu_run.

> Signed-off-by: Cathy Avery <cavery@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 4 +++-
>  arch/x86/kvm/svm/svm.h | 1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index d4e18bda19c7..619980a5d540 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1101,6 +1101,7 @@ static void init_vmcb(struct vcpu_svm *svm)
>          save->cr4 = 0;
>      }
>      svm->asid_generation = 0;
> +    svm->asid = 0;
>
>      svm->nested.vmcb = 0;
>      svm->vcpu.arch.hflags = 0;
> @@ -1663,7 +1664,7 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
>      }
>
>      svm->asid_generation = sd->asid_generation;
> -    svm->vmcb->control.asid = sd->next_asid++;
> +    svm->asid = sd->next_asid++;
>
>      vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
>  }

This vmcb_mark_dirty must be delayed to svm_vcpu_run as well, because the
active VMCB could change:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 48965bfa3d1e..3b53a7ead04b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1756,12 +1756,11 @@ static void new_asid(struct vcpu_svm *svm, struct svm_cpu_data *sd)
         ++sd->asid_generation;
         sd->next_asid = sd->min_asid;
         svm->vmcb->control.tlb_ctl = TLB_CONTROL_FLUSH_ALL_ASID;
+        vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
     }
 
     svm->asid_generation = sd->asid_generation;
     svm->asid = sd->next_asid++;
-
-    vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
 }
 
 static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
@@ -3571,7 +3570,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu)
 
     sync_lapic_to_cr8(vcpu);
 
-    svm->vmcb->control.asid = svm->asid;
+    if (unlikely(svm->asid != svm->vmcb->control.asid)) {
+        svm->vmcb->control.asid = svm->asid;
+        vmcb_mark_dirty(svm->vmcb, VMCB_ASID);
+    }
     svm->vmcb->save.cr2 = vcpu->arch.cr2;
 
     /*

Queued with this change.

Paolo
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Ashish Kalra <ashish.kalra@amd.com>
Date: 2020-11-29 9:41 UTC
To: pbonzini
Cc: cavery, kvm, linux-kernel, mlevitsk, vkuznets, wei.huang2,
    thomas.lendacky, brijesh.singh, jon.grimm

This patch breaks SEV guests.

The patch stores the current ASID in struct vcpu_svm and only moves it to
the VMCB in svm_vcpu_run(), but by doing so, the ASID allocated for SEV
guests and set up in vmcb->control.asid by pre_sev_run() gets overwritten
by the ASID stored in struct vcpu_svm. Hence VMRUN fails, as the SEV guest
is bound/activated on a different ASID than the one written into
vmcb->control.asid at VMRUN.

For example, asid#1 was activated for the SEV guest and then
vmcb->control.asid is overwritten with asid#0 (svm->asid) as part of this
patch in svm_vcpu_run(), and hence VMRUN fails.
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Paolo Bonzini
Date: 2020-11-30 14:41 UTC
To: Ashish Kalra
Cc: cavery, kvm, linux-kernel, mlevitsk, vkuznets, wei.huang2,
    thomas.lendacky, brijesh.singh, jon.grimm

On 29/11/20 10:41, Ashish Kalra wrote:
> This patch breaks SEV guests.
>
> The patch stores the current ASID in struct vcpu_svm and only moves it to
> the VMCB in svm_vcpu_run(), but by doing so, the ASID allocated for SEV
> guests and set up in vmcb->control.asid by pre_sev_run() gets overwritten
> by the ASID stored in struct vcpu_svm. Hence VMRUN fails, as the SEV
> guest is bound/activated on a different ASID than the one written into
> vmcb->control.asid at VMRUN.
>
> For example, asid#1 was activated for the SEV guest and then
> vmcb->control.asid is overwritten with asid#0 (svm->asid) as part of this
> patch in svm_vcpu_run(), and hence VMRUN fails.

Thanks Ashish, I've sent a patch to fix it.

Would it be possible to add a minimal SEV test to
tools/testing/selftests/kvm? It doesn't have to do full attestation etc.;
if you can just write an "out" instruction using SEV_DBG_ENCRYPT and check
that you can run it, that's enough.

Paolo
* Re: [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm
From: Ashish Kalra
Date: 2020-11-30 21:06 UTC
To: Paolo Bonzini
Cc: cavery, kvm, linux-kernel, mlevitsk, vkuznets, wei.huang2,
    thomas.lendacky, brijesh.singh, jon.grimm

Hello Paolo,

I believe one of my teammates is currently working on adding a KVM
selftest for SEV and SEV-ES.

Thanks,
Ashish

On Mon, Nov 30, 2020 at 03:41:41PM +0100, Paolo Bonzini wrote:
> Thanks Ashish, I've sent a patch to fix it.
>
> Would it be possible to add a minimal SEV test to
> tools/testing/selftests/kvm? It doesn't have to do full attestation etc.,
> if you can just write an "out" instruction using SEV_DBG_ENCRYPT and check
> that you can run it that's enough.
>
> Paolo
* [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest
From: Cathy Avery
Date: 2020-10-11 18:48 UTC
To: linux-kernel, kvm, pbonzini
Cc: vkuznets, wei.huang2, mlevitsk

svm->vmcb will now point either to a separate L1 vmcb (not nested) or to
the L2 vmcb (nested).

Issues:

1) There is some wholesale copying of vmcb.save and vmcb.control areas
   which will need to be refined.

Tested:
- kvm-unit-tests
- KVM selftests
- Loaded a Fedora nested guest on a Fedora host

Signed-off-by: Cathy Avery <cavery@redhat.com>
---
 arch/x86/kvm/svm/nested.c | 117 +++++++++++++++++---------------------
 arch/x86/kvm/svm/svm.c    |  42 +++++++------
 arch/x86/kvm/svm/svm.h    |  49 +++++-----------
 3 files changed, 89 insertions(+), 119 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index e90bc436f584..bcce92f0a90c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -75,12 +75,12 @@ static unsigned long nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)
 static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu)
 {
     struct vcpu_svm *svm = to_svm(vcpu);
-    struct vmcb *hsave = svm->nested.hsave;
 
     WARN_ON(mmu_is_nested(vcpu));
 
     vcpu->arch.mmu = &vcpu->arch.guest_mmu;
-    kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, hsave->save.cr4, hsave->save.efer,
+    kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01->save.cr4,
+                            svm->vmcb01->save.efer,
                             svm->nested.ctl.nested_cr3);
     vcpu->arch.mmu->get_guest_pgd = nested_svm_get_tdp_cr3;
     vcpu->arch.mmu->get_pdptr = nested_svm_get_tdp_pdptr;
@@ -105,7 +105,7 @@ void recalc_intercepts(struct vcpu_svm *svm)
         return;
 
     c = &svm->vmcb->control;
-    h = &svm->nested.hsave->control;
+    h = &svm->vmcb01->control;
     g = &svm->nested.ctl;
 
     svm->nested.host_intercept_exceptions = h->intercept_exceptions;
@@ -403,7 +403,7 @@ static void nested_prepare_vmcb_control(struct vcpu_svm *svm)
 
     svm->vmcb->control.int_ctl =
         (svm->nested.ctl.int_ctl & ~mask) |
-        (svm->nested.hsave->control.int_ctl & mask);
+        (svm->vmcb01->control.int_ctl & mask);
 
     svm->vmcb->control.virt_ext = svm->nested.ctl.virt_ext;
     svm->vmcb->control.int_vector = svm->nested.ctl.int_vector;
@@ -432,6 +432,16 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa,
     int ret;
 
     svm->nested.vmcb = vmcb_gpa;
+
+    WARN_ON(svm->vmcb == svm->nested.vmcb02);
+
+    svm->nested.vmcb02->control = svm->vmcb01->control;
+    svm->nested.vmcb02->save.cr4 = svm->vmcb01->save.cr4;
+
+    nested_svm_vmloadsave(svm->vmcb01, svm->nested.vmcb02);
+
+    svm->vmcb = svm->nested.vmcb02;
+    svm->vmcb_pa = svm->nested.vmcb02_pa;
     load_nested_vmcb_control(svm, &nested_vmcb->control);
     nested_prepare_vmcb_save(svm, nested_vmcb);
     nested_prepare_vmcb_control(svm);
@@ -450,8 +460,6 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
 {
     int ret;
     struct vmcb *nested_vmcb;
-    struct vmcb *hsave = svm->nested.hsave;
-    struct vmcb *vmcb = svm->vmcb;
     struct kvm_host_map map;
     u64 vmcb_gpa;
@@ -496,29 +504,14 @@ int nested_svm_vmrun(struct vcpu_svm *svm)
     kvm_clear_exception_queue(&svm->vcpu);
     kvm_clear_interrupt_queue(&svm->vcpu);
 
-    /*
-     * Save the old vmcb, so we don't need to pick what we save, but can
-     * restore everything when a VMEXIT occurs
-     */
-    hsave->save.es = vmcb->save.es;
-    hsave->save.cs = vmcb->save.cs;
-    hsave->save.ss = vmcb->save.ss;
-    hsave->save.ds = vmcb->save.ds;
-    hsave->save.gdtr = vmcb->save.gdtr;
-    hsave->save.idtr = vmcb->save.idtr;
-    hsave->save.efer = svm->vcpu.arch.efer;
-    hsave->save.cr0 = kvm_read_cr0(&svm->vcpu);
-    hsave->save.cr4 = svm->vcpu.arch.cr4;
-    hsave->save.rflags = kvm_get_rflags(&svm->vcpu);
-    hsave->save.rip = kvm_rip_read(&svm->vcpu);
-    hsave->save.rsp = vmcb->save.rsp;
-    hsave->save.rax = vmcb->save.rax;
-    if (npt_enabled)
-        hsave->save.cr3 = vmcb->save.cr3;
-    else
-        hsave->save.cr3 = kvm_read_cr3(&svm->vcpu);
-
-    copy_vmcb_control_area(&hsave->control, &vmcb->control);
+    svm->vmcb01->save.efer = svm->vcpu.arch.efer;
+    svm->vmcb01->save.cr0 = kvm_read_cr0(&svm->vcpu);
+    svm->vmcb01->save.cr4 = svm->vcpu.arch.cr4;
+    svm->vmcb01->save.rflags = kvm_get_rflags(&svm->vcpu);
+    svm->vmcb01->save.rip = kvm_rip_read(&svm->vcpu);
+
+    if (!npt_enabled)
+        svm->vmcb01->save.cr3 = kvm_read_cr3(&svm->vcpu);
 
     svm->nested.nested_run_pending = 1;
@@ -564,7 +557,6 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 {
     int rc;
     struct vmcb *nested_vmcb;
-    struct vmcb *hsave = svm->nested.hsave;
     struct vmcb *vmcb = svm->vmcb;
     struct kvm_host_map map;
@@ -628,8 +620,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
     nested_vmcb->control.pause_filter_thresh =
         svm->vmcb->control.pause_filter_thresh;
 
-    /* Restore the original control entries */
-    copy_vmcb_control_area(&vmcb->control, &hsave->control);
+    nested_svm_vmloadsave(svm->nested.vmcb02, svm->vmcb01);
+
+    svm->vmcb = svm->vmcb01;
+    svm->vmcb_pa = svm->vmcb01_pa;
 
     /* On vmexit the GIF is set to false */
     svm_set_gif(svm, false);
@@ -640,19 +634,13 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
     svm->nested.ctl.nested_cr3 = 0;
 
     /* Restore selected save entries */
-    svm->vmcb->save.es = hsave->save.es;
-    svm->vmcb->save.cs = hsave->save.cs;
-    svm->vmcb->save.ss = hsave->save.ss;
-    svm->vmcb->save.ds = hsave->save.ds;
-    svm->vmcb->save.gdtr = hsave->save.gdtr;
-    svm->vmcb->save.idtr = hsave->save.idtr;
-    kvm_set_rflags(&svm->vcpu, hsave->save.rflags);
-    svm_set_efer(&svm->vcpu, hsave->save.efer);
-    svm_set_cr0(&svm->vcpu, hsave->save.cr0 | X86_CR0_PE);
-    svm_set_cr4(&svm->vcpu, hsave->save.cr4);
-    kvm_rax_write(&svm->vcpu, hsave->save.rax);
-    kvm_rsp_write(&svm->vcpu, hsave->save.rsp);
-    kvm_rip_write(&svm->vcpu, hsave->save.rip);
+    kvm_set_rflags(&svm->vcpu, svm->vmcb->save.rflags);
+    svm_set_efer(&svm->vcpu, svm->vmcb->save.efer);
+    svm_set_cr0(&svm->vcpu, svm->vmcb->save.cr0 | X86_CR0_PE);
+    svm_set_cr4(&svm->vcpu, svm->vmcb->save.cr4);
+    kvm_rax_write(&svm->vcpu, svm->vmcb->save.rax);
+    kvm_rsp_write(&svm->vcpu, svm->vmcb->save.rsp);
+    kvm_rip_write(&svm->vcpu, svm->vmcb->save.rip);
     svm->vmcb->save.dr7 = 0;
     svm->vmcb->save.cpl = 0;
     svm->vmcb->control.exit_int_info = 0;
@@ -670,13 +658,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 
     nested_svm_uninit_mmu_context(&svm->vcpu);
 
-    rc = nested_svm_load_cr3(&svm->vcpu, hsave->save.cr3, false);
+    rc = nested_svm_load_cr3(&svm->vcpu, svm->vmcb->save.cr3, false);
     if (rc)
         return 1;
 
-    if (npt_enabled)
-        svm->vmcb->save.cr3 = hsave->save.cr3;
-
     /*
      * Drop what we picked up for L2 via svm_complete_interrupts() so it
      * doesn't end up in L1.
@@ -694,12 +679,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 void svm_leave_nested(struct vcpu_svm *svm)
 {
     if (is_guest_mode(&svm->vcpu)) {
-        struct vmcb *hsave = svm->nested.hsave;
-        struct vmcb *vmcb = svm->vmcb;
-
         svm->nested.nested_run_pending = 0;
         leave_guest_mode(&svm->vcpu);
-        copy_vmcb_control_area(&vmcb->control, &hsave->control);
+        svm->vmcb = svm->vmcb01;
+        svm->vmcb_pa = svm->vmcb01_pa;
         nested_svm_uninit_mmu_context(&svm->vcpu);
     }
 }
@@ -982,7 +965,7 @@ int nested_svm_exit_special(struct vcpu_svm *svm)
     case SVM_EXIT_EXCP_BASE ... SVM_EXIT_EXCP_BASE + 0x1f: {
         u32 excp_bits = 1 << (exit_code - SVM_EXIT_EXCP_BASE);
 
-        if (get_host_vmcb(svm)->control.intercept_exceptions & excp_bits)
+        if (svm->vmcb01->control.intercept_exceptions & excp_bits)
             return NESTED_EXIT_HOST;
         else if (exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR &&
                  svm->vcpu.arch.apf.host_apf_flags)
@@ -1046,10 +1029,9 @@ static int svm_get_nested_state(struct kvm_vcpu *vcpu,
     if (copy_to_user(&user_vmcb->control, &svm->nested.ctl,
                      sizeof(user_vmcb->control)))
         return -EFAULT;
-    if (copy_to_user(&user_vmcb->save, &svm->nested.hsave->save,
+    if (copy_to_user(&user_vmcb->save, &svm->vmcb01->save,
                      sizeof(user_vmcb->save)))
         return -EFAULT;
-
 out:
     return kvm_state.size;
 }
@@ -1059,7 +1041,6 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
                                 struct kvm_nested_state *kvm_state)
 {
     struct vcpu_svm *svm = to_svm(vcpu);
-    struct vmcb *hsave = svm->nested.hsave;
     struct vmcb __user *user_vmcb = (struct vmcb __user *)
         &user_kvm_nested_state->data.svm[0];
     struct vmcb_control_area ctl;
@@ -1121,16 +1102,24 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu,
     if (!(save.cr0 & X86_CR0_PG))
         return -EINVAL;
 
+    svm->nested.vmcb02->control = svm->vmcb01->control;
+    svm->nested.vmcb02->save = svm->vmcb01->save;
+    svm->vmcb01->save = save;
+
+    WARN_ON(svm->vmcb == svm->nested.vmcb02);
+
+    svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa;
+
+    svm->vmcb = svm->nested.vmcb02;
+    svm->vmcb_pa = svm->nested.vmcb02_pa;
+
     /*
-     * All checks done, we can enter guest mode. L1 control fields
-     * come from the nested save state. Guest state is already
-     * in the registers, the save area of the nested state instead
-     * contains saved L1 state.
+     * All checks done, we can enter guest mode. L2 control fields will
+     * be the result of a combination of L1 and userspace indicated
+     * L12.control. The save area of L1 vmcb now contains the userspace
+     * indicated L1.save.
      */
-    copy_vmcb_control_area(&hsave->control, &svm->vmcb->control);
-    hsave->save = save;
 
-    svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa;
     load_nested_vmcb_control(svm, &ctl);
     nested_prepare_vmcb_control(svm);
 
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 619980a5d540..d7e46d9f666d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -971,8 +971,8 @@ static u64 svm_write_l1_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
     if (is_guest_mode(vcpu)) {
         /* Write L1's TSC offset. */
         g_tsc_offset = svm->vmcb->control.tsc_offset -
-                       svm->nested.hsave->control.tsc_offset;
-        svm->nested.hsave->control.tsc_offset = offset;
+                       svm->vmcb01->control.tsc_offset;
+        svm->vmcb01->control.tsc_offset = offset;
     }
 
     trace_kvm_write_tsc_offset(vcpu->vcpu_id,
@@ -1097,6 +1097,7 @@ static void init_vmcb(struct vcpu_svm *svm)
         clr_cr_intercept(svm, INTERCEPT_CR3_READ);
         clr_cr_intercept(svm, INTERCEPT_CR3_WRITE);
         save->g_pat = svm->vcpu.arch.pat;
+        svm->nested.vmcb02->save.g_pat = svm->vcpu.arch.pat;
         save->cr3 = 0;
         save->cr4 = 0;
     }
@@ -1172,9 +1173,9 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 static int svm_create_vcpu(struct kvm_vcpu *vcpu)
 {
     struct vcpu_svm *svm;
-    struct page *page;
+    struct page *vmcb01_page;
+    struct page *vmcb02_page;
     struct page *msrpm_pages;
-    struct page *hsave_page;
     struct page *nested_msrpm_pages;
     int err;
 
@@ -1182,8 +1183,8 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     svm = to_svm(vcpu);
 
     err = -ENOMEM;
-    page = alloc_page(GFP_KERNEL_ACCOUNT);
-    if (!page)
+    vmcb01_page = alloc_page(GFP_KERNEL_ACCOUNT);
+    if (!vmcb01_page)
         goto out;
 
     msrpm_pages = alloc_pages(GFP_KERNEL_ACCOUNT, MSRPM_ALLOC_ORDER);
@@ -1194,8 +1195,8 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     if (!nested_msrpm_pages)
         goto free_page2;
 
-    hsave_page = alloc_page(GFP_KERNEL_ACCOUNT);
-    if (!hsave_page)
+    vmcb02_page = alloc_page(GFP_KERNEL_ACCOUNT);
+    if (!vmcb02_page)
         goto free_page3;
 
     err = avic_init_vcpu(svm);
@@ -1208,8 +1209,9 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     if (irqchip_in_kernel(vcpu->kvm) && kvm_apicv_activated(vcpu->kvm))
         svm->avic_is_running = true;
 
-    svm->nested.hsave = page_address(hsave_page);
-    clear_page(svm->nested.hsave);
+    svm->nested.vmcb02 = page_address(vmcb02_page);
+    clear_page(svm->nested.vmcb02);
+    svm->nested.vmcb02_pa = __sme_set(page_to_pfn(vmcb02_page) << PAGE_SHIFT);
 
     svm->msrpm = page_address(msrpm_pages);
     svm_vcpu_init_msrpm(svm->msrpm);
@@ -1217,9 +1219,11 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     svm->nested.msrpm = page_address(nested_msrpm_pages);
     svm_vcpu_init_msrpm(svm->nested.msrpm);
 
-    svm->vmcb = page_address(page);
+    svm->vmcb = svm->vmcb01 = page_address(vmcb01_page);
     clear_page(svm->vmcb);
-    svm->vmcb_pa = __sme_set(page_to_pfn(page) << PAGE_SHIFT);
+    svm->vmcb_pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT);
+    svm->vmcb01_pa = svm->vmcb_pa;
+
     svm->asid_generation = 0;
     init_vmcb(svm);
 
@@ -1229,13 +1233,13 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu)
     return 0;
 
 free_page4:
-    __free_page(hsave_page);
+    __free_page(vmcb02_page);
 free_page3:
     __free_pages(nested_msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page2:
     __free_pages(msrpm_pages, MSRPM_ALLOC_ORDER);
 free_page1:
-    __free_page(page);
+    __free_page(vmcb01_page);
 out:
     return err;
 }
@@ -1257,11 +1261,11 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
      * svm_vcpu_load(). So, ensure that no logical CPU has this
      * vmcb page recorded as its current vmcb.
      */
-    svm_clear_current_vmcb(svm->vmcb);
-    __free_page(pfn_to_page(__sme_clr(svm->vmcb_pa) >> PAGE_SHIFT));
+    svm_clear_current_vmcb(svm->vmcb);
+    __free_page(pfn_to_page(__sme_clr(svm->vmcb01_pa) >> PAGE_SHIFT));
+    __free_page(pfn_to_page(__sme_clr(svm->nested.vmcb02_pa) >> PAGE_SHIFT));
     __free_pages(virt_to_page(svm->msrpm), MSRPM_ALLOC_ORDER);
-    __free_page(virt_to_page(svm->nested.hsave));
     __free_pages(virt_to_page(svm->nested.msrpm), MSRPM_ALLOC_ORDER);
 }
@@ -1394,7 +1398,7 @@ static void svm_clear_vintr(struct vcpu_svm *svm)
     /* Drop int_ctl fields related to VINTR injection. */
     svm->vmcb->control.int_ctl &= mask;
     if (is_guest_mode(&svm->vcpu)) {
-        svm->nested.hsave->control.int_ctl &= mask;
+        svm->vmcb01->control.int_ctl &= mask;
 
         WARN_ON((svm->vmcb->control.int_ctl & V_TPR_MASK) !=
                 (svm->nested.ctl.int_ctl & V_TPR_MASK));
@@ -3134,7 +3138,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
     if (is_guest_mode(vcpu)) {
         /* As long as interrupts are being delivered... */
         if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
-            ? !(svm->nested.hsave->save.rflags & X86_EFLAGS_IF)
+            ? !(svm->vmcb01->save.rflags & X86_EFLAGS_IF)
             : !(kvm_get_rflags(vcpu) & X86_EFLAGS_IF))
             return true;
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 862f0d2405e8..1a4af5bd10b3 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -82,7 +82,8 @@ struct kvm_svm {
 struct kvm_vcpu;
 
 struct svm_nested_state {
-    struct vmcb *hsave;
+    struct vmcb *vmcb02;
+    unsigned long vmcb02_pa;
     u64 hsave_msr;
     u64 vm_cr_msr;
     u64 vmcb;
@@ -103,6 +104,8 @@ struct vcpu_svm {
     struct kvm_vcpu vcpu;
     struct vmcb *vmcb;
     unsigned long vmcb_pa;
+    struct vmcb *vmcb01;
+    unsigned long vmcb01_pa;
     struct svm_cpu_data *svm_data;
     u32 asid;
     uint64_t asid_generation;
@@ -207,44 +210,28 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu)
     return container_of(vcpu, struct vcpu_svm, vcpu);
 }
 
-static inline struct vmcb *get_host_vmcb(struct vcpu_svm *svm)
-{
-    if (is_guest_mode(&svm->vcpu))
-        return svm->nested.hsave;
-    else
-        return svm->vmcb;
-}
-
 static inline void set_cr_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_cr |= (1U << bit);
+    svm->vmcb01->control.intercept_cr |= (1U << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline void clr_cr_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_cr &= ~(1U << bit);
+    svm->vmcb01->control.intercept_cr &= ~(1U << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline bool is_cr_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    return vmcb->control.intercept_cr & (1U << bit);
+    return svm->vmcb01->control.intercept_cr & (1U << bit);
 }
 
 static inline void set_dr_intercepts(struct vcpu_svm *svm)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_dr = (1 << INTERCEPT_DR0_READ)
+    svm->vmcb01->control.intercept_dr = (1 << INTERCEPT_DR0_READ)
         | (1 << INTERCEPT_DR1_READ)
         | (1 << INTERCEPT_DR2_READ)
         | (1 << INTERCEPT_DR3_READ)
@@ -266,45 +253,35 @@ static inline void set_dr_intercepts(struct vcpu_svm *svm)
 
 static inline void clr_dr_intercepts(struct vcpu_svm *svm)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_dr = 0;
+    svm->vmcb01->control.intercept_dr = 0;
 
     recalc_intercepts(svm);
 }
 
 static inline void set_exception_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_exceptions |= (1U << bit);
+    svm->vmcb01->control.intercept_exceptions |= (1U << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline void clr_exception_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept_exceptions &= ~(1U << bit);
+    svm->vmcb01->control.intercept_exceptions &= ~(1U << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept |= (1ULL << bit);
+    svm->vmcb01->control.intercept |= (1ULL << bit);
 
     recalc_intercepts(svm);
 }
 
 static inline void svm_clr_intercept(struct vcpu_svm *svm, int bit)
 {
-    struct vmcb *vmcb = get_host_vmcb(svm);
-
-    vmcb->control.intercept &= ~(1ULL << bit);
+    svm->vmcb01->control.intercept &= ~(1ULL << bit);
 
     recalc_intercepts(svm);
 }

-- 
2.20.1
* Re: [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest 2020-10-11 18:48 ` [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest Cathy Avery @ 2020-10-13 1:33 ` Sean Christopherson 2020-11-13 17:36 ` Paolo Bonzini 2020-11-13 17:58 ` Paolo Bonzini 1 sibling, 1 reply; 14+ messages in thread From: Sean Christopherson @ 2020-10-13 1:33 UTC (permalink / raw) To: Cathy Avery; +Cc: linux-kernel, kvm, pbonzini, vkuznets, wei.huang2, mlevitsk On Sun, Oct 11, 2020 at 02:48:18PM -0400, Cathy Avery wrote: > @@ -628,8 +620,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm) > nested_vmcb->control.pause_filter_thresh = > svm->vmcb->control.pause_filter_thresh; > > - /* Restore the original control entries */ > - copy_vmcb_control_area(&vmcb->control, &hsave->control); > + nested_svm_vmloadsave(svm->nested.vmcb02, svm->vmcb01); > + > + svm->vmcb = svm->vmcb01; > + svm->vmcb_pa = svm->vmcb01_pa; I very highly recommend adding a helper to switch VMCB. Odds are very good there will be more than just these two lines of boilerplate code for changing the active VMCB. > > /* On vmexit the GIF is set to false */ > svm_set_gif(svm, false); ... > @@ -1121,16 +1102,24 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu, > if (!(save.cr0 & X86_CR0_PG)) > return -EINVAL; > > + svm->nested.vmcb02->control = svm->vmcb01->control; > + svm->nested.vmcb02->save = svm->vmcb01->save; > + svm->vmcb01->save = save; > + > + WARN_ON(svm->vmcb == svm->nested.vmcb02); I'm pretty sure this is user triggerable. AFAIK, nothing prevents calling svm_set_nested_state() while L2 is active, e.g. VMX explicitly (and forcefully) kicks the vCPU out of L2 in vmx_set_nested_state(). > + > + svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa; > + > + svm->vmcb = svm->nested.vmcb02; > + svm->vmcb_pa = svm->nested.vmcb02_pa; > + > /* > - * All checks done, we can enter guest mode. L1 control fields > - * come from the nested save state. 
Guest state is already > - * in the registers, the save area of the nested state instead > - * contains saved L1 state. > + * All checks done, we can enter guest mode. L2 control fields will > + * be the result of a combination of L1 and userspace indicated > + * L12.control. The save area of L1 vmcb now contains the userspace > + * indicated L1.save. > */ > - copy_vmcb_control_area(&hsave->control, &svm->vmcb->control); > - hsave->save = save; > > - svm->nested.vmcb = kvm_state->hdr.svm.vmcb_pa; > load_nested_vmcb_control(svm, &ctl); > nested_prepare_vmcb_control(svm); >
* Re: [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest 2020-10-13 1:33 ` Sean Christopherson @ 2020-11-13 17:36 ` Paolo Bonzini 0 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2020-11-13 17:36 UTC (permalink / raw) To: Sean Christopherson, Cathy Avery Cc: linux-kernel, kvm, vkuznets, wei.huang2, mlevitsk On 13/10/20 03:33, Sean Christopherson wrote: >> + svm->vmcb = svm->vmcb01; >> + svm->vmcb_pa = svm->vmcb01_pa; > I very highly recommend adding a helper to switch VMCB. Odds are very good > there will be more than just these two lines of boilerplate code for changing > the active VMCB. Yes, probably we can make svm->vmcb01 and svm->vmcb02 something like VMX's struct loaded_vmcs: struct kvm_vmcb { void *vmcb; unsigned long pa; } I don't expect a lot more to happen due to SVM having no need for caching, so for now I think it's okay. I have other comments for which I'll reply to the patch itself. Paolo
* Re: [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest 2020-10-11 18:48 ` [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest Cathy Avery 2020-10-13 1:33 ` Sean Christopherson @ 2020-11-13 17:58 ` Paolo Bonzini 2020-11-13 20:38 ` Paolo Bonzini 1 sibling, 1 reply; 14+ messages in thread From: Paolo Bonzini @ 2020-11-13 17:58 UTC (permalink / raw) To: Cathy Avery, linux-kernel, kvm; +Cc: vkuznets, wei.huang2, mlevitsk On 11/10/20 20:48, Cathy Avery wrote: > @@ -432,6 +432,16 @@ int enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb_gpa, > int ret; > > svm->nested.vmcb = vmcb_gpa; > + > + WARN_ON(svm->vmcb == svm->nested.vmcb02); > + > + svm->nested.vmcb02->control = svm->vmcb01->control; This assignment of the control area should be in nested_prepare_vmcb_control, which is already filling in most of vmcb02->control. Right now, we save a copy_vmcb_control_area call in nested_svm_vmexit so it evens out. Later, it should be possible to remove most of the assignments from nested_prepare_vmcb_control. > + svm->nested.vmcb02->save.cr4 = svm->vmcb01->save.cr4; I cannot understand this statement. > + nested_svm_vmloadsave(svm->vmcb01, svm->nested.vmcb02); This is because the vmsave just after the vmexit has moved the vmloadsave registers from vmcb12 to vmcb01, but the next vmload will use vmcb02.
> + svm->vmcb = svm->nested.vmcb02; > + svm->vmcb_pa = svm->nested.vmcb02_pa; > load_nested_vmcb_control(svm, &nested_vmcb->control); > nested_prepare_vmcb_save(svm, nested_vmcb); > nested_prepare_vmcb_control(svm); > @@ -628,8 +620,10 @@ int nested_svm_vmexit(struct vcpu_svm *svm) > nested_vmcb->control.pause_filter_thresh = > svm->vmcb->control.pause_filter_thresh; > > - /* Restore the original control entries */ > - copy_vmcb_control_area(&vmcb->control, &hsave->control); > + nested_svm_vmloadsave(svm->nested.vmcb02, svm->vmcb01); Same here: the next vmentry's vmload will move the vmloadsave registers from vmcb01 to vmcb12, but for now they are in vmcb02. It's 16+16 memory-to-memory u64 copies. They mostly even out with the 14+14 copies that we don't have to do anymore on registers handled by VMRUN (es/cs/ss/ds/gdt/idt/rsp/rax---they don't have to be stashed away in hsave anymore). Also, we are able to reuse nested_svm_vmloadsave, which makes it overall a fair tradeoff... but it would have been worth a comment or two. :) > + svm->nested.vmcb02->control = svm->vmcb01->control; > + svm->nested.vmcb02->save = svm->vmcb01->save; > + svm->vmcb01->save = save; I would have moved these after the comment, matching the existing copy_vmcb_control_area and save-area assignment. Also, the first save-area assignment should be (because the WARN_ON below must be removed) svm->nested.vmcb02->save = svm->vmcb->save; or if (svm->vmcb == svm->vmcb01) svm->nested.vmcb02->save = svm->vmcb01->save; I have applied the patch and fixed the conflicts, so when I have some time I will play a bit more with it and/or pass the updated version back to you. In the meanwhile, perhaps you could write a new selftests testcase that tries to do KVM_SET_NESTED_STATE while in L2. It can be a simplified version of state-test that invokes KVM_GET_NESTED_STATE + KVM_SET_NESTED_STATE when it gets back to L0. Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest 2020-11-13 17:58 ` Paolo Bonzini @ 2020-11-13 20:38 ` Paolo Bonzini 0 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2020-11-13 20:38 UTC (permalink / raw) To: Cathy Avery, linux-kernel, kvm; +Cc: vkuznets, wei.huang2, mlevitsk On 13/11/20 18:58, Paolo Bonzini wrote: > >> + svm->nested.vmcb02->save.cr4 = svm->vmcb01->save.cr4; > > I cannot understand this statement. I wonder if it has something to do with unsigned long old_cr4 = to_svm(vcpu)->vmcb->save.cr4; whereas vmx.c has unsigned long old_cr4 = vcpu->arch.cr4; without this assignment, the old_cr4 would be taken from the last value stored in the vmcb02, instead of the current value for the vCPU. In general uses of svm->vmcb01 (in svm.c especially) need to be audited carefully. Paolo
* Re: [PATCH v2 0/2] KVM: SVM: Create separate vmcbs for L1 and L2 2020-10-11 18:48 [PATCH v2 0/2] KVM: SVM: Create separate vmcbs for L1 and L2 Cathy Avery 2020-10-11 18:48 ` [PATCH v2 1/2] KVM: SVM: Move asid to vcpu_svm Cathy Avery 2020-10-11 18:48 ` [PATCH v2 2/2] KVM: SVM: Use a separate vmcb for the nested L2 guest Cathy Avery @ 2020-11-17 11:03 ` Paolo Bonzini 2 siblings, 0 replies; 14+ messages in thread From: Paolo Bonzini @ 2020-11-17 11:03 UTC (permalink / raw) To: Cathy Avery, linux-kernel, kvm Cc: vkuznets, wei.huang2, mlevitsk, Sean Christopherson Hi Cathy, I found an issue with the second patch: the svm->asid_generation and svm->vcpu.cpu fields must become per-VMCB. Once again we are rediscovering what VMX already does (for different reasons) with its struct loaded_vmcs. The good thing is that it can be worked around easily: for the ASID generation, it can simply be cleared after changing svm->vmcb. For the CPU field, it's not an issue yet because the VMCB is marked all-dirty after each switch. With these workarounds, everything works nicely. However, removing the calls to vmcb_mark_all_dirty is the main optimization that is enabled by the VMCB01/VMCB02 split, so it should be fixed too. And also, the code duplication every time svm->vmcb is assigned starts to be ugly, proving Sean to be right. :) My suggestion is that you do something like this: 1) create a struct for all per-VMCB data: struct kvm_vmcb_info { void *ptr; u64 pa; } and use it in structs vcpu_svm and svm_nested_state: struct vcpu_svm { ... struct kvm_vmcb_info vmcb01; struct kvm_vmcb_info *current_vmcb; void *vmcb; u64 vmcb_pa; ... } struct svm_nested_state { struct kvm_vmcb_info vmcb02; ...
} The struct can be passed to a vmcb_switch function like this one: void vmcb_switch(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb) { svm->current_vmcb = target_vmcb; svm->vmcb = target_vmcb->ptr; svm->vmcb_pa = target_vmcb->pa; /* * Workaround: we don't yet track the ASID generation * that was active the last time target_vmcb was run. */ svm->asid_generation = 0; /* * Workaround: we don't yet track the physical CPU that * target_vmcb has run on. */ vmcb_mark_all_dirty(svm->vmcb); } You can use this function every time the current code is assigning to svm->vmcb. Once the struct and function is in place, we can proceed to removing the last two (inefficient) lines of vmcb_switch by augmenting struct kvm_vmcb_info. 2) First, add an "int cpu" field. Move the vcpu->cpu check from svm_vcpu_load to pre_svm_run, using svm->current_vmcb->cpu instead of vcpu->cpu, and you will be able to remove the vmcb_mark_all_dirty call from vmcb_switch. 3) Then do the same with asid_generation. All uses of svm->asid_generation become uses of svm->current_vmcb->asid_generation, and you can remove the clearing of svm->asid_generation. These can be three separate patches on top of the changes you have sent (or rather the rebased version, see below). Writing good commit messages for them will be a good exercise too. :) I have pushed the current nested SVM queue to kvm.git on a "nested-svm" branch. You can discard the top commits and work on top of commit a33b86f151a0 from that branch ("KVM: SVM: Use a separate vmcb for the nested L2 guest", 2020-11-17). Once this is done, we can start reaping the advantages of the VMCB01/VMCB02 split. Some patches for that are already in the nested-svm branch, but there's more fun to be had. First of all, Maxim's ill-fated attempt at using VMCB12 clean bits will now work. 
Second, we can try doing VMLOAD/VMSAVE always from VMCB01 (while VMRUN switches between VMCB01 and VMCB02) and therefore remove the nested_svm_vmloadsave calls from nested_vmrun and nested_vmexit. But, one thing at a time. Thanks, Paolo