linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] SVM fixes + apic fix
@ 2022-03-01 13:55 Maxim Levitsky
  2022-03-01 13:55 ` [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW Maxim Levitsky
                   ` (3 more replies)
  0 siblings, 4 replies; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 13:55 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Sean Christopherson,
	Thomas Gleixner, Dave Hansen, Wanpeng Li, Borislav Petkov, x86,
	Maxim Levitsky

Those are few bug fixes which are in my patch queue,
rebased against current kvm/queue.

Best regards,
	Maxim Levitsky

Maxim Levitsky (4):
  KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW
  KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl
  KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  KVM: x86: lapic: don't allow to set non default apic id when not using
    x2apic api

 arch/x86/kvm/lapic.c    | 17 ++++++++---------
 arch/x86/kvm/svm/avic.c |  6 +++++-
 arch/x86/kvm/svm/svm.c  |  2 +-
 3 files changed, 14 insertions(+), 11 deletions(-)

-- 
2.26.3



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW
  2022-03-01 13:55 [PATCH 0/4] SVM fixes + apic fix Maxim Levitsky
@ 2022-03-01 13:55 ` Maxim Levitsky
  2022-03-01 16:31   ` Sean Christopherson
  2022-03-01 13:55 ` [PATCH 2/4] KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl Maxim Levitsky
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 13:55 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Sean Christopherson,
	Thomas Gleixner, Dave Hansen, Wanpeng Li, Borislav Petkov, x86,
	Maxim Levitsky

Use a dummy unused vmexit reason to mark the 'VM exit' that is happening
when kvm exits to handle SMM, which is not a real VM exit.

This makes it a bit easier to read the KVM trace, and avoids
other potential problems.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7038c76fa8410..c08fd7f4f3414 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4218,7 +4218,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
 	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
 	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
 
-	ret = nested_svm_vmexit(svm);
+	ret = nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
 	if (ret)
 		return ret;
 
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/4] KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl
  2022-03-01 13:55 [PATCH 0/4] SVM fixes + apic fix Maxim Levitsky
  2022-03-01 13:55 ` [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW Maxim Levitsky
@ 2022-03-01 13:55 ` Maxim Levitsky
  2022-03-01 17:15   ` Sean Christopherson
  2022-03-01 13:55 ` [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb Maxim Levitsky
  2022-03-01 13:55 ` [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api Maxim Levitsky
  3 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 13:55 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Sean Christopherson,
	Thomas Gleixner, Dave Hansen, Wanpeng Li, Borislav Petkov, x86,
	Maxim Levitsky

avic_refresh_apicv_exec_ctrl is called from vcpu_enter_guest,
without preemption disabled, however avic_vcpu_load, and
avic_vcpu_put expect preemption to be disabled.

This issue was found by lockdep.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index aea0b13773fd3..e23159f3a62ba 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -640,12 +640,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 	}
 	vmcb_mark_dirty(vmcb, VMCB_AVIC);
 
+	preempt_disable();
+
 	if (activated)
 		avic_vcpu_load(vcpu, vcpu->cpu);
 	else
 		avic_vcpu_put(vcpu);
 
 	avic_set_pi_irte_mode(vcpu, activated);
+
+	preempt_enable();
 }
 
 static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  2022-03-01 13:55 [PATCH 0/4] SVM fixes + apic fix Maxim Levitsky
  2022-03-01 13:55 ` [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW Maxim Levitsky
  2022-03-01 13:55 ` [PATCH 2/4] KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl Maxim Levitsky
@ 2022-03-01 13:55 ` Maxim Levitsky
  2022-03-01 16:21   ` Sean Christopherson
  2022-03-01 13:55 ` [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api Maxim Levitsky
  3 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 13:55 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Sean Christopherson,
	Thomas Gleixner, Dave Hansen, Wanpeng Li, Borislav Petkov, x86,
	Maxim Levitsky

Out of precation use vmcb01 when enabling host AVIC.
No functional change intended.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/svm/avic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index e23159f3a62ba..9656e192c646b 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -167,7 +167,7 @@ int avic_vm_init(struct kvm *kvm)
 
 void avic_init_vmcb(struct vcpu_svm *svm)
 {
-	struct vmcb *vmcb = svm->vmcb;
+	struct vmcb *vmcb = svm->vmcb01.ptr;
 	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
 	phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
 	phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic_logical_id_table_page));
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-01 13:55 [PATCH 0/4] SVM fixes + apic fix Maxim Levitsky
                   ` (2 preceding siblings ...)
  2022-03-01 13:55 ` [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb Maxim Levitsky
@ 2022-03-01 13:55 ` Maxim Levitsky
  2022-03-01 16:56   ` Sean Christopherson
  3 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 13:55 UTC (permalink / raw)
  To: kvm
  Cc: Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Sean Christopherson,
	Thomas Gleixner, Dave Hansen, Wanpeng Li, Borislav Petkov, x86,
	Maxim Levitsky

Fix a loop hole in setting the apic state that didn't check if
apic id == vcpu_id when x2apic is enabled but userspace is using
a older variant of the ioctl which didn't had 32 bit apic ids.

Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
 arch/x86/kvm/lapic.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 80a2020c4db40..8d35f56c64020 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
 		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
 		u64 icr;
 
-		if (vcpu->kvm->arch.x2apic_format) {
-			if (*id != vcpu->vcpu_id)
-				return -EINVAL;
-		} else {
-			if (set)
-				*id >>= 24;
-			else
-				*id <<= 24;
-		}
+		if (!vcpu->kvm->arch.x2apic_format && set)
+			*id >>= 24;
+
+		if (*id != vcpu->vcpu_id)
+			return -EINVAL;
+
+		if (!vcpu->kvm->arch.x2apic_format && !set)
+			*id <<= 24;
 
 		/*
 		 * In x2APIC mode, the LDR is fixed and based on the id.  And
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  2022-03-01 13:55 ` [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb Maxim Levitsky
@ 2022-03-01 16:21   ` Sean Christopherson
  2022-03-01 17:25     ` Maxim Levitsky
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-03-01 16:21 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

Just "KVM: SVM:" for the shortlog, please.

On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> Out of precation use vmcb01 when enabling host AVIC.
> No functional change intended.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/avic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index e23159f3a62ba..9656e192c646b 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -167,7 +167,7 @@ int avic_vm_init(struct kvm *kvm)
>  
>  void avic_init_vmcb(struct vcpu_svm *svm)
>  {
> -	struct vmcb *vmcb = svm->vmcb;
> +	struct vmcb *vmcb = svm->vmcb01.ptr;

I don't like this change.  It's not bad code, but it'll be confusing because it
implies that it's legal for svm->vmcb to be something other than svm->vmcb01.ptr
when this is called.

If we want to guard AVIC, I'd much rather we do something like:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7038c76fa841..dcc856bd628d 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -992,8 +992,12 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
 static void init_vmcb(struct kvm_vcpu *vcpu)
 {
        struct vcpu_svm *svm = to_svm(vcpu);
-       struct vmcb_control_area *control = &svm->vmcb->control;
-       struct vmcb_save_area *save = &svm->vmcb->save;
+       struct vmcb *vmcb = svm->vmcb01.ptr;
+       struct vmcb_control_area *control = &vmcb->control;
+       struct vmcb_save_area *save = &vmcb->save;
+
+       if (WARN_ON_ONCE(vmcb != svm->vmcb))
+               svm_leave_nested(vcpu);

        svm_set_intercept(svm, INTERCEPT_CR0_READ);
        svm_set_intercept(svm, INTERCEPT_CR3_READ);


On a related topic, init_vmcb_after_set_cpuid() is broken for nested, it needs to
play nice with being called when svm->vmcb == &svm->nested.vmcb02, e.g. update
vmcb01 and re-merge (or just recalc?) vmcb02's intercepts.

>  	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
>  	phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
>  	phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic_logical_id_table_page));
> -- 
> 2.26.3
> 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW
  2022-03-01 13:55 ` [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW Maxim Levitsky
@ 2022-03-01 16:31   ` Sean Christopherson
  2022-03-01 17:13     ` Maxim Levitsky
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-03-01 16:31 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> Use a dummy unused vmexit reason to mark the 'VM exit' that is happening
> when kvm exits to handle SMM, which is not a real VM exit.

Why not use "62h VMEXIT_SMI"?

> This makes it a bit easier to read the KVM trace, and avoids
> other potential problems.

What other potential problems?

> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7038c76fa8410..c08fd7f4f3414 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4218,7 +4218,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
>  	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
>  	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
>  
> -	ret = nested_svm_vmexit(svm);
> +	ret = nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
>  	if (ret)
>  		return ret;
>  
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-01 13:55 ` [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api Maxim Levitsky
@ 2022-03-01 16:56   ` Sean Christopherson
  2022-03-01 17:09     ` Maxim Levitsky
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-03-01 16:56 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

Please, please post standalone patches/fixes as standalone patches/fixes.  And in
general, keep series to one topic.  There is very real value in following the
established and documented process, e.g. avoids creating artificial dependencies
where a changes works only because of an "unrelated" patch earlier in the series.
And for us reviewers, it helps tremendously as it means I can scan just the cover
letter for a series to prioritize review accordingly.  Bundling things together
means I have to scan through every patch to triage the series..

On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> Fix a loop hole in setting the apic state that didn't check if

Heh, "loophole", took me a second to figure out there was no literal loop. :-)

> apic id == vcpu_id when x2apic is enabled but userspace is using
> a older variant of the ioctl which didn't had 32 bit apic ids.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/lapic.c | 17 ++++++++---------
>  1 file changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 80a2020c4db40..8d35f56c64020 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
>  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
>  		u64 icr;
>  
> -		if (vcpu->kvm->arch.x2apic_format) {
> -			if (*id != vcpu->vcpu_id)
> -				return -EINVAL;
> -		} else {
> -			if (set)
> -				*id >>= 24;
> -			else
> -				*id <<= 24;
> -		}
> +		if (!vcpu->kvm->arch.x2apic_format && set)
> +			*id >>= 24;
> +
> +		if (*id != vcpu->vcpu_id)
> +			return -EINVAL;

This breaks backwards compability, userspace will start failing where it previously
succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
obviously do weird things, but this code is 5.5 years old, I don't think it's worth
trying to close a loophole that doesn't harm KVM.

If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.

> +
> +		if (!vcpu->kvm->arch.x2apic_format && !set)
> +			*id <<= 24;
>  
>  		/*
>  		 * In x2APIC mode, the LDR is fixed and based on the id.  And
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-01 16:56   ` Sean Christopherson
@ 2022-03-01 17:09     ` Maxim Levitsky
  2022-03-01 17:46       ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 17:09 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, 2022-03-01 at 16:56 +0000, Sean Christopherson wrote:
> Please, please post standalone patches/fixes as standalone patches/fixes.  And in
> general, keep series to one topic.  There is very real value in following the
> established and documented process, e.g. avoids creating artificial dependencies
> where a changes works only because of an "unrelated" patch earlier in the series.
> And for us reviewers, it helps tremendously as it means I can scan just the cover
> letter for a series to prioritize review accordingly.  Bundling things together
> means I have to scan through every patch to triage the series..

I know, this series is just set of small fixes - this patch belongs to the series
of my nested avic, but as was requested, I posted it seperately as part of
'fixes only' series, since it is stanadlone. All patches in this series
are standalone.

> 
> On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > Fix a loop hole in setting the apic state that didn't check if
> 
> Heh, "loophole", took me a second to figure out there was no literal loop. :-)
> 
> > apic id == vcpu_id when x2apic is enabled but userspace is using
> > a older variant of the ioctl which didn't had 32 bit apic ids.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/lapic.c | 17 ++++++++---------
> >  1 file changed, 8 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 80a2020c4db40..8d35f56c64020 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> >  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> >  		u64 icr;
> >  
> > -		if (vcpu->kvm->arch.x2apic_format) {
> > -			if (*id != vcpu->vcpu_id)
> > -				return -EINVAL;
> > -		} else {
> > -			if (set)
> > -				*id >>= 24;
> > -			else
> > -				*id <<= 24;
> > -		}
> > +		if (!vcpu->kvm->arch.x2apic_format && set)
> > +			*id >>= 24;
> > +
> > +		if (*id != vcpu->vcpu_id)
> > +			return -EINVAL;
> 
> This breaks backwards compability, userspace will start failing where it previously
> succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
> a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
> obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> trying to close a loophole that doesn't harm KVM.
> 
> If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.

This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
apic_id == vcpu_id.

When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
mode, meaning that apic id can't be more  that 255 even in x2apic mode.
That is why new API 'x2apic_format' was added in first place.

Thus I don't see how this is breaking userspace.

Best regards,
	Maxim Levitsky



> 
> > +
> > +		if (!vcpu->kvm->arch.x2apic_format && !set)
> > +			*id <<= 24;
> >  
> >  		/*
> >  		 * In x2APIC mode, the LDR is fixed and based on the id.  And
> > -- 
> > 2.26.3
> > 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW
  2022-03-01 16:31   ` Sean Christopherson
@ 2022-03-01 17:13     ` Maxim Levitsky
  2022-03-09 15:46       ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 17:13 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, 2022-03-01 at 16:31 +0000, Sean Christopherson wrote:
> On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > Use a dummy unused vmexit reason to mark the 'VM exit' that is happening
> > when kvm exits to handle SMM, which is not a real VM exit.
> 
> Why not use "62h VMEXIT_SMI"?

Because VMEXIT_SMI is real vmexit which happens when L1 intercepts #SMI
And here nested_svm_vmexit is abused to just exit guest mode without vmexit.

> 
> > This makes it a bit easier to read the KVM trace, and avoids
> > other potential problems.
> 
> What other potential problems?

The fact that we have a stale VM exit reason in vmcb without this
patch which can be in theory consumed somewhere down the road.

This stale vm exit reason also appears in the tracs which is
very misleading.

Best regards,
	Maxim Levitsky

> 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/svm.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > index 7038c76fa8410..c08fd7f4f3414 100644
> > --- a/arch/x86/kvm/svm/svm.c
> > +++ b/arch/x86/kvm/svm/svm.c
> > @@ -4218,7 +4218,7 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate)
> >  	svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP];
> >  	svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP];
> >  
> > -	ret = nested_svm_vmexit(svm);
> > +	ret = nested_svm_simple_vmexit(svm, SVM_EXIT_SW);
> >  	if (ret)
> >  		return ret;
> >  
> > -- 
> > 2.26.3
> > 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl
  2022-03-01 13:55 ` [PATCH 2/4] KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl Maxim Levitsky
@ 2022-03-01 17:15   ` Sean Christopherson
  2022-03-01 17:20     ` Maxim Levitsky
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-03-01 17:15 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> avic_refresh_apicv_exec_ctrl is called from vcpu_enter_guest,
> without preemption disabled, however avic_vcpu_load, and
> avic_vcpu_put expect preemption to be disabled.
> 
> This issue was found by lockdep.
> 
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
>  arch/x86/kvm/svm/avic.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index aea0b13773fd3..e23159f3a62ba 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -640,12 +640,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
>  	}
>  	vmcb_mark_dirty(vmcb, VMCB_AVIC);
>  
> +	preempt_disable();
> +
>  	if (activated)
>  		avic_vcpu_load(vcpu, vcpu->cpu);
>  	else
>  		avic_vcpu_put(vcpu);
>  
>  	avic_set_pi_irte_mode(vcpu, activated);
> +
> +	preempt_enable();

Assuming avic_set_pi_irte_mode() doesn't need to be protected, I'd prefer the
below patch.  This is the second time we done messed this up.

From: Sean Christopherson <seanjc@google.com>
Date: Tue, 1 Mar 2022 09:05:09 -0800
Subject: [PATCH] KVM: SVM: Disable preemption across AVIC load/put during
 APICv refresh

Disable preemption when loading/putting the AVIC during an APICv refresh.
If the vCPU task is preempted and migrated ot a different pCPU, the
unprotected avic_vcpu_load() could set the wrong pCPU in the physical ID
cache/table.

Pull the necessary code out of avic_vcpu_{,un}blocking() and into a new
helper to reduce the probability of introducing this exact bug a third
time.

Fixes: df7e4827c549 ("KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC")
Cc: stable@vger.kernel.org
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/svm/avic.c | 103 ++++++++++++++++++++++------------------
 arch/x86/kvm/svm/svm.c  |   4 +-
 arch/x86/kvm/svm/svm.h  |   4 +-
 3 files changed, 60 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index aea0b13773fd..1afde44b1252 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -616,38 +616,6 @@ static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
 	return ret;
 }
 
-void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
-{
-	struct vcpu_svm *svm = to_svm(vcpu);
-	struct vmcb *vmcb = svm->vmcb01.ptr;
-	bool activated = kvm_vcpu_apicv_active(vcpu);
-
-	if (!enable_apicv)
-		return;
-
-	if (activated) {
-		/**
-		 * During AVIC temporary deactivation, guest could update
-		 * APIC ID, DFR and LDR registers, which would not be trapped
-		 * by avic_unaccelerated_access_interception(). In this case,
-		 * we need to check and update the AVIC logical APIC ID table
-		 * accordingly before re-activating.
-		 */
-		avic_apicv_post_state_restore(vcpu);
-		vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
-	} else {
-		vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
-	}
-	vmcb_mark_dirty(vmcb, VMCB_AVIC);
-
-	if (activated)
-		avic_vcpu_load(vcpu, vcpu->cpu);
-	else
-		avic_vcpu_put(vcpu);
-
-	avic_set_pi_irte_mode(vcpu, activated);
-}
-
 static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
 {
 	unsigned long flags;
@@ -899,7 +867,7 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
 	return ret;
 }
 
-void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
+void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
 	u64 entry;
 	/* ID = 0xff (broadcast), ID > 0xff (reserved) */
@@ -936,7 +904,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
 }
 
-void avic_vcpu_put(struct kvm_vcpu *vcpu)
+void __avic_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	u64 entry;
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -955,13 +923,63 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
 	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
 }
 
-void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
+static void avic_vcpu_load(struct kvm_vcpu *vcpu)
 {
-	if (!kvm_vcpu_apicv_active(vcpu))
-		return;
+	int cpu = get_cpu();
 
+	WARN_ON(cpu != vcpu->cpu);
+
+	__avic_vcpu_load(vcpu, cpu);
+
+	put_cpu();
+}
+
+static void avic_vcpu_put(struct kvm_vcpu *vcpu)
+{
 	preempt_disable();
 
+	__avic_vcpu_put(vcpu);
+
+	preempt_enable();
+}
+
+void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+	struct vmcb *vmcb = svm->vmcb01.ptr;
+	bool activated = kvm_vcpu_apicv_active(vcpu);
+
+	if (!enable_apicv)
+		return;
+
+	if (activated) {
+		/**
+		 * During AVIC temporary deactivation, guest could update
+		 * APIC ID, DFR and LDR registers, which would not be trapped
+		 * by avic_unaccelerated_access_interception(). In this case,
+		 * we need to check and update the AVIC logical APIC ID table
+		 * accordingly before re-activating.
+		 */
+		avic_apicv_post_state_restore(vcpu);
+		vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
+	} else {
+		vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
+	}
+	vmcb_mark_dirty(vmcb, VMCB_AVIC);
+
+	if (activated)
+		avic_vcpu_load(vcpu);
+	else
+		avic_vcpu_put(vcpu);
+
+	avic_set_pi_irte_mode(vcpu, activated);
+}
+
+void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return;
+
        /*
         * Unload the AVIC when the vCPU is about to block, _before_
         * the vCPU actually blocks.
@@ -976,21 +994,12 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
         * the cause of errata #1235).
         */
 	avic_vcpu_put(vcpu);
-
-	preempt_enable();
 }
 
 void avic_vcpu_unblocking(struct kvm_vcpu *vcpu)
 {
-	int cpu;
-
 	if (!kvm_vcpu_apicv_active(vcpu))
 		return;
 
-	cpu = get_cpu();
-	WARN_ON(cpu != vcpu->cpu);
-
-	avic_vcpu_load(vcpu, cpu);
-
-	put_cpu();
+	avic_vcpu_load(vcpu);
 }
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7038c76fa841..c5e3f219803e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1318,13 +1318,13 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 		indirect_branch_prediction_barrier();
 	}
 	if (kvm_vcpu_apicv_active(vcpu))
-		avic_vcpu_load(vcpu, cpu);
+		__avic_vcpu_load(vcpu, cpu);
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 {
 	if (kvm_vcpu_apicv_active(vcpu))
-		avic_vcpu_put(vcpu);
+		__avic_vcpu_put(vcpu);
 
 	svm_prepare_host_switch(vcpu);
 
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 70850cbe5bcb..e45b5645d5e0 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -576,8 +576,8 @@ void avic_init_vmcb(struct vcpu_svm *svm);
 int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu);
 int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu);
 int avic_init_vcpu(struct vcpu_svm *svm);
-void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
-void avic_vcpu_put(struct kvm_vcpu *vcpu);
+void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
+void __avic_vcpu_put(struct kvm_vcpu *vcpu);
 void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu);
 void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
 void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);

base-commit: 44af02b939d6a6a166c9cd2d86d4c2a6959f0875
-- 


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl
  2022-03-01 17:15   ` Sean Christopherson
@ 2022-03-01 17:20     ` Maxim Levitsky
  2022-03-01 17:23       ` Paolo Bonzini
  0 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 17:20 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, 2022-03-01 at 17:15 +0000, Sean Christopherson wrote:
> On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > avic_refresh_apicv_exec_ctrl is called from vcpu_enter_guest,
> > without preemption disabled, however avic_vcpu_load, and
> > avic_vcpu_put expect preemption to be disabled.
> > 
> > This issue was found by lockdep.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index aea0b13773fd3..e23159f3a62ba 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -640,12 +640,16 @@ void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> >  	}
> >  	vmcb_mark_dirty(vmcb, VMCB_AVIC);
> >  
> > +	preempt_disable();
> > +
> >  	if (activated)
> >  		avic_vcpu_load(vcpu, vcpu->cpu);
> >  	else
> >  		avic_vcpu_put(vcpu);
> >  
> >  	avic_set_pi_irte_mode(vcpu, activated);
> > +
> > +	preempt_enable();
> 
> Assuming avic_set_pi_irte_mode() doesn't need to be protected, I'd prefer the
> below patch.  This is the second time we done messed this up.
> 
> From: Sean Christopherson <seanjc@google.com>
> Date: Tue, 1 Mar 2022 09:05:09 -0800
> Subject: [PATCH] KVM: SVM: Disable preemption across AVIC load/put during
>  APICv refresh
> 
> Disable preemption when loading/putting the AVIC during an APICv refresh.
> If the vCPU task is preempted and migrated ot a different pCPU, the
> unprotected avic_vcpu_load() could set the wrong pCPU in the physical ID
> cache/table.
> 
> Pull the necessary code out of avic_vcpu_{,un}blocking() and into a new
> helper to reduce the probability of introducing this exact bug a third
> time.
> 
> Fixes: df7e4827c549 ("KVM: SVM: call avic_vcpu_load/avic_vcpu_put when enabling/disabling AVIC")
> Cc: stable@vger.kernel.org
> Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/svm/avic.c | 103 ++++++++++++++++++++++------------------
>  arch/x86/kvm/svm/svm.c  |   4 +-
>  arch/x86/kvm/svm/svm.h  |   4 +-
>  3 files changed, 60 insertions(+), 51 deletions(-)
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index aea0b13773fd..1afde44b1252 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -616,38 +616,6 @@ static int avic_set_pi_irte_mode(struct kvm_vcpu *vcpu, bool activate)
>  	return ret;
>  }
>  
> -void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> -{
> -	struct vcpu_svm *svm = to_svm(vcpu);
> -	struct vmcb *vmcb = svm->vmcb01.ptr;
> -	bool activated = kvm_vcpu_apicv_active(vcpu);
> -
> -	if (!enable_apicv)
> -		return;
> -
> -	if (activated) {
> -		/**
> -		 * During AVIC temporary deactivation, guest could update
> -		 * APIC ID, DFR and LDR registers, which would not be trapped
> -		 * by avic_unaccelerated_access_interception(). In this case,
> -		 * we need to check and update the AVIC logical APIC ID table
> -		 * accordingly before re-activating.
> -		 */
> -		avic_apicv_post_state_restore(vcpu);
> -		vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
> -	} else {
> -		vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
> -	}
> -	vmcb_mark_dirty(vmcb, VMCB_AVIC);
> -
> -	if (activated)
> -		avic_vcpu_load(vcpu, vcpu->cpu);
> -	else
> -		avic_vcpu_put(vcpu);
> -
> -	avic_set_pi_irte_mode(vcpu, activated);
> -}
> -
>  static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi)
>  {
>  	unsigned long flags;
> @@ -899,7 +867,7 @@ avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r)
>  	return ret;
>  }
>  
> -void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> +void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  {
>  	u64 entry;
>  	/* ID = 0xff (broadcast), ID > 0xff (reserved) */
> @@ -936,7 +904,7 @@ void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  	avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, true);
>  }
>  
> -void avic_vcpu_put(struct kvm_vcpu *vcpu)
> +void __avic_vcpu_put(struct kvm_vcpu *vcpu)
>  {
>  	u64 entry;
>  	struct vcpu_svm *svm = to_svm(vcpu);
> @@ -955,13 +923,63 @@ void avic_vcpu_put(struct kvm_vcpu *vcpu)
>  	WRITE_ONCE(*(svm->avic_physical_id_cache), entry);
>  }
>  
> -void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
> +static void avic_vcpu_load(struct kvm_vcpu *vcpu)
>  {
> -	if (!kvm_vcpu_apicv_active(vcpu))
> -		return;
> +	int cpu = get_cpu();
>  
> +	WARN_ON(cpu != vcpu->cpu);
> +
> +	__avic_vcpu_load(vcpu, cpu);
> +
> +	put_cpu();
> +}
> +
> +static void avic_vcpu_put(struct kvm_vcpu *vcpu)
> +{
>  	preempt_disable();
>  
> +	__avic_vcpu_put(vcpu);
> +
> +	preempt_enable();
> +}
> +
> +void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
> +{
> +	struct vcpu_svm *svm = to_svm(vcpu);
> +	struct vmcb *vmcb = svm->vmcb01.ptr;
> +	bool activated = kvm_vcpu_apicv_active(vcpu);
> +
> +	if (!enable_apicv)
> +		return;
> +
> +	if (activated) {
> +		/**
> +		 * During AVIC temporary deactivation, guest could update
> +		 * APIC ID, DFR and LDR registers, which would not be trapped
> +		 * by avic_unaccelerated_access_interception(). In this case,
> +		 * we need to check and update the AVIC logical APIC ID table
> +		 * accordingly before re-activating.
> +		 */
> +		avic_apicv_post_state_restore(vcpu);
> +		vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
> +	} else {
> +		vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
> +	}
> +	vmcb_mark_dirty(vmcb, VMCB_AVIC);
> +
> +	if (activated)
> +		avic_vcpu_load(vcpu);
> +	else
> +		avic_vcpu_put(vcpu);
> +
> +	avic_set_pi_irte_mode(vcpu, activated);
> +}
> +
> +void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
> +{
> +	if (!kvm_vcpu_apicv_active(vcpu))
> +		return;
> +
>         /*
>          * Unload the AVIC when the vCPU is about to block, _before_
>          * the vCPU actually blocks.
> @@ -976,21 +994,12 @@ void avic_vcpu_blocking(struct kvm_vcpu *vcpu)
>          * the cause of errata #1235).
>          */
>  	avic_vcpu_put(vcpu);
> -
> -	preempt_enable();
>  }
>  
>  void avic_vcpu_unblocking(struct kvm_vcpu *vcpu)
>  {
> -	int cpu;
> -
>  	if (!kvm_vcpu_apicv_active(vcpu))
>  		return;
>  
> -	cpu = get_cpu();
> -	WARN_ON(cpu != vcpu->cpu);
> -
> -	avic_vcpu_load(vcpu, cpu);
> -
> -	put_cpu();
> +	avic_vcpu_load(vcpu);
>  }
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7038c76fa841..c5e3f219803e 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1318,13 +1318,13 @@ static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>  		indirect_branch_prediction_barrier();
>  	}
>  	if (kvm_vcpu_apicv_active(vcpu))
> -		avic_vcpu_load(vcpu, cpu);
> +		__avic_vcpu_load(vcpu, cpu);
>  }
>  
>  static void svm_vcpu_put(struct kvm_vcpu *vcpu)
>  {
>  	if (kvm_vcpu_apicv_active(vcpu))
> -		avic_vcpu_put(vcpu);
> +		__avic_vcpu_put(vcpu);
>  
>  	svm_prepare_host_switch(vcpu);
>  
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 70850cbe5bcb..e45b5645d5e0 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -576,8 +576,8 @@ void avic_init_vmcb(struct vcpu_svm *svm);
>  int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu);
>  int avic_unaccelerated_access_interception(struct kvm_vcpu *vcpu);
>  int avic_init_vcpu(struct vcpu_svm *svm);
> -void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
> -void avic_vcpu_put(struct kvm_vcpu *vcpu);
> +void __avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu);
> +void __avic_vcpu_put(struct kvm_vcpu *vcpu);
>  void avic_apicv_post_state_restore(struct kvm_vcpu *vcpu);
>  void avic_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
>  void avic_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu);
> 
> base-commit: 44af02b939d6a6a166c9cd2d86d4c2a6959f0875


I don't see that this patch is much different that what I proposed,
especially since disable of preemption can be nested.

But I won't be arguing with you about this.

Best regards,
	Maxim levitsky



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/4] KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl
  2022-03-01 17:20     ` Maxim Levitsky
@ 2022-03-01 17:23       ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2022-03-01 17:23 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Wanpeng Li,
	Borislav Petkov, x86

On 3/1/22 18:20, Maxim Levitsky wrote:
> I don't see that this patch is much different that what I proposed,
> especially since disable of preemption can be nested.

The difference is that it's done in avic_vcpu_{load,put} instead of the 
caller.  I like that your patch is smaller, but I like that Sean's patch 
solves it for good.

I queued his, but maybe I could apply yours to 5.17 and his to 5.18.  I 
might do this if I have to send more stuff to Linus before the release.

Paolo


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  2022-03-01 16:21   ` Sean Christopherson
@ 2022-03-01 17:25     ` Maxim Levitsky
  2022-03-01 17:35       ` Sean Christopherson
  2022-03-09 15:48       ` Paolo Bonzini
  0 siblings, 2 replies; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 17:25 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, 2022-03-01 at 16:21 +0000, Sean Christopherson wrote:
> Just "KVM: SVM:" for the shortlog, please.
> 
> On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > Out of precation use vmcb01 when enabling host AVIC.
> > No functional change intended.
> > 
> > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > ---
> >  arch/x86/kvm/svm/avic.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > index e23159f3a62ba..9656e192c646b 100644
> > --- a/arch/x86/kvm/svm/avic.c
> > +++ b/arch/x86/kvm/svm/avic.c
> > @@ -167,7 +167,7 @@ int avic_vm_init(struct kvm *kvm)
> >  
> >  void avic_init_vmcb(struct vcpu_svm *svm)
> >  {
> > -	struct vmcb *vmcb = svm->vmcb;
> > +	struct vmcb *vmcb = svm->vmcb01.ptr;
> 
> I don't like this change.  It's not bad code, but it'll be confusing because it
> implies that it's legal for svm->vmcb to be something other than svm->vmcb01.ptr
> when this is called.

Honestly I don't see how you had reached this conclusion.
 
I just think that code that always works on vmcb01
should use it, even if it happens that vmcb == vmcb01.
 
If you insist I can drop this patch or add WARN_ON instead,
I just think that this way is cleaner.
 
Best regards,
	Maxim Levitsky

> 
> If we want to guard AVIC, I'd much rather we do something like:
> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7038c76fa841..dcc856bd628d 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -992,8 +992,12 @@ static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu)
>  static void init_vmcb(struct kvm_vcpu *vcpu)
>  {
>         struct vcpu_svm *svm = to_svm(vcpu);
> -       struct vmcb_control_area *control = &svm->vmcb->control;
> -       struct vmcb_save_area *save = &svm->vmcb->save;
> +       struct vmcb *vmcb = svm->vmcb01.ptr;
> +       struct vmcb_control_area *control = &vmcb->control;
> +       struct vmcb_save_area *save = &vmcb->save;
> +
> +       if (WARN_ON_ONCE(vmcb != svm->vmcb))
> +               svm_leave_nested(vcpu);
> 
>         svm_set_intercept(svm, INTERCEPT_CR0_READ);
>         svm_set_intercept(svm, INTERCEPT_CR3_READ);
> 
> 
> On a related topic, init_vmcb_after_set_cpuid() is broken for nested, it needs to
> play nice with being called when svm->vmcb == &svm->nested.vmcb02, e.g. update
> vmcb01 and re-merge (or just recalc?) vmcb02's intercepts.
> 
> >  	struct kvm_svm *kvm_svm = to_kvm_svm(svm->vcpu.kvm);
> >  	phys_addr_t bpa = __sme_set(page_to_phys(svm->avic_backing_page));
> >  	phys_addr_t lpa = __sme_set(page_to_phys(kvm_svm->avic_logical_id_table_page));
> > -- 
> > 2.26.3
> > 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  2022-03-01 17:25     ` Maxim Levitsky
@ 2022-03-01 17:35       ` Sean Christopherson
  2022-03-09 15:48       ` Paolo Bonzini
  1 sibling, 0 replies; 26+ messages in thread
From: Sean Christopherson @ 2022-03-01 17:35 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> On Tue, 2022-03-01 at 16:21 +0000, Sean Christopherson wrote:
> > Just "KVM: SVM:" for the shortlog, please.
> > 
> > On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > > Out of precation use vmcb01 when enabling host AVIC.
> > > No functional change intended.
> > > 
> > > Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> > > ---
> > >  arch/x86/kvm/svm/avic.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> > > index e23159f3a62ba..9656e192c646b 100644
> > > --- a/arch/x86/kvm/svm/avic.c
> > > +++ b/arch/x86/kvm/svm/avic.c
> > > @@ -167,7 +167,7 @@ int avic_vm_init(struct kvm *kvm)
> > >  
> > >  void avic_init_vmcb(struct vcpu_svm *svm)
> > >  {
> > > -	struct vmcb *vmcb = svm->vmcb;
> > > +	struct vmcb *vmcb = svm->vmcb01.ptr;
> > 
> > I don't like this change.  It's not bad code, but it'll be confusing because it
> > implies that it's legal for svm->vmcb to be something other than svm->vmcb01.ptr
> > when this is called.
> 
> Honestly I don't see how you had reached this conclusion.

There's exactly one caller, init_vmcb(), and that caller doesn't assert that the
current VMCB is vmcb01, nor does it unconditionally use vmcb01.  Adding code here
without an assert implies that init_vmcb() may be called with vmcb02 active,
otherwise why diverge from its one caller?

> I just think that code that always works on vmcb01
> should use it, even if it happens that vmcb == vmcb01.

I'm not disagreeing, I'm saying that the rule you want to enforce also applies
to init_vmcb(), so rather than introduce inconsistent code in all the leafs, fix
the problem at the root.  I've no objection to adding a WARN in the AVIC code (though
at that point I'd vote to just pass in @vmcb), I'm objecting to "silently" diverging.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-01 17:09     ` Maxim Levitsky
@ 2022-03-01 17:46       ` Sean Christopherson
  2022-03-01 17:56         ` Maxim Levitsky
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-03-01 17:46 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > index 80a2020c4db40..8d35f56c64020 100644
> > > --- a/arch/x86/kvm/lapic.c
> > > +++ b/arch/x86/kvm/lapic.c
> > > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> > >  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > >  		u64 icr;
> > >  
> > > -		if (vcpu->kvm->arch.x2apic_format) {
> > > -			if (*id != vcpu->vcpu_id)
> > > -				return -EINVAL;
> > > -		} else {
> > > -			if (set)
> > > -				*id >>= 24;
> > > -			else
> > > -				*id <<= 24;
> > > -		}
> > > +		if (!vcpu->kvm->arch.x2apic_format && set)
> > > +			*id >>= 24;
> > > +
> > > +		if (*id != vcpu->vcpu_id)
> > > +			return -EINVAL;
> > 
> > This breaks backwards compability, userspace will start failing where it previously
> > succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
> > a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
> > obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> > trying to close a loophole that doesn't harm KVM.
> > 
> > If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> > can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.
> 
> This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
> apic_id == vcpu_id.

AMD's APM explicitly allows changing the x2APIC ID by writing the xAPIC ID and then
switching to x2APIC mode.

> When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
> mode, meaning that apic id can't be more  that 255 even in x2apic mode.

Even on Intel CPUs, if userspace creates a vCPU with vcpu_id = 256, then the xAPIC ID
will be (256 << 24) == 0.  If userspace does get+set, then KVM will see 0 != 256 and
reject the set with return -EINVAL.

And as above, userspace that hasn't opted into the x2apic_format could also legitimately
change the x2APIC ID.

I 100% agree this is a mess and probably can't work without major shenanigans, but on
the other hand this isn't harming anything, so why risk breaking userspace, even if the
risk is infinitesimally small?

I'm all for locking down the APIC ID with a userspace opt-in control, I just don't
think it's worth trying to retroactively plug the various holes in KVM.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-01 17:46       ` Sean Christopherson
@ 2022-03-01 17:56         ` Maxim Levitsky
  2022-03-02 11:50           ` Maxim Levitsky
  0 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-01 17:56 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, 2022-03-01 at 17:46 +0000, Sean Christopherson wrote:
> On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > index 80a2020c4db40..8d35f56c64020 100644
> > > > --- a/arch/x86/kvm/lapic.c
> > > > +++ b/arch/x86/kvm/lapic.c
> > > > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> > > >  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > > >  		u64 icr;
> > > >  
> > > > -		if (vcpu->kvm->arch.x2apic_format) {
> > > > -			if (*id != vcpu->vcpu_id)
> > > > -				return -EINVAL;
> > > > -		} else {
> > > > -			if (set)
> > > > -				*id >>= 24;
> > > > -			else
> > > > -				*id <<= 24;
> > > > -		}
> > > > +		if (!vcpu->kvm->arch.x2apic_format && set)
> > > > +			*id >>= 24;
> > > > +
> > > > +		if (*id != vcpu->vcpu_id)
> > > > +			return -EINVAL;
> > > 
> > > This breaks backwards compability, userspace will start failing where it previously
> > > succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
> > > a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
> > > obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> > > trying to close a loophole that doesn't harm KVM.
> > > 
> > > If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> > > can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.
> > 
> > This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
> > apic_id == vcpu_id.
> 
> AMD's APM explicitly allows changing the x2APIC ID by writing the xAPIC ID and then
> switching to x2APIC mode.
This patch was supposed to go together with patch that fully forbids changing apic id.

> 
> > When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
> > mode, meaning that apic id can't be more  that 255 even in x2apic mode.
> 
> Even on Intel CPUs, if userspace creates a vCPU with vcpu_id = 256, then the xAPIC ID
> will be (256 << 24) == 0.  If userspace does get+set, then KVM will see 0 != 256 and
> reject the set with return -EINVAL.

First of all, I am saying again - with old API (!x2apic_format) - the APIC ID was limited
to 8 bits - literaly only bits 24-31 of the value was used.

With old API - literaly, the APIC_ID register was supposed to have x2apic format
regardless of if vCPU is in x2apic mode or not, and the above code cares to translate
it to the real apic id register from and to.

Thus there is really no point of letting old API change apic id for x2apic mode,
it is literaly a loophole that should be plugged.

Best regards,
	Maxim Levitsky

> 
> And as above, userspace that hasn't opted into the x2apic_format could also legitimately
> change the x2APIC ID.
> 
> I 100% agree this is a mess and probably can't work without major shenanigans, but on
> the other hand this isn't harming anything, so why risk breaking userspace, even if the
> risk is infinitesimally small?
> 
> I'm all for locking down the APIC ID with a userspace opt-in control, I just don't
> think it's worth trying to retroactively plug the various holes in KVM.
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-01 17:56         ` Maxim Levitsky
@ 2022-03-02 11:50           ` Maxim Levitsky
  2022-03-03 16:51             ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-02 11:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Tue, 2022-03-01 at 19:56 +0200, Maxim Levitsky wrote:
> On Tue, 2022-03-01 at 17:46 +0000, Sean Christopherson wrote:
> > On Tue, Mar 01, 2022, Maxim Levitsky wrote:
> > > > > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > > > > index 80a2020c4db40..8d35f56c64020 100644
> > > > > --- a/arch/x86/kvm/lapic.c
> > > > > +++ b/arch/x86/kvm/lapic.c
> > > > > @@ -2618,15 +2618,14 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu,
> > > > >  		u32 *ldr = (u32 *)(s->regs + APIC_LDR);
> > > > >  		u64 icr;
> > > > >  
> > > > > -		if (vcpu->kvm->arch.x2apic_format) {
> > > > > -			if (*id != vcpu->vcpu_id)
> > > > > -				return -EINVAL;
> > > > > -		} else {
> > > > > -			if (set)
> > > > > -				*id >>= 24;
> > > > > -			else
> > > > > -				*id <<= 24;
> > > > > -		}
> > > > > +		if (!vcpu->kvm->arch.x2apic_format && set)
> > > > > +			*id >>= 24;
> > > > > +
> > > > > +		if (*id != vcpu->vcpu_id)
> > > > > +			return -EINVAL;
> > > > 
> > > > This breaks backwards compability, userspace will start failing where it previously
> > > > succeeded.  It doesn't even require a malicious userspace, e.g. if userspace creates
> > > > a vCPU with a vcpu_id > 255 vCPUs, the shift will truncate and cause failure.  It'll
> > > > obviously do weird things, but this code is 5.5 years old, I don't think it's worth
> > > > trying to close a loophole that doesn't harm KVM.
> > > > 
> > > > If we provide a way for userspace to opt into disallowiong APICID != vcpu_id, we
> > > > can address this further upstream, e.g. reject vcpu_id > 255 without x2apic_format.
> > > 
> > > This check is only when CPU is in x2apic mode. In this mode the X86 specs demands that
> > > apic_id == vcpu_id.
> > 
> > AMD's APM explicitly allows changing the x2APIC ID by writing the xAPIC ID and then
> > switching to x2APIC mode.
> This patch was supposed to go together with patch that fully forbids changing apic id.
> 
> > > When old API is used, APIC IDs are encoded in xapic format, even when vCPU is in x2apic
> > > mode, meaning that apic id can't be more  that 255 even in x2apic mode.
> > 
> > Even on Intel CPUs, if userspace creates a vCPU with vcpu_id = 256, then the xAPIC ID
> > will be (256 << 24) == 0.  If userspace does get+set, then KVM will see 0 != 256 and
> > reject the set with return -EINVAL.
> 
> First of all, I am saying again - with old API (!x2apic_format) - the APIC ID was limited
> to 8 bits - literaly only bits 24-31 of the value was used.
> 
> With old API - literaly, the APIC_ID register was supposed to have x2apic format
> regardless of if vCPU is in x2apic mode or not, and the above code cares to translate
> it to the real apic id register from and to.
> 
> Thus there is really no point of letting old API change apic id for x2apic mode,
> it is literaly a loophole that should be plugged.

Since I probably didn't explain myself correctly let me try to explain myself again:
 
 
First of all let me explain the x86 spec:
 
APIC id is number, usually relatively small number - its initial value is taken from
vcpu_id, when vCPU is created.
Initial APIC id is exposed in CPUID, as well.
 
When APIC is in xapic mode, then mmio apic_base + 0x20, contains apic id in bits 24-31.
plus apic id 0xff is reserved for broadcast, thus max of 255 cpus.
 
When APIC is in x2apic mode, mmio access is disabled, and instead MSR 0x802 contains
the local apic id, and it is the whole 32 bit number.
 
 
We have 2 ioctls, KVM_GET_LAPIC/KVM_SET_LAPIC which set/get local apic state.
They use array of apic registers which is basically snapshot of xapic mmio, 
and it is not defined fully if these are x2apic or xapic registers.
 
Because of this, initially the APIC_ID register in this snapshot would
contain apic id in bits 24-31.
 
To make it possible to support apic id > 256, new KVM cap was added KVM_CAP_X2APIC_API
It has nothing to do with x2apic strictly speaking and can be used regardless of
x2apic mode.
 
All it does was to make the value at 0x20 offset in that mmio state dump be full 32 bit
value of the apic id.
 
 
When APIC state is loading while APIC is in *x2apic* mode it does enforce that
value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
 
I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
especially if we make apic id read-only.
 
I hope that this explains my point better, sorry if I was not able
to explain it well before.
 
Best regards,
	Maxim Levitsky


> 
> Best regards,
> 	Maxim Levitsky
> 
> > And as above, userspace that hasn't opted into the x2apic_format could also legitimately
> > change the x2APIC ID.
> > 
> > I 100% agree this is a mess and probably can't work without major shenanigans, but on
> > the other hand this isn't harming anything, so why risk breaking userspace, even if the
> > risk is infinitesimally small?
> > 
> > I'm all for locking down the APIC ID with a userspace opt-in control, I just don't
> > think it's worth trying to retroactively plug the various holes in KVM.
> > 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-02 11:50           ` Maxim Levitsky
@ 2022-03-03 16:51             ` Sean Christopherson
  2022-03-03 18:15               ` Maxim Levitsky
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-03-03 16:51 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Wed, Mar 02, 2022, Maxim Levitsky wrote:
> When APIC state is loading while APIC is in *x2apic* mode it does enforce that
> value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
>  
> I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
> especially if we make apic id read-only.

I don't disagree in principle.  But, (a) this loophole as existing for nearly 6
years, (b) closing the loophole could break userspace, (c) false positive are
possible due to truncation, and (d) KVM gains nothing meaningful by closing the
loophole.

(d) changes when we add a knob to make xAPIC ID read-only, but we can simply
require userspace to enable KVM_CAP_X2APIC_API (or force it).  That approach
avoids (c) by eliminating truncation, and avoids (b) by virtue of being opt-in.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-03 16:51             ` Sean Christopherson
@ 2022-03-03 18:15               ` Maxim Levitsky
  2022-03-03 19:38                 ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-03 18:15 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Thu, 2022-03-03 at 16:51 +0000, Sean Christopherson wrote:
> On Wed, Mar 02, 2022, Maxim Levitsky wrote:
> > When APIC state is loading while APIC is in *x2apic* mode it does enforce that
> > value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
> >  
> > I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
> > especially if we make apic id read-only.
> 
> I don't disagree in principle.  But, (a) this loophole as existing for nearly 6
> years, (b) closing the loophole could break userspace, (c) false positive are
> possible due to truncation, and (d) KVM gains nothing meaningful by closing the
> loophole.
> 
> (d) changes when we add a knob to make xAPIC ID read-only, but we can simply
> require userspace to enable KVM_CAP_X2APIC_API (or force it).  That approach
> avoids (c) by eliminating truncation, and avoids (b) by virtue of being opt-in.
> 

(a) - doesn't matter.

(b) - if userspace wants to have non default apic id with x2apic mode,
      which (*)can't even really be set from the guest - this is ridiculous.
 
      (*) Yes I know that in *theory* user can change apic id in xapic mode
      and then switch to x2apic mode - but I really doubt that KVM
      would even honor this - there are already places which assume
      that this is not the case. In fact it would be nice to audit KVM
      on what happens when userspace does this, there might be a nice
      CVE somewhere....
 
(c) - without KVM_CAP_X2APIC_API, literally just call to KVM_GET_LAPIC/KVM_SET_LAPIC
will truncate x2apic id if > 255 regardless of my patch - literally this cap
was added to avoid this.
What we should do is to avoid creating cpu with vcpu_id > 256 when this cap is not set…
 
(d) - doesn't matter - again we are talking about x2apic mode in which apic id is read only.
 

Don’t get me wrong - I understand your concerns about this, but I hope that you
also understand mine - I still think that you just don't understand me.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-03 18:15               ` Maxim Levitsky
@ 2022-03-03 19:38                 ` Sean Christopherson
  2022-03-03 19:49                   ` Sean Christopherson
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-03-03 19:38 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Thu, Mar 03, 2022, Maxim Levitsky wrote:
> On Thu, 2022-03-03 at 16:51 +0000, Sean Christopherson wrote:
> > On Wed, Mar 02, 2022, Maxim Levitsky wrote:
> > > When APIC state is loading while APIC is in *x2apic* mode it does enforce that
> > > value in this 0x20 offset is initial apic id if KVM_CAP_X2APIC_API.
> > >  
> > > I think that it is fair to also enforce this when KVM_CAP_X2APIC_API is not used,
> > > especially if we make apic id read-only.
> > 
> > I don't disagree in principle.  But, (a) this loophole as existing for nearly 6
> > years, (b) closing the loophole could break userspace, (c) false positive are
> > possible due to truncation, and (d) KVM gains nothing meaningful by closing the
> > loophole.
> > 
> > (d) changes when we add a knob to make xAPIC ID read-only, but we can simply
> > require userspace to enable KVM_CAP_X2APIC_API (or force it).  That approach
> > avoids (c) by eliminating truncation, and avoids (b) by virtue of being opt-in.
> > 
> 
> (a) - doesn't matter.

Yes, it absolutely matters.  If KVM_CAP_X2APIC_API was added in the 5.17 cycle
then I would not be objecting because there is zero chance of breaking userspace.

> (b) - if userspace wants to have non default apic id with x2apic mode,
>       which (*)can't even really be set from the guest - this is ridiculous.
>  
>       (*) Yes I know that in *theory* user can change apic id in xapic mode
>       and then switch to x2apic mode - but I really doubt that KVM
>       would even honor this - there are already places which assume
>       that this is not the case. In fact it would be nice to audit KVM
>       on what happens when userspace does this, there might be a nice
>       CVE somewhere....
>  
> (c) - without KVM_CAP_X2APIC_API, literally just call to KVM_GET_LAPIC/KVM_SET_LAPIC
> will truncate x2apic id if > 255 regardless of my patch - literally this cap
> was added to avoid this.
> What we should do is to avoid creating cpu with vcpu_id > 256 when this cap is not set…

Yes, but _rejecting_ that behavior is a change in KVM's ABI.  That's why (a) matters.

> (d) - doesn't matter - again we are talking about x2apic mode in which apic id is read only.

It does matter that changes that potentially break userspace provide value to KVM.
Not theoretical, make us feel warm and fuzzy value, but actual value to KVM or
userspace.  KVM is no worse off for this loophole.  For userspace, at best this
will break a flawed but functional setup.

Concretely, the below example of using x2APIC with an ID > 255 works because KVM
calculates its APIC maps using what KVM thinks is the x2APIC ID. 

With your proposed change, KVM_SET_LAPIC will fail and we've broken a functional,
if sketchy, setup.  Is there likely to be such a real-world setup that doesn't
barf on the inconsistent x2APIC ID?  Probably not, but I don't see any reason to
find out.

==== Test Assertion Failure ====
  lib/kvm_util.c:1953: ret == 0
  pid=837 tid=837 errno=22 - Invalid argument
     1	0x0000000000403c0e: vcpu_ioctl at kvm_util.c:1952
     2	0x0000000000401548: main at smm_test.c:151
     3	0x00007ff76518ebf6: ?? ??:0
     4	0x0000000000401829: _start at ??:?
  vcpu ioctl 1140895375 failed, rc: -1 errno: 22 (Invalid argument)


diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index a626d40fdb48..cce44d99c919 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -19,7 +19,7 @@
 #include "vmx.h"
 #include "svm_util.h"

-#define VCPU_ID              1
+#define VCPU_ID              256

 #define PAGE_SIZE  4096

@@ -56,7 +56,7 @@ static inline void sync_with_host(uint64_t phase)
 static void self_smi(void)
 {
        x2apic_write_reg(APIC_ICR,
-                        APIC_DEST_SELF | APIC_INT_ASSERT | APIC_DM_SMI);
+                        ((uint64_t)VCPU_ID << 32) | APIC_INT_ASSERT | APIC_DM_SMI);
 }

 static void l2_guest_code(void)
@@ -72,14 +72,10 @@ static void guest_code(void *arg)
 {
        #define L2_GUEST_STACK_SIZE 64
        unsigned long l2_guest_stack[L2_GUEST_STACK_SIZE];
-       uint64_t apicbase = rdmsr(MSR_IA32_APICBASE);
        struct svm_test_data *svm = arg;
        struct vmx_pages *vmx_pages = arg;

        sync_with_host(1);
-
-       wrmsr(MSR_IA32_APICBASE, apicbase | X2APIC_ENABLE);
-
        sync_with_host(2);

        self_smi();
@@ -139,12 +135,21 @@ int main(int argc, char *argv[])
        struct kvm_run *run;
        struct kvm_x86_state *state;
        int stage, stage_reported;
+       struct kvm_lapic_state lapic;

        /* Create VM */
        vm = vm_create_default(VCPU_ID, 0, guest_code);

        run = vcpu_state(vm, VCPU_ID);

+       vcpu_set_msr(vm, VCPU_ID, MSR_IA32_APICBASE,
+                    vcpu_get_msr(vm, VCPU_ID, MSR_IA32_APICBASE) | X2APIC_ENABLE);
+
+       vcpu_ioctl(vm, VCPU_ID, KVM_GET_LAPIC, &lapic);
+       TEST_ASSERT(*((u32 *)&lapic.regs[0x20]) == 0,
+                   "x2APIC ID == %u", *((u32 *)&lapic.regs[0x20]));
+       vcpu_ioctl(vm, VCPU_ID, KVM_SET_LAPIC, &lapic);
+
        vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, SMRAM_GPA,
                                    SMRAM_MEMSLOT, SMRAM_PAGES, 0);
        TEST_ASSERT(vm_phy_pages_alloc(vm, SMRAM_PAGES, SMRAM_GPA, SMRAM_MEMSLOT)

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-03 19:38                 ` Sean Christopherson
@ 2022-03-03 19:49                   ` Sean Christopherson
  2022-03-04 10:54                     ` Maxim Levitsky
  0 siblings, 1 reply; 26+ messages in thread
From: Sean Christopherson @ 2022-03-03 19:49 UTC (permalink / raw)
  To: Maxim Levitsky
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Thu, Mar 03, 2022, Sean Christopherson wrote:
> With your proposed change, KVM_SET_LAPIC will fail and we've broken a functional,
> if sketchy, setup.  Is there likely to be such a real-world setup that doesn't
> barf on the inconsistent x2APIC ID?  Probably not, but I don't see any reason to
> find out.

I take back the "probably not", this isn't even all that contrived.  Prior to the
"migration", the guest will see a consistent x2APIC ID.  It's not hard to imagine
a guest that snapshots the ID and never re-reads the value from "hardware".

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api
  2022-03-03 19:49                   ` Sean Christopherson
@ 2022-03-04 10:54                     ` Maxim Levitsky
  0 siblings, 0 replies; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-04 10:54 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Paolo Bonzini, Joerg Roedel, Thomas Gleixner, Dave Hansen,
	Wanpeng Li, Borislav Petkov, x86

On Thu, 2022-03-03 at 19:49 +0000, Sean Christopherson wrote:
> On Thu, Mar 03, 2022, Sean Christopherson wrote:
> > With your proposed change, KVM_SET_LAPIC will fail and we've broken a functional,
> > if sketchy, setup.  Is there likely to be such a real-world setup that doesn't
> > barf on the inconsistent x2APIC ID?  Probably not, but I don't see any reason to
> > find out.
> 
> I take back the "probably not", this isn't even all that contrived.  Prior to the
> "migration", the guest will see a consistent x2APIC ID.  It's not hard to imagine
> a guest that snapshots the ID and never re-reads the value from "hardware".
> 
Ok. you win. I will not argue about this.

Best regards,
	Maxim Levitsky


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW
  2022-03-01 17:13     ` Maxim Levitsky
@ 2022-03-09 15:46       ` Paolo Bonzini
  0 siblings, 0 replies; 26+ messages in thread
From: Paolo Bonzini @ 2022-03-09 15:46 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Wanpeng Li,
	Borislav Petkov, x86

On 3/1/22 18:13, Maxim Levitsky wrote:
> The fact that we have a stale VM exit reason in vmcb without this
> patch which can be in theory consumed somewhere down the road.
> 
> This stale vm exit reason also appears in the tracs which is
> very misleading.

Let's put it in the commit message:

This makes it a bit easier to read the KVM trace, and avoids
other potential problems due to a stale vmexit reason in the vmcb.  If
SVM_EXIT_SW somehow reaches svm_invoke_exit_handler(), instead, 
svm_check_exit_valid() will return false and a WARN will be logged.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  2022-03-01 17:25     ` Maxim Levitsky
  2022-03-01 17:35       ` Sean Christopherson
@ 2022-03-09 15:48       ` Paolo Bonzini
  2022-03-15 12:27         ` Maxim Levitsky
  1 sibling, 1 reply; 26+ messages in thread
From: Paolo Bonzini @ 2022-03-09 15:48 UTC (permalink / raw)
  To: Maxim Levitsky, Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Wanpeng Li,
	Borislav Petkov, x86

On 3/1/22 18:25, Maxim Levitsky wrote:
>> I don't like this change.  It's not bad code, but it'll be confusing because it
>> implies that it's legal for svm->vmcb to be something other than svm->vmcb01.ptr
>> when this is called.
> Honestly I don't see how you had reached this conclusion.
>   
> I just think that code that always works on vmcb01
> should use it, even if it happens that vmcb == vmcb01.
>   
> If you insist I can drop this patch or add WARN_ON instead,
> I just think that this way is cleaner.
>   

I do like the patch, but you should do the same in init_vmcb() and 
svm_hv_init_vmcb() as well.

Paolo

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb
  2022-03-09 15:48       ` Paolo Bonzini
@ 2022-03-15 12:27         ` Maxim Levitsky
  0 siblings, 0 replies; 26+ messages in thread
From: Maxim Levitsky @ 2022-03-15 12:27 UTC (permalink / raw)
  To: Paolo Bonzini, Sean Christopherson
  Cc: kvm, Jim Mattson, H. Peter Anvin, linux-kernel, Vitaly Kuznetsov,
	Joerg Roedel, Thomas Gleixner, Dave Hansen, Wanpeng Li,
	Borislav Petkov, x86

On Wed, 2022-03-09 at 16:48 +0100, Paolo Bonzini wrote:
> On 3/1/22 18:25, Maxim Levitsky wrote:
> > > I don't like this change.  It's not bad code, but it'll be confusing because it
> > > implies that it's legal for svm->vmcb to be something other than svm->vmcb01.ptr
> > > when this is called.
> > Honestly I don't see how you had reached this conclusion.
> >   
> > I just think that code that always works on vmcb01
> > should use it, even if it happens that vmcb == vmcb01.
> >   
> > If you insist I can drop this patch or add WARN_ON instead,
> > I just think that this way is cleaner.
> >   
> 
> I do like the patch, but you should do the same in init_vmcb() and 
> svm_hv_init_vmcb() as well.

I will do this.
Thanks!

Best regards,
	Maxim Levitsky
> 
> Paolo
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2022-03-15 12:27 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-01 13:55 [PATCH 0/4] SVM fixes + apic fix Maxim Levitsky
2022-03-01 13:55 ` [PATCH 1/4] KVM: x86: mark synthetic SMM vmexit as SVM_EXIT_SW Maxim Levitsky
2022-03-01 16:31   ` Sean Christopherson
2022-03-01 17:13     ` Maxim Levitsky
2022-03-09 15:46       ` Paolo Bonzini
2022-03-01 13:55 ` [PATCH 2/4] KVM: x86: SVM: disable preemption in avic_refresh_apicv_exec_ctrl Maxim Levitsky
2022-03-01 17:15   ` Sean Christopherson
2022-03-01 17:20     ` Maxim Levitsky
2022-03-01 17:23       ` Paolo Bonzini
2022-03-01 13:55 ` [PATCH 3/4] KVM: x86: SVM: use vmcb01 in avic_init_vmcb Maxim Levitsky
2022-03-01 16:21   ` Sean Christopherson
2022-03-01 17:25     ` Maxim Levitsky
2022-03-01 17:35       ` Sean Christopherson
2022-03-09 15:48       ` Paolo Bonzini
2022-03-15 12:27         ` Maxim Levitsky
2022-03-01 13:55 ` [PATCH 4/4] KVM: x86: lapic: don't allow to set non default apic id when not using x2apic api Maxim Levitsky
2022-03-01 16:56   ` Sean Christopherson
2022-03-01 17:09     ` Maxim Levitsky
2022-03-01 17:46       ` Sean Christopherson
2022-03-01 17:56         ` Maxim Levitsky
2022-03-02 11:50           ` Maxim Levitsky
2022-03-03 16:51             ` Sean Christopherson
2022-03-03 18:15               ` Maxim Levitsky
2022-03-03 19:38                 ` Sean Christopherson
2022-03-03 19:49                   ` Sean Christopherson
2022-03-04 10:54                     ` Maxim Levitsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).