All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] KVM: x86: Fix a WARN in kvm_apic_send_ipi()
@ 2023-09-04  1:35 Tao Su
  2023-09-04  1:35 ` [PATCH 1/2] x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode Tao Su
  2023-09-04  1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
  0 siblings, 2 replies; 14+ messages in thread
From: Tao Su @ 2023-09-04  1:35 UTC (permalink / raw)
  To: kvm; +Cc: seanjc, pbonzini, chao.gao, guang.zeng, yi1.lai, tao1.su

When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.

The APIC_ICR_BUSY bit is removed in x2APIC mode, and bit12 of ICR is
changed to UNUSED bit, but kvm_x2apic_icr_write() still uses
APIC_ICR_BUSY, which may cause ambiguity, so introducing
X2APIC_ICR_UNUSED_12 instead.

When X2APIC_ICR_UNUSED_12 is set, how the hardware handles it
determines how the WARN is fixed. However SDM has no detail about it,
we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
X2APIC_ICR_UNUSED_12 was also cleared by hardware without #GP. Therefore,
the clearing of bit12 should be still kept being consistent with the
hardware behavior.

Tao Su (2):
  x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode
  KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit

 arch/x86/include/asm/apicdef.h |  1 +
 arch/x86/kvm/lapic.c           | 27 ++++++++++++++++++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)


base-commit: 708283abf896dd4853e673cc8cba70acaf9bf4ea
-- 
2.34.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/2] x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode
  2023-09-04  1:35 [PATCH 0/2] KVM: x86: Fix a WARN in kvm_apic_send_ipi() Tao Su
@ 2023-09-04  1:35 ` Tao Su
  2023-09-04  2:58   ` Chao Gao
  2023-09-04  1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
  1 sibling, 1 reply; 14+ messages in thread
From: Tao Su @ 2023-09-04  1:35 UTC (permalink / raw)
  To: kvm; +Cc: seanjc, pbonzini, chao.gao, guang.zeng, yi1.lai, tao1.su

According to SDM, bit12 of ICR is no longer BUSY bit but UNUSED bit in
x2APIC mode, which is the only difference of lower ICR between xAPIC and
x2APIC mode. To avoid ambiguity, introduce X2APIC_ICR_UNUSED_12 for
x2APIC mode.

Signed-off-by: Tao Su <tao1.su@linux.intel.com>
---
 arch/x86/include/asm/apicdef.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
index 4b125e5b3187..ea2725738b81 100644
--- a/arch/x86/include/asm/apicdef.h
+++ b/arch/x86/include/asm/apicdef.h
@@ -78,6 +78,7 @@
 #define		APIC_INT_LEVELTRIG	0x08000
 #define		APIC_INT_ASSERT		0x04000
 #define		APIC_ICR_BUSY		0x01000
+#define		X2APIC_ICR_UNUSED_12	0x01000
 #define		APIC_DEST_LOGICAL	0x00800
 #define		APIC_DEST_PHYSICAL	0x00000
 #define		APIC_DM_FIXED		0x00000
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-04  1:35 [PATCH 0/2] KVM: x86: Fix a WARN in kvm_apic_send_ipi() Tao Su
  2023-09-04  1:35 ` [PATCH 1/2] x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode Tao Su
@ 2023-09-04  1:35 ` Tao Su
  2023-09-04  2:46   ` Chao Gao
                     ` (4 more replies)
  1 sibling, 5 replies; 14+ messages in thread
From: Tao Su @ 2023-09-04  1:35 UTC (permalink / raw)
  To: kvm; +Cc: seanjc, pbonzini, chao.gao, guang.zeng, yi1.lai, tao1.su

When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.

Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
and SDM has no detail about how hardware will handle the UNUSED bit12
set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
the clearing of bit12 should be still kept being consistent with the
hardware behavior.

Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
Signed-off-by: Tao Su <tao1.su@linux.intel.com>
Tested-by: Yi Lai <yi1.lai@intel.com>
---
 arch/x86/kvm/lapic.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a983a16163b1..09a376aeb4a0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1482,8 +1482,17 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 {
 	struct kvm_lapic_irq irq;
 
-	/* KVM has no delay and should always clear the BUSY/PENDING flag. */
-	WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
+	/*
+	 * In non-x2apic mode, KVM has no delay and should always clear the
+	 * BUSY/PENDING flag. In x2apic mode, KVM should clear the unused bit12
+	 * of ICR since hardware will also clear this bit. Although
+	 * APIC_ICR_BUSY and X2APIC_ICR_UNUSED_12 are same, they mean different
+	 * things in different modes.
+	 */
+	if (!apic_x2apic_mode(apic))
+		WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
+	else
+		WARN_ON_ONCE(icr_low & X2APIC_ICR_UNUSED_12);
 
 	irq.vector = icr_low & APIC_VECTOR_MASK;
 	irq.delivery_mode = icr_low & APIC_MODE_MASK;
@@ -2429,13 +2438,12 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 	 * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
 	 * xAPIC, ICR writes need to go down the common (slightly slower) path
 	 * to get the upper half from ICR2.
+	 *
+	 * TODO: optimize to just emulate side effect w/o one more write
 	 */
 	if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
-		val = kvm_lapic_get_reg64(apic, APIC_ICR);
-		kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
-		trace_kvm_apic_write(APIC_ICR, val);
+		kvm_x2apic_icr_write(apic, val);
 	} else {
-		/* TODO: optimize to just emulate side effect w/o one more write */
 		val = kvm_lapic_get_reg(apic, offset);
 		kvm_lapic_reg_write(apic, offset, (u32)val);
 	}
@@ -3122,7 +3130,12 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
 
 int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
 {
-	data &= ~APIC_ICR_BUSY;
+	/*
+	 * The Delivery Status Bit(bit 12) is removed in x2apic mode, but this
+	 * bit is also cleared by hardware, so keep consistent with hardware
+	 * behavior to clear this bit.
+	 */
+	data &= ~X2APIC_ICR_UNUSED_12;
 
 	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
 	kvm_lapic_set_reg64(apic, APIC_ICR, data);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-04  1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
@ 2023-09-04  2:46   ` Chao Gao
  2023-09-04  3:00     ` Tao Su
  2023-09-04  4:16   ` kernel test robot
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Chao Gao @ 2023-09-04  2:46 UTC (permalink / raw)
  To: Tao Su; +Cc: kvm, seanjc, pbonzini, guang.zeng, yi1.lai

On Mon, Sep 04, 2023 at 09:35:55AM +0800, Tao Su wrote:
>When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
>MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
>thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
>but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
>
>Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
>and SDM has no detail about how hardware will handle the UNUSED bit12
>set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
>the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
>the clearing of bit12 should be still kept being consistent with the
>hardware behavior.
>
>Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
>Signed-off-by: Tao Su <tao1.su@linux.intel.com>
>Tested-by: Yi Lai <yi1.lai@intel.com>
>---
> arch/x86/kvm/lapic.c | 27 ++++++++++++++++++++-------
> 1 file changed, 20 insertions(+), 7 deletions(-)
>
>diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>index a983a16163b1..09a376aeb4a0 100644
>--- a/arch/x86/kvm/lapic.c
>+++ b/arch/x86/kvm/lapic.c
>@@ -1482,8 +1482,17 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
> {
> 	struct kvm_lapic_irq irq;
> 
>-	/* KVM has no delay and should always clear the BUSY/PENDING flag. */
>-	WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
>+	/*
>+	 * In non-x2apic mode, KVM has no delay and should always clear the
>+	 * BUSY/PENDING flag. In x2apic mode, KVM should clear the unused bit12
>+	 * of ICR since hardware will also clear this bit. Although
>+	 * APIC_ICR_BUSY and X2APIC_ICR_UNUSED_12 are same, they mean different
>+	 * things in different modes.
>+	 */
>+	if (!apic_x2apic_mode(apic))
>+		WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
>+	else
>+		WARN_ON_ONCE(icr_low & X2APIC_ICR_UNUSED_12);


Hi Tao,

Using the alias for APIC_ICR_BUSY is not strictly necessary for the bug fix.
I think it is better to move this hunk into patch 1.

> 
> 	irq.vector = icr_low & APIC_VECTOR_MASK;
> 	irq.delivery_mode = icr_low & APIC_MODE_MASK;
>@@ -2429,13 +2438,12 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
> 	 * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
> 	 * xAPIC, ICR writes need to go down the common (slightly slower) path
> 	 * to get the upper half from ICR2.
>+	 *
>+	 * TODO: optimize to just emulate side effect w/o one more write
> 	 */
> 	if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
>-		val = kvm_lapic_get_reg64(apic, APIC_ICR);
>-		kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
>-		trace_kvm_apic_write(APIC_ICR, val);
>+		kvm_x2apic_icr_write(apic, val);
> 	} else {
>-		/* TODO: optimize to just emulate side effect w/o one more write */
> 		val = kvm_lapic_get_reg(apic, offset);
> 		kvm_lapic_reg_write(apic, offset, (u32)val);
> 	}
>@@ -3122,7 +3130,12 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
> 
> int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
> {
>-	data &= ~APIC_ICR_BUSY;
>+	/*
>+	 * The Delivery Status Bit(bit 12) is removed in x2apic mode, but this
>+	 * bit is also cleared by hardware, so keep consistent with hardware
>+	 * behavior to clear this bit.
>+	 */
>+	data &= ~X2APIC_ICR_UNUSED_12;

Ditto.

and can you also swap patch 1 with patch 2? Because patch 1 is an optional
improvement, putting such a patch at the end of a patch set gives
maintainers an easy choice about whether they want it or not.

> 
> 	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
> 	kvm_lapic_set_reg64(apic, APIC_ICR, data);
>-- 
>2.34.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode
  2023-09-04  1:35 ` [PATCH 1/2] x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode Tao Su
@ 2023-09-04  2:58   ` Chao Gao
  2023-09-04  3:03     ` Tao Su
  0 siblings, 1 reply; 14+ messages in thread
From: Chao Gao @ 2023-09-04  2:58 UTC (permalink / raw)
  To: Tao Su; +Cc: kvm, seanjc, pbonzini, guang.zeng, yi1.lai

On Mon, Sep 04, 2023 at 09:35:54AM +0800, Tao Su wrote:
>According to SDM, bit12 of ICR is no longer BUSY bit but UNUSED bit in
>x2APIC mode, which is the only difference of lower ICR between xAPIC and
>x2APIC mode. To avoid ambiguity, introduce X2APIC_ICR_UNUSED_12 for
>x2APIC mode.
>
>Signed-off-by: Tao Su <tao1.su@linux.intel.com>

Please use scripts/get_maintainer.pl to help create the Cc/To lists.
I believe x86 maintainers should be copied for this patch.

>---
> arch/x86/include/asm/apicdef.h | 1 +
> 1 file changed, 1 insertion(+)
>
>diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
>index 4b125e5b3187..ea2725738b81 100644
>--- a/arch/x86/include/asm/apicdef.h
>+++ b/arch/x86/include/asm/apicdef.h
>@@ -78,6 +78,7 @@
> #define		APIC_INT_LEVELTRIG	0x08000
> #define		APIC_INT_ASSERT		0x04000
> #define		APIC_ICR_BUSY		0x01000
>+#define		X2APIC_ICR_UNUSED_12	0x01000
> #define		APIC_DEST_LOGICAL	0x00800
> #define		APIC_DEST_PHYSICAL	0x00000
> #define		APIC_DM_FIXED		0x00000
>-- 
>2.34.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-04  2:46   ` Chao Gao
@ 2023-09-04  3:00     ` Tao Su
  0 siblings, 0 replies; 14+ messages in thread
From: Tao Su @ 2023-09-04  3:00 UTC (permalink / raw)
  To: Chao Gao; +Cc: kvm, seanjc, pbonzini, guang.zeng, yi1.lai

On Mon, Sep 04, 2023 at 10:46:38AM +0800, Chao Gao wrote:
> On Mon, Sep 04, 2023 at 09:35:55AM +0800, Tao Su wrote:
> >When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
> >MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
> >thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
> >but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
> >
> >Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
> >and SDM has no detail about how hardware will handle the UNUSED bit12
> >set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
> >the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
> >the clearing of bit12 should be still kept being consistent with the
> >hardware behavior.
> >
> >Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
> >Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> >Tested-by: Yi Lai <yi1.lai@intel.com>
> >---
> > arch/x86/kvm/lapic.c | 27 ++++++++++++++++++++-------
> > 1 file changed, 20 insertions(+), 7 deletions(-)
> >
> >diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >index a983a16163b1..09a376aeb4a0 100644
> >--- a/arch/x86/kvm/lapic.c
> >+++ b/arch/x86/kvm/lapic.c
> >@@ -1482,8 +1482,17 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
> > {
> > 	struct kvm_lapic_irq irq;
> > 
> >-	/* KVM has no delay and should always clear the BUSY/PENDING flag. */
> >-	WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> >+	/*
> >+	 * In non-x2apic mode, KVM has no delay and should always clear the
> >+	 * BUSY/PENDING flag. In x2apic mode, KVM should clear the unused bit12
> >+	 * of ICR since hardware will also clear this bit. Although
> >+	 * APIC_ICR_BUSY and X2APIC_ICR_UNUSED_12 are same, they mean different
> >+	 * things in different modes.
> >+	 */
> >+	if (!apic_x2apic_mode(apic))
> >+		WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> >+	else
> >+		WARN_ON_ONCE(icr_low & X2APIC_ICR_UNUSED_12);
> 
> 
> Hi Tao,
> 
> Using the alias for APIC_ICR_BUSY is not strictly necessary for the bug fix.
> I think it is better to move this hunk into patch 1.
> 
> > 
> > 	irq.vector = icr_low & APIC_VECTOR_MASK;
> > 	irq.delivery_mode = icr_low & APIC_MODE_MASK;
> >@@ -2429,13 +2438,12 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
> > 	 * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
> > 	 * xAPIC, ICR writes need to go down the common (slightly slower) path
> > 	 * to get the upper half from ICR2.
> >+	 *
> >+	 * TODO: optimize to just emulate side effect w/o one more write
> > 	 */
> > 	if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
> >-		val = kvm_lapic_get_reg64(apic, APIC_ICR);
> >-		kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
> >-		trace_kvm_apic_write(APIC_ICR, val);
> >+		kvm_x2apic_icr_write(apic, val);
> > 	} else {
> >-		/* TODO: optimize to just emulate side effect w/o one more write */
> > 		val = kvm_lapic_get_reg(apic, offset);
> > 		kvm_lapic_reg_write(apic, offset, (u32)val);
> > 	}
> >@@ -3122,7 +3130,12 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
> > 
> > int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
> > {
> >-	data &= ~APIC_ICR_BUSY;
> >+	/*
> >+	 * The Delivery Status Bit(bit 12) is removed in x2apic mode, but this
> >+	 * bit is also cleared by hardware, so keep consistent with hardware
> >+	 * behavior to clear this bit.
> >+	 */
> >+	data &= ~X2APIC_ICR_UNUSED_12;
> 
> Ditto.
> 
> and can you also swap patch 1 with patch 2? Because patch 1 is an optional
> improvement, putting such a patch at the end of a patch set gives
> maintainers an easy choice about whether they want it or not.

Yes, I just think the alias will make the code clearer because x2APIC
does remove bit12. If it is not necessary, I will directly drop this
alias in version 2.

Thanks,
Tao

> 
> > 
> > 	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
> > 	kvm_lapic_set_reg64(apic, APIC_ICR, data);
> >-- 
> >2.34.1
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/2] x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode
  2023-09-04  2:58   ` Chao Gao
@ 2023-09-04  3:03     ` Tao Su
  0 siblings, 0 replies; 14+ messages in thread
From: Tao Su @ 2023-09-04  3:03 UTC (permalink / raw)
  To: Chao Gao; +Cc: kvm, seanjc, pbonzini, guang.zeng, yi1.lai

On Mon, Sep 04, 2023 at 10:58:32AM +0800, Chao Gao wrote:
> On Mon, Sep 04, 2023 at 09:35:54AM +0800, Tao Su wrote:
> >According to SDM, bit12 of ICR is no longer BUSY bit but UNUSED bit in
> >x2APIC mode, which is the only difference of lower ICR between xAPIC and
> >x2APIC mode. To avoid ambiguity, introduce X2APIC_ICR_UNUSED_12 for
> >x2APIC mode.
> >
> >Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> 
> Please use scripts/get_maintainer.pl to help create the Cc/To lists.
> I believe x86 maintainers should be copied for this patch.

Ok, I will cc x86 maintainers if keep this patch in the next version.

Thanks,
Tao

> 
> >---
> > arch/x86/include/asm/apicdef.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> >diff --git a/arch/x86/include/asm/apicdef.h b/arch/x86/include/asm/apicdef.h
> >index 4b125e5b3187..ea2725738b81 100644
> >--- a/arch/x86/include/asm/apicdef.h
> >+++ b/arch/x86/include/asm/apicdef.h
> >@@ -78,6 +78,7 @@
> > #define		APIC_INT_LEVELTRIG	0x08000
> > #define		APIC_INT_ASSERT		0x04000
> > #define		APIC_ICR_BUSY		0x01000
> >+#define		X2APIC_ICR_UNUSED_12	0x01000
> > #define		APIC_DEST_LOGICAL	0x00800
> > #define		APIC_DEST_PHYSICAL	0x00000
> > #define		APIC_DM_FIXED		0x00000
> >-- 
> >2.34.1
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-04  1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
  2023-09-04  2:46   ` Chao Gao
@ 2023-09-04  4:16   ` kernel test robot
  2023-09-04  5:02   ` Tao Su
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-09-04  4:16 UTC (permalink / raw)
  To: Tao Su, kvm
  Cc: llvm, oe-kbuild-all, seanjc, pbonzini, chao.gao, guang.zeng,
	yi1.lai, tao1.su

Hi Tao,

kernel test robot noticed the following build warnings:

[auto build test WARNING on 708283abf896dd4853e673cc8cba70acaf9bf4ea]

url:    https://github.com/intel-lab-lkp/linux/commits/Tao-Su/x86-apic-Introduce-X2APIC_ICR_UNUSED_12-for-x2APIC-mode/20230904-093801
base:   708283abf896dd4853e673cc8cba70acaf9bf4ea
patch link:    https://lore.kernel.org/r/20230904013555.725413-3-tao1.su%40linux.intel.com
patch subject: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
config: x86_64-randconfig-003-20230904 (https://download.01.org/0day-ci/archive/20230904/202309041224.CDj6t1BN-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230904/202309041224.CDj6t1BN-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202309041224.CDj6t1BN-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> arch/x86/kvm/lapic.c:2445:30: warning: variable 'val' is uninitialized when used here [-Wuninitialized]
                   kvm_x2apic_icr_write(apic, val);
                                              ^~~
   arch/x86/kvm/lapic.c:2435:9: note: initialize the variable 'val' to silence this warning
           u64 val;
                  ^
                   = 0
   1 warning generated.


vim +/val +2445 arch/x86/kvm/lapic.c

  2430	
  2431	/* emulate APIC access in a trap manner */
  2432	void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
  2433	{
  2434		struct kvm_lapic *apic = vcpu->arch.apic;
  2435		u64 val;
  2436	
  2437		/*
  2438		 * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
  2439		 * xAPIC, ICR writes need to go down the common (slightly slower) path
  2440		 * to get the upper half from ICR2.
  2441		 *
  2442		 * TODO: optimize to just emulate side effect w/o one more write
  2443		 */
  2444		if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
> 2445			kvm_x2apic_icr_write(apic, val);
  2446		} else {
  2447			val = kvm_lapic_get_reg(apic, offset);
  2448			kvm_lapic_reg_write(apic, offset, (u32)val);
  2449		}
  2450	}
  2451	EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
  2452	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-04  1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
  2023-09-04  2:46   ` Chao Gao
  2023-09-04  4:16   ` kernel test robot
@ 2023-09-04  5:02   ` Tao Su
  2023-09-05 23:03   ` Sean Christopherson
  2023-09-24 13:58   ` kernel test robot
  4 siblings, 0 replies; 14+ messages in thread
From: Tao Su @ 2023-09-04  5:02 UTC (permalink / raw)
  To: kvm; +Cc: seanjc, pbonzini, chao.gao, guang.zeng, yi1.lai

On Mon, Sep 04, 2023 at 09:35:55AM +0800, Tao Su wrote:
> When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
> MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
> thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
> but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
> 
> Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
> and SDM has no detail about how hardware will handle the UNUSED bit12
> set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
> the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
> the clearing of bit12 should be still kept being consistent with the
> hardware behavior.
> 
> Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
> Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> Tested-by: Yi Lai <yi1.lai@intel.com>
> ---
>  arch/x86/kvm/lapic.c | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a983a16163b1..09a376aeb4a0 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1482,8 +1482,17 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
>  {
>  	struct kvm_lapic_irq irq;
>  
> -	/* KVM has no delay and should always clear the BUSY/PENDING flag. */
> -	WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> +	/*
> +	 * In non-x2apic mode, KVM has no delay and should always clear the
> +	 * BUSY/PENDING flag. In x2apic mode, KVM should clear the unused bit12
> +	 * of ICR since hardware will also clear this bit. Although
> +	 * APIC_ICR_BUSY and X2APIC_ICR_UNUSED_12 are same, they mean different
> +	 * things in different modes.
> +	 */
> +	if (!apic_x2apic_mode(apic))
> +		WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> +	else
> +		WARN_ON_ONCE(icr_low & X2APIC_ICR_UNUSED_12);
>  
>  	irq.vector = icr_low & APIC_VECTOR_MASK;
>  	irq.delivery_mode = icr_low & APIC_MODE_MASK;
> @@ -2429,13 +2438,12 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>  	 * ICR is a single 64-bit register when x2APIC is enabled.  For legacy
>  	 * xAPIC, ICR writes need to go down the common (slightly slower) path
>  	 * to get the upper half from ICR2.
> +	 *
> +	 * TODO: optimize to just emulate side effect w/o one more write
>  	 */
>  	if (apic_x2apic_mode(apic) && offset == APIC_ICR) {
> -		val = kvm_lapic_get_reg64(apic, APIC_ICR);

Sorry for the mistake, I notice this line is removed accidentally, I will add it back in the
next version.

> -		kvm_apic_send_ipi(apic, (u32)val, (u32)(val >> 32));
> -		trace_kvm_apic_write(APIC_ICR, val);
> +		kvm_x2apic_icr_write(apic, val);
>  	} else {
> -		/* TODO: optimize to just emulate side effect w/o one more write */
>  		val = kvm_lapic_get_reg(apic, offset);
>  		kvm_lapic_reg_write(apic, offset, (u32)val);
>  	}
> @@ -3122,7 +3130,12 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
>  
>  int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
>  {
> -	data &= ~APIC_ICR_BUSY;
> +	/*
> +	 * The Delivery Status Bit(bit 12) is removed in x2apic mode, but this
> +	 * bit is also cleared by hardware, so keep consistent with hardware
> +	 * behavior to clear this bit.
> +	 */
> +	data &= ~X2APIC_ICR_UNUSED_12;
>  
>  	kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
>  	kvm_lapic_set_reg64(apic, APIC_ICR, data);
> -- 
> 2.34.1
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-04  1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
                     ` (2 preceding siblings ...)
  2023-09-04  5:02   ` Tao Su
@ 2023-09-05 23:03   ` Sean Christopherson
  2023-09-06  5:07     ` Tao Su
  2023-09-24 13:58   ` kernel test robot
  4 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2023-09-05 23:03 UTC (permalink / raw)
  To: Tao Su; +Cc: kvm, pbonzini, chao.gao, guang.zeng, yi1.lai

+Suravee

On Mon, Sep 04, 2023, Tao Su wrote:
> When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
> MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
> thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
> but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
> 
> Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
> and SDM has no detail about how hardware will handle the UNUSED bit12
> set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
> the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
> the clearing of bit12 should be still kept being consistent with the
> hardware behavior.

I'm confused.  If hardware clears the bit, then why is it set in the vAPIC page
after a trap-like APIC-write VM-Exit?  In other words, how is this not a ucode
or hardware bug?

Suravee, can you confirm what happens on AMD with x2AVIC?  Does hardware *always*
clear the busy bit if it's set by the guest?  If so, then we could "optimize"
avic_incomplete_ipi_interception() to skip the busy check, e.g.

diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index cfc8ab773025..4bf0bb250147 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -513,7 +513,7 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
                 * in which case KVM needs to emulate the ICR write as well in
                 * order to clear the BUSY flag.
                 */
-               if (icrl & APIC_ICR_BUSY)
+               if (!apic_x2apic_mode(apic) && (icrl & APIC_ICR_BUSY))
                        kvm_apic_write_nodecode(vcpu, APIC_ICR);
                else
                        kvm_apic_send_ipi(apic, icrl, icrh);

> Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
> Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> Tested-by: Yi Lai <yi1.lai@intel.com>
> ---
>  arch/x86/kvm/lapic.c | 27 ++++++++++++++++++++-------
>  1 file changed, 20 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index a983a16163b1..09a376aeb4a0 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1482,8 +1482,17 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
>  {
>  	struct kvm_lapic_irq irq;
>  
> -	/* KVM has no delay and should always clear the BUSY/PENDING flag. */
> -	WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> +	/*
> +	 * In non-x2apic mode, KVM has no delay and should always clear the
> +	 * BUSY/PENDING flag. In x2apic mode, KVM should clear the unused bit12
> +	 * of ICR since hardware will also clear this bit. Although
> +	 * APIC_ICR_BUSY and X2APIC_ICR_UNUSED_12 are same, they mean different
> +	 * things in different modes.
> +	 */
> +	if (!apic_x2apic_mode(apic))
> +		WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> +	else
> +		WARN_ON_ONCE(icr_low & X2APIC_ICR_UNUSED_12);

NAK to the new name, KVM is absolutely not going to zero an arbitrary "unused"
bit.  If Intel wants to reclaim bit 12 for something useful in the future, then
Intel can ship CPUs that don't touch the "reserved" bit, and deal with all the
fun of finding and updating all software that unnecessarily sets the busy bit in
x2apic mode.

If we really want to pretend that Intel has more than a snowball's chance in hell
of doing something useful with bit 12, then the right thing to do in KVM is to
ignore the bit entirely and let the guest keep the pieces, e.g.

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 113ca9661ab2..36ec195a3339 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1473,8 +1473,13 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 {
        struct kvm_lapic_irq irq;
 
-       /* KVM has no delay and should always clear the BUSY/PENDING flag. */
-       WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
+       /*
+        * KVM has no delay and should always clear the BUSY/PENDING flag.
+        * The flag doesn't exist in x2APIC mode; both the SDM and APM state
+        * that the flag "Must Be Zero", but neither Intel nor AMD enforces
+        * that (or any other reserved bits in ICR).
+        */
+       WARN_ON_ONCE(!apic_x2apic_mode(apic) && (icr_low & APIC_ICR_BUSY));
 
        irq.vector = icr_low & APIC_VECTOR_MASK;
        irq.delivery_mode = icr_low & APIC_MODE_MASK;
@@ -3113,8 +3118,6 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
 
 int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
 {
-       data &= ~APIC_ICR_BUSY;
-
        kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
        kvm_lapic_set_reg64(apic, APIC_ICR, data);
        trace_kvm_apic_write(APIC_ICR, data);
diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
index cfc8ab773025..4bf0bb250147 100644
--- a/arch/x86/kvm/svm/avic.c
+++ b/arch/x86/kvm/svm/avic.c
@@ -513,7 +513,7 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
                 * in which case KVM needs to emulate the ICR write as well in
                 * order to clear the BUSY flag.
                 */
-               if (icrl & APIC_ICR_BUSY)
+               if (!apic_x2apic_mode(apic) && (icrl & APIC_ICR_BUSY))
                        kvm_apic_write_nodecode(vcpu, APIC_ICR);
                else
                        kvm_apic_send_ipi(apic, icrl, icrh);


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-05 23:03   ` Sean Christopherson
@ 2023-09-06  5:07     ` Tao Su
  2023-09-06 22:17       ` Sean Christopherson
  0 siblings, 1 reply; 14+ messages in thread
From: Tao Su @ 2023-09-06  5:07 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, pbonzini, chao.gao, guang.zeng, yi1.lai

On Tue, Sep 05, 2023 at 04:03:36PM -0700, Sean Christopherson wrote:
> +Suravee
> 
> On Mon, Sep 04, 2023, Tao Su wrote:
> > When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
> > MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
> > thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
> > but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
> > 
> > Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
> > and SDM has no detail about how hardware will handle the UNUSED bit12
> > set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
> > the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
> > the clearing of bit12 should be still kept being consistent with the
> > hardware behavior.
> 
> I'm confused.  If hardware clears the bit, then why is it set in the vAPIC page
> after a trap-like APIC-write VM-Exit?  In other words, how is this not a ucode
> or hardware bug?

Sorry, I didn't describe it clearly.

On bare-metal, bit12 of ICR MSR will be cleared after setting this bit.

If bit12 is set in guest, the bit is not cleared in the vAPIC page after APIC-write
VM-Exit. So whether to clear bit12 in vAPIC page needs to be considered.

> 
> Suravee, can you confirm what happens on AMD with x2AVIC?  Does hardware *always*
> clear the busy bit if it's set by the guest?  If so, then we could "optimize"
> avic_incomplete_ipi_interception() to skip the busy check, e.g.
> 
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index cfc8ab773025..4bf0bb250147 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -513,7 +513,7 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
>                  * in which case KVM needs to emulate the ICR write as well in
>                  * order to clear the BUSY flag.
>                  */
> -               if (icrl & APIC_ICR_BUSY)
> +               if (!apic_x2apic_mode(apic) && (icrl & APIC_ICR_BUSY))
>                         kvm_apic_write_nodecode(vcpu, APIC_ICR);
>                 else
>                         kvm_apic_send_ipi(apic, icrl, icrh);
> 
> > Fixes: 5413bcba7ed5 ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
> > Signed-off-by: Tao Su <tao1.su@linux.intel.com>
> > Tested-by: Yi Lai <yi1.lai@intel.com>
> > ---
> >  arch/x86/kvm/lapic.c | 27 ++++++++++++++++++++-------
> >  1 file changed, 20 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index a983a16163b1..09a376aeb4a0 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1482,8 +1482,17 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
> >  {
> >  	struct kvm_lapic_irq irq;
> >  
> > -	/* KVM has no delay and should always clear the BUSY/PENDING flag. */
> > -	WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> > +	/*
> > +	 * In non-x2apic mode, KVM has no delay and should always clear the
> > +	 * BUSY/PENDING flag. In x2apic mode, KVM should clear the unused bit12
> > +	 * of ICR since hardware will also clear this bit. Although
> > +	 * APIC_ICR_BUSY and X2APIC_ICR_UNUSED_12 are same, they mean different
> > +	 * things in different modes.
> > +	 */
> > +	if (!apic_x2apic_mode(apic))
> > +		WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> > +	else
> > +		WARN_ON_ONCE(icr_low & X2APIC_ICR_UNUSED_12);
> 
> NAK to the new name, KVM is absolutely not going to zero an arbitrary "unused"
> bit.  If Intel wants to reclaim bit 12 for something useful in the future, then
> Intel can ship CPUs that don't touch the "reserved" bit, and deal with all the
> fun of finding and updating all software that unnecessarily sets the busy bit in
> x2apic mode.
> 
> If we really want to pretend that Intel has more than a snowball's chance in hell
> of doing something useful with bit 12, then the right thing to do in KVM is to
> ignore the bit entirely and let the guest keep the pieces, e.g.

Yes, agree. Currently bit12 is unused and cleared by hardware on bare-metal, clearing
bit12 in guest is for keeping the same behavior as bare-metal. But KVM should not to
zero the unused bit until SDM reclaims it for something, so ignoring the bit in KVM
should be better.

Thanks,
Tao

> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 113ca9661ab2..36ec195a3339 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1473,8 +1473,13 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
>  {
>         struct kvm_lapic_irq irq;
>  
> -       /* KVM has no delay and should always clear the BUSY/PENDING flag. */
> -       WARN_ON_ONCE(icr_low & APIC_ICR_BUSY);
> +       /*
> +        * KVM has no delay and should always clear the BUSY/PENDING flag.
> +        * The flag doesn't exist in x2APIC mode; both the SDM and APM state
> +        * that the flag "Must Be Zero", but neither Intel nor AMD enforces
> +        * that (or any other reserved bits in ICR).
> +        */
> +       WARN_ON_ONCE(!apic_x2apic_mode(apic) && (icr_low & APIC_ICR_BUSY));
>  
>         irq.vector = icr_low & APIC_VECTOR_MASK;
>         irq.delivery_mode = icr_low & APIC_MODE_MASK;
> @@ -3113,8 +3118,6 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
>  
>  int kvm_x2apic_icr_write(struct kvm_lapic *apic, u64 data)
>  {
> -       data &= ~APIC_ICR_BUSY;
> -
>         kvm_apic_send_ipi(apic, (u32)data, (u32)(data >> 32));
>         kvm_lapic_set_reg64(apic, APIC_ICR, data);
>         trace_kvm_apic_write(APIC_ICR, data);
> diff --git a/arch/x86/kvm/svm/avic.c b/arch/x86/kvm/svm/avic.c
> index cfc8ab773025..4bf0bb250147 100644
> --- a/arch/x86/kvm/svm/avic.c
> +++ b/arch/x86/kvm/svm/avic.c
> @@ -513,7 +513,7 @@ int avic_incomplete_ipi_interception(struct kvm_vcpu *vcpu)
>                  * in which case KVM needs to emulate the ICR write as well in
>                  * order to clear the BUSY flag.
>                  */
> -               if (icrl & APIC_ICR_BUSY)
> +               if (!apic_x2apic_mode(apic) && (icrl & APIC_ICR_BUSY))
>                         kvm_apic_write_nodecode(vcpu, APIC_ICR);
>                 else
>                         kvm_apic_send_ipi(apic, icrl, icrh);
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-06  5:07     ` Tao Su
@ 2023-09-06 22:17       ` Sean Christopherson
  2023-09-07  9:56         ` Tao Su
  0 siblings, 1 reply; 14+ messages in thread
From: Sean Christopherson @ 2023-09-06 22:17 UTC (permalink / raw)
  To: Tao Su; +Cc: kvm, pbonzini, chao.gao, guang.zeng, yi1.lai

On Wed, Sep 06, 2023, Tao Su wrote:
> On Tue, Sep 05, 2023 at 04:03:36PM -0700, Sean Christopherson wrote:
> > +Suravee
> > 
> > On Mon, Sep 04, 2023, Tao Su wrote:
> > > When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
> > > MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
> > > thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
> > > but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
> > > 
> > > Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
> > > and SDM has no detail about how hardware will handle the UNUSED bit12
> > > set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
> > > the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
> > > the clearing of bit12 should be still kept being consistent with the
> > > hardware behavior.
> > 
> > I'm confused.  If hardware clears the bit, then why is it set in the vAPIC page
> > after a trap-like APIC-write VM-Exit?  In other words, how is this not a ucode
> > or hardware bug?
> 
> Sorry, I didn't describe it clearly.
> 
> On bare-metal, bit12 of ICR MSR will be cleared after setting this bit.
> 
> If bit12 is set in guest, the bit is not cleared in the vAPIC page after APIC-write
> VM-Exit. So whether to clear bit12 in vAPIC page needs to be considered.

I got that, the behavior just seems odd to me.  And I'm grumpy that Intel punted
the problem to software.  But the SDM specifically calls out that this is the
correct behavior :-/

Specifically, in the context of IPI virtualization:

  If ECX contains 830H, a general-protection fault occurs if any of bits 31:20,
  17:16, or 13 of EAX is non-zero.

and

  If ECX contains 830H, the processor then checks the value of VICR to determine
  whether the following are all true:

  Bits 19:18 (destination shorthand) are 00B (no shorthand).
  Bit 15 (trigger mode) is 0 (edge).
  Bit 12 (unused) is 0.
  Bit 11 (destination mode) is 0 (physical).
  Bits 10:8 (delivery mode) are 000B (fixed).
  
  If all of the items above are true, the processor performs IPI virtualization
  using the 8-bit vector in byte 0 of VICR and the 32-bit APIC ID in VICR[63:32]
  (see Section 30.1.6). Otherwise, the logical processor causes an APIC-write VM
  exit (see Section 30.4.3.3).  If ECX contains 830H, the processor then checks
  the value of VICR to determine whether the following are all true:

I.e. the "unused" busy bit must be zero.  The part that makes me grumpy is that
hardware does check that other reserved bits are actually zero:

  If special processing applies, no general-protection exception is produced due
  to the fact that the local APIC is in xAPIC mode. However, WRMSR does perform
  the normal reserved-bit checking:
   - If ECX contains 808H or 83FH, a general-protection fault occurs if either EDX or EAX[31:8] is non-zero.
   - If ECX contains 80BH, a general-protection fault occurs if either EDX or EAX is non-zero.
   - If ECX contains 830H, a general-protection fault occurs if any of bits 31:20, 17:16, or 13 of EAX is non-zero.

Which implies that the hardware *does* enforce all the other reserved bits, but
punted bit 12 to the hypervisor :-(

That said, I think we have an "out".  Under the x2APIC section, regarding ICR,
the SDM also says:

  It remains readable only to aid in debugging; however, software should not
  assume the value returned by reading the ICR is the last written value.

I.e. KVM basically has free reign to do whatever it wants, so long as it doesn't
confuse userspace or break KVM's ABI.  As much as I want to say "do nothing",
clearing bit 12 so that it reads back as '0' is the safer approach.  Just please
don't add a new #define for, it's far easier to understand what's going on if we
just use APIC_ICR_BUSY, especially given that I highly doubt the bit will actually
be repurposed for something new.

FWIW, I also suspect that hardware isn't clearing the busy bit per se, I suspect
that hardware simply reads the bit as zero.

Side topic, unless I'm blind, KVM is missing the reserved bits #GP checks for ICR
bits bits 31:20, 17:16, and 13.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-06 22:17       ` Sean Christopherson
@ 2023-09-07  9:56         ` Tao Su
  0 siblings, 0 replies; 14+ messages in thread
From: Tao Su @ 2023-09-07  9:56 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: kvm, pbonzini, chao.gao, guang.zeng, yi1.lai

On Wed, Sep 06, 2023 at 03:17:44PM -0700, Sean Christopherson wrote:
> On Wed, Sep 06, 2023, Tao Su wrote:
> > On Tue, Sep 05, 2023 at 04:03:36PM -0700, Sean Christopherson wrote:
> > > +Suravee
> > > 
> > > On Mon, Sep 04, 2023, Tao Su wrote:
> > > > When IPI virtualization is enabled, a WARN is triggered if bit12 of ICR
> > > > MSR is set after APIC-write VM-exit. The reason is kvm_apic_send_ipi()
> > > > thinks the APIC_ICR_BUSY bit should be cleared because KVM has no delay,
> > > > but kvm_apic_write_nodecode() doesn't clear the APIC_ICR_BUSY bit.
> > > > 
> > > > Since bit12 of ICR is no longer BUSY bit but UNUSED bit in x2APIC mode,
> > > > and SDM has no detail about how hardware will handle the UNUSED bit12
> > > > set, we tested on Intel CPU (SRF/GNR) with IPI virtualization and found
> > > > the UNUSED bit12 was also cleared by hardware without #GP. Therefore,
> > > > the clearing of bit12 should be still kept being consistent with the
> > > > hardware behavior.
> > > 
> > > I'm confused.  If hardware clears the bit, then why is it set in the vAPIC page
> > > after a trap-like APIC-write VM-Exit?  In other words, how is this not a ucode
> > > or hardware bug?
> > 
> > Sorry, I didn't describe it clearly.
> > 
> > On bare-metal, bit12 of ICR MSR will be cleared after setting this bit.
> > 
> > If bit12 is set in guest, the bit is not cleared in the vAPIC page after APIC-write
> > VM-Exit. So whether to clear bit12 in vAPIC page needs to be considered.
> 
> I got that, the behavior just seems odd to me.  And I'm grumpy that Intel punted
> the problem to software.  But the SDM specifically calls out that this is the
> correct behavior :-/
> 
> Specifically, in the context of IPI virtualization:
> 
>   If ECX contains 830H, a general-protection fault occurs if any of bits 31:20,
>   17:16, or 13 of EAX is non-zero.
> 
> and
> 
>   If ECX contains 830H, the processor then checks the value of VICR to determine
>   whether the following are all true:
> 
>   Bits 19:18 (destination shorthand) are 00B (no shorthand).
>   Bit 15 (trigger mode) is 0 (edge).
>   Bit 12 (unused) is 0.
>   Bit 11 (destination mode) is 0 (physical).
>   Bits 10:8 (delivery mode) are 000B (fixed).
>   
>   If all of the items above are true, the processor performs IPI virtualization
>   using the 8-bit vector in byte 0 of VICR and the 32-bit APIC ID in VICR[63:32]
>   (see Section 30.1.6). Otherwise, the logical processor causes an APIC-write VM
>   exit (see Section 30.4.3.3).  If ECX contains 830H, the processor then checks
>   the value of VICR to determine whether the following are all true:
> 
> I.e. the "unused" busy bit must be zero.  The part that makes me grumpy is that
> hardware does check that other reserved bits are actually zero:
> 
>   If special processing applies, no general-protection exception is produced due
>   to the fact that the local APIC is in xAPIC mode. However, WRMSR does perform
>   the normal reserved-bit checking:
>    - If ECX contains 808H or 83FH, a general-protection fault occurs if either EDX or EAX[31:8] is non-zero.
>    - If ECX contains 80BH, a general-protection fault occurs if either EDX or EAX is non-zero.
>    - If ECX contains 830H, a general-protection fault occurs if any of bits 31:20, 17:16, or 13 of EAX is non-zero.
> 
> Which implies that the hardware *does* enforce all the other reserved bits, but
> punted bit 12 to the hypervisor :-(
> 
> That said, I think we have an "out".  Under the x2APIC section, regarding ICR,
> the SDM also says:
> 
>   It remains readable only to aid in debugging; however, software should not
>   assume the value returned by reading the ICR is the last written value.
> 
> I.e. KVM basically has free reign to do whatever it wants, so long as it doesn't
> confuse userspace or break KVM's ABI.  As much as I want to say "do nothing",
> clearing bit 12 so that it reads back as '0' is the safer approach.  Just please
> don't add a new #define for, it's far easier to understand what's going on if we
> just use APIC_ICR_BUSY, especially given that I highly doubt the bit will actually
> be repurposed for something new.

Thanks for such a detailed analysis. I will submit a patch to clear bit12 since it is
a safer approach, and also drop the unnecessary alias.

> 
> FWIW, I also suspect that hardware isn't clearing the busy bit per se, I suspect
> that hardware simply reads the bit as zero.
> 
> Side topic, unless I'm blind, KVM is missing the reserved bits #GP checks for ICR
> bits bits 31:20, 17:16, and 13.

Yes, it is needed to check the reserved bits of ICR and inject #GP when IPI virtualization
disabled (otherwise it is injected by hardware). The related kselftest is also necessary
to verify the #GP is correctly emulated.

Thanks,
Tao


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit
  2023-09-04  1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
                     ` (3 preceding siblings ...)
  2023-09-05 23:03   ` Sean Christopherson
@ 2023-09-24 13:58   ` kernel test robot
  4 siblings, 0 replies; 14+ messages in thread
From: kernel test robot @ 2023-09-24 13:58 UTC (permalink / raw)
  To: Tao Su
  Cc: oe-lkp, lkp, Yi Lai, kvm, xudong.hao, seanjc, pbonzini, chao.gao,
	guang.zeng, tao1.su, oliver.sang



Hello,

kernel test robot noticed "kernel-selftests.kvm.xapic_state_test.fail" on:

commit: 6ccade077c151e719397b0a86f48554db9aa1c24 ("[PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit")
url: https://github.com/intel-lab-lkp/linux/commits/Tao-Su/x86-apic-Introduce-X2APIC_ICR_UNUSED_12-for-x2APIC-mode/20230904-093801
patch link: https://lore.kernel.org/all/20230904013555.725413-3-tao1.su@linux.intel.com/
patch subject: [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

	group: kvm



compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) with 256G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


besides, we also noticed kvm.smm_test.fail which keeps pass on parent.

83c02e23e21fec27 6ccade077c151e719397b0a86f4
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :6          100%           6:6     kernel-selftests.kvm.smm_test.fail
           :6          100%           6:6     kernel-selftests.kvm.xapic_state_test.fail



If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202309242109.43637c74-oliver.sang@intel.com



# timeout set to 120
# selftests: kvm: smm_test
# ==== Test Assertion Failure ====
#   x86_64/smm_test.c:179: stage_reported == stage || stage_reported == SMRAM_STAGE
#   pid=6354 tid=6354 errno=4 - Interrupted system call
#      1	0x00000000004026fc: main at smm_test.c:179
#      2	0x00007f08310cb189: ?? ??:0
#      3	0x00007f08310cb244: ?? ??:0
#      4	0x0000000000402810: _start at ??:?
#   Unexpected stage: #3, got 4
not ok 25 selftests: kvm: smm_test # exit=254

...

# timeout set to 120
# selftests: kvm: xapic_state_test
# ==== Test Assertion Failure ====
#   x86_64/xapic_state_test.c:78: __a == __b
#   pid=7081 tid=7081 errno=4 - Interrupted system call
#      1	0x0000000000402a1f: ____test_icr at xapic_state_test.c:78
#      2	0x0000000000402c51: __test_icr at xapic_state_test.c:95 (discriminator 3)
#      3	 (inlined by) test_icr at xapic_state_test.c:106 (discriminator 3)
#      4	0x0000000000402433: main at xapic_state_test.c:197
#      5	0x00007f8a0c660189: ?? ??:0
#      6	0x00007f8a0c660244: ?? ??:0
#      7	0x0000000000402640: _start at ??:?
#   ASSERT_EQ(icr & ~APIC_ICR_BUSY, val & ~APIC_ICR_BUSY) failed.
# 	icr & ~APIC_ICR_BUSY is 0
# 	val & ~APIC_ICR_BUSY is 0x44000
not ok 47 selftests: kvm: xapic_state_test # exit=254



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230924/202309242109.43637c74-oliver.sang@intel.com



-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2023-09-24 13:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-04  1:35 [PATCH 0/2] KVM: x86: Fix a WARN in kvm_apic_send_ipi() Tao Su
2023-09-04  1:35 ` [PATCH 1/2] x86/apic: Introduce X2APIC_ICR_UNUSED_12 for x2APIC mode Tao Su
2023-09-04  2:58   ` Chao Gao
2023-09-04  3:03     ` Tao Su
2023-09-04  1:35 ` [PATCH 2/2] KVM: x86: Clear X2APIC_ICR_UNUSED_12 after APIC-write VM-exit Tao Su
2023-09-04  2:46   ` Chao Gao
2023-09-04  3:00     ` Tao Su
2023-09-04  4:16   ` kernel test robot
2023-09-04  5:02   ` Tao Su
2023-09-05 23:03   ` Sean Christopherson
2023-09-06  5:07     ` Tao Su
2023-09-06 22:17       ` Sean Christopherson
2023-09-07  9:56         ` Tao Su
2023-09-24 13:58   ` kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.