All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] KVM: x86: Various IOAPIC improvements
@ 2017-11-05 13:52 Nikita Leshenko
  2017-11-05 13:52 ` [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race Nikita Leshenko
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Nikita Leshenko @ 2017-11-05 13:52 UTC (permalink / raw)
  To: kvm, pbonzini, rkrcmar; +Cc: idan.brown

Hello,

This patch series aims to improve various aspects of the in-kernel
IOAPIC.

The first patch fixes EOIs that were not delivered if the redirection
table was modified from an interrupt handler. We observed that VMWare
ESX tends to do that.

The other patches implement small semantic IOAPIC improvements that
are in-line with the respective code in QEMU.

Thank you,
Nikita

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  2017-11-05 13:52 [PATCH 0/5] KVM: x86: Various IOAPIC improvements Nikita Leshenko
@ 2017-11-05 13:52 ` Nikita Leshenko
  2017-11-06  2:00   ` Wanpeng Li
  2017-11-05 13:52 ` [PATCH 2/5] KVM: x86: ioapic: Don't fire level irq when Remote IRR set Nikita Leshenko
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Nikita Leshenko @ 2017-11-05 13:52 UTC (permalink / raw)
  To: kvm, pbonzini, rkrcmar; +Cc: idan.brown, Nikita Leshenko, Konrad Rzeszutek Wilk

KVM uses ioapic_handled_vectors to track vectors that need to notify the
IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
interrupt with old configuration is pending or running and
ioapic_handled_vectors only remembers the newest configuration;
thus EOI from the old interrupt is not delievered to the IOAPIC.

A previous commit db2bdcbbbd32
("KVM: x86: fix edge EOI and IOAPIC reconfig race")
addressed this issue by adding pending edge-triggered interrupts to
ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
The commit explicitly ignored level-triggered interrupts,
but this race applies to them as well:

1) IOAPIC sends a level triggered interrupt vector to VCPU0
2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
   to route the vector to VCPU1. The reconfiguration rewrites only the
   upper 32 bits of the IOREDTBLn register. (Causes KVM to update
   ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
3) VCPU0 sends EOI for the vector, but it's not delievered to the
   IOAPIC because the ioapic_handled_vectors doesn't include the vector.
4) New interrupts are not delievered to VCPU1 because remote_irr bit
   is set forever.

Therefore, the correct behavior is to add all pending and running
interrupts to ioapic_handled_vectors.

This commit introduces a slight performance hit similar to
commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
for the rare case that the vector is reused by a non-IOAPIC source on
VCPU0. We prefer to keep solution simple and not handle this case just
as the original commit does.

Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")

Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/ioapic.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index bdff437acbcb..ae0a7dc318b2 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -257,8 +257,7 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
 		    index == RTC_GSI) {
 			if (kvm_apic_match_dest(vcpu, NULL, 0,
 			             e->fields.dest_id, e->fields.dest_mode) ||
-			    (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
-			     kvm_apic_pending_eoi(vcpu, e->fields.vector)))
+			    kvm_apic_pending_eoi(vcpu, e->fields.vector))
 				__set_bit(e->fields.vector,
 					  ioapic_handled_vectors);
 		}
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/5] KVM: x86: ioapic: Don't fire level irq when Remote IRR set
  2017-11-05 13:52 [PATCH 0/5] KVM: x86: Various IOAPIC improvements Nikita Leshenko
  2017-11-05 13:52 ` [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race Nikita Leshenko
@ 2017-11-05 13:52 ` Nikita Leshenko
  2017-11-06  2:50   ` Wanpeng Li
  2017-11-05 13:52 ` [PATCH 3/5] KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq Nikita Leshenko
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Nikita Leshenko @ 2017-11-05 13:52 UTC (permalink / raw)
  To: kvm, pbonzini, rkrcmar; +Cc: idan.brown, Nikita Leshenko, Konrad Rzeszutek Wilk

Avoid firing a level-triggered interrupt that has the Remote IRR bit set,
because that means that some CPU is already processing it. The Remote
IRR bit will be cleared after an EOI and the interrupt will refire
if the irq line is still asserted.

This behavior is aligned with QEMU's IOAPIC implementation that was
introduced by commit f99b86b94987
("x86: ioapic: ignore level irq during processing") in QEMU.

Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/ioapic.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index ae0a7dc318b2..5c9231139243 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -323,7 +323,9 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
 	struct kvm_lapic_irq irqe;
 	int ret;
 
-	if (entry->fields.mask)
+	if (entry->fields.mask ||
+	    (entry->fields.trig_mode == IOAPIC_LEVEL_TRIG &&
+	    entry->fields.remote_irr))
 		return -1;
 
 	ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/5] KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
  2017-11-05 13:52 [PATCH 0/5] KVM: x86: Various IOAPIC improvements Nikita Leshenko
  2017-11-05 13:52 ` [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race Nikita Leshenko
  2017-11-05 13:52 ` [PATCH 2/5] KVM: x86: ioapic: Don't fire level irq when Remote IRR set Nikita Leshenko
@ 2017-11-05 13:52 ` Nikita Leshenko
  2017-11-06  3:16   ` Wanpeng Li
  2017-11-05 13:52 ` [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered Nikita Leshenko
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Nikita Leshenko @ 2017-11-05 13:52 UTC (permalink / raw)
  To: kvm, pbonzini, rkrcmar; +Cc: idan.brown, Nikita Leshenko, Konrad Rzeszutek Wilk

Remote IRR for level-triggered interrupts was previously checked in
ioapic_set_irq, but since we now have a check in ioapic_service we
can remove the redundant check from ioapic_set_irq.

This commit doesn't change semantics.

Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/ioapic.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 5c9231139243..6df150eaaa78 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -209,12 +209,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
 
 	old_irr = ioapic->irr;
 	ioapic->irr |= mask;
-	if (edge)
+	if (edge) {
 		ioapic->irr_delivered &= ~mask;
-	if ((edge && old_irr == ioapic->irr) ||
-	    (!edge && entry.fields.remote_irr)) {
-		ret = 0;
-		goto out;
+		if (old_irr == ioapic->irr) {
+			ret = 0;
+			goto out;
+		}
 	}
 
 	ret = ioapic_service(ioapic, irq, line_status);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  2017-11-05 13:52 [PATCH 0/5] KVM: x86: Various IOAPIC improvements Nikita Leshenko
                   ` (2 preceding siblings ...)
  2017-11-05 13:52 ` [PATCH 3/5] KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq Nikita Leshenko
@ 2017-11-05 13:52 ` Nikita Leshenko
  2017-11-06  3:30   ` Wanpeng Li
  2017-11-05 13:52 ` [PATCH 5/5] KVM: x86: ioapic: Preserve read-only values in the redirection table Nikita Leshenko
  2017-11-10 21:42 ` [PATCH 0/5] KVM: x86: Various IOAPIC improvements Radim Krčmář
  5 siblings, 1 reply; 22+ messages in thread
From: Nikita Leshenko @ 2017-11-05 13:52 UTC (permalink / raw)
  To: kvm, pbonzini, rkrcmar; +Cc: idan.brown, Nikita Leshenko, Konrad Rzeszutek Wilk

Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
IOAPICs without an EOI register. They simulate the EOI message manually
by changing the trigger mode to edge and then back to level, with the
entry being masked during this.

QEMU implements this feature in commit ed1263c363c9
("ioapic: clear remote irr bit for edge-triggered interrupts")

As a side effect, this commit removes an incorrect behavior where Remote
IRR was cleared when the redirection table entry was rewritten. This is not
consistent with the manual and also opens an opportunity for a strange
behavior when a redirection table entry is modified from an interrupt
handler that handles the same entry: The modification will clear the
Remote IRR bit even though the interrupt handler is still running.

Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/ioapic.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 6df150eaaa78..163d340ee5f8 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 		} else {
 			e->bits &= ~0xffffffffULL;
 			e->bits |= (u32) val;
-			e->fields.remote_irr = 0;
 		}
+
+		/*
+		 * Some OSes (Linux, Xen) assume that Remote IRR bit will
+		 * be cleared by IOAPIC hardware when the entry is configured
+		 * as edge-triggered. This behavior is used to simulate an
+		 * explicit EOI on IOAPICs that don't have the EOI register.
+		 */
+		if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
+			e->fields.remote_irr = 0;
+
 		mask_after = e->fields.mask;
 		if (mask_before != mask_after)
 			kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/5] KVM: x86: ioapic: Preserve read-only values in the redirection table
  2017-11-05 13:52 [PATCH 0/5] KVM: x86: Various IOAPIC improvements Nikita Leshenko
                   ` (3 preceding siblings ...)
  2017-11-05 13:52 ` [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered Nikita Leshenko
@ 2017-11-05 13:52 ` Nikita Leshenko
  2017-11-06  3:20   ` Wanpeng Li
  2017-11-10 21:42 ` [PATCH 0/5] KVM: x86: Various IOAPIC improvements Radim Krčmář
  5 siblings, 1 reply; 22+ messages in thread
From: Nikita Leshenko @ 2017-11-05 13:52 UTC (permalink / raw)
  To: kvm, pbonzini, rkrcmar; +Cc: idan.brown, Nikita Leshenko, Konrad Rzeszutek Wilk

According to 82093AA (IOAPIC) manual, Remote IRR and Delivery Status are
read-only. QEMU implements the bits as RO in commit 479c2a1cb7fb
("ioapic: keep RO bits for IOAPIC entry").

Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
---
 arch/x86/kvm/ioapic.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 163d340ee5f8..4e822ad363f3 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -276,6 +276,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 {
 	unsigned index;
 	bool mask_before, mask_after;
+	int old_remote_irr, old_delivery_status;
 	union kvm_ioapic_redirect_entry *e;
 
 	switch (ioapic->ioregsel) {
@@ -298,6 +299,9 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 			return;
 		e = &ioapic->redirtbl[index];
 		mask_before = e->fields.mask;
+		/* Preserve read-only fields */
+		old_remote_irr = e->fields.remote_irr;
+		old_delivery_status = e->fields.delivery_status;
 		if (ioapic->ioregsel & 1) {
 			e->bits &= 0xffffffff;
 			e->bits |= (u64) val << 32;
@@ -305,6 +309,8 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 			e->bits &= ~0xffffffffULL;
 			e->bits |= (u32) val;
 		}
+		e->fields.remote_irr = old_remote_irr;
+		e->fields.delivery_status = old_delivery_status;
 
 		/*
 		 * Some OSes (Linux, Xen) assume that Remote IRR bit will
-- 
2.13.3

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  2017-11-05 13:52 ` [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race Nikita Leshenko
@ 2017-11-06  2:00   ` Wanpeng Li
  2017-11-06 13:53     ` Nikita Leshchenko
  0 siblings, 1 reply; 22+ messages in thread
From: Wanpeng Li @ 2017-11-06  2:00 UTC (permalink / raw)
  To: Nikita Leshenko
  Cc: kvm, Paolo Bonzini, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk

2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
> KVM uses ioapic_handled_vectors to track vectors that need to notify the
> IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
> interrupt with old configuration is pending or running and
> ioapic_handled_vectors only remembers the newest configuration;
> thus EOI from the old interrupt is not delievered to the IOAPIC.
>
> A previous commit db2bdcbbbd32
> ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
> addressed this issue by adding pending edge-triggered interrupts to
> ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
> The commit explicitly ignored level-triggered interrupts,
> but this race applies to them as well:
>
> 1) IOAPIC sends a level triggered interrupt vector to VCPU0
> 2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
>    to route the vector to VCPU1. The reconfiguration rewrites only the
>    upper 32 bits of the IOREDTBLn register. (Causes KVM to update
>    ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)

Refer to __ioapic_write_entry() in linux guest kernel codes, both
upper 32 bits and lower 32 bits are written to IOREDTBLx, so when the
scenario which you mentioned will occur?

Regards,
Wanpeng Li

> 3) VCPU0 sends EOI for the vector, but it's not delievered to the
>    IOAPIC because the ioapic_handled_vectors doesn't include the vector.
> 4) New interrupts are not delievered to VCPU1 because remote_irr bit
>    is set forever.
>
> Therefore, the correct behavior is to add all pending and running
> interrupts to ioapic_handled_vectors.
>
> This commit introduces a slight performance hit similar to
> commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
> for the rare case that the vector is reused by a non-IOAPIC source on
> VCPU0. We prefer to keep solution simple and not handle this case just
> as the original commit does.
>
> Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> ---
>  arch/x86/kvm/ioapic.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index bdff437acbcb..ae0a7dc318b2 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -257,8 +257,7 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
>                     index == RTC_GSI) {
>                         if (kvm_apic_match_dest(vcpu, NULL, 0,
>                                      e->fields.dest_id, e->fields.dest_mode) ||
> -                           (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
> -                            kvm_apic_pending_eoi(vcpu, e->fields.vector)))
> +                           kvm_apic_pending_eoi(vcpu, e->fields.vector))
>                                 __set_bit(e->fields.vector,
>                                           ioapic_handled_vectors);
>                 }
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] KVM: x86: ioapic: Don't fire level irq when Remote IRR set
  2017-11-05 13:52 ` [PATCH 2/5] KVM: x86: ioapic: Don't fire level irq when Remote IRR set Nikita Leshenko
@ 2017-11-06  2:50   ` Wanpeng Li
  0 siblings, 0 replies; 22+ messages in thread
From: Wanpeng Li @ 2017-11-06  2:50 UTC (permalink / raw)
  To: Nikita Leshenko
  Cc: kvm, Paolo Bonzini, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk

2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
> Avoid firing a level-triggered interrupt that has the Remote IRR bit set,
> because that means that some CPU is already processing it. The Remote
> IRR bit will be cleared after an EOI and the interrupt will refire
> if the irq line is still asserted.
>
> This behavior is aligned with QEMU's IOAPIC implementation that was
> introduced by commit f99b86b94987
> ("x86: ioapic: ignore level irq during processing") in QEMU.
>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  arch/x86/kvm/ioapic.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index ae0a7dc318b2..5c9231139243 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -323,7 +323,9 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
>         struct kvm_lapic_irq irqe;
>         int ret;
>
> -       if (entry->fields.mask)
> +       if (entry->fields.mask ||
> +           (entry->fields.trig_mode == IOAPIC_LEVEL_TRIG &&
> +           entry->fields.remote_irr))
>                 return -1;
>
>         ioapic_debug("dest=%x dest_mode=%x delivery_mode=%x "
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/5] KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq
  2017-11-05 13:52 ` [PATCH 3/5] KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq Nikita Leshenko
@ 2017-11-06  3:16   ` Wanpeng Li
  0 siblings, 0 replies; 22+ messages in thread
From: Wanpeng Li @ 2017-11-06  3:16 UTC (permalink / raw)
  To: Nikita Leshenko
  Cc: kvm, Paolo Bonzini, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk

2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
> Remote IRR for level-triggered interrupts was previously checked in
> ioapic_set_irq, but since we now have a check in ioapic_service we
> can remove the redundant check from ioapic_set_irq.
>
> This commit doesn't change semantics.
>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  arch/x86/kvm/ioapic.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 5c9231139243..6df150eaaa78 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -209,12 +209,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
>
>         old_irr = ioapic->irr;
>         ioapic->irr |= mask;
> -       if (edge)
> +       if (edge) {
>                 ioapic->irr_delivered &= ~mask;
> -       if ((edge && old_irr == ioapic->irr) ||
> -           (!edge && entry.fields.remote_irr)) {
> -               ret = 0;
> -               goto out;
> +               if (old_irr == ioapic->irr) {
> +                       ret = 0;
> +                       goto out;
> +               }
>         }
>
>         ret = ioapic_service(ioapic, irq, line_status);
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] KVM: x86: ioapic: Preserve read-only values in the redirection table
  2017-11-05 13:52 ` [PATCH 5/5] KVM: x86: ioapic: Preserve read-only values in the redirection table Nikita Leshenko
@ 2017-11-06  3:20   ` Wanpeng Li
  2017-11-08  0:18     ` Steve Rutherford
  0 siblings, 1 reply; 22+ messages in thread
From: Wanpeng Li @ 2017-11-06  3:20 UTC (permalink / raw)
  To: Nikita Leshenko
  Cc: kvm, Paolo Bonzini, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk

2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
> According to 82093AA (IOAPIC) manual, Remote IRR and Delivery Status are
> read-only. QEMU implements the bits as RO in commit 479c2a1cb7fb
> ("ioapic: keep RO bits for IOAPIC entry").
>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  arch/x86/kvm/ioapic.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 163d340ee5f8..4e822ad363f3 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -276,6 +276,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>  {
>         unsigned index;
>         bool mask_before, mask_after;
> +       int old_remote_irr, old_delivery_status;
>         union kvm_ioapic_redirect_entry *e;
>
>         switch (ioapic->ioregsel) {
> @@ -298,6 +299,9 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>                         return;
>                 e = &ioapic->redirtbl[index];
>                 mask_before = e->fields.mask;
> +               /* Preserve read-only fields */
> +               old_remote_irr = e->fields.remote_irr;
> +               old_delivery_status = e->fields.delivery_status;
>                 if (ioapic->ioregsel & 1) {
>                         e->bits &= 0xffffffff;
>                         e->bits |= (u64) val << 32;
> @@ -305,6 +309,8 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>                         e->bits &= ~0xffffffffULL;
>                         e->bits |= (u32) val;
>                 }
> +               e->fields.remote_irr = old_remote_irr;
> +               e->fields.delivery_status = old_delivery_status;
>
>                 /*
>                  * Some OSes (Linux, Xen) assume that Remote IRR bit will
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  2017-11-05 13:52 ` [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered Nikita Leshenko
@ 2017-11-06  3:30   ` Wanpeng Li
  2017-11-08  0:16     ` Steve Rutherford
  0 siblings, 1 reply; 22+ messages in thread
From: Wanpeng Li @ 2017-11-06  3:30 UTC (permalink / raw)
  To: Nikita Leshenko
  Cc: kvm, Paolo Bonzini, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk

2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
> IOAPICs without an EOI register. They simulate the EOI message manually
> by changing the trigger mode to edge and then back to level, with the
> entry being masked during this.
>
> QEMU implements this feature in commit ed1263c363c9
> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>
> As a side effect, this commit removes an incorrect behavior where Remote
> IRR was cleared when the redirection table entry was rewritten. This is not
> consistent with the manual and also opens an opportunity for a strange
> behavior when a redirection table entry is modified from an interrupt
> handler that handles the same entry: The modification will clear the
> Remote IRR bit even though the interrupt handler is still running.
>
> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
> Reviewed-by: Liran Alon <liran.alon@oracle.com>
> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>

> ---
>  arch/x86/kvm/ioapic.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 6df150eaaa78..163d340ee5f8 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>                 } else {
>                         e->bits &= ~0xffffffffULL;
>                         e->bits |= (u32) val;
> -                       e->fields.remote_irr = 0;
>                 }
> +
> +               /*
> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
> +                * be cleared by IOAPIC hardware when the entry is configured
> +                * as edge-triggered. This behavior is used to simulate an
> +                * explicit EOI on IOAPICs that don't have the EOI register.
> +                */
> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
> +                       e->fields.remote_irr = 0;
> +
>                 mask_after = e->fields.mask;
>                 if (mask_before != mask_after)
>                         kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
> --
> 2.13.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  2017-11-06  2:00   ` Wanpeng Li
@ 2017-11-06 13:53     ` Nikita Leshchenko
  2017-11-06 14:05       ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Nikita Leshchenko @ 2017-11-06 13:53 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: kvm, Paolo Bonzini, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk


> On 6 Nov 2017, at 4:00, Wanpeng Li <kernellwp@gmail.com> wrote:
> 
> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>> KVM uses ioapic_handled_vectors to track vectors that need to notify the
>> IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
>> interrupt with old configuration is pending or running and
>> ioapic_handled_vectors only remembers the newest configuration;
>> thus EOI from the old interrupt is not delievered to the IOAPIC.
>> 
>> A previous commit db2bdcbbbd32
>> ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>> addressed this issue by adding pending edge-triggered interrupts to
>> ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
>> The commit explicitly ignored level-triggered interrupts,
>> but this race applies to them as well:
>> 
>> 1) IOAPIC sends a level triggered interrupt vector to VCPU0
>> 2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
>>   to route the vector to VCPU1. The reconfiguration rewrites only the
>>   upper 32 bits of the IOREDTBLn register. (Causes KVM to update
>>   ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
> 
> Refer to __ioapic_write_entry() in linux guest kernel codes, both
> upper 32 bits and lower 32 bits are written to IOREDTBLx, so when the
> scenario which you mentioned will occur?

You’re correct regarding Linux, but other operating systems may update
the redirection entry partially.

For example, ESXi 5.5 has a kernel function IOAPICSteerVector that
modifies the Destination Field in the upper bits without affecting the
lower bits of IOREDTBLx. We have observed that without this patch ESXi
would loose network connectivity because it would get stuck with the
Remote IRR bit set forever.

Nikita

> 
> Regards,
> Wanpeng Li
> 
>> 3) VCPU0 sends EOI for the vector, but it's not delievered to the
>>   IOAPIC because the ioapic_handled_vectors doesn't include the vector.
>> 4) New interrupts are not delievered to VCPU1 because remote_irr bit
>>   is set forever.
>> 
>> Therefore, the correct behavior is to add all pending and running
>> interrupts to ioapic_handled_vectors.
>> 
>> This commit introduces a slight performance hit similar to
>> commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>> for the rare case that the vector is reused by a non-IOAPIC source on
>> VCPU0. We prefer to keep solution simple and not handle this case just
>> as the original commit does.
>> 
>> Fixes: db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>> 
>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> ---
>> arch/x86/kvm/ioapic.c | 3 +--
>> 1 file changed, 1 insertion(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index bdff437acbcb..ae0a7dc318b2 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -257,8 +257,7 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
>>                    index == RTC_GSI) {
>>                        if (kvm_apic_match_dest(vcpu, NULL, 0,
>>                                     e->fields.dest_id, e->fields.dest_mode) ||
>> -                           (e->fields.trig_mode == IOAPIC_EDGE_TRIG &&
>> -                            kvm_apic_pending_eoi(vcpu, e->fields.vector)))
>> +                           kvm_apic_pending_eoi(vcpu, e->fields.vector))
>>                                __set_bit(e->fields.vector,
>>                                          ioapic_handled_vectors);
>>                }
>> --
>> 2.13.3

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  2017-11-06 13:53     ` Nikita Leshchenko
@ 2017-11-06 14:05       ` Paolo Bonzini
  2017-11-07  8:13         ` Nikita Leshchenko
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2017-11-06 14:05 UTC (permalink / raw)
  To: Nikita Leshchenko, Wanpeng Li
  Cc: kvm, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk

On 06/11/2017 14:53, Nikita Leshchenko wrote:
> 
>> On 6 Nov 2017, at 4:00, Wanpeng Li <kernellwp@gmail.com> wrote:
>>
>> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>>> KVM uses ioapic_handled_vectors to track vectors that need to notify the
>>> IOAPIC on EOI. The problem is that IOAPIC can be reconfigured while an
>>> interrupt with old configuration is pending or running and
>>> ioapic_handled_vectors only remembers the newest configuration;
>>> thus EOI from the old interrupt is not delievered to the IOAPIC.
>>>
>>> A previous commit db2bdcbbbd32
>>> ("KVM: x86: fix edge EOI and IOAPIC reconfig race")
>>> addressed this issue by adding pending edge-triggered interrupts to
>>> ioapic_handled_vectors, fixing this race for edge-triggered interrupts.
>>> The commit explicitly ignored level-triggered interrupts,
>>> but this race applies to them as well:
>>>
>>> 1) IOAPIC sends a level triggered interrupt vector to VCPU0
>>> 2) VCPU0's handler deasserts the irq line and reconfigures the IOAPIC
>>>   to route the vector to VCPU1. The reconfiguration rewrites only the
>>>   upper 32 bits of the IOREDTBLn register. (Causes KVM to update
>>>   ioapic_handled_vectors for VCPU0 and it no longer includes the vector.)
>>
>> Refer to __ioapic_write_entry() in linux guest kernel codes, both
>> upper 32 bits and lower 32 bits are written to IOREDTBLx, so when the
>> scenario which you mentioned will occur?
> 
> You’re correct regarding Linux, but other operating systems may update
> the redirection entry partially.
> 
> For example, ESXi 5.5 has a kernel function IOAPICSteerVector that
> modifies the Destination Field in the upper bits without affecting the
> lower bits of IOREDTBLx. We have observed that without this patch ESXi
> would loose network connectivity because it would get stuck with the
> Remote IRR bit set forever.

Thanks, this is helpful.

Can you write testcases for kvm-unit-tests for some or all of these
scenarios?

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  2017-11-06 14:05       ` Paolo Bonzini
@ 2017-11-07  8:13         ` Nikita Leshchenko
  2017-11-07 10:18           ` Paolo Bonzini
  0 siblings, 1 reply; 22+ messages in thread
From: Nikita Leshchenko @ 2017-11-07  8:13 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Wanpeng Li, kvm, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk


> On 6 Nov 2017, at 16:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> Can you write testcases for kvm-unit-tests for some or all of these
> scenarios?
Submitted patch.

Nikita

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  2017-11-07  8:13         ` Nikita Leshchenko
@ 2017-11-07 10:18           ` Paolo Bonzini
  2017-11-07 23:09             ` Steve Rutherford
  0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2017-11-07 10:18 UTC (permalink / raw)
  To: Nikita Leshchenko
  Cc: Wanpeng Li, kvm, Radim Krcmar, idan.brown, Konrad Rzeszutek Wilk

On 07/11/2017 09:13, Nikita Leshchenko wrote:
> 
>> On 6 Nov 2017, at 16:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> Can you write testcases for kvm-unit-tests for some or all of these
>> scenarios?
> Submitted patch.

For 2-4-5 would be nice too (especially 2, the others are easier).

Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
  2017-11-07 10:18           ` Paolo Bonzini
@ 2017-11-07 23:09             ` Steve Rutherford
  0 siblings, 0 replies; 22+ messages in thread
From: Steve Rutherford @ 2017-11-07 23:09 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Nikita Leshchenko, Wanpeng Li, kvm, Radim Krcmar, idan.brown,
	Konrad Rzeszutek Wilk

Thanks for doing this! This looks fine from the split irqchip
perspective as well.

On Tue, Nov 7, 2017 at 2:18 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 07/11/2017 09:13, Nikita Leshchenko wrote:
>>
>>> On 6 Nov 2017, at 16:05, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>>
>>> Can you write testcases for kvm-unit-tests for some or all of these
>>> scenarios?
>> Submitted patch.
>
> For 2-4-5 would be nice too (especially 2, the others are easier).
>
> Paolo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  2017-11-06  3:30   ` Wanpeng Li
@ 2017-11-08  0:16     ` Steve Rutherford
  2017-11-08  9:52       ` Nikita Leshchenko
  0 siblings, 1 reply; 22+ messages in thread
From: Steve Rutherford @ 2017-11-08  0:16 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Nikita Leshenko, kvm, Paolo Bonzini, Radim Krcmar, idan.brown,
	Konrad Rzeszutek Wilk

This is cleaner. Thanks for doing this. (Note that this is racy
without the read only remote irr change, so I have a slight preference
that swap the order of these patches.)
Reviewed-by: Steve Rutherford <srutherford@google.com>

On Sun, Nov 5, 2017 at 7:30 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
>> IOAPICs without an EOI register. They simulate the EOI message manually
>> by changing the trigger mode to edge and then back to level, with the
>> entry being masked during this.
>>
>> QEMU implements this feature in commit ed1263c363c9
>> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>>
>> As a side effect, this commit removes an incorrect behavior where Remote
>> IRR was cleared when the redirection table entry was rewritten. This is not
>> consistent with the manual and also opens an opportunity for a strange
>> behavior when a redirection table entry is modified from an interrupt
>> handler that handles the same entry: The modification will clear the
>> Remote IRR bit even though the interrupt handler is still running.
>>
>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>
>> ---
>>  arch/x86/kvm/ioapic.c | 11 ++++++++++-
>>  1 file changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 6df150eaaa78..163d340ee5f8 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>                 } else {
>>                         e->bits &= ~0xffffffffULL;
>>                         e->bits |= (u32) val;
>> -                       e->fields.remote_irr = 0;
>>                 }
>> +
>> +               /*
>> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
>> +                * be cleared by IOAPIC hardware when the entry is configured
>> +                * as edge-triggered. This behavior is used to simulate an
>> +                * explicit EOI on IOAPICs that don't have the EOI register.
>> +                */
>> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
>> +                       e->fields.remote_irr = 0;
>> +
>>                 mask_after = e->fields.mask;
>>                 if (mask_before != mask_after)
>>                         kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>> --
>> 2.13.3
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] KVM: x86: ioapic: Preserve read-only values in the redirection table
  2017-11-06  3:20   ` Wanpeng Li
@ 2017-11-08  0:18     ` Steve Rutherford
  0 siblings, 0 replies; 22+ messages in thread
From: Steve Rutherford @ 2017-11-08  0:18 UTC (permalink / raw)
  To: Wanpeng Li
  Cc: Nikita Leshenko, kvm, Paolo Bonzini, Radim Krcmar, idan.brown,
	Konrad Rzeszutek Wilk

Reviewed-by: Steve Rutherford <srutherford@google.com>

On Sun, Nov 5, 2017 at 7:20 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>> According to 82093AA (IOAPIC) manual, Remote IRR and Delivery Status are
>> read-only. QEMU implements the bits as RO in commit 479c2a1cb7fb
>> ("ioapic: keep RO bits for IOAPIC entry").
>>
>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>
> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>
>> ---
>>  arch/x86/kvm/ioapic.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 163d340ee5f8..4e822ad363f3 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -276,6 +276,7 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>  {
>>         unsigned index;
>>         bool mask_before, mask_after;
>> +       int old_remote_irr, old_delivery_status;
>>         union kvm_ioapic_redirect_entry *e;
>>
>>         switch (ioapic->ioregsel) {
>> @@ -298,6 +299,9 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>                         return;
>>                 e = &ioapic->redirtbl[index];
>>                 mask_before = e->fields.mask;
>> +               /* Preserve read-only fields */
>> +               old_remote_irr = e->fields.remote_irr;
>> +               old_delivery_status = e->fields.delivery_status;
>>                 if (ioapic->ioregsel & 1) {
>>                         e->bits &= 0xffffffff;
>>                         e->bits |= (u64) val << 32;
>> @@ -305,6 +309,8 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>                         e->bits &= ~0xffffffffULL;
>>                         e->bits |= (u32) val;
>>                 }
>> +               e->fields.remote_irr = old_remote_irr;
>> +               e->fields.delivery_status = old_delivery_status;
>>
>>                 /*
>>                  * Some OSes (Linux, Xen) assume that Remote IRR bit will
>> --
>> 2.13.3
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  2017-11-08  0:16     ` Steve Rutherford
@ 2017-11-08  9:52       ` Nikita Leshchenko
  2017-11-08 21:24         ` Steve Rutherford
  0 siblings, 1 reply; 22+ messages in thread
From: Nikita Leshchenko @ 2017-11-08  9:52 UTC (permalink / raw)
  To: Steve Rutherford
  Cc: Wanpeng Li, kvm, Paolo Bonzini, Radim Krcmar, idan.brown,
	Konrad Rzeszutek Wilk


> On 8 Nov 2017, at 2:16, Steve Rutherford <srutherford@google.com> wrote:
> 
> This is cleaner. Thanks for doing this. (Note that this is racy
> without the read only remote irr change, so I have a slight preference
> that swap the order of these patches.)
> Reviewed-by: Steve Rutherford <srutherford@google.com>
> 
Thanks for your review.

Can you explain why this is racy?

I think that swapping the order of patches 4 (clearing remote irr) and
5 (making remote irr read only) will break Xen (and other guests that
do explicit EOI) between the patches. Even though the current code is
not semantically correct regarding remote irr behavior, it still makes
explicit EOI work. If we swap the patches and apply patch 5 first,
explicit EOI will break until we apply patch 4.

Nikita
> On Sun, Nov 5, 2017 at 7:30 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>>> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
>>> IOAPICs without an EOI register. They simulate the EOI message manually
>>> by changing the trigger mode to edge and then back to level, with the
>>> entry being masked during this.
>>> 
>>> QEMU implements this feature in commit ed1263c363c9
>>> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>>> 
>>> As a side effect, this commit removes an incorrect behavior where Remote
>>> IRR was cleared when the redirection table entry was rewritten. This is not
>>> consistent with the manual and also opens an opportunity for a strange
>>> behavior when a redirection table entry is modified from an interrupt
>>> handler that handles the same entry: The modification will clear the
>>> Remote IRR bit even though the interrupt handler is still running.
>>> 
>>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>> 
>> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>> 
>>> ---
>>> arch/x86/kvm/ioapic.c | 11 ++++++++++-
>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>> 
>>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>>> index 6df150eaaa78..163d340ee5f8 100644
>>> --- a/arch/x86/kvm/ioapic.c
>>> +++ b/arch/x86/kvm/ioapic.c
>>> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>>                } else {
>>>                        e->bits &= ~0xffffffffULL;
>>>                        e->bits |= (u32) val;
>>> -                       e->fields.remote_irr = 0;
>>>                }
>>> +
>>> +               /*
>>> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
>>> +                * be cleared by IOAPIC hardware when the entry is configured
>>> +                * as edge-triggered. This behavior is used to simulate an
>>> +                * explicit EOI on IOAPICs that don't have the EOI register.
>>> +                */
>>> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
>>> +                       e->fields.remote_irr = 0;
>>> +
>>>                mask_after = e->fields.mask;
>>>                if (mask_before != mask_after)
>>>                        kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>>> --
>>> 2.13.3
>>> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  2017-11-08  9:52       ` Nikita Leshchenko
@ 2017-11-08 21:24         ` Steve Rutherford
  2017-11-08 21:25           ` Steve Rutherford
  0 siblings, 1 reply; 22+ messages in thread
From: Steve Rutherford @ 2017-11-08 21:24 UTC (permalink / raw)
  To: Nikita Leshchenko
  Cc: Wanpeng Li, kvm, Paolo Bonzini, Radim Krcmar, idan.brown,
	Konrad Rzeszutek Wilk

If an eoi on one cpu for irq x races with a guest reconfiguring the
redir entry that points to irq x, the reconfiguration might writeback
the high remote irr value that it read out in the first place. This
would leave the remote irr stuck high since there is no pending eoi
waiting to clear that.

On Wed, Nov 8, 2017 at 1:52 AM, Nikita Leshchenko
<nikita.leshchenko@oracle.com> wrote:
>
>> On 8 Nov 2017, at 2:16, Steve Rutherford <srutherford@google.com> wrote:
>>
>> This is cleaner. Thanks for doing this. (Note that this is racy
>> without the read only remote irr change, so I have a slight preference
>> that swap the order of these patches.)
>> Reviewed-by: Steve Rutherford <srutherford@google.com>
>>
> Thanks for your review.
>
> Can you explain why this is racy?
>
> I think that swapping the order of patches 4 (clearing remote irr) and
> 5 (making remote irr read only) will break Xen (and other guests that
> do explicit EOI) between the patches. Even though the current code is
> not semantically correct regarding remote irr behavior, it still makes
> explicit EOI work. If we swap the patches and apply patch 5 first,
> explicit EOI will break until we apply patch 4.
>
> Nikita
>> On Sun, Nov 5, 2017 at 7:30 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>>>> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
>>>> IOAPICs without an EOI register. They simulate the EOI message manually
>>>> by changing the trigger mode to edge and then back to level, with the
>>>> entry being masked during this.
>>>>
>>>> QEMU implements this feature in commit ed1263c363c9
>>>> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>>>>
>>>> As a side effect, this commit removes an incorrect behavior where Remote
>>>> IRR was cleared when the redirection table entry was rewritten. This is not
>>>> consistent with the manual and also opens an opportunity for a strange
>>>> behavior when a redirection table entry is modified from an interrupt
>>>> handler that handles the same entry: The modification will clear the
>>>> Remote IRR bit even though the interrupt handler is still running.
>>>>
>>>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>
>>> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>>
>>>> ---
>>>> arch/x86/kvm/ioapic.c | 11 ++++++++++-
>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>>>> index 6df150eaaa78..163d340ee5f8 100644
>>>> --- a/arch/x86/kvm/ioapic.c
>>>> +++ b/arch/x86/kvm/ioapic.c
>>>> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>>>                } else {
>>>>                        e->bits &= ~0xffffffffULL;
>>>>                        e->bits |= (u32) val;
>>>> -                       e->fields.remote_irr = 0;
>>>>                }
>>>> +
>>>> +               /*
>>>> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
>>>> +                * be cleared by IOAPIC hardware when the entry is configured
>>>> +                * as edge-triggered. This behavior is used to simulate an
>>>> +                * explicit EOI on IOAPICs that don't have the EOI register.
>>>> +                */
>>>> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
>>>> +                       e->fields.remote_irr = 0;
>>>> +
>>>>                mask_after = e->fields.mask;
>>>>                if (mask_before != mask_after)
>>>>                        kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>>>> --
>>>> 2.13.3
>>>>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
  2017-11-08 21:24         ` Steve Rutherford
@ 2017-11-08 21:25           ` Steve Rutherford
  0 siblings, 0 replies; 22+ messages in thread
From: Steve Rutherford @ 2017-11-08 21:25 UTC (permalink / raw)
  To: Nikita Leshchenko
  Cc: Wanpeng Li, kvm, Paolo Bonzini, Radim Krcmar, idan.brown,
	Konrad Rzeszutek Wilk

Since there are two issues competing here, there's no reason to reorder these.

On Wed, Nov 8, 2017 at 1:24 PM, Steve Rutherford <srutherford@google.com> wrote:
> If an eoi on one cpu for irq x races with a guest reconfiguring the
> redir entry that points to irq x, the reconfiguration might writeback
> the high remote irr value that it read out in the first place. This
> would leave the remote irr stuck high since there is no pending eoi
> waiting to clear that.
>
> On Wed, Nov 8, 2017 at 1:52 AM, Nikita Leshchenko
> <nikita.leshchenko@oracle.com> wrote:
>>
>>> On 8 Nov 2017, at 2:16, Steve Rutherford <srutherford@google.com> wrote:
>>>
>>> This is cleaner. Thanks for doing this. (Note that this is racy
>>> without the read only remote irr change, so I have a slight preference
>>> that swap the order of these patches.)
>>> Reviewed-by: Steve Rutherford <srutherford@google.com>
>>>
>> Thanks for your review.
>>
>> Can you explain why this is racy?
>>
>> I think that swapping the order of patches 4 (clearing remote irr) and
>> 5 (making remote irr read only) will break Xen (and other guests that
>> do explicit EOI) between the patches. Even though the current code is
>> not semantically correct regarding remote irr behavior, it still makes
>> explicit EOI work. If we swap the patches and apply patch 5 first,
>> explicit EOI will break until we apply patch 4.
>>
>> Nikita
>>> On Sun, Nov 5, 2017 at 7:30 PM, Wanpeng Li <kernellwp@gmail.com> wrote:
>>>> 2017-11-05 21:52 GMT+08:00 Nikita Leshenko <nikita.leshchenko@oracle.com>:
>>>>> Some OSes (Linux, Xen) use this behavior to clear the Remote IRR bit for
>>>>> IOAPICs without an EOI register. They simulate the EOI message manually
>>>>> by changing the trigger mode to edge and then back to level, with the
>>>>> entry being masked during this.
>>>>>
>>>>> QEMU implements this feature in commit ed1263c363c9
>>>>> ("ioapic: clear remote irr bit for edge-triggered interrupts")
>>>>>
>>>>> As a side effect, this commit removes an incorrect behavior where Remote
>>>>> IRR was cleared when the redirection table entry was rewritten. This is not
>>>>> consistent with the manual and also opens an opportunity for a strange
>>>>> behavior when a redirection table entry is modified from an interrupt
>>>>> handler that handles the same entry: The modification will clear the
>>>>> Remote IRR bit even though the interrupt handler is still running.
>>>>>
>>>>> Signed-off-by: Nikita Leshenko <nikita.leshchenko@oracle.com>
>>>>> Reviewed-by: Liran Alon <liran.alon@oracle.com>
>>>>> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
>>>>
>>>> Reviewed-by: Wanpeng Li <wanpeng.li@hotmail.com>
>>>>
>>>>> ---
>>>>> arch/x86/kvm/ioapic.c | 11 ++++++++++-
>>>>> 1 file changed, 10 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>>>>> index 6df150eaaa78..163d340ee5f8 100644
>>>>> --- a/arch/x86/kvm/ioapic.c
>>>>> +++ b/arch/x86/kvm/ioapic.c
>>>>> @@ -304,8 +304,17 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
>>>>>                } else {
>>>>>                        e->bits &= ~0xffffffffULL;
>>>>>                        e->bits |= (u32) val;
>>>>> -                       e->fields.remote_irr = 0;
>>>>>                }
>>>>> +
>>>>> +               /*
>>>>> +                * Some OSes (Linux, Xen) assume that Remote IRR bit will
>>>>> +                * be cleared by IOAPIC hardware when the entry is configured
>>>>> +                * as edge-triggered. This behavior is used to simulate an
>>>>> +                * explicit EOI on IOAPICs that don't have the EOI register.
>>>>> +                */
>>>>> +               if (e->fields.trig_mode == IOAPIC_EDGE_TRIG)
>>>>> +                       e->fields.remote_irr = 0;
>>>>> +
>>>>>                mask_after = e->fields.mask;
>>>>>                if (mask_before != mask_after)
>>>>>                        kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
>>>>> --
>>>>> 2.13.3
>>>>>
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/5] KVM: x86: Various IOAPIC improvements
  2017-11-05 13:52 [PATCH 0/5] KVM: x86: Various IOAPIC improvements Nikita Leshenko
                   ` (4 preceding siblings ...)
  2017-11-05 13:52 ` [PATCH 5/5] KVM: x86: ioapic: Preserve read-only values in the redirection table Nikita Leshenko
@ 2017-11-10 21:42 ` Radim Krčmář
  5 siblings, 0 replies; 22+ messages in thread
From: Radim Krčmář @ 2017-11-10 21:42 UTC (permalink / raw)
  To: Nikita Leshenko; +Cc: kvm, pbonzini, idan.brown

2017-11-05 15:52+0200, Nikita Leshenko:
> Hello,
> 
> This patch series aims to improve various aspects of the in-kernel
> IOAPIC.
> 
> The first patch fixes EOIs that were not delivered if the redirection
> table was modified from an interrupt handler. We observed that VMWare
> ESX tends to do that.
> 
> The other patches implement small semantic IOAPIC improvements that
> are in-line with the respective code in QEMU.

Applied all, thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-11-10 21:43 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-05 13:52 [PATCH 0/5] KVM: x86: Various IOAPIC improvements Nikita Leshenko
2017-11-05 13:52 ` [PATCH 1/5] KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race Nikita Leshenko
2017-11-06  2:00   ` Wanpeng Li
2017-11-06 13:53     ` Nikita Leshchenko
2017-11-06 14:05       ` Paolo Bonzini
2017-11-07  8:13         ` Nikita Leshchenko
2017-11-07 10:18           ` Paolo Bonzini
2017-11-07 23:09             ` Steve Rutherford
2017-11-05 13:52 ` [PATCH 2/5] KVM: x86: ioapic: Don't fire level irq when Remote IRR set Nikita Leshenko
2017-11-06  2:50   ` Wanpeng Li
2017-11-05 13:52 ` [PATCH 3/5] KVM: x86: ioapic: Remove redundant check for Remote IRR in ioapic_set_irq Nikita Leshenko
2017-11-06  3:16   ` Wanpeng Li
2017-11-05 13:52 ` [PATCH 4/5] KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered Nikita Leshenko
2017-11-06  3:30   ` Wanpeng Li
2017-11-08  0:16     ` Steve Rutherford
2017-11-08  9:52       ` Nikita Leshchenko
2017-11-08 21:24         ` Steve Rutherford
2017-11-08 21:25           ` Steve Rutherford
2017-11-05 13:52 ` [PATCH 5/5] KVM: x86: ioapic: Preserve read-only values in the redirection table Nikita Leshenko
2017-11-06  3:20   ` Wanpeng Li
2017-11-08  0:18     ` Steve Rutherford
2017-11-10 21:42 ` [PATCH 0/5] KVM: x86: Various IOAPIC improvements Radim Krčmář

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.