All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
@ 2017-04-11 14:11 Ladi Prosek
  2017-04-12  6:40 ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Ladi Prosek @ 2017-04-11 14:11 UTC (permalink / raw)
  To: kvm; +Cc: rkrcmar

If the guest takes advantage of the directed EOI feature by setting
APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
the EOI register of the respective IOAPIC.

>From Intel's x2APIC Specification:
"following the EOI to the local x2APIC unit for a level triggered
interrupt, perform a directed EOI to the IOxAPIC generating the
interrupt by writing to its EOI register."

Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
was later removed with the rest of IA64 support.

The bug has gone undetected for a long time because Linux writes to
IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
seem to perform such a check.

This commit re-adds IOAPIC_REG_EOI and implements it in terms of
__kvm_ioapic_update_eoi.

Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
Signed-off-by: Ladi Prosek <lprosek@redhat.com>
---
 arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
 arch/x86/kvm/ioapic.h |  1 +
 2 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 289270a..8df1c6c 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
 #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
 
 static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
-			struct kvm_ioapic *ioapic, int vector, int trigger_mode)
+			struct kvm_ioapic *ioapic, int vector, int trigger_mode,
+			bool directed)
 {
 	struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	int i;
 
 	/* RTC special handling */
-	if (test_bit(vcpu->vcpu_id, dest_map->map) &&
+	if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
 	    vector == dest_map->vectors[vcpu->vcpu_id])
 		rtc_irq_eoi(ioapic, vcpu);
 
@@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
 		if (ent->fields.vector != vector)
 			continue;
 
-		/*
-		 * We are dropping lock while calling ack notifiers because ack
-		 * notifier callbacks for assigned devices call into IOAPIC
-		 * recursively. Since remote_irr is cleared only after call
-		 * to notifiers if the same vector will be delivered while lock
-		 * is dropped it will be put into irr and will be delivered
-		 * after ack notifier returns.
-		 */
-		spin_unlock(&ioapic->lock);
-		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
-		spin_lock(&ioapic->lock);
-
-		if (trigger_mode != IOAPIC_LEVEL_TRIG ||
-		    kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
-			continue;
+		if (!directed) {
+			/*
+			 * We are dropping lock while calling ack notifiers because ack
+			 * notifier callbacks for assigned devices call into IOAPIC
+			 * recursively. Since remote_irr is cleared only after call
+			 * to notifiers if the same vector will be delivered while lock
+			 * is dropped it will be put into irr and will be delivered
+			 * after ack notifier returns.
+			 */
+			spin_unlock(&ioapic->lock);
+			kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
+			spin_lock(&ioapic->lock);
+
+			if (trigger_mode != IOAPIC_LEVEL_TRIG ||
+			    kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+				continue;
+		}
 
 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
 		ent->fields.remote_irr = 0;
@@ -478,7 +481,7 @@ void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
 	struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
 
 	spin_lock(&ioapic->lock);
-	__kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
+	__kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode, false);
 	spin_unlock(&ioapic->lock);
 }
 
@@ -540,6 +543,7 @@ static int ioapic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 				 gpa_t addr, int len, const void *val)
 {
 	struct kvm_ioapic *ioapic = to_ioapic(this);
+	struct kvm_lapic *apic = vcpu->arch.apic;
 	u32 data;
 	if (!ioapic_in_range(ioapic, addr))
 		return -EOPNOTSUPP;
@@ -575,6 +579,12 @@ static int ioapic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
 		ioapic_write_indirect(ioapic, data);
 		break;
 
+	case IOAPIC_REG_EOI:
+		if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+			__kvm_ioapic_update_eoi(vcpu, ioapic, data,
+						IOAPIC_LEVEL_TRIG, true);
+		break;
+
 	default:
 		break;
 	}
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index 1cc6e54..251b61b 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -20,6 +20,7 @@ struct kvm_vcpu;
 /* Direct registers. */
 #define IOAPIC_REG_SELECT  0x00
 #define IOAPIC_REG_WINDOW  0x10
+#define IOAPIC_REG_EOI     0x40
 
 /* Indirect registers. */
 #define IOAPIC_REG_APIC_ID 0x00	/* x86 IOAPIC only */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-11 14:11 [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support Ladi Prosek
@ 2017-04-12  6:40 ` Peter Xu
  2017-04-12  7:36   ` Ladi Prosek
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2017-04-12  6:40 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: kvm, rkrcmar

On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
> If the guest takes advantage of the directed EOI feature by setting
> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
> the EOI register of the respective IOAPIC.
> 
> From Intel's x2APIC Specification:
> "following the EOI to the local x2APIC unit for a level triggered
> interrupt, perform a directed EOI to the IOxAPIC generating the
> interrupt by writing to its EOI register."
> 
> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
> was later removed with the rest of IA64 support.
> 
> The bug has gone undetected for a long time because Linux writes to
> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
> seem to perform such a check.

Hi, Ladi,

Not sure I'm understanding it correctly... I see "direct EOI" is a
feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
another feature for APIC. They are not the same feature, so it may not
be required to have them all together. IIUC current x86 kvm is just
the case - it supports EOI broadcast suppression on APIC, but it does
not support direct EOI on kernel IOAPIC.

I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
if IOAPIC does not support direct EOI (the guest can know it by
probing IOAPIC version). Please correct if I'm wrong.

> 
> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
> __kvm_ioapic_update_eoi.
> 
> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> ---
>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
>  arch/x86/kvm/ioapic.h |  1 +
>  2 files changed, 29 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 289270a..8df1c6c 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
>  
>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> -			struct kvm_ioapic *ioapic, int vector, int trigger_mode)
> +			struct kvm_ioapic *ioapic, int vector, int trigger_mode,
> +			bool directed)
>  {
>  	struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	int i;
>  
>  	/* RTC special handling */
> -	if (test_bit(vcpu->vcpu_id, dest_map->map) &&
> +	if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
>  	    vector == dest_map->vectors[vcpu->vcpu_id])
>  		rtc_irq_eoi(ioapic, vcpu);
>  
> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>  		if (ent->fields.vector != vector)
>  			continue;
>  
> -		/*
> -		 * We are dropping lock while calling ack notifiers because ack
> -		 * notifier callbacks for assigned devices call into IOAPIC
> -		 * recursively. Since remote_irr is cleared only after call
> -		 * to notifiers if the same vector will be delivered while lock
> -		 * is dropped it will be put into irr and will be delivered
> -		 * after ack notifier returns.
> -		 */
> -		spin_unlock(&ioapic->lock);
> -		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> -		spin_lock(&ioapic->lock);
> -
> -		if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> -		    kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> -			continue;
> +		if (!directed) {

Could I ask why we need to skip this if the EOI is sent via direct EOI
register of IOAPIC?

Thanks,

> +			/*
> +			 * We are dropping lock while calling ack notifiers because ack
> +			 * notifier callbacks for assigned devices call into IOAPIC
> +			 * recursively. Since remote_irr is cleared only after call
> +			 * to notifiers if the same vector will be delivered while lock
> +			 * is dropped it will be put into irr and will be delivered
> +			 * after ack notifier returns.
> +			 */
> +			spin_unlock(&ioapic->lock);
> +			kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> +			spin_lock(&ioapic->lock);
> +
> +			if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> +			    kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +				continue;
> +		}
>  
>  		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>  		ent->fields.remote_irr = 0;
> @@ -478,7 +481,7 @@ void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
>  	struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>  
>  	spin_lock(&ioapic->lock);
> -	__kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
> +	__kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode, false);
>  	spin_unlock(&ioapic->lock);
>  }
>  
> @@ -540,6 +543,7 @@ static int ioapic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
>  				 gpa_t addr, int len, const void *val)
>  {
>  	struct kvm_ioapic *ioapic = to_ioapic(this);
> +	struct kvm_lapic *apic = vcpu->arch.apic;
>  	u32 data;
>  	if (!ioapic_in_range(ioapic, addr))
>  		return -EOPNOTSUPP;
> @@ -575,6 +579,12 @@ static int ioapic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
>  		ioapic_write_indirect(ioapic, data);
>  		break;
>  
> +	case IOAPIC_REG_EOI:
> +		if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +			__kvm_ioapic_update_eoi(vcpu, ioapic, data,
> +						IOAPIC_LEVEL_TRIG, true);
> +		break;
> +
>  	default:
>  		break;
>  	}
> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
> index 1cc6e54..251b61b 100644
> --- a/arch/x86/kvm/ioapic.h
> +++ b/arch/x86/kvm/ioapic.h
> @@ -20,6 +20,7 @@ struct kvm_vcpu;
>  /* Direct registers. */
>  #define IOAPIC_REG_SELECT  0x00
>  #define IOAPIC_REG_WINDOW  0x10
> +#define IOAPIC_REG_EOI     0x40
>  
>  /* Indirect registers. */
>  #define IOAPIC_REG_APIC_ID 0x00	/* x86 IOAPIC only */
> -- 
> 2.9.3
> 

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12  6:40 ` Peter Xu
@ 2017-04-12  7:36   ` Ladi Prosek
  2017-04-12  9:06     ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Ladi Prosek @ 2017-04-12  7:36 UTC (permalink / raw)
  To: Peter Xu; +Cc: KVM list, Radim Krcmar

On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
> On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
>> If the guest takes advantage of the directed EOI feature by setting
>> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
>> the EOI register of the respective IOAPIC.
>>
>> From Intel's x2APIC Specification:
>> "following the EOI to the local x2APIC unit for a level triggered
>> interrupt, perform a directed EOI to the IOxAPIC generating the
>> interrupt by writing to its EOI register."
>>
>> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
>> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
>> was later removed with the rest of IA64 support.
>>
>> The bug has gone undetected for a long time because Linux writes to
>> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
>> seem to perform such a check.
>
> Hi, Ladi,

Hi Peter,

> Not sure I'm understanding it correctly... I see "direct EOI" is a
> feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
> another feature for APIC. They are not the same feature, so it may not
> be required to have them all together. IIUC current x86 kvm is just
> the case - it supports EOI broadcast suppression on APIC, but it does
> not support direct EOI on kernel IOAPIC.

Thanks, that makes perfect sense and explains why Linux behaves the
way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).

This document makes it look like suppress EOI-broadcast always implies
directed EOI, though:

http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf

NB "The support for Directed EOI capability can be detected by means
of bit 24 in the Local APIC Version Register. "

There is no mention of APIC version or any other detection mechanism
for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
but I don't see that in the document either.

I suspect that Microsoft implemented EOI by following this same spec.
Level-triggered interrupts don't work right on Windows Server 2016
with Hyper-V enabled without this patch.

> I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
> if IOAPIC does not support direct EOI (the guest can know it by
> probing IOAPIC version). Please correct if I'm wrong.

Yes, I think that the guest is to blame here. We might add that to the
commit message.

>>
>> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
>> __kvm_ioapic_update_eoi.
>>
>> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
>> ---
>>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
>>  arch/x86/kvm/ioapic.h |  1 +
>>  2 files changed, 29 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 289270a..8df1c6c 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
>>
>>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> -                     struct kvm_ioapic *ioapic, int vector, int trigger_mode)
>> +                     struct kvm_ioapic *ioapic, int vector, int trigger_mode,
>> +                     bool directed)
>>  {
>>       struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
>>       struct kvm_lapic *apic = vcpu->arch.apic;
>>       int i;
>>
>>       /* RTC special handling */
>> -     if (test_bit(vcpu->vcpu_id, dest_map->map) &&
>> +     if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
>>           vector == dest_map->vectors[vcpu->vcpu_id])
>>               rtc_irq_eoi(ioapic, vcpu);
>>
>> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>>               if (ent->fields.vector != vector)
>>                       continue;
>>
>> -             /*
>> -              * We are dropping lock while calling ack notifiers because ack
>> -              * notifier callbacks for assigned devices call into IOAPIC
>> -              * recursively. Since remote_irr is cleared only after call
>> -              * to notifiers if the same vector will be delivered while lock
>> -              * is dropped it will be put into irr and will be delivered
>> -              * after ack notifier returns.
>> -              */
>> -             spin_unlock(&ioapic->lock);
>> -             kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>> -             spin_lock(&ioapic->lock);
>> -
>> -             if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>> -                 kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> -                     continue;
>> +             if (!directed) {
>
> Could I ask why we need to skip this if the EOI is sent via direct EOI
> register of IOAPIC?

Because it's already been done as part of the local EOI. With directed
EOI we hit this function twice, first time when doing the local EOI
and then the newly added code path for IOAPIC EOI with directed=true.

I, again, followed the above mentioned document which explicitly
dictates the sequence. And I mechanically split the function to the
"local part' - what it had been doing up to the continue statement -
and the "directed part" - what it had been skipping. I'll admit that
my familiarity with this code is limited and there may be a better way
to do this.

Thanks!
Ladi

> Thanks,
>
>> +                     /*
>> +                      * We are dropping lock while calling ack notifiers because ack
>> +                      * notifier callbacks for assigned devices call into IOAPIC
>> +                      * recursively. Since remote_irr is cleared only after call
>> +                      * to notifiers if the same vector will be delivered while lock
>> +                      * is dropped it will be put into irr and will be delivered
>> +                      * after ack notifier returns.
>> +                      */
>> +                     spin_unlock(&ioapic->lock);
>> +                     kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>> +                     spin_lock(&ioapic->lock);
>> +
>> +                     if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>> +                         kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> +                             continue;
>> +             }
>>
>>               ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>>               ent->fields.remote_irr = 0;
>> @@ -478,7 +481,7 @@ void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
>>       struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>>
>>       spin_lock(&ioapic->lock);
>> -     __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
>> +     __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode, false);
>>       spin_unlock(&ioapic->lock);
>>  }
>>
>> @@ -540,6 +543,7 @@ static int ioapic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
>>                                gpa_t addr, int len, const void *val)
>>  {
>>       struct kvm_ioapic *ioapic = to_ioapic(this);
>> +     struct kvm_lapic *apic = vcpu->arch.apic;
>>       u32 data;
>>       if (!ioapic_in_range(ioapic, addr))
>>               return -EOPNOTSUPP;
>> @@ -575,6 +579,12 @@ static int ioapic_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *this,
>>               ioapic_write_indirect(ioapic, data);
>>               break;
>>
>> +     case IOAPIC_REG_EOI:
>> +             if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> +                     __kvm_ioapic_update_eoi(vcpu, ioapic, data,
>> +                                             IOAPIC_LEVEL_TRIG, true);
>> +             break;
>> +
>>       default:
>>               break;
>>       }
>> diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
>> index 1cc6e54..251b61b 100644
>> --- a/arch/x86/kvm/ioapic.h
>> +++ b/arch/x86/kvm/ioapic.h
>> @@ -20,6 +20,7 @@ struct kvm_vcpu;
>>  /* Direct registers. */
>>  #define IOAPIC_REG_SELECT  0x00
>>  #define IOAPIC_REG_WINDOW  0x10
>> +#define IOAPIC_REG_EOI     0x40
>>
>>  /* Indirect registers. */
>>  #define IOAPIC_REG_APIC_ID 0x00      /* x86 IOAPIC only */
>> --
>> 2.9.3
>>
>
> --
> Peter Xu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12  7:36   ` Ladi Prosek
@ 2017-04-12  9:06     ` Peter Xu
  2017-04-12  9:37       ` Ladi Prosek
  2017-04-12 20:52       ` Radim Krcmar
  0 siblings, 2 replies; 16+ messages in thread
From: Peter Xu @ 2017-04-12  9:06 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: KVM list, Radim Krcmar

On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
> >> If the guest takes advantage of the directed EOI feature by setting
> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
> >> the EOI register of the respective IOAPIC.
> >>
> >> From Intel's x2APIC Specification:
> >> "following the EOI to the local x2APIC unit for a level triggered
> >> interrupt, perform a directed EOI to the IOxAPIC generating the
> >> interrupt by writing to its EOI register."
> >>
> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
> >> was later removed with the rest of IA64 support.
> >>
> >> The bug has gone undetected for a long time because Linux writes to
> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
> >> seem to perform such a check.
> >
> > Hi, Ladi,
> 
> Hi Peter,
> 
> > Not sure I'm understanding it correctly... I see "direct EOI" is a
> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
> > another feature for APIC. They are not the same feature, so it may not
> > be required to have them all together. IIUC current x86 kvm is just
> > the case - it supports EOI broadcast suppression on APIC, but it does
> > not support direct EOI on kernel IOAPIC.
> 
> Thanks, that makes perfect sense and explains why Linux behaves the
> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
> 
> This document makes it look like suppress EOI-broadcast always implies
> directed EOI, though:
> 
> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
> 
> NB "The support for Directed EOI capability can be detected by means
> of bit 24 in the Local APIC Version Register. "
> 
> There is no mention of APIC version or any other detection mechanism
> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
> but I don't see that in the document either.
> 
> I suspect that Microsoft implemented EOI by following this same spec.
> Level-triggered interrupts don't work right on Windows Server 2016
> with Hyper-V enabled without this patch.

Yes, the documents for IOAPIC is always hard to find, at least for
me...

There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf

However I see nothing related to how the IOAPIC version is defined. In
that sense, the comment above __eoi_ioapic_pin() seems to be better. :)

> 
> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
> > if IOAPIC does not support direct EOI (the guest can know it by
> > probing IOAPIC version). Please correct if I'm wrong.
> 
> Yes, I think that the guest is to blame here. We might add that to the
> commit message.

Agreed.

> 
> >>
> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
> >> __kvm_ioapic_update_eoi.
> >>
> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> >> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> >> ---
> >>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
> >>  arch/x86/kvm/ioapic.h |  1 +
> >>  2 files changed, 29 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> >> index 289270a..8df1c6c 100644
> >> --- a/arch/x86/kvm/ioapic.c
> >> +++ b/arch/x86/kvm/ioapic.c
> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
> >>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
> >>
> >>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >> -                     struct kvm_ioapic *ioapic, int vector, int trigger_mode)
> >> +                     struct kvm_ioapic *ioapic, int vector, int trigger_mode,
> >> +                     bool directed)
> >>  {
> >>       struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
> >>       struct kvm_lapic *apic = vcpu->arch.apic;
> >>       int i;
> >>
> >>       /* RTC special handling */
> >> -     if (test_bit(vcpu->vcpu_id, dest_map->map) &&
> >> +     if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
> >>           vector == dest_map->vectors[vcpu->vcpu_id])
> >>               rtc_irq_eoi(ioapic, vcpu);
> >>
> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >>               if (ent->fields.vector != vector)
> >>                       continue;
> >>
> >> -             /*
> >> -              * We are dropping lock while calling ack notifiers because ack
> >> -              * notifier callbacks for assigned devices call into IOAPIC
> >> -              * recursively. Since remote_irr is cleared only after call
> >> -              * to notifiers if the same vector will be delivered while lock
> >> -              * is dropped it will be put into irr and will be delivered
> >> -              * after ack notifier returns.
> >> -              */
> >> -             spin_unlock(&ioapic->lock);
> >> -             kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> >> -             spin_lock(&ioapic->lock);
> >> -
> >> -             if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> >> -                 kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >> -                     continue;
> >> +             if (!directed) {
> >
> > Could I ask why we need to skip this if the EOI is sent via direct EOI
> > register of IOAPIC?
> 
> Because it's already been done as part of the local EOI. With directed
> EOI we hit this function twice, first time when doing the local EOI
> and then the newly added code path for IOAPIC EOI with directed=true.
> 
> I, again, followed the above mentioned document which explicitly
> dictates the sequence. And I mechanically split the function to the
> "local part' - what it had been doing up to the continue statement -
> and the "directed part" - what it had been skipping. I'll admit that
> my familiarity with this code is limited and there may be a better way
> to do this.

Instead of the "!directed" flag (which is imho duplicated with what
APIC_SPIV_DIRECTED_EOI means), do you like below fix?

-----8<-----
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 6e219e5..78d3ec8 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
                kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
                spin_lock(&ioapic->lock);
 
-               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
-                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+               if (trigger_mode != IOAPIC_LEVEL_TRIG)
                        continue;
 
                ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
@@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
        }
 }
 
+/* This should only be triggered by APIC EOI broadcast */
 void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
 {
        struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
 
+       /* If we'll be using direct EOI, skip broadcast */
+       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+               return;
+
        spin_lock(&ioapic->lock);
        __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
        spin_unlock(&ioapic->lock);
---->8----

This patch along will break kvm_notify_acked_irq() in some way I
guess, but if with your patch (though will possibly need to boost
IOAPIC version to 0x20 as well), it should work fine as long as guest
remembers to send the direct EOI.

Thanks,

-- 
Peter Xu

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12  9:06     ` Peter Xu
@ 2017-04-12  9:37       ` Ladi Prosek
  2017-04-12 10:28         ` Ladi Prosek
  2017-04-12 20:52       ` Radim Krcmar
  1 sibling, 1 reply; 16+ messages in thread
From: Ladi Prosek @ 2017-04-12  9:37 UTC (permalink / raw)
  To: Peter Xu; +Cc: KVM list, Radim Krcmar

On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
>> >> If the guest takes advantage of the directed EOI feature by setting
>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
>> >> the EOI register of the respective IOAPIC.
>> >>
>> >> From Intel's x2APIC Specification:
>> >> "following the EOI to the local x2APIC unit for a level triggered
>> >> interrupt, perform a directed EOI to the IOxAPIC generating the
>> >> interrupt by writing to its EOI register."
>> >>
>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
>> >> was later removed with the rest of IA64 support.
>> >>
>> >> The bug has gone undetected for a long time because Linux writes to
>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
>> >> seem to perform such a check.
>> >
>> > Hi, Ladi,
>>
>> Hi Peter,
>>
>> > Not sure I'm understanding it correctly... I see "direct EOI" is a
>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
>> > another feature for APIC. They are not the same feature, so it may not
>> > be required to have them all together. IIUC current x86 kvm is just
>> > the case - it supports EOI broadcast suppression on APIC, but it does
>> > not support direct EOI on kernel IOAPIC.
>>
>> Thanks, that makes perfect sense and explains why Linux behaves the
>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
>>
>> This document makes it look like suppress EOI-broadcast always implies
>> directed EOI, though:
>>
>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>>
>> NB "The support for Directed EOI capability can be detected by means
>> of bit 24 in the Local APIC Version Register. "
>>
>> There is no mention of APIC version or any other detection mechanism
>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
>> but I don't see that in the document either.
>>
>> I suspect that Microsoft implemented EOI by following this same spec.
>> Level-triggered interrupts don't work right on Windows Server 2016
>> with Hyper-V enabled without this patch.
>
> Yes, the documents for IOAPIC is always hard to find, at least for
> me...
>
> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
>
> However I see nothing related to how the IOAPIC version is defined. In
> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
>
>>
>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
>> > if IOAPIC does not support direct EOI (the guest can know it by
>> > probing IOAPIC version). Please correct if I'm wrong.
>>
>> Yes, I think that the guest is to blame here. We might add that to the
>> commit message.
>
> Agreed.
>
>>
>> >>
>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
>> >> __kvm_ioapic_update_eoi.
>> >>
>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> >> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
>> >> ---
>> >>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
>> >>  arch/x86/kvm/ioapic.h |  1 +
>> >>  2 files changed, 29 insertions(+), 18 deletions(-)
>> >>
>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> >> index 289270a..8df1c6c 100644
>> >> --- a/arch/x86/kvm/ioapic.c
>> >> +++ b/arch/x86/kvm/ioapic.c
>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>> >>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
>> >>
>> >>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >> -                     struct kvm_ioapic *ioapic, int vector, int trigger_mode)
>> >> +                     struct kvm_ioapic *ioapic, int vector, int trigger_mode,
>> >> +                     bool directed)
>> >>  {
>> >>       struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
>> >>       struct kvm_lapic *apic = vcpu->arch.apic;
>> >>       int i;
>> >>
>> >>       /* RTC special handling */
>> >> -     if (test_bit(vcpu->vcpu_id, dest_map->map) &&
>> >> +     if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
>> >>           vector == dest_map->vectors[vcpu->vcpu_id])
>> >>               rtc_irq_eoi(ioapic, vcpu);
>> >>
>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>               if (ent->fields.vector != vector)
>> >>                       continue;
>> >>
>> >> -             /*
>> >> -              * We are dropping lock while calling ack notifiers because ack
>> >> -              * notifier callbacks for assigned devices call into IOAPIC
>> >> -              * recursively. Since remote_irr is cleared only after call
>> >> -              * to notifiers if the same vector will be delivered while lock
>> >> -              * is dropped it will be put into irr and will be delivered
>> >> -              * after ack notifier returns.
>> >> -              */
>> >> -             spin_unlock(&ioapic->lock);
>> >> -             kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>> >> -             spin_lock(&ioapic->lock);
>> >> -
>> >> -             if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>> >> -                 kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> >> -                     continue;
>> >> +             if (!directed) {
>> >
>> > Could I ask why we need to skip this if the EOI is sent via direct EOI
>> > register of IOAPIC?
>>
>> Because it's already been done as part of the local EOI. With directed
>> EOI we hit this function twice, first time when doing the local EOI
>> and then the newly added code path for IOAPIC EOI with directed=true.
>>
>> I, again, followed the above mentioned document which explicitly
>> dictates the sequence. And I mechanically split the function to the
>> "local part' - what it had been doing up to the continue statement -
>> and the "directed part" - what it had been skipping. I'll admit that
>> my familiarity with this code is limited and there may be a better way
>> to do this.
>
> Instead of the "!directed" flag (which is imho duplicated with what
> APIC_SPIV_DIRECTED_EOI means), do you like below fix?
>
> -----8<-----
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 6e219e5..78d3ec8 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>                 spin_lock(&ioapic->lock);
>
> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
>                         continue;
>
>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>         }
>  }
>
> +/* This should only be triggered by APIC EOI broadcast */
>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
>  {
>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>
> +       /* If we'll be using direct EOI, skip broadcast */
> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +               return;
> +

I've only seen the direct EOI sent for level irqs so I'm afraid that
__kvm_ioapic_update_eoi needs to run for edge-triggered even if the
APIC_SPIV_DIRECTED_EOI flag is set.

Other than that it looks reasonable.

>         spin_lock(&ioapic->lock);
>         __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
>         spin_unlock(&ioapic->lock);
> ---->8----
>
> This patch along will break kvm_notify_acked_irq() in some way I
> guess, but if with your patch (though will possibly need to boost
> IOAPIC version to 0x20 as well), it should work fine as long as guest
> remembers to send the direct EOI.

Not sure about the version boost, especially since we don't have a
good spec to define what the version means. Maybe only if it helps
Linux performance. In theory __eoi_ioapic_pin should be causing fewer
vmexits with version>=0x20.

> Thanks,
>
> --
> Peter Xu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12  9:37       ` Ladi Prosek
@ 2017-04-12 10:28         ` Ladi Prosek
  2017-04-12 11:57           ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Ladi Prosek @ 2017-04-12 10:28 UTC (permalink / raw)
  To: Peter Xu; +Cc: KVM list, Radim Krcmar

On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
> On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
>> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
>>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
>>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
>>> >> If the guest takes advantage of the directed EOI feature by setting
>>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
>>> >> the EOI register of the respective IOAPIC.
>>> >>
>>> >> From Intel's x2APIC Specification:
>>> >> "following the EOI to the local x2APIC unit for a level triggered
>>> >> interrupt, perform a directed EOI to the IOxAPIC generating the
>>> >> interrupt by writing to its EOI register."
>>> >>
>>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
>>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
>>> >> was later removed with the rest of IA64 support.
>>> >>
>>> >> The bug has gone undetected for a long time because Linux writes to
>>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
>>> >> seem to perform such a check.
>>> >
>>> > Hi, Ladi,
>>>
>>> Hi Peter,
>>>
>>> > Not sure I'm understanding it correctly... I see "direct EOI" is a
>>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
>>> > another feature for APIC. They are not the same feature, so it may not
>>> > be required to have them all together. IIUC current x86 kvm is just
>>> > the case - it supports EOI broadcast suppression on APIC, but it does
>>> > not support direct EOI on kernel IOAPIC.
>>>
>>> Thanks, that makes perfect sense and explains why Linux behaves the
>>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
>>>
>>> This document makes it look like suppress EOI-broadcast always implies
>>> directed EOI, though:
>>>
>>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>>>
>>> NB "The support for Directed EOI capability can be detected by means
>>> of bit 24 in the Local APIC Version Register. "
>>>
>>> There is no mention of APIC version or any other detection mechanism
>>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
>>> but I don't see that in the document either.
>>>
>>> I suspect that Microsoft implemented EOI by following this same spec.
>>> Level-triggered interrupts don't work right on Windows Server 2016
>>> with Hyper-V enabled without this patch.
>>
>> Yes, the documents for IOAPIC is always hard to find, at least for
>> me...
>>
>> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
>> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
>>
>> However I see nothing related to how the IOAPIC version is defined. In
>> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
>>
>>>
>>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
>>> > if IOAPIC does not support direct EOI (the guest can know it by
>>> > probing IOAPIC version). Please correct if I'm wrong.
>>>
>>> Yes, I think that the guest is to blame here. We might add that to the
>>> commit message.
>>
>> Agreed.
>>
>>>
>>> >>
>>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
>>> >> __kvm_ioapic_update_eoi.
>>> >>
>>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>>> >> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
>>> >> ---
>>> >>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
>>> >>  arch/x86/kvm/ioapic.h |  1 +
>>> >>  2 files changed, 29 insertions(+), 18 deletions(-)
>>> >>
>>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>>> >> index 289270a..8df1c6c 100644
>>> >> --- a/arch/x86/kvm/ioapic.c
>>> >> +++ b/arch/x86/kvm/ioapic.c
>>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>>> >>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
>>> >>
>>> >>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>>> >> -                     struct kvm_ioapic *ioapic, int vector, int trigger_mode)
>>> >> +                     struct kvm_ioapic *ioapic, int vector, int trigger_mode,
>>> >> +                     bool directed)
>>> >>  {
>>> >>       struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
>>> >>       struct kvm_lapic *apic = vcpu->arch.apic;
>>> >>       int i;
>>> >>
>>> >>       /* RTC special handling */
>>> >> -     if (test_bit(vcpu->vcpu_id, dest_map->map) &&
>>> >> +     if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
>>> >>           vector == dest_map->vectors[vcpu->vcpu_id])
>>> >>               rtc_irq_eoi(ioapic, vcpu);
>>> >>
>>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>>> >>               if (ent->fields.vector != vector)
>>> >>                       continue;
>>> >>
>>> >> -             /*
>>> >> -              * We are dropping lock while calling ack notifiers because ack
>>> >> -              * notifier callbacks for assigned devices call into IOAPIC
>>> >> -              * recursively. Since remote_irr is cleared only after call
>>> >> -              * to notifiers if the same vector will be delivered while lock
>>> >> -              * is dropped it will be put into irr and will be delivered
>>> >> -              * after ack notifier returns.
>>> >> -              */
>>> >> -             spin_unlock(&ioapic->lock);
>>> >> -             kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>>> >> -             spin_lock(&ioapic->lock);
>>> >> -
>>> >> -             if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>>> >> -                 kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>>> >> -                     continue;
>>> >> +             if (!directed) {
>>> >
>>> > Could I ask why we need to skip this if the EOI is sent via direct EOI
>>> > register of IOAPIC?
>>>
>>> Because it's already been done as part of the local EOI. With directed
>>> EOI we hit this function twice, first time when doing the local EOI
>>> and then the newly added code path for IOAPIC EOI with directed=true.
>>>
>>> I, again, followed the above mentioned document which explicitly
>>> dictates the sequence. And I mechanically split the function to the
>>> "local part' - what it had been doing up to the continue statement -
>>> and the "directed part" - what it had been skipping. I'll admit that
>>> my familiarity with this code is limited and there may be a better way
>>> to do this.
>>
>> Instead of the "!directed" flag (which is imho duplicated with what
>> APIC_SPIV_DIRECTED_EOI means), do you like below fix?
>>
>> -----8<-----
>> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> index 6e219e5..78d3ec8 100644
>> --- a/arch/x86/kvm/ioapic.c
>> +++ b/arch/x86/kvm/ioapic.c
>> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>>                 spin_lock(&ioapic->lock);
>>
>> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
>>                         continue;
>>
>>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>>         }
>>  }
>>
>> +/* This should only be triggered by APIC EOI broadcast */
>>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
>>  {
>>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>>
>> +       /* If we'll be using direct EOI, skip broadcast */
>> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> +               return;
>> +
>
> I've only seen the direct EOI sent for level irqs so I'm afraid that
> __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
> APIC_SPIV_DIRECTED_EOI flag is set.
>
> Other than that it looks reasonable.

Although, wait, what if the guest uses APIC_SPIV_DIRECTED_EOI to
suppress the broadcast but then does EOI by writing to the IOAPIC
routing entry? You kind of indicated that this would be a valid use of
the feature. This is what __eoi_ioapic_pin does for version<0x20 and
on the host side we reset the remote_irr in ioapic_write_indirect if
I'm reading the code correctly. Wouldn't we want to deliver the
notification via kvm_notify_acked_irq in this case also?

Thanks!

>>         spin_lock(&ioapic->lock);
>>         __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
>>         spin_unlock(&ioapic->lock);
>> ---->8----
>>
>> This patch along will break kvm_notify_acked_irq() in some way I
>> guess, but if with your patch (though will possibly need to boost
>> IOAPIC version to 0x20 as well), it should work fine as long as guest
>> remembers to send the direct EOI.
>
> Not sure about the version boost, especially since we don't have a
> good spec to define what the version means. Maybe only if it helps
> Linux performance. In theory __eoi_ioapic_pin should be causing fewer
> vmexits with version>=0x20.
>
>> Thanks,
>>
>> --
>> Peter Xu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12 10:28         ` Ladi Prosek
@ 2017-04-12 11:57           ` Peter Xu
  2017-04-12 12:20             ` Ladi Prosek
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2017-04-12 11:57 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: KVM list, Radim Krcmar

On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
> >> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
> >>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
> >>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
> >>> >> If the guest takes advantage of the directed EOI feature by setting
> >>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
> >>> >> the EOI register of the respective IOAPIC.
> >>> >>
> >>> >> From Intel's x2APIC Specification:
> >>> >> "following the EOI to the local x2APIC unit for a level triggered
> >>> >> interrupt, perform a directed EOI to the IOxAPIC generating the
> >>> >> interrupt by writing to its EOI register."
> >>> >>
> >>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> >>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
> >>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
> >>> >> was later removed with the rest of IA64 support.
> >>> >>
> >>> >> The bug has gone undetected for a long time because Linux writes to
> >>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
> >>> >> seem to perform such a check.
> >>> >
> >>> > Hi, Ladi,
> >>>
> >>> Hi Peter,
> >>>
> >>> > Not sure I'm understanding it correctly... I see "direct EOI" is a
> >>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
> >>> > another feature for APIC. They are not the same feature, so it may not
> >>> > be required to have them all together. IIUC current x86 kvm is just
> >>> > the case - it supports EOI broadcast suppression on APIC, but it does
> >>> > not support direct EOI on kernel IOAPIC.
> >>>
> >>> Thanks, that makes perfect sense and explains why Linux behaves the
> >>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
> >>>
> >>> This document makes it look like suppress EOI-broadcast always implies
> >>> directed EOI, though:
> >>>
> >>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
> >>>
> >>> NB "The support for Directed EOI capability can be detected by means
> >>> of bit 24 in the Local APIC Version Register. "
> >>>
> >>> There is no mention of APIC version or any other detection mechanism
> >>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
> >>> but I don't see that in the document either.
> >>>
> >>> I suspect that Microsoft implemented EOI by following this same spec.
> >>> Level-triggered interrupts don't work right on Windows Server 2016
> >>> with Hyper-V enabled without this patch.
> >>
> >> Yes, the documents for IOAPIC is always hard to find, at least for
> >> me...
> >>
> >> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
> >> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
> >>
> >> However I see nothing related to how the IOAPIC version is defined. In
> >> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
> >>
> >>>
> >>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
> >>> > if IOAPIC does not support direct EOI (the guest can know it by
> >>> > probing IOAPIC version). Please correct if I'm wrong.
> >>>
> >>> Yes, I think that the guest is to blame here. We might add that to the
> >>> commit message.
> >>
> >> Agreed.
> >>
> >>>
> >>> >>
> >>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
> >>> >> __kvm_ioapic_update_eoi.
> >>> >>
> >>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> >>> >> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> >>> >> ---
> >>> >>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
> >>> >>  arch/x86/kvm/ioapic.h |  1 +
> >>> >>  2 files changed, 29 insertions(+), 18 deletions(-)
> >>> >>
> >>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> >>> >> index 289270a..8df1c6c 100644
> >>> >> --- a/arch/x86/kvm/ioapic.c
> >>> >> +++ b/arch/x86/kvm/ioapic.c
> >>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
> >>> >>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
> >>> >>
> >>> >>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >>> >> -                     struct kvm_ioapic *ioapic, int vector, int trigger_mode)
> >>> >> +                     struct kvm_ioapic *ioapic, int vector, int trigger_mode,
> >>> >> +                     bool directed)
> >>> >>  {
> >>> >>       struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
> >>> >>       struct kvm_lapic *apic = vcpu->arch.apic;
> >>> >>       int i;
> >>> >>
> >>> >>       /* RTC special handling */
> >>> >> -     if (test_bit(vcpu->vcpu_id, dest_map->map) &&
> >>> >> +     if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
> >>> >>           vector == dest_map->vectors[vcpu->vcpu_id])
> >>> >>               rtc_irq_eoi(ioapic, vcpu);
> >>> >>
> >>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >>> >>               if (ent->fields.vector != vector)
> >>> >>                       continue;
> >>> >>
> >>> >> -             /*
> >>> >> -              * We are dropping lock while calling ack notifiers because ack
> >>> >> -              * notifier callbacks for assigned devices call into IOAPIC
> >>> >> -              * recursively. Since remote_irr is cleared only after call
> >>> >> -              * to notifiers if the same vector will be delivered while lock
> >>> >> -              * is dropped it will be put into irr and will be delivered
> >>> >> -              * after ack notifier returns.
> >>> >> -              */
> >>> >> -             spin_unlock(&ioapic->lock);
> >>> >> -             kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> >>> >> -             spin_lock(&ioapic->lock);
> >>> >> -
> >>> >> -             if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> >>> >> -                 kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >>> >> -                     continue;
> >>> >> +             if (!directed) {
> >>> >
> >>> > Could I ask why we need to skip this if the EOI is sent via direct EOI
> >>> > register of IOAPIC?
> >>>
> >>> Because it's already been done as part of the local EOI. With directed
> >>> EOI we hit this function twice, first time when doing the local EOI
> >>> and then the newly added code path for IOAPIC EOI with directed=true.
> >>>
> >>> I, again, followed the above mentioned document which explicitly
> >>> dictates the sequence. And I mechanically split the function to the
> >>> "local part' - what it had been doing up to the continue statement -
> >>> and the "directed part" - what it had been skipping. I'll admit that
> >>> my familiarity with this code is limited and there may be a better way
> >>> to do this.
> >>
> >> Instead of the "!directed" flag (which is imho duplicated with what
> >> APIC_SPIV_DIRECTED_EOI means), do you like below fix?
> >>
> >> -----8<-----
> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> >> index 6e219e5..78d3ec8 100644
> >> --- a/arch/x86/kvm/ioapic.c
> >> +++ b/arch/x86/kvm/ioapic.c
> >> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> >>                 spin_lock(&ioapic->lock);
> >>
> >> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> >> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
> >>                         continue;
> >>
> >>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> >> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >>         }
> >>  }
> >>
> >> +/* This should only be triggered by APIC EOI broadcast */
> >>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
> >>  {
> >>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
> >>
> >> +       /* If we'll be using direct EOI, skip broadcast */
> >> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >> +               return;
> >> +
> >
> > I've only seen the direct EOI sent for level irqs so I'm afraid that
> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
> > APIC_SPIV_DIRECTED_EOI flag is set.

Yes, if without your patch, it is problematic. But if with your patch,
__kvm_ioapic_update_eoi() will be called in ioapic mmio write then.

> >
> > Other than that it looks reasonable.
> 
> Although, wait, what if the guest uses APIC_SPIV_DIRECTED_EOI to
> suppress the broadcast but then does EOI by writing to the IOAPIC
> routing entry? You kind of indicated that this would be a valid use of
> the feature.

That's exactly how I understand it. :)

> This is what __eoi_ioapic_pin does for version<0x20 and
> on the host side we reset the remote_irr in ioapic_write_indirect if
> I'm reading the code correctly. Wouldn't we want to deliver the
> notification via kvm_notify_acked_irq in this case also?

I think in that case (EOI sent via "direct EOI" of ioapic mmio
register) the remote_irr is not cleaned in ioapic_write_indirect(),
but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
line:

		ent->fields.remote_irr = 0;

Right?

> 
> Thanks!
> 
> >>         spin_lock(&ioapic->lock);
> >>         __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
> >>         spin_unlock(&ioapic->lock);
> >> ---->8----
> >>
> >> This patch along will break kvm_notify_acked_irq() in some way I
> >> guess, but if with your patch (though will possibly need to boost
> >> IOAPIC version to 0x20 as well), it should work fine as long as guest
> >> remembers to send the direct EOI.
> >
> > Not sure about the version boost, especially since we don't have a
> > good spec to define what the version means. Maybe only if it helps
> > Linux performance. In theory __eoi_ioapic_pin should be causing fewer
> > vmexits with version>=0x20.

I think at least from the comment it means only ioapic version 0x20
has that EOI register, and that's how I understand it.

Btw, I would even optimize my above fix to the following, which I like
it better:

---->8----
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 6e219e5..67d0849 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
                kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
                spin_lock(&ioapic->lock);
 
-               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
-                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+               if (trigger_mode != IOAPIC_LEVEL_TRIG)
                        continue;
 
                ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bad6a25..11ee9b7 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1064,6 +1064,9 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
        if (!kvm_ioapic_handles_vector(apic, vector))
                return;
 
+       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+               return;
+
        /* Request a KVM exit to inform the userspace IOAPIC. */
        if (irqchip_split(apic->vcpu->kvm)) {
                apic->vcpu->arch.pending_ioapic_eoi = vector;
----8<----

Moving the APIC_SPIV_DIRECTED_EOI check into kvm_ioapic_send_eoi()
will make sure kvm will not send this useless EOI broadcast even the
ioapic is in userspace.

Thanks,

-- 
Peter Xu

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12 11:57           ` Peter Xu
@ 2017-04-12 12:20             ` Ladi Prosek
  2017-04-13  8:32               ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Ladi Prosek @ 2017-04-12 12:20 UTC (permalink / raw)
  To: Peter Xu; +Cc: KVM list, Radim Krcmar

On Wed, Apr 12, 2017 at 1:57 PM, Peter Xu <peterx@redhat.com> wrote:
> On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
>> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
>> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
>> >> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
>> >>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
>> >>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
>> >>> >> If the guest takes advantage of the directed EOI feature by setting
>> >>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
>> >>> >> the EOI register of the respective IOAPIC.
>> >>> >>
>> >>> >> From Intel's x2APIC Specification:
>> >>> >> "following the EOI to the local x2APIC unit for a level triggered
>> >>> >> interrupt, perform a directed EOI to the IOxAPIC generating the
>> >>> >> interrupt by writing to its EOI register."
>> >>> >>
>> >>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> >>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
>> >>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
>> >>> >> was later removed with the rest of IA64 support.
>> >>> >>
>> >>> >> The bug has gone undetected for a long time because Linux writes to
>> >>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
>> >>> >> seem to perform such a check.
>> >>> >
>> >>> > Hi, Ladi,
>> >>>
>> >>> Hi Peter,
>> >>>
>> >>> > Not sure I'm understanding it correctly... I see "direct EOI" is a
>> >>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
>> >>> > another feature for APIC. They are not the same feature, so it may not
>> >>> > be required to have them all together. IIUC current x86 kvm is just
>> >>> > the case - it supports EOI broadcast suppression on APIC, but it does
>> >>> > not support direct EOI on kernel IOAPIC.
>> >>>
>> >>> Thanks, that makes perfect sense and explains why Linux behaves the
>> >>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
>> >>>
>> >>> This document makes it look like suppress EOI-broadcast always implies
>> >>> directed EOI, though:
>> >>>
>> >>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>> >>>
>> >>> NB "The support for Directed EOI capability can be detected by means
>> >>> of bit 24 in the Local APIC Version Register. "
>> >>>
>> >>> There is no mention of APIC version or any other detection mechanism
>> >>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
>> >>> but I don't see that in the document either.
>> >>>
>> >>> I suspect that Microsoft implemented EOI by following this same spec.
>> >>> Level-triggered interrupts don't work right on Windows Server 2016
>> >>> with Hyper-V enabled without this patch.
>> >>
>> >> Yes, the documents for IOAPIC is always hard to find, at least for
>> >> me...
>> >>
>> >> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
>> >> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
>> >>
>> >> However I see nothing related to how the IOAPIC version is defined. In
>> >> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
>> >>
>> >>>
>> >>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
>> >>> > if IOAPIC does not support direct EOI (the guest can know it by
>> >>> > probing IOAPIC version). Please correct if I'm wrong.
>> >>>
>> >>> Yes, I think that the guest is to blame here. We might add that to the
>> >>> commit message.
>> >>
>> >> Agreed.
>> >>
>> >>>
>> >>> >>
>> >>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
>> >>> >> __kvm_ioapic_update_eoi.
>> >>> >>
>> >>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> >>> >> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
>> >>> >> ---
>> >>> >>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
>> >>> >>  arch/x86/kvm/ioapic.h |  1 +
>> >>> >>  2 files changed, 29 insertions(+), 18 deletions(-)
>> >>> >>
>> >>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> >>> >> index 289270a..8df1c6c 100644
>> >>> >> --- a/arch/x86/kvm/ioapic.c
>> >>> >> +++ b/arch/x86/kvm/ioapic.c
>> >>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
>> >>> >>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
>> >>> >>
>> >>> >>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>> >> -                     struct kvm_ioapic *ioapic, int vector, int trigger_mode)
>> >>> >> +                     struct kvm_ioapic *ioapic, int vector, int trigger_mode,
>> >>> >> +                     bool directed)
>> >>> >>  {
>> >>> >>       struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
>> >>> >>       struct kvm_lapic *apic = vcpu->arch.apic;
>> >>> >>       int i;
>> >>> >>
>> >>> >>       /* RTC special handling */
>> >>> >> -     if (test_bit(vcpu->vcpu_id, dest_map->map) &&
>> >>> >> +     if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
>> >>> >>           vector == dest_map->vectors[vcpu->vcpu_id])
>> >>> >>               rtc_irq_eoi(ioapic, vcpu);
>> >>> >>
>> >>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>> >>               if (ent->fields.vector != vector)
>> >>> >>                       continue;
>> >>> >>
>> >>> >> -             /*
>> >>> >> -              * We are dropping lock while calling ack notifiers because ack
>> >>> >> -              * notifier callbacks for assigned devices call into IOAPIC
>> >>> >> -              * recursively. Since remote_irr is cleared only after call
>> >>> >> -              * to notifiers if the same vector will be delivered while lock
>> >>> >> -              * is dropped it will be put into irr and will be delivered
>> >>> >> -              * after ack notifier returns.
>> >>> >> -              */
>> >>> >> -             spin_unlock(&ioapic->lock);
>> >>> >> -             kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>> >>> >> -             spin_lock(&ioapic->lock);
>> >>> >> -
>> >>> >> -             if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>> >>> >> -                 kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> >>> >> -                     continue;
>> >>> >> +             if (!directed) {
>> >>> >
>> >>> > Could I ask why we need to skip this if the EOI is sent via direct EOI
>> >>> > register of IOAPIC?
>> >>>
>> >>> Because it's already been done as part of the local EOI. With directed
>> >>> EOI we hit this function twice, first time when doing the local EOI
>> >>> and then the newly added code path for IOAPIC EOI with directed=true.
>> >>>
>> >>> I, again, followed the above mentioned document which explicitly
>> >>> dictates the sequence. And I mechanically split the function to the
>> >>> "local part' - what it had been doing up to the continue statement -
>> >>> and the "directed part" - what it had been skipping. I'll admit that
>> >>> my familiarity with this code is limited and there may be a better way
>> >>> to do this.
>> >>
>> >> Instead of the "!directed" flag (which is imho duplicated with what
>> >> APIC_SPIV_DIRECTED_EOI means), do you like below fix?
>> >>
>> >> -----8<-----
>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
>> >> index 6e219e5..78d3ec8 100644
>> >> --- a/arch/x86/kvm/ioapic.c
>> >> +++ b/arch/x86/kvm/ioapic.c
>> >> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>> >>                 spin_lock(&ioapic->lock);
>> >>
>> >> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
>> >> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> >> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
>> >>                         continue;
>> >>
>> >>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
>> >> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>> >>         }
>> >>  }
>> >>
>> >> +/* This should only be triggered by APIC EOI broadcast */
>> >>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
>> >>  {
>> >>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>> >>
>> >> +       /* If we'll be using direct EOI, skip broadcast */
>> >> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> >> +               return;
>> >> +
>> >
>> > I've only seen the direct EOI sent for level irqs so I'm afraid that
>> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
>> > APIC_SPIV_DIRECTED_EOI flag is set.
>
> Yes, if without your patch, it is problematic. But if with your patch,
> __kvm_ioapic_update_eoi() will be called in ioapic mmio write then.

No, there's no ioapic mmio write for edge-triggered interrupts, at
least Windows don't do any. Edge interrupts must still be handled on
the lapic EOI path.

>> >
>> > Other than that it looks reasonable.
>>
>> Although, wait, what if the guest uses APIC_SPIV_DIRECTED_EOI to
>> suppress the broadcast but then does EOI by writing to the IOAPIC
>> routing entry? You kind of indicated that this would be a valid use of
>> the feature.
>
> That's exactly how I understand it. :)
>
>> This is what __eoi_ioapic_pin does for version<0x20 and
>> on the host side we reset the remote_irr in ioapic_write_indirect if
>> I'm reading the code correctly. Wouldn't we want to deliver the
>> notification via kvm_notify_acked_irq in this case also?
>
> I think in that case (EOI sent via "direct EOI" of ioapic mmio
> register) the remote_irr is not cleaned in ioapic_write_indirect(),
> but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
> line:
>
>                 ent->fields.remote_irr = 0;
>
> Right?

Right, but I meant the other case, a guest that sets
APIC_SPIV_DIRECTED_EOI but doesn't write to IOAPIC_REG_EOI. With your
patch there would be no kvm_notify_acked_irq and that doesn't look
correct.

>>
>> Thanks!
>>
>> >>         spin_lock(&ioapic->lock);
>> >>         __kvm_ioapic_update_eoi(vcpu, ioapic, vector, trigger_mode);
>> >>         spin_unlock(&ioapic->lock);
>> >> ---->8----
>> >>
>> >> This patch along will break kvm_notify_acked_irq() in some way I
>> >> guess, but if with your patch (though will possibly need to boost
>> >> IOAPIC version to 0x20 as well), it should work fine as long as guest
>> >> remembers to send the direct EOI.
>> >
>> > Not sure about the version boost, especially since we don't have a
>> > good spec to define what the version means. Maybe only if it helps
>> > Linux performance. In theory __eoi_ioapic_pin should be causing fewer
>> > vmexits with version>=0x20.
>
> I think at least from the comment it means only ioapic version 0x20
> has that EOI register, and that's how I understand it.
>
> Btw, I would even optimize my above fix to the following, which I like
> it better:
>
> ---->8----
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 6e219e5..67d0849 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>                 spin_lock(&ioapic->lock);
>
> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
>                         continue;
>
>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index bad6a25..11ee9b7 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1064,6 +1064,9 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>         if (!kvm_ioapic_handles_vector(apic, vector))
>                 return;
>
> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +               return;
> +
>         /* Request a KVM exit to inform the userspace IOAPIC. */
>         if (irqchip_split(apic->vcpu->kvm)) {
>                 apic->vcpu->arch.pending_ioapic_eoi = vector;
> ----8<----
>
> Moving the APIC_SPIV_DIRECTED_EOI check into kvm_ioapic_send_eoi()
> will make sure kvm will not send this useless EOI broadcast even the
> ioapic is in userspace.
>
> Thanks,
>
> --
> Peter Xu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12  9:06     ` Peter Xu
  2017-04-12  9:37       ` Ladi Prosek
@ 2017-04-12 20:52       ` Radim Krcmar
  2017-04-13  6:36         ` Ladi Prosek
  1 sibling, 1 reply; 16+ messages in thread
From: Radim Krcmar @ 2017-04-12 20:52 UTC (permalink / raw)
  To: Peter Xu; +Cc: Ladi Prosek, KVM list

2017-04-12 17:06+0800, Peter Xu:
> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
> > On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
> > > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
> > >> If the guest takes advantage of the directed EOI feature by setting
> > >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
> > >> the EOI register of the respective IOAPIC.
> > >>
> > >> From Intel's x2APIC Specification:
> > >> "following the EOI to the local x2APIC unit for a level triggered
> > >> interrupt, perform a directed EOI to the IOxAPIC generating the
> > >> interrupt by writing to its EOI register."
> > >>
> > >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> > >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
> > >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
> > >> was later removed with the rest of IA64 support.
> > >>
> > >> The bug has gone undetected for a long time because Linux writes to
> > >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
> > >> seem to perform such a check.
> > >
> > > Hi, Ladi,
> > 
> > Hi Peter,
> > 
> > > Not sure I'm understanding it correctly... I see "direct EOI" is a
> > > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
> > > another feature for APIC. They are not the same feature, so it may not
> > > be required to have them all together. IIUC current x86 kvm is just
> > > the case - it supports EOI broadcast suppression on APIC, but it does
> > > not support direct EOI on kernel IOAPIC.
> > 
> > Thanks, that makes perfect sense and explains why Linux behaves the
> > way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
> > 
> > This document makes it look like suppress EOI-broadcast always implies
> > directed EOI, though:
> > 
> > http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
> > 
> > NB "The support for Directed EOI capability can be detected by means
> > of bit 24 in the Local APIC Version Register. "
> > 
> > There is no mention of APIC version or any other detection mechanism
> > for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
> > but I don't see that in the document either.
> > 
> > I suspect that Microsoft implemented EOI by following this same spec.
> > Level-triggered interrupts don't work right on Windows Server 2016
> > with Hyper-V enabled without this patch.
> 
> Yes, the documents for IOAPIC is always hard to find, at least for
> me...
> 
> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
> 
> However I see nothing related to how the IOAPIC version is defined. In
> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)

Yeah, it is officially described in ICH9 datasheet as:

  13.5.6 VER—Version Register (LPC I/F—D31:F0)
  Default Value: 00170020h

The one we emulate in KVM is in 82093AA datasheet:

  3.2.2.  IOAPICVER—IOAPIC VERSION REGISTER
  Default Value: 00170011h

The former has the EOI register, the latter doesn't.

---
I don't like the idea behind the patch, but it is acceptable and
thinking about good solutions gets us into compatibility nightmares ...
(We could remove support for directed EOI, because it is a detectable
 feature and makes little sense in KVM, or we could implement the IOAPIC
 version 0x20, but both would be tricky to migrate.)

People should switch to userspace IOAPIC anyway. :)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12 20:52       ` Radim Krcmar
@ 2017-04-13  6:36         ` Ladi Prosek
  2017-04-13 14:15           ` Radim Krcmar
  2017-04-14  5:27           ` Paolo Bonzini
  0 siblings, 2 replies; 16+ messages in thread
From: Ladi Prosek @ 2017-04-13  6:36 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: Peter Xu, KVM list

On Wed, Apr 12, 2017 at 10:52 PM, Radim Krcmar <rkrcmar@redhat.com> wrote:
> 2017-04-12 17:06+0800, Peter Xu:
>> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
>> > On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
>> > > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
>> > >> If the guest takes advantage of the directed EOI feature by setting
>> > >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
>> > >> the EOI register of the respective IOAPIC.
>> > >>
>> > >> From Intel's x2APIC Specification:
>> > >> "following the EOI to the local x2APIC unit for a level triggered
>> > >> interrupt, perform a directed EOI to the IOxAPIC generating the
>> > >> interrupt by writing to its EOI register."
>> > >>
>> > >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>> > >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
>> > >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
>> > >> was later removed with the rest of IA64 support.
>> > >>
>> > >> The bug has gone undetected for a long time because Linux writes to
>> > >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
>> > >> seem to perform such a check.
>> > >
>> > > Hi, Ladi,
>> >
>> > Hi Peter,
>> >
>> > > Not sure I'm understanding it correctly... I see "direct EOI" is a
>> > > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
>> > > another feature for APIC. They are not the same feature, so it may not
>> > > be required to have them all together. IIUC current x86 kvm is just
>> > > the case - it supports EOI broadcast suppression on APIC, but it does
>> > > not support direct EOI on kernel IOAPIC.
>> >
>> > Thanks, that makes perfect sense and explains why Linux behaves the
>> > way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
>> >
>> > This document makes it look like suppress EOI-broadcast always implies
>> > directed EOI, though:
>> >
>> > http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>> >
>> > NB "The support for Directed EOI capability can be detected by means
>> > of bit 24 in the Local APIC Version Register. "
>> >
>> > There is no mention of APIC version or any other detection mechanism
>> > for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
>> > but I don't see that in the document either.
>> >
>> > I suspect that Microsoft implemented EOI by following this same spec.
>> > Level-triggered interrupts don't work right on Windows Server 2016
>> > with Hyper-V enabled without this patch.
>>
>> Yes, the documents for IOAPIC is always hard to find, at least for
>> me...
>>
>> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
>> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
>>
>> However I see nothing related to how the IOAPIC version is defined. In
>> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
>
> Yeah, it is officially described in ICH9 datasheet as:
>
>   13.5.6 VER—Version Register (LPC I/F—D31:F0)
>   Default Value: 00170020h
>
> The one we emulate in KVM is in 82093AA datasheet:
>
>   3.2.2.  IOAPICVER—IOAPIC VERSION REGISTER
>   Default Value: 00170011h
>
> The former has the EOI register, the latter doesn't.

Got it. Do you want me to resubmit with a different commit message and
a comment blaming the guest OS instead of calling it a KVM bug? :) Or
do you think it's worth exploring Peter's suggestion to make a more
invasive fix?

> ---
> I don't like the idea behind the patch, but it is acceptable and
> thinking about good solutions gets us into compatibility nightmares ...
> (We could remove support for directed EOI, because it is a detectable
>  feature and makes little sense in KVM, or we could implement the IOAPIC
>  version 0x20, but both would be tricky to migrate.)
>
> People should switch to userspace IOAPIC anyway. :)

For the record, QEMU with kernel-irqchip=split works fine as it
emulates version 0x20 with the IOAPIC_EOI register by default.
kernel-irqchip=off does not seem to work, I will look into it.

Thanks!
Ladi

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-12 12:20             ` Ladi Prosek
@ 2017-04-13  8:32               ` Peter Xu
  2017-04-13 14:09                 ` Radim Krcmar
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2017-04-13  8:32 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: KVM list, Radim Krcmar

On Wed, Apr 12, 2017 at 02:20:12PM +0200, Ladi Prosek wrote:
> On Wed, Apr 12, 2017 at 1:57 PM, Peter Xu <peterx@redhat.com> wrote:
> > On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
> >> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
> >> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
> >> >> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
> >> >>> On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
> >> >>> > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
> >> >>> >> If the guest takes advantage of the directed EOI feature by setting
> >> >>> >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
> >> >>> >> the EOI register of the respective IOAPIC.
> >> >>> >>
> >> >>> >> From Intel's x2APIC Specification:
> >> >>> >> "following the EOI to the local x2APIC unit for a level triggered
> >> >>> >> interrupt, perform a directed EOI to the IOxAPIC generating the
> >> >>> >> interrupt by writing to its EOI register."
> >> >>> >>
> >> >>> >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> >> >>> >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
> >> >>> >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
> >> >>> >> was later removed with the rest of IA64 support.
> >> >>> >>
> >> >>> >> The bug has gone undetected for a long time because Linux writes to
> >> >>> >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
> >> >>> >> seem to perform such a check.
> >> >>> >
> >> >>> > Hi, Ladi,
> >> >>>
> >> >>> Hi Peter,
> >> >>>
> >> >>> > Not sure I'm understanding it correctly... I see "direct EOI" is a
> >> >>> > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
> >> >>> > another feature for APIC. They are not the same feature, so it may not
> >> >>> > be required to have them all together. IIUC current x86 kvm is just
> >> >>> > the case - it supports EOI broadcast suppression on APIC, but it does
> >> >>> > not support direct EOI on kernel IOAPIC.
> >> >>>
> >> >>> Thanks, that makes perfect sense and explains why Linux behaves the
> >> >>> way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
> >> >>>
> >> >>> This document makes it look like suppress EOI-broadcast always implies
> >> >>> directed EOI, though:
> >> >>>
> >> >>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
> >> >>>
> >> >>> NB "The support for Directed EOI capability can be detected by means
> >> >>> of bit 24 in the Local APIC Version Register. "
> >> >>>
> >> >>> There is no mention of APIC version or any other detection mechanism
> >> >>> for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
> >> >>> but I don't see that in the document either.
> >> >>>
> >> >>> I suspect that Microsoft implemented EOI by following this same spec.
> >> >>> Level-triggered interrupts don't work right on Windows Server 2016
> >> >>> with Hyper-V enabled without this patch.
> >> >>
> >> >> Yes, the documents for IOAPIC is always hard to find, at least for
> >> >> me...
> >> >>
> >> >> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
> >> >> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
> >> >>
> >> >> However I see nothing related to how the IOAPIC version is defined. In
> >> >> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
> >> >>
> >> >>>
> >> >>> > I think the problem is why the guest setup APIC_SPIV_DIRECTED_EOI even
> >> >>> > if IOAPIC does not support direct EOI (the guest can know it by
> >> >>> > probing IOAPIC version). Please correct if I'm wrong.
> >> >>>
> >> >>> Yes, I think that the guest is to blame here. We might add that to the
> >> >>> commit message.
> >> >>
> >> >> Agreed.
> >> >>
> >> >>>
> >> >>> >>
> >> >>> >> This commit re-adds IOAPIC_REG_EOI and implements it in terms of
> >> >>> >> __kvm_ioapic_update_eoi.
> >> >>> >>
> >> >>> >> Fixes: fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
> >> >>> >> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
> >> >>> >> ---
> >> >>> >>  arch/x86/kvm/ioapic.c | 46 ++++++++++++++++++++++++++++------------------
> >> >>> >>  arch/x86/kvm/ioapic.h |  1 +
> >> >>> >>  2 files changed, 29 insertions(+), 18 deletions(-)
> >> >>> >>
> >> >>> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> >> >>> >> index 289270a..8df1c6c 100644
> >> >>> >> --- a/arch/x86/kvm/ioapic.c
> >> >>> >> +++ b/arch/x86/kvm/ioapic.c
> >> >>> >> @@ -415,14 +415,15 @@ static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
> >> >>> >>  #define IOAPIC_SUCCESSIVE_IRQ_MAX_COUNT 10000
> >> >>> >>
> >> >>> >>  static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >> >>> >> -                     struct kvm_ioapic *ioapic, int vector, int trigger_mode)
> >> >>> >> +                     struct kvm_ioapic *ioapic, int vector, int trigger_mode,
> >> >>> >> +                     bool directed)
> >> >>> >>  {
> >> >>> >>       struct dest_map *dest_map = &ioapic->rtc_status.dest_map;
> >> >>> >>       struct kvm_lapic *apic = vcpu->arch.apic;
> >> >>> >>       int i;
> >> >>> >>
> >> >>> >>       /* RTC special handling */
> >> >>> >> -     if (test_bit(vcpu->vcpu_id, dest_map->map) &&
> >> >>> >> +     if (!directed && test_bit(vcpu->vcpu_id, dest_map->map) &&
> >> >>> >>           vector == dest_map->vectors[vcpu->vcpu_id])
> >> >>> >>               rtc_irq_eoi(ioapic, vcpu);
> >> >>> >>
> >> >>> >> @@ -432,21 +433,23 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >> >>> >>               if (ent->fields.vector != vector)
> >> >>> >>                       continue;
> >> >>> >>
> >> >>> >> -             /*
> >> >>> >> -              * We are dropping lock while calling ack notifiers because ack
> >> >>> >> -              * notifier callbacks for assigned devices call into IOAPIC
> >> >>> >> -              * recursively. Since remote_irr is cleared only after call
> >> >>> >> -              * to notifiers if the same vector will be delivered while lock
> >> >>> >> -              * is dropped it will be put into irr and will be delivered
> >> >>> >> -              * after ack notifier returns.
> >> >>> >> -              */
> >> >>> >> -             spin_unlock(&ioapic->lock);
> >> >>> >> -             kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> >> >>> >> -             spin_lock(&ioapic->lock);
> >> >>> >> -
> >> >>> >> -             if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> >> >>> >> -                 kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >> >>> >> -                     continue;
> >> >>> >> +             if (!directed) {
> >> >>> >
> >> >>> > Could I ask why we need to skip this if the EOI is sent via direct EOI
> >> >>> > register of IOAPIC?
> >> >>>
> >> >>> Because it's already been done as part of the local EOI. With directed
> >> >>> EOI we hit this function twice, first time when doing the local EOI
> >> >>> and then the newly added code path for IOAPIC EOI with directed=true.
> >> >>>
> >> >>> I, again, followed the above mentioned document which explicitly
> >> >>> dictates the sequence. And I mechanically split the function to the
> >> >>> "local part' - what it had been doing up to the continue statement -
> >> >>> and the "directed part" - what it had been skipping. I'll admit that
> >> >>> my familiarity with this code is limited and there may be a better way
> >> >>> to do this.
> >> >>
> >> >> Instead of the "!directed" flag (which is imho duplicated with what
> >> >> APIC_SPIV_DIRECTED_EOI means), do you like below fix?
> >> >>
> >> >> -----8<-----
> >> >> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> >> >> index 6e219e5..78d3ec8 100644
> >> >> --- a/arch/x86/kvm/ioapic.c
> >> >> +++ b/arch/x86/kvm/ioapic.c
> >> >> @@ -444,8 +444,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >> >>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
> >> >>                 spin_lock(&ioapic->lock);
> >> >>
> >> >> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> >> >> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >> >> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
> >> >>                         continue;
> >> >>
> >> >>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> >> >> @@ -473,10 +472,15 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
> >> >>         }
> >> >>  }
> >> >>
> >> >> +/* This should only be triggered by APIC EOI broadcast */
> >> >>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
> >> >>  {
> >> >>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
> >> >>
> >> >> +       /* If we'll be using direct EOI, skip broadcast */
> >> >> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> >> >> +               return;
> >> >> +
> >> >
> >> > I've only seen the direct EOI sent for level irqs so I'm afraid that
> >> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
> >> > APIC_SPIV_DIRECTED_EOI flag is set.
> >
> > Yes, if without your patch, it is problematic. But if with your patch,
> > __kvm_ioapic_update_eoi() will be called in ioapic mmio write then.
> 
> No, there's no ioapic mmio write for edge-triggered interrupts, at
> least Windows don't do any. Edge interrupts must still be handled on
> the lapic EOI path.

I see. Yes it may break that. Though I don't know in what case someone
would register to this IOAPIC eoi message if it's edge-triggered...

> 
> >> >
> >> > Other than that it looks reasonable.
> >>
> >> Although, wait, what if the guest uses APIC_SPIV_DIRECTED_EOI to
> >> suppress the broadcast but then does EOI by writing to the IOAPIC
> >> routing entry? You kind of indicated that this would be a valid use of
> >> the feature.
> >
> > That's exactly how I understand it. :)
> >
> >> This is what __eoi_ioapic_pin does for version<0x20 and
> >> on the host side we reset the remote_irr in ioapic_write_indirect if
> >> I'm reading the code correctly. Wouldn't we want to deliver the
> >> notification via kvm_notify_acked_irq in this case also?
> >
> > I think in that case (EOI sent via "direct EOI" of ioapic mmio
> > register) the remote_irr is not cleaned in ioapic_write_indirect(),
> > but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
> > line:
> >
> >                 ent->fields.remote_irr = 0;
> >
> > Right?
> 
> Right, but I meant the other case, a guest that sets
> APIC_SPIV_DIRECTED_EOI but doesn't write to IOAPIC_REG_EOI. With your
> patch there would be no kvm_notify_acked_irq and that doesn't look
> correct.

For level triggered irq, if it sets APIC_SPIV_DIRECTED_EOI but doesn't
write IOAPIC_REG_EOI, it should be a guest bug, right?

Thinking about the migration issue that Radim has mentioned, I agree
maybe we can consider just revert the suppress eoi broadcast. Actually
now I start to wonder why direct EOI is introduced. I think it should
be for performance's sake on systems has lots of ioapics? But looks
like that's normally not the case, at least for kvm and most general
machines, which only has one ioapic.

Thanks,

-- 
Peter Xu

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-13  8:32               ` Peter Xu
@ 2017-04-13 14:09                 ` Radim Krcmar
  2017-04-13 15:11                   ` Ladi Prosek
  2017-04-14  5:28                   ` Paolo Bonzini
  0 siblings, 2 replies; 16+ messages in thread
From: Radim Krcmar @ 2017-04-13 14:09 UTC (permalink / raw)
  To: Peter Xu; +Cc: Ladi Prosek, KVM list

2017-04-13 16:32+0800, Peter Xu:
> On Wed, Apr 12, 2017 at 02:20:12PM +0200, Ladi Prosek wrote:
> > On Wed, Apr 12, 2017 at 1:57 PM, Peter Xu <peterx@redhat.com> wrote:
> > > On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
> > >> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
> > >> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
> > >> >>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
> > >> >>  {
> > >> >>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
> > >> >>
> > >> >> +       /* If we'll be using direct EOI, skip broadcast */
> > >> >> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> > >> >> +               return;
> > >> >> +
> > >> >
> > >> > I've only seen the direct EOI sent for level irqs so I'm afraid that
> > >> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
> > >> > APIC_SPIV_DIRECTED_EOI flag is set.
> > >
> > > Yes, if without your patch, it is problematic. But if with your patch,
> > > __kvm_ioapic_update_eoi() will be called in ioapic mmio write then.
> > 
> > No, there's no ioapic mmio write for edge-triggered interrupts, at
> > least Windows don't do any. Edge interrupts must still be handled on
> > the lapic EOI path.
> 
> I see. Yes it may break that. Though I don't know in what case someone
> would register to this IOAPIC eoi message if it's edge-triggered...

I don't know of any sane case ...

KVM PIT (arch/x86/kvm/i8254.c) uses it to keep track of lost edge
interrupts so it could re-inject them.
Windows needed it for reasonable timeflow, IIRC.

The userspace probably uses it for similar reasons.

We have to keep the edge notifier working. :(

> > >> This is what __eoi_ioapic_pin does for version<0x20 and
> > >> on the host side we reset the remote_irr in ioapic_write_indirect if
> > >> I'm reading the code correctly. Wouldn't we want to deliver the
> > >> notification via kvm_notify_acked_irq in this case also?
> > >
> > > I think in that case (EOI sent via "direct EOI" of ioapic mmio
> > > register) the remote_irr is not cleaned in ioapic_write_indirect(),
> > > but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
> > > line:
> > >
> > >                 ent->fields.remote_irr = 0;
> > >
> > > Right?
> > 
> > Right, but I meant the other case, a guest that sets
> > APIC_SPIV_DIRECTED_EOI but doesn't write to IOAPIC_REG_EOI. With your
> > patch there would be no kvm_notify_acked_irq and that doesn't look
> > correct.
> 
> For level triggered irq, if it sets APIC_SPIV_DIRECTED_EOI but doesn't
> write IOAPIC_REG_EOI, it should be a guest bug, right?

Yes.

> Thinking about the migration issue that Radim has mentioned, I agree
> maybe we can consider just revert the suppress eoi broadcast. Actually
> now I start to wonder why direct EOI is introduced. I think it should
> be for performance's sake on systems has lots of ioapics? But looks
> like that's normally not the case, at least for kvm and most general
> machines, which only has one ioapic.

That is what I thought as well.

I am wondering why Windows uses it -- maybe they remap vector numbers
and therefore need to EOI a different vector?

The feature might have been introduced because some OS assumed presence
of the feature if x2APIC was available ...
Right, reverting it is pretty safe, because it was broken before. :)
(And we ignore the SPIV completely when forwarding EOI to userspace.)

Does Windows 2016 work with this patch?

---8<---
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index bdff437acbcb..6a1a94d63c52 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -442,8 +442,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
 		kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
 		spin_lock(&ioapic->lock);
 
-		if (trigger_mode != IOAPIC_LEVEL_TRIG ||
-		    kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
+		if (trigger_mode != IOAPIC_LEVEL_TRIG)
 			continue;
 
 		ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index bad6a25067bc..20fdd93d7dd4 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -319,8 +319,6 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
 		return;
 
 	feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
-	if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))))
-		v |= APIC_LVR_DIRECTED_EOI;
 	kvm_lapic_set_reg(apic, APIC_LVR, v);
 }
 
@@ -1636,8 +1634,6 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
 
 	case APIC_SPIV: {
 		u32 mask = 0x3ff;
-		if (kvm_lapic_get_reg(apic, APIC_LVR) & APIC_LVR_DIRECTED_EOI)
-			mask |= APIC_SPIV_DIRECTED_EOI;
 		apic_set_spiv(apic, val & mask);
 		if (!(val & APIC_SPIV_APIC_ENABLED)) {
 			int i;

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-13  6:36         ` Ladi Prosek
@ 2017-04-13 14:15           ` Radim Krcmar
  2017-04-14  5:27           ` Paolo Bonzini
  1 sibling, 0 replies; 16+ messages in thread
From: Radim Krcmar @ 2017-04-13 14:15 UTC (permalink / raw)
  To: Ladi Prosek; +Cc: Peter Xu, KVM list

2017-04-13 08:36+0200, Ladi Prosek:
> On Wed, Apr 12, 2017 at 10:52 PM, Radim Krcmar <rkrcmar@redhat.com> wrote:
>> 2017-04-12 17:06+0800, Peter Xu:
>>> On Wed, Apr 12, 2017 at 09:36:58AM +0200, Ladi Prosek wrote:
>>> > On Wed, Apr 12, 2017 at 8:40 AM, Peter Xu <peterx@redhat.com> wrote:
>>> > > On Tue, Apr 11, 2017 at 04:11:15PM +0200, Ladi Prosek wrote:
>>> > >> If the guest takes advantage of the directed EOI feature by setting
>>> > >> APIC_SPIV_DIRECTED_EOI, it is expected to signal EOI by writing to
>>> > >> the EOI register of the respective IOAPIC.
>>> > >>
>>> > >> From Intel's x2APIC Specification:
>>> > >> "following the EOI to the local x2APIC unit for a level triggered
>>> > >> interrupt, perform a directed EOI to the IOxAPIC generating the
>>> > >> interrupt by writing to its EOI register."
>>> > >>
>>> > >> Commit fc61b800f950 ("KVM: Add Directed EOI support to APIC emulation")
>>> > >> inhibited EOI on LAPIC EOI register write but didn't add the IOAPIC
>>> > >> part. IOAPIC_REG_EOI writes were handled only on IA64 and the code
>>> > >> was later removed with the rest of IA64 support.
>>> > >>
>>> > >> The bug has gone undetected for a long time because Linux writes to
>>> > >> IOAPIC_REG_EOI only if the IOAPIC version is >=0x20. Windows doesn't
>>> > >> seem to perform such a check.
>>> > >
>>> > > Hi, Ladi,
>>> >
>>> > Hi Peter,
>>> >
>>> > > Not sure I'm understanding it correctly... I see "direct EOI" is a
>>> > > feature for IOAPIC version 0x20, while "suppress EOI-broadcast" is
>>> > > another feature for APIC. They are not the same feature, so it may not
>>> > > be required to have them all together. IIUC current x86 kvm is just
>>> > > the case - it supports EOI broadcast suppression on APIC, but it does
>>> > > not support direct EOI on kernel IOAPIC.
>>> >
>>> > Thanks, that makes perfect sense and explains why Linux behaves the
>>> > way it does (__eoi_ioapic_pin in arch/x86/kernel/apic/io_apic.c).
>>> >
>>> > This document makes it look like suppress EOI-broadcast always implies
>>> > directed EOI, though:
>>> >
>>> > http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>>> >
>>> > NB "The support for Directed EOI capability can be detected by means
>>> > of bit 24 in the Local APIC Version Register. "
>>> >
>>> > There is no mention of APIC version or any other detection mechanism
>>> > for directed EOI. Maybe the chip being x2APIC implies version >= 0x20
>>> > but I don't see that in the document either.
>>> >
>>> > I suspect that Microsoft implemented EOI by following this same spec.
>>> > Level-triggered interrupts don't work right on Windows Server 2016
>>> > with Hyper-V enabled without this patch.
>>>
>>> Yes, the documents for IOAPIC is always hard to find, at least for
>>> me...
>>>
>>> There is some pages mentioned IOAPIC in ICH9 manual on chap 13.5 here:
>>> http://www.intel.com/content/dam/doc/datasheet/io-controller-hub-9-datasheet.pdf
>>>
>>> However I see nothing related to how the IOAPIC version is defined. In
>>> that sense, the comment above __eoi_ioapic_pin() seems to be better. :)
>>
>> Yeah, it is officially described in ICH9 datasheet as:
>>
>>   13.5.6 VER—Version Register (LPC I/F—D31:F0)
>>   Default Value: 00170020h
>>
>> The one we emulate in KVM is in 82093AA datasheet:
>>
>>   3.2.2.  IOAPICVER—IOAPIC VERSION REGISTER
>>   Default Value: 00170011h
>>
>> The former has the EOI register, the latter doesn't.
> 
> Got it. Do you want me to resubmit with a different commit message and
> a comment blaming the guest OS instead of calling it a KVM bug? :) Or
> do you think it's worth exploring Peter's suggestion to make a more
> invasive fix?

I would experiment with reverting the APIC suppress-EOI-broadcast first.
It doesn't currently work well and is pointless in theory.

>> ---
>> I don't like the idea behind the patch, but it is acceptable and
>> thinking about good solutions gets us into compatibility nightmares ...
>> (We could remove support for directed EOI, because it is a detectable
>>  feature and makes little sense in KVM, or we could implement the IOAPIC
>>  version 0x20, but both would be tricky to migrate.)
>>
>> People should switch to userspace IOAPIC anyway. :)
> 
> For the record, QEMU with kernel-irqchip=split works fine as it
> emulates version 0x20 with the IOAPIC_EOI register by default.

Great, thanks.

> kernel-irqchip=off does not seem to work, I will look into it.

Well, good luck. :)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-13 14:09                 ` Radim Krcmar
@ 2017-04-13 15:11                   ` Ladi Prosek
  2017-04-14  5:28                   ` Paolo Bonzini
  1 sibling, 0 replies; 16+ messages in thread
From: Ladi Prosek @ 2017-04-13 15:11 UTC (permalink / raw)
  To: Radim Krcmar; +Cc: Peter Xu, KVM list

On Thu, Apr 13, 2017 at 4:09 PM, Radim Krcmar <rkrcmar@redhat.com> wrote:
> 2017-04-13 16:32+0800, Peter Xu:
>> On Wed, Apr 12, 2017 at 02:20:12PM +0200, Ladi Prosek wrote:
>> > On Wed, Apr 12, 2017 at 1:57 PM, Peter Xu <peterx@redhat.com> wrote:
>> > > On Wed, Apr 12, 2017 at 12:28:25PM +0200, Ladi Prosek wrote:
>> > >> On Wed, Apr 12, 2017 at 11:37 AM, Ladi Prosek <lprosek@redhat.com> wrote:
>> > >> > On Wed, Apr 12, 2017 at 11:06 AM, Peter Xu <peterx@redhat.com> wrote:
>> > >> >>  void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector, int trigger_mode)
>> > >> >>  {
>> > >> >>         struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic;
>> > >> >>
>> > >> >> +       /* If we'll be using direct EOI, skip broadcast */
>> > >> >> +       if (kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
>> > >> >> +               return;
>> > >> >> +
>> > >> >
>> > >> > I've only seen the direct EOI sent for level irqs so I'm afraid that
>> > >> > __kvm_ioapic_update_eoi needs to run for edge-triggered even if the
>> > >> > APIC_SPIV_DIRECTED_EOI flag is set.
>> > >
>> > > Yes, if without your patch, it is problematic. But if with your patch,
>> > > __kvm_ioapic_update_eoi() will be called in ioapic mmio write then.
>> >
>> > No, there's no ioapic mmio write for edge-triggered interrupts, at
>> > least Windows don't do any. Edge interrupts must still be handled on
>> > the lapic EOI path.
>>
>> I see. Yes it may break that. Though I don't know in what case someone
>> would register to this IOAPIC eoi message if it's edge-triggered...
>
> I don't know of any sane case ...
>
> KVM PIT (arch/x86/kvm/i8254.c) uses it to keep track of lost edge
> interrupts so it could re-inject them.
> Windows needed it for reasonable timeflow, IIRC.
>
> The userspace probably uses it for similar reasons.
>
> We have to keep the edge notifier working. :(
>
>> > >> This is what __eoi_ioapic_pin does for version<0x20 and
>> > >> on the host side we reset the remote_irr in ioapic_write_indirect if
>> > >> I'm reading the code correctly. Wouldn't we want to deliver the
>> > >> notification via kvm_notify_acked_irq in this case also?
>> > >
>> > > I think in that case (EOI sent via "direct EOI" of ioapic mmio
>> > > register) the remote_irr is not cleaned in ioapic_write_indirect(),
>> > > but it's cleaned in __kvm_ioapic_update_eoi() as well. It has the
>> > > line:
>> > >
>> > >                 ent->fields.remote_irr = 0;
>> > >
>> > > Right?
>> >
>> > Right, but I meant the other case, a guest that sets
>> > APIC_SPIV_DIRECTED_EOI but doesn't write to IOAPIC_REG_EOI. With your
>> > patch there would be no kvm_notify_acked_irq and that doesn't look
>> > correct.
>>
>> For level triggered irq, if it sets APIC_SPIV_DIRECTED_EOI but doesn't
>> write IOAPIC_REG_EOI, it should be a guest bug, right?
>
> Yes.
>
>> Thinking about the migration issue that Radim has mentioned, I agree
>> maybe we can consider just revert the suppress eoi broadcast. Actually
>> now I start to wonder why direct EOI is introduced. I think it should
>> be for performance's sake on systems has lots of ioapics? But looks
>> like that's normally not the case, at least for kvm and most general
>> machines, which only has one ioapic.
>
> That is what I thought as well.
>
> I am wondering why Windows uses it -- maybe they remap vector numbers
> and therefore need to EOI a different vector?

Maybe it has something to do with how they assign devices to the root
Hyper-V partition. I've seen the issue only with NICs, if it helps.

> The feature might have been introduced because some OS assumed presence
> of the feature if x2APIC was available ...
> Right, reverting it is pretty safe, because it was broken before. :)
> (And we ignore the SPIV completely when forwarding EOI to userspace.)
>
> Does Windows 2016 work with this patch?

Yes, this patch works.

> ---8<---
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index bdff437acbcb..6a1a94d63c52 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -442,8 +442,7 @@ static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu,
>                 kvm_notify_acked_irq(ioapic->kvm, KVM_IRQCHIP_IOAPIC, i);
>                 spin_lock(&ioapic->lock);
>
> -               if (trigger_mode != IOAPIC_LEVEL_TRIG ||
> -                   kvm_lapic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI)
> +               if (trigger_mode != IOAPIC_LEVEL_TRIG)
>                         continue;
>
>                 ASSERT(ent->fields.trig_mode == IOAPIC_LEVEL_TRIG);
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index bad6a25067bc..20fdd93d7dd4 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -319,8 +319,6 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
>                 return;
>
>         feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
> -       if (feat && (feat->ecx & (1 << (X86_FEATURE_X2APIC & 31))))
> -               v |= APIC_LVR_DIRECTED_EOI;
>         kvm_lapic_set_reg(apic, APIC_LVR, v);
>  }
>
> @@ -1636,8 +1634,6 @@ int kvm_lapic_reg_write(struct kvm_lapic *apic, u32 reg, u32 val)
>
>         case APIC_SPIV: {
>                 u32 mask = 0x3ff;
> -               if (kvm_lapic_get_reg(apic, APIC_LVR) & APIC_LVR_DIRECTED_EOI)
> -                       mask |= APIC_SPIV_DIRECTED_EOI;
>                 apic_set_spiv(apic, val & mask);
>                 if (!(val & APIC_SPIV_APIC_ENABLED)) {
>                         int i;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-13  6:36         ` Ladi Prosek
  2017-04-13 14:15           ` Radim Krcmar
@ 2017-04-14  5:27           ` Paolo Bonzini
  1 sibling, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2017-04-14  5:27 UTC (permalink / raw)
  To: Ladi Prosek, Radim Krcmar; +Cc: Peter Xu, KVM list



On 13/04/2017 14:36, Ladi Prosek wrote:
> For the record, QEMU with kernel-irqchip=split works fine as it
> emulates version 0x20 with the IOAPIC_EOI register by default.
> kernel-irqchip=off does not seem to work, I will look into it.

There's a lot more breakage to kernel-irqchip=off, and it's more or less
on life support.  Don't waste your time on it.

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support
  2017-04-13 14:09                 ` Radim Krcmar
  2017-04-13 15:11                   ` Ladi Prosek
@ 2017-04-14  5:28                   ` Paolo Bonzini
  1 sibling, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2017-04-14  5:28 UTC (permalink / raw)
  To: Radim Krcmar, Peter Xu; +Cc: Ladi Prosek, KVM list



On 13/04/2017 22:09, Radim Krcmar wrote:
> KVM PIT (arch/x86/kvm/i8254.c) uses it to keep track of lost edge
> interrupts so it could re-inject them.
> Windows needed it for reasonable timeflow, IIRC.

This is for old Linux.

Windows needs the RTC tracking, which also uses the irq ack notifier.

> The feature might have been introduced because some OS assumed presence
> of the feature if x2APIC was available ...
> Right, reverting it is pretty safe, because it was broken before. 
> (And we ignore the SPIV completely when forwarding EOI to userspace.)

Yeah, that's the important part.  I'm a bit worried of the consequences
for migration, but AIUI you think it's fine?  (That is, too many EOIs
are not a problem, only too few are).

Paolo

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2017-04-14  5:28 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-11 14:11 [PATCH] KVM: x86: implement IOAPIC_REG_EOI for directed EOI support Ladi Prosek
2017-04-12  6:40 ` Peter Xu
2017-04-12  7:36   ` Ladi Prosek
2017-04-12  9:06     ` Peter Xu
2017-04-12  9:37       ` Ladi Prosek
2017-04-12 10:28         ` Ladi Prosek
2017-04-12 11:57           ` Peter Xu
2017-04-12 12:20             ` Ladi Prosek
2017-04-13  8:32               ` Peter Xu
2017-04-13 14:09                 ` Radim Krcmar
2017-04-13 15:11                   ` Ladi Prosek
2017-04-14  5:28                   ` Paolo Bonzini
2017-04-12 20:52       ` Radim Krcmar
2017-04-13  6:36         ` Ladi Prosek
2017-04-13 14:15           ` Radim Krcmar
2017-04-14  5:27           ` Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.