qemu-devel.nongnu.org archive mirror
* [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE
@ 2022-04-27  4:36 Alexey Kardashevskiy
  2022-04-27  7:36 ` Cédric Le Goater
  0 siblings, 1 reply; 6+ messages in thread
From: Alexey Kardashevskiy @ 2022-04-27  4:36 UTC
  To: qemu-ppc
  Cc: Alex Williamson, Alexey Kardashevskiy, qemu-devel,
	Timothy Pearson, Cédric Le Goater, Frederic Barrat,
	David Gibson

VFIO-PCI has a "KVM_IRQFD_FLAG_RESAMPLE" optimization for INTx EOI
handling, where KVM can unmask PCI INTx (a level-triggered interrupt)
without switching to userspace (== QEMU).

Unfortunately XIVE does not support level interrupts; QEMU emulates them
and therefore there is no existing code path to kick the resamplefd.
The problem appears when passing through a PCI adapter with
the "pci=nomsi" kernel parameter - the adapter's interrupt
count in /proc/interrupts will be stuck at "1".

This disables the resampler when the XIVE interrupt controller is configured.
This should not be very visible since KVM already exits to QEMU for INTx
and XIVE-capable boxes (POWER9 and newer) do not seem to have
performance-critical INTx-only devices.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---


Cédric, this is what I meant when I said that spapr_pci.c was unaware of
the interrupt controller type, neither xics nor xive was mentioned
in the file before.


---
 hw/ppc/spapr_pci.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 5bfd4aa9e5aa..2675052601db 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -729,11 +729,19 @@ static void pci_spapr_set_irq(void *opaque, int irq_num, int level)
 
 static PCIINTxRoute spapr_route_intx_pin_to_irq(void *opaque, int pin)
 {
+    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
     SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(opaque);
-    PCIINTxRoute route;
+    PCIINTxRoute route = { .mode = PCI_INTX_DISABLED };
 
-    route.mode = PCI_INTX_ENABLED;
-    route.irq = sphb->lsi_table[pin].irq;
+    /*
+     * Disable IRQFD resampler on XIVE as it does not support LSI and QEMU
+     * emulates those so the KVM kernel resamplefd kick is skipped and EOI
+     * is not delivered to VFIO-PCI.
+     */
+    if (!spapr->xive) {
+        route.mode = PCI_INTX_ENABLED;
+        route.irq = sphb->lsi_table[pin].irq;
+    }
 
     return route;
 }
-- 
2.30.2
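
The effect of the change above can be modelled outside QEMU. The sketch
below is a simplified stand-in, not QEMU code: IntxRouteSketch,
route_intx_sketch() and uses_kvm_resample_sketch() are hypothetical names,
and the second function encodes an assumption about the consumer, namely
that the in-kernel resample path is only set up for a route whose mode is
enabled.

```c
#include <stdbool.h>

/* Simplified stand-ins for QEMU's INTx route types (not the real ones). */
typedef enum {
    INTX_SKETCH_DISABLED,
    INTX_SKETCH_ENABLED,
} IntxModeSketch;

typedef struct {
    IntxModeSketch mode;
    int irq;
} IntxRouteSketch;

/* Mirrors the patched logic: publish a route only when XIVE is absent. */
static IntxRouteSketch route_intx_sketch(bool xive_present, int lsi_irq)
{
    /* As in the patch, irq stays zero-initialized when the route is off. */
    IntxRouteSketch route = { .mode = INTX_SKETCH_DISABLED };

    if (!xive_present) {
        route.mode = INTX_SKETCH_ENABLED;
        route.irq = lsi_irq;
    }
    return route;
}

/*
 * Assumed consumer behaviour: the resamplefd bypass is used only when KVM
 * offers it and the route is enabled; otherwise EOI goes through userspace.
 */
static bool uses_kvm_resample_sketch(IntxRouteSketch route,
                                     bool kvm_has_resamplefd)
{
    return kvm_has_resamplefd && route.mode == INTX_SKETCH_ENABLED;
}
```

With xive_present set, the route comes back disabled and the model falls
back to userspace EOI handling even if KVM offers resamplefd.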




* Re: [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE
  2022-04-27  4:36 [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE Alexey Kardashevskiy
@ 2022-04-27  7:36 ` Cédric Le Goater
  2022-04-28  5:32   ` Alexey Kardashevskiy
  0 siblings, 1 reply; 6+ messages in thread
From: Cédric Le Goater @ 2022-04-27  7:36 UTC
  To: Alexey Kardashevskiy, qemu-ppc
  Cc: Frederic Barrat, Timothy Pearson, Alex Williamson, qemu-devel,
	David Gibson

Hello Alexey,

On 4/27/22 06:36, Alexey Kardashevskiy wrote:
> VFIO-PCI has an "KVM_IRQFD_FLAG_RESAMPLE" optimization for INTx EOI
> handling when KVM can unmask PCI INTx (level triggered interrupt) without
> switching to the userspace (==QEMU).
> 
> Unfortunately XIVE does not support level interrupts, 

That's not correctly phrased I think.

The QEMU XIVE device supports LSIs but the POWER9 kernel-irqchips,
KVM XICS-on-XIVE and XIVE native devices, are broken with respect
to passthrough adapters using INTx.


> QEMU emulates them
> and therefore there is no existing code path to kick the resamplefd.
> The problem appears when passing through a PCI adapter with
> the "pci=nomsi" kernel parameter - the adapter's interrupt interrupt
> count in /proc/interrupts will stuck at "1".
> 
> This disables resampler when the XIVE interrupt controller is configured.
> This should not be very visible though KVM already exits to QEMU for INTx
> and XIVE-capable boxes (POWER9 and newer) do not seem to have
> performance-critical INTx-only capable devices.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> 
> 
> Cédric, this is what I meant when I said that spapr_pci.c was unaware of
> the interrupt controller type, neither xics nor xive was mentioned
> in the file before.
> 
> 
> ---
>   hw/ppc/spapr_pci.c | 14 +++++++++++---
>   1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
> index 5bfd4aa9e5aa..2675052601db 100644
> --- a/hw/ppc/spapr_pci.c
> +++ b/hw/ppc/spapr_pci.c
> @@ -729,11 +729,19 @@ static void pci_spapr_set_irq(void *opaque, int irq_num, int level)
>   
>   static PCIINTxRoute spapr_route_intx_pin_to_irq(void *opaque, int pin)
>   {
> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>       SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(opaque);
> -    PCIINTxRoute route;
> +    PCIINTxRoute route = { .mode = PCI_INTX_DISABLED };
>   
> -    route.mode = PCI_INTX_ENABLED;
> -    route.irq = sphb->lsi_table[pin].irq;
> +    /*
> +     * Disable IRQFD resampler on XIVE as it does not support LSI and QEMU
> +     * emulates those so the KVM kernel resamplefd kick is skipped and EOI
> +     * is not delivered to VFIO-PCI.
> +     */
> +    if (!spapr->xive) {

This is testing the availability of the XIVE interrupt mode, but not
the active controller. See spapr_irq_init() which is called very
early in the machine initialization.

Is that what we want ? Is everything fine if we start the machine with
ic-mode=xics ? On a POWER9 host, this would use the KVM XICS-on-XIVE
device which is broken also AFAICT.
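
The difference between "available" and "active" can be sketched with a
small standalone model. Everything here is hypothetical (MachineSketch,
IntcSketch and the two helpers are illustration names, not QEMU types); it
assumes a machine that keeps one pointer per available controller plus a
separate pointer to the controller currently in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an interrupt controller object. */
typedef struct {
    const char *name;
} IntcSketch;

/* Hypothetical machine state: per-mode pointers plus the active one. */
typedef struct {
    IntcSketch *xics;        /* non-NULL when the XICS mode is available */
    IntcSketch *xive;        /* non-NULL when the XIVE mode is available */
    IntcSketch *active_intc; /* controller currently in use */
} MachineSketch;

/* What a bare pointer test answers: can XIVE be used at all? */
static bool xive_available_sketch(const MachineSketch *m)
{
    return m->xive != NULL;
}

/* What the routing decision would need: is XIVE in use right now? */
static bool xive_active_sketch(const MachineSketch *m)
{
    return m->xive != NULL && m->active_intc == m->xive;
}
```

In this model a machine offering both modes but currently running XICS
reports XIVE as available and not active. Even the "active" test would not
cover the XICS-on-XIVE case raised above, which depends on the host KVM
device rather than on the guest-visible controller.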

You should extend the SpaprInterruptControllerClass (for a routine) or
simply SpaprIrq (for a bool) if you need to handle IRQ matters from a
device model.

Thanks,

C.


> +        route.mode = PCI_INTX_ENABLED;
> +        route.irq = sphb->lsi_table[pin].irq;
> +    }
>   
>       return route;
>   }



* Re: [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE
  2022-04-27  7:36 ` Cédric Le Goater
@ 2022-04-28  5:32   ` Alexey Kardashevskiy
  2022-04-28  6:25     ` Cédric Le Goater
  0 siblings, 1 reply; 6+ messages in thread
From: Alexey Kardashevskiy @ 2022-04-28  5:32 UTC
  To: Cédric Le Goater, qemu-ppc
  Cc: Frederic Barrat, Timothy Pearson, Alex Williamson, qemu-devel,
	David Gibson



On 4/27/22 17:36, Cédric Le Goater wrote:
> Hello Alexey,
> 
> On 4/27/22 06:36, Alexey Kardashevskiy wrote:
>> VFIO-PCI has an "KVM_IRQFD_FLAG_RESAMPLE" optimization for INTx EOI
>> handling when KVM can unmask PCI INTx (level triggered interrupt) without
>> switching to the userspace (==QEMU).
>>
>> Unfortunately XIVE does not support level interrupts, 
> 
> That's not correctly phrased I think.


My bad, I meant "XIVE hardware".

> 
> The QEMU XIVE device support LSIs but the POWER9 kernel-irqchips,
> KVM XICS-on-XIVE and XIVE native devices, are broken with respect
> to passthrough adapters using INTx.
> 
> 
>> QEMU emulates them
>> and therefore there is no existing code path to kick the resamplefd.
>> The problem appears when passing through a PCI adapter with
>> the "pci=nomsi" kernel parameter - the adapter's interrupt interrupt
>> count in /proc/interrupts will stuck at "1".
>>
>> This disables resampler when the XIVE interrupt controller is configured.
>> This should not be very visible though KVM already exits to QEMU for INTx
>> and XIVE-capable boxes (POWER9 and newer) do not seem to have
>> performance-critical INTx-only capable devices.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>>
>>
>> Cédric, this is what I meant when I said that spapr_pci.c was unaware of
>> the interrupt controller type, neither xics nor xive was mentioned
>> in the file before.
>>
>>
>> ---
>>   hw/ppc/spapr_pci.c | 14 +++++++++++---
>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>> index 5bfd4aa9e5aa..2675052601db 100644
>> --- a/hw/ppc/spapr_pci.c
>> +++ b/hw/ppc/spapr_pci.c
>> @@ -729,11 +729,19 @@ static void pci_spapr_set_irq(void *opaque, int 
>> irq_num, int level)
>>   static PCIINTxRoute spapr_route_intx_pin_to_irq(void *opaque, int pin)
>>   {
>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>       SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(opaque);
>> -    PCIINTxRoute route;
>> +    PCIINTxRoute route = { .mode = PCI_INTX_DISABLED };
>> -    route.mode = PCI_INTX_ENABLED;
>> -    route.irq = sphb->lsi_table[pin].irq;
>> +    /*
>> +     * Disable IRQFD resampler on XIVE as it does not support LSI and 
>> QEMU
>> +     * emulates those so the KVM kernel resamplefd kick is skipped 
>> and EOI
>> +     * is not delivered to VFIO-PCI.
>> +     */
>> +    if (!spapr->xive) {
> 
> This is testing the availability of the XIVE interrupt mode, but not
> the activate controller. See spapr_irq_init() which is called very
> early in the machine initialization.
> 
> Is that what we want ? Is everything fine if we start the machine with
> ic-mode=xics ? On a POWER9 host, this would use the KVM XICS-on-XIVE
> device which is broken also AFAICT.

I should probably fix that in KVM, just not quite sure yet how for the 
realmode handlers, or just drop those on P9 and then the fix is trivial.


> You should extend the SpaprInterruptControllerClass (for a routine) or
> simply SpaprIrq (for a bool) if you need to handle IRQ matters from a
> device model.

It is a property of KVM rather than the interrupt controller so it 
probably makes more sense to just stop advertising 
KVM_CAP_IRQFD_RESAMPLE. Hmmm...


> 
> Thanks,
> 
> C.
> 
> 
>> +        route.mode = PCI_INTX_ENABLED;
>> +        route.irq = sphb->lsi_table[pin].irq;
>> +    }
>>       return route;
>>   }



* Re: [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE
  2022-04-28  5:32   ` Alexey Kardashevskiy
@ 2022-04-28  6:25     ` Cédric Le Goater
  2022-04-28  7:26       ` Alexey Kardashevskiy
  0 siblings, 1 reply; 6+ messages in thread
From: Cédric Le Goater @ 2022-04-28  6:25 UTC
  To: Alexey Kardashevskiy, qemu-ppc
  Cc: Frederic Barrat, Timothy Pearson, Alex Williamson, qemu-devel,
	David Gibson

On 4/28/22 07:32, Alexey Kardashevskiy wrote:
> 
> 
> On 4/27/22 17:36, Cédric Le Goater wrote:
>> Hello Alexey,
>>
>> On 4/27/22 06:36, Alexey Kardashevskiy wrote:
>>> VFIO-PCI has an "KVM_IRQFD_FLAG_RESAMPLE" optimization for INTx EOI
>>> handling when KVM can unmask PCI INTx (level triggered interrupt) without
>>> switching to the userspace (==QEMU).
>>>
>>> Unfortunately XIVE does not support level interrupts, 
>>
>> That's not correctly phrased I think.
> 
> 
> My bad, I meant "XIVE hardware".

ok. It makes more sense.

PSIHB and PHBs have internal latches to maintain the assertion level.
XIVE has none.


> 
>>
>> The QEMU XIVE device support LSIs but the POWER9 kernel-irqchips,
>> KVM XICS-on-XIVE and XIVE native devices, are broken with respect
>> to passthrough adapters using INTx.
>>
>>
>>> QEMU emulates them
>>> and therefore there is no existing code path to kick the resamplefd.
>>> The problem appears when passing through a PCI adapter with
>>> the "pci=nomsi" kernel parameter - the adapter's interrupt interrupt
>>> count in /proc/interrupts will stuck at "1".
>>>
>>> This disables resampler when the XIVE interrupt controller is configured.
>>> This should not be very visible though KVM already exits to QEMU for INTx
>>> and XIVE-capable boxes (POWER9 and newer) do not seem to have
>>> performance-critical INTx-only capable devices.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>
>>>
>>> Cédric, this is what I meant when I said that spapr_pci.c was unaware of
>>> the interrupt controller type, neither xics nor xive was mentioned
>>> in the file before.
>>>
>>>
>>> ---
>>>   hw/ppc/spapr_pci.c | 14 +++++++++++---
>>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>> index 5bfd4aa9e5aa..2675052601db 100644
>>> --- a/hw/ppc/spapr_pci.c
>>> +++ b/hw/ppc/spapr_pci.c
>>> @@ -729,11 +729,19 @@ static void pci_spapr_set_irq(void *opaque, int irq_num, int level)
>>>   static PCIINTxRoute spapr_route_intx_pin_to_irq(void *opaque, int pin)
>>>   {
>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>       SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(opaque);
>>> -    PCIINTxRoute route;
>>> +    PCIINTxRoute route = { .mode = PCI_INTX_DISABLED };
>>> -    route.mode = PCI_INTX_ENABLED;
>>> -    route.irq = sphb->lsi_table[pin].irq;
>>> +    /*
>>> +     * Disable IRQFD resampler on XIVE as it does not support LSI and QEMU
>>> +     * emulates those so the KVM kernel resamplefd kick is skipped and EOI
>>> +     * is not delivered to VFIO-PCI.
>>> +     */
>>> +    if (!spapr->xive) {
>>
>> This is testing the availability of the XIVE interrupt mode, but not
>> the activate controller. See spapr_irq_init() which is called very
>> early in the machine initialization.
>>
>> Is that what we want ? Is everything fine if we start the machine with
>> ic-mode=xics ? On a POWER9 host, this would use the KVM XICS-on-XIVE
>> device which is broken also AFAICT.
> 
> I should probably fix that in KVM, just not quite sure yet how for the realmode handlers, or just drop those on P9 and then the fix is trivial.
> 
> 
>> You should extend the SpaprInterruptControllerClass (for a routine) or
>> simply SpaprIrq (for a bool) if you need to handle IRQ matters from a
>> device model.
> 
> It is a property of KVM rather than the interrupt controller so it probably makes more sense to just stop advertising KVM_CAP_IRQFD_RESAMPLE. Hmmm...

I would fix the realmode handlers of the KVM XICS-on-XIVE device
first. The problem has been there for a while.

Then, for the XIVE native mode, I would simply handle it at the QEMU
level with a 'resample' bool in SpaprIrq. It would be tested in spapr
pci when configuring the INTx routing.
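
A minimal standalone sketch of that idea follows. It is not QEMU code:
IrqConfigSketch stands in for SpaprIrq with the proposed 'resample' flag,
and RouteSketch and route_pin_sketch() are hypothetical names modelling the
INTx routing decision in spapr pci.

```c
#include <stdbool.h>

typedef enum {
    ROUTE_SKETCH_DISABLED,
    ROUTE_SKETCH_ENABLED,
} RouteModeSketch;

typedef struct {
    RouteModeSketch mode;
    int irq;
} RouteSketch;

/* Hypothetical IRQ configuration carrying the proposed flag. */
typedef struct {
    bool resample; /* true if the in-kernel resamplefd kick is usable */
} IrqConfigSketch;

/* Publish an INTx route only when the configuration allows resampling. */
static RouteSketch route_pin_sketch(const IrqConfigSketch *cfg,
                                    const int *lsi_irqs, int pin)
{
    RouteSketch route = { .mode = ROUTE_SKETCH_DISABLED };

    if (cfg->resample) {
        route.mode = ROUTE_SKETCH_ENABLED;
        route.irq = lsi_irqs[pin];
    }
    return route;
}
```

The PCI code then tests a single flag and does not need to know which
interrupt controller is behind it; deciding when the flag is set is left
to the IRQ backend.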


Thanks,

C.




* Re: [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE
  2022-04-28  6:25     ` Cédric Le Goater
@ 2022-04-28  7:26       ` Alexey Kardashevskiy
  2022-04-28  7:31         ` Cédric Le Goater
  0 siblings, 1 reply; 6+ messages in thread
From: Alexey Kardashevskiy @ 2022-04-28  7:26 UTC
  To: Cédric Le Goater, qemu-ppc
  Cc: Frederic Barrat, Timothy Pearson, Alex Williamson, qemu-devel,
	David Gibson



On 4/28/22 16:25, Cédric Le Goater wrote:
> On 4/28/22 07:32, Alexey Kardashevskiy wrote:
>>
>>
>> On 4/27/22 17:36, Cédric Le Goater wrote:
>>> Hello Alexey,
>>>
>>> On 4/27/22 06:36, Alexey Kardashevskiy wrote:
>>>> VFIO-PCI has an "KVM_IRQFD_FLAG_RESAMPLE" optimization for INTx EOI
>>>> handling when KVM can unmask PCI INTx (level triggered interrupt) 
>>>> without
>>>> switching to the userspace (==QEMU).
>>>>
>>>> Unfortunately XIVE does not support level interrupts, 
>>>
>>> That's not correctly phrased I think.
>>
>>
>> My bad, I meant "XIVE hardware".
> 
> ok. It makes more sense.
> 
> PSIHB and PHBs have internal latches to maintain the assertion level.
> XIVE has none.
> 
> 
>>
>>>
>>> The QEMU XIVE device support LSIs but the POWER9 kernel-irqchips,
>>> KVM XICS-on-XIVE and XIVE native devices, are broken with respect
>>> to passthrough adapters using INTx.
>>>
>>>
>>>> QEMU emulates them
>>>> and therefore there is no existing code path to kick the resamplefd.
>>>> The problem appears when passing through a PCI adapter with
>>>> the "pci=nomsi" kernel parameter - the adapter's interrupt interrupt
>>>> count in /proc/interrupts will stuck at "1".
>>>>
>>>> This disables resampler when the XIVE interrupt controller is 
>>>> configured.
>>>> This should not be very visible though KVM already exits to QEMU for 
>>>> INTx
>>>> and XIVE-capable boxes (POWER9 and newer) do not seem to have
>>>> performance-critical INTx-only capable devices.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>> ---
>>>>
>>>>
>>>> Cédric, this is what I meant when I said that spapr_pci.c was 
>>>> unaware of
>>>> the interrupt controller type, neither xics nor xive was mentioned
>>>> in the file before.
>>>>
>>>>
>>>> ---
>>>>   hw/ppc/spapr_pci.c | 14 +++++++++++---
>>>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>> index 5bfd4aa9e5aa..2675052601db 100644
>>>> --- a/hw/ppc/spapr_pci.c
>>>> +++ b/hw/ppc/spapr_pci.c
>>>> @@ -729,11 +729,19 @@ static void pci_spapr_set_irq(void *opaque, 
>>>> int irq_num, int level)
>>>>   static PCIINTxRoute spapr_route_intx_pin_to_irq(void *opaque, int 
>>>> pin)
>>>>   {
>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>>       SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(opaque);
>>>> -    PCIINTxRoute route;
>>>> +    PCIINTxRoute route = { .mode = PCI_INTX_DISABLED };
>>>> -    route.mode = PCI_INTX_ENABLED;
>>>> -    route.irq = sphb->lsi_table[pin].irq;
>>>> +    /*
>>>> +     * Disable IRQFD resampler on XIVE as it does not support LSI 
>>>> and QEMU
>>>> +     * emulates those so the KVM kernel resamplefd kick is skipped 
>>>> and EOI
>>>> +     * is not delivered to VFIO-PCI.
>>>> +     */
>>>> +    if (!spapr->xive) {
>>>
>>> This is testing the availability of the XIVE interrupt mode, but not
>>> the activate controller. See spapr_irq_init() which is called very
>>> early in the machine initialization.
>>>
>>> Is that what we want ? Is everything fine if we start the machine with
>>> ic-mode=xics ? On a POWER9 host, this would use the KVM XICS-on-XIVE
>>> device which is broken also AFAICT.
>>
>> I should probably fix that in KVM, just not quite sure yet how for the 
>> realmode handlers, or just drop those on P9 and then the fix is trivial.
>>
>>
>>> You should extend the SpaprInterruptControllerClass (for a routine) or
>>> simply SpaprIrq (for a bool) if you need to handle IRQ matters from a
>>> device model.
>>
>> It is a property of KVM rather than the interrupt controller so it 
>> probably makes more sense to just stop advertising 
>> KVM_CAP_IRQFD_RESAMPLE. Hmmm...
> 
> I would fix the realmode handlers of the the KVM XICS-on-XIVE device
> first. The problem has been there for a while.


Are they really used on POWER9? TCE ones are not.


> Then, for the XIVE native mode, I would simply handle it at the QEMU
> level with a 'resample' bool in SpaprIrq. It  would be tested in spapr
> pci when configuring the INTx routing.


But there is a dedicated CAP advertised by the KVM already which is not 
correct as we know that KVM won't resample.


> 
> 
> Thanks,
> 
> C.
> 



* Re: [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE
  2022-04-28  7:26       ` Alexey Kardashevskiy
@ 2022-04-28  7:31         ` Cédric Le Goater
  0 siblings, 0 replies; 6+ messages in thread
From: Cédric Le Goater @ 2022-04-28  7:31 UTC
  To: Alexey Kardashevskiy, qemu-ppc
  Cc: Frederic Barrat, Timothy Pearson, Alex Williamson, qemu-devel,
	David Gibson

On 4/28/22 09:26, Alexey Kardashevskiy wrote:
> 
> 
> On 4/28/22 16:25, Cédric Le Goater wrote:
>> On 4/28/22 07:32, Alexey Kardashevskiy wrote:
>>>
>>>
>>> On 4/27/22 17:36, Cédric Le Goater wrote:
>>>> Hello Alexey,
>>>>
>>>> On 4/27/22 06:36, Alexey Kardashevskiy wrote:
>>>>> VFIO-PCI has an "KVM_IRQFD_FLAG_RESAMPLE" optimization for INTx EOI
>>>>> handling when KVM can unmask PCI INTx (level triggered interrupt) without
>>>>> switching to the userspace (==QEMU).
>>>>>
>>>>> Unfortunately XIVE does not support level interrupts, 
>>>>
>>>> That's not correctly phrased I think.
>>>
>>>
>>> My bad, I meant "XIVE hardware".
>>
>> ok. It makes more sense.
>>
>> PSIHB and PHBs have internal latches to maintain the assertion level.
>> XIVE has none.
>>
>>
>>>
>>>>
>>>> The QEMU XIVE device support LSIs but the POWER9 kernel-irqchips,
>>>> KVM XICS-on-XIVE and XIVE native devices, are broken with respect
>>>> to passthrough adapters using INTx.
>>>>
>>>>
>>>>> QEMU emulates them
>>>>> and therefore there is no existing code path to kick the resamplefd.
>>>>> The problem appears when passing through a PCI adapter with
>>>>> the "pci=nomsi" kernel parameter - the adapter's interrupt interrupt
>>>>> count in /proc/interrupts will stuck at "1".
>>>>>
>>>>> This disables resampler when the XIVE interrupt controller is configured.
>>>>> This should not be very visible though KVM already exits to QEMU for INTx
>>>>> and XIVE-capable boxes (POWER9 and newer) do not seem to have
>>>>> performance-critical INTx-only capable devices.
>>>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>>>> ---
>>>>>
>>>>>
>>>>> Cédric, this is what I meant when I said that spapr_pci.c was unaware of
>>>>> the interrupt controller type, neither xics nor xive was mentioned
>>>>> in the file before.
>>>>>
>>>>>
>>>>> ---
>>>>>   hw/ppc/spapr_pci.c | 14 +++++++++++---
>>>>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>>>>
>>>>> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
>>>>> index 5bfd4aa9e5aa..2675052601db 100644
>>>>> --- a/hw/ppc/spapr_pci.c
>>>>> +++ b/hw/ppc/spapr_pci.c
>>>>> @@ -729,11 +729,19 @@ static void pci_spapr_set_irq(void *opaque, int irq_num, int level)
>>>>>   static PCIINTxRoute spapr_route_intx_pin_to_irq(void *opaque, int pin)
>>>>>   {
>>>>> +    SpaprMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
>>>>>       SpaprPhbState *sphb = SPAPR_PCI_HOST_BRIDGE(opaque);
>>>>> -    PCIINTxRoute route;
>>>>> +    PCIINTxRoute route = { .mode = PCI_INTX_DISABLED };
>>>>> -    route.mode = PCI_INTX_ENABLED;
>>>>> -    route.irq = sphb->lsi_table[pin].irq;
>>>>> +    /*
>>>>> +     * Disable IRQFD resampler on XIVE as it does not support LSI and QEMU
>>>>> +     * emulates those so the KVM kernel resamplefd kick is skipped and EOI
>>>>> +     * is not delivered to VFIO-PCI.
>>>>> +     */
>>>>> +    if (!spapr->xive) {
>>>>
>>>> This is testing the availability of the XIVE interrupt mode, but not
>>>> the activate controller. See spapr_irq_init() which is called very
>>>> early in the machine initialization.
>>>>
>>>> Is that what we want ? Is everything fine if we start the machine with
>>>> ic-mode=xics ? On a POWER9 host, this would use the KVM XICS-on-XIVE
>>>> device which is broken also AFAICT.
>>>
>>> I should probably fix that in KVM, just not quite sure yet how for the realmode handlers, or just drop those on P9 and then the fix is trivial.
>>>
>>>
>>>> You should extend the SpaprInterruptControllerClass (for a routine) or
>>>> simply SpaprIrq (for a bool) if you need to handle IRQ matters from a
>>>> device model.
>>>
>>> It is a property of KVM rather than the interrupt controller so it probably makes more sense to just stop advertising KVM_CAP_IRQFD_RESAMPLE. Hmmm...
>>
>> I would fix the realmode handlers of the the KVM XICS-on-XIVE device
>> first. The problem has been there for a while.
> 
> 
> Are they really used on POWER9? TCE ones are not.

The HCALLs should be.

>> Then, for the XIVE native mode, I would simply handle it at the QEMU
>> level with a 'resample' bool in SpaprIrq. It  would be tested in spapr
>> pci when configuring the INTx routing.
> 
> 
> But there is a dedicated CAP advertised by the KVM already which is not correct as we know that KVM won't resample.

You know more than I do in that area now.

C.

  
> 
>>
>>
>> Thanks,
>>
>> C.
>>



Thread overview: 6+ messages
2022-04-27  4:36 [PATCH qemu] spapr_pci: Disable IRQFD resampling on XIVE Alexey Kardashevskiy
2022-04-27  7:36 ` Cédric Le Goater
2022-04-28  5:32   ` Alexey Kardashevskiy
2022-04-28  6:25     ` Cédric Le Goater
2022-04-28  7:26       ` Alexey Kardashevskiy
2022-04-28  7:31         ` Cédric Le Goater
