* [Xenomai-core] BUG on xnintr_attach
@ 2009-04-22 12:26 Jan Kiszka
  2009-04-22 16:23 ` Philippe Gerum
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kiszka @ 2009-04-22 12:26 UTC (permalink / raw)
  To: xenomai-core

Hi all,

Issuing rtdm_irq_request and, thus, xnintr_attach can trigger an
"I-pipe: Detected stalled topmost domain, probably caused by a bug."
warning if the interrupt type is MSI:

  [<ffffffff80273cce>] ipipe_check_context+0xe7/0xe9
  [<ffffffff8049dae9>] _spin_lock_irqsave+0x18/0x54
  [<ffffffff8037dcc2>] pci_bus_read_config_dword+0x3c/0x87
  [<ffffffff80387c1d>] read_msi_msg+0x61/0xe1
  [<ffffffff8021c5b8>] ? assign_irq_vector+0x3e/0x49
  [<ffffffff8021d7b2>] set_msi_irq_affinity+0x6d/0xc8
  [<ffffffff8021fa5d>] __ipipe_set_irq_affinity+0x6c/0x77
  [<ffffffff80274231>] ipipe_set_irq_affinity+0x34/0x3d
  [<ffffffff8027c572>] xnintr_attach+0xaa/0x11e
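
For reference, a minimal way to hit this looks roughly like the fragment
below (hypothetical driver, all names made up; rtdm_irq_request and
pci_enable_msi are the real entry points, the rest is a sketch only):

  #include <linux/pci.h>
  #include <rtdm/rtdm_driver.h>

  static rtdm_irq_t demo_irq;

  static int demo_isr(rtdm_irq_t *irq)
  {
          /* handle the device here */
          return RTDM_IRQ_HANDLED;
  }

  static int demo_setup_irq(struct pci_dev *pdev)
  {
          int err = pci_enable_msi(pdev); /* pdev->irq is now an MSI vector */
          if (err)
                  return err;

          /*
           * rtdm_irq_request -> xnintr_attach takes intrlock (hard IRQs
           * off) and then calls xnarch_set_irq_affinity, which ends up
           * in read_msi_msg -> pci_bus_read_config_dword ->
           * spin_lock_irqsave(&pci_lock), i.e. a plain Linux spinlock is
           * taken in that context -- hence the I-pipe warning above.
           */
          return rtdm_irq_request(&demo_irq, pdev->irq, demo_isr,
                                  0, "demo", NULL);
  }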

There are two options to fix this, but I'm currently undecided which
one to go for:
 - harden pci_lock (drivers/pci/access.c) - didn't we apply such an
   MSI-related workaround before?
 - move xnarch_set_irq_affinity out of intrlock (but couldn't we face
   even more pci_lock-related issues?)

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux



* Re: [Xenomai-core] BUG on xnintr_attach
  2009-04-22 12:26 [Xenomai-core] BUG on xnintr_attach Jan Kiszka
@ 2009-04-22 16:23 ` Philippe Gerum
  2009-04-22 16:48   ` Jan Kiszka
  0 siblings, 1 reply; 4+ messages in thread
From: Philippe Gerum @ 2009-04-22 16:23 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

On Wed, 2009-04-22 at 14:26 +0200, Jan Kiszka wrote:
> Hi all,
> 
> Issuing rtdm_irq_request and, thus, xnintr_attach can trigger an
> "I-pipe: Detected stalled topmost domain, probably caused by a bug."
> warning if the interrupt type is MSI:
> 
>   [<ffffffff80273cce>] ipipe_check_context+0xe7/0xe9
>   [<ffffffff8049dae9>] _spin_lock_irqsave+0x18/0x54
>   [<ffffffff8037dcc2>] pci_bus_read_config_dword+0x3c/0x87
>   [<ffffffff80387c1d>] read_msi_msg+0x61/0xe1
>   [<ffffffff8021c5b8>] ? assign_irq_vector+0x3e/0x49
>   [<ffffffff8021d7b2>] set_msi_irq_affinity+0x6d/0xc8
>   [<ffffffff8021fa5d>] __ipipe_set_irq_affinity+0x6c/0x77
>   [<ffffffff80274231>] ipipe_set_irq_affinity+0x34/0x3d
>   [<ffffffff8027c572>] xnintr_attach+0xaa/0x11e
> 
> There are two options to fix this, but I'm currently undecided which
> one to go for:
>  - harden pci_lock (drivers/pci/access.c) - didn't we apply such an
>    MSI-related workaround before?

This did not work as expected: it caused pathological latency spots.
That said, the vanilla code has evolved since I tried this quick hack
months ago, so it may be worth looking at this option once again.
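
To illustrate what "hardening" means here: the idea boils down to
turning pci_lock into an I-pipe aware lock, so that acquiring it
hard-disables IRQs and the lock can be shared with the real-time
domain. A sketch only (from memory, assuming the IPIPE_DEFINE_SPINLOCK
helper provided by the I-pipe patch; the actual hack may have looked
different):

  --- a/drivers/pci/access.c
  +++ b/drivers/pci/access.c
  @@
  -static DEFINE_SPINLOCK(pci_lock);
  +/* Hard lock: spin_lock_irqsave() now disables IRQs for all domains. */
  +static IPIPE_DEFINE_SPINLOCK(pci_lock);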

>  - move xnarch_set_irq_affinity out of intrlock (but couldn't we face
>    even more pci_lock-related issues?)
> 

Since upstream decided to use PCI config reads even inside hot paths
when processing MSI interrupts, the only sane way would be to make the
locking used there Adeos-aware, likely virtualizing the interrupt mask.
The way upstream generally deals with MSI is currently a problem for us.

> Jan
> 
-- 
Philippe.





* Re: [Xenomai-core] BUG on xnintr_attach
  2009-04-22 16:23 ` Philippe Gerum
@ 2009-04-22 16:48   ` Jan Kiszka
  2009-04-23  7:20     ` Jan Kiszka
  0 siblings, 1 reply; 4+ messages in thread
From: Jan Kiszka @ 2009-04-22 16:48 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-core

Philippe Gerum wrote:
> On Wed, 2009-04-22 at 14:26 +0200, Jan Kiszka wrote:
>> Hi all,
>>
>> Issuing rtdm_irq_request and, thus, xnintr_attach can trigger an
>> "I-pipe: Detected stalled topmost domain, probably caused by a bug."
>> warning if the interrupt type is MSI:
>>
>>   [<ffffffff80273cce>] ipipe_check_context+0xe7/0xe9
>>   [<ffffffff8049dae9>] _spin_lock_irqsave+0x18/0x54
>>   [<ffffffff8037dcc2>] pci_bus_read_config_dword+0x3c/0x87
>>   [<ffffffff80387c1d>] read_msi_msg+0x61/0xe1
>>   [<ffffffff8021c5b8>] ? assign_irq_vector+0x3e/0x49
>>   [<ffffffff8021d7b2>] set_msi_irq_affinity+0x6d/0xc8
>>   [<ffffffff8021fa5d>] __ipipe_set_irq_affinity+0x6c/0x77
>>   [<ffffffff80274231>] ipipe_set_irq_affinity+0x34/0x3d
>>   [<ffffffff8027c572>] xnintr_attach+0xaa/0x11e
>>
>> There are two options to fix this, but I'm currently undecided which
>> one to go for:
>>  - harden pci_lock (drivers/pci/access.c) - didn't we apply such an
>>    MSI-related workaround before?
> 
> This did not work as expected: it caused pathological latency spots.
> That said, the vanilla code has evolved since I tried this quick hack
> months ago, so it may be worth looking at this option once again.
> 
>>  - move xnarch_set_irq_affinity out of intrlock (but couldn't we face
>>    even more pci_lock-related issues?)
>>
> 
> Since upstream decided to use PCI config reads even inside hot paths
> when processing MSI interrupts, the only sane way would be to make the
> locking used there Adeos-aware, likely virtualizing the interrupt mask.
> The way upstream generally deals with MSI is currently a problem for us.

Hmm, I guess this needs a closer look again. But I vaguely recall that
upstream removed the config reads, at least from the hot paths, due to
complaints about performance.

However, I think we should go with the second approach independently. So
please pull this from git://git.xenomai.org/xenomai-jki.git for-upstream

Jan

--------->

Subject: [PATCH] nucleus: Move xnarch_set_irq_affinity out of intrlock

There is no need to hold intrlock while setting the affinity of the
to-be-registered IRQ. Additionally, we run into trouble with the MSI
code, which can take Linux locks from within this service. So move the
affinity adjustment out of the critical section.

Signed-off-by: Jan Kiszka <jan.kiszka@domain.hid>
---

 ksrc/nucleus/intr.c |    5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/ksrc/nucleus/intr.c b/ksrc/nucleus/intr.c
index 870efb1..dba3b26 100644
--- a/ksrc/nucleus/intr.c
+++ b/ksrc/nucleus/intr.c
@@ -721,11 +721,12 @@ int xnintr_attach(xnintr_t *intr, void *cookie)
 	intr->cookie = cookie;
 	memset(&intr->stat, 0, sizeof(intr->stat));
 
-	xnlock_get_irqsave(&intrlock, s);
-
 #ifdef CONFIG_SMP
 	xnarch_set_irq_affinity(intr->irq, nkaffinity);
 #endif /* CONFIG_SMP */
+
+	xnlock_get_irqsave(&intrlock, s);
+
 	err = xnintr_irq_attach(intr);
 
 	if (!err)



* Re: [Xenomai-core] BUG on xnintr_attach
  2009-04-22 16:48   ` Jan Kiszka
@ 2009-04-23  7:20     ` Jan Kiszka
  0 siblings, 0 replies; 4+ messages in thread
From: Jan Kiszka @ 2009-04-23  7:20 UTC (permalink / raw)
  Cc: xenomai-core


Jan Kiszka wrote:
> Philippe Gerum wrote:
>> On Wed, 2009-04-22 at 14:26 +0200, Jan Kiszka wrote:
>>> Hi all,
>>>
>>> Issuing rtdm_irq_request and, thus, xnintr_attach can trigger an
>>> "I-pipe: Detected stalled topmost domain, probably caused by a bug."
>>> warning if the interrupt type is MSI:
>>>
>>>   [<ffffffff80273cce>] ipipe_check_context+0xe7/0xe9
>>>   [<ffffffff8049dae9>] _spin_lock_irqsave+0x18/0x54
>>>   [<ffffffff8037dcc2>] pci_bus_read_config_dword+0x3c/0x87
>>>   [<ffffffff80387c1d>] read_msi_msg+0x61/0xe1
>>>   [<ffffffff8021c5b8>] ? assign_irq_vector+0x3e/0x49
>>>   [<ffffffff8021d7b2>] set_msi_irq_affinity+0x6d/0xc8
>>>   [<ffffffff8021fa5d>] __ipipe_set_irq_affinity+0x6c/0x77
>>>   [<ffffffff80274231>] ipipe_set_irq_affinity+0x34/0x3d
>>>   [<ffffffff8027c572>] xnintr_attach+0xaa/0x11e
>>>
>>> There are two options to fix this, but I'm currently undecided which
>>> one to go for:
>>>  - harden pci_lock (drivers/pci/access.c) - didn't we apply such an
>>>    MSI-related workaround before?
>> This did not work as expected: it caused pathological latency spots.
>> That said, the vanilla code has evolved since I tried this quick hack
>> months ago, so it may be worth looking at this option once again.
>>
>>>  - move xnarch_set_irq_affinity out of intrlock (but couldn't we face
>>>    even more pci_lock-related issues?)
>>>
>> Since upstream decided to use PCI config reads even inside hot paths
>> when processing MSI interrupts, the only sane way would be to make the
>> locking used there Adeos-aware, likely virtualizing the interrupt mask.
>> The way upstream generally deals with MSI is currently a problem for us.
> 
> Hmm, I guess this needs a closer look again. But I vaguely recall that
> upstream removed the config reads, at least from the hot paths, due to
> complaints about performance.

I went through the critical MSI code paths again. Its irqchip comes
with ack, mask/unmask and set_affinity handlers which Xenomai may use.
The ack is the most important one as it runs during early dispatching,
but it maps to ack_apic_edge and is thus clean. The rest is contaminated
with PCI config access.
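
For reference, the x86 MSI irqchip of that era looks roughly like this
(paraphrased and trimmed, field names from memory, annotations added to
show which handlers are problematic):

  static struct irq_chip msi_chip = {
          .name         = "PCI-MSI",
          .ack          = ack_apic_edge,        /* APIC only -> clean     */
          .mask         = mask_msi_irq,         /* PCI config -> pci_lock */
          .unmask       = unmask_msi_irq,       /* PCI config -> pci_lock */
          .set_affinity = set_msi_irq_affinity, /* PCI config -> pci_lock */
  };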

At least pci_lock is involved here and, depending on the PCI access
method, also pci_config_lock. And then there are other code paths that
call wake_up_all while holding pci_lock. This looks more or less
hopeless.

On the other hand, we shouldn't depend on mask/unmask or set_affinity
while in critical Xenomai/I-pipe code paths. The MSI interrupt flow is
the same as for edge-triggered interrupts (except that there is not
even an EOI). That means we do not mask deferred interrupts. And that
means we should be safe once the affinity setting is fixed.

Jan


