* Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-22  8:19 Claudius Heine
From: Claudius Heine @ 2016-09-22  8:19 UTC
  To: linux-rt-users; +Cc: Jan Kiszka

Hi everyone,

I am currently trying to get the current Linux 4.6.7-rt13 running on an
Intel QUARK/GalileoGen2 board, and I get the same stack trace repeatedly
while booting. This does not happen with the vanilla kernel.

For 4.6.7-rt13 I used this kernel commit:
f1cf4c43c5ff76957459ea90dc39dcd93a5d0ebd

I also tried 4.4.19-rt27 and 4.4.20-rt19, with the same result.

Does anyone know what could cause this and how to fix it?

Thanks and have a nice day,
Claudius

Full bootlog: http://pastebin.com/qhSQdHYY
Kernel config: http://pastebin.com/XFKihFr4

Stacktrace:

[    7.165722] irq 17: nobody cared (try booting with the "irqpoll" option)
[    7.165752] CPU: 0 PID: 82 Comm: irq/17-pxa2xx-s Not tainted 4.6.7-rt13-yocto-preempt-rt #1
[    7.165762] Hardware name: Intel Corp. QUARK/GalileoGen2, BIOS 0x01000400 01/01/2014
[    7.165804]  cd65db78 cd65db78 cd433f4c c13a0c54 cd433f68 c10888cd c18829d8 00000011
[    7.165839]  cd65db40 cd65db40 00000011 cd433f8c c1088c4f cd65db40 00000020 cd798a40
[    7.165872]  cd433f84 cd65db40 00000020 00000000 cd433fd8 c10866c3 00000000 00000007
[    7.165877] Call Trace:
[    7.165921]  [<c13a0c54>] dump_stack+0x16/0x22
[    7.165967]  [<c10888cd>] __report_bad_irq.isra.0+0x2d/0x110
[    7.166008]  [<c1088c4f>] note_interrupt+0x22f/0x270
[    7.166051]  [<c10866c3>] handle_irq_event_percpu+0xd3/0x240
[    7.166091]  [<c10878b4>] ? irq_thread+0xc4/0x1b0
[    7.166137]  [<c10894e0>] ? handle_edge_irq+0x170/0x170
[    7.166173]  [<c1086870>] handle_irq_event+0x40/0x90
[    7.166212]  [<c1089570>] handle_fasteoi_irq+0x90/0x190
[    7.166251]  [<c101a6ae>] handle_irq+0x5e/0x70
[    7.166299]  <IRQ>  [<c1755822>] do_IRQ+0x42/0xd0
[    7.166334]  [<c1754f8c>] common_interrupt+0x2c/0x40
[    7.166370]  [<c105043d>] ? __local_bh_enable+0xd/0x70
[    7.166405]  [<c1080000>] ? wakeup_count_show+0x40/0x50
[    7.166443]  [<c10878b4>] ? irq_thread+0xc4/0x1b0
[    7.166485]  [<c1087680>] ? irq_finalize_oneshot.part.1+0x160/0x160
[    7.166521]  [<c1087720>] ? irq_thread_fn+0x40/0x40
[    7.166559]  [<c10877f0>] ? irq_thread_dtor+0xd0/0xd0
[    7.166596]  [<c1066cba>] kthread+0x9a/0xb0
[    7.166636]  [<c1754730>] ret_from_kernel_thread+0x20/0x40
[    7.166666]  [<c1066c20>] ? kthread_worker_fn+0x130/0x130
[    7.166689] handlers:
[    7.166740] [<c10868c0>] irq_default_primary_handler threaded [<c14d0360>] ssp_int
[    7.166749] Disabling IRQ #17

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-54 Fax: (+49)-8142-66989-80 Email: ch@denx.de

* Re: Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-22  8:24 ` Jan Kiszka
From: Jan Kiszka @ 2016-09-22  8:24 UTC
  To: linux-rt-users; +Cc: Claudius Heine

On 2016-09-22 10:19, Claudius Heine wrote:
> [...]

One theory I was thinking about: could disabling the interrupt line from
the primary handler be broken, or work unreliably for whatever reason,
and thereby cause this storm?

Jan

-- 
Siemens AG, Corporate Technology, CT RDA ITP SES-DE
Corporate Competence Center Embedded Linux

* Re: Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-22 13:49   ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2016-09-22 13:49 UTC
  To: Jan Kiszka; +Cc: linux-rt-users, Claudius Heine

On 2016-09-22 10:24:26 [+0200], Jan Kiszka wrote:
> One theory I was thinking about: could disabling the interrupt line from
> the primary handler be broken, or work unreliably for whatever reason,
> and thereby cause this storm?

It seems so. The complete boot log shows the core code trying multiple
times to disable IRQ #17, without success. I would expect that booting an
unpatched kernel with the threadirqs option gives the same result.

> Jan

Sebastian

* Re: Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-22 14:16     ` Claudius Heine
From: Claudius Heine @ 2016-09-22 14:16 UTC
  To: Sebastian Andrzej Siewior, Jan Kiszka; +Cc: linux-rt-users


On Thu, 2016-09-22 at 15:49 +0200, Sebastian Andrzej Siewior wrote:
> On 2016-09-22 10:24:26 [+0200], Jan Kiszka wrote:
> > 
> > One theory I was thinking about: could disabling the interrupt line from
> > the primary handler be broken, or work unreliably for whatever reason,
> > and thereby cause this storm?
> 
> It seems so. The complete boot log shows the core code trying multiple
> times to disable IRQ #17, without success. I would expect that booting an
> unpatched kernel with the threadirqs option gives the same result.

Thanks for your help, but I am sorry to disappoint:

Full bootlog: http://pastebin.com/8dHz0LLq
Kernel config: http://pastebin.com/itPLjpjS
ps: http://pastebin.com/D4N0DXae

Claudius

* Re: Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-22 14:41       ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2016-09-22 14:41 UTC
  To: Claudius Heine; +Cc: Jan Kiszka, linux-rt-users

On 2016-09-22 16:16:19 [+0200], Claudius Heine wrote:
> Thanks for your help, but I am sorry to disappoint:

I am surprised. UART and SPI share the interrupt, and judging from the
output it is SPI that does not cooperate. Disable SPI and the
warning/error should go away. Maybe RT hits a condition in which the SPI
driver returns IRQ_NONE instead of handling the interrupt.

> 
> Claudius

Sebastian

* Re: Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-23 15:39         ` Jan Kiszka
From: Jan Kiszka @ 2016-09-23 15:39 UTC
  To: Sebastian Andrzej Siewior, Claudius Heine; +Cc: linux-rt-users

On 2016-09-22 16:41, Sebastian Andrzej Siewior wrote:
> On 2016-09-22 16:16:19 [+0200], Claudius Heine wrote:
>> Thanks for your help, but I am sorry to disappoint:
> 
> I am surprised. UART and SPI share the interrupt, and judging from the
> output it is SPI that does not cooperate. Disable SPI and the
> warning/error should go away. Maybe RT hits a condition in which the SPI
> driver returns IRQ_NONE instead of handling the interrupt.

It does, but I don't understand why yet (the status register the driver
reads does not show any interrupt cause, yet when the device is disabled,
the problem is gone). I'll try to dig deeper, just to rule out a generic
-rt problem.


However, I found a workaround for our target, with a positive side effect
(one fewer slow legacy interrupt and one fewer shared interrupt; we could
do more):

diff --git a/drivers/spi/spi-pxa2xx-pci.c b/drivers/spi/spi-pxa2xx-pci.c
index 520ed1d..4ec458f 100644
--- a/drivers/spi/spi-pxa2xx-pci.c
+++ b/drivers/spi/spi-pxa2xx-pci.c
@@ -168,6 +168,7 @@ static int pxa2xx_spi_pci_probe(struct pci_dev *dev,
 		dev_err(&dev->dev, "failed to ioremap() registers\n");
 		return -EIO;
 	}
+	pci_enable_msi(dev);
 	ssp->irq = dev->irq;
 	ssp->port_id = (c->port_id >= 0) ? c->port_id : dev->devfn;
 	ssp->type = c->type;


As you've written that code: Any reason why we should refrain from doing
that upstream?

Jan

* Re: Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-23 16:00           ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2016-09-23 16:00 UTC
  To: Jan Kiszka; +Cc: Claudius Heine, linux-rt-users

On 2016-09-23 17:39:38 [+0200], Jan Kiszka wrote:
> [...]
> 
> As you've written that code: Any reason why we should refrain from doing
> that upstream?

I don't see why this shows up on -RT but not on !RT, nor why this cures
it. I remember that the UART on CE4100 went crazy if you used inb/outb
once you enabled interrupts in multi mode (PCI INTA/B/C/D instead of only
INTA). Switching over to MMIO access fixed that.

So, enabling MSI unconditionally usually isn't that bad. If the HW does
not advertise MSI, nothing happens. However, there is faulty HW that
advertises MSI and then delivers no interrupts once it is enabled (grep
for XHCI_BROKEN_MSI if you want an example). I would suggest enabling it
and, if people complain, adding an exception list (like XHCI does).
You might also want to disable MSI again on removal (though that might
already happen when you disable / release the PCI device).

> Jan
> 

Sebastian

* Re: Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-23 16:12             ` Jan Kiszka
From: Jan Kiszka @ 2016-09-23 16:12 UTC
  To: Sebastian Andrzej Siewior; +Cc: Claudius Heine, linux-rt-users

On 2016-09-23 18:00, Sebastian Andrzej Siewior wrote:
> I don't see why this shows up on -RT but not on !RT, nor why this cures
> it. I remember that the UART on CE4100 went crazy if you used inb/outb
> once you enabled interrupts in multi mode (PCI INTA/B/C/D instead of only
> INTA). Switching over to MMIO access fixed that.

Yeah, it's still not a good overall feeling.

> 
> So, enabling MSI unconditionally usually isn't that bad. If the HW does
> not advertise MSI, nothing happens. However, there is faulty HW that
> advertises MSI and then delivers no interrupts once it is enabled (grep
> for XHCI_BROKEN_MSI if you want an example). I would suggest enabling it
> and, if people complain, adding an exception list (like XHCI does).
> You might also want to disable MSI again on removal (though that might
> already happen when you disable / release the PCI device).

I already checked that: as there is no pcim_enable_msi(), I assumed there
was no need (pcim_enable_device() would take care of it). But a
bind/unbind loop showed that we were leaking vectors. I will fix that and
send a proper patch.

I also tried to turn on MSI in intel_quark_i2c_gpio, but that device
responded by no longer delivering interrupts. OK.

Jan

* Re: Got stacktrace "irq 17: nobody cared" on Intel GalileoGen2 with 4.6.7-rt13
@ 2016-09-23 16:17               ` Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2016-09-23 16:17 UTC
  To: Jan Kiszka; +Cc: Claudius Heine, linux-rt-users

On 2016-09-23 18:12:02 [+0200], Jan Kiszka wrote:
> I also tried to turn on MSI in intel_quark_i2c_gpio, but that device
> responded by no longer delivering interrupts. OK.

You could also take a week off and talk to the BIOS developer.

> Jan

Sebastian
