* freezing system
From: Diez Roggisch @ 2018-10-11 16:30 UTC
  To: linux-rt-users

Hi,

During work on a driver I encountered behaviour I can’t explain, and I would like to hear the opinion of the experts here.

We work with the Raspberry Pi 3 (in its Compute Module incarnation, but that shouldn’t make a difference for this discussion).

It is hooked up to a microcontroller via SPI, and because the uC needs to signal that it is ready to receive data, we ran an IRQ line to a GPIO on the Pi.

In the driver I register an ISR for that GPIO which signals a wait_queue_head_t with wake_up. The ISR is registered with IRQF_NO_THREAD because this is on our critical path and should be as fast as possible.

The user space thread (our actual realtime thread) invokes an ioctl that waits for this event with wait_event_interruptible and then initiates the SPI communication.

After a few minutes of operation, the system freezes without any discernible output: no oops, nothing. It just stops.

Further investigation revealed that this only happens if we actually wait for the event. Running the ISR alone does not trigger it.

In order to report this problem, I reduced the setup to a simpler one that can be reproduced on a plain Raspberry Pi 3:

 - Allocate GPIO 12 as the IRQ input.
 - Allocate GPIO 13 as the trigger output.
 - Connect the two with a cable.
 - Load the kernel module.
 - Run a test program that triggers the IRQ and waits for it in a different thread.

After a few seconds, the system freezes.

You can find the kernel module here:

 https://gist.github.com/dir-ableton/7005fa10fd4bdcf65cbc21ab22f5a572

The test program:

 https://gist.github.com/dir-ableton/6330988cbde504bd6f024f5513571ab6

Kernel version:

 a42048c6eee58b1b8d252e30224b7d065615c3fd from 

 https://github.com/raspberrypi/linux/tree/rpi-4.14.y-rt

which is 4.14.71 with the PREEMPT_RT patch applied, as of two days ago.

I’ve also run this test with the corresponding vanilla kernel:

 12d78096b1669a08d440f7ebaddf5d925e52fe79

 https://github.com/raspberrypi/linux/tree/rpi-4.14.y

This kernel did not freeze during a full overnight run.

I’m probably doing something stupid here, such as not disabling IRQs or something similar. I took the example for a GPIO ISR from a Linux driver development book (German, “Linux Treiber Entwickeln”). Maybe the example is not complete?

Any hints on what is going on here, or on how to debug the problem further, are much appreciated.

Cheers,

Diez B. Roggisch

* Re: freezing system
From: Julia Cartwright @ 2018-10-11 19:01 UTC
  To: Diez Roggisch; +Cc: linux-rt-users

Hello Diez-

On Thu, Oct 11, 2018 at 04:30:36PM +0000, Diez Roggisch wrote:
> Hi,
>
> during work on a driver I encountered a behaviour I can't explain and
> would like to hear the opinion of the experts here.
>
> We work with the Raspberry Pi 3 (in its Compute Module incarnation,
> but that shouldn't make a difference for this discussion).
>
> It is hooked up to a microcontroller via SPI, and because the uC needs
> to signal that it is ready to receive data, we strung an IRQ line to a
> GPIO on the pi.
>
> In the driver I register an ISR for that GPIO which signals a
> wait_queue_head_t with wake_up. The ISR is allocated with
> IRQF_NO_THREAD because this is on our critical path and should be as
> fast as possible.

Can you reproduce the hanging behaviour without IRQF_NO_THREAD?

In general, the signalling/wakeup of wait queues is not allowed from
hardirq context (like what would happen if you've passed IRQF_NO_THREAD
at the time you register your handler).  If there's ever contention on
the waitqueue's internal spin_lock on RT, the handler will end up trying
to schedule() in a non-schedulable context.  This might lead to a
complete hang like you're seeing.
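
A minimal sketch of what registering the handler without the flag could look like, assuming a legacy-GPIO driver on a 4.14 kernel. The names ready_wq, ready, ready_isr, register_ready_irq and wait_for_ready are made up for illustration and are not taken from the linked gist. With a plain request_irq() and no IRQF_NO_THREAD, PREEMPT_RT force-threads the handler, so wake_up_interruptible() runs in a context that is allowed to sleep on the waitqueue lock:

```c
#include <linux/atomic.h>
#include <linux/gpio.h>
#include <linux/interrupt.h>
#include <linux/wait.h>

/* Hypothetical driver state: one waitqueue plus a "data ready" flag. */
static DECLARE_WAIT_QUEUE_HEAD(ready_wq);
static atomic_t ready = ATOMIC_INIT(0);

/*
 * On PREEMPT_RT this runs in an IRQ thread because IRQF_NO_THREAD is
 * not set, so taking the waitqueue's (sleeping) lock is permitted.
 */
static irqreturn_t ready_isr(int irq, void *dev_id)
{
	atomic_set(&ready, 1);
	wake_up_interruptible(&ready_wq);
	return IRQ_HANDLED;
}

/* Register the handler for a GPIO line; note: no IRQF_NO_THREAD. */
static int register_ready_irq(unsigned int gpio)
{
	int irq = gpio_to_irq(gpio);

	if (irq < 0)
		return irq;

	return request_irq(irq, ready_isr, IRQF_TRIGGER_RISING,
			   "ready-gpio", NULL);
}

/* Called from the ioctl path: sleep until the ISR has signalled. */
static int wait_for_ready(void)
{
	/* atomic_xchg() consumes the flag once it is seen as set. */
	return wait_event_interruptible(ready_wq, atomic_xchg(&ready, 0));
}
```

The price is one extra context switch into the IRQ thread per interrupt; whether that latency is acceptable for the critical path would have to be measured.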

Good luck,
   Julia

* Re: freezing system
From: Diez Roggisch @ 2018-10-11 19:30 UTC
  To: Julia Cartwright; +Cc: linux-rt-users

Thanks to both of you. I admit I set the flag somewhat prematurely. I’ll give removing it a shot next week!

Cheers,

Diez 

Sent from my iPad

* Re: freezing system
From: Diez Roggisch @ 2018-10-16  9:10 UTC
  To: linux-rt-users

Hello Julia & Mark,


> On 11. Oct 2018, at 21:01, Julia Cartwright <julia@ni.com> wrote:
> 
> Can you reproduce the hanging behaviour without IRQF_NO_THREAD?

Thankfully not! So that’s the solution we’re going with for now. Thanks again for pointing this out.

Cheers,

Diez
