Hi,

On Wed, Oct 17, 2012 at 05:00:02PM +0300, Felipe Balbi wrote:
> Hi,
>
> On Tue, Oct 16, 2012 at 02:39:50PM -0700, Kevin Hilman wrote:
> > + peterz, tglx
> >
> > Felipe Balbi writes:
> >
> > [...]
> >
> > > The problem I see is that even though we properly return IRQ_WAKE_THREAD
> > > and wake_up_process() manages to wake up the IRQ thread (it returns 1),
> > > the thread is never scheduled. To make things even worse, our irq thread
> > > runs once, but doesn't run on a consecutive call. Here's some (rather
> > > nasty) debug prints showing the problem:
> >
> > [...]
> >
> > >> [ 88.721923] try_to_wake_up 1411
> > >> [ 88.725189] ===> irq_wake_thread 139: IRQ 72 wake_up_process 0
> > >> [ 88.731292] [sched_delayed] sched: RT throttling activated
> >
> > This throttling message is the key one.
> >
> > With RT throttling activated, the IRQ thread will not be run (it
> > eventually will be allowed much later on, but by then, the I2C xfers
> > have timed out.)
> >
> > As a quick hack, the throttling can be disabled by setting the
> > sched_rt_runtime to RUNTIME_INF:
> >
> > # sysctl -w kernel.sched_rt_runtime_us=-1
> >
> > and a quick test shows that things go back to working as expected. But
> > we still need to figure out why the throttling is happening...
> >
> > So I started digging into why the RT runtime was so high, and noticed
> > that time spent in suspend was being counted as RT runtime!
> >
> > So spending time in suspend anywhere near sched_rt_runtime (0.95s) will
> > cause the RT throttling to always be triggered, and thus prevent IRQ
> > threads from running in the resume path. Ouch.
> >
> > I think I'm already in over my head in the RT runtime stuff, but
> > counting the time spent in suspend as RT runtime smells like a bug to
> > me, no?
> >
> > Peter? Thomas?
>
> it looks like, after removing console output completely
> (echo 0 > /proc/sysrq-trigger), I don't see the issue anymore. Let me
> just run for a few more iterations to make sure what I'm saying is
> correct.

Yeah, it really looks like removing console output makes the problem go
away. Ran a few iterations and it always worked fine. Full logs attached.

BTW, in the meantime I think I might have found some problems with
omap_device's PM implementation. I'll test the patchset a little longer
and send an RFC.

-- 
balbi