Hi,

On Wed, Oct 17, 2012 at 05:00:02PM +0300, Felipe Balbi wrote:
> Hi,
> 
> On Tue, Oct 16, 2012 at 02:39:50PM -0700, Kevin Hilman wrote:
> > + peterz, tglx
> > 
> > Felipe Balbi <balbi@ti.com> writes:
> > 
> > [...]
> > 
> > > The problem I see is that even though we properly return IRQ_WAKE_THREAD
> > > and wake_up_process() manages to wakeup the IRQ thread (it returns 1),
> > > the thread is never scheduled. To make things even worse, our irq thread
> > > runs once, but doesn't run on a consecutive call. Here's some (rather
> > > nasty) debug prints showing the problem:
> > 
> > [...]
> > 
> > >> [   88.721923] try_to_wake_up 1411
> > >> [   88.725189] ===> irq_wake_thread 139: IRQ 72 wake_up_process 0
> > >> [   88.731292] [sched_delayed] sched: RT throttling activated
> > 
> > This throttling message is the key one.
> > 
> > With RT throttling activated, the IRQ thread will not be run (it
> > eventually will be allowed much later on, but by then, the I2C xfers
> > have timed out.)
> > 
> > As a quick hack, the throttling can be disabled by setting the
> > sched_rt_runtime to RUNTIME_INF:
> > 
> >         # sysctl -w kernel.sched_rt_runtime_us=-1
> > 
> > and a quick test shows that things go back to working as expected.  But
> > we still need to figure out why the throttling is happening...
> > 
> > So I started digging into why the RT runtime was so high, and noticed
> > that time spent in suspend was being counted as RT runtime!
> > 
> > So spending time in suspend anywhere near sched_rt_runtime (0.95s) will
> > cause the RT throttling to always be triggered, and thus prevent IRQ
> > threads from running in the resume path.  Ouch.
> > 
> > I think I'm already in over my head in the RT runtime stuff, but
> > counting the time spent in suspend as RT runtime smells like a bug to
> > me, no?
> > 
> > Peter? Thomas?
> 
> It looks like after removing console output completely (echo 0 >
> /proc/sysrq-trigger), I don't see the issue anymore. Let me just run for
> a few more iterations to make sure what I'm saying is correct.

Yeah, it really looks like removing console output makes the problem go
away. Ran a few iterations and it always worked fine. Full logs attached.
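
For anyone following along, here is a rough sketch of how to check the RT
budget Kevin mentions above (sched_rt_runtime relative to the RT period).
The rt_budget helper is hypothetical, something I made up for illustration
and not anything from the tree; it only assumes the two values are read
from /proc/sys/kernel/sched_rt_runtime_us and
/proc/sys/kernel/sched_rt_period_us:

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): print the share of
# each RT period that RT tasks may consume before throttling kicks in.
# Arguments: runtime_us period_us
rt_budget() {
    runtime_us=$1
    period_us=$2
    if [ "$runtime_us" -lt 0 ]; then
        # -1 corresponds to RUNTIME_INF, i.e. throttling disabled
        echo "unlimited"
    else
        echo "$(( runtime_us * 100 / period_us ))%"
    fi
}

# On a live system the values would come from procfs:
#   rt_budget "$(cat /proc/sys/kernel/sched_rt_runtime_us)" \
#             "$(cat /proc/sys/kernel/sched_rt_period_us)"
rt_budget 950000 1000000   # the usual defaults (0.95s out of 1s)
rt_budget -1 1000000       # after the sysctl -w ...=-1 workaround
```

With the default values this prints 95%, and "unlimited" once the
workaround is applied, so it is a quick way to confirm which state a board
is in before running a suspend/resume iteration.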

BTW, in the meantime I think I might have found some problems with
omap_device's PM implementation. I'll test the patchset a little longer
and send an RFC.

-- 
balbi