All of lore.kernel.org
 help / color / mirror / Atom feed
* DSS2/PM on 3.2 broken?
@ 2012-01-09 12:46 Joe Woodward
  2012-01-09 21:08 ` NeilBrown
                   ` (2 more replies)
  0 siblings, 3 replies; 64+ messages in thread
From: Joe Woodward @ 2012-01-09 12:46 UTC (permalink / raw)
  To: linux-omap

I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected via the DPI interface (using the generic panel driver).

Entering standby used to work just fine on 3.0, but on 3.2 I get the following:

# echo mem > /sys/power/state
[   23.186279] PM: Syncing filesystems ... done.
[   23.194244] Freezing user space processes ... (elapsed 0.01 seconds) done.
[   23.219543] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
[   23.251037] Suspending console(s) (use no_console_suspend to debug)
[   23.554656] PM: suspend of devices complete after 296.417 msecs
[   23.561859] PM: late suspend of devices complete after 6.957 msecs
[   24.464813] Successfully put all powerdomains to target state
[   24.466674] ------------[ cut here ]------------
[   24.466857] WARNING: at drivers/video/omap2/dss/dss.c:713 0xc01350f8()
[   24.467010] Modules linked in:
[   24.467132] Backtrace:
[   24.467254] Function entered at [<c0010c3c>] from [<c02c4a60>]
[   24.467407]  r6:c02ffdaa r5:000002c9 r4:00000000 r3:00000000
[   24.467651] Function entered at [<c02c4a48>] from [<c00344d0>]
[   24.467803] Function entered at [<c003447c>] from [<c003450c>]
[   24.467926]  r8:00000000 r7:c0390a84 r6:c00288a4 r5:c037b85c r4:fffffff3
[   24.468200] r3:00000009
[   24.468322] Function entered at [<c00344e8>] from [<c01350f8>]
[   24.468475] Function entered at [<c01350ac>] from [<c013595c>]
[   24.468597]  r4:dec50208 r3:c013594c
[   24.468780] Function entered at [<c013594c>] from [<c0182724>]
[   24.468902]  r6:c00288a4 r5:c037b85c r4:dec50208 r3:c013594c
[   24.469177] Function entered at [<c01826f0>] from [<c0028900>]
[   24.469299] Function entered at [<c00288a4>] from [<c01836f4>]
[   24.469421]  r4:dec50208 r3:00000000
[   24.469604] Function entered at [<c0183614>] from [<c0183d20>]
[   24.469726]  r9:c02d7044 r8:00000000 r6:dec5025c r5:00000010 r4:dec50208
[   24.470031] Function entered at [<c0183c54>] from [<c0063aa4>]
[   24.470153]  r8:c02ca888 r7:00000000 r6:00000003 r5:00000000 r4:00000000
[   24.470458] Function entered at [<c006391c>] from [<c0063c60>]
[   24.470581]  r7:00000004 r6:00000000 r5:c02ca87c r4:00000003
[   24.471130] Function entered at [<c0063b50>] from [<c0062ad4>]
[   24.471282]  r6:00000003 r5:00000003 r4:c6a87000 r3:0000006d
[   24.471557] Function entered at [<c0062a2c>] from [<c0117728>]
[   24.471679] Function entered at [<c011770c>] from [<c00e2ab0>]
[   24.471832] Function entered at [<c00e29a0>] from [<c0098c98>]
[   24.471954] Function entered at [<c0098be4>] from [<c0098f14>]
[   24.472076]  r8:00000004 r7:00000000 r6:00000000 r5:000ac750 r4:d8a70dc0
[   24.472412] Function entered at [<c0098ed0>] from [<c000dcc0>]
[   24.472534]  r8:c000de44 r7:00000004 r6:000ac750 r5:00000004 r4:000a8e38
[   24.472839] ---[ end trace 9f4f3053f6637dae ]---
[   24.475006] PM: early resume of devices complete after 8.666 msecs
[   25.040344] PM: resume of devices complete after 560.943 msecs
[   25.277801] Restarting tasks ... done.

At which point the screen either restarts, or sometimes flickers and I get the following:
[   22.578796] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
[   23.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
[   24.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled

It normally recovers after doing this for a while...

Anyone have any ideas?

Cheers,
Joe




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-09 12:46 DSS2/PM on 3.2 broken? Joe Woodward
@ 2012-01-09 21:08 ` NeilBrown
  2012-01-10  9:58   ` Joe Woodward
  2012-01-11 13:43   ` Paul Walmsley
  2012-01-11 13:32 ` Paul Walmsley
  2012-01-12 16:42 ` Tomi Valkeinen
  2 siblings, 2 replies; 64+ messages in thread
From: NeilBrown @ 2012-01-09 21:08 UTC (permalink / raw)
  To: Joe Woodward; +Cc: linux-omap

[-- Attachment #1: Type: text/plain, Size: 5374 bytes --]

On Mon, 09 Jan 2012 12:46:43 +0000 "Joe Woodward" <jw@terrafix.co.uk> wrote:

> I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected via the DPI interface (using the generic panel driver).
> 
> Entering standby used to work just fine on 3.0, but on 3.2 I get the following:
> 
> # echo mem > /sys/power/state
> [   23.186279] PM: Syncing filesystems ... done.
> [   23.194244] Freezing user space processes ... (elapsed 0.01 seconds) done.
> [   23.219543] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
> [   23.251037] Suspending console(s) (use no_console_suspend to debug)
> [   23.554656] PM: suspend of devices complete after 296.417 msecs
> [   23.561859] PM: late suspend of devices complete after 6.957 msecs
> [   24.464813] Successfully put all powerdomains to target state
> [   24.466674] ------------[ cut here ]------------
> [   24.466857] WARNING: at drivers/video/omap2/dss/dss.c:713 0xc01350f8()
> [   24.467010] Modules linked in:
> [   24.467132] Backtrace:
> [   24.467254] Function entered at [<c0010c3c>] from [<c02c4a60>]
> [   24.467407]  r6:c02ffdaa r5:000002c9 r4:00000000 r3:00000000
> [   24.467651] Function entered at [<c02c4a48>] from [<c00344d0>]
> [   24.467803] Function entered at [<c003447c>] from [<c003450c>]
> [   24.467926]  r8:00000000 r7:c0390a84 r6:c00288a4 r5:c037b85c r4:fffffff3
> [   24.468200] r3:00000009
> [   24.468322] Function entered at [<c00344e8>] from [<c01350f8>]
> [   24.468475] Function entered at [<c01350ac>] from [<c013595c>]
> [   24.468597]  r4:dec50208 r3:c013594c
> [   24.468780] Function entered at [<c013594c>] from [<c0182724>]
> [   24.468902]  r6:c00288a4 r5:c037b85c r4:dec50208 r3:c013594c
> [   24.469177] Function entered at [<c01826f0>] from [<c0028900>]
> [   24.469299] Function entered at [<c00288a4>] from [<c01836f4>]
> [   24.469421]  r4:dec50208 r3:00000000
> [   24.469604] Function entered at [<c0183614>] from [<c0183d20>]
> [   24.469726]  r9:c02d7044 r8:00000000 r6:dec5025c r5:00000010 r4:dec50208
> [   24.470031] Function entered at [<c0183c54>] from [<c0063aa4>]
> [   24.470153]  r8:c02ca888 r7:00000000 r6:00000003 r5:00000000 r4:00000000
> [   24.470458] Function entered at [<c006391c>] from [<c0063c60>]
> [   24.470581]  r7:00000004 r6:00000000 r5:c02ca87c r4:00000003
> [   24.471130] Function entered at [<c0063b50>] from [<c0062ad4>]
> [   24.471282]  r6:00000003 r5:00000003 r4:c6a87000 r3:0000006d
> [   24.471557] Function entered at [<c0062a2c>] from [<c0117728>]
> [   24.471679] Function entered at [<c011770c>] from [<c00e2ab0>]
> [   24.471832] Function entered at [<c00e29a0>] from [<c0098c98>]
> [   24.471954] Function entered at [<c0098be4>] from [<c0098f14>]
> [   24.472076]  r8:00000004 r7:00000000 r6:00000000 r5:000ac750 r4:d8a70dc0
> [   24.472412] Function entered at [<c0098ed0>] from [<c000dcc0>]
> [   24.472534]  r8:c000de44 r7:00000004 r6:000ac750 r5:00000004 r4:000a8e38
> [   24.472839] ---[ end trace 9f4f3053f6637dae ]---
> [   24.475006] PM: early resume of devices complete after 8.666 msecs
> [   25.040344] PM: resume of devices complete after 560.943 msecs
> [   25.277801] Restarting tasks ... done.
> 
> At which point the screen either restarts, or sometimes flickers and I get the following:
> [   22.578796] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
> [   23.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
> [   24.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
> 
> It normally recovers after doing this for a while...
> 
> Anyone have any ideas?

I think I can help you work around the problem but I would much rather see it
fixed.  So the main reason I'm replying is to make this thread seem more
interesting so that more people look at it and hopefully the "right" person
sees it :-)

It seems that when cpuidle on an omap3 tries to switch to lower power
states, various things misbehave:
 - UARTs lose characters
 - dss loses sync
 - HDQ seems to lose everything.

The first is known in the code so cpuidle is disabled if there is any UART
traffic.  At first boot it is assume thered might be uart activity and the
timeout for "there doesn't seem to be any activity" is '0' meaning 'don't
time out' (look for sleep_timeout in sysfs).

On suspend, the UARTs are allowed to go to sleep and they are not explicitly
woken at resume.  So after a suspend/resume cycle, cpuidle is more likely to
try to adjust idle states and so DSS and HDQ are more likely to fail.

You can disable cpuidle by writing '0' to all the 'sleep_timeout' fields, or
probably by
   echo 1 >  /sys/module/cpuidle/parameters/off

That should stablise the display.

It would be great if you could check if cpuidle was active in 3.0 when this
all worked.  I would check by:

  grep . /sys/devices/system/cpu/cpu?/cpuidle/state?/usage

and see if any state other than state0 is used.

If cpuidle is changing power states but the display is happy, then it would
be fantastic if you could 'git bisect' to find out when it broke, but that
would be a lot of work with uncertain gain so I wouldn't be at all surprised
if you declined.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-09 21:08 ` NeilBrown
@ 2012-01-10  9:58   ` Joe Woodward
  2012-01-11 13:43   ` Paul Walmsley
  1 sibling, 0 replies; 64+ messages in thread
From: Joe Woodward @ 2012-01-10  9:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-omap

-----Original Message-----
From: NeilBrown <neilb@suse.de>
To: "Joe Woodward" <jw@terrafix.co.uk>
Cc: linux-omap@vger.kernel.org
Date: Tue, 10 Jan 2012 08:08:49 +1100
Subject: Re: DSS2/PM on 3.2 broken?

> On Mon, 09 Jan 2012 12:46:43 +0000 "Joe Woodward" <jw@terrafix.co.uk>
> wrote:
> 
> > I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel
> connected via the DPI interface (using the generic panel driver).
> > 
> > Entering standby used to work just fine on 3.0, but on 3.2 I get the
> following:
> > 
> > # echo mem > /sys/power/state
> > [   23.186279] PM: Syncing filesystems ... done.
> > [   23.194244] Freezing user space processes ... (elapsed 0.01
> seconds) done.
> > [   23.219543] Freezing remaining freezable tasks ... (elapsed 0.02
> seconds) done.
> > [   23.251037] Suspending console(s) (use no_console_suspend to
> debug)
> > [   23.554656] PM: suspend of devices complete after 296.417 msecs
> > [   23.561859] PM: late suspend of devices complete after 6.957 msecs
> > [   24.464813] Successfully put all powerdomains to target state
> > [   24.466674] ------------[ cut here ]------------
> > [   24.466857] WARNING: at drivers/video/omap2/dss/dss.c:713
> 0xc01350f8()
> > [   24.467010] Modules linked in:
> > [   24.467132] Backtrace:
> > [   24.467254] Function entered at [<c0010c3c>] from [<c02c4a60>]
> > [   24.467407]  r6:c02ffdaa r5:000002c9 r4:00000000 r3:00000000
> > [   24.467651] Function entered at [<c02c4a48>] from [<c00344d0>]
> > [   24.467803] Function entered at [<c003447c>] from [<c003450c>]
> > [   24.467926]  r8:00000000 r7:c0390a84 r6:c00288a4 r5:c037b85c
> r4:fffffff3
> > [   24.468200] r3:00000009
> > [   24.468322] Function entered at [<c00344e8>] from [<c01350f8>]
> > [   24.468475] Function entered at [<c01350ac>] from [<c013595c>]
> > [   24.468597]  r4:dec50208 r3:c013594c
> > [   24.468780] Function entered at [<c013594c>] from [<c0182724>]
> > [   24.468902]  r6:c00288a4 r5:c037b85c r4:dec50208 r3:c013594c
> > [   24.469177] Function entered at [<c01826f0>] from [<c0028900>]
> > [   24.469299] Function entered at [<c00288a4>] from [<c01836f4>]
> > [   24.469421]  r4:dec50208 r3:00000000
> > [   24.469604] Function entered at [<c0183614>] from [<c0183d20>]
> > [   24.469726]  r9:c02d7044 r8:00000000 r6:dec5025c r5:00000010
> r4:dec50208
> > [   24.470031] Function entered at [<c0183c54>] from [<c0063aa4>]
> > [   24.470153]  r8:c02ca888 r7:00000000 r6:00000003 r5:00000000
> r4:00000000
> > [   24.470458] Function entered at [<c006391c>] from [<c0063c60>]
> > [   24.470581]  r7:00000004 r6:00000000 r5:c02ca87c r4:00000003
> > [   24.471130] Function entered at [<c0063b50>] from [<c0062ad4>]
> > [   24.471282]  r6:00000003 r5:00000003 r4:c6a87000 r3:0000006d
> > [   24.471557] Function entered at [<c0062a2c>] from [<c0117728>]
> > [   24.471679] Function entered at [<c011770c>] from [<c00e2ab0>]
> > [   24.471832] Function entered at [<c00e29a0>] from [<c0098c98>]
> > [   24.471954] Function entered at [<c0098be4>] from [<c0098f14>]
> > [   24.472076]  r8:00000004 r7:00000000 r6:00000000 r5:000ac750
> r4:d8a70dc0
> > [   24.472412] Function entered at [<c0098ed0>] from [<c000dcc0>]
> > [   24.472534]  r8:c000de44 r7:00000004 r6:000ac750 r5:00000004
> r4:000a8e38
> > [   24.472839] ---[ end trace 9f4f3053f6637dae ]---
> > [   24.475006] PM: early resume of devices complete after 8.666 msecs
> > [   25.040344] PM: resume of devices complete after 560.943 msecs
> > [   25.277801] Restarting tasks ... done.
> > 
> > At which point the screen either restarts, or sometimes flickers and
> I get the following:
> > [   22.578796] omapdss DISPC error: SYNC_LOST on channel lcd,
> restarting the output with video overlays disabled
> > [   23.391571] omapdss DISPC error: SYNC_LOST on channel lcd,
> restarting the output with video overlays disabled
> > [   24.391571] omapdss DISPC error: SYNC_LOST on channel lcd,
> restarting the output with video overlays disabled
> > 
> > It normally recovers after doing this for a while...
> > 
> > Anyone have any ideas?
> 
> I think I can help you work around the problem but I would much rather
> see it
> fixed.  So the main reason I'm replying is to make this thread seem
> more
> interesting so that more people look at it and hopefully the "right"
> person
> sees it :-)
> 
> It seems that when cpuidle on an omap3 tries to switch to lower power
> states, various things misbehave:
>  - UARTs lose characters
>  - dss loses sync
>  - HDQ seems to lose everything.
> 
> The first is known in the code so cpuidle is disabled if there is any
> UART
> traffic.  At first boot it is assume thered might be uart activity and
> the
> timeout for "there doesn't seem to be any activity" is '0' meaning
> 'don't
> time out' (look for sleep_timeout in sysfs).
> 
> On suspend, the UARTs are allowed to go to sleep and they are not
> explicitly
> woken at resume.  So after a suspend/resume cycle, cpuidle is more
> likely to
> try to adjust idle states and so DSS and HDQ are more likely to fail.
> 
> You can disable cpuidle by writing '0' to all the 'sleep_timeout'
> fields, or
> probably by
>    echo 1 >  /sys/module/cpuidle/parameters/off
> 
> That should stablise the display.
> 
> It would be great if you could check if cpuidle was active in 3.0 when
> this
> all worked.  I would check by:
> 
>   grep . /sys/devices/system/cpu/cpu?/cpuidle/state?/usage
> 
> and see if any state other than state0 is used.
> 
> If cpuidle is changing power states but the display is happy, then it
> would
> be fantastic if you could 'git bisect' to find out when it broke, but
> that
> would be a lot of work with uncertain gain so I wouldn't be at all
> surprised
> if you declined.
> 
> NeilBrown

Thanks for picking this up Neil...

I don't actually have CPUIDLE enabled in the kernel build (as the UARTs are used as data-pumps with messages received every second or 
so - so not much opportunity to IDLE and not worth getting garbled characters).

I've tried re-building the kernel with CPUIDLE and followed what you said and sadly it makes no difference. The error log when suspending 
is the same. I've checked that CPUIDLE is disabled and it does indeed stay in the same state, and never transitions.

The 3.1 kernel is broken in the same way as the 3.2, but I've not looked any further to find the commit causing the failure - I was hoping 
for a few pointers before having to do this as it's a bit tedious!

I'm assuming the problems started when DSS2 was adapted to runtime PM by Tomi as the warning comes from dss_runtime_get()?

Cheers,
Joe



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-09 12:46 DSS2/PM on 3.2 broken? Joe Woodward
  2012-01-09 21:08 ` NeilBrown
@ 2012-01-11 13:32 ` Paul Walmsley
  2012-01-12 16:42 ` Tomi Valkeinen
  2 siblings, 0 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-11 13:32 UTC (permalink / raw)
  To: Joe Woodward; +Cc: linux-omap

On Mon, 9 Jan 2012, Joe Woodward wrote:

> I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected via the DPI interface (using the generic panel driver).
> 
> Entering standby used to work just fine on 3.0, but on 3.2 I get the following:
> 
> # echo mem > /sys/power/state

...

> [   24.466674] ------------[ cut here ]------------
> [   24.466857] WARNING: at drivers/video/omap2/dss/dss.c:713 0xc01350f8()

...

> At which point the screen either restarts, or sometimes flickers and I get the following:
> [   22.578796] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
> [   23.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
> [   24.391571] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
> 
> It normally recovers after doing this for a while...
> 
> Anyone have any ideas?

Consider cc'ing the DSS maintainers and linux-fbdev list:

OMAP DISPLAY SUBSYSTEM and FRAMEBUFFER SUPPORT (DSS2)
M:	Tomi Valkeinen <tomi.valkeinen@ti.com>
L:	linux-omap@vger.kernel.org
L:	linux-fbdev@vger.kernel.org
S:	Maintained
F:	drivers/video/omap2/
F:	Documentation/arm/OMAP/DSS


- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-09 21:08 ` NeilBrown
  2012-01-10  9:58   ` Joe Woodward
@ 2012-01-11 13:43   ` Paul Walmsley
  2012-01-11 14:22     ` Archit
  2012-01-11 22:59     ` NeilBrown
  1 sibling, 2 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-11 13:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, linux-omap

cc Kevin

On Tue, 10 Jan 2012, NeilBrown wrote:

> It seems that when cpuidle on an omap3 tries to switch to lower power
> states, various things misbehave:
>  - UARTs lose characters

Is this with off-mode enabled, or is this with only retention idle 
enabled?

If off-mode is enabled, and incoming serial traffic is used to wake the 
system, it's known and expected that the first byte will be lost.  This is 
an artifact of the hardware and seems to be unavoidable.

>  - dss loses sync

Best to take this up with the DSS maintainer directly.

>  - HDQ seems to lose everything.

Is this with off-mode enabled, or just with retention idle?

If the former, this is hardly surprising as the HDQ driver 
(drivers/w1/masters/omap_hdq.c) doesn't contain any context save/restore 
code.  In fact the HDQ driver doesn't even use PM runtime, so quite a bit 
of work is needed there. 


- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 13:43   ` Paul Walmsley
@ 2012-01-11 14:22     ` Archit
  2012-01-11 15:15       ` Joe Woodward
  2012-01-11 22:59     ` NeilBrown
  1 sibling, 1 reply; 64+ messages in thread
From: Archit @ 2012-01-11 14:22 UTC (permalink / raw)
  To: Joe Woodward
  Cc: Paul Walmsley, NeilBrown, khilman, linux-omap, Tomi Valkeinen

Hi,

On Wednesday 11 January 2012 07:13 PM, Paul Walmsley wrote:
> cc Kevin
>
> On Tue, 10 Jan 2012, NeilBrown wrote:
>
>> It seems that when cpuidle on an omap3 tries to switch to lower power
>> states, various things misbehave:
>>   - UARTs lose characters
>
> Is this with off-mode enabled, or is this with only retention idle
> enabled?
>
> If off-mode is enabled, and incoming serial traffic is used to wake the
> system, it's known and expected that the first byte will be lost.  This is
> an artifact of the hardware and seems to be unavoidable.
>
>>   - dss loses sync
>
> Best to take this up with the DSS maintainer directly.

Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in your 
bootargs and share logs?

The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
negative value, could you print out that value too?

If the call to pm_runtime_get_sync() fails, we bail out immediately. So 
the LCD interface shouldn't have been enabled at all. I don't know why 
we get sync lost errors.

There is a possibility that the LCD interface wasn't switched off when 
for some reason when we suspended, and switching on the clocks again 
tries to start the transfer without DSS being configured correctly?

Also, are you on the final 3.2, or the 3.2-rc's by any chance? There 
were some hwmod patches which got in 3.2-rc4 related to hwmod fixes.

Archit

>
>>   - HDQ seems to lose everything.
>
> Is this with off-mode enabled, or just with retention idle?
>
> If the former, this is hardly surprising as the HDQ driver
> (drivers/w1/masters/omap_hdq.c) doesn't contain any context save/restore
> code.  In fact the HDQ driver doesn't even use PM runtime, so quite a bit
> of work is needed there.
>
>
> - Paul
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 14:22     ` Archit
@ 2012-01-11 15:15       ` Joe Woodward
  2012-01-11 15:52         ` Archit
  2012-01-12  9:28         ` Tomi Valkeinen
  0 siblings, 2 replies; 64+ messages in thread
From: Joe Woodward @ 2012-01-11 15:15 UTC (permalink / raw)
  To: Archit; +Cc: Paul Walmsley, NeilBrown, khilman, linux-omap, Tomi Valkeinen

...snip...

> 
> Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in
> your bootargs and share logs?

As requested:

# echo mem > /sys/power/state
[   37.371734] PM: Syncing filesystems ... done.
[   37.397460] PM: Preparing system for mem sleep
[   37.402923] Freezing user space processes ... (elapsed 0.02 seconds) done.
[   37.432312] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
[   37.463256] PM: Entering mem sleep
[   37.467132] Suspending console(s) (use no_console_suspend to debug)

#### SYSTEM SUSPENDS HERE

[   37.587646] omapdss CORE: suspend 2
[   37.776000] omapdss DISPC: dispc_runtime_put
[   37.776336] omapdss DSS: dss_runtime_put
[   37.779693] PM: suspend of devices complete after 305.389 msecs
[   37.786010] omapdss DISPC: dispc_save_context
[   37.786437] omapdss DISPC: context saved, ctx_loss_count 0
[   37.786560] omapdss DSS: dss_runtime_put
[   37.786773] omapdss DSS: dss_save_context
[   37.786895] omapdss DSS: context saved
[   37.787384] PM: late suspend of devices complete after 7.446 msecs
[   40.932312] Successfully put all powerdomains to target state
[   40.934509] omapdss DSS: dss_restore_context
[   40.934661] omapdss DSS: context restored
[   40.934814] omapdss DSS: dss_runtime_get
[   40.934967] @@@ dss_runtime_get::pm_runtime_get_sync() return val = -13
[   40.935089] ------------[ cut here ]------------
[   40.935241] WARNING: at drivers/video/omap2/dss/dss.c:716 0xc0137304()
[   40.935363] Modules linked in:
[   40.935516] Backtrace:
[   40.935638] Function entered at [<c0010c48>] from [<c02bdad0>]
[   40.935791]  r6:c02fa302 r5:000002cc r4:00000000 r3:00000000
[   40.936065] Function entered at [<c02bdab8>] from [<c0034874>]
[   40.936218] Function entered at [<c0034820>] from [<c00348b0>]
[   40.936340]  r8:00000000 r7:c0386c14 r6:c0028c48 r5:c0371c24 r4:fffffff3
[   40.936645] r3:00000009
[   40.936767] Function entered at [<c003488c>] from [<c0137304>]
[   40.936920] Function entered at [<c01372a8>] from [<c0137b20>]
[   40.937042]  r4:dec50008 r3:c0137b10
[   40.937225] Function entered at [<c0137b10>] from [<c0179ff0>]
[   40.937347]  r6:c0028c48 r5:c0371c24 r4:dec50008 r3:c0137b10
[   40.937652] Function entered at [<c0179fbc>] from [<c0028ca4>]
[   40.937774] Function entered at [<c0028c48>] from [<c017afc0>]
[   40.937927]  r4:dec50008 r3:00000000
[   40.938079] Function entered at [<c017aee0>] from [<c017b5ec>]
[   40.938232]  r9:c02d0044 r8:00000000 r6:dec5005c r5:00000010 r4:dec50008
[   40.938568] Function entered at [<c017b520>] from [<c0063e48>]
[   40.938690]  r8:c02c3898 r7:00000000 r6:00000003 r5:00000000 r4:00000000
[   40.938995] Function entered at [<c0063cc0>] from [<c0064004>]
[   40.939147]  r7:00000004 r6:00000000 r5:c02c388c r4:00000003
[   40.939422] Function entered at [<c0063ef4>] from [<c0062e78>]
[   40.939544]  r6:00000003 r5:00000003 r4:d8869000 r3:0000006d
[   40.939819] Function entered at [<c0062dd0>] from [<c0117ac8>]
[   40.939971] Function entered at [<c0117aac>] from [<c00e2e54>]
[   40.940124] Function entered at [<c00e2d44>] from [<c009903c>]
[   40.940246] Function entered at [<c0098f88>] from [<c00992b8>]
[   40.940368]  r8:00000004 r7:00000000 r6:00000000 r5:000aabf8 r4:d857c940
[   40.940704] Function entered at [<c0099274>] from [<c000dcc0>]
[   40.941101]  r8:c000de44 r7:00000004 r6:000aabf8 r5:00000004 r4:000a8e38
[   40.941436] ---[ end trace db0a5fd53a21ce6e ]---
[   40.943511] PM: early resume of devices complete after 9.307 msecs
[   40.950622] omapdss DISPC: dispc_save_context
[   40.951049] omapdss DISPC: context saved, ctx_loss_count 0
[   40.951171] omapdss DSS: dss_runtime_put
[   40.951812] omapdss CORE: resume
[   40.953460] omapdss DSS: dss_runtime_get
[   40.953613] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
[   40.953735] omapdss DISPC: dispc_runtime_get
[   40.953948] omapdss DSS: dss_runtime_get
[   40.954071] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
[   40.954193] omapdss DISPC: dispc_restore_context
[   40.954467] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
[   40.954620] omapdss DSS: dispc clock info found from cache.
[   40.954772] omapdss DSS: dpll4_m4 = 432000000
[   40.954986] omapdss DSS: fck = 72000000 (12)
[   40.955108] omapdss DISPC: lck = 72000000 (1)
[   40.955230] omapdss DISPC: pck = 36000000 (2)
[   40.955383] omapdss DISPC: channel 0 xres 800 yres 480
[   40.955505] omapdss DISPC: pck 36000
[   40.955657] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
[   40.955810] omapdss DISPC: hsync 33866Hz, vsync 64Hz
[   41.294464] PM: resume of devices complete after 345.153 msecs
[   41.688568] PM: Finishing wakeup.
[   41.692169] Restarting tasks ... done.

When SYNC is lost we get something like:
[  118.972534] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
[  119.166839] omapdss DISPC: dispc_runtime_put
[  119.175292] omapdss DSS: dss_runtime_put
[  119.183746] omapdss DISPC: dispc_enable_plane 1, 0
[  119.193054] omapdss DISPC: dispc_enable_plane 2, 0
[  119.261230] omapdss DSS: dss_runtime_get
[  119.261383] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
[  119.261505] omapdss DISPC: dispc_runtime_get
[  119.261718] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
[  119.261871] omapdss DSS: dispc clock info found from cache.
[  119.262023] omapdss DSS: dpll4_m4 = 432000000
[  119.262237] omapdss DSS: fck = 72000000 (12)
[  119.262329] omapdss DISPC: lck = 72000000 (1)
[  119.262451] omapdss DISPC: pck = 36000000 (2)
[  119.262603] omapdss DISPC: channel 0 xres 800 yres 480
[  119.262725] omapdss DISPC: pck 36000
[  119.262878] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
[  119.263000] omapdss DISPC: hsync 33866Hz, vsync 64Hz
[  119.265228] omapdss DISPC: dispc_runtime_put
[  119.962738] DISPC IRQ: 0x40a2: SYNC_LOST
[  119.968017] omapdss DISPC: dispc_runtime_get


> 
> The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
> negative value, could you print out that value too?
> 
> If the call to pm_runtime_get_sync() fails, we bail out immediately. So
> the LCD interface shouldn't have been enabled at all. I don't know why 
> we get sync lost errors.
> 
> There is a possibility that the LCD interface wasn't switched off when 
> for some reason when we suspended, and switching on the clocks again 
> tries to start the transfer without DSS being configured correctly?
> 
> Also, are you on the final 3.2, or the 3.2-rc's by any chance? There 
> were some hwmod patches which got in 3.2-rc4 related to hwmod fixes.

Yep, it's definately the final mainline 3.2 taken from kernel.org.

Cheers,
Joe

> Archit
> 
> >
> >>   - HDQ seems to lose everything.
> >
> > Is this with off-mode enabled, or just with retention idle?
> >
> > If the former, this is hardly surprising as the HDQ driver
> > (drivers/w1/masters/omap_hdq.c) doesn't contain any context
> save/restore
> > code.  In fact the HDQ driver doesn't even use PM runtime, so quite a
> bit
> > of work is needed there.
> >
> >
> > - Paul
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-omap"
> in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 15:15       ` Joe Woodward
@ 2012-01-11 15:52         ` Archit
  2012-01-11 16:13           ` Joe Woodward
  2012-01-12  9:28         ` Tomi Valkeinen
  1 sibling, 1 reply; 64+ messages in thread
From: Archit @ 2012-01-11 15:52 UTC (permalink / raw)
  To: Joe Woodward
  Cc: Paul Walmsley, NeilBrown, khilman, linux-omap, Tomi Valkeinen

Hi,

On Wednesday 11 January 2012 08:45 PM, Joe Woodward wrote:
> ...snip...
>
>>
>> Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in
>> your bootargs and share logs?
>
> As requested:
>
> # echo mem>  /sys/power/state
> [   37.371734] PM: Syncing filesystems ... done.
> [   37.397460] PM: Preparing system for mem sleep
> [   37.402923] Freezing user space processes ... (elapsed 0.02 seconds) done.
> [   37.432312] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
> [   37.463256] PM: Entering mem sleep
> [   37.467132] Suspending console(s) (use no_console_suspend to debug)
>
> #### SYSTEM SUSPENDS HERE
>
> [   37.587646] omapdss CORE: suspend 2
> [   37.776000] omapdss DISPC: dispc_runtime_put
> [   37.776336] omapdss DSS: dss_runtime_put
> [   37.779693] PM: suspend of devices complete after 305.389 msecs
> [   37.786010] omapdss DISPC: dispc_save_context
> [   37.786437] omapdss DISPC: context saved, ctx_loss_count 0
> [   37.786560] omapdss DSS: dss_runtime_put
> [   37.786773] omapdss DSS: dss_save_context
> [   37.786895] omapdss DSS: context saved
> [   37.787384] PM: late suspend of devices complete after 7.446 msecs
> [   40.932312] Successfully put all powerdomains to target state
> [   40.934509] omapdss DSS: dss_restore_context
> [   40.934661] omapdss DSS: context restored
> [   40.934814] omapdss DSS: dss_runtime_get
> [   40.934967] @@@ dss_runtime_get::pm_runtime_get_sync() return val = -13

This resume/restore looks to be happening a bit early according to me. 
All the resuming should happen after the print 'omapdss CORE: resume'.

One experiment worth trying would be to reboot/halt the kernel which 
will lead to only shutdown of DSS, and no resume. This will help us 
figure out if the issue is in the suspend or resume path.

Archit

> [   40.935089] ------------[ cut here ]------------
> [   40.935241] WARNING: at drivers/video/omap2/dss/dss.c:716 0xc0137304()
> [   40.935363] Modules linked in:
> [   40.935516] Backtrace:
> [   40.935638] Function entered at [<c0010c48>] from [<c02bdad0>]
> [   40.935791]  r6:c02fa302 r5:000002cc r4:00000000 r3:00000000
> [   40.936065] Function entered at [<c02bdab8>] from [<c0034874>]
> [   40.936218] Function entered at [<c0034820>] from [<c00348b0>]
> [   40.936340]  r8:00000000 r7:c0386c14 r6:c0028c48 r5:c0371c24 r4:fffffff3
> [   40.936645] r3:00000009
> [   40.936767] Function entered at [<c003488c>] from [<c0137304>]
> [   40.936920] Function entered at [<c01372a8>] from [<c0137b20>]
> [   40.937042]  r4:dec50008 r3:c0137b10
> [   40.937225] Function entered at [<c0137b10>] from [<c0179ff0>]
> [   40.937347]  r6:c0028c48 r5:c0371c24 r4:dec50008 r3:c0137b10
> [   40.937652] Function entered at [<c0179fbc>] from [<c0028ca4>]
> [   40.937774] Function entered at [<c0028c48>] from [<c017afc0>]
> [   40.937927]  r4:dec50008 r3:00000000
> [   40.938079] Function entered at [<c017aee0>] from [<c017b5ec>]
> [   40.938232]  r9:c02d0044 r8:00000000 r6:dec5005c r5:00000010 r4:dec50008
> [   40.938568] Function entered at [<c017b520>] from [<c0063e48>]
> [   40.938690]  r8:c02c3898 r7:00000000 r6:00000003 r5:00000000 r4:00000000
> [   40.938995] Function entered at [<c0063cc0>] from [<c0064004>]
> [   40.939147]  r7:00000004 r6:00000000 r5:c02c388c r4:00000003
> [   40.939422] Function entered at [<c0063ef4>] from [<c0062e78>]
> [   40.939544]  r6:00000003 r5:00000003 r4:d8869000 r3:0000006d
> [   40.939819] Function entered at [<c0062dd0>] from [<c0117ac8>]
> [   40.939971] Function entered at [<c0117aac>] from [<c00e2e54>]
> [   40.940124] Function entered at [<c00e2d44>] from [<c009903c>]
> [   40.940246] Function entered at [<c0098f88>] from [<c00992b8>]
> [   40.940368]  r8:00000004 r7:00000000 r6:00000000 r5:000aabf8 r4:d857c940
> [   40.940704] Function entered at [<c0099274>] from [<c000dcc0>]
> [   40.941101]  r8:c000de44 r7:00000004 r6:000aabf8 r5:00000004 r4:000a8e38
> [   40.941436] ---[ end trace db0a5fd53a21ce6e ]---
> [   40.943511] PM: early resume of devices complete after 9.307 msecs
> [   40.950622] omapdss DISPC: dispc_save_context
> [   40.951049] omapdss DISPC: context saved, ctx_loss_count 0
> [   40.951171] omapdss DSS: dss_runtime_put
> [   40.951812] omapdss CORE: resume
> [   40.953460] omapdss DSS: dss_runtime_get
> [   40.953613] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
> [   40.953735] omapdss DISPC: dispc_runtime_get
> [   40.953948] omapdss DSS: dss_runtime_get
> [   40.954071] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
> [   40.954193] omapdss DISPC: dispc_restore_context
> [   40.954467] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
> [   40.954620] omapdss DSS: dispc clock info found from cache.
> [   40.954772] omapdss DSS: dpll4_m4 = 432000000
> [   40.954986] omapdss DSS: fck = 72000000 (12)
> [   40.955108] omapdss DISPC: lck = 72000000 (1)
> [   40.955230] omapdss DISPC: pck = 36000000 (2)
> [   40.955383] omapdss DISPC: channel 0 xres 800 yres 480
> [   40.955505] omapdss DISPC: pck 36000
> [   40.955657] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
> [   40.955810] omapdss DISPC: hsync 33866Hz, vsync 64Hz
> [   41.294464] PM: resume of devices complete after 345.153 msecs
> [   41.688568] PM: Finishing wakeup.
> [   41.692169] Restarting tasks ... done.
>
> When SYNC is lost we get something like:
> [  118.972534] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
> [  119.166839] omapdss DISPC: dispc_runtime_put
> [  119.175292] omapdss DSS: dss_runtime_put
> [  119.183746] omapdss DISPC: dispc_enable_plane 1, 0
> [  119.193054] omapdss DISPC: dispc_enable_plane 2, 0
> [  119.261230] omapdss DSS: dss_runtime_get
> [  119.261383] @@@ dss_runtime_get::pm_runtime_get_sync() return val = 1
> [  119.261505] omapdss DISPC: dispc_runtime_get
> [  119.261718] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
> [  119.261871] omapdss DSS: dispc clock info found from cache.
> [  119.262023] omapdss DSS: dpll4_m4 = 432000000
> [  119.262237] omapdss DSS: fck = 72000000 (12)
> [  119.262329] omapdss DISPC: lck = 72000000 (1)
> [  119.262451] omapdss DISPC: pck = 36000000 (2)
> [  119.262603] omapdss DISPC: channel 0 xres 800 yres 480
> [  119.262725] omapdss DISPC: pck 36000
> [  119.262878] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
> [  119.263000] omapdss DISPC: hsync 33866Hz, vsync 64Hz
> [  119.265228] omapdss DISPC: dispc_runtime_put
> [  119.962738] DISPC IRQ: 0x40a2: SYNC_LOST
> [  119.968017] omapdss DISPC: dispc_runtime_get
>
>
>>
>> The pm_runtime_get_sync() call in dss_runtime_get() is returning  a
>> negative value, could you print out that value too?
>>
>> If the call to pm_runtime_get_sync() fails, we bail out immediately. So
>> the LCD interface shouldn't have been enabled at all. I don't know why
>> we get sync lost errors.
>>
>> There is a possibility that the LCD interface wasn't switched off when
>> for some reason when we suspended, and switching on the clocks again
>> tries to start the transfer without DSS being configured correctly?
>>
>> Also, are you on the final 3.2, or the 3.2-rc's by any chance? There
>> were some hwmod patches which got in 3.2-rc4 related to hwmod fixes.
>
> Yep, it's definately the final mainline 3.2 taken from kernel.org.
>
> Cheers,
> Joe
>
>> Archit
>>
>>>
>>>>    - HDQ seems to lose everything.
>>>
>>> Is this with off-mode enabled, or just with retention idle?
>>>
>>> If the former, this is hardly surprising as the HDQ driver
>>> (drivers/w1/masters/omap_hdq.c) doesn't contain any context
>> save/restore
>>> code.  In fact the HDQ driver doesn't even use PM runtime, so quite a
>> bit
>>> of work is needed there.
>>>
>>>
>>> - Paul
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-omap"
>> in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-omap"
>> in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 15:52         ` Archit
@ 2012-01-11 16:13           ` Joe Woodward
  2012-01-11 16:54             ` Archit
  0 siblings, 1 reply; 64+ messages in thread
From: Joe Woodward @ 2012-01-11 16:13 UTC (permalink / raw)
  To: Archit; +Cc: Paul Walmsley, NeilBrown, khilman, linux-omap, Tomi Valkeinen

-----Original Message-----
From: Archit <a0393947@ti.com>
To: Joe Woodward <jw@terrafix.co.uk>
Cc: Paul Walmsley <paul@pwsan.com>, NeilBrown <neilb@suse.de>, <khilman@ti.com>, <linux-omap@vger.kernel.org>, Tomi Valkeinen 
<tomi.valkeinen@ti.com>
Date: Wed, 11 Jan 2012 21:22:00 +0530
Subject: Re: DSS2/PM on 3.2 broken?

> Hi,
> 
> On Wednesday 11 January 2012 08:45 PM, Joe Woodward wrote:
> > ...snip...
> >
> >>
> >> Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in
> >> your bootargs and share logs?
> >
> > As requested:
> >
> > # echo mem>  /sys/power/state
> > [   37.371734] PM: Syncing filesystems ... done.
> > [   37.397460] PM: Preparing system for mem sleep
> > [   37.402923] Freezing user space processes ... (elapsed 0.02
> seconds) done.
> > [   37.432312] Freezing remaining freezable tasks ... (elapsed 0.02
> seconds) done.
> > [   37.463256] PM: Entering mem sleep
> > [   37.467132] Suspending console(s) (use no_console_suspend to
> debug)
> >
> > #### SYSTEM SUSPENDS HERE
> >
> > [   37.587646] omapdss CORE: suspend 2
> > [   37.776000] omapdss DISPC: dispc_runtime_put
> > [   37.776336] omapdss DSS: dss_runtime_put
> > [   37.779693] PM: suspend of devices complete after 305.389 msecs
> > [   37.786010] omapdss DISPC: dispc_save_context
> > [   37.786437] omapdss DISPC: context saved, ctx_loss_count 0
> > [   37.786560] omapdss DSS: dss_runtime_put
> > [   37.786773] omapdss DSS: dss_save_context
> > [   37.786895] omapdss DSS: context saved
> > [   37.787384] PM: late suspend of devices complete after 7.446 msecs
> > [   40.932312] Successfully put all powerdomains to target state
> > [   40.934509] omapdss DSS: dss_restore_context
> > [   40.934661] omapdss DSS: context restored
> > [   40.934814] omapdss DSS: dss_runtime_get
> > [   40.934967] @@@ dss_runtime_get::pm_runtime_get_sync() return val
> = -13
> 
> This resume/restore looks to be happening a bit early according to me. 
> All the resuming should happen after the print 'omapdss CORE: resume'.
> 
> One experiment worth trying would be to reboot/halt the kernel which 
> will lead to only shutdown of DSS, and no resume. This will help us 
> figure out if the issue is in the suspend or resume path.
> 
> Archit

I'm not 100% sure what you mean for me to do here?

Do you mean to halt the kernel after suspending, but before resuming? If so, how would I go about doing this?

If I just reboot or poweroff I get the following...

# poweroff
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system poweroff
[   43.310607] omapdss CORE: shutdown
[   43.499053] omapdss DISPC: dispc_runtime_put
[   43.504272] omapdss DISPC: dispc_save_context
[   43.509399] omapdss DISPC: context saved, ctx_loss_count 0
[   43.515747] omapdss DSS: dss_runtime_put
[   43.521453] omapdss DSS: dss_runtime_put
[   43.527221] System halted.

# reboot
The system is going down NOW!
Sent SIGTERM to all processes
Sent SIGKILL to all processes
Requesting system reboot
[   25.652008] omapdss CORE: shutdown
[   25.836120] omapdss DISPC: dispc_runtime_put
[   25.841125] omapdss DISPC: dispc_save_context
[   25.846221] omapdss DSS: dss_runtime_put
[   25.851776] omapdss DISPC: context saved, ctx_loss_count 0
[   25.857788] omapdss DSS: dss_runtime_put
[   25.864074] Restarting system.

Cheers,
Joe



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 16:13           ` Joe Woodward
@ 2012-01-11 16:54             ` Archit
  0 siblings, 0 replies; 64+ messages in thread
From: Archit @ 2012-01-11 16:54 UTC (permalink / raw)
  To: Joe Woodward
  Cc: Paul Walmsley, NeilBrown, khilman, linux-omap, Tomi Valkeinen

Hi,

On Wednesday 11 January 2012 09:43 PM, Joe Woodward wrote:
> -----Original Message-----
> From: Archit<a0393947@ti.com>
> To: Joe Woodward<jw@terrafix.co.uk>
> Cc: Paul Walmsley<paul@pwsan.com>, NeilBrown<neilb@suse.de>,<khilman@ti.com>,<linux-omap@vger.kernel.org>, Tomi Valkeinen
> <tomi.valkeinen@ti.com>
> Date: Wed, 11 Jan 2012 21:22:00 +0530
> Subject: Re: DSS2/PM on 3.2 broken?
>
>> Hi,
>>
>> On Wednesday 11 January 2012 08:45 PM, Joe Woodward wrote:
>>> ...snip...
>>>
>>>>
>>>> Could you enable omapdss debug by adding 'omapdss.debug=1 debug' in
>>>> your bootargs and share logs?
>>>

<snip>

>> Archit
>
> I'm not 100% sure what you mean for mmount -t debugfs debugfs /sys/kernel/debuge to do here?
>
> Do you mean to halt the kernel after suspending, but before resuming? If so, how would I go about doing this?
>

I'd asked for reboot or power off as you have done below. The suspend 
path looks okay, I asked for the power off to see if there was a 
dss_runtime_get() was called at the wrong time by some interrupt while 
suspending.

One difference between the shutdown and suspend logs is that we don't 
see the prints 'omapdss DSS: dss_save_context' and 'omapdss DSS: context 
saved', that seems a bit fishy.

I'm not sure how to go ahead from here. We haven't seen this on OMAP4, 
and I don't think I have tried this on OAMP3. Maybe Tomi has more ideas.

Archit

> If I just reboot or poweroff I get the following...
>
> # poweroff
> The system is going down NOW!
> Sent SIGTERM to all processes
> Sent SIGKILL to all processes
> Requesting system poweroff
> [   43.310607] omapdss CORE: shutdown
> [   43.499053] omapdss DISPC: dispc_runtime_put
> [   43.504272] omapdss DISPC: dispc_save_context
> [   43.509399] omapdss DISPC: context saved, ctx_loss_count 0
> [   43.515747] omapdss DSS: dss_runtime_put
> [   43.521453] omapdss DSS: dss_runtime_put
> [   43.527221] System halted.
>
> # reboot
> The system is going down NOW!
> Sent SIGTERM to all processes
> Sent SIGKILL to all processes
> Requesting system reboot
> [   25.652008] omapdss CORE: shutdown
> [   25.836120] omapdss DISPC: dispc_runtime_put
> [   25.841125] omapdss DISPC: dispc_save_context
> [   25.846221] omapdss DSS: dss_runtime_put
> [   25.851776] omapdss DISPC: context saved, ctx_loss_count 0
> [   25.857788] omapdss DSS: dss_runtime_put
> [   25.864074] Restarting system.
>
> Cheers,
> Joe
>
>


^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 13:43   ` Paul Walmsley
  2012-01-11 14:22     ` Archit
@ 2012-01-11 22:59     ` NeilBrown
  2012-01-13 10:05       ` Paul Walmsley
  2012-01-13 11:19       ` Paul Walmsley
  1 sibling, 2 replies; 64+ messages in thread
From: NeilBrown @ 2012-01-11 22:59 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, linux-omap

[-- Attachment #1: Type: text/plain, Size: 2687 bytes --]

On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:

> cc Kevin
> 
> On Tue, 10 Jan 2012, NeilBrown wrote:
> 
> > It seems that when cpuidle on an omap3 tries to switch to lower power
> > states, various things misbehave:
> >  - UARTs lose characters
> 
> Is this with off-mode enabled, or is this with only retention idle 
> enabled?

I haven't experimented with off-mode much.  When I try it the suspend-time
current drain goes from 65mA to about 90mA which doesn't seem the right
direction.  The only other observation is that the "cam" power domain
doesn't reach the target state of '0'.  I'm guessing they are related but
haven't progressed further. (I don't have anything like a camera driver
compiled in - nor do I have a camera attached).

So it is just retention.

I occasionally see evidence of a garbled char when I wake from suspend, but
that doesn't bother me much.

I spent some time exploring why cpuidle never drops below state 0 and found
out that the code explicitly disables other states when uart is active - for
a fairly broad definition of 'active'.

I found the "sleep_timeout" setting and set them all to 1 second.  This meant
that cpuidle started working, but I got a lot of garbage characters.

> 
> If off-mode is enabled, and incoming serial traffic is used to wake the 
> system, it's known and expected that the first byte will be lost.  This is 
> an artifact of the hardware and seems to be unavoidable.

Does this really mean CPU_IDLE cannot be used when you might expect chars
from a serial port - e.g. GPS or Bluetooth?
Which power domain needs to stay non-off?  I would guess 'CORE' for UART1,2
and 'PER' for UART3,4.  I'm using UART3 for the console port, but it looks
like 'PER' doesn't get switched off by CPUIDLE.

What is the important issue here that I don't know about ? :-)

> 
> >  - dss loses sync
> 
> Best to take this up with the DSS maintainer directly.
> 
> >  - HDQ seems to lose everything.
> 
> Is this with off-mode enabled, or just with retention idle?

Not off-mode - just retention.

> 
> If the former, this is hardly surprising as the HDQ driver 
> (drivers/w1/masters/omap_hdq.c) doesn't contain any context save/restore 
> code.  In fact the HDQ driver doesn't even use PM runtime, so quite a bit 
> of work is needed there. 

The HDQ driver does enable/disable the iclk and fclk.  I thought I read that
having the clocks enabled would prevent the powerdomain from dropping to a
lower-power state ??

Could you suggest some other driver which does do the right sort of runtime
PM that I could try copying?

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 15:15       ` Joe Woodward
  2012-01-11 15:52         ` Archit
@ 2012-01-12  9:28         ` Tomi Valkeinen
  2012-01-12  9:30           ` Tomi Valkeinen
  2012-01-12  9:51           ` Tomi Valkeinen
  1 sibling, 2 replies; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-12  9:28 UTC (permalink / raw)
  To: Joe Woodward, khilman; +Cc: Archit, Paul Walmsley, NeilBrown, linux-omap

[-- Attachment #1: Type: text/plain, Size: 1389 bytes --]

Hi,

On Wed, 2012-01-11 at 15:15 +0000, Joe Woodward wrote:

> > The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
> > negative value, could you print out that value too?
> > 
> > If the call to pm_runtime_get_sync() fails, we bail out immediately. So
> > the LCD interface shouldn't have been enabled at all. I don't know why 
> > we get sync lost errors.
> > 
> > There is a possibility that the LCD interface wasn't switched off when 
> > for some reason when we suspended, and switching on the clocks again 
> > tries to start the transfer without DSS being configured correctly?

I can reproduce this on my Overo. The problem goes away if I change
dss's and dispc'c pm_runtime_put() calls to pm_runtime_put_sync(). But
I'm not quite sure why.

I would imagine that the pm framework would flush the async
pm_runtime_put calls before the system goes into suspend. Comparing the
logs of successful and non-successful suspends it seems that the order
of suspend and resume events changes. Perhaps the suspend goes fine, but
for whatever reason the resume calls happen in a different order,
causing the problems. I need to study this more to understand the
problem.

Kevin, can you confirm that just using pm_runtime_put() is enough even
in system suspend case, and the framework handles doing the runtime_put
before suspending?

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-12  9:28         ` Tomi Valkeinen
@ 2012-01-12  9:30           ` Tomi Valkeinen
  2012-01-12  9:51           ` Tomi Valkeinen
  1 sibling, 0 replies; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-12  9:30 UTC (permalink / raw)
  To: Joe Woodward; +Cc: khilman, Archit, Paul Walmsley, NeilBrown, linux-omap

[-- Attachment #1: Type: text/plain, Size: 1019 bytes --]

On Thu, 2012-01-12 at 11:28 +0200, Tomi Valkeinen wrote:
> Hi,
> 
> On Wed, 2012-01-11 at 15:15 +0000, Joe Woodward wrote:
> 
> > > The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
> > > negative value, could you print out that value too?
> > > 
> > > If the call to pm_runtime_get_sync() fails, we bail out immediately. So
> > > the LCD interface shouldn't have been enabled at all. I don't know why 
> > > we get sync lost errors.
> > > 
> > > There is a possibility that the LCD interface wasn't switched off when 
> > > for some reason when we suspended, and switching on the clocks again 
> > > tries to start the transfer without DSS being configured correctly?
> 
> I can reproduce this on my Overo. The problem goes away if I change
> dss's and dispc'c pm_runtime_put() calls to pm_runtime_put_sync(). But
> I'm not quite sure why.

Oh, one more thing. I didn't get any sync losts, but I may just have
been lucky. So it may or may not be a separate problem.

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-12  9:28         ` Tomi Valkeinen
  2012-01-12  9:30           ` Tomi Valkeinen
@ 2012-01-12  9:51           ` Tomi Valkeinen
  1 sibling, 0 replies; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-12  9:51 UTC (permalink / raw)
  To: Joe Woodward; +Cc: khilman, Archit, Paul Walmsley, NeilBrown, linux-omap

[-- Attachment #1: Type: text/plain, Size: 1161 bytes --]

On Thu, 2012-01-12 at 11:28 +0200, Tomi Valkeinen wrote:
> Hi,
> 
> On Wed, 2012-01-11 at 15:15 +0000, Joe Woodward wrote:
> 
> > > The pm_runtime_get_sync() call in dss_runtime_get() is returning  a 
> > > negative value, could you print out that value too?
> > > 
> > > If the call to pm_runtime_get_sync() fails, we bail out immediately. So
> > > the LCD interface shouldn't have been enabled at all. I don't know why 
> > > we get sync lost errors.
> > > 
> > > There is a possibility that the LCD interface wasn't switched off when 
> > > for some reason when we suspended, and switching on the clocks again 
> > > tries to start the transfer without DSS being configured correctly?
> 
> I can reproduce this on my Overo. The problem goes away if I change
> dss's and dispc'c pm_runtime_put() calls to pm_runtime_put_sync(). But
> I'm not quite sure why.

(I seem to have not waken up today...)

I also forgot to mention that the fail happens in
drivers/base/power/runtime.c:rpm_resume(), where the check
"dev->power.disable_depth > 0" causes the function to return EACCES.

I'm not sure what disable_depth > 0 means.

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-09 12:46 DSS2/PM on 3.2 broken? Joe Woodward
  2012-01-09 21:08 ` NeilBrown
  2012-01-11 13:32 ` Paul Walmsley
@ 2012-01-12 16:42 ` Tomi Valkeinen
  2012-01-12 22:40   ` Kevin Hilman
  2 siblings, 1 reply; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-12 16:42 UTC (permalink / raw)
  To: Joe Woodward, Kevin Hilman; +Cc: linux-omap, Archit Taneja

[-- Attachment #1: Type: text/plain, Size: 9090 bytes --]

On Mon, 2012-01-09 at 12:46 +0000, Joe Woodward wrote:
> I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected via the DPI interface (using the generic panel driver).
> 
> Entering standby used to work just fine on 3.0, but on 3.2 I get the following:

I've been debugging this, but I'm at loss. I added some debug prints,
which I show below, and I also pushed them to
"git://gitorious.org/linux-omap-dss2/linux.git pm-test-prints" if
somebody wants to see exactly what they print.

Kevin, perhaps you have an idea what could be wrong here. Long version
below, short version: doing system suspend breaks omapdss, if omapdss
uses pm_runtime_put, but works with pm_runtime_put_sync.

First, as a test case, here's what happens if I manually disable a
display (with my comments beginning with #):

echo 0 > /sys/devices/platform/omapdss/display0/enabled 
# here dpi.c calls dispc_runtime_put
[   44.528015] dispc_runtime_put
[   44.531158] dispc disable depth: 0
[   44.535308] dispc disable depth: 0
[   44.538879] dispc_runtime_put end 0
# here dpi.c calls dss_runtime_put
[   44.542633] dss_runtime_put
[   44.545593] dss disable depth: 0
[   44.548980] dss disable depth: 0
[   44.552429] dss_runtime_put end 0

# then the runtime PM runs dispc suspend in its workqueue
[   44.557861] dispc_runtime_suspend start
# here dispc_runtime_suspend uses dss_runtime_put
[   44.561981] dss_runtime_put
[   44.564910] dss disable depth: 0
[   44.568420] dss disable depth: 0
[   44.571838] dss_runtime_put end 0
[   44.575378] dispc_runtime_suspend end
# then the runtime PM runs dss suspend in its workqueue
[   44.579315] dss_runtime_suspend start
[   44.583221] dss_runtime_suspend end


And similarly when enabling the display:

echo 1 > /sys/devices/platform/omapdss/display0/enabled 
[  442.109222] dss_runtime_get
[  442.112304] dss disable depth: 0
[  442.115753] dss_runtime_resume start
[  442.119537] dss_runtime_resume end
[  442.123199] dss disable depth: 0
[  442.126586] dss_runtime_get end 0
[  442.130126] dispc_runtime_get
[  442.133270] dispc disable depth: 0
[  442.136932] dispc_runtime_resume start
[  442.140869] dss_runtime_get
[  442.143890] dss disable depth: 0
[  442.147308] dss disable depth: 0
[  442.150695] dss_runtime_get end 1
[  442.154235] dispc_runtime_resume end
[  442.158020] dispc disable depth: 0
[  442.161651] dispc_runtime_get end 0


This all works fine. Now, if I suspend the system:

echo mem > /sys/power/state 
[  482.097930] PM: Syncing filesystems ... done.
[  482.105041] PM: Preparing system for mem sleep
[  482.116394] Freezing user space processes ... (elapsed 0.02 seconds) done.
[  482.146331] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
[  482.177520] PM: Entering mem sleep
# dss core suspend disables the displays, as was done manually above
[  482.197570] core suspend 2
[  482.213867] dispc_runtime_put
[  482.217010] dispc disable depth: 0
[  482.220886] dispc disable depth: 0
[  482.224578] dispc_runtime_put end 0
[  482.228271] dss_runtime_put
[  482.231262] dss disable depth: 0
[  482.234649] dss disable depth: 0
[  482.238128] dss_runtime_put end 0
[  482.242218] core suspend end 0
# so far looks exactly the same as above

[  482.246795] PM: suspend of devices complete after 60.260 msecs
# now runtime PM runs the suspends in its workqueue
[  482.256744] dispc_runtime_suspend start
[  482.260955] dss_runtime_put

# dss's disable_depth is 1?? Which, afaik, means something like runtime PM is disabled for DSS.

[  482.263916] dss disable depth: 1
[  482.267303] dss disable depth: 1
[  482.270782] dss_runtime_put end 0
[  482.274261] dispc_runtime_suspend end
[  482.278381] dss_runtime_suspend start
[  482.282257] dss_runtime_suspend end
[  482.286224] PM: late suspend of devices complete after 33.209 msecs
[  482.292968] Disabling non-boot CPUs ...


Except the disable_depth, the suspend went as in manual case. But on resume things are rather broken:


[  696.654754] Successfully put all powerdomains to target state

# Here the PM framework calls dss's runtime_resume callback, even if nobody has called pm_rutime_get for dss!

[  696.661254] dss_runtime_resume start
[  696.665100] dss_runtime_resume end
[  696.668731] dispc_runtime_resume start
[  696.672760] dss_runtime_get
[  696.675689] dss disable depth: 1
[  696.679168] EACCESS RPM_RESUME
[  696.682373] dss disable depth: 1

# And here pm_runtime_get fails for DSS, because dss disable_depth is 1...

[  696.685791] ------------[ cut here ]------------
[  696.690734] WARNING: at drivers/video/omap2/dss/dss.c:716 dss_runtime_get+0x74/0x8c()
[  696.699035] Modules linked in:
[  696.702392] [<c001b3c4>] (unwind_backtrace+0x0/0xf0) from [<c005248c>] (warn_slowpath_common+0x4c
/0x64)
[  696.712371] [<c005248c>] (warn_slowpath_common+0x4c/0x64) from [<c00524c0>] (warn_slowpath_null+0
x1c/0x24)
[  696.722625] [<c00524c0>] (warn_slowpath_null+0x1c/0x24) from [<c0273cd8>] (dss_runtime_get+0x74/0
x8c)
[  696.732421] [<c0273cd8>] (dss_runtime_get+0x74/0x8c) from [<c0274610>] (dispc_runtime_resume+0x10
/0x149c)
[  696.742584] [<c0274610>] (dispc_runtime_resume+0x10/0x149c) from [<c02d09c0>] (pm_generic_runtime
_resume+0x2c/0x40)
[  696.753662] [<c02d09c0>] (pm_generic_runtime_resume+0x2c/0x40) from [<c003af70>] (_od_resume_noir
q+0x48/0x54)
[  696.764190] [<c003af70>] (_od_resume_noirq+0x48/0x54) from [<c02d1c54>] (pm_noirq_op.clone.5+0x9c
/0x164)
[  696.774291] [<c02d1c54>] (pm_noirq_op.clone.5+0x9c/0x164) from [<c02d1d7c>] (dpm_resume_noirq+0x6
0/0x1f4)
[  696.784454] [<c02d1d7c>] (dpm_resume_noirq+0x60/0x1f4) from [<c0098e2c>] (suspend_devices_and_ent
er+0x114/0x2d0)
[  696.795227] [<c0098e2c>] (suspend_devices_and_enter+0x114/0x2d0) from [<c0099128>] (enter_state+0
x140/0x180)
[  696.805694] [<c0099128>] (enter_state+0x140/0x180) from [<c0097f54>] (state_store+0xd0/0x178)
[  696.814758] [<c0097f54>] (state_store+0xd0/0x178) from [<c02460a4>] (kobj_attr_store+0x14/0x20)
[  696.824005] [<c02460a4>] (kobj_attr_store+0x14/0x20) from [<c01626c0>] (sysfs_write_file+0x100/0x
184)
[  696.833801] [<c01626c0>] (sysfs_write_file+0x100/0x184) from [<c0101e20>] (vfs_write+0xb4/0x148)
[  696.843139] [<c0101e20>] (vfs_write+0xb4/0x148) from [<c01020a8>] (sys_write+0x40/0x70)
[  696.851654] [<c01020a8>] (sys_write+0x40/0x70) from [<c0013b60>] (ret_fast_syscall+0x0/0x3c)
[  696.860626] ---[ end trace 14e40b198c698a6f ]---
[  696.865478] dss_runtime_get end -13
[  696.869201] dispc_runtime_resume end fail
[  696.875549] PM: early resume of devices complete after 214.509 msecs
[  696.884704] dispc_runtime_suspend start
[  696.888824] dss_runtime_put
[  696.891845] dss disable depth: 0
[  696.895568] dss disable depth: 0
[  696.899047] dss_runtime_put end 0
[  696.902526] dispc_runtime_suspend end
[  696.906616] core resume
[  696.910339] dss_runtime_get
[  696.913391] dss disable depth: 0
[  696.916809] dss disable depth: 0
[  696.920196] dss_runtime_get end 1
[  696.923767] dispc_runtime_get
[  696.926879] dispc disable depth: 0
[  696.930603] dispc_runtime_resume start
[  696.934539] dss_runtime_get
[  696.937561] dss disable depth: 0
[  696.940979] dss disable depth: 0
[  696.944366] dss_runtime_get end 1
[  696.947906] dispc_runtime_resume end
[  696.951751] dispc disable depth: 0
[  696.955383] dispc_runtime_get end 0
[  696.961212] core resume end 0
[  696.973419] PM: resume of devices complete after 89.477 msecs
[  696.984893] PM: Finishing wakeup.
[  696.988403] Restarting tasks ... done.


After changing pm_runtime_put calls in dss and dispc to sync versions:


echo mem > /sys/power/state 
[   21.312133] PM: Syncing filesystems ... done.
[   21.319000] PM: Preparing system for mem sleep
[   21.331390] Freezing user space processes ... (elapsed 0.02 seconds) done.
[   21.361297] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
[   21.392517] PM: Entering mem sleep
[   21.412506] core suspend 2
[   21.423370] dispc_runtime_put
[   21.426513] dispc disable depth: 0
[   21.430236] dispc_runtime_suspend start
[   21.434326] dss_runtime_put
[   21.437347] dss disable depth: 0
[   21.440734] dss disable depth: 0
[   21.444152] dss_runtime_put end 0
[   21.447692] dispc_runtime_suspend end
[   21.451660] dispc disable depth: 0
[   21.455322] dispc_runtime_put end 0
[   21.458984] dss_runtime_put
[   21.462005] dss disable depth: 0
[   21.465423] dss_runtime_suspend start
[   21.469329] dss_runtime_suspend end
[   21.473052] dss disable depth: 0
[   21.476531] dss_runtime_put end 0
[   21.480560] core suspend end 0
[   21.485137] PM: suspend of devices complete after 83.606 msecs
[   21.495483] PM: late suspend of devices complete after 4.089 msecs
[   21.502166] Disabling non-boot CPUs ...

Looks quite similar, but everything is done before "PM: suspend of
devices..." line. And resume works fine also.

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-12 16:42 ` Tomi Valkeinen
@ 2012-01-12 22:40   ` Kevin Hilman
  2012-01-13  5:29     ` Tomi Valkeinen
  0 siblings, 1 reply; 64+ messages in thread
From: Kevin Hilman @ 2012-01-12 22:40 UTC (permalink / raw)
  To: Tomi Valkeinen; +Cc: Joe Woodward, linux-omap, Archit Taneja

Tomi Valkeinen <tomi.valkeinen@ti.com> writes:

> On Mon, 2012-01-09 at 12:46 +0000, Joe Woodward wrote:
>> I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected via the DPI interface (using the generic panel driver).
>> 
>> Entering standby used to work just fine on 3.0, but on 3.2 I get the following:
>
> I've been debugging this, but I'm at loss. I added some debug prints,
> which I show below, and I also pushed them to
> "git://gitorious.org/linux-omap-dss2/linux.git pm-test-prints" if
> somebody wants to see exactly what they print.
>
> Kevin, perhaps you have an idea what could be wrong here. Long version
> below, short version: doing system suspend breaks omapdss, if omapdss
> uses pm_runtime_put, but works with pm_runtime_put_sync.

Is the pm_runtime_put() done after system suspend has started?  

After system suspend has begun, the workqueue used for async callbacks
is frozen, so any runtime PM calls that you want to work during
suspend/resume must use the _sync versions.

Kevin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-12 22:40   ` Kevin Hilman
@ 2012-01-13  5:29     ` Tomi Valkeinen
  2012-01-13 19:30       ` Kevin Hilman
  0 siblings, 1 reply; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-13  5:29 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: Joe Woodward, linux-omap, Archit Taneja

On Thu, 2012-01-12 at 14:40 -0800, Kevin Hilman wrote:
> Tomi Valkeinen <tomi.valkeinen@ti.com> writes:
> 
> > On Mon, 2012-01-09 at 12:46 +0000, Joe Woodward wrote:
> >> I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected via the DPI interface (using the generic panel driver).
> >> 
> >> Entering standby used to work just fine on 3.0, but on 3.2 I get the following:
> >
> > I've been debugging this, but I'm at loss. I added some debug prints,
> > which I show below, and I also pushed them to
> > "git://gitorious.org/linux-omap-dss2/linux.git pm-test-prints" if
> > somebody wants to see exactly what they print.
> >
> > Kevin, perhaps you have an idea what could be wrong here. Long version
> > below, short version: doing system suspend breaks omapdss, if omapdss
> > uses pm_runtime_put, but works with pm_runtime_put_sync.
> 
> Is the pm_runtime_put() done after system suspend has started?  
> 
> After system suspend has begun, the workqueue used for async callbacks
> is frozen, so any runtime PM calls that you want to work during
> suspend/resume must use the _sync versions.

pm_runtime_put() is called inside omapdss driver's .suspend callback. So
I guess that means the system suspend has started. However, the logs
show that the runtime_suspend callback _is_ being called before the
system suspend is finished, so the workqueue can't be frozen...

 Tomi



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 22:59     ` NeilBrown
@ 2012-01-13 10:05       ` Paul Walmsley
  2012-01-13 11:20         ` NeilBrown
                           ` (2 more replies)
  2012-01-13 11:19       ` Paul Walmsley
  1 sibling, 3 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-13 10:05 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, t-kristo, govindraj.r, linux-omap

cc Tero, Govindraj

On Thu, 12 Jan 2012, NeilBrown wrote:

> On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> 
> I spent some time exploring why cpuidle never drops below state 0 and found
> out that the code explicitly disables other states when uart is active - for
> a fairly broad definition of 'active'.
> 
> I found the "sleep_timeout" setting and set them all to 1 second.  This meant
> that cpuidle started working, but I got a lot of garbage characters.
> 

...

> Does this really mean CPU_IDLE cannot be used when you might expect chars
> from a serial port - e.g. GPS or Bluetooth?

Hmmm.  Looking at mach-omap2/pm34xx.c and mach-omap2/cpuidle34xx.c, it 
indeed prevents even the MPU from entering a low-power state when the UART 
is active, as you write.  That doesn't seem correct.

Please try the following patch.  It works fine for me on v3.2 with 
omap2plus_defconfig on a 35xx BeagleBoard.  No dropped or garbled 
characters with console use.  Not too surprising, since the UART has a 
receive FIFO to withstand interrupt servicing delays.


- Paul

From: Paul Walmsley <paul@pwsan.com>
Date: Fri, 13 Jan 2012 02:10:30 -0700
Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
 when the UART is active

For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
driver prevent the MPU powerdomain from entering low-power modes when
any UART isn't asleep.  Possibly it is intended to minimize the ARM
wakeup latency when UART activity arrives, but the UART has a FIFO
that should handle this for most cases, with no dropped characters.  I
may be forgetting something important, though.  And CORE/PER low-power
states are a different matter entirely.

Thanks to NeilBrown <neilb@suse.de> for reporting the problem.

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Joe Woodward <jw@terrafix.co.uk>
Cc: Tero Kristo <t-kristo@ti.com>
Cc: Kevin Hilman <khilman@ti.com>
Cc: Govindraj.R <govindraj.raja@ti.com>
---
 arch/arm/mach-omap2/cpuidle34xx.c |    8 +++-----
 arch/arm/mach-omap2/pm.h          |    1 -
 arch/arm/mach-omap2/pm34xx.c      |   10 ----------
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
index 942bb4f..4ef32d4 100644
--- a/arch/arm/mach-omap2/cpuidle34xx.c
+++ b/arch/arm/mach-omap2/cpuidle34xx.c
@@ -183,6 +183,9 @@ static int next_valid_state(struct cpuidle_device *dev,
 			core_deepest_state = PWRDM_POWER_OFF;
 	}
 
+	if (!omap_uart_can_sleep())
+		core_deepest_state = PWRDM_POWER_ON;
+
 	/* Check if current state is valid */
 	if ((cx->valid) &&
 	    (cx->mpu_state >= mpu_deepest_state) &&
@@ -244,11 +247,6 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
 	struct omap3_idle_statedata *cx;
 	int ret;
 
-	if (!omap3_can_sleep()) {
-		new_state_idx = drv->safe_state_index;
-		goto select_state;
-	}
-
 	/*
 	 * Prevent idle completely if CAM is active.
 	 * CAM does not have wakeup capability in OMAP3.
diff --git a/arch/arm/mach-omap2/pm.h b/arch/arm/mach-omap2/pm.h
index 4e166ad..eac6fce 100644
--- a/arch/arm/mach-omap2/pm.h
+++ b/arch/arm/mach-omap2/pm.h
@@ -18,7 +18,6 @@
 extern void *omap3_secure_ram_storage;
 extern void omap3_pm_off_mode_enable(int);
 extern void omap_sram_idle(void);
-extern int omap3_can_sleep(void);
 extern int omap_set_pwrdm_state(struct powerdomain *pwrdm, u32 state);
 extern int omap3_idle_init(void);
 
diff --git a/arch/arm/mach-omap2/pm34xx.c b/arch/arm/mach-omap2/pm34xx.c
index efa6649..1c49824 100644
--- a/arch/arm/mach-omap2/pm34xx.c
+++ b/arch/arm/mach-omap2/pm34xx.c
@@ -484,21 +484,11 @@ console_still_active:
 	clkdm_allow_idle(mpu_pwrdm->pwrdm_clkdms[0]);
 }
 
-int omap3_can_sleep(void)
-{
-	if (!omap_uart_can_sleep())
-		return 0;
-	return 1;
-}
-
 static void omap3_pm_idle(void)
 {
 	local_irq_disable();
 	local_fiq_disable();
 
-	if (!omap3_can_sleep())
-		goto out;
-
 	if (omap_irq_pending() || need_resched())
 		goto out;
 
-- 
1.7.8.3


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-11 22:59     ` NeilBrown
  2012-01-13 10:05       ` Paul Walmsley
@ 2012-01-13 11:19       ` Paul Walmsley
  1 sibling, 0 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-13 11:19 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, linux-omap

On Thu, 12 Jan 2012, NeilBrown wrote:

> On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> 
> Not off-mode - just retention.

Okay.  It's expected that the UART will drop the first byte also when CORE 
is in retention, due to the delay involved in waking up the chip.  But 
that shouldn't prevent the MPU from going into a low-power state.  I've 
sent a patch under separate cover that should fix this.

> > If the former, this is hardly surprising as the HDQ driver 
> > (drivers/w1/masters/omap_hdq.c) doesn't contain any context save/restore 
> > code.  In fact the HDQ driver doesn't even use PM runtime, so quite a bit 
> > of work is needed there. 
> 
> The HDQ driver does enable/disable the iclk and fclk.  I thought I read that
> having the clocks enabled would prevent the powerdomain from dropping to a
> lower-power state ??

Several years ago, we started handling all of the operations needed to 
quiesce and wake an IP block in a common integration layer called 'hwmod', 
which is currently called through the PM runtime functions.  This includes 
clk_{enable,disable}().  Unfortunately, due to some internal political 
reasons, not all of our drivers are converted to use PM runtime functions 
yet.  You just happened to hit one that isn't -- apparently very few 
people care very much about this driver...

> Could you suggest some other driver which does do the right sort of runtime
> PM that I could try copying?

I did a quick conversion for you.  The series is at 
git://git.pwsan.com/linux-2.6 in the branch named 'hdq_hwmod_runtime_pm'.  
Please let me know if it works.

I did boot-test it, and noticed that the hwmod subsystem was unable to 
softreset the device:

[    0.182769] omap_hwmod: hdq: softreset failed (waited 10000 usec)

This could be because I screwed up something in the hwmod data, or it 
could be that the HDQ reset hardware doesn't work correctly, similar to 
I2C.  If that presents a problem, you might add the HWMOD_INIT_NO_RESET 
flag to the struct omap_hwmod.flags field for the HDQ.

It also spit out these messages later, but I have no idea what they mean.  
At least the kernel seems to be able to access the HDQ registers:

[    1.685272] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in 
Interrupt mode
[    1.888153] w1_master_driver w1_bus_master1: Family 1 for 
01.000000000000.3d is not registered.


- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 10:05       ` Paul Walmsley
@ 2012-01-13 11:20         ` NeilBrown
  2012-01-13 11:31           ` Paul Walmsley
  2012-01-18  7:13           ` Tomi Valkeinen
  2012-01-13 11:34         ` Govindraj
  2012-01-13 19:21         ` Kevin Hilman
  2 siblings, 2 replies; 64+ messages in thread
From: NeilBrown @ 2012-01-13 11:20 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, t-kristo, govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 6796 bytes --]

On Fri, 13 Jan 2012 03:05:03 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:

> cc Tero, Govindraj
> 
> On Thu, 12 Jan 2012, NeilBrown wrote:
> 
> > On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> > 
> > I spent some time exploring why cpuidle never drops below state 0 and found
> > out that the code explicitly disables other states when uart is active - for
> > a fairly broad definition of 'active'.
> > 
> > I found the "sleep_timeout" setting and set them all to 1 second.  This meant
> > that cpuidle started working, but I got a lot of garbage characters.
> > 
> 
> ...
> 
> > Does this really mean CPU_IDLE cannot be used when you might expect chars
> > from a serial port - e.g. GPS or Bluetooth?
> 
> Hmmm.  Looking at mach-omap2/pm34xx.c and mach-omap2/cpuidle34xx.c, it 
> indeed prevents even the MPU from entering a low-power state when the UART 
> is active, as you write.  That doesn't seem correct.
> 
> Please try the following patch.  It works fine for me on v3.2 with 
> omap2plus_defconfig on a 35xx BeagleBoard.  No dropped or garbled 
> characters with console use.  Not too surprising, since the UART has a 
> receive FIFO to withstand interrupt servicing delays.

Much nicer!!

I now see CPUIDLE using state1 and state2 as well as the normal state0.

I also see idle current drain drop from about 160mA to 95mA

And there are no garbled characters when I type.

Having CPUIDLE makes the DSS2 problem worse: lots of 

[   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled

messages whenever the CPU isn't busy.

The only way to get rid of them that I have found is to blank the display:
# echo 1 > /sys/devices/platform/omapfb/graphics/fb0/blank

Also, the HDQ access to the battery 'fuel gauge' is working fine, so
presumably that gets disturbed by one of the lower power states (3,4,5).
I guess it is 4,5,6 when CORE goes to RET or OFF.  So we need some way for
HDQ to say "I'm busy" just like UART can.  omap_hdq_can_sleep() ???
There must be a cleaner way...

More experiments:
 - I enabled off_mode and started seeing state3 happening.  UART and HDQ
   still fine - as expected.
 - I set the  /sys/devices/system/cpu/cpu0/cpuidle/state?/usage values
   all to '10'.
   Now things start to break;
   Console is mostly OK, but the first char after 10 seconds of idle is bad.
   I type 'enter' and get 'C'.  In general the first char typed is mapped to
   something else in a consistent way:
      0a -> c3
      20 -> 90
      55 -> d5
      2a -> 95
   It is a bit like we are missing a start bit, and a stop bit is shifting
   down into the msb .. sometimes.

   HDQ also breaks.
   That surprised me a bit as the HDQ access is immediately after typing so
   it should be in the 10-second timeout when the UART keeps CORE powered on.
   Maybe we need runtime_pm to run omap_hdq_break() again to reinitialise
   the HDQ port if CORE might have been turned off.

   Also saw states all the way down to 6.  Cannot tell if this helps current
   drain as I need HDQ working for that.

So this is real progress, but there is still room for improvement.  I might
try writing an omap_hdq_can_sleep() and use it like omap_uart_can_sleep().
And get runtime_pm working on HDQ so it turns off after a short delay, and
re-initialises if we have to turn it back on again.

Thanks for the patch, it's been:
   Tested-by: NeilBrown <neilb@suse.de>


Thanks,
NeilBrown


> 
> 
> - Paul
> 
> From: Paul Walmsley <paul@pwsan.com>
> Date: Fri, 13 Jan 2012 02:10:30 -0700
> Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
>  when the UART is active
> 
> For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
> driver prevent the MPU powerdomain from entering low-power modes when
> any UART isn't asleep.  Possibly it is intended to minimize the ARM
> wakeup latency when UART activity arrives, but the UART has a FIFO
> that should handle this for most cases, with no dropped characters.  I
> may be forgetting something important, though.  And CORE/PER low-power
> states are a different matter entirely.
> 
> Thanks to NeilBrown <neilb@suse.de> for reporting the problem.
> 
> Signed-off-by: Paul Walmsley <paul@pwsan.com>
> Cc: NeilBrown <neilb@suse.de>
> Cc: Joe Woodward <jw@terrafix.co.uk>
> Cc: Tero Kristo <t-kristo@ti.com>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Govindraj.R <govindraj.raja@ti.com>
> ---
>  arch/arm/mach-omap2/cpuidle34xx.c |    8 +++-----
>  arch/arm/mach-omap2/pm.h          |    1 -
>  arch/arm/mach-omap2/pm34xx.c      |   10 ----------
>  3 files changed, 3 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
> index 942bb4f..4ef32d4 100644
> --- a/arch/arm/mach-omap2/cpuidle34xx.c
> +++ b/arch/arm/mach-omap2/cpuidle34xx.c
> @@ -183,6 +183,9 @@ static int next_valid_state(struct cpuidle_device *dev,
>  			core_deepest_state = PWRDM_POWER_OFF;
>  	}
>  
> +	if (!omap_uart_can_sleep())
> +		core_deepest_state = PWRDM_POWER_ON;
> +
>  	/* Check if current state is valid */
>  	if ((cx->valid) &&
>  	    (cx->mpu_state >= mpu_deepest_state) &&
> @@ -244,11 +247,6 @@ static int omap3_enter_idle_bm(struct cpuidle_device *dev,
>  	struct omap3_idle_statedata *cx;
>  	int ret;
>  
> -	if (!omap3_can_sleep()) {
> -		new_state_idx = drv->safe_state_index;
> -		goto select_state;
> -	}
> -
>  	/*
>  	 * Prevent idle completely if CAM is active.
>  	 * CAM does not have wakeup capability in OMAP3.
> diff --git a/arch/arm/mach-omap2/pm.h b/arch/arm/mach-omap2/pm.h
> index 4e166ad..eac6fce 100644
> --- a/arch/arm/mach-omap2/pm.h
> +++ b/arch/arm/mach-omap2/pm.h
> @@ -18,7 +18,6 @@
>  extern void *omap3_secure_ram_storage;
>  extern void omap3_pm_off_mode_enable(int);
>  extern void omap_sram_idle(void);
> -extern int omap3_can_sleep(void);
>  extern int omap_set_pwrdm_state(struct powerdomain *pwrdm, u32 state);
>  extern int omap3_idle_init(void);
>  
> diff --git a/arch/arm/mach-omap2/pm34xx.c b/arch/arm/mach-omap2/pm34xx.c
> index efa6649..1c49824 100644
> --- a/arch/arm/mach-omap2/pm34xx.c
> +++ b/arch/arm/mach-omap2/pm34xx.c
> @@ -484,21 +484,11 @@ console_still_active:
>  	clkdm_allow_idle(mpu_pwrdm->pwrdm_clkdms[0]);
>  }
>  
> -int omap3_can_sleep(void)
> -{
> -	if (!omap_uart_can_sleep())
> -		return 0;
> -	return 1;
> -}
> -
>  static void omap3_pm_idle(void)
>  {
>  	local_irq_disable();
>  	local_fiq_disable();
>  
> -	if (!omap3_can_sleep())
> -		goto out;
> -
>  	if (omap_irq_pending() || need_resched())
>  		goto out;
>  


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 11:20         ` NeilBrown
@ 2012-01-13 11:31           ` Paul Walmsley
  2012-01-13 23:09             ` NeilBrown
  2012-01-18  7:13           ` Tomi Valkeinen
  1 sibling, 1 reply; 64+ messages in thread
From: Paul Walmsley @ 2012-01-13 11:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

On Fri, 13 Jan 2012, NeilBrown wrote:

> Also, the HDQ access to the battery 'fuel gauge' is working fine, so
> presumably that gets disturbed by one of the lower power states (3,4,5).
> I guess it is 4,5,6 when CORE goes to RET or OFF.  So we need some way for
> HDQ to say "I'm busy" just like UART can.  omap_hdq_can_sleep() ???
> There must be a cleaner way...

The first thing to try is the HDQ runtime PM conversion series that was 
sent separately.  Maybe let us know if that fixes the problem.

> More experiments:
>  - I enabled off_mode and started seeing state3 happening.  UART and HDQ
>    still fine - as expected.
>  - I set the  /sys/devices/system/cpu/cpu0/cpuidle/state?/usage values
>    all to '10'.
>    Now things start to break;
>    Console is mostly OK, but the first char after 10 seconds of idle is bad.
>    I type 'enter' and get 'C'.  In general the first char typed is mapped to
>    something else in a consistent way:
>       0a -> c3
>       20 -> 90
>       55 -> d5
>       2a -> 95
>    It is a bit like we are missing a start bit, and a stop bit is shifting
>    down into the msb .. sometimes.

I'm not seeing garbling at all on the OMAP3 BeagleBoard here, running v3.2 
with omap2plus_defconfig.  The first character from the console gets lost, 
which again is expected when the CORE enters a low-power state and there's 
no flow control on the serial interface -- but no garbling.

>    HDQ also breaks.
>    That surprised me a bit as the HDQ access is immediately after typing so
>    it should be in the 10-second timeout when the UART keeps CORE powered on.
>    Maybe we need runtime_pm to run omap_hdq_break() again to reinitialise
>    the HDQ port if CORE might have been turned off.
> 
>    Also saw states all the way down to 6.  Cannot tell if this helps current
>    drain as I need HDQ working for that.

If CORE is going off, the HDQ device context will be lost, since the HDQ 
controller will be powered off.  You'll need to add code to either save 
and restore the HDQ register context into the HDQ driver, or to 
reinitialize the HDQ when it is re-enabled.  The latter is what we do for 
I2C, I think.

As far as the DSS goes, your best bet there is to try to work it out with 
Archit and Tomi.

> Thanks for the patch, it's been:
>    Tested-by: NeilBrown <neilb@suse.de>

Thanks.


- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 10:05       ` Paul Walmsley
  2012-01-13 11:20         ` NeilBrown
@ 2012-01-13 11:34         ` Govindraj
  2012-01-13 13:23           ` Paul Walmsley
  2012-01-13 19:21         ` Kevin Hilman
  2 siblings, 1 reply; 64+ messages in thread
From: Govindraj @ 2012-01-13 11:34 UTC (permalink / raw)
  To: Paul Walmsley
  Cc: NeilBrown, Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

On Fri, Jan 13, 2012 at 3:35 PM, Paul Walmsley <paul@pwsan.com> wrote:
> cc Tero, Govindraj
>
> On Thu, 12 Jan 2012, NeilBrown wrote:
>
>> On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
>>
>> I spent some time exploring why cpuidle never drops below state 0 and found
>> out that the code explicitly disables other states when uart is active - for
>> a fairly broad definition of 'active'.
>>
>> I found the "sleep_timeout" setting and set them all to 1 second.  This meant
>> that cpuidle started working, but I got a lot of garbage characters.
>>
>
> ...
>
>> Does this really mean CPU_IDLE cannot be used when you might expect chars
>> from a serial port - e.g. GPS or Bluetooth?
>
> Hmmm.  Looking at mach-omap2/pm34xx.c and mach-omap2/cpuidle34xx.c, it
> indeed prevents even the MPU from entering a low-power state when the UART
> is active, as you write.  That doesn't seem correct.
>
> Please try the following patch.  It works fine for me on v3.2 with
> omap2plus_defconfig on a 35xx BeagleBoard.  No dropped or garbled
> characters with console use.  Not too surprising, since the UART has a
> receive FIFO to withstand interrupt servicing delays.
>
>
> - Paul
>
> From: Paul Walmsley <paul@pwsan.com>
> Date: Fri, 13 Jan 2012 02:10:30 -0700
> Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
>  when the UART is active
>
> For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
> driver prevent the MPU powerdomain from entering low-power modes when
> any UART isn't asleep.  Possibly it is intended to minimize the ARM
> wakeup latency when UART activity arrives, but the UART has a FIFO
> that should handle this for most cases, with no dropped characters.  I
> may be forgetting something important, though.  And CORE/PER low-power
> states are a different matter entirely.
>
> Thanks to NeilBrown <neilb@suse.de> for reporting the problem.
>
> Signed-off-by: Paul Walmsley <paul@pwsan.com>
> Cc: NeilBrown <neilb@suse.de>
> Cc: Joe Woodward <jw@terrafix.co.uk>
> Cc: Tero Kristo <t-kristo@ti.com>
> Cc: Kevin Hilman <khilman@ti.com>
> Cc: Govindraj.R <govindraj.raja@ti.com>
> ---
>  arch/arm/mach-omap2/cpuidle34xx.c |    8 +++-----
>  arch/arm/mach-omap2/pm.h          |    1 -
>  arch/arm/mach-omap2/pm34xx.c      |   10 ----------
>  3 files changed, 3 insertions(+), 16 deletions(-)
>
> diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
> index 942bb4f..4ef32d4 100644
> --- a/arch/arm/mach-omap2/cpuidle34xx.c
> +++ b/arch/arm/mach-omap2/cpuidle34xx.c
> @@ -183,6 +183,9 @@ static int next_valid_state(struct cpuidle_device *dev,
>                        core_deepest_state = PWRDM_POWER_OFF;
>        }
>
> +       if (!omap_uart_can_sleep())
> +               core_deepest_state = PWRDM_POWER_ON;
> +

omap_uart_can_sleep is removed part of uart runtime conversion,
and is no more available on latest mainline, looks like uart runtime
patches got merged after 3.2 tagging.

--
Thanks,
Govindraj.R
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 11:34         ` Govindraj
@ 2012-01-13 13:23           ` Paul Walmsley
  0 siblings, 0 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-13 13:23 UTC (permalink / raw)
  To: Govindraj
  Cc: NeilBrown, Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

[-- Attachment #1: Type: TEXT/PLAIN, Size: 771 bytes --]

On Fri, 13 Jan 2012, Govindraj wrote:

> > diff --git a/arch/arm/mach-omap2/cpuidle34xx.c b/arch/arm/mach-omap2/cpuidle34xx.c
> > index 942bb4f..4ef32d4 100644
> > --- a/arch/arm/mach-omap2/cpuidle34xx.c
> > +++ b/arch/arm/mach-omap2/cpuidle34xx.c
> > @@ -183,6 +183,9 @@ static int next_valid_state(struct cpuidle_device *dev,
> >                        core_deepest_state = PWRDM_POWER_OFF;
> >        }
> >
> > +       if (!omap_uart_can_sleep())
> > +               core_deepest_state = PWRDM_POWER_ON;
> > +
> 
> omap_uart_can_sleep is removed part of uart runtime conversion,
> and is no more available on latest mainline, looks like uart runtime
> patches got merged after 3.2 tagging.

Great, so sounds like this patch is not needed...

- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 10:05       ` Paul Walmsley
  2012-01-13 11:20         ` NeilBrown
  2012-01-13 11:34         ` Govindraj
@ 2012-01-13 19:21         ` Kevin Hilman
  2012-01-13 22:37           ` Kevin Hilman
  2 siblings, 1 reply; 64+ messages in thread
From: Kevin Hilman @ 2012-01-13 19:21 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: NeilBrown, Joe Woodward, t-kristo, govindraj.r, linux-omap

Paul Walmsley <paul@pwsan.com> writes:

> cc Tero, Govindraj
>
> On Thu, 12 Jan 2012, NeilBrown wrote:
>
>> On Wed, 11 Jan 2012 06:43:04 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
>> 
>> I spent some time exploring why cpuidle never drops below state 0 and found
>> out that the code explicitly disables other states when uart is active - for
>> a fairly broad definition of 'active'.
>> 
>> I found the "sleep_timeout" setting and set them all to 1 second.  This meant
>> that cpuidle started working, but I got a lot of garbage characters.
>> 
>
> ...
>
>> Does this really mean CPU_IDLE cannot be used when you might expect chars
>> from a serial port - e.g. GPS or Bluetooth?
>
> Hmmm.  Looking at mach-omap2/pm34xx.c and mach-omap2/cpuidle34xx.c, it 
> indeed prevents even the MPU from entering a low-power state when the UART 
> is active, as you write.  That doesn't seem correct.
>
> Please try the following patch.  It works fine for me on v3.2 with 
> omap2plus_defconfig on a 35xx BeagleBoard.  No dropped or garbled 
> characters with console use.  Not too surprising, since the UART has a 
> receive FIFO to withstand interrupt servicing delays.
>
>
> - Paul
>
> From: Paul Walmsley <paul@pwsan.com>
> Date: Fri, 13 Jan 2012 02:10:30 -0700
> Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
>  when the UART is active
>
> For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
> driver prevent the MPU powerdomain from entering low-power modes when
> any UART isn't asleep.  Possibly it is intended to minimize the ARM
> wakeup latency when UART activity arrives, but the UART has a FIFO
> that should handle this for most cases, with no dropped characters.  I
> may be forgetting something important, though.  And CORE/PER low-power
> states are a different matter entirely.

Just FYI... the UART can_sleep hackery was removed for v3.3 and replaced
by using a PM QoS constraint:

commit 2fd149645eb46d26130d7070c6de037dddf34880
Author: Govindraj.R <govindraj.raja@ti.com>
Date:   Wed Nov 9 17:41:21 2011 +0530

    ARM: OMAP2+: UART: Remove omap_uart_can_sleep and add pm_qos
    
Kevin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13  5:29     ` Tomi Valkeinen
@ 2012-01-13 19:30       ` Kevin Hilman
  2012-01-16 11:11         ` Tomi Valkeinen
  0 siblings, 1 reply; 64+ messages in thread
From: Kevin Hilman @ 2012-01-13 19:30 UTC (permalink / raw)
  To: Tomi Valkeinen; +Cc: Joe Woodward, linux-omap, Archit Taneja

Tomi Valkeinen <tomi.valkeinen@ti.com> writes:

> On Thu, 2012-01-12 at 14:40 -0800, Kevin Hilman wrote:
>> Tomi Valkeinen <tomi.valkeinen@ti.com> writes:
>> 
>> > On Mon, 2012-01-09 at 12:46 +0000, Joe Woodward wrote:
>> >> I'm running on a Gumstix Overo (OMAP3530) with an 24-bit LCD panel connected via the DPI interface (using the generic panel driver).
>> >> 
>> >> Entering standby used to work just fine on 3.0, but on 3.2 I get the following:
>> >
>> > I've been debugging this, but I'm at loss. I added some debug prints,
>> > which I show below, and I also pushed them to
>> > "git://gitorious.org/linux-omap-dss2/linux.git pm-test-prints" if
>> > somebody wants to see exactly what they print.
>> >
>> > Kevin, perhaps you have an idea what could be wrong here. Long version
>> > below, short version: doing system suspend breaks omapdss, if omapdss
>> > uses pm_runtime_put, but works with pm_runtime_put_sync.
>> 
>> Is the pm_runtime_put() done after system suspend has started?  
>> 
>> After system suspend has begun, the workqueue used for async callbacks
>> is frozen, so any runtime PM calls that you want to work during
>> suspend/resume must use the _sync versions.
>
> pm_runtime_put() is called inside omapdss driver's .suspend callback. So
> I guess that means the system suspend has started. However, the logs
> show that the runtime_suspend callback _is_ being called before the
> system suspend is finished, so the workqueue can't be frozen...

...or there are other ways that the runtime_suspend callback is called.

In order to ensure that devices are properly idled for system-wide
suspend, The OMAP PM domain layer will call the drivers
->runtime_suspend callback during "late" suspend (using the PM domain's
_noirq callbacks.)

This is there so that even when runtime PM has been disabled (via
userspace, or pm_runtime_forbid() calls) the driver can still be
properly idled before suspend.

In your case, I suspect you're seeing the driver's ->runtime_suspend
callback called during late suspend, not by the PM workqueue.

I don't understand DSS enough to make sense of the logs you sent, so not
sure how to be of more help.

Kevin




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 19:21         ` Kevin Hilman
@ 2012-01-13 22:37           ` Kevin Hilman
  2012-01-13 23:06             ` Paul Walmsley
  0 siblings, 1 reply; 64+ messages in thread
From: Kevin Hilman @ 2012-01-13 22:37 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: NeilBrown, Joe Woodward, t-kristo, govindraj.r, linux-omap

Kevin Hilman <khilman@ti.com> writes:

[...]

>> From: Paul Walmsley <paul@pwsan.com>
>> Date: Fri, 13 Jan 2012 02:10:30 -0700
>> Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
>>  when the UART is active
>>
>> For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
>> driver prevent the MPU powerdomain from entering low-power modes when
>> any UART isn't asleep.  Possibly it is intended to minimize the ARM
>> wakeup latency when UART activity arrives, but the UART has a FIFO
>> that should handle this for most cases, with no dropped characters.  I
>> may be forgetting something important, though.  And CORE/PER low-power
>> states are a different matter entirely.
>
> Just FYI... the UART can_sleep hackery was removed for v3.3 and replaced
> by using a PM QoS constraint:

However, looking closer I see now that the constraint being used in the
mailine driver uses the CPU_DMA_LATENCY QoS constraint, which is preventing 
deeper C states as well just like the current code (before your patch),
so an similar fix will still be needed for mainline.

Kevin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 22:37           ` Kevin Hilman
@ 2012-01-13 23:06             ` Paul Walmsley
  2012-01-13 23:34               ` Paul Walmsley
  2012-01-13 23:39               ` Paul Walmsley
  0 siblings, 2 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-13 23:06 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: NeilBrown, Joe Woodward, t-kristo, govindraj.raja, linux-omap

On Fri, 13 Jan 2012, Kevin Hilman wrote:

> Kevin Hilman <khilman@ti.com> writes:
> 
> [...]
> 
> >> From: Paul Walmsley <paul@pwsan.com>
> >> Date: Fri, 13 Jan 2012 02:10:30 -0700
> >> Subject: [PATCH] ARM: OMAP3: PM: allow MPU to enter low-power states even
> >>  when the UART is active
> >>
> >> For some reason, both the existing OMAP3 PM code and the OMAP3 CPUIdle
> >> driver prevent the MPU powerdomain from entering low-power modes when
> >> any UART isn't asleep.  Possibly it is intended to minimize the ARM
> >> wakeup latency when UART activity arrives, but the UART has a FIFO
> >> that should handle this for most cases, with no dropped characters.  I
> >> may be forgetting something important, though.  And CORE/PER low-power
> >> states are a different matter entirely.
> >
> > Just FYI... the UART can_sleep hackery was removed for v3.3 and replaced
> > by using a PM QoS constraint:
> 
> However, looking closer I see now that the constraint being used in the
> mailine driver uses the CPU_DMA_LATENCY QoS constraint, which is preventing 
> deeper C states as well just like the current code (before your patch),
> so an similar fix will still be needed for mainline.

Hmm.  This is the following code in serial_omap_set_termios() ? 

	/* calculate wakeup latency constraint */
	up->calc_latency = (1000000 * up->port.fifosize) /
				(1000 * baud / 8);
	up->latency = up->calc_latency;
	schedule_work(&up->qos_work);

Let's work out this formula for 115.2kbps:

(1000000 * 16) / (1000 * 115200 / 8)

= 16000000 / 14400000

= 1.11...  (presumably the unit here is milliseconds)

= 1 (since up->calc_latency is a u32)

Then up->calc_latency is passed to pm_qos_update_request().  According to 
Documentation/power/pm_qos_interface.txt, it is expecting microseconds.
So that's one problem.  To fix that it should be 

        up->calc_latency = (1000000 * up->port.fifosize) /
                                (baud / 8);

With that change, it should be possible to enter C3, with the timings that 
are in the mainline cpuidle34xx.c.


- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 11:31           ` Paul Walmsley
@ 2012-01-13 23:09             ` NeilBrown
  2012-01-13 23:35               ` Paul Walmsley
  2012-01-17 21:24               ` NeilBrown
  0 siblings, 2 replies; 64+ messages in thread
From: NeilBrown @ 2012-01-13 23:09 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

[-- Attachment #1: Type: text/plain, Size: 2348 bytes --]

On Fri, 13 Jan 2012 04:31:37 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:

> On Fri, 13 Jan 2012, NeilBrown wrote:
> 
> > Also, the HDQ access to the battery 'fuel gauge' is working fine, so
> > presumably that gets disturbed by one of the lower power states (3,4,5).
> > I guess it is 4,5,6 when CORE goes to RET or OFF.  So we need some way for
> > HDQ to say "I'm busy" just like UART can.  omap_hdq_can_sleep() ???
> > There must be a cleaner way...
> 
> The first thing to try is the HDQ runtime PM conversion series that was 
> sent separately.  Maybe let us know if that fixes the problem.

Not completely - but I'm still exploring.

> 
> > More experiments:
> >  - I enabled off_mode and started seeing state3 happening.  UART and HDQ
> >    still fine - as expected.
> >  - I set the  /sys/devices/system/cpu/cpu0/cpuidle/state?/usage values
> >    all to '10'.
> >    Now things start to break;
> >    Console is mostly OK, but the first char after 10 seconds of idle is bad.
> >    I type 'enter' and get 'C'.  In general the first char typed is mapped to 
> >    something else in a consistent way:
> >       0a -> c3
> >       20 -> 90
> >       55 -> d5
> >       2a -> 95
> >    It is a bit like we are missing a start bit, and a stop bit is shifting
> >    down into the msb .. sometimes.
> 
> I'm not seeing garbling at all on the OMAP3 BeagleBoard here, running v3.2 
> with omap2plus_defconfig.  The first character from the console gets lost, 
> which again is expected when the CORE enters a low-power state and there's 
> no flow control on the serial interface -- but no garbling.

What baud rate?

If I use 57600 or lower I see no problem at all.  The first char I type gets
through fine.
If I try higher rates (115200 is the default, I've tried 230400 and 460800)
I get consistent corruption, though different corruptions at different rates.

I've tried enabling hardware handshaking but it makes no difference -
however it is possibly there is something wrong with the wiring so I'll check
that before pursuing that direction much further.

Presumably with working hardware handshaking we should be able to avoid any
loss or corruption??

(I'm not sure if I said - but the board I am working on is the GTA04 Openmoko
board: www.gta04.org).

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 23:06             ` Paul Walmsley
@ 2012-01-13 23:34               ` Paul Walmsley
  2012-01-14  1:17                 ` NeilBrown
  2012-01-13 23:39               ` Paul Walmsley
  1 sibling, 1 reply; 64+ messages in thread
From: Paul Walmsley @ 2012-01-13 23:34 UTC (permalink / raw)
  To: govindraj.raja, Kevin Hilman
  Cc: NeilBrown, Joe Woodward, t-kristo, linux-omap

Hi

just to follow up on this, I think there are some other bugs in this 
formula... maybe someone might want to tackle them.

On Fri, 13 Jan 2012, Paul Walmsley wrote:

>         up->calc_latency = (1000000 * up->port.fifosize) /
>                                 (baud / 8);

One problem is that UARTs generally don't transmit only 8 bits per byte.  
In the most common framing arrangement, 8N1, there are ten bits per byte.  
See for example

    https://en.wikipedia.org/wiki/8-N-1

So that part of this formula will contribute to an underestimation of the 
time it takes to fill the FIFO.  Using 10 data bits per byte adds another 
277 microseconds.

So ideally this formula should change the "8" depending on the number of 
data bits, whether there is a parity bit, and on the number of stop bits.  
The real formula to determine the number of bits per byte should be 
something like:

    bits_per_byte = number of data bits + 1 bit if parity is used + number 
                    of stop bits

Those quantities should all be easily available in 
serial_omap_set_termios().  So then the formula would be:

         up->calc_latency = (1000000 * up->port.fifosize) /
                                 (baud / bits_per_byte);


...

Another issue is that the formula is missing the receive FIFO trigger 
level watermark.  That might contribute to an overestimation of the amount 
of time available for the MPU to service the FIFO.  So rather than using 
simply up->port.fifosize, it should use (up->port.fifosize - 
up.port->rxtrigthres).  up.port->rxtrigthres should be the appropriate 
number derived from the 34xx TRM vZR section 17.4.2.1.2 "Receive FIFO 
Trigger" section - it comes from the UART's TCR register.

So then it would be

         up->calc_latency = 
               (1000000 * (up->port.fifosize - up->port.rxtrigthres)) /
                                 (baud / bits_per_byte);

It seems however that omap-serial.c just hardcodes the rxtrigthres to 0, 
so maybe this is not such a significant issue:

	serial_out(up, UART_TI752_TCR, OMAP_UART_TCR_TRIG);

Ideally the two FIFO thresholds here would be configurable by the UART 
user - especially given that the OMAP UART can be quite deep - 60 bytes.

...

One other issue.  If flow control is enabled, then IMHO, the UART probably 
shouldn't set a wakeup latency constraint at all.  That's because there 
would be no risk of data loss.  Users of the UART might have some 
performance constraints, but those should be set by the UART-using code or 
some system-wide wakeup latency value.

...

As an aside, I wonder how we handle the UART flow control lines during 
chip off-mode?  I assume those would need an appropriate off-mode mux 
value to indicate that the OMAP UART isn't ready; I wonder if we do that?

  
- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 23:09             ` NeilBrown
@ 2012-01-13 23:35               ` Paul Walmsley
  2012-01-17 21:24               ` NeilBrown
  1 sibling, 0 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-13 23:35 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

On Sat, 14 Jan 2012, NeilBrown wrote:

> On Fri, 13 Jan 2012 04:31:37 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> 
> > I'm not seeing garbling at all on the OMAP3 BeagleBoard here, running v3.2 
> > with omap2plus_defconfig.  The first character from the console gets lost, 
> > which again is expected when the CORE enters a low-power state and there's 
> > no flow control on the serial interface -- but no garbling.
> 
> What baud rate?

115200 bps, no flow control.

> Presumably with working hardware handshaking we should be able to avoid any
> loss or corruption??

Yes.  If you're using off-mode, you'll need to ensure that the I/O ring is 
set to wake the system when it sees the appropriate signal level on the 
incoming flow control line.

> (I'm not sure if I said - but the board I am working on is the GTA04 Openmoko
> board: www.gta04.org).

Did not know that.  Exciting!  Maybe you can send some to the OMAP 
maintainers ;-)
 

- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 23:06             ` Paul Walmsley
  2012-01-13 23:34               ` Paul Walmsley
@ 2012-01-13 23:39               ` Paul Walmsley
  1 sibling, 0 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-13 23:39 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: NeilBrown, Joe Woodward, t-kristo, govindraj.raja, linux-omap

On Fri, 13 Jan 2012, Paul Walmsley wrote:

> Let's work out this formula for 115.2kbps:
> 
> (1000000 * 16) / (1000 * 115200 / 8)
> 
> = 16000000 / 14400000
> 
> = 1.11...  (presumably the unit here is milliseconds)
> 
> = 1 (since up->calc_latency is a u32)
> 
> Then up->calc_latency is passed to pm_qos_update_request().  According to 
> Documentation/power/pm_qos_interface.txt, it is expecting microseconds.
> So that's one problem.  To fix that it should be 
> 
>         up->calc_latency = (1000000 * up->port.fifosize) /
>                                 (baud / 8);
> 
> With that change, it should be possible to enter C3, with the timings that 
> are in the mainline cpuidle34xx.c.

Actually, just noticed that the fifosize is 60 bytes.  So that's even 
better.  With the fixed formula above, it should be ~ 5.2 ms wakeup 
latency - enough to enter C4 in the CPUIdle data in mainline -- at least 
in theory...


- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 23:34               ` Paul Walmsley
@ 2012-01-14  1:17                 ` NeilBrown
  2012-01-14  1:28                   ` Paul Walmsley
  0 siblings, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-14  1:17 UTC (permalink / raw)
  To: Paul Walmsley
  Cc: govindraj.raja, Kevin Hilman, Joe Woodward, t-kristo, linux-omap

[-- Attachment #1: Type: text/plain, Size: 512 bytes --]

On Fri, 13 Jan 2012 16:34:06 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:


> 
> So ideally this formula should change the "8" depending on the number of 
> data bits, whether there is a parity bit, and on the number of stop bits.  
> The real formula to determine the number of bits per byte should be 
> something like:
> 
>     bits_per_byte = number of data bits + 1 bit if parity is used + number 
>                     of stop bits
                     + 1 for the start bit.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-14  1:17                 ` NeilBrown
@ 2012-01-14  1:28                   ` Paul Walmsley
  0 siblings, 0 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-14  1:28 UTC (permalink / raw)
  To: NeilBrown
  Cc: govindraj.raja, Kevin Hilman, Joe Woodward, t-kristo, linux-omap

On Sat, 14 Jan 2012, NeilBrown wrote:

> On Fri, 13 Jan 2012 16:34:06 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> 
> 
> > 
> > So ideally this formula should change the "8" depending on the number of 
> > data bits, whether there is a parity bit, and on the number of stop bits.  
> > The real formula to determine the number of bits per byte should be 
> > something like:
> > 
> >     bits_per_byte = number of data bits + 1 bit if parity is used + number 
> >                     of stop bits
>                      + 1 for the start bit.

Indeed...


- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 19:30       ` Kevin Hilman
@ 2012-01-16 11:11         ` Tomi Valkeinen
  2012-01-19 19:24           ` Kevin Hilman
  0 siblings, 1 reply; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-16 11:11 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: Joe Woodward, linux-omap, Archit Taneja

[-- Attachment #1: Type: text/plain, Size: 2303 bytes --]

On Fri, 2012-01-13 at 11:30 -0800, Kevin Hilman wrote:
> Tomi Valkeinen <tomi.valkeinen@ti.com> writes:

> > pm_runtime_put() is called inside omapdss driver's .suspend callback. So
> > I guess that means the system suspend has started. However, the logs
> > show that the runtime_suspend callback _is_ being called before the
> > system suspend is finished, so the workqueue can't be frozen...
> 
> ...or there are other ways that the runtime_suspend callback is called.
> 
> In order to ensure that devices are properly idled for system-wide
> suspend, The OMAP PM domain layer will call the drivers
> ->runtime_suspend callback during "late" suspend (using the PM domain's
> _noirq callbacks.)
> 
> This is there so that even when runtime PM has been disabled (via
> userspace, or pm_runtime_forbid() calls) the driver can still be
> properly idled before suspend.
> 
> In your case, I suspect you're seeing the driver's ->runtime_suspend
> callback called during late suspend, not by the PM workqueue.

Ok, that explains the invocation of the callbacks.

> I don't understand DSS enough to make sense of the logs you sent, so not
> sure how to be of more help.

Well, I think I can summarize the situation:

Normally, when the user wants to turn a display off, the panel drivers
call DSS functions to disable the corresponding output, leading to
pm_runtime_put() calls.

In system suspend case, the suspend callback initiates the turning off
of the displays, which again leads to pm_runtime_put() calls.

Now, you already said using pm_runtime_put_sync version is the correct
way when suspending. But to use that I need to either always use
pm_runtime_put_sync, or add an extra boolean which marks that we're
suspending, and pass that around, or make it a DSS global variable.

Both of those options sound a bit ugly to me. So my two questions are,

1) Is there something funny with the DSS architecture to have a problem
like this? I tried to grep around, but I didn't see any other driver
doing something like that (but I could've missed it).

2) why can't the PM framework handle this, by directing the
pm_runtime_put calls to the _sync version when suspending. Somehow it
feels strange to me that the driver needs to care about that.

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 23:09             ` NeilBrown
  2012-01-13 23:35               ` Paul Walmsley
@ 2012-01-17 21:24               ` NeilBrown
  2012-01-22  0:07                 ` Paul Walmsley
  1 sibling, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-17 21:24 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

[-- Attachment #1: Type: text/plain, Size: 3400 bytes --]

On Sat, 14 Jan 2012 10:09:15 +1100 NeilBrown <neilb@suse.de> wrote:

> On Fri, 13 Jan 2012 04:31:37 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> 
> > On Fri, 13 Jan 2012, NeilBrown wrote:
> > 
> > > Also, the HDQ access to the battery 'fuel gauge' is working fine, so
> > > presumably that gets disturbed by one of the lower power states (3,4,5).
> > > I guess it is 4,5,6 when CORE goes to RET or OFF.  So we need some way for
> > > HDQ to say "I'm busy" just like UART can.  omap_hdq_can_sleep() ???
> > > There must be a cleaner way...
> > 
> > The first thing to try is the HDQ runtime PM conversion series that was 
> > sent separately.  Maybe let us know if that fixes the problem.
> 
> Not completely - but I'm still exploring.
> 
>

I've finished exploring (for now) and am seriously stumped.  Maybe you can
help... 

I modified your patch so that hdq uses autosuspend - because it seemed like a
good idea and ends up providing slightly more informative failure modes.

The normal behaviour of the bq27000 driver is to request lots of values in
quick succession.  So using autosuspend (with 100ms timeout) means that we
have only one pm_resume and one pm_suspend for each batch of requests which
at least reduces the noise.

I have added an omap_hdq_can_sleep() function which only succeeds if the hdq
device is run-time suspended, and call it in the same place that
omap_uart_can_sleep is called, to ensure the core never goes to a lower state
while he hdq is active.  I've double-checked that this really works.

HDQ all works find normally, but then normally CORE stays ON because the UART
keeps it on.

If I tell the UARTs they can sleep (by setting sleep_timeout) and then access
the bq27000 via HDQ it doesn't work properly.

The normal sequence is that it resets the HDQ and sends a 'BREAK' signal
(possibly not really necessary) for which we get a TIMEOUT interrupt.
Then 
 write address - get TX interrupt - wait a bit - get RX interrupt - read byte.

We normally get the TIMEOUT interrupt and often get the first TX and then RX
interrupts but then the device goes silent.  No more interrupts come in, and
the status register (which shows interrupt source) reads as zero.

Occasionally I have seen maybe half a dozen successful reads before it goes
silent, but usually it is zero or 1.

It is as though something is timing out and going to sleep.  A clock
auto-sleeping?  Seems unlikely.

If I set all the UART sleep_timeout values to 0 to keep CORE always ON, the
problem doesn't fix itself.  Whatever got turned off stays off.

Once it starts failing, it never works again until a reboot.

Could we be caching something in memory which we think is on, but due to CORE
going into RET (or maybe OFF), the thing is now off and we never bother to
reset it because the cached value is 'on' ??  I'm wondering about
_sysc_cache, but I haven't looked very deeply into it.

Any other ideas?

Oh - another thing.
Sometimes during early boot I get:

[    0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
[    0.176879] omap_hwmod: hdq: softreset failed (waited 10000 usec)

At the same time the UART seem to hiccup and my USB-Serial port loses track
and doesn't see any more characters until I close and re-open.

Could that be related?  Any idea what it means?


Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-13 11:20         ` NeilBrown
  2012-01-13 11:31           ` Paul Walmsley
@ 2012-01-18  7:13           ` Tomi Valkeinen
  2012-01-18 11:15             ` NeilBrown
  1 sibling, 1 reply; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-18  7:13 UTC (permalink / raw)
  To: NeilBrown
  Cc: Paul Walmsley, Joe Woodward, khilman, t-kristo, govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 892 bytes --]

On Fri, 2012-01-13 at 22:20 +1100, NeilBrown wrote:
> Having CPUIDLE makes the DSS2 problem worse: lots of 
> 
> [   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd,
> restarting the output with video overlays disabled
> 
> messages whenever the CPU isn't busy.

I'm not sure if it is the case here, but DSS has restrictions about the
max DSS clocks on different OPPs. For example, on OMAP4430 LCD clock
maximum is 186MHz at OPP100, and 93MHz at OPP50. So it's a quite big
drop, causing problems with all but the rather small displays.

And the DSS driver doesn't have any support to handle this at the
moment, as there isn't support in the PM framework to do this. I think
the only way to handle this at the moment is for the DSS driver to set
an arbitrarily high constraint on, say, mem throughput, and hope that it
keeps the OMAP in the required OPP.

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-18  7:13           ` Tomi Valkeinen
@ 2012-01-18 11:15             ` NeilBrown
  2012-01-18 11:42               ` Tomi Valkeinen
  0 siblings, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-18 11:15 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: Paul Walmsley, Joe Woodward, khilman, t-kristo, govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 1299 bytes --]

On Wed, 18 Jan 2012 09:13:59 +0200 Tomi Valkeinen <tomi.valkeinen@ti.com>
wrote:

> On Fri, 2012-01-13 at 22:20 +1100, NeilBrown wrote:
> > Having CPUIDLE makes the DSS2 problem worse: lots of 
> > 
> > [   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd,
> > restarting the output with video overlays disabled
> > 
> > messages whenever the CPU isn't busy.
> 
> I'm not sure if it is the case here, but DSS has restrictions about the
> max DSS clocks on different OPPs. For example, on OMAP4430 LCD clock
> maximum is 186MHz at OPP100, and 93MHz at OPP50. So it's a quite big
> drop, causing problems with all but the rather small displays.
> 
> And the DSS driver doesn't have any support to handle this at the
> moment, as there isn't support in the PM framework to do this. I think
> the only way to handle this at the moment is for the DSS driver to set
> an arbitrarily high constraint on, say, mem throughput, and hope that it
> keeps the OMAP in the required OPP.
> 
>  Tomi
> 

This LCD panel on this device sets:
   .pixel_clock	= 22000,
in the "struct omap_video_timings" so I'm guessing that is 22MHz?

I'll try building without SMARTREFLEX and see if that makes a difference -
presumably that should keep it at OPP100 (??).

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-18 11:15             ` NeilBrown
@ 2012-01-18 11:42               ` Tomi Valkeinen
  2012-01-18 20:30                 ` NeilBrown
  0 siblings, 1 reply; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-18 11:42 UTC (permalink / raw)
  To: NeilBrown
  Cc: Paul Walmsley, Joe Woodward, khilman, t-kristo, govindraj.r, linux-omap

On Wed, 2012-01-18 at 22:15 +1100, NeilBrown wrote:
> On Wed, 18 Jan 2012 09:13:59 +0200 Tomi Valkeinen <tomi.valkeinen@ti.com>
> wrote:
> 
> > On Fri, 2012-01-13 at 22:20 +1100, NeilBrown wrote:
> > > Having CPUIDLE makes the DSS2 problem worse: lots of 
> > > 
> > > [   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd,
> > > restarting the output with video overlays disabled
> > > 
> > > messages whenever the CPU isn't busy.
> > 
> > I'm not sure if it is the case here, but DSS has restrictions about the
> > max DSS clocks on different OPPs. For example, on OMAP4430 LCD clock
> > maximum is 186MHz at OPP100, and 93MHz at OPP50. So it's a quite big
> > drop, causing problems with all but the rather small displays.
> > 
> > And the DSS driver doesn't have any support to handle this at the
> > moment, as there isn't support in the PM framework to do this. I think
> > the only way to handle this at the moment is for the DSS driver to set
> > an arbitrarily high constraint on, say, mem throughput, and hope that it
> > keeps the OMAP in the required OPP.
> > 
> >  Tomi
> > 
> 
> This LCD panel on this device sets:
>    .pixel_clock	= 22000,
> in the "struct omap_video_timings" so I'm guessing that is 22MHz?

No, that's the pixel clock. There are probably limitations on the pix
clock also, but usually the problem is the functional clocks, which need
to be n x pck, where n depends on the needs for scaling.

 Tomi



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-18 11:42               ` Tomi Valkeinen
@ 2012-01-18 20:30                 ` NeilBrown
  2012-01-19 10:17                   ` Joe Woodward
  0 siblings, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-18 20:30 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: Paul Walmsley, Joe Woodward, khilman, t-kristo, govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 2603 bytes --]

On Wed, 18 Jan 2012 13:42:20 +0200 Tomi Valkeinen <tomi.valkeinen@ti.com>
wrote:

> On Wed, 2012-01-18 at 22:15 +1100, NeilBrown wrote:
> > On Wed, 18 Jan 2012 09:13:59 +0200 Tomi Valkeinen <tomi.valkeinen@ti.com>
> > wrote:
> > 
> > > On Fri, 2012-01-13 at 22:20 +1100, NeilBrown wrote:
> > > > Having CPUIDLE makes the DSS2 problem worse: lots of 
> > > > 
> > > > [   21.085113] omapdss DISPC error: SYNC_LOST on channel lcd,
> > > > restarting the output with video overlays disabled
> > > > 
> > > > messages whenever the CPU isn't busy.
> > > 
> > > I'm not sure if it is the case here, but DSS has restrictions about the
> > > max DSS clocks on different OPPs. For example, on OMAP4430 LCD clock
> > > maximum is 186MHz at OPP100, and 93MHz at OPP50. So it's a quite big
> > > drop, causing problems with all but the rather small displays.
> > > 
> > > And the DSS driver doesn't have any support to handle this at the
> > > moment, as there isn't support in the PM framework to do this. I think
> > > the only way to handle this at the moment is for the DSS driver to set
> > > an arbitrarily high constraint on, say, mem throughput, and hope that it
> > > keeps the OMAP in the required OPP.
> > > 
> > >  Tomi
> > > 
> > 
> > This LCD panel on this device sets:
> >    .pixel_clock	= 22000,
> > in the "struct omap_video_timings" so I'm guessing that is 22MHz?
> 
> No, that's the pixel clock. There are probably limitations on the pix
> clock also, but usually the problem is the functional clocks, which need
> to be n x pck, where n depends on the needs for scaling.

Ahh..

   cat /sys/kernel/debug/omapdss/clk

is below and reports 66461538 for fck, so 66MHz?  Still safe for OPP50.

And disabling SMART REFLEX had no obvious effect.

If you can think of anything else I could try to explore to narrow down 
the source of this, I am very happy to test or examine anything you suggest.

Thanks,
NeilBrown



- DSS -
dpll4_ck 864000000
DSS_FCK (DSS1_ALWON_FCLK) = 864000000 / 13  = 66461538
- DISPC -
dispc fclk source = DSS_FCK (DSS1_ALWON_FCLK)
fck		66461538        
- LCD1 -
lcd1_clk source = DSS_FCK (DSS1_ALWON_FCLK)
lck		66461538        lck div	1
pck		22153846        pck div	3
- DSI1 PLL -
dsi pll source = pclkfree
Fint		0               regn 0
CLKIN4DDR	0               regm 0
DSS_FCK (DSS1_ALWON_FCLK)	0               regm_dispc 0	(off)
DSS_FCK (DSS1_ALWON_FCLK)	0               regm_dsi 0	(off)
- DSI1 -
dsi fclk source = DSS_FCK (DSS1_ALWON_FCLK)
DSI_FCLK	66461538
DDR_CLK		0
TxByteClkHS	0
LP_CLK		0



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-18 20:30                 ` NeilBrown
@ 2012-01-19 10:17                   ` Joe Woodward
  2012-01-19 10:40                     ` Tomi Valkeinen
  0 siblings, 1 reply; 64+ messages in thread
From: Joe Woodward @ 2012-01-19 10:17 UTC (permalink / raw)
  To: NeilBrown, Tomi Valkeinen
  Cc: Paul Walmsley, khilman, t-kristo, govindraj.r, linux-omap

...snip...
> 
>    cat /sys/kernel/debug/omapdss/clk
> 
> is below and reports 66461538 for fck, so 66MHz?  Still safe for OPP50.
> 
> And disabling SMART REFLEX had no obvious effect.
> 
> If you can think of anything else I could try to explore to narrow down
> the source of this, I am very happy to test or examine anything you
> suggest.
> 
> Thanks,
> NeilBrown
> 

I'm not sure if this will give you any pointers but i'll tell you what I'm fairly certain I'm seeing...

If I run the stock 3.2 with the changes to dss.c and dispc.c to make the pm_runtime_put()'s in to pm_runtime_put_sync()'s then the DSS warnings when 
suspending do indeed go away.

I have two wake sources, one being the UART the console is on and the other being a GPIO button.

I've tested the following situations:
 - If I sleep using the console and wake by typing a character then I *never* get a SYNC_LOST error.
 - If I sleep by pressing a button on the display (which does a system ("echo mem > /sys/power/state") and then wake by typing a character in to the console then 
I *never* get a SYNC_LOST error.
 - If I sleep by pressing a button on the display (which does a system ("echo mem > /sys/power/state") and then wake by pressing a button attached to the GPIO 
then I get a sequence of SYNC_LOST errors which continue at a rate of about one every 0.5 seconds until a type a character in to the console and then everything 
settles down and starts working again.

So, at least from what I'm seeing, the SYNC_LOST errors seem related somehow to the UART, if that is indeed possible?

Cheers,
Joe



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 10:17                   ` Joe Woodward
@ 2012-01-19 10:40                     ` Tomi Valkeinen
  2012-01-19 11:29                       ` Joe Woodward
  0 siblings, 1 reply; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-19 10:40 UTC (permalink / raw)
  To: Joe Woodward
  Cc: NeilBrown, Paul Walmsley, khilman, t-kristo, govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 2318 bytes --]

On Thu, 2012-01-19 at 10:17 +0000, Joe Woodward wrote:
> ...snip...
> > 
> >    cat /sys/kernel/debug/omapdss/clk
> > 
> > is below and reports 66461538 for fck, so 66MHz?  Still safe for OPP50.
> > 
> > And disabling SMART REFLEX had no obvious effect.
> > 
> > If you can think of anything else I could try to explore to narrow down
> > the source of this, I am very happy to test or examine anything you
> > suggest.
> > 
> > Thanks,
> > NeilBrown
> > 
> 
> I'm not sure if this will give you any pointers but i'll tell you what I'm fairly certain I'm seeing...
> 
> If I run the stock 3.2 with the changes to dss.c and dispc.c to make the pm_runtime_put()'s in to pm_runtime_put_sync()'s then the DSS warnings when 
> suspending do indeed go away.
> 
> I have two wake sources, one being the UART the console is on and the other being a GPIO button.
> 
> I've tested the following situations:
>  - If I sleep using the console and wake by typing a character then I *never* get a SYNC_LOST error.
>  - If I sleep by pressing a button on the display (which does a system ("echo mem > /sys/power/state") and then wake by typing a character in to the console then 
> I *never* get a SYNC_LOST error.
>  - If I sleep by pressing a button on the display (which does a system ("echo mem > /sys/power/state") and then wake by pressing a button attached to the GPIO 
> then I get a sequence of SYNC_LOST errors which continue at a rate of about one every 0.5 seconds until a type a character in to the console and then everything 
> settles down and starts working again.
> 
> So, at least from what I'm seeing, the SYNC_LOST errors seem related somehow to the UART, if that is indeed possible?

Well, I don't see how UART could directly affect DSS. The thing that
comes to my mind is that typing a char in the console causes a change in
the power management, which then "fixes" the DSS. But then again, I'd
expect the console to go into sleep/idle state after a short while,
which should again cause sync_losts. But that's not happening...

Can you show the dss clocks you use (cat debugfs/omapdss/clk)?

It's been a while since I did anything with PM, so I don't remember how
and if it can be done, but a simple test would be to lock the OMAP to
always use OPP100.

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 10:40                     ` Tomi Valkeinen
@ 2012-01-19 11:29                       ` Joe Woodward
  2012-01-19 11:36                         ` Tomi Valkeinen
  0 siblings, 1 reply; 64+ messages in thread
From: Joe Woodward @ 2012-01-19 11:29 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: NeilBrown, Paul Walmsley, khilman, t-kristo, govindraj.r, linux-omap

... (apologies for awful formatting of the previous mail) ...
 
> Well, I don't see how UART could directly affect DSS. The thing that
> comes to my mind is that typing a char in the console causes a change
> in
> the power management, which then "fixes" the DSS. But then again, I'd
> expect the console to go into sleep/idle state after a short while,
> which should again cause sync_losts. But that's not happening...
> 
> Can you show the dss clocks you use (cat debugfs/omapdss/clk)?
> 
> It's been a while since I did anything with PM, so I don't remember how
> and if it can be done, but a simple test would be to lock the OMAP to
> always use OPP100.
> 
>  Tomi
> 

Right,

I'm running with both CPUIDLE and CPUFREQ disabled at present so I assume that means I should always be in OPP100?

The panel is 800x480 (I've added a configuration to the generic-panel with the correct panel parameters), with just 1 FB device (/dev/fb0).

Clock dump is shown below:

# pwd
/debug/omapdss
# cat clk
[   53.146057] omapdss DSS: dss_runtime_get
[   53.151000] omapdss DSS: dss_runtime_put
[   53.155151] omapdss DISPC: dispc_runtime_get
[   53.160430] omapdss DISPC: dispc_runtime_put
- DSS -
dpll4_ck 432000000
DSS_FCK (DSS1_ALWON_FCLK) = 432000000 / 12 * 2 = 72000000
- DISPC -
dispc fclk source = DSS_FCK (DSS1_ALWON_FCLK)
fck             72000000
- LCD1 -
lcd1_clk source = DSS_FCK (DSS1_ALWON_FCLK)
lck             72000000        lck div 1
pck             36000000        pck div 2


The SYNC_LOST errors sometimes don't happen, and sometimes fix themselves. It does seem that whenever a character is typed in to the UART the SYNC_LOST errors stop immediately.

This is a question more for Kevin, but is there some debug/tracing I can enable within the PM framework to see a bit more of whats going on?

I've added some more traces below...

Trace with no errors:
[   91.806427] PM: Syncing filesystems ... done.
[   91.811370] PM: Preparing system for mem sleep
[   91.816986] Freezing user space processes ... (elapsed 0.02 seconds) done.
[   91.846221] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
[   91.877441] PM: Entering mem sleep
[   91.881317] Suspending console(s) (use no_console_suspend to debug)
[   92.001068] omapdss CORE: suspend 2
[   92.184600] omapdss DISPC: dispc_runtime_put
[   92.184631] omapdss DISPC: dispc_save_context
[   92.184692] omapdss DISPC: context saved, ctx_loss_count 0
[   92.184722] omapdss DSS: dss_runtime_put
[   92.184753] omapdss DSS: dss_runtime_put
[   92.184783] omapdss DSS: dss_save_context
[   92.184783] omapdss DSS: context saved
[   92.185485] PM: suspend of devices complete after 297.088 msecs
[   92.186401] PM: late suspend of devices complete after 0.885 msecs
[   94.111755] Successfully put all powerdomains to target state
[   94.112518] PM: early resume of devices complete after 0.549 msecs
[   94.113555] omapdss CORE: resume
[   94.114135] omapdss DSS: dss_runtime_get
[   94.114166] omapdss DSS: dss_restore_context
[   94.114196] omapdss DSS: context restored
[   94.114196] omapdss DISPC: dispc_runtime_get
[   94.114227] omapdss DSS: dss_runtime_get
[   94.114257] omapdss DISPC: dispc_restore_context
[   94.114288] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
[   94.114318] omapdss DSS: dispc clock info found from cache.
[   94.114318] omapdss DSS: dpll4_m4 = 432000000
[   94.114349] omapdss DSS: fck = 72000000 (12)
[   94.114379] omapdss DISPC: lck = 72000000 (1)
[   94.114379] omapdss DISPC: pck = 36000000 (2)
[   94.114410] omapdss DISPC: channel 0 xres 800 yres 480
[   94.114440] omapdss DISPC: pck 36000
[   94.114440] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
[   94.114471] omapdss DISPC: hsync 33866Hz, vsync 64Hz
[   94.698455] PM: resume of devices complete after 585.082 msecs
[   94.847900] PM: Finishing wakeup.
[   94.851409] Restarting tasks ... done.

Trace with SYNC_LOST errors:
[  106.635437] PM: Syncing filesystems ... done.
[  106.640563] PM: Preparing system for mem sleep
[  106.646118] Freezing user space processes ... (elapsed 0.02 seconds) done.
[  106.677703] Freezing remaining freezable tasks ... (elapsed 0.02 seconds) done.
[  106.708923] PM: Entering mem sleep
[  106.712768] Suspending console(s) (use no_console_suspend to debug)
[  106.832672] omapdss CORE: suspend 2
[  107.011260] omapdss DISPC: dispc_runtime_put
[  107.011291] omapdss DISPC: dispc_save_context
[  107.011352] omapdss DISPC: context saved, ctx_loss_count 0
[  107.011383] omapdss DSS: dss_runtime_put
[  107.011413] omapdss DSS: dss_runtime_put
[  107.011444] omapdss DSS: dss_save_context
[  107.011444] omapdss DSS: context saved
[  107.012145] PM: suspend of devices complete after 292.266 msecs
[  107.013061] PM: late suspend of devices complete after 0.885 msecs
[  107.716400] Successfully put all powerdomains to target state
[  107.717102] PM: early resume of devices complete after 0.488 msecs
[  107.718139] omapdss CORE: resume
[  107.718780] omapdss DSS: dss_runtime_get
[  107.718811] omapdss DSS: dss_restore_context
[  107.718841] omapdss DSS: context restored
[  107.718872] omapdss DISPC: dispc_runtime_get
[  107.718872] omapdss DSS: dss_runtime_get
[  107.718902] omapdss DISPC: dispc_restore_context
[  107.718933] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
[  107.718963] omapdss DSS: dispc clock info found from cache.
[  107.718963] omapdss DSS: dpll4_m4 = 432000000
[  107.718994] omapdss DSS: fck = 72000000 (12)
[  107.719024] omapdss DISPC: lck = 72000000 (1)
[  107.719024] omapdss DISPC: pck = 36000000 (2)
[  107.719055] omapdss DISPC: channel 0 xres 800 yres 480
[  107.719085] omapdss DISPC: pck 36000
[  107.719085] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
[  107.719116] omapdss DISPC: hsync 33866Hz, vsync 64Hz
[  108.307922] PM: resume of devices complete after 589.996 msecs
[  108.457397] PM: Finishing wakeup.
[  108.460876] Restarting tasks ... done.
[  108.597045] DISPC IRQ: 0x40a2: SYNC_LOST
[  108.601470] omapdss DISPC: dispc_runtime_get
[  108.606872] omapdss DISPC error: SYNC_LOST on channel lcd, restarting the output with video overlays disabled
[  108.802520] omapdss DISPC: dispc_runtime_put
[  108.807037] omapdss DSS: dss_runtime_put
[  108.812103] omapdss DISPC: dispc_enable_plane 1, 0
[  108.817596] omapdss DISPC: dispc_enable_plane 2, 0
[  108.873352] omapdss DSS: dss_runtime_get
[  108.877502] omapdss DISPC: dispc_runtime_get
[  108.882568] omapdss DISPC: onoff 0 rf 0 ieo 0 ipc 0 ihs 1 ivs 1 acbi 0 acb 0
[  108.890411] omapdss DSS: dispc clock info found from cache.
[  108.896789] omapdss DSS: dpll4_m4 = 432000000
[  108.901733] omapdss DSS: fck = 72000000 (12)
[  108.906250] omapdss DISPC: lck = 72000000 (1)
[  108.911407] omapdss DISPC: pck = 36000000 (2)
[  108.916015] omapdss DISPC: channel 0 xres 800 yres 480
[  108.922027] omapdss DISPC: pck 36000
[  108.926116] omapdss DISPC: hsw 10 hfp 164 hbp 89 vsw 10 vfp 10 vbp 23
[  108.933288] omapdss DISPC: hsync 33866Hz, vsync 64Hz
[  108.940887] omapdss DISPC: dispc_runtime_put

Thanks,
Joe



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 11:29                       ` Joe Woodward
@ 2012-01-19 11:36                         ` Tomi Valkeinen
  2012-01-19 12:21                           ` Joe Woodward
  0 siblings, 1 reply; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-19 11:36 UTC (permalink / raw)
  To: Joe Woodward
  Cc: NeilBrown, Paul Walmsley, khilman, t-kristo, govindraj.r, linux-omap

On Thu, 2012-01-19 at 11:29 +0000, Joe Woodward wrote:
> ... (apologies for awful formatting of the previous mail) ...
>  
> > Well, I don't see how UART could directly affect DSS. The thing that
> > comes to my mind is that typing a char in the console causes a change
> > in
> > the power management, which then "fixes" the DSS. But then again, I'd
> > expect the console to go into sleep/idle state after a short while,
> > which should again cause sync_losts. But that's not happening...
> > 
> > Can you show the dss clocks you use (cat debugfs/omapdss/clk)?
> > 
> > It's been a while since I did anything with PM, so I don't remember how
> > and if it can be done, but a simple test would be to lock the OMAP to
> > always use OPP100.
> > 
> >  Tomi
> > 
> 
> Right,
> 
> I'm running with both CPUIDLE and CPUFREQ disabled at present so I assume that means I should always be in OPP100?
> 
> The panel is 800x480 (I've added a configuration to the generic-panel with the correct panel parameters), with just 1 FB device (/dev/fb0).
> 
> Clock dump is shown below:
> 
> # pwd
> /debug/omapdss
> # cat clk
> [   53.146057] omapdss DSS: dss_runtime_get
> [   53.151000] omapdss DSS: dss_runtime_put
> [   53.155151] omapdss DISPC: dispc_runtime_get
> [   53.160430] omapdss DISPC: dispc_runtime_put
> - DSS -
> dpll4_ck 432000000
> DSS_FCK (DSS1_ALWON_FCLK) = 432000000 / 12 * 2 = 72000000
> - DISPC -
> dispc fclk source = DSS_FCK (DSS1_ALWON_FCLK)
> fck             72000000
> - LCD1 -
> lcd1_clk source = DSS_FCK (DSS1_ALWON_FCLK)
> lck             72000000        lck div 1
> pck             36000000        pck div 2
> 
> 
> The SYNC_LOST errors sometimes don't happen, and sometimes fix themselves. It does seem that whenever a character is typed in to the UART the SYNC_LOST errors stop immediately.

Yep, the clocks are so low that they should work fine with OPP50 also...

 Tomi



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 11:36                         ` Tomi Valkeinen
@ 2012-01-19 12:21                           ` Joe Woodward
  2012-01-19 14:52                             ` Tomi Valkeinen
  2012-01-19 19:37                             ` Kevin Hilman
  0 siblings, 2 replies; 64+ messages in thread
From: Joe Woodward @ 2012-01-19 12:21 UTC (permalink / raw)
  To: Tomi Valkeinen
  Cc: NeilBrown, Paul Walmsley, khilman, t-kristo, govindraj.r, linux-omap



-----Original Message-----
From: Tomi Valkeinen <tomi.valkeinen@ti.com>
To: Joe Woodward <jw@terrafix.co.uk>
Cc: NeilBrown <neilb@suse.de>, Paul Walmsley <paul@pwsan.com>, khilman@ti.com, t-kristo@ti.com, govindraj.r@ti.com, linux-omap@vger.kernel.org
Date: Thu, 19 Jan 2012 13:36:59 +0200
Subject: Re: DSS2/PM on 3.2 broken?

> On Thu, 2012-01-19 at 11:29 +0000, Joe Woodward wrote:
> > ... (apologies for awful formatting of the previous mail) ...
> >  
> > > Well, I don't see how UART could directly affect DSS. The thing
> that
> > > comes to my mind is that typing a char in the console causes a
> change
> > > in
> > > the power management, which then "fixes" the DSS. But then again,
> I'd
> > > expect the console to go into sleep/idle state after a short while,
> > > which should again cause sync_losts. But that's not happening...
> > > 
> > > Can you show the dss clocks you use (cat debugfs/omapdss/clk)?
> > > 
> > > It's been a while since I did anything with PM, so I don't remember
> how
> > > and if it can be done, but a simple test would be to lock the OMAP
> to
> > > always use OPP100.
> > > 
> > >  Tomi
> > > 
> > 
> > Right,
> > 
> > I'm running with both CPUIDLE and CPUFREQ disabled at present so I
> assume that means I should always be in OPP100?
> > 
> > The panel is 800x480 (I've added a configuration to the generic-panel
> with the correct panel parameters), with just 1 FB device (/dev/fb0).
> > 
> > Clock dump is shown below:
> > 
> > # pwd
> > /debug/omapdss
> > # cat clk
> > [   53.146057] omapdss DSS: dss_runtime_get
> > [   53.151000] omapdss DSS: dss_runtime_put
> > [   53.155151] omapdss DISPC: dispc_runtime_get
> > [   53.160430] omapdss DISPC: dispc_runtime_put
> > - DSS -
> > dpll4_ck 432000000
> > DSS_FCK (DSS1_ALWON_FCLK) = 432000000 / 12 * 2 = 72000000
> > - DISPC -
> > dispc fclk source = DSS_FCK (DSS1_ALWON_FCLK)
> > fck             72000000
> > - LCD1 -
> > lcd1_clk source = DSS_FCK (DSS1_ALWON_FCLK)
> > lck             72000000        lck div 1
> > pck             36000000        pck div 2
> > 
> > 
> > The SYNC_LOST errors sometimes don't happen, and sometimes fix
> themselves. It does seem that whenever a character is typed in to the
> UART the SYNC_LOST errors stop immediately.
> 
> Yep, the clocks are so low that they should work fine with OPP50
> also...

If I do (either from the console or via a button press on the screen) then I never get a SYNC_LOST.
  echo 0 > /sys/devices/omapdss/display0/enabled
  echo 1 > /sys/devices/omapdss/display0/enabled

Just trying to think of some ideas that may be affecting the DSS...
  - Could it to be to do with the GPIO being used as a wake source (i.e. does the GPIO driver do runtime_pm properly?)?
  - Could it to be to do with the UART as it seems to fix itself whenever a character is pressed?
  - Could it to be to do with the ordering in which drivers are resumed?

Is the SYNC_LOST normally due to a lack of memory bandwidth? If so, it is possible to find out what the kernel is doing during the resume?

And before looking at this too much more, is the changing of the pm_runtime_put to the _sync versions the correct fix?

Sorry for so many questions, but I'm interested in getting this fixed as it's the only thing stopping me from switching to 3.2 from 3.0!

Cheers,
Joe



^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 12:21                           ` Joe Woodward
@ 2012-01-19 14:52                             ` Tomi Valkeinen
  2012-01-19 19:37                             ` Kevin Hilman
  1 sibling, 0 replies; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-19 14:52 UTC (permalink / raw)
  To: Joe Woodward
  Cc: khilman, NeilBrown, Paul Walmsley, t-kristo, govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]

On Thu, 2012-01-19 at 12:21 +0000, Joe Woodward wrote:

> If I do (either from the console or via a button press on the screen) then I never get a SYNC_LOST.
>   echo 0 > /sys/devices/omapdss/display0/enabled
>   echo 1 > /sys/devices/omapdss/display0/enabled
> 
> Just trying to think of some ideas that may be affecting the DSS...
>   - Could it to be to do with the GPIO being used as a wake source (i.e. does the GPIO driver do runtime_pm properly?)?
>   - Could it to be to do with the UART as it seems to fix itself whenever a character is pressed?
>   - Could it to be to do with the ordering in which drivers are resumed?

Well, none of these sound probable to me. I don't see any connection
with GPIOs or UART as such with DSS (I mean something that could cause
sync losts).

The only thing that I can see affecting DSS via GPIO/UART is indirectly
via PM, with voltages or clocks. But if DSS responds in some way, I
presume the voltages are ok. And your clocks should be low enough to
work with lower OPPs also. So I'm quite out of ideas.

Of course there could be a problem in the omapdss when it returns its
registers. But it wouldn't explain why the synclosts end if you press a
key in the console.

> Is the SYNC_LOST normally due to a lack of memory bandwidth? If so, it is possible to find out what the kernel is doing during the resume?

No. That should cause fifo underflows. I don't know the exact reasons
for sync_lost, but I think it generally means that the DSS sub-modules
lose sync with each other. Mostly this is due to clock config, but can
be cause also by other (wrong) configuration which makes a DSS
sub-module behave somehow wrong (or halt totally).

> And before looking at this too much more, is the changing of the pm_runtime_put to the _sync versions the correct fix?

Well. I think it's a correct fix, in functional sense. However, I would
like to use non-sync versions normally, but that's just for performance
optimization.

> Sorry for so many questions, but I'm interested in getting this fixed as it's the only thing stopping me from switching to 3.2 from 3.0!

The only idea I have currently is to add/enable debug prints which show
information about PM changing its states, so we could see if something
actually changes at the moment you press a key in the console.

What kind of setup did you have again? I wonder if I could reproduce it
easily with overo/beagle (it was omap3, wasn't it?)?

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-16 11:11         ` Tomi Valkeinen
@ 2012-01-19 19:24           ` Kevin Hilman
  2012-01-20  7:16             ` Tomi Valkeinen
  0 siblings, 1 reply; 64+ messages in thread
From: Kevin Hilman @ 2012-01-19 19:24 UTC (permalink / raw)
  To: Tomi Valkeinen; +Cc: Joe Woodward, linux-omap, Archit Taneja

Tomi Valkeinen <tomi.valkeinen@ti.com> writes:

> On Fri, 2012-01-13 at 11:30 -0800, Kevin Hilman wrote:
>> Tomi Valkeinen <tomi.valkeinen@ti.com> writes:
>
>> > pm_runtime_put() is called inside omapdss driver's .suspend callback. So
>> > I guess that means the system suspend has started. However, the logs
>> > show that the runtime_suspend callback _is_ being called before the
>> > system suspend is finished, so the workqueue can't be frozen...
>> 
>> ...or there are other ways that the runtime_suspend callback is called.
>> 
>> In order to ensure that devices are properly idled for system-wide
>> suspend, The OMAP PM domain layer will call the drivers
>> ->runtime_suspend callback during "late" suspend (using the PM domain's
>> _noirq callbacks.)
>> 
>> This is there so that even when runtime PM has been disabled (via
>> userspace, or pm_runtime_forbid() calls) the driver can still be
>> properly idled before suspend.
>> 
>> In your case, I suspect you're seeing the driver's ->runtime_suspend
>> callback called during late suspend, not by the PM workqueue.
>
> Ok, that explains the invocation of the callbacks.
>
>> I don't understand DSS enough to make sense of the logs you sent, so not
>> sure how to be of more help.
>
> Well, I think I can summarize the situation:
>
> Normally, when the user wants to turn a display off, the panel drivers
> call DSS functions to disable the corresponding output, leading to
> pm_runtime_put() calls.
>
> In system suspend case, the suspend callback initiates the turning off
> of the displays, which again leads to pm_runtime_put() calls.
>
> Now, you already said using pm_runtime_put_sync version is the correct
> way when suspending. But to use that I need to either always use
> pm_runtime_put_sync, or add an extra boolean which marks that we're
> suspending, and pass that around, or make it a DSS global variable.

I'm not sure why can't you use the sync version just in the suspend
callback?

> Both of those options sound a bit ugly to me. So my two questions are,
>
> 1) Is there something funny with the DSS architecture to have a problem
> like this? I tried to grep around, but I didn't see any other driver
> doing something like that (but I could've missed it).
>
> 2) why can't the PM framework handle this, by directing the
> pm_runtime_put calls to the _sync version when suspending. Somehow it
> feels strange to me that the driver needs to care about that.

This has been a big debate with the runtime PM framework in general.
Some of us have argued that system PM and runtime PM should be more
unified, and that runtime PM calls should work "normally" from system
suspend.  After all, system suspend could be considered simply a forced
idle/quiecient state state where all the runtime callbacks kick in to
idle the system.  However those of us arguing for that lost the battle
with the PM core maintainers, and system PM and runtime PM have to be
managed separately.

I've "worked around" that by using the PM domain level calls to ensure
runtime PM callbacks are called on system suspend, but the side effect
is the complications you've run into.

So for now, if you use runtime PM in your system PM you have to use the
sync versions.  For most drivers I've dealt with, that means just using
sync version in the suspend callback is required, but maybe the
interaction of the various DSS subblocks means you need the sync version
everywhere.

Kevin




^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 12:21                           ` Joe Woodward
  2012-01-19 14:52                             ` Tomi Valkeinen
@ 2012-01-19 19:37                             ` Kevin Hilman
  2012-01-19 21:05                               ` NeilBrown
  1 sibling, 1 reply; 64+ messages in thread
From: Kevin Hilman @ 2012-01-19 19:37 UTC (permalink / raw)
  To: Joe Woodward
  Cc: Tomi Valkeinen, NeilBrown, Paul Walmsley, t-kristo, govindraj.r,
	linux-omap

"Joe Woodward" <jw@terrafix.co.uk> writes:

[...]

> If I do (either from the console or via a button press on the screen)
> then I never get a SYNC_LOST.
>
>   echo 0 > /sys/devices/omapdss/display0/enabled
>   echo 1 > /sys/devices/omapdss/display0/enabled
>
> Just trying to think of some ideas that may be affecting the DSS...
>   - Could it to be to do with the GPIO being used as a wake source (i.e. does the GPIO driver do runtime_pm properly?)?
>   - Could it to be to do with the UART as it seems to fix itself whenever a character is pressed?
>   - Could it to be to do with the ordering in which drivers are resumed?

Here's my guess/hunch as to why the UART wakeup helps and GPIO doesn't.

The UART's are idled using timeouts, so after any activity (including a
wakeup) the UART timeout will not alow the UARTs to idle (and thus the
system to hit low power states) for a given timeout period.

I don't know what DSS does on wakeup, but presumably it has a burst of
activity to do right away.  Because of the UART timeout, the DSS
probably completes its "burst" of activity before the system ever idles
to a low power state.

With a GPIO wakeup, the system can (re)idle and possibly hit low power
states immediately after the wakeup.

If that is the case, that suggests that the DSS probably needs a
constraint someplace to prevent this from happening, at least during
this "burst" period.

Kevin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 19:37                             ` Kevin Hilman
@ 2012-01-19 21:05                               ` NeilBrown
  2012-01-20  0:22                                 ` Kevin Hilman
  0 siblings, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-19 21:05 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Joe Woodward, Tomi Valkeinen, Paul Walmsley, t-kristo,
	govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 10803 bytes --]

On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman <khilman@ti.com> wrote:

> "Joe Woodward" <jw@terrafix.co.uk> writes:
> 
> [...]
> 
> > If I do (either from the console or via a button press on the screen)
> > then I never get a SYNC_LOST.
> >
> >   echo 0 > /sys/devices/omapdss/display0/enabled
> >   echo 1 > /sys/devices/omapdss/display0/enabled
> >
> > Just trying to think of some ideas that may be affecting the DSS...
> >   - Could it to be to do with the GPIO being used as a wake source (i.e. does the GPIO driver do runtime_pm properly?)?
> >   - Could it to be to do with the UART as it seems to fix itself whenever a character is pressed?
> >   - Could it to be to do with the ordering in which drivers are resumed?
> 
> Here's my guess/hunch as to why the UART wakeup helps and GPIO doesn't.
> 
> The UART's are idled using timeouts, so after any activity (including a
> wakeup) the UART timeout will not alow the UARTs to idle (and thus the
> system to hit low power states) for a given timeout period.

While I agree with that, I'm wondering if there might be something else odd
relating to the UARTs that we haven't spotted yet.

I'm chasing done the different problem of HDQ not coping with CPUIDLE but I
think it is very likely to have a similar root cause, as HDQ goes wrong at
the same time that DSS goes wrong, and is a vaguely similar way.

I just tried compiling omap_hwmod.c with "#define DEBUG" and got some strange
results.

Firstly: something strange early in boot.  Here is an annotated extract from
the dmesg log:

[    0.862304] omap_hwmod: l4_wkup: enabling
[    0.866546] omap_hwmod: l4_wkup: enabling clocks
[    0.871429] omap_hwmod: l4_wkup: resetting
[    0.875732] omap_hwmod: l4_wkup: idling
[    0.879791] omap_hwmod: l4_wkup: disabling clocks
[    0.884765] omap_hwmod: mmc3: enabling
[    0.888702] omap_hwmod: mmc3: enabling clocks
[    0.893341] omap_hwmod: mmc3: resetting
[    0.897430] omap_hwmod: mmc3: resetting via OCP SOFTRESET
[    0.903106] omap_hwmod: mmc3: softreset in 0 usec
[    0.908050] omap_hwmod: mmc3: idling
[    0.911834] omap_hwmod: mmc3: disabling clocks

At this point my serial console (UART3, 19200bps) start emitting nuls rather than what
it should, so I don't see the next few lines.
154 nuls instread of 91 characters.


[    0.916534] omap_hwmod: hdq: enabling
[    0.944793] omap_hwmod: hdq: enabling clocks

Then normal text resumes for a short while:

[    0.977691] omap_hwmod: hdq: resetting
[    0.981658] omap_hwmod: hdq: resetting via OCP SOFTRESET
[    0.998413] omap_hwmod: hdq: softreset failed (waited 10000 usec)
[    1.004791] omap_hwmod: hdq: idling
[    1.008483] omap_hwmod: hdq: disabling clocks

Here again we switch to all nuls.
7959 nuls instead of 4701 characters.

[    1.013092] omap_hwmod: timer1: enabling
[    1.043334] omap_hwmod: timer1: enabling clocks
[    1.078155] omap_hwmod: timer1: resetting
[    1.109008] omap_hwmod: timer1: resetting via OCP SOFTRESET
[    1.151702] omap_hwmod: timer1: softreset in 0 usec
[    1.189147] omap_hwmod: timer1: idling
[    1.218048] omap_hwmod: timer1: disabling clocks
[    1.253540] omap_hwmod: timer2: enabling
[    1.283752] omap_hwmod: timer2: enabling clocks
[    1.318572] omap_hwmod: timer2: resetting
[    1.349456] omap_hwmod: timer2: resetting via OCP SOFTRESET
[    1.392150] omap_hwmod: timer2: softreset in 0 usec
[    1.429565] omap_hwmod: timer2: idling
[    1.458496] omap_hwmod: timer2: disabling clocks
[    1.493957] omap_hwmod: timer3: enabling
[    1.524169] omap_hwmod: timer3: enabling clocks
[    1.558990] omap_hwmod: timer3: resetting
[    1.589874] omap_hwmod: timer3: resetting via OCP SOFTRESET
[    1.632568] omap_hwmod: timer3: softreset in 0 usec
[    1.669982] omap_hwmod: timer3: idling
[    1.698913] omap_hwmod: timer3: disabling clocks
[    1.734374] omap_hwmod: timer4: enabling
[    1.764617] omap_hwmod: timer4: enabling clocks
[    1.799438] omap_hwmod: timer4: resetting
[    1.830291] omap_hwmod: timer4: resetting via OCP SOFTRESET
[    1.872985] omap_hwmod: timer4: softreset in 0 usec
[    1.910430] omap_hwmod: timer4: idling
[    1.939331] omap_hwmod: timer4: disabling clocks
[    1.974822] omap_hwmod: timer5: enabling
[    2.005035] omap_hwmod: timer5: enabling clocks
[    2.039855] omap_hwmod: timer5: resetting
[    2.070739] omap_hwmod: timer5: resetting via OCP SOFTRESET
[    2.113403] omap_hwmod: timer5: softreset in 0 usec
[    2.150848] omap_hwmod: timer5: idling
[    2.179748] omap_hwmod: timer5: disabling clocks
[    2.215270] omap_hwmod: timer6: enabling
[    2.245483] omap_hwmod: timer6: enabling clocks
[    2.280303] omap_hwmod: timer6: resetting
[    2.311157] omap_hwmod: timer6: resetting via OCP SOFTRESET
[    2.353851] omap_hwmod: timer6: softreset in 0 usec
[    2.391296] omap_hwmod: timer6: idling
[    2.420196] omap_hwmod: timer6: disabling clocks
[    2.455688] omap_hwmod: timer7: enabling
[    2.485900] omap_hwmod: timer7: enabling clocks
[    2.520721] omap_hwmod: timer7: resetting
[    2.551605] omap_hwmod: timer7: resetting via OCP SOFTRESET
[    2.594268] omap_hwmod: timer7: softreset in 0 usec
[    2.631713] omap_hwmod: timer7: idling
[    2.660614] omap_hwmod: timer7: disabling clocks
[    2.696105] omap_hwmod: timer8: enabling
[    2.726318] omap_hwmod: timer8: enabling clocks
[    2.761138] omap_hwmod: timer8: resetting
[    2.792022] omap_hwmod: timer8: resetting via OCP SOFTRESET
[    2.834716] omap_hwmod: timer8: softreset in 0 usec
[    2.872131] omap_hwmod: timer8: idling
[    2.901062] omap_hwmod: timer8: disabling clocks
[    2.936523] omap_hwmod: timer9: enabling
[    2.966735] omap_hwmod: timer9: enabling clocks
[    3.001556] omap_hwmod: timer9: resetting
[    3.032440] omap_hwmod: timer9: resetting via OCP SOFTRESET
[    3.075134] omap_hwmod: timer9: softreset in 0 usec
[    3.112579] omap_hwmod: timer9: idling
[    3.141510] omap_hwmod: timer9: disabling clocks
[    3.176971] omap_hwmod: timer10: enabling
[    3.207855] omap_hwmod: timer10: enabling clocks
[    3.243316] omap_hwmod: timer10: resetting
[    3.274871] omap_hwmod: timer10: resetting via OCP SOFTRESET
[    3.318206] omap_hwmod: timer10: softreset in 0 usec
[    3.356292] omap_hwmod: timer10: idling
[    3.385864] omap_hwmod: timer10: disabling clocks
[    3.421997] omap_hwmod: timer11: enabling
[    3.452880] omap_hwmod: timer11: enabling clocks
[    3.488342] omap_hwmod: timer11: resetting
[    3.519866] omap_hwmod: timer11: resetting via OCP SOFTRESET
[    3.563232] omap_hwmod: timer11: softreset in 0 usec
[    3.601318] omap_hwmod: timer11: idling
[    3.630889] omap_hwmod: timer11: disabling clocks
[    3.667022] omap_hwmod: wd_timer2: enabling
[    3.699188] omap_hwmod: wd_timer2: enabling clocks
[    3.735992] omap_hwmod: wd_timer2: resetting
[    3.768829] omap_hwmod: wd_timer2: resetting via OCP SOFTRESET
[    3.813629] omap_hwmod: wd_timer2: softreset in 62 usec
[    3.853698] omap_hwmod: wd_timer2: disabling
[    3.886779] omap_hwmod: wd_timer2: disabling clocks
[    3.924224] omap_hwmod: uart1: enabling
[    3.953796] omap_hwmod: uart1: enabling clocks

Here the serial port starts working again and I get the right text.

[    3.988006] omap_hwmod: uart2: enabling
[    3.992065] omap_hwmod: uart2: enabling clocks
[    3.996765] omap_hwmod: uart3: enabling
[    4.000793] omap_hwmod: uart3: enabling clocks
[    4.005523] omap_hwmod: dss_dispc: enabling
[    4.009948] omap_hwmod: dss_dispc: enabling clocks
[    4.014984] omap_hwmod: dss_dispc: resetting
[    4.019500] omap_hwmod: dss_dispc: resetting via OCP SOFTRESET
[    4.025634] omap_hwmod: dss_dispc: softreset in 0 usec
[    4.031005] omap_hwmod: dss_dispc: idling
[    4.035247] omap_hwmod: dss_dispc: disabling clocks


This is quite repeatable.
It seems to me that there might be something odd going on with clocks.
Maybe there is some dependency that isn't encoded properly or something?

The fact that UART3 has problems while the HDQ is originally being reset is
certainly very strange.

The other thing I discovered is that when I set the uart 'sleep_timeout' to 5
(all uarts) so that CPUIDLE can let CORE enter the lower power states, I get
a fairly steady stream of:

[ 1168.490478] omap_hwmod: uart1: enabling
[ 1168.490509] omap_hwmod: uart1: enabling clocks
[ 1168.490539] omap_hwmod: uart2: enabling
[ 1168.490539] omap_hwmod: uart2: enabling clocks
[ 1168.490631] omap_hwmod: uart3: enabling
[ 1168.490661] omap_hwmod: uart3: enabling clocks
[ 1168.490661] omap_hwmod: uart4: enabling
[ 1168.490692] omap_hwmod: uart4: enabling clocks
[ 1168.578796] omap_hwmod: uart3: idling
[ 1168.578796] omap_hwmod: uart3: disabling clocks
[ 1168.578826] omap_hwmod: uart4: idling
[ 1168.578826] omap_hwmod: uart4: disabling clocks
[ 1168.578857] omap_hwmod: uart1: idling
[ 1168.578857] omap_hwmod: uart1: disabling clocks
[ 1168.578887] omap_hwmod: uart2: idling
[ 1168.578887] omap_hwmod: uart2: disabling clocks
[ 1168.714263] omap_hwmod: uart1: enabling
[ 1168.714294] omap_hwmod: uart1: enabling clocks
[ 1168.714324] omap_hwmod: uart2: enabling
[ 1168.714324] omap_hwmod: uart2: enabling clocks
[ 1168.714416] omap_hwmod: uart3: enabling
[ 1168.714447] omap_hwmod: uart3: enabling clocks
[ 1168.714447] omap_hwmod: uart4: enabling
[ 1168.714477] omap_hwmod: uart4: enabling clocks
[ 1168.803344] omap_hwmod: uart3: idling
[ 1168.803344] omap_hwmod: uart3: disabling clocks
[ 1168.803375] omap_hwmod: uart4: idling
[ 1168.803375] omap_hwmod: uart4: disabling clocks
[ 1168.851257] omap_hwmod: uart3: enabling
[ 1168.851287] omap_hwmod: uart3: enabling clocks
[ 1168.851318] omap_hwmod: uart4: enabling
[ 1168.851318] omap_hwmod: uart4: enabling clocks
[ 1168.885955] omap_hwmod: uart3: idling
[ 1168.885986] omap_hwmod: uart3: disabling clocks
[ 1168.885986] omap_hwmod: uart4: idling
[ 1168.886016] omap_hwmod: uart4: disabling clocks
[ 1168.886383] omap_hwmod: uart3: enabling
[ 1168.886383] omap_hwmod: uart3: enabling clocks
[ 1168.886383] omap_hwmod: uart4: enabling
[ 1168.886413] omap_hwmod: uart4: enabling clocks
[ 1168.944885] omap_hwmod: uart3: idling
[ 1168.944915] omap_hwmod: uart3: disabling clocks


It seems to mostly be uart3 and uart4.
If there is any interaction between the uart clock and the hdq clock, this
could explain why the HDQ gets confused when all this is happening.

And if there is a connection there, then there could also be a connection
with a DSS clock which could be confusing it.

I admit this is largely surmise and hypothesis - but there is definitely
something odd here!

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 21:05                               ` NeilBrown
@ 2012-01-20  0:22                                 ` Kevin Hilman
  2012-01-21 12:12                                   ` NeilBrown
  0 siblings, 1 reply; 64+ messages in thread
From: Kevin Hilman @ 2012-01-20  0:22 UTC (permalink / raw)
  To: NeilBrown
  Cc: Joe Woodward, Tomi Valkeinen, Paul Walmsley, t-kristo,
	govindraj.r, linux-omap

NeilBrown <neilb@suse.de> writes:

> On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman <khilman@ti.com> wrote:
>
>> "Joe Woodward" <jw@terrafix.co.uk> writes:
>> 
>> [...]
>> 
>> > If I do (either from the console or via a button press on the screen)
>> > then I never get a SYNC_LOST.
>> >
>> >   echo 0 > /sys/devices/omapdss/display0/enabled
>> >   echo 1 > /sys/devices/omapdss/display0/enabled
>> >
>> > Just trying to think of some ideas that may be affecting the DSS...
>> >   - Could it to be to do with the GPIO being used as a wake source (i.e. does the GPIO driver do runtime_pm properly?)?
>> >   - Could it to be to do with the UART as it seems to fix itself whenever a character is pressed?
>> >   - Could it to be to do with the ordering in which drivers are resumed?
>> 
>> Here's my guess/hunch as to why the UART wakeup helps and GPIO doesn't.
>> 
>> The UART's are idled using timeouts, so after any activity (including a
>> wakeup) the UART timeout will not alow the UARTs to idle (and thus the
>> system to hit low power states) for a given timeout period.
>
> While I agree with that, I'm wondering if there might be something else odd
> relating to the UARTs that we haven't spotted yet.

I suspect the problem for both is constraints that aren't stated
explicitly.

> I'm chasing done the different problem of HDQ not coping with CPUIDLE but I
> think it is very likely to have a similar root cause, as HDQ goes wrong at
> the same time that DSS goes wrong, and is a vaguely similar way.
>
> I just tried compiling omap_hwmod.c with "#define DEBUG" and got some strange
> results.

Strange indeed.

I assume you are you using the runtime PM converted driver from Paul?  I
tried that branch on my 3530/Overo (console on UART3) and didn't see any
nul's on my UART3 console.  

[...]

> The other thing I discovered is that when I set the uart 'sleep_timeout' to 5
> (all uarts) so that CPUIDLE can let CORE enter the lower power states, I get
> a fairly steady stream of:
>
> [ 1168.490478] omap_hwmod: uart1: enabling
> [ 1168.490509] omap_hwmod: uart1: enabling clocks
> [ 1168.490539] omap_hwmod: uart2: enabling
> [ 1168.490539] omap_hwmod: uart2: enabling clocks
> [ 1168.490631] omap_hwmod: uart3: enabling
> [ 1168.490661] omap_hwmod: uart3: enabling clocks
> [ 1168.490661] omap_hwmod: uart4: enabling
> [ 1168.490692] omap_hwmod: uart4: enabling clocks
> [ 1168.578796] omap_hwmod: uart3: idling
> [ 1168.578796] omap_hwmod: uart3: disabling clocks
> [ 1168.578826] omap_hwmod: uart4: idling
> [ 1168.578826] omap_hwmod: uart4: disabling clocks
> [ 1168.578857] omap_hwmod: uart1: idling
> [ 1168.578857] omap_hwmod: uart1: disabling clocks
> [ 1168.578887] omap_hwmod: uart2: idling
> [ 1168.578887] omap_hwmod: uart2: disabling clocks
> [ 1168.714263] omap_hwmod: uart1: enabling
> [ 1168.714294] omap_hwmod: uart1: enabling clocks
> [ 1168.714324] omap_hwmod: uart2: enabling
> [ 1168.714324] omap_hwmod: uart2: enabling clocks
> [ 1168.714416] omap_hwmod: uart3: enabling
> [ 1168.714447] omap_hwmod: uart3: enabling clocks
> [ 1168.714447] omap_hwmod: uart4: enabling
> [ 1168.714477] omap_hwmod: uart4: enabling clocks
> [ 1168.803344] omap_hwmod: uart3: idling
> [ 1168.803344] omap_hwmod: uart3: disabling clocks
> [ 1168.803375] omap_hwmod: uart4: idling
> [ 1168.803375] omap_hwmod: uart4: disabling clocks
> [ 1168.851257] omap_hwmod: uart3: enabling
> [ 1168.851287] omap_hwmod: uart3: enabling clocks
> [ 1168.851318] omap_hwmod: uart4: enabling
> [ 1168.851318] omap_hwmod: uart4: enabling clocks
> [ 1168.885955] omap_hwmod: uart3: idling
> [ 1168.885986] omap_hwmod: uart3: disabling clocks
> [ 1168.885986] omap_hwmod: uart4: idling
> [ 1168.886016] omap_hwmod: uart4: disabling clocks
> [ 1168.886383] omap_hwmod: uart3: enabling
> [ 1168.886383] omap_hwmod: uart3: enabling clocks
> [ 1168.886383] omap_hwmod: uart4: enabling
> [ 1168.886413] omap_hwmod: uart4: enabling clocks
> [ 1168.944885] omap_hwmod: uart3: idling
> [ 1168.944915] omap_hwmod: uart3: disabling clocks

At least this part is expected.  

In the kernel you're using the UART clocks are enabled/disabled during
the idle path depending on the low-power state being targetted, so would
expect to see lots of UART clock gating going on.

Kevin

> It seems to mostly be uart3 and uart4.
> If there is any interaction between the uart clock and the hdq clock, this
> could explain why the HDQ gets confused when all this is happening.
>
> And if there is a connection there, then there could also be a connection
> with a DSS clock which could be confusing it.
>
> I admit this is largely surmise and hypothesis - but there is definitely
> something odd here!
>
> Thanks,
> NeilBrown

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-19 19:24           ` Kevin Hilman
@ 2012-01-20  7:16             ` Tomi Valkeinen
  2012-01-20 18:06               ` Kevin Hilman
  0 siblings, 1 reply; 64+ messages in thread
From: Tomi Valkeinen @ 2012-01-20  7:16 UTC (permalink / raw)
  To: Kevin Hilman; +Cc: Joe Woodward, linux-omap, Archit Taneja

[-- Attachment #1: Type: text/plain, Size: 1589 bytes --]

On Thu, 2012-01-19 at 11:24 -0800, Kevin Hilman wrote:
> Tomi Valkeinen <tomi.valkeinen@ti.com> writes:

> > Now, you already said using pm_runtime_put_sync version is the correct
> > way when suspending. But to use that I need to either always use
> > pm_runtime_put_sync, or add an extra boolean which marks that we're
> > suspending, and pass that around, or make it a DSS global variable.
> 
> I'm not sure why can't you use the sync version just in the suspend
> callback?

To do that the suspend callback should be the one that disables the
device and calls pm_runtime_suspend. With DSS that's not the case, it's
the panel drivers that are in charge of enabling/disabling DSS (by
calling appropriate functions in omapdss).

The only thing that the omapdss suspend callback does is to call the
normal disable functions on the display drivers. This leads to calls to
disable-functions on the dss hwmod drivers, and then pm_runtime_put
calls. But at the point the pm_runtime funcs are called, the code has no
idea that we're actually doing system suspend. Thus I'd need to pass
that information somehow, probably with a global variable.

And while that's not difficult to do, it sure feels a bit ugly.

I think I'll just change the pm_runtime_put calls to sync versions for
now. I imagine the perf impact with the change should be negligible.
I'll return to this issue after the devtree adaptation has been made, as
it changes the child-parent relations of the dss related devices, and
perhaps managing the PM states will also get a bit easier.

 Tomi


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-20  7:16             ` Tomi Valkeinen
@ 2012-01-20 18:06               ` Kevin Hilman
  0 siblings, 0 replies; 64+ messages in thread
From: Kevin Hilman @ 2012-01-20 18:06 UTC (permalink / raw)
  To: Tomi Valkeinen; +Cc: Joe Woodward, linux-omap, Archit Taneja

Tomi Valkeinen <tomi.valkeinen@ti.com> writes:

> On Thu, 2012-01-19 at 11:24 -0800, Kevin Hilman wrote:
>> Tomi Valkeinen <tomi.valkeinen@ti.com> writes:
>
>> > Now, you already said using pm_runtime_put_sync version is the correct
>> > way when suspending. But to use that I need to either always use
>> > pm_runtime_put_sync, or add an extra boolean which marks that we're
>> > suspending, and pass that around, or make it a DSS global variable.
>> 
>> I'm not sure why can't you use the sync version just in the suspend
>> callback?
>
> To do that the suspend callback should be the one that disables the
> device and calls pm_runtime_suspend. With DSS that's not the case, it's
> the panel drivers that are in charge of enabling/disabling DSS (by
> calling appropriate functions in omapdss).

Ah, OK.  makes sense.

> The only thing that the omapdss suspend callback does is to call the
> normal disable functions on the display drivers. This leads to calls to
> disable-functions on the dss hwmod drivers, and then pm_runtime_put
> calls. But at the point the pm_runtime funcs are called, the code has no
> idea that we're actually doing system suspend. Thus I'd need to pass
> that information somehow, probably with a global variable.
>
> And while that's not difficult to do, it sure feels a bit ugly.
>
> I think I'll just change the pm_runtime_put calls to sync versions for
> now. I imagine the perf impact with the change should be negligible.
> I'll return to this issue after the devtree adaptation has been made, as
> it changes the child-parent relations of the dss related devices, and
> perhaps managing the PM states will also get a bit easier.

Sounds reasonable.

Kevin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-20  0:22                                 ` Kevin Hilman
@ 2012-01-21 12:12                                   ` NeilBrown
  2012-01-23 22:11                                     ` Kevin Hilman
  0 siblings, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-21 12:12 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Joe Woodward, Tomi Valkeinen, Paul Walmsley, t-kristo,
	govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 6181 bytes --]

On Thu, 19 Jan 2012 16:22:37 -0800 Kevin Hilman <khilman@ti.com> wrote:

> NeilBrown <neilb@suse.de> writes:
> 
> > On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman <khilman@ti.com> wrote:
> >
> >> "Joe Woodward" <jw@terrafix.co.uk> writes:
> >> 
> >> [...]
> >> 
> >> > If I do (either from the console or via a button press on the screen)
> >> > then I never get a SYNC_LOST.
> >> >
> >> >   echo 0 > /sys/devices/omapdss/display0/enabled
> >> >   echo 1 > /sys/devices/omapdss/display0/enabled
> >> >
> >> > Just trying to think of some ideas that may be affecting the DSS...
> >> >   - Could it to be to do with the GPIO being used as a wake source (i.e. does the GPIO driver do runtime_pm properly?)?
> >> >   - Could it to be to do with the UART as it seems to fix itself whenever a character is pressed?
> >> >   - Could it to be to do with the ordering in which drivers are resumed?
> >> 
> >> Here's my guess/hunch as to why the UART wakeup helps and GPIO doesn't.
> >> 
> >> The UART's are idled using timeouts, so after any activity (including a
> >> wakeup) the UART timeout will not alow the UARTs to idle (and thus the
> >> system to hit low power states) for a given timeout period.
> >
> > While I agree with that, I'm wondering if there might be something else odd
> > relating to the UARTs that we haven't spotted yet.
> 
> I suspect the problem for both is constraints that aren't stated
> explicitly.
> 
> > I'm chasing done the different problem of HDQ not coping with CPUIDLE but I
> > think it is very likely to have a similar root cause, as HDQ goes wrong at
> > the same time that DSS goes wrong, and is a vaguely similar way.
> >
> > I just tried compiling omap_hwmod.c with "#define DEBUG" and got some strange
> > results.
> 
> Strange indeed.
> 
> I assume you are you using the runtime PM converted driver from Paul?  I
> tried that branch on my 3530/Overo (console on UART3) and didn't see any
> nul's on my UART3 console.  

I found out what is causing this.  I'm open to suggestions for fixing it.

I turned on debugging in clock.c and found that just before the nuls start
appearing, I'm told
        clock: dpll4_ck: disabling in hardware
and just before it starts working again I see
        clock: dpll4_ck: enabling in hardware

The uart3 fck depends on dpll4_ck, but this early it boot, uart3 hasn't been
initialised (I use earlyprintk to see this at all).  So as soon as something
that depends on dpll4_ck goes to sleep and disables its clock, the uart3
clock can get turned off because the dependency isn't known yet.

I think it is mmc3 turning off that initially turns dpll4_ck off.
Then when HDQ wants hdq_fck which depends on dpll4_ck, we get the uart clock
back for a while.
Then when uart2 gets enabled, uart3 comes back on, and soon after uart3 gets
enabled, which keeps it on.

So it seems we need some way to avoid turning off any clocks until all
dependencies have been registered.  At least when earlyprintk is being used.

Would that make sense?

Unfortunately understanding this issue doesn't seem to give any insight into
the other issues that seemed to be related - ah well...


> 
> [...]
> 
> > The other thing I discovered is that when I set the uart 'sleep_timeout' to 5
> > (all uarts) so that CPUIDLE can let CORE enter the lower power states, I get
> > a fairly steady stream of:
> >
> > [ 1168.490478] omap_hwmod: uart1: enabling
> > [ 1168.490509] omap_hwmod: uart1: enabling clocks
> > [ 1168.490539] omap_hwmod: uart2: enabling
> > [ 1168.490539] omap_hwmod: uart2: enabling clocks
> > [ 1168.490631] omap_hwmod: uart3: enabling
> > [ 1168.490661] omap_hwmod: uart3: enabling clocks
> > [ 1168.490661] omap_hwmod: uart4: enabling
> > [ 1168.490692] omap_hwmod: uart4: enabling clocks
> > [ 1168.578796] omap_hwmod: uart3: idling
> > [ 1168.578796] omap_hwmod: uart3: disabling clocks
> > [ 1168.578826] omap_hwmod: uart4: idling
> > [ 1168.578826] omap_hwmod: uart4: disabling clocks
> > [ 1168.578857] omap_hwmod: uart1: idling
> > [ 1168.578857] omap_hwmod: uart1: disabling clocks
> > [ 1168.578887] omap_hwmod: uart2: idling
> > [ 1168.578887] omap_hwmod: uart2: disabling clocks
> > [ 1168.714263] omap_hwmod: uart1: enabling
> > [ 1168.714294] omap_hwmod: uart1: enabling clocks
> > [ 1168.714324] omap_hwmod: uart2: enabling
> > [ 1168.714324] omap_hwmod: uart2: enabling clocks
> > [ 1168.714416] omap_hwmod: uart3: enabling
> > [ 1168.714447] omap_hwmod: uart3: enabling clocks
> > [ 1168.714447] omap_hwmod: uart4: enabling
> > [ 1168.714477] omap_hwmod: uart4: enabling clocks
> > [ 1168.803344] omap_hwmod: uart3: idling
> > [ 1168.803344] omap_hwmod: uart3: disabling clocks
> > [ 1168.803375] omap_hwmod: uart4: idling
> > [ 1168.803375] omap_hwmod: uart4: disabling clocks
> > [ 1168.851257] omap_hwmod: uart3: enabling
> > [ 1168.851287] omap_hwmod: uart3: enabling clocks
> > [ 1168.851318] omap_hwmod: uart4: enabling
> > [ 1168.851318] omap_hwmod: uart4: enabling clocks
> > [ 1168.885955] omap_hwmod: uart3: idling
> > [ 1168.885986] omap_hwmod: uart3: disabling clocks
> > [ 1168.885986] omap_hwmod: uart4: idling
> > [ 1168.886016] omap_hwmod: uart4: disabling clocks
> > [ 1168.886383] omap_hwmod: uart3: enabling
> > [ 1168.886383] omap_hwmod: uart3: enabling clocks
> > [ 1168.886383] omap_hwmod: uart4: enabling
> > [ 1168.886413] omap_hwmod: uart4: enabling clocks
> > [ 1168.944885] omap_hwmod: uart3: idling
> > [ 1168.944915] omap_hwmod: uart3: disabling clocks
> 
> At least this part is expected.  
> 
> In the kernel you're using the UART clocks are enabled/disabled during
> the idle path depending on the low-power state being targetted, so would
> expect to see lots of UART clock gating going on.

Even though uarts 1,2,4 aren't even in use?

Maybe this is fixed in 3.3?

I'm just using 3.2 with a small number of extras which don't include the
power-management for uart (I tries backporting some of it, but it completely
disabled cpuidle - I think because the latency is registers is much too low,
as Paul observed).

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-17 21:24               ` NeilBrown
@ 2012-01-22  0:07                 ` Paul Walmsley
  2012-01-22 11:30                   ` NeilBrown
  2012-01-24 10:37                   ` OMAP HDQ: was " NeilBrown
  0 siblings, 2 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-01-22  0:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

On Wed, 18 Jan 2012, NeilBrown wrote:

> Oh - another thing.
> Sometimes during early boot I get:
> 
> [    0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
> [    0.176879] omap_hwmod: hdq: softreset failed (waited 10000 usec)

This latter bug just got fixed, it's updated in the new HDQ series that I 
sent out:

http://marc.info/?l=linux-omap&m=132719073710874&w=2

BTW, if you could give that a spin for us at some point, it would be 
greatly appreciated; I don't think I have a board with a 1-wire device on 
it (at least, not that I know of).

> At the same time the UART seem to hiccup and my USB-Serial port loses track
> and doesn't see any more characters until I close and re-open.
> 
> Could that be related?  Any idea what it means?

It shouldn't be related.


- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-22  0:07                 ` Paul Walmsley
@ 2012-01-22 11:30                   ` NeilBrown
  2012-01-24 10:37                   ` OMAP HDQ: was " NeilBrown
  1 sibling, 0 replies; 64+ messages in thread
From: NeilBrown @ 2012-01-22 11:30 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

[-- Attachment #1: Type: text/plain, Size: 855 bytes --]

On Sat, 21 Jan 2012 17:07:18 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:

> On Wed, 18 Jan 2012, NeilBrown wrote:
> 
> > Oh - another thing.
> > Sometimes during early boot I get:
> > 
> > [    0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
> > [    0.176879] omap_hwmod: hdq: softreset failed (waited 10000 usec)
> 
> This latter bug just got fixed, it's updated in the new HDQ series that I 
> sent out:
> 
> http://marc.info/?l=linux-omap&m=132719073710874&w=2
> 
> BTW, if you could give that a spin for us at some point, it would be 
> greatly appreciated; I don't think I have a board with a 1-wire device on 
> it (at least, not that I know of).

I'll try them against 3.2 tomorrow.  I'll be a while longer before I move up
to 3.3-rc but when I do I'll try to test there too.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-21 12:12                                   ` NeilBrown
@ 2012-01-23 22:11                                     ` Kevin Hilman
  2012-01-25  0:32                                       ` NeilBrown
  0 siblings, 1 reply; 64+ messages in thread
From: Kevin Hilman @ 2012-01-23 22:11 UTC (permalink / raw)
  To: NeilBrown
  Cc: Joe Woodward, Tomi Valkeinen, Paul Walmsley, t-kristo,
	govindraj.r, linux-omap

NeilBrown <neilb@suse.de> writes:

> On Thu, 19 Jan 2012 16:22:37 -0800 Kevin Hilman <khilman@ti.com> wrote:
>
>> NeilBrown <neilb@suse.de> writes:
>> 
>> > On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman <khilman@ti.com> wrote:
>> >
>> >> "Joe Woodward" <jw@terrafix.co.uk> writes:

>> >> 

[...]

>> At least this part is expected.  
>> 
>> In the kernel you're using the UART clocks are enabled/disabled during
>> the idle path depending on the low-power state being targetted, so would
>> expect to see lots of UART clock gating going on.
>
> Even though uarts 1,2,4 aren't even in use?

Yes.  Our UART idle management before v3.3 was, um... a disaster.

> Maybe this is fixed in 3.3?

As of v3.3, the UARTs are managed independently using runtime PM
autosupend, so only UART that are in use should be coming on and back
off again.

However, as Paul has recently posted some UART fixes, you'll see that we
have a few kinks to work out in the UART driver as well.

Kevin

^ permalink raw reply	[flat|nested] 64+ messages in thread

* OMAP HDQ: was Re: DSS2/PM on 3.2 broken?
  2012-01-22  0:07                 ` Paul Walmsley
  2012-01-22 11:30                   ` NeilBrown
@ 2012-01-24 10:37                   ` NeilBrown
  2012-01-26 14:19                     ` Paul Walmsley
  1 sibling, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-24 10:37 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

[-- Attachment #1: Type: text/plain, Size: 1898 bytes --]

On Sat, 21 Jan 2012 17:07:18 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:

> On Wed, 18 Jan 2012, NeilBrown wrote:
> 
> > Oh - another thing.
> > Sometimes during early boot I get:
> > 
> > [    0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
> > [    0.176879] omap_hwmod: hdq: softreset failed (waited 10000 usec)
> 
> This latter bug just got fixed, it's updated in the new HDQ series that I 
> sent out:
> 
> http://marc.info/?l=linux-omap&m=132719073710874&w=2
> 
> BTW, if you could give that a spin for us at some point, it would be 
> greatly appreciated; I don't think I have a board with a 1-wire device on 
> it (at least, not that I know of).

Yes, the patch series seems to work OK - at least it doesn't break anything:
I can still access my battery charge meter.

It doesn't fix the problem I am having where HDQ gets confused when CPUIDLE
kicks in, but I think I might have a handle on that now.

Unlike other modules, HDQ doesn't have SIDLEMODE.
According to 18.4.5 (AM/DM37x Multimedia Device Silicon Revision) it acts as
though SIDLEMODE were set to SIDLE_FORCE so if the clock-domain trys to
switch off, HDQ lets it.

Normally UART has SIDLEMODE set to SIDLE_NO so that keeps the necessary clock
domain active.  But if I let the UARTs sleep, SIDLEMODE changes to
SIDLE_SMART which can allow the clockdomain to be stopped, which affects HDQ.

However I'm not sure how PRCM.CM_AUTOIDLE1_CORE fits into this. if AUTO_HDQ
is clear - which it is by default and I cannot see code that changes it -
then the clock for HDQ should be independent of .... something.

I definitely see behaviour that looks like the HDQ clock stopping
when the UART allows SIDLE_SMART or when I explicitly deny idling of the
clockdomains.

Should there be special handling of HDQ because it doesn't have SIDLEMODE??

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: DSS2/PM on 3.2 broken?
  2012-01-23 22:11                                     ` Kevin Hilman
@ 2012-01-25  0:32                                       ` NeilBrown
  0 siblings, 0 replies; 64+ messages in thread
From: NeilBrown @ 2012-01-25  0:32 UTC (permalink / raw)
  To: Kevin Hilman
  Cc: Joe Woodward, Tomi Valkeinen, Paul Walmsley, t-kristo,
	govindraj.r, linux-omap

[-- Attachment #1: Type: text/plain, Size: 1597 bytes --]

On Mon, 23 Jan 2012 14:11:16 -0800 Kevin Hilman <khilman@ti.com> wrote:

> NeilBrown <neilb@suse.de> writes:
> 
> > On Thu, 19 Jan 2012 16:22:37 -0800 Kevin Hilman <khilman@ti.com> wrote:
> >
> >> NeilBrown <neilb@suse.de> writes:
> >> 
> >> > On Thu, 19 Jan 2012 11:37:39 -0800 Kevin Hilman <khilman@ti.com> wrote:
> >> >
> >> >> "Joe Woodward" <jw@terrafix.co.uk> writes:
> 
> >> >> 
> 
> [...]
> 
> >> At least this part is expected.  
> >> 
> >> In the kernel you're using the UART clocks are enabled/disabled during
> >> the idle path depending on the low-power state being targetted, so would
> >> expect to see lots of UART clock gating going on.
> >
> > Even though uarts 1,2,4 aren't even in use?
> 
> Yes.  Our UART idle management before v3.3 was, um... a disaster.
> 
> > Maybe this is fixed in 3.3?
> 
> As of v3.3, the UARTs are managed independently using runtime PM
> autosupend, so only UART that are in use should be coming on and back
> off again.
> 
> However, as Paul has recently posted some UART fixes, you'll see that we
> have a few kinks to work out in the UART driver as well.

Good news ... I'll probably hold out until -rc2 before I start experimenting
with 3.3 much.

I think I understand why I was getting single-character corruption when
waking the UART while you just lost the character - I suspect you had
off_mode enabled.  That makes the difference for me.
Hopefully we can get rid of both the corruption and the character loss, but
I'll wait until I've tried 3.3 before I pursue that much more.

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?
  2012-01-24 10:37                   ` OMAP HDQ: was " NeilBrown
@ 2012-01-26 14:19                     ` Paul Walmsley
  2012-01-27 22:35                       ` NeilBrown
  0 siblings, 1 reply; 64+ messages in thread
From: Paul Walmsley @ 2012-01-26 14:19 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

On Tue, 24 Jan 2012, NeilBrown wrote:

> On Sat, 21 Jan 2012 17:07:18 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> 
> > On Wed, 18 Jan 2012, NeilBrown wrote:
> > 
> > > Oh - another thing.
> > > Sometimes during early boot I get:
> > > 
> > > [    0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
> > > [    0.176879] omap_hwmod: hdq: softreset failed (waited 10000 usec)
> > 
> > This latter bug just got fixed, it's updated in the new HDQ series that I 
> > sent out:
> > 
> > http://marc.info/?l=linux-omap&m=132719073710874&w=2
> > 
> > BTW, if you could give that a spin for us at some point, it would be 
> > greatly appreciated; I don't think I have a board with a 1-wire device on 
> > it (at least, not that I know of).
> 
> Yes, the patch series seems to work OK - at least it doesn't break anything:
> I can still access my battery charge meter.

Thanks.  Should I add a Tested-by: from you on those patches?

> It doesn't fix the problem I am having where HDQ gets confused when CPUIDLE
> kicks in, but I think I might have a handle on that now.
> 
> Unlike other modules, HDQ doesn't have SIDLEMODE.
> According to 18.4.5 (AM/DM37x Multimedia Device Silicon Revision) it acts as
> though SIDLEMODE were set to SIDLE_FORCE so if the clock-domain trys to
> switch off, HDQ lets it.

Hmm.  Do you know if the problem occurs in the middle of the HDQ 
transaction, or when HDQ is completely idle?  I'd suspect the former.

Here's a theory: perhaps the MPU powerdomain is hitting a low-power state 
while waiting for an HDQ interrupt.  When the MPU powerdomain is in a low 
power state, so is the MPU interrupt controller, so the only way that the 
MPU can wake up is if the HDQ can issue a wakeup event to the PRCM.  And I 
don't see any evidence that the HDQ is capable of doing this, based on the 
HDQ sections of the TRM.  What a huge energy waste, if true.

Maybe try something like the following patch -- compile-tested only here.

If this works, you might want to try dropping this patch and using the pad 
mux to set a wakeup event on the 1-wire pad when the signal goes low.  
That might be a more power-efficient approach.  You may still have to use 
some PM QoS request there to ensure that the HDQ can wake up fast enough 
to see the pulse, but the constraint shouldn't need to be as ludicrously 
low as it is in the following patch.


- Paul

commit 92fa561191b60ddf1296e02a6206160e8a5718f0
Author: Paul Walmsley <paul@pwsan.com>
Date:   Thu Jan 26 07:04:40 2012 -0700

    Attempt to work around lack of HDQ wakeup support

diff --git a/drivers/w1/masters/omap_hdq.c b/drivers/w1/masters/omap_hdq.c
index 63e3eda..990e38b 100644
--- a/drivers/w1/masters/omap_hdq.c
+++ b/drivers/w1/masters/omap_hdq.c
@@ -17,6 +17,7 @@
 #include <linux/io.h>
 #include <linux/sched.h>
 #include <linux/pm_runtime.h>
+#include <linux/pm_qos.h>
 
 #include <asm/irq.h>
 #include <mach/hardware.h>
@@ -61,6 +62,8 @@ struct hdq_data {
 	/* lock status update */
 	struct  mutex		hdq_mutex;
 	int			hdq_usecount;
+	struct pm_qos_request	pm_qos_request;
+
 	u8			hdq_irqstatus;
 	/* device lock */
 	spinlock_t		hdq_spinlock;
@@ -409,6 +412,18 @@ static int omap_hdq_get(struct hdq_data *hdq_data)
 		goto rtn;
 	}
 
+	/*
+	 * XXX Horrible hack to keep the MPU powerdomain out of a
+	 * low-power state during HDQ transfers, since the HDQ doesn't
+	 * appear to be able to wake the system.  This constraint is
+	 * way too short - it will probably keep the MPU out of WFI
+	 * which is a total waste - but we don't have a
+	 * platform-independent way to control powerdomain states at
+	 * the moment and there is no obvious replacement for platform_data
+	 * fn ptrs AFAIK
+	 */
+	pm_qos_update_request(&hdq_data->pm_qos_request, 1);
+
 	if (OMAP_HDQ_MAX_USER == hdq_data->hdq_usecount) {
 		dev_dbg(hdq_data->dev, "attempt to exceed the max use count");
 		ret = -EINVAL;
@@ -464,6 +479,10 @@ static int omap_hdq_put(struct hdq_data *hdq_data)
 		if (0 == hdq_data->hdq_usecount)
 			pm_runtime_put_sync(hdq_data->dev);
 	}
+
+	pm_qos_update_request(&hdq_data->pm_qos_request,
+			      PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE);
+
 	mutex_unlock(&hdq_data->hdq_mutex);
 
 	return ret;
@@ -578,6 +597,10 @@ static int __devinit omap_hdq_probe(struct platform_device *pdev)
 	hdq_data->hdq_usecount = 0;
 	mutex_init(&hdq_data->hdq_mutex);
 
+	pm_qos_add_request(&hdq_data->pm_qos_request,
+			   PM_QOS_CPU_DMA_LATENCY,
+			   PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE);
+
 	pm_runtime_enable(&pdev->dev);
 	pm_runtime_get_sync(&pdev->dev);
 

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?
  2012-01-26 14:19                     ` Paul Walmsley
@ 2012-01-27 22:35                       ` NeilBrown
  2012-01-27 22:58                         ` Paul Walmsley
  0 siblings, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-27 22:35 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

[-- Attachment #1: Type: text/plain, Size: 7329 bytes --]

On Thu, 26 Jan 2012 07:19:19 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:

> On Tue, 24 Jan 2012, NeilBrown wrote:
> 
> > On Sat, 21 Jan 2012 17:07:18 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> > 
> > > On Wed, 18 Jan 2012, NeilBrown wrote:
> > > 
> > > > Oh - another thing.
> > > > Sometimes during early boot I get:
> > > > 
> > > > [    0.158447] omap_hwmod: usbtll_fck: missing clockdomain for usbtll_fck.
> > > > [    0.176879] omap_hwmod: hdq: softreset failed (waited 10000 usec)
> > > 
> > > This latter bug just got fixed, it's updated in the new HDQ series that I 
> > > sent out:
> > > 
> > > http://marc.info/?l=linux-omap&m=132719073710874&w=2
> > > 
> > > BTW, if you could give that a spin for us at some point, it would be 
> > > greatly appreciated; I don't think I have a board with a 1-wire device on 
> > > it (at least, not that I know of).
> > 
> > Yes, the patch series seems to work OK - at least it doesn't break anything:
> > I can still access my battery charge meter.
> 
> Thanks.  Should I add a Tested-by: from you on those patches?

Well... I only tested them after backporting to 3.2, and Linus is a big
proponent of the idea that once you rebase patches you invalidate any
testing....
However I don't think there difference between 3.2 and 3.3-rc1 is big enough
in the area, so yes:

   Tested-by: NeilBrown <neilb@suse.de>

Thanks.


> 
> > It doesn't fix the problem I am having where HDQ gets confused when CPUIDLE
> > kicks in, but I think I might have a handle on that now.
> > 
> > Unlike other modules, HDQ doesn't have SIDLEMODE.
> > According to 18.4.5 (AM/DM37x Multimedia Device Silicon Revision) it acts as
> > though SIDLEMODE were set to SIDLE_FORCE so if the clock-domain trys to
> > switch off, HDQ lets it.
> 
> Hmm.  Do you know if the problem occurs in the middle of the HDQ 
> transaction, or when HDQ is completely idle?  I'd suspect the former.

I cannot tell.  Each transaction involves receiving two interrupts - one to
say the write completed and one to say the read completed.
I don't recall seeing any transaction which got the write-completion
interrupt but not the read-completion, but I haven't watched closely enough
to be sure.

> 
> Here's a theory: perhaps the MPU powerdomain is hitting a low-power state 
> while waiting for an HDQ interrupt.  When the MPU powerdomain is in a low 
> power state, so is the MPU interrupt controller, so the only way that the 
> MPU can wake up is if the HDQ can issue a wakeup event to the PRCM.  And I 
> don't see any evidence that the HDQ is capable of doing this, based on the 
> HDQ sections of the TRM.  What a huge energy waste, if true.

In the config I am testing MPU only goes to RET, never OFF. Same for CORE.

> 
> Maybe try something like the following patch -- compile-tested only here.
> 
> If this works, you might want to try dropping this patch and using the pad 
> mux to set a wakeup event on the 1-wire pad when the signal goes low.  
> That might be a more power-efficient approach.  You may still have to use 
> some PM QoS request there to ensure that the HDQ can wake up fast enough 
> to see the pulse, but the constraint shouldn't need to be as ludicrously 
> low as it is in the following patch.

Doesn't work - crashes :-(

pm_qos_power_init is defined as a late_initcall, so you cannot call
pm_qos_update_request until after all the initcalls have run.
But with your patch, the probe of the bq27000 triggers a read of the registers
which tries to update_request - and it goes 'bang'.

I really think the problem is the CORE pwrdm gating a clock because no module
says it needs it - i.e. nothing to do with MPU at all.

We want to keep CORE active when an HDQ transaction is happening, but MPU is
welcome to go to sleep.  I don't think you can express that with 'qos'.  I
think it needs some omap-specific machinery.

I can 'fix' the problem simply by making sure

		pwrdm_for_each_clkdm(core_pd, _cpuidle_deny_idle);

runs in omap3_enter_idle whenever HDQ is active.

One of the reasons that I think it is a clock problem rather than just
missing a wakeup event is that once the problem starts happening I cannot
recovery without rebooting.
i.e. even if I tell the UARTs to keep the clocks on permanently and keep the
CPUIDLE state at 0, the HDQ doesn't start working again.  It has clearly
become confused.
The HDQ doco makes a point of saying that you shouldn't issue any commands
(except 'enable clock') when the clock is disabled.  I think we end up doing
that and it gets confused and cannot recover.

I note that there is an ad-hoc dependency between the camera and various
power states as well.  Maybe we need a little bit of infrastructure so that
camera can say "Keep CORE and MPU on" (or whatever it needs) and HDQ can say
"Keep CORE on".  ???

Thanks,
NeilBrown


> 
> 
> - Paul
> 
> commit 92fa561191b60ddf1296e02a6206160e8a5718f0
> Author: Paul Walmsley <paul@pwsan.com>
> Date:   Thu Jan 26 07:04:40 2012 -0700
> 
>     Attempt to work around lack of HDQ wakeup support
> 
> diff --git a/drivers/w1/masters/omap_hdq.c b/drivers/w1/masters/omap_hdq.c
> index 63e3eda..990e38b 100644
> --- a/drivers/w1/masters/omap_hdq.c
> +++ b/drivers/w1/masters/omap_hdq.c
> @@ -17,6 +17,7 @@
>  #include <linux/io.h>
>  #include <linux/sched.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/pm_qos.h>
>  
>  #include <asm/irq.h>
>  #include <mach/hardware.h>
> @@ -61,6 +62,8 @@ struct hdq_data {
>  	/* lock status update */
>  	struct  mutex		hdq_mutex;
>  	int			hdq_usecount;
> +	struct pm_qos_request	pm_qos_request;
> +
>  	u8			hdq_irqstatus;
>  	/* device lock */
>  	spinlock_t		hdq_spinlock;
> @@ -409,6 +412,18 @@ static int omap_hdq_get(struct hdq_data *hdq_data)
>  		goto rtn;
>  	}
>  
> +	/*
> +	 * XXX Horrible hack to keep the MPU powerdomain out of a
> +	 * low-power state during HDQ transfers, since the HDQ doesn't
> +	 * appear to be able to wake the system.  This constraint is
> +	 * way too short - it will probably keep the MPU out of WFI
> +	 * which is a total waste - but we don't have a
> +	 * platform-independent way to control powerdomain states at
> +	 * the moment and there is no obvious replacement for platform_data
> +	 * fn ptrs AFAIK
> +	 */
> +	pm_qos_update_request(&hdq_data->pm_qos_request, 1);
> +
>  	if (OMAP_HDQ_MAX_USER == hdq_data->hdq_usecount) {
>  		dev_dbg(hdq_data->dev, "attempt to exceed the max use count");
>  		ret = -EINVAL;
> @@ -464,6 +479,10 @@ static int omap_hdq_put(struct hdq_data *hdq_data)
>  		if (0 == hdq_data->hdq_usecount)
>  			pm_runtime_put_sync(hdq_data->dev);
>  	}
> +
> +	pm_qos_update_request(&hdq_data->pm_qos_request,
> +			      PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE);
> +
>  	mutex_unlock(&hdq_data->hdq_mutex);
>  
>  	return ret;
> @@ -578,6 +597,10 @@ static int __devinit omap_hdq_probe(struct platform_device *pdev)
>  	hdq_data->hdq_usecount = 0;
>  	mutex_init(&hdq_data->hdq_mutex);
>  
> +	pm_qos_add_request(&hdq_data->pm_qos_request,
> +			   PM_QOS_CPU_DMA_LATENCY,
> +			   PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE);
> +
>  	pm_runtime_enable(&pdev->dev);
>  	pm_runtime_get_sync(&pdev->dev);
>  


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?
  2012-01-27 22:35                       ` NeilBrown
@ 2012-01-27 22:58                         ` Paul Walmsley
  2012-01-28  0:40                           ` NeilBrown
  0 siblings, 1 reply; 64+ messages in thread
From: Paul Walmsley @ 2012-01-27 22:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

On Sat, 28 Jan 2012, NeilBrown wrote:

> On Thu, 26 Jan 2012 07:19:19 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
>
> > Thanks.  Should I add a Tested-by: from you on those patches?
> 
> Well... I only tested them after backporting to 3.2, and Linus is a big
> proponent of the idea that once you rebase patches you invalidate any
> testing....
> However I don't think there difference between 3.2 and 3.3-rc1 is big enough
> in the area, so yes:
> 
>    Tested-by: NeilBrown <neilb@suse.de>

Okay, thanks.

> > Here's a theory: perhaps the MPU powerdomain is hitting a low-power state 
> > while waiting for an HDQ interrupt.  When the MPU powerdomain is in a low 
> > power state, so is the MPU interrupt controller, so the only way that the 
> > MPU can wake up is if the HDQ can issue a wakeup event to the PRCM.  And I 
> > don't see any evidence that the HDQ is capable of doing this, based on the 
> > HDQ sections of the TRM.  What a huge energy waste, if true.
> 
> In the config I am testing MPU only goes to RET, never OFF. Same for CORE.

That qualifies as a low-power state for the MPU :-)

In MPU RET, at least in theory, the MPU INTC will not be operational, and 
thus unable to respond to interrupts.

> > Maybe try something like the following patch -- compile-tested only here.
> > 
> > If this works, you might want to try dropping this patch and using the pad 
> > mux to set a wakeup event on the 1-wire pad when the signal goes low.  
> > That might be a more power-efficient approach.  You may still have to use 
> > some PM QoS request there to ensure that the HDQ can wake up fast enough 
> > to see the pulse, but the constraint shouldn't need to be as ludicrously 
> > low as it is in the following patch.
> 
> Doesn't work - crashes :-(
> 
> pm_qos_power_init is defined as a late_initcall, so you cannot call
> pm_qos_update_request until after all the initcalls have run.
> But with your patch, the probe of the bq27000 triggers a read of the registers
> which tries to update_request - and it goes 'bang'.

Hmm, okay.  Could you please remove the register read from the HDQ probe 
and see if that makes a difference?

> I really think the problem is the CORE pwrdm gating a clock because no module
> says it needs it - i.e. nothing to do with MPU at all.

Until pm_runtime_put*() is called, the usecount of hdq_fck will still be 
non-zero.  So the CORE shouldn't be able to gate it or hdq_ick at that 
time, and thus should not be able to enter idle. Hence the question about 
where the problems occur: whether they occur in the middle of the 
transaction or when the HDQ clocks are disabled.

> We want to keep CORE active when an HDQ transaction is happening, but MPU is
> welcome to go to sleep.  I don't think you can express that with 'qos'.  I
> think it needs some omap-specific machinery.

The OMAP PRCM hardware should keep the CORE* clkdms active when the 
hdq_fck is enabled.  So it's possible there could be a PRCM silicon bug 
that doesn't take hdq_fck into account when determining whether the CORE_* 
clkdms are inactive.

> I can 'fix' the problem simply by making sure
> 
> 		pwrdm_for_each_clkdm(core_pd, _cpuidle_deny_idle);
> 
> runs in omap3_enter_idle whenever HDQ is active.

Hmm, that does suggest that it's not wakeup related.

> One of the reasons that I think it is a clock problem rather than just 
> missing a wakeup event is that once the problem starts happening I 
> cannot recovery without rebooting. i.e. even if I tell the UARTs to keep 
> the clocks on permanently and keep the CPUIDLE state at 0, the HDQ 
> doesn't start working again.  It has clearly become confused. The HDQ 
> doco makes a point of saying that you shouldn't issue any commands 
> (except 'enable clock') when the clock is disabled.  I think we end up 
> doing that and it gets confused and cannot recover.
> 
> I note that there is an ad-hoc dependency between the camera and various 
> power states as well.  Maybe we need a little bit of infrastructure so 
> that camera can say "Keep CORE and MPU on" (or whatever it needs) and 
> HDQ can say "Keep CORE on".  ???

I'm not familiar with the camera problems, but in the HDQ case, this 
should only be needed if a silicon bug exists.  Which is certainly 
possible; we've seen this problem with one other IP block in the past. 
Based on a quick glance at the errata, I don't see anything related to the 
HDQ, but that doesn't really mean anything.

In any case, we should be able to work around this via the hwmod layer and 
a special flag, if the problem really is a PRCM bug.  This will depend on 
the functional powerstate conversion.  

Thanks for the detailed test reports.

- Paul

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?
  2012-01-27 22:58                         ` Paul Walmsley
@ 2012-01-28  0:40                           ` NeilBrown
  2012-01-28  6:02                             ` Paul Walmsley
  0 siblings, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-01-28  0:40 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

[-- Attachment #1: Type: text/plain, Size: 8605 bytes --]

On Fri, 27 Jan 2012 15:58:40 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:

> On Sat, 28 Jan 2012, NeilBrown wrote:

> > > Here's a theory: perhaps the MPU powerdomain is hitting a low-power state 
> > > while waiting for an HDQ interrupt.  When the MPU powerdomain is in a low 
> > > power state, so is the MPU interrupt controller, so the only way that the 
> > > MPU can wake up is if the HDQ can issue a wakeup event to the PRCM.  And I 
> > > don't see any evidence that the HDQ is capable of doing this, based on the 
> > > HDQ sections of the TRM.  What a huge energy waste, if true.
> > 
> > In the config I am testing MPU only goes to RET, never OFF. Same for CORE.
> 
> That qualifies as a low-power state for the MPU :-)
> 
> In MPU RET, at least in theory, the MPU INTC will not be operational, and 
> thus unable to respond to interrupts.
> 
> > > Maybe try something like the following patch -- compile-tested only here.
> > > 
> > > If this works, you might want to try dropping this patch and using the pad 
> > > mux to set a wakeup event on the 1-wire pad when the signal goes low.  
> > > That might be a more power-efficient approach.  You may still have to use 
> > > some PM QoS request there to ensure that the HDQ can wake up fast enough 
> > > to see the pulse, but the constraint shouldn't need to be as ludicrously 
> > > low as it is in the following patch.
> > 
> > Doesn't work - crashes :-(
> > 
> > pm_qos_power_init is defined as a late_initcall, so you cannot call
> > pm_qos_update_request until after all the initcalls have run.
> > But with your patch, the probe of the bq27000 triggers a read of the registers
> > which tries to update_request - and it goes 'bang'.
> 
> Hmm, okay.  Could you please remove the register read from the HDQ probe 
> and see if that makes a difference?

Sorry, I misdiagnosed.  The problem was that the pm_qos_request structure
needs to be zeroed before pm_qos_add_request() or it complains, doesn't
initialise it, and future calls can crash.

So with 'kzalloc' in place of 'kmalloc' where hdq_data is allocated, the qos
patch does make my problem go away.  It feels a bit heavy-handed though.

> 
> > I really think the problem is the CORE pwrdm gating a clock because no module
> > says it needs it - i.e. nothing to do with MPU at all.
> 
> Until pm_runtime_put*() is called, the usecount of hdq_fck will still be 
> non-zero.  So the CORE shouldn't be able to gate it or hdq_ick at that 
> time, and thus should not be able to enter idle. Hence the question about 
> where the problems occur: whether they occur in the middle of the 
> transaction or when the HDQ clocks are disabled.
> 
> > We want to keep CORE active when an HDQ transaction is happening, but MPU is
> > welcome to go to sleep.  I don't think you can express that with 'qos'.  I
> > think it needs some omap-specific machinery.
> 
> The OMAP PRCM hardware should keep the CORE* clkdms active when the 
> hdq_fck is enabled.  So it's possible there could be a PRCM silicon bug 
> that doesn't take hdq_fck into account when determining whether the CORE_* 
> clkdms are inactive.

That isn't the way I understand the TRM and the code.  I admit there is a lot
that I don't understand, but I've really drilled down in this area and my
understanding aligns with what I see experimentally, so....

clkdm_allow_idle(struct clockdomain *clkdm) tells the hardware to allow that
clkdm to go idle, and it does so with no reference to whether any clk in the
clkdm is current enabled.  This seems to be called conditional only on which
IDLE state is being entered. Any state below C1 is entered with
clkdm_allow_idle in force.

When the hardware has been told that it can idle a clockdomain, it will
handshake with each module that uses the clockdomain.  i.e. it doesn't check
if the clock is enabled, it asks the module directly.

The module answers according to the SIDLE (slave idle) setting. This can be
set to
 - always say "no, don't idle, I'm busy"
 - always say "yes, force me idle, I don't care"
 - 'smartidle' where the response is based on the internal state of the
   module.

A good example of smartidle is the UART.  If it hasn't seen a character in or
out for about 10 (I think) character times it will say "yes, idle the clock",
else it will say "no".   When RX then goes low it will handshake back to say
"Hey, I'm busy all of a sudden, can I have the clock back please" which it
then gets in time to count off the start bit and then read the rest of the
incoming byte.(*)

However HDQ doesn't have SIDLE.  So the PRCM doesn't handshake with it.  It
just assumes that HDQ will say "yes, force me idle, I don't care".

The diversion about UARTs is relevant because when the UARTs have SIDLE set
to 'no' I don't see the problem (they keep the clock domain active).  It is
only when the UARTs set their SIDLE to smartidle that the problem appears.

So: if the HDQ driver wants the clkdm to stay on it has to explicitly arrange
it in software by appropriate clkdm_deny_idle() calls.

From the "System Power Management and Wakeup" section of the HDQ chapter of
the TRM:


     As part of the system-wide power-management scheme, the HDQ/1-Wire module
     can go into idle state at the request of the PRCM module (for more information,
     see Chapter 3, Power, Reset, and Clock Management). However, the HDQ/1-Wire module
     does not support  handshake protocol with the PRCM.

and 

     Software must ensure correct clock management.

It also mentions the AUTO_HDQ bit of PRCM.CM_AUTOIDLE1_CORE.  This had me
confused for a while, but I think it only affects the iclk.  I don't think the
iclk is the problem.  It is the fclk that is disappearing because the clkdm
that feeds it is being turned off.


> 
> > I can 'fix' the problem simply by making sure
> > 
> > 		pwrdm_for_each_clkdm(core_pd, _cpuidle_deny_idle);
> > 
> > runs in omap3_enter_idle whenever HDQ is active.
> 
> Hmm, that does suggest that it's not wakeup related.
> 
> > One of the reasons that I think it is a clock problem rather than just 
> > missing a wakeup event is that once the problem starts happening I 
> > cannot recovery without rebooting. i.e. even if I tell the UARTs to keep 
> > the clocks on permanently and keep the CPUIDLE state at 0, the HDQ 
> > doesn't start working again.  It has clearly become confused. The HDQ 
> > doco makes a point of saying that you shouldn't issue any commands 
> > (except 'enable clock') when the clock is disabled.  I think we end up 
> > doing that and it gets confused and cannot recover.
> > 
> > I note that there is an ad-hoc dependency between the camera and various 
> > power states as well.  Maybe we need a little bit of infrastructure so 
> > that camera can say "Keep CORE and MPU on" (or whatever it needs) and 
> > HDQ can say "Keep CORE on".  ???
> 
> I'm not familiar with the camera problems, but in the HDQ case, this 
> should only be needed if a silicon bug exists.  Which is certainly 
> possible; we've seen this problem with one other IP block in the past. 
> Based on a quick glance at the errata, I don't see anything related to the 
> HDQ, but that doesn't really mean anything.
> 
> In any case, we should be able to work around this via the hwmod layer and 
> a special flag, if the problem really is a PRCM bug.  This will depend on 
> the functional powerstate conversion.  

My idea of a fix would be a secondary reference count on each clkdm.
When a clk that doesn't have an associated SIDLE setting is activated we
increase this reference count (and decrease it when the clk is deactivated).

While the count is non-zero we deny_idle for that clkdm.

Or something like that.


Thanks for listening :-)

NeilBrown


> 
> Thanks for the detailed test reports.
> 
> - Paul


(*) Another note about UARTs.
  In 3.2 (haven't checked 3.3) the driver not only sets SMARTIDLE but also
  stops the clocks when things are idle.
  So when RX goes low and the UART wakes us up it doesn't get a clock
  straight away but we have to wait for code to enable the clock.
  In my experiments, this is fast enough to stay in-sync for speeds up to
  38400 (I think).  Above there we miss the first bit and get corrupt
  characters.
  I think that if the UART is expecting data it should not disable the clock
  but should use SMARTIDLE to let it turn off, and then turn back on again
  quickly.... but maybe 3.3 gets that right.

  NB

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?
  2012-01-28  0:40                           ` NeilBrown
@ 2012-01-28  6:02                             ` Paul Walmsley
  2012-02-01  7:51                               ` NeilBrown
  0 siblings, 1 reply; 64+ messages in thread
From: Paul Walmsley @ 2012-01-28  6:02 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

On Sat, 28 Jan 2012, NeilBrown wrote:

> On Fri, 27 Jan 2012 15:58:40 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> 
> > Hmm, okay.  Could you please remove the register read from the HDQ probe 
> > and see if that makes a difference?
> 
> So with 'kzalloc' in place of 'kmalloc' where hdq_data is allocated, the qos
> patch does make my problem go away.  It feels a bit heavy-handed though.

Thanks for the test.  It's not clear to me how this driver could really 
work well if the MPU powerdomain is able to go to a low power state 
between the two phases of the transaction.  There's no wakeup source, so 
the interrupt won't be handled until another device wakes the system up.  
Of course, if you have a lot of wakeups coming from other devices, or if 
something is forcing the MPU powerdomain to stay ON or INACTIVE, this may 
not really matter.

So you might want that patch, at least on 3.3.  But from your following 
text, it sounds like there is another problem.

> > The OMAP PRCM hardware should keep the CORE* clkdms active when the 
> > hdq_fck is enabled.  So it's possible there could be a PRCM silicon bug 
> > that doesn't take hdq_fck into account when determining whether the CORE_* 
> > clkdms are inactive.

...

> clkdm_allow_idle(struct clockdomain *clkdm) tells the hardware to allow that
> clkdm to go idle, and it does so with no reference to whether any clk in the
> clkdm is current enabled.

Since the HDQ module doesn't support the idle protocol, the target clock 
FSM in the CM is what should determine whether the module is considered 
idle or not.  And as long as the bit in the CM_FCLKEN register 
corresponding to HDQ_FCLK is set, the FSM should consider the module as 
active, and the clockdomain should not be allowed to go inactive.

I write "should."  Given that big caution at the bottom of section 20.4.5 
"System Power Management and Wakeup", this appears to have not been 
correctly implemented.

> It also mentions the AUTO_HDQ bit of PRCM.CM_AUTOIDLE1_CORE.  This had me
> confused for a while, but I think it only affects the iclk.  

That's correct.

> I don't think the iclk is the problem.  It is the fclk that is 
> disappearing because the clkdm that feeds it is being turned off.

Disabling AUTO_HDQ is probably a reasonable workaround.  It would prevent 
the CORE_L4 clockdomain from going inactive, and that happens to be where 
the functional clock originates from, also.

We have a partial facility to handle this type of bug in the existing 
code.  Could you please try the patch enclosed at the bottom of this 
E-mail and see if it helps?

> My idea of a fix would be a secondary reference count on each clkdm.
> When a clk that doesn't have an associated SIDLE setting is activated we
> increase this reference count (and decrease it when the clk is deactivated).
> 
> While the count is non-zero we deny_idle for that clkdm.
> 
> Or something like that.

I think we can work around this case with the OCPIF_SWSUP_IDLE flag, since 
the functional clock and interface clock are in the same clockdomain.

> Thanks for listening :-)

Thanks for working through this bug and helping test :-)

> (*) Another note about UARTs.
>   In 3.2 (haven't checked 3.3) the driver not only sets SMARTIDLE but also
>   stops the clocks when things are idle.
>   So when RX goes low and the UART wakes us up it doesn't get a clock
>   straight away but we have to wait for code to enable the clock.
>   In my experiments, this is fast enough to stay in-sync for speeds up to
>   38400 (I think).  Above there we miss the first bit and get corrupt
>   characters.
>   I think that if the UART is expecting data it should not disable the clock
>   but should use SMARTIDLE to let it turn off, and then turn back on again
>   quickly.... but maybe 3.3 gets that right.

With the most recent UART patch set that was posted against 3.3-rc1, as 
long as the CORE powerdomain isn't in a low-power state, the UART can 
enter idle with clocks cut and still wake up quickly enough to handle 
incoming data at 115200 8n1 on the 35xx BeagleBoard.  That all happens in 
hardware; no kernel involvement is needed.  I don't see the corrupt 
characters that you do, but there may be a 35xx vs. 37xx difference here.  
(You're using an OMAP37xx?)


- Paul

>From 16b9f4ee44f96846fad5a9cb0e55bcd25b77ed59 Mon Sep 17 00:00:00 2001
From: Paul Walmsley <paul@pwsan.com>
Date: Fri, 27 Jan 2012 22:46:34 -0700
Subject: [PATCH] Try software-supervised control of HDQ_ICLK

 Do not automatically enable interface clock autoidle on
 HDQ_ICLK. Require the hwmod layer to use software gating on
 HDQ_ICLK. This is because the CM FSM that handles HDQ_FCLK
 appears to not be working as expected.  This patch is
 experimental, untested, and for bug-testing purposes only.

---
 arch/arm/mach-omap2/clock3xxx_data.c       |    1 +
 arch/arm/mach-omap2/omap_hwmod_3xxx_data.c |    2 +-
 arch/arm/plat-omap/clock.c                 |    4 +++-
 arch/arm/plat-omap/include/plat/clock.h    |    1 +
 4 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mach-omap2/clock3xxx_data.c b/arch/arm/mach-omap2/clock3xxx_data.c
index 98cb408..96ba9f9 100644
--- a/arch/arm/mach-omap2/clock3xxx_data.c
+++ b/arch/arm/mach-omap2/clock3xxx_data.c
@@ -1883,6 +1883,7 @@ static struct clk hdq_ick = {
 	.enable_bit	= OMAP3430_EN_HDQ_SHIFT,
 	.clkdm_name	= "core_l4_clkdm",
 	.recalc		= &followparent_recalc,
+	.flags		= SWSUP_ICLK_IDLE, 
 };
 
 static struct clk mcspi4_ick = {
diff --git a/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c b/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c
index dd155c4..37af75a 100644
--- a/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c
+++ b/arch/arm/mach-omap2/omap_hwmod_3xxx_data.c
@@ -239,7 +239,7 @@ static struct omap_hwmod_ocp_if omap3xxx_l4_core__hdq1w = {
 	.clk		= "hdq_ick",
 	.addr		= omap2_hdq1w_addr_space,
 	.user		= OCP_USER_MPU | OCP_USER_SDMA,
-	.flags		= OMAP_FIREWALL_L4
+	.flags		= OMAP_FIREWALL_L4 | OCPIF_SWSUP_IDLE,
 };
 
 /* L4 CORE -> UART1 interface */
diff --git a/arch/arm/plat-omap/clock.c b/arch/arm/plat-omap/clock.c
index 567e4b5..3fce16a 100644
--- a/arch/arm/plat-omap/clock.c
+++ b/arch/arm/plat-omap/clock.c
@@ -348,7 +348,9 @@ int omap_clk_enable_autoidle_all(void)
 	spin_lock_irqsave(&clockfw_lock, flags);
 
 	list_for_each_entry(c, &clocks, node)
-		if (c->ops->allow_idle)
+		if (c->flags & SWSUP_ICLK_IDLE && c->ops->deny_idle)
+			c->ops->deny_idle(c);
+		else if (c->ops->allow_idle)
 			c->ops->allow_idle(c);
 
 	spin_unlock_irqrestore(&clockfw_lock, flags);
diff --git a/arch/arm/plat-omap/include/plat/clock.h b/arch/arm/plat-omap/include/plat/clock.h
index 240a7b9..f4f6e7b 100644
--- a/arch/arm/plat-omap/include/plat/clock.h
+++ b/arch/arm/plat-omap/include/plat/clock.h
@@ -191,6 +191,7 @@ struct dpll_data {
 #define ENABLE_ON_INIT		(1 << 3)	/* Enable upon framework init */
 #define INVERT_ENABLE		(1 << 4)	/* 0 enables, 1 disables */
 #define CLOCK_CLKOUTX2		(1 << 5)
+#define SWSUP_ICLK_IDLE		(1 << 6)
 
 /**
  * struct clk - OMAP struct clk
-- 
1.7.8.3


^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?
  2012-01-28  6:02                             ` Paul Walmsley
@ 2012-02-01  7:51                               ` NeilBrown
  2012-02-01 18:36                                 ` Paul Walmsley
  0 siblings, 1 reply; 64+ messages in thread
From: NeilBrown @ 2012-02-01  7:51 UTC (permalink / raw)
  To: Paul Walmsley; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

[-- Attachment #1: Type: text/plain, Size: 2076 bytes --]

On Fri, 27 Jan 2012 23:02:51 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:


> Since the HDQ module doesn't support the idle protocol, the target clock 
> FSM in the CM is what should determine whether the module is considered 
> idle or not.  And as long as the bit in the CM_FCLKEN register 
> corresponding to HDQ_FCLK is set, the FSM should consider the module as 
> active, and the clockdomain should not be allowed to go inactive.
> 
> I write "should."  Given that big caution at the bottom of section 20.4.5 
> "System Power Management and Wakeup", this appears to have not been 
> correctly implemented.

Can you help me understand why you say "should" there.
You seem to be implying that a down-stream clock gate should cause an
upstream clock to stay active, but that doesn't make sense to me given the
presence of the much richer SIDLE framework.  Also I cannot find it in the
TRM (though there a lots of words in there and I might have missed some).

Most modules use SIDLE to keep the upstream clock active.  HDQ doesn't have
that so it needs software over-rid to keep it active.

> 
> > It also mentions the AUTO_HDQ bit of PRCM.CM_AUTOIDLE1_CORE.  This had me
> > confused for a while, but I think it only affects the iclk.  
> 
> That's correct.
> 
> > I don't think the iclk is the problem.  It is the fclk that is 
> > disappearing because the clkdm that feeds it is being turned off.
> 
> Disabling AUTO_HDQ is probably a reasonable workaround.  It would prevent 
> the CORE_L4 clockdomain from going inactive, and that happens to be where 
> the functional clock originates from, also.
> 
> We have a partial facility to handle this type of bug in the existing 
> code.  Could you please try the patch enclosed at the bottom of this 
> E-mail and see if it helps?

Yes, that patch fixes my problem too - the HDQ keeps working.

(it's a bit smoke-and-mirrors though .. I want fclk to stay on, so let's make
sure iclk doesn't autoidle, because we *know* they have the same source :-)


Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: OMAP HDQ: was Re: DSS2/PM on 3.2 broken?
  2012-02-01  7:51                               ` NeilBrown
@ 2012-02-01 18:36                                 ` Paul Walmsley
  0 siblings, 0 replies; 64+ messages in thread
From: Paul Walmsley @ 2012-02-01 18:36 UTC (permalink / raw)
  To: NeilBrown; +Cc: Joe Woodward, khilman, t-kristo, govindraj.raja, linux-omap

On Wed, 1 Feb 2012, NeilBrown wrote:

> On Fri, 27 Jan 2012 23:02:51 -0700 (MST) Paul Walmsley <paul@pwsan.com> wrote:
> 
> > Since the HDQ module doesn't support the idle protocol, the target clock 
> > FSM in the CM is what should determine whether the module is considered 
> > idle or not.  And as long as the bit in the CM_FCLKEN register 
> > corresponding to HDQ_FCLK is set, the FSM should consider the module as 
> > active, and the clockdomain should not be allowed to go inactive.
> > 
> > I write "should."  Given that big caution at the bottom of section 20.4.5 
> > "System Power Management and Wakeup", this appears to have not been 
> > correctly implemented.
> 
> Can you help me understand why you say "should" there.
> You seem to be implying that a down-stream clock gate should cause an
> upstream clock to stay active, but that doesn't make sense to me given the
> presence of the much richer SIDLE framework.  Also I cannot find it in the
> TRM (though there a lots of words in there and I might have missed some).

Functional clocks are almost never auto-idled by the hardware[*].  So 
when the kernel sets a CM_FCLKEN* bit in a register for a clockdomain, 
that clockdomain should never go inactive.

As I understand it, main functional clocks on I/O IP blocks aren't 
automatically idled for a few basic reasons.  One is that the main 
functional clock has to be running in order for register accesses to 
succeed, so initiators have to have some way of preventing it from going 
off while they are interacting with it.  The PRCM doesn't snoop the 
interconnect :-)  Another is that some of the I/O IP blocks can continue 
to operate after they've idle-acked.  They idle-ack, which could allow 
clocks to be gated to interconnects, but the module can still continue to 
fill or drain FIFOs, which obviously requires a clock.  This was what is 
not working correctly with the UARTs... some events that generate 
interrupts can't generate SWAKEUPs :-(  And there are a few other lesser 
reasons that I am aware of that are not worth going into at this point.

Even interface clock auto-idle works in a different way than what one 
might expect.  Even if a module idle-acks, the interface clock will stay 
on until the interface clockdomain can go idle.  So suppose that you have 
an interface clockdomain with 20 modules and 19 of them idle-ack.  The 
interface clock stays supplied to those 19 modules until the 20th 
idle-acks.  This ensures that an initiator access to a target module will 
succeed.  (This is true for OMAP2/3; OMAP4 does something more clever.) In 
general, setting ICLKEN/FCLKEN bits to zero doesn't shut down clocks 
directly; it simply indicates that the clocks may be shut down once other 
users of that clock also indicate that they don't need the clocks.

> Most modules use SIDLE to keep the upstream clock active.

It's the other way around.  The device driver uses the FCLKEN bits to keep 
the upstream clock active :-)

> > Disabling AUTO_HDQ is probably a reasonable workaround.  It would prevent 
> > the CORE_L4 clockdomain from going inactive, and that happens to be where 
> > the functional clock originates from, also.
> > 
> > We have a partial facility to handle this type of bug in the existing 
> > code.  Could you please try the patch enclosed at the bottom of this 
> > E-mail and see if it helps?
> 
> Yes, that patch fixes my problem too - the HDQ keeps working.

Great; thanks for the test.  We'll implement a variant of that for 
mainline.

Please also let me know if you wind up having wakeup problems with HDQ, 
especially with MPU off-mode.  If you don't, obviously that's good; but I 
suspect you might want it at some point.

> (it's a bit smoke-and-mirrors though .. I want fclk to stay on, so let's make
> sure iclk doesn't autoidle, because we *know* they have the same source :-)

A theoretically clearer workaround would be to place the main functional 
clock's clockdomain into software-supervised wakeup as long as the hwmod 
code thinks that the HDQ IP block is active.  But that is not so easy to 
implement with our current codebase.  We'd need to implement usecounting 
for our clockdomain control operations, and the other PM code assumes that 
it can force clockdomain states directly.  So I'm not too eager to do that 
rewrite at the moment given all the other stuff that's going on...  
Meanwhile, this fix targets the target clock FSM, and that's where the bug 
is, albeit on a slightly different path.  It's also functionally 
equivalent, given that the HDQ apparently can't send wakeups.


- Paul


*. Main functional clocks can be autoidled if they are 
interface+functional clocks.  This is the case for some purely 
computational, non-I/O IP blocks such as MAILBOXES.  Those combined 
interface+functional clocks are controlled via ICLKEN registers, not 
FCLKEN registers.



^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2012-02-01 18:36 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-09 12:46 DSS2/PM on 3.2 broken? Joe Woodward
2012-01-09 21:08 ` NeilBrown
2012-01-10  9:58   ` Joe Woodward
2012-01-11 13:43   ` Paul Walmsley
2012-01-11 14:22     ` Archit
2012-01-11 15:15       ` Joe Woodward
2012-01-11 15:52         ` Archit
2012-01-11 16:13           ` Joe Woodward
2012-01-11 16:54             ` Archit
2012-01-12  9:28         ` Tomi Valkeinen
2012-01-12  9:30           ` Tomi Valkeinen
2012-01-12  9:51           ` Tomi Valkeinen
2012-01-11 22:59     ` NeilBrown
2012-01-13 10:05       ` Paul Walmsley
2012-01-13 11:20         ` NeilBrown
2012-01-13 11:31           ` Paul Walmsley
2012-01-13 23:09             ` NeilBrown
2012-01-13 23:35               ` Paul Walmsley
2012-01-17 21:24               ` NeilBrown
2012-01-22  0:07                 ` Paul Walmsley
2012-01-22 11:30                   ` NeilBrown
2012-01-24 10:37                   ` OMAP HDQ: was " NeilBrown
2012-01-26 14:19                     ` Paul Walmsley
2012-01-27 22:35                       ` NeilBrown
2012-01-27 22:58                         ` Paul Walmsley
2012-01-28  0:40                           ` NeilBrown
2012-01-28  6:02                             ` Paul Walmsley
2012-02-01  7:51                               ` NeilBrown
2012-02-01 18:36                                 ` Paul Walmsley
2012-01-18  7:13           ` Tomi Valkeinen
2012-01-18 11:15             ` NeilBrown
2012-01-18 11:42               ` Tomi Valkeinen
2012-01-18 20:30                 ` NeilBrown
2012-01-19 10:17                   ` Joe Woodward
2012-01-19 10:40                     ` Tomi Valkeinen
2012-01-19 11:29                       ` Joe Woodward
2012-01-19 11:36                         ` Tomi Valkeinen
2012-01-19 12:21                           ` Joe Woodward
2012-01-19 14:52                             ` Tomi Valkeinen
2012-01-19 19:37                             ` Kevin Hilman
2012-01-19 21:05                               ` NeilBrown
2012-01-20  0:22                                 ` Kevin Hilman
2012-01-21 12:12                                   ` NeilBrown
2012-01-23 22:11                                     ` Kevin Hilman
2012-01-25  0:32                                       ` NeilBrown
2012-01-13 11:34         ` Govindraj
2012-01-13 13:23           ` Paul Walmsley
2012-01-13 19:21         ` Kevin Hilman
2012-01-13 22:37           ` Kevin Hilman
2012-01-13 23:06             ` Paul Walmsley
2012-01-13 23:34               ` Paul Walmsley
2012-01-14  1:17                 ` NeilBrown
2012-01-14  1:28                   ` Paul Walmsley
2012-01-13 23:39               ` Paul Walmsley
2012-01-13 11:19       ` Paul Walmsley
2012-01-11 13:32 ` Paul Walmsley
2012-01-12 16:42 ` Tomi Valkeinen
2012-01-12 22:40   ` Kevin Hilman
2012-01-13  5:29     ` Tomi Valkeinen
2012-01-13 19:30       ` Kevin Hilman
2012-01-16 11:11         ` Tomi Valkeinen
2012-01-19 19:24           ` Kevin Hilman
2012-01-20  7:16             ` Tomi Valkeinen
2012-01-20 18:06               ` Kevin Hilman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.